inconsistent analysis of symbols < and > #45

Open
arademaker opened this issue Nov 20, 2023 · 4 comments

@arademaker
Member

Try parse

The symbol > is cool
The symbol < is cool

The first sentence has 2 readings; the second has 7. The < is always interpreted as _less+than_a_1. The > can be _greater+than_a_1 or quoted.

@arademaker
Member Author

arademaker commented Nov 20, 2023

Also, only for <word> does the ERG keep the symbols in the token. Try

  1. I have a [cat].
  2. I have a (cat).
  3. I have a <cat>.

@danflick see also delph-in/pydelphin#371

@danflick
Collaborator

The lexicon already includes a separate NP entry for the use of ">" as the name of the symbol, but lacked an analogous entry for "<". I have added the missing entry, and will check it in with the next update.
As for the brackets surrounding a word as in "I have a [cat]" it does not seem desirable to try to insert them into the name of the predicate, or into the value of the ARG attribute when the token is a named entity. I agree that it would be good to find some way to record the presence of these bracketing punctuation marks in the resulting MRS, but we'll need to figure out how best to do so.

@arademaker
Member Author

arademaker commented Nov 20, 2023

Sorry, GitHub interpreted the greater-than and less-than symbols as markup; I edited my previous comment.

@danflick, the crucial problem is the presence of < in the name of the predicate without any escaping or double quotes. When parsing the text representation of the MRS, we need a way to distinguish it easily from the beginning of the Link (character positions). See here
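
To make the ambiguity concrete, here is a minimal sketch of one way a reader of the text format could separate a predicate from its trailing character-position link. It assumes the link always has the form `<from:to>` and sits at the very end of the token. The function name `split_pred_lnk` and the predicate strings are made up for illustration; this is not how PyDelphin or ACE handle it.

```python
import re

# Assumption: a character-position link looks like "<from:to>" and is the
# last thing in the token. Anchoring at the end lets an earlier "<" that
# belongs to the predicate name pass through untouched.
_LNK_AT_END = re.compile(r"<(-?\d+):(-?\d+)>$")


def split_pred_lnk(token):
    """Split a 'predicate<from:to>' token into (predicate, (from, to)).

    Returns (token, None) when no trailing link is present.
    Hypothetical helper, for illustration only.
    """
    match = _LNK_AT_END.search(token)
    if match is None:
        return token, None
    span = (int(match.group(1)), int(match.group(2)))
    return token[:match.start()], span


# Illustrative (made-up) predicate strings, not actual ERG output.
print(split_pred_lnk("_cat_n_1<4:7>"))    # ('_cat_n_1', (4, 7))
print(split_pred_lnk("_<cat>_n_1<4:9>"))  # ('_<cat>_n_1', (4, 9))
```

A reader that instead treats the first < as the start of the link would cut the second token after the underscore, which is why an escaped or quoted predicate name would be easier to parse.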

@arademaker
Member Author

As for the brackets surrounding a word as in "I have a [cat]" it does not seem desirable to try to insert them into the name of the predicate, or into the value of the ARG attribute when the token is a named entity. I agree that it would be good to find some way to record the presence of these bracketing punctuation marks in the resulting MRS, but we'll need to figure out how best to do so.

@danflick, my problem is the opposite, if I understood your comment above. Why preserve the < and > in the token? I was expecting the same behavior for all of them, that is, separate tokens for <, >, [, ], ( and ).

% ace -g ../erg.dat -E   
The <cat> is write
The <cat> is write

The [cat] is white 
The [ cat ] is white

The (cat) is white
The ( cat ) is white
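
The uniform behavior I was expecting can be sketched as below. This is only an illustration of the expected output, assuming every bracket character becomes its own token; the helper name `separate_brackets` is made up, and this is not how the ERG preprocessing rules are actually implemented.

```python
import re

# All six bracketing characters are treated alike (an assumption made
# for this sketch, not the grammar's current behavior).
_BRACKETS = re.compile(r"([<>\[\]()])")


def separate_brackets(text):
    """Put spaces around every bracket character, then normalize spacing."""
    spaced = _BRACKETS.sub(r" \1 ", text)
    return " ".join(spaced.split())


for sentence in ("The <cat> is white",
                 "The [cat] is white",
                 "The (cat) is white"):
    print(separate_brackets(sentence))
```

With this treatment all three sentences come out in the same shape, with the brackets as separate tokens around cat.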
