inconsistent analysis of symbols < and > #45

arademaker · 2023-11-20T00:10:04Z

Try parsing:

The symbol > is cool
The symbol < is cool

The first sentence has 2 readings; the second has 7. The < is always interpreted as _less+than_a_1, while the > can be either _greater+than_a_1 or quoted.


arademaker · 2023-11-20T22:26:19Z

Also, the ERG keeps the symbols only for <word>. Try:

I have a [cat].
I have a (cat).
I have a <cat>.

@danflick see also delph-in/pydelphin#371

danflick · 2023-11-20T22:55:59Z

The lexicon already includes a separate NP entry for the use of ">" as the name of the symbol, but lacked an analogous entry for "<". I have added the missing entry, and will check it in with the next update.
As for the brackets surrounding a word as in "I have a [cat]" it does not seem desirable to try to insert them into the name of the predicate, or into the value of the ARG attribute when the token is a named entity. I agree that it would be good to find some way to record the presence of these bracketing punctuation marks in the resulting MRS, but we'll need to figure out how best to do so.

arademaker · 2023-11-20T23:06:16Z

Sorry, GitHub interpreted the greater-than and less-than symbols as markup; I have edited my previous comment.

@danflick, the crucial problem is the presence of < in the name of the predicate without any escaping or double quotes. When parsing the text representation of the MRS, we need a way to distinguish it easily from the start of the Link (the character positions). See here
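To illustrate the ambiguity, here is a minimal Python sketch. It assumes that a character-span Link always has the form `<from:to>` and is the last thing in the token; the predicate name `_<cat>_n_1` is made up for illustration and is not actual ERG output, and `naive_pred` and `split_pred_lnk` are hypothetical helpers, not part of any existing library.

```python
import re

# Assumption: a character-span Link looks like <from:to> and ends the token.
LNK_RE = re.compile(r"<(\d+):(\d+)>$")


def naive_pred(token):
    """Naive approach: treat the first '<' as the start of the Link."""
    return token.partition("<")[0]


def split_pred_lnk(token):
    """Split 'pred<from:to>' into (pred, (from, to)).

    Only a trailing <digits:digits> is taken as the Link, so a '<'
    inside the predicate name is left alone. Returns (token, None)
    when no Link is present.
    """
    m = LNK_RE.search(token)
    if m is None:
        return token, None
    return token[:m.start()], (int(m.group(1)), int(m.group(2)))


# Hypothetical tokens, for illustration only.
print(naive_pred("_<cat>_n_1<4:9>"))      # prints: _
print(split_pred_lnk("_<cat>_n_1<4:9>"))  # prints: ('_<cat>_n_1', (4, 9))
print(split_pred_lnk("_cat_n_1<4:9>"))    # prints: ('_cat_n_1', (4, 9))
```

Anchoring on the end of the token works for this case, but it is a workaround on the parser side; escaping or quoting the predicate name would remove the ambiguity at the source.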

arademaker · 2023-12-26T23:31:17Z

> As for the brackets surrounding a word as in "I have a [cat]" it does not seem desirable to try to insert them into the name of the predicate, or into the value of the ARG attribute when the token is a named entity. I agree that it would be good to find some way to record the presence of these bracketing punctuation marks in the resulting MRS, but we'll need to figure out how best to do so.

@danflick, if I understood your comment above, my problem is the opposite: why preserve the < and > in the token? I was expecting the same behavior for all brackets, that is, separate tokens for <, >, [, ], ( and ).

% ace -g ../erg.dat -E   
The <cat> is write
The <cat> is write

The [cat] is white 
The [ cat ] is white

The (cat) is white
The ( cat ) is white
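The uniform behavior I was expecting can be sketched in a few lines of Python. This is only an illustration of the expected output, not how ACE or the ERG preprocessing is implemented; `split_brackets` is a hypothetical helper.

```python
import re

# The six bracketing symbols that should each become a separate token.
BRACKETS = re.compile(r"([<>\[\]()])")


def split_brackets(text):
    """Pad every bracket with spaces, then collapse repeated whitespace."""
    padded = BRACKETS.sub(r" \1 ", text)
    return " ".join(padded.split())


for sentence in ("The <cat> is white",
                 "The [cat] is white",
                 "The (cat) is white"):
    print(split_brackets(sentence))
# prints:
# The < cat > is white
# The [ cat ] is white
# The ( cat ) is white
```

With this treatment the angle brackets are handled like the square and round ones, so no < or > would end up inside a token.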

arademaker mentioned this issue Nov 20, 2023

parsing MRS delph-in/pydelphin#371

Closed

arademaker mentioned this issue Dec 26, 2023

The Simple MRS parser: the MRS text representation to MRS object arademaker/delphin#1

Open

2 tasks
