-
Notifications
You must be signed in to change notification settings - Fork 771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tokenize Quotes #8918
base: master
Are you sure you want to change the base?
Tokenize Quotes #8918
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @NickWingate. This is a good bug to fix.
I worry about the efficiency of the code, though. Given how frequently the tokenizers are called, we really can't afford to have an algorithm that's not linear. I left some notes and tips, but please feel free to reach out if need further help. Re-writing these functions to be optimal might not be trivial, so I suggest splitting them, since it seems we only care about supporting quotes in exactly one case. That single case can be less efficient, until we improve the implementation. I just don't want to make all the command processing code slower because of that single case.
tokenize_outside_quotes tokenises a given string, ignoring delimiters that are in quotes. Previously tokenize() caused problems especially as space is the default delimiter, so if the input string has any key="str ing" tokens a space can cause errors Signed-off-by: NickWingate <[email protected]> Change-Id: I54578646a19a1cfea18c9f73c6b25d46c61ed20f
24130fe
to
c48adf3
Compare
Unocommand was being incorrectly tokenized as space is the default delimiter: uno .uno:ExecuteSearch {"SearchItem.SearchString":{"type":"string","value":" "}, ... ^^^ was being incorrectly parsed. Signed-off-by: NickWingate <[email protected]> Change-Id: I572f4eff0f23bf9d5c275fe0acad74e9633a9b9d
c48adf3
to
eda6720
Compare
Made a new function Additionally |
Thank you, @NickWingate, for the update. I went through the code, but, before commenting on the code, I'm wondering why there is no test with the reported issue of searching for Also, I don't understand why we have implemented so many variations of the tokenizer when it would seem that we only use one. Perhaps I missed something, but I don't see the code using all these variations. Finally, if we have to implement a tokenizer that takes quotes into account, why do we care to implement a special version for when there are no quotes? Is that an optimization? Have we measured the three versions? Thank you. |
Yes, good point. I will add the test.
No, I just created further overloads of
I did this as suggested:
In each |
Summary
When ignoreQuotes = false delimiters within quotes (' or ") aren't tokenised.
Causes problems especially as space is the default delimiter, so if the input string has any
key="string"
pattern tokens a space will cause issuesFor #8103
Unocommand was being incorrectly tokenized as space is the default delimiter:
uno .uno:ExecuteSearch {"SearchItem.SearchString":{"type":"string","value":" "}, ...
^^^
TODO
Checklist
make check
make run
and manually verified that everything looks okay