Tokenizer -x option is confusing #98

Open
ZJaume opened this issue May 5, 2020 · 3 comments
Labels
cli enhancement New feature or request

Comments

@ZJaume
Collaborator

ZJaume commented May 5, 2020

The -x option says on the usage:

-x, --xml-escape               Escape special characters for XML.

But it does the same as the -no-escape option in Moses, i.e. passing -x turns escaping off.
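For anyone landing here without the Moses background, a rough sketch of what "escaping" refers to may help. This is not the sacremoses source: the replacement table follows the escaping step of the Moses tokenizer as far as I know it, and `escape_xml` and `tokenize_sketch` are hypothetical names made up for this illustration (the "tokenizer" is just a whitespace split).

```python
# Illustrative sketch only, not the sacremoses implementation.
# Replacement pairs modelled on the Moses tokenizer's escaping step.
MOSES_ESCAPES = [
    ("&", "&amp;"),   # must run first so the entities added below are not re-escaped
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_xml(text):
    """Replace each special character with its entity, in table order."""
    for char, entity in MOSES_ESCAPES:
        text = text.replace(char, entity)
    return text


def tokenize_sketch(text, escape=True):
    """Hypothetical stand-in for a tokenizer: whitespace split plus optional escaping."""
    tokens = text.split()
    if escape:
        tokens = [escape_xml(token) for token in tokens]
    return tokens
```

With `escape=True` (the default in this sketch) the tokens come back with entities substituted; with `escape=False` they are left untouched, which is the behaviour the -x flag selects.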

@alvations
Contributor

Hmmm, true. The point is to keep the interface pythonic, but I agree it's confusing. Let me think of a better wording for the feature =)

@alvations added the cli and enhancement (New feature or request) labels Jun 4, 2020
@ZJaume
Collaborator Author

ZJaume commented Jun 30, 2020

What about something like

-x, --no-xml-escape      Don't escape special characters for XML.

or just removing the short form -x and leaving only --no-xml-escape? If --no-xml-escape is too long, why not simply --no-escape, like Moses?

I think it should at least have the "negation" in the help message, because the current text is very confusing.
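As a rough sketch of the proposal, this is how a negated flag could be declared so that escaping stays on by default and the help text says what the flag actually does. It uses `argparse` only to keep the example free of dependencies; it is not meant to mirror how the sacremoses CLI is built, and `build_parser` and the `tokenize-sketch` program name are hypothetical.

```python
import argparse


def build_parser():
    """Build a toy parser with a negated escaping flag (illustration only)."""
    parser = argparse.ArgumentParser(prog="tokenize-sketch")
    parser.add_argument(
        "-x", "--no-xml-escape",
        dest="escape",
        action="store_false",  # escaping is on unless the flag is given
        help="Don't escape special characters for XML.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("escape =", args.escape)
```

Here the flag name, the stored value and the help message all point the same way: no flag means `escape` is `True`, and `-x` or `--no-xml-escape` sets it to `False`.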

@bricksdont

Agreed, the option name and help text definitely do not make sense.

But then, does the default behaviour need to be that special XML characters are escaped (legacy behaviour from SMT/Moses)? I totally understand if the argument is that sacremoses should behave exactly like the original Moses tokenizer.

3 participants