"p.m." is not tokenized as in the original script. #21
The original script recently added a new hack for this: moses-smt/mosesdecoder#204. That change isn't accounted for in sacremoses, and I'm really not sure whether we should port it or not.
Why shouldn't sacremoses include this?
I could not yet figure out why, but in the original script the period in "p.m." at the end of a sentence is not split off, while in this port it is. The original script even explicitly leaves "p.m" out of its nonbreaking prefixes, so I'd expect the behavior seen in the port.
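To make the expectation concrete, here is a minimal, hypothetical sketch (not the actual sacremoses or Moses code) of a final-period rule that consults only a nonbreaking-prefix list. The prefix set is a tiny illustrative subset, and `split_final_period` and its `check_internal_period` flag are invented names. With a pure list lookup, "p.m." loses its period because "p.m" is not listed, which matches the port's behavior. Keeping it intact requires some extra rule; the flag shows one possible rule (keep the period when the prefix itself contains a period and a letter), purely as an assumption about what could explain the original script's output.

```python
import re

# Tiny illustrative subset of English nonbreaking prefixes (assumption).
# As the issue notes, "p.m" is NOT in the list.
NONBREAKING_PREFIXES = {"Mr", "Mrs", "Dr"}


def split_final_period(tokens, check_internal_period=False):
    """Detach a token's trailing period unless a 'keep' rule applies.

    Hypothetical sketch: check_internal_period enables an extra rule that
    keeps the period when the prefix already contains a period and a letter.
    """
    out = []
    for tok in tokens:
        m = re.fullmatch(r"(\S+)\.", tok)  # token ending in a period?
        if not m:
            out.append(tok)
            continue
        prefix = m.group(1)
        keep = prefix in NONBREAKING_PREFIXES
        if check_internal_period and "." in prefix and any(c.isalpha() for c in prefix):
            keep = True
        out.append(tok if keep else prefix + " .")
    return " ".join(out)


print(split_final_period("I left at 5 p.m.".split()))
print(split_final_period("I left at 5 p.m.".split(), check_internal_period=True))
```

The first call yields "I left at 5 p.m ." and the second keeps "p.m." whole; which rule (if any) the original script applies at sentence end is exactly the open question here.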