Flexible normalization #14174
Description
I noticed that when using `Normalizer`, if my sentence contains punctuation that is not surrounded by spaces, the neighbouring words get joined together. For example:
"My dog is quite fast/furious and when hungry he can chew furniture,flowers and other things"
Becomes:
"My dog is quite fastfurious and when hungry he can chew furnitureflowers and other things"
I don't know which way would be the most efficient, but it would be good if we could somehow tune the behaviour of the normalizer. Although this is quite an easy step to handle ourselves (for example, by preprocessing the data with a regular expression), it seems like part of normalization, and it is not something we want to do before the actual (?) normalization.
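For reference, the regular-expression preprocessing mentioned above could look roughly like the following. This is only a minimal sketch in plain Python: `space_out_punctuation` is a hypothetical helper name, not part of the library, and the pattern assumes that "punctuation" means any run of characters that are neither word characters nor whitespace.

```python
import re

# Punctuation run that sits directly between two word characters,
# e.g. the "/" in "fast/furious" or the "," in "furniture,flowers".
_PUNCT_BETWEEN_WORDS = re.compile(r"(?<=\w)([^\w\s]+)(?=\w)")


def space_out_punctuation(text: str) -> str:
    """Insert a space after punctuation that joins two words (hypothetical helper)."""
    return _PUNCT_BETWEEN_WORDS.sub(r"\1 ", text)


sentence = (
    "My dog is quite fast/furious and when hungry he can chew "
    "furniture,flowers and other things"
)
spaced = space_out_punctuation(sentence)
print(spaced)

# Stand-in for the punctuation removal that the normalizer performs afterwards.
print(re.sub(r"[^\w\s]", "", spaced))
```

Note that such a blanket pattern would also split contractions like "don't" and numbers like "3.14", which is part of why a configurable option in the normalizer itself would be preferable.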
Preferred Solution
This could be a boolean parameter that respects the presence of spaces (adding them where needed), or maybe a cleanup stage that we can run before the normalization.
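To illustrate the boolean-parameter idea, here is a toy sketch of the intended semantics. Everything in it is an assumption made for illustration: `toy_normalize` and its `respect_spaces` flag are hypothetical names, and the function is a plain-Python stand-in, not the library's actual `Normalizer`.

```python
import re


def toy_normalize(text: str, respect_spaces: bool = False) -> str:
    """Toy punctuation-stripping normalizer (illustration only)."""
    if respect_spaces:
        # Replace each punctuation run with a space so words stay separated,
        # then collapse any repeated whitespace this produces.
        text = re.sub(r"[^\w\s]+", " ", text)
        return re.sub(r"\s+", " ", text).strip()
    # Current behaviour described in this issue: punctuation is simply dropped.
    return re.sub(r"[^\w\s]+", "", text)


print(toy_normalize("fast/furious and furniture,flowers"))
print(toy_normalize("fast/furious and furniture,flowers", respect_spaces=True))
```

With the flag off, the words are joined as described above; with it on, a single space is left where the punctuation was.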
Additional Context