Problems parsing company names with punctuation #29
Comments
Perhaps the issue here is that there is no space between the name and the suffix? What countries are these companies based in?
Correct, as long as there is whitespace before the suffix, the name is parsed OK.
I see how this could be an issue, but only because you didn't clean up your data first. Typically there is whitespace and then the entity abbreviation. That is how everyone writes these business name strings. I don't think the script should look for whitespace and/or any non-alphanumeric symbol and then run a lookup, and I don't think it is responsible for adding spaces after symbols either. Edit: Yes, spaces and a trailing comma are removed, but only because (again) this is a standard way to write a business name.
I have seen the entity abbreviation separated from the name by a comma (more often comma + whitespace, actually), although I'd agree that whitespace with no comma is the more common separator. I guess we could replace commas with whitespace as a preprocessing step? I am a little surprised we did not already have this :) In any case, I don't have time to work on that. As @psolin pointed out, replacing commas with whitespace would probably be an easy data cleanup workaround.
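For anyone who needs the workaround now, here is a minimal sketch of the comma-to-whitespace cleanup suggested above. The function name `replace_commas_with_spaces` is hypothetical and not part of this module; it only prepares the string before you pass it to the parser. Note that it does not help with names where a period is the only separator, such as GF LOGISTICS.INC.

```python
import re


def replace_commas_with_spaces(name: str) -> str:
    """Hypothetical helper: turn each comma into a single space.

    Whitespace around the comma is collapsed as well, so "ACME, INC"
    and "ACME,INC" both become "ACME INC". Periods are left untouched.
    """
    spaced = re.sub(r"\s*,\s*", " ", name)
    # A trailing comma would leave a trailing space, so strip it.
    return spaced.strip()


if __name__ == "__main__":
    for raw in ["LIBGAS,LTD", "AIRDAS USA,LLC", "HAKUTATZ.TECH.CO.,LTD."]:
        print(raw, "->", replace_commas_with_spaces(raw))
    # LIBGAS,LTD -> LIBGAS LTD
    # AIRDAS USA,LLC -> AIRDAS USA LLC
    # HAKUTATZ.TECH.CO.,LTD. -> HAKUTATZ.TECH.CO. LTD.
```

The cleaned string can then go through the normal parsing call. Whether the suffix is recognized afterwards still depends on the module's own lookup.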
Hello,
Very nice module, but it doesn't always handle the real, human-entered company names we deal with a lot. Below are some obvious examples where the name is not parsed:
LIBGAS,LTD -> LIBGAS,LTD
AIRDAS USA,LLC -> AIRDAS USA,LLC
GF LOGISTICS.INC -> GF LOGISTICS.INC
HAKUTATZ.TECH.CO.,LTD. -> HAKUTATZ.TECH.CO.,LTD
Thanks