Problems parsing company names with punctuations #29

rychoo2 · 2017-01-04T15:19:18Z

Hello,

Very nice module, but it doesn't always handle some real, human-entered company names that we deal with a lot. Below are some obvious examples where the name is not parsed:

LIBGAS,LTD -> LIBGAS,LTD
AIRDAS USA,LLC -> AIRDAS USA,LLC
GF LOGISTICS.INC -> GF LOGISTICS.INC
HAKUTATZ.TECH.CO.,LTD. -> HAKUTATZ.TECH.CO.,LTD

Thanks

petri · 2017-01-04T18:52:41Z

Perhaps the issue here is that there is no space between the name and the suffix? What countries are these companies based in?

rychoo2 · 2017-01-05T08:37:08Z

Correct: as long as there is whitespace, the name is parsed OK.
These companies are based in the USA and China, but I believe the key is that the data was probably entered in China, where people are less used to separating words with spaces.
I believe the library could be made robust to that.
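To make the whitespace dependence concrete, here is a minimal sketch of a suffix matcher that only recognizes a legal-entity suffix when it is its own whitespace-delimited token. This is a hypothetical simplification, not the library's actual algorithm; `KNOWN_SUFFIXES` and `strip_suffix_by_whitespace` are illustrative names, and the suffix set is a tiny made-up subset.

```python
# Hypothetical, simplified suffix matcher (NOT the library's real algorithm):
# a legal-entity suffix is only recognized when it is a separate
# whitespace-delimited token at the end of the name.
KNOWN_SUFFIXES = {"LTD", "LLC", "INC", "CO"}  # tiny illustrative subset


def strip_suffix_by_whitespace(name: str) -> str:
    """Remove one trailing entity suffix if it is its own whitespace token."""
    tokens = name.split()
    if len(tokens) > 1:
        # Ignore trailing punctuation on the last token, e.g. "LTD." -> "LTD".
        last = tokens[-1].rstrip(".,")
        if last.upper() in KNOWN_SUFFIXES:
            # Re-join the remaining tokens and drop a dangling comma/space.
            return " ".join(tokens[:-1]).rstrip(" ,")
    return name


print(strip_suffix_by_whitespace("LIBGAS,LTD"))      # LIBGAS,LTD
print(strip_suffix_by_whitespace("LIBGAS, LTD"))     # LIBGAS
print(strip_suffix_by_whitespace("AIRDAS USA,LLC"))  # AIRDAS USA,LLC
print(strip_suffix_by_whitespace("AIRDAS USA LLC"))  # AIRDAS USA
```

With whitespace, the suffix is its own token and gets stripped; without it, `LTD` stays glued to the preceding word and the lookup never sees it, which mirrors the symptom reported in this issue.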

psolin · 2019-01-26T15:54:59Z

I see how this could be an issue, but only because you didn't clean up your data first. Typically there is whitespace followed by the entity abbreviation; that is how everyone writes these business name strings. I don't think the script should look for whitespace and/or any non-alphanumeric symbol and then run a lookup, and I don't think it is responsible for adding spaces after symbols either.

Edit: Yes, whitespace and a trailing comma are removed, but only because (again) this is a standard way to write a business name.
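The approach argued against here, splitting on any non-alphanumeric symbol before the lookup, can be sketched as follows. `split_on_symbols` is a hypothetical helper, not part of the library, and `ACME L.L.C.` is a made-up input. The sketch shows the trade-off: it recovers `INC` from `GF LOGISTICS.INC`, but it also breaks dotted abbreviations such as `L.L.C.` into single letters, so a lookup for `LLC` would then fail.

```python
import re
from typing import List


def split_on_symbols(name: str) -> List[str]:
    """Hypothetical tokenizer: split a name on every run of symbols."""
    # [\W_]+ matches any run of characters that are not Unicode letters or
    # digits (underscore included), so non-ASCII letters stay intact.
    # Empty strings produced by leading/trailing symbols are filtered out.
    return [token for token in re.split(r"[\W_]+", name) if token]


print(split_on_symbols("GF LOGISTICS.INC"))        # ['GF', 'LOGISTICS', 'INC']
print(split_on_symbols("HAKUTATZ.TECH.CO.,LTD."))  # ['HAKUTATZ', 'TECH', 'CO', 'LTD']
print(split_on_symbols("ACME L.L.C."))             # ['ACME', 'L', 'L', 'C']
```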

petri · 2019-02-07T15:15:33Z

I have seen the entity abbreviation separated by a comma (more often comma + whitespace, actually), although I agree that whitespace alone (no comma) is the more common separator.

I guess we could replace commas with whitespace as a preprocessing step? I am a little surprised we did not already have this :) In any case, I don't have time to work on that.

As @psolin pointed out, replacing commas with whitespace would probably be an easy data cleanup workaround.
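A minimal sketch of that comma-to-whitespace cleanup, run on the caller's side before the name is handed to the library. `commas_to_spaces` is a hypothetical helper, not an existing library function.

```python
import re


def commas_to_spaces(name: str) -> str:
    """Hypothetical pre-processing step: treat every comma as whitespace.

    Commas become spaces, runs of whitespace collapse to a single space, and
    the ends are trimmed. Periods are deliberately left alone so dotted
    abbreviations such as "CO." or "L.L.C." keep their shape.
    """
    return re.sub(r"\s+", " ", name.replace(",", " ")).strip()


for raw in ["LIBGAS,LTD", "AIRDAS USA,LLC", "HAKUTATZ.TECH.CO.,LTD.", "GF LOGISTICS.INC"]:
    print(raw, "->", commas_to_spaces(raw))
```

Comma replacement alone leaves `GF LOGISTICS.INC` unchanged, because its separator is a period. Handling that case would need a period rule as well, which risks breaking dotted abbreviations such as `L.L.C.`.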

rychoo2 changed the title from "Unparsed company names" to "Problems parsing company names with punctuations" Jan 4, 2017

petri added the enhancement label Feb 7, 2019

petri added the ISO 20275 label (Re-evaluate when ISO std. support lands) Apr 26, 2020

petri added the parsing label (Name parsing result is not correct) and removed the enhancement label Jan 30, 2021
