Incorrect detection of "Pty Limited" Suffix #41

Open
Sir-Onion opened this issue Feb 28, 2020 · 3 comments
Labels
ISO 20275 Re-evaluate when ISO std. support lands

Comments

@Sir-Onion

>>> cleanco("Example Example Pty Ltd").clean_name() # CORRECT
'Example Example'
>>> cleanco("Example Example Pty Limited").clean_name() # Not so good
'Example Example Pty'

To give you a view of the scope of the problem: I'm working on normalising a database of around 900k company names that were typed into an application over a 10-year period. The database contains primarily companies from anglophone countries. Of these, around 580 have a company name like this.

Do you also see this as a problem? If so, I'm happy to put together a patch.

@petri
Collaborator

petri commented Apr 16, 2020

Thank you. I did a quick Google search on the topic, and this seems valid. A GitHub PR would be welcome if you can submit one.

@psolin psolin added this to the Version 2.0 milestone Apr 19, 2020
@psolin psolin linked a pull request Apr 19, 2020 that will close this issue
@petri
Collaborator

petri commented Apr 25, 2020

@tubasal is "pty ltd" (or "pty limited") its own legal form, or is this suffix just a concatenation of two different suffixes? You can get rid of multiple suffixes by running the removal twice.
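A minimal sketch of the "run the removal twice" idea, assuming a hypothetical term set in which "pty", "ltd" and "limited" are each separate terms. This is not cleanco's implementation or its actual term list; `strip_suffix_once` and `strip_suffixes` are illustrative names. Each pass strips one trailing term, and the loop repeats until the name stops changing.

```python
# Hypothetical sketch: NOT cleanco's implementation or term list.

def strip_suffix_once(name: str, terms: set[str]) -> str:
    """Remove one trailing legal-form term (case-insensitive, whole words)."""
    words = name.split()
    # Try longer tails first so a multi-word term wins over its last word;
    # stop before the whole name so at least one word always remains.
    for size in range(len(words) - 1, 0, -1):
        tail = " ".join(words[-size:]).lower()
        if tail in terms:
            return " ".join(words[:-size])
    return name


def strip_suffixes(name: str, terms: set[str]) -> str:
    """Repeat single-suffix removal until the name stops changing."""
    while True:
        stripped = strip_suffix_once(name, terms)
        if stripped == name:
            return name
        name = stripped


TERMS = {"pty", "ltd", "limited"}  # hypothetical term set
print(strip_suffix_once("Example Example Pty Limited", TERMS))  # Example Example Pty
print(strip_suffixes("Example Example Pty Limited", TERMS))     # Example Example
```

Note that repeating the removal only helps if "pty" is itself a known term; with a set like `{"ltd", "limited"}` the second pass finds nothing to strip and the result stays "Example Example Pty".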

@petri
Collaborator

petri commented Apr 26, 2020

I took a look at the term definitions. We don't have "pty" as a separate term, nor do we have "pty limited", so this cannot work. If the work on using ISO standard 20275 bears fruit, this issue might be fixed by the improved term definitions that the standard provides. On the other hand, it's possible that the term definitions there fall short in the same way as ours.
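To illustrate why a missing term produces exactly the reported output, here is a sketch of single-pass, longest-match suffix removal. The term lists are assumptions for illustration only: `CURRENT_TERMS` merely mimics a list that has "pty ltd", "ltd" and "limited" but neither "pty" nor "pty limited" (consistent with the behaviour in the original report), and `PATCHED_TERMS` adds "pty limited". Neither is cleanco's real data, and `remove_legal_form` is a hypothetical helper.

```python
# Hypothetical term lists for illustration; not cleanco's actual data.
CURRENT_TERMS = ["pty ltd", "ltd", "limited"]
PATCHED_TERMS = CURRENT_TERMS + ["pty limited"]


def remove_legal_form(name: str, terms: list[str]) -> str:
    """Strip the longest matching trailing term (whole words, case-insensitive)."""
    words = name.split()
    lowered = [w.lower() for w in words]
    # Longest terms first, so a compound form beats its last word alone.
    for term in sorted(terms, key=lambda t: len(t.split()), reverse=True):
        term_words = term.split()
        n = len(term_words)
        # Require n < len(words) so the base name is never emptied.
        if n < len(words) and lowered[-n:] == term_words:
            return " ".join(words[:-n])
    return name


print(remove_legal_form("Example Example Pty Ltd", CURRENT_TERMS))      # Example Example
print(remove_legal_form("Example Example Pty Limited", CURRENT_TERMS))  # Example Example Pty
print(remove_legal_form("Example Example Pty Limited", PATCHED_TERMS))  # Example Example
```

With the current-style list, only the bare "limited" matches, leaving the dangling "Pty"; adding "pty limited" as its own term lets the longest match win in a single pass.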

@petri petri added the ISO 20275 Re-evaluate when ISO std. support lands label Apr 26, 2020
@petri petri removed this from the Version 2.0 milestone Apr 26, 2020
@petri petri removed the enhancement label Jan 30, 2021
3 participants