Base name is not working with some names #77

Open
sandeepnatoo opened this issue Jan 12, 2022 · 4 comments
Comments

@sandeepnatoo

sandeepnatoo commented Jan 12, 2022

I found some cases where the basename function returns an empty result:
from cleanco import basename
print("Base name for {} : {}".format('IKS APS', basename("IKS APS")))
print("Base name for {} : {}".format('S.C.S & COMPANY', basename("S.C.S & COMPANY")))
print("Base name for {} : {}".format('COOP', basename("COOP")))

@petri
Collaborator

petri commented Feb 9, 2022

Yes, the point of basename is removing common suffixes, prefixes etc. to leave just the base name. You're basically giving those suffixes/prefixes there, or combinations of them. What is the problem you're having with this? Are those actual company names that you try to normalize?
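
To illustrate why a name made up only of such terms ends up empty, here is a minimal sketch of suffix stripping. It is not cleanco's implementation: `TERMS` and `strip_suffix_terms` are hypothetical names, the term list is a tiny stand-in for the library's real data, and only trailing terms are handled.

```python
# Hypothetical, tiny term list for illustration only.
TERMS = {"coop", "scs", "bv", "s.r.o."}


def strip_suffix_terms(name: str) -> str:
    """Drop trailing words for as long as the last word is a known term."""
    words = name.split()
    while words and words[-1].lower() in TERMS:
        words.pop()
    return " ".join(words)


print(repr(strip_suffix_terms("Coop Supermarkten BV")))
print(repr(strip_suffix_terms("COOP")))
```

In this sketch the full name keeps its leading words because stripping stops at the first trailing word that is not a term, while a bare "COOP" is itself a term and is removed entirely.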

@FBnil
Contributor

FBnil commented Aug 16, 2022

Coop is a Dutch supermarket (full name: 'Coop Supermarkten BV', and the full name actually works fine). And indeed, the basename of Coop is "" (empty string). The same goes for SCS: it's a key in "Limited" (dict terms_by_type), while the full name 'SCS Software s.r.o.' also works just fine.

I think that if the code, in its last removal iteration, finds that it would remove everything, there should be a way to recover the result of the iteration before that (but maybe not by default, because removing multiple terms is actually handy). Of course, this check can be done on the user side too, and it should at least be mentioned in the README/documentation.
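
Both ideas can be sketched roughly as follows. This is only an illustration under stated assumptions: `safe_basename` and `strip_keep_last` are hypothetical helpers, not part of cleanco, and the term set passed to `strip_keep_last` is a stand-in for the library's real data.

```python
from typing import Callable, Set


def safe_basename(name: str, normalize: Callable[[str], str]) -> str:
    """User-side check: fall back to the trimmed input if normalize() returns nothing."""
    result = normalize(name).strip()
    return result if result else name.strip()


def strip_keep_last(name: str, terms: Set[str]) -> str:
    """Library-side idea: strip trailing terms but never remove the last remaining word."""
    words = name.split()
    while len(words) > 1 and words[-1].lower() in terms:
        words.pop()
    return " ".join(words)


# With cleanco installed, the user-side check could be used like this:
#     from cleanco import basename
#     safe_basename("COOP", basename)
```

The wrapper takes the normalizer as a parameter so it works with any function that maps a name to a base name; the second helper stops one step before the name would become empty, which corresponds to recovering the previous iteration.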

@sandeepnatoo
Author

> Coop is a Dutch supermarket (full name: 'Coop Supermarkten BV', and the full name actually works fine). And indeed, the basename of Coop is "" (empty string). The same goes for SCS: it's a key in "Limited" (dict terms_by_type), while the full name 'SCS Software s.r.o.' also works just fine.
>
> I think that if the code, in its last removal iteration, finds that it would remove everything, there should be a way to recover the result of the iteration before that (but maybe not by default, because removing multiple terms is actually handy). Of course, this check can be done on the user side too, and it should at least be mentioned in the README/documentation.

Yes, I agree with you.

@sandeepnatoo
Author

> Yes, the point of basename is removing common suffixes, prefixes etc. to leave just the base name. You're basically giving those suffixes/prefixes there, or combinations of them. What is the problem you're having with this? Are those actual company names that you try to normalize?

Yes, these are some of the organization names I came across.
