Allow closing parenthesis as valid trailing symbol #88

Open
wants to merge 2 commits into
base: master

Conversation

JonasR commented Jul 2, 2023

Fixes #61
Since we are only looking at the tail of the name, I decided to allow only ) and not (. I don't see how a valid name could end in (.
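
For readers who want to see the idea in isolation, here is a minimal sketch of tail stripping with and without the closing parenthesis allowed. The pattern names, the `strip_tail` helper, and the sample name "Example Co (UK)" are hypothetical and chosen for illustration; they are an assumption about the general approach, not a copy of the package's source.

```python
import re

# Hypothetical patterns for illustration; not taken from the package source.
# Strip any run of trailing characters that are not word characters or dots.
STRICT_TAIL = re.compile(r"[^.\w]+$", flags=re.UNICODE)
# Same idea, but a closing parenthesis also counts as a valid final character.
PAREN_AWARE_TAIL = re.compile(r"[^.\w)]+$", flags=re.UNICODE)


def strip_tail(name: str, pattern: re.Pattern = PAREN_AWARE_TAIL) -> str:
    """Remove the trailing punctuation and whitespace matched by `pattern`."""
    return pattern.sub("", name)


# The strict pattern eats the closing parenthesis and leaves an unbalanced name.
print(strip_tail("Example Co (UK)", STRICT_TAIL))  # Example Co (UK
# The paren-aware pattern keeps it, while still trimming other trailing junk.
print(strip_tail("Example Co (UK)"))               # Example Co (UK)
print(strip_tail("Example Co (UK), "))             # Example Co (UK)
```

An opening parenthesis is deliberately left out of the allowed set, so a dangling ( at the end of a name is still stripped.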

Collaborator

petri commented Apr 27, 2024

Where have you encountered these kinds of company names? In what country, and what names? Any other relevant info (e.g. what is typically inside the parentheses)?

Author

JonasR commented Apr 28, 2024

As shown in the linked issue, I often come across content where a country is added behind the company name in parentheses. You could argue the country should also be removed, but I'm not sure that would be in scope for this package. It certainly shouldn't remove just the final closing parenthesis.

Collaborator

petri commented Apr 28, 2024

> As shown in the linked issue, I often come across content where a country is added behind the company name in parentheses.

Yes, I understand that you have encountered such cases. Could you provide some actual real-world examples?

Do you know whether the country specifier inside the parentheses is actually part of the official legal name as registered in the relevant national jurisdiction? Or was it added by, or in, some system to help differentiate company names? People sometimes use this kind of pattern when entering data in CRMs etc.

Author

JonasR commented May 28, 2024

Oh, I see what you mean. Sure, below are some examples from a vendor's database. From what I can tell, the data in parentheses is never part of the legal name; it's added e.g. for trade names, for distinguishing subsidiaries, or for differentiating from other companies with the same name in a different country.

Abzena (UK) Ltd
CartiHeal (2009) Ltd
BeiGene (Beijing) Co Ltd
Moberg Pharma AB (publ)
M ARKIN (1999) Ltd
Senzime AB (Publ)
GDM Seeds (AR)
MH Sub I LLC (d/b/a Internet Brands)
AngioDesign (UK) Ltd
Anticancer Biotech (Beijing) Co Ltd
Neo Modulus (Suzhou) Medical Sci-Tech Co Ltd
CTI Biotechnology (Suzhou) Co Ltd
Genfleet Therapeutics (Shanghai) Co Ltd
Viamet Pharmaceuticals (Bermuda) Ltd
Cytocares (Shanghai) Inc
HUTCHMED (China) Ltd
ProfoundBio (Suzhou) Co Ltd
GX Pharma (Beijing) Co Ltd
Ji Xing Pharmaceuticals (Shanghai) Co Ltd
Mission Health Labs Inc (d/b/a PicnicHealth)
Fertility (ITC) Services LLC
Precision Scientific (Beijing) Co Ltd
Cascade Bio Inc (d/b/a ScienceIO)

@alexanderlukanin13

Hello, any update on this? I came across the same problem, and it's not limited to country names in real-world datasets.

Whether to retain (Something in brackets) or not is up to the user. I believe that should be done (or not done) in preprocessing and is out of scope for this package. But if the user has decided to retain the brackets, cutting off the trailing ) is obviously a bug. I can't imagine any scenario where that would be expected behavior.
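
To make the preprocessing option concrete, here is a rough sketch of how a user could drop bracketed parts before handing names to the cleaner. The `drop_parenthesised` helper is hypothetical and not part of the package; it assumes brackets are not nested.

```python
import re

# Matches one parenthesised group plus any whitespace directly in front of it.
# Assumption: groups are not nested, which holds for the examples in this thread.
PAREN_GROUP = re.compile(r"\s*\([^()]*\)")


def drop_parenthesised(name: str) -> str:
    """Remove every parenthesised group, then normalise whitespace."""
    stripped = PAREN_GROUP.sub("", name)
    return " ".join(stripped.split())


for raw in ["Abzena (UK) Ltd", "Moberg Pharma AB (publ)", "Fertility (ITC) Services LLC"]:
    print(drop_parenthesised(raw))
# Abzena Ltd
# Moberg Pharma AB
# Fertility Services LLC
```

A user who prefers to keep the brackets would simply skip this step, which is the case where the trailing ) must survive.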

Also: would you accept a separate PR that allows cleanco to handle removable elements in brackets, like Company (Pty) Ltd -> Company? It seems like a missing feature.
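
One possible shape for such a feature, as a sketch only: unwrap a bracketed token when its content is a known legal term, then strip trailing legal terms as usual. The `LEGAL_TERMS` set and both helper functions are hypothetical and deliberately tiny; they do not reflect cleanco's actual term data or API.

```python
import re

# Hypothetical, deliberately small term list used only for this illustration.
LEGAL_TERMS = {"pty", "ltd", "publ", "ab", "co", "inc", "llc"}

BRACKETED = re.compile(r"\(([^()]*)\)")


def unwrap_legal_brackets(name: str) -> str:
    """Turn '(Pty)' into 'Pty' but leave other bracketed text untouched."""
    def repl(match: re.Match) -> str:
        inner = match.group(1).strip()
        return inner if inner.lower() in LEGAL_TERMS else match.group(0)
    return BRACKETED.sub(repl, name)


def strip_legal_suffixes(name: str) -> str:
    """Drop trailing words that are known legal terms, keeping at least one word."""
    words = unwrap_legal_brackets(name).split()
    while len(words) > 1 and words[-1].lower().rstrip(".") in LEGAL_TERMS:
        words.pop()
    return " ".join(words)


print(strip_legal_suffixes("Company (Pty) Ltd"))        # Company
print(strip_legal_suffixes("Moberg Pharma AB (publ)"))  # Moberg Pharma
print(strip_legal_suffixes("Abzena (UK) Ltd"))          # Abzena (UK)
```

Bracketed content that is not a legal term, such as a country or a year, is left alone, so the decision to keep or drop it stays with the user.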

Successfully merging this pull request may close these issues.

brackets handled incorrectly
3 participants