use ISO 20275 data from GLEIF #32

Open
petri opened this issue Feb 7, 2017 · 8 comments
Labels
ISO 20275 Re-evaluate when ISO std. support lands

Comments

@petri
Collaborator

petri commented Feb 7, 2017

See https://www.gleif.org/en. There's a lot of data that would help improve the legal affix database of cleanco.

@petri petri added the question label Feb 8, 2017
@psolin
Owner

psolin commented Jan 26, 2019

The ELF Code List definitely has more abbreviations: https://www.gleif.org/en/about-lei/code-lists

I am just not sure what the US/UK equivalents are in some of the languages. However, there may be some more obvious ones that have been missed. I will keep a note of this.

@petri
Collaborator Author

petri commented Apr 26, 2020

I suspected someone might have done this by now, and sure enough: https://pypi.org/project/iso-20275

Since 2017 there has been an ISO standard for this: ISO 20275 ‘Financial Services – Entity Legal Forms (ELF)’.

@psolin
Owner

psolin commented Apr 26, 2020

Cleanco was built to identify entity types in strings, so I think it’s fine to move towards incorporating this package. It was only a matter of time before the data was standardized and put into a Python package. Moving away from being solely US/UK-based and towards an international standard is best for this package.

If incorporated, it would fix most of our open issues as well. I’ll look into doing this.

@petri petri added this to the ISO 20275 milestone Apr 26, 2020
@petri
Collaborator Author

petri commented Apr 26, 2020

For getting the base name without legal term affixes, the unique terms list from the ISO standard should probably be patched in here:
https://github.com/psolin/cleanco/blob/master/cleanco/clean.py#L25-L29
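A minimal sketch of base-name deduction driven by such a term list, for illustration only. The helper names `normalize` and `strip_legal_affixes` and the sample terms are hypothetical; this is not the code in `clean.py`, and a real term list would be built from the ISO 20275 data. It assumes terms match case-insensitively with periods and commas ignored.

```python
import re


def normalize(token: str) -> str:
    """Lowercase a token and drop periods and commas, so 'B.V.' matches 'bv'."""
    return re.sub(r"[.,]", "", token).lower()


def strip_legal_affixes(name: str, terms: list[str]) -> str:
    """Repeatedly remove trailing legal-form terms (possibly multi-word) from a name.

    Longer terms are tried first, and a term is never stripped if it would
    leave the name empty (so a bare 'Stichting' is kept as-is).
    """
    # Pre-split each term into normalized tokens, longest term first.
    term_tokens = sorted(
        {tuple(normalize(part) for part in term.split()) for term in terms},
        key=len,
        reverse=True,
    )
    tokens = name.split()
    changed = True
    while changed:
        changed = False
        normalized = [normalize(t) for t in tokens]
        for candidate in term_tokens:
            n = len(candidate)
            if 0 < n < len(tokens) and tuple(normalized[-n:]) == candidate:
                tokens = tokens[:-n]
                changed = True
                break
    # Drop a comma left dangling before a removed suffix, e.g. "Acme, Inc."
    return " ".join(tokens).rstrip(",")


print(strip_legal_affixes("Acme Holdings B.V.", ["B.V.", "N.V.", "stichting"]))
```

With that list, "Acme Holdings B.V." becomes "Acme Holdings". Swapping in the larger ISO 20275 term list would not change the logic, only the data it is fed.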

@petri petri changed the title from "idea: parse data from GLEIF" to "parse ISO 20275 data from GLEIF" Apr 26, 2020
@petri petri removed the question label Apr 26, 2020
@petri petri changed the title from "parse ISO 20275 data from GLEIF" to "use ISO 20275 data from GLEIF" Apr 26, 2020
@petri
Collaborator Author

petri commented Apr 26, 2020

This could be broken into two or three different tickets:

  • one for using it in base name deduction,
  • one for country deduction, and
  • one for legal entity detection.

@psolin
Owner

psolin commented Apr 26, 2020

Just to give you an idea of where this is going: I am counting 1,180 unique business entity affixes in this package, compared to our 202. These are the classifiers (properties) it uses as well:

['alpha2', 'alpha2_2', 'country', 'creation_date', 'elf', 'jurisdiction', 'local_abbreviations', 'local_name', 'modification', 'modification_date', 'reason', 'status', 'transliterated_abbreviations', 'transliterated_name']
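A rough sketch of how such a count could be reproduced, assuming each ELF entry is available as a dict keyed by the property names above. It further assumes (not confirmed here) that `local_abbreviations` and `transliterated_abbreviations` hold semicolon-separated strings. The helper `unique_affixes` is hypothetical, this does not call the iso-20275 package's own API, and the sample records are made up apart from the `V44D`/stichting code quoted later in this thread.

```python
def unique_affixes(records: list[dict]) -> set[str]:
    """Collect distinct abbreviations from ELF-style records.

    Assumes abbreviation fields are ';'-separated strings (an assumption about
    the data format). Blank or missing values are skipped, and abbreviations
    are compared case-insensitively.
    """
    found: set[str] = set()
    for rec in records:
        for field in ("local_abbreviations", "transliterated_abbreviations"):
            raw = rec.get(field) or ""
            for abbr in raw.split(";"):
                abbr = abbr.strip()
                if abbr:
                    found.add(abbr.lower())
    return found


# Hypothetical sample records; "XXXX" is a placeholder ELF code.
sample = [
    {"elf": "XXXX", "local_name": "besloten vennootschap",
     "local_abbreviations": "B.V.;BV", "transliterated_abbreviations": "B.V.;BV"},
    {"elf": "V44D", "local_name": "stichting",
     "local_abbreviations": "", "transliterated_abbreviations": ""},
]
print(len(unique_affixes(sample)))  # → 2
```

Whether "B.V." and "BV" should count as one affix or two is a normalization choice; this sketch keeps them separate.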

@petri
Collaborator Author

petri commented May 5, 2020

Now that we better understand the differences between iso20275 data and cleanco termdata, it seems to me we need a decision on data strategy. The current PR gets rid of cleanco termdata in favour of iso20275. In hindsight, though, it seems to me that iso20275 should instead be used as a primary, but not exclusive, source.

On the other hand, both iso20275 and cleanco also need a mechanism by which users can supply their own legal form data if needed. It would make sense if both packages used the same mechanisms and formats.
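One possible shape for "primary, but not exclusive" sourcing plus user-supplied data, sketched in plain Python. The function `merge_term_sources`, the source labels and the sample terms are hypothetical and not part of cleanco or iso20275. Sources are passed in priority order, terms are keyed case-insensitively, and the first source to supply a term is recorded as its origin.

```python
from typing import Iterable


def merge_term_sources(*sources: tuple[str, Iterable[str]]) -> dict[str, str]:
    """Merge legal-form terms from several named sources.

    Each source is a (label, terms) pair, given in priority order. The result
    maps each lowercased term to the label of the first source that supplied
    it; later duplicates are ignored and blank terms are skipped.
    """
    merged: dict[str, str] = {}
    for label, terms in sources:
        for term in terms:
            key = term.strip().lower()
            if key and key not in merged:
                merged[key] = label
    return merged


# Hypothetical inputs: ISO data first, cleanco termdata second, user extras last.
iso = ["B.V.", "N.V."]
termdata = ["bv", "B.V.", "st."]
user = ["fdn"]
terms = merge_term_sources(("iso20275", iso), ("termdata", termdata), ("user", user))
print(terms)
```

Keeping the origin label next to each term makes it easy to see which entries the ISO data lacks. It also means the same merge could serve both packages if they agreed on a shared (label, terms) input format.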

Thoughts?

@petri petri added ISO 20275 Re-evaluate when ISO std. support lands and removed enhancement labels Jan 30, 2021
@FBnil
Contributor

FBnil commented Aug 16, 2022

Replying to your "Thoughts?",
At first I was happy: for example, the Netherlands data includes all the forms that are in cleanco.
But then the Japanese data does not have the romaji versions (e.g. Y.K., which termdata will have if a pull request is accepted), only the kanji versions: 有, which is just the first character of 有限会社. I don't know whether it is abbreviated like that in practice, but in the Chinese data it is written out in full.

https://en.wikipedia.org/wiki/Y%C5%ABgen_gaisha

And even the Dutch data is incomplete, for example for "Foundation":
"V44D","Netherlands","NL","","","stichting","Dutch","nl","stichting","","","2017-11-30","ACTV","","",""
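To make the gap concrete, the row above can be parsed with Python's `csv` module. The column names are an assumed reading of the GLEIF ELF code list layout (verify them against the header of the CSV you actually download); under that layout, both abbreviation columns of the stichting entry are empty.

```python
import csv
import io

# Assumed column order of the GLEIF ELF code list CSV (verify against the real header).
COLUMNS = [
    "elf_code", "country", "country_code", "jurisdiction", "subdivision_code",
    "local_name", "language", "language_code", "transliterated_name",
    "local_abbreviations", "transliterated_abbreviations", "date_created",
    "status", "modification", "modification_date", "reason",
]

# The row quoted above, verbatim.
ROW = '"V44D","Netherlands","NL","","","stichting","Dutch","nl","stichting","","","2017-11-30","ACTV","","",""'

# csv.reader handles the quoting; zip pairs each value with its assumed column name.
record = dict(zip(COLUMNS, next(csv.reader(io.StringIO(ROW)))))
print(record["local_name"], repr(record["local_abbreviations"]),
      repr(record["transliterated_abbreviations"]))
```

This prints `stichting '' ''`, so anything that relies only on the abbreviation columns has nothing like "st." to match for this legal form.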

Looking it up, it seems that "st." is the official one, and "fdn" (and, less commonly, "fndn." or "fou.") are also used, although in practice the word is written out in full, because, hey, you want to state clearly that you are a foundation.

So my conclusion is that there is still no good list, and I agree with @petri that maybe both lists need to be eligible. Or at least we could merge the differences into a new version of iso20275, adding the many missing entries that termdata does have, and then use that as a master list.
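A small sketch of the "merge the differences" step: list the terms that cleanco's termdata has but the ISO 20275 list lacks, as candidates to contribute upstream. The helper `missing_from_iso` is hypothetical, the normalization rule (case-insensitive, periods ignored) is an assumption, and both lists are illustrative samples built from the examples in this thread, not the real data.

```python
def normalize(term: str) -> str:
    """Compare terms case-insensitively and ignore periods (an assumed rule)."""
    return term.replace(".", "").strip().lower()


def missing_from_iso(termdata: list[str], iso_terms: list[str]) -> list[str]:
    """Return termdata entries with no normalized match in the ISO list.

    Order is preserved, and only the first spelling of each normalized term
    is reported (so 'Y.K.' and 'YK' yield a single candidate).
    """
    iso_keys = {normalize(t) for t in iso_terms}
    seen: set[str] = set()
    missing = []
    for term in termdata:
        key = normalize(term)
        if key and key not in iso_keys and key not in seen:
            seen.add(key)
            missing.append(term)
    return missing


# Illustrative samples only, based on the examples discussed in this thread.
iso_terms = ["stichting", "有"]
termdata = ["Y.K.", "st.", "fdn", "Stichting", "YK"]
print(missing_from_iso(termdata, iso_terms))  # → ['Y.K.', 'st.', 'fdn']
```

Ignoring periods is convenient for "Y.K." versus "YK" but could merge abbreviations that really are distinct, so a real diff would probably want a manual review step.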

In practice, this means we need to fix the bug that makes custom_basename() unusable in its current state, and let users add their own settings easily, without jumping through hoops.

3 participants