Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding Luxembourg administrative sanctions #849

Merged
merged 10 commits into from
May 24, 2024

Conversation

bgmello
Copy link
Contributor

@bgmello bgmello commented May 14, 2024

@bgmello bgmello marked this pull request as draft May 16, 2024 13:41
@bgmello
Copy link
Contributor Author

bgmello commented May 16, 2024

There are some sanctions where the title doesn't mention the company, such as "Sanction administrative prononcée à l’encontre d’un réviseur d’entreprises agréé" (Administrative sanction pronounced against an approved company auditor). We are currently ignoring those.

There also appears to be a bug where the lookup doesn't recognize 2 titles:

  • Sanction administrative prononcée à l’encontre de l’établissement de crédit BEMO EUROPE – Banque Privée S.A.
  • Sanctions et mesures administratives prononcées à l’encontre de quatorze professionnels agréés ou enregistrés soumis à la surveillance de la Commission de Surveillance du Secteur Financier (la « CSSF ») en matière de lutte contre le blanchiment et le financement du terrorisme

Thus they are raising warnings even though they are already in the YAML

Copy link
Contributor

@jbothma jbothma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Some potential improvements

datasets/lu/administrative_sanctions/crawler.py Outdated Show resolved Hide resolved
datasets/lu/administrative_sanctions/crawler.py Outdated Show resolved Hide resolved
Comment on lines 38 to 50
subtitle = (
response.find('.//*[@class="single-news__subtitle"]').text_content().strip()
)
res = context.lookup("subtitle_to_names", subtitle)
if res:
names = cast("List[str]", res.names)
else:
context.log.warning("Can't find the name of the company", text=subtitle)
names = [subtitle]

# If the subtitle doesn't contain any names
if not names or len(names) == 0:
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of maintaining all items in the yaml, I think we can come up with an algorithm that's fairly safe against accidental nonsense names:

  1. strip r"(Sanction|amende) administrative prononcée à l’encontre d[eu] (gestionnaire de fonds d’investissement)?" from the start
  2. if the rest starts with upper case, that looks like a name, so use it
  3. else if there's a match in lookups, use the values from that
  4. else use the rest as a name and warn to check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added more rules to the REGEX and I was able to get more names, but there was still some that required lookups

Copy link
Contributor

@jbothma jbothma May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah totally fine - it's just good to automatically handle the common cases so we don't literally have to add each and every record.

I realise now, the instances where they don't mention a name, we'll have to manually pull the name from PDF. How about the following logic to do a second lookup:

if not res.names:
    pdf_res = lookups(detail_url)
    if pdf_res is None:
        #warn
    names = names_res.names
else:
    names = res.names
for name in names:
    ...

the fist lookup handles if we can't parse a name from the subtitle
the second lookup happens if the first lookup gave us an empty list, then we use the detail url to look up a name someone found in the PDF and added - for cases like these https://github.com/opensanctions/opensanctions/pull/849/files#diff-31dfffcd8054c60130382e3389c526fa63b3e350581059737aaf4f8ab5c12834R55

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to assume that we can give the name AIFM Law for Sanction administrative prononcée à l’encontre d’un gestionnaire de fonds d’investissement - it's totally conceivable that they'd use the same string again for a different sanction, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked all links and could only find 3 with a company name in the PDF. From what I understood, AIFM Law is not a name of a company, it's the Alternative Investment Fund Managers Law.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol shows how well I read that. sorry. But this, looks good!

datasets/lu/administrative_sanctions/crawler.py Outdated Show resolved Hide resolved
datasets/lu/administrative_sanctions/crawler.py Outdated Show resolved Hide resolved
datasets/lu/administrative_sanctions/crawler.py Outdated Show resolved Hide resolved
datasets/lu/administrative_sanctions/crawler.py Outdated Show resolved Hide resolved
@jbothma
Copy link
Contributor

jbothma commented May 22, 2024

Thanks! looking good! Just a couple of changes then we're there

@jbothma jbothma merged commit 19e9606 into opensanctions:main May 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Luxembourg Administrative sanctions
2 participants