[FEATURE] Improve 50-a Data Collection #388

Open
DMalone87 opened this issue May 29, 2024 · 0 comments

Is your feature request related to a problem? Please describe.
Currently, our 50-a Scraper does not properly capture officer data. First, officers are not being associated with their units. We collect the unit names, but we don't connect each officer to the unit(s) they have worked for. Second, we aren't properly collecting the complaints associated with each officer. We collect complaint counts by disposition, but we don't associate individual complaints with individual officers.

Describe the solution you'd like
When scraping officer data from 50-a.org, make the following adjustments:

  • Include a list of complaint numbers associated with the officer.
  • Include the Tax Number for each officer.

This means an entry in the JSON output might change from this:

{"scraped_at": "2024-05-15 00:05:13", "url": "https://www.50-a.org/officer/TK8M", "name": "Benjamin F. Colecchia", "badge": "Badge #3490", "race": "White", "gender": "Male", "complaints": [{"name": "complaints", "count": 1}, {"name": "allegations", "count": 1}, {"name": "substantiated", "count": 0}, {"name": "Exonerated", "count": 1}], "age": null}
{"scraped_at": "2024-05-15 00:05:13", "url": "https://www.50-a.org/officer/7G3P", "name": "Ernesto Nieves", "badge": "Badge #4684", "race": "Hispanic", "gender": "Male", "complaints": [{"name": "complaints", "count": 2}, {"name": "allegations", "count": 2}, {"name": "substantiated", "count": 0}, {"name": "Complaint Withdrawn", "count": 1}, {"name": "Exonerated", "count": 1}], "age": "23"}

To this:

{"scraped_at": "2024-05-15 00:05:13", "url": "https://www.50-a.org/officer/TK8M", "name": "Benjamin F. Colecchia", "badge": "Badge #3490", "race": "White", "gender": "Male", "complaints": [9800290], "age": null, "taxnum": "918638"}
{"scraped_at": "2024-05-15 00:05:13", "url": "https://www.50-a.org/officer/7G3P", "name": "Ernesto Nieves", "badge": "Badge #4684", "race": "Hispanic", "gender": "Male", "complaints": [200410455, 200207742], "age": "23", "taxnum": "922871"}

When scraping command data, make the following adjustments:

  • Collect the officers who have worked for that command, along with the most recent year each officer was employed there.
  • Collect the commanding officer of each command.
  • Collect the official website URL for each command.
  • Collect the description and address of each command.

Therefore this:

{"scraped_at": "2024-05-15 14:17:28", "name": "24th Precinct", "url": "https://www.50-a.org/command/24pct"}

Will become this:

{"scraped_at": "2024-05-15 14:17:28", "name": "24th Precinct", "url": "https://www.50-a.org/command/24pct"}, "website_url": "https://www1.nyc.gov/site/nypd/bureaus/patrol/precincts/24th-precinct.page", "commanding_officer": "https://www.50-a.org/officer/KYGH", "address": "151 W 100th St, New York, NY 10025", "description": "The 24th Precinct is located on the Upper West Side of Manhattan and encompasses Manhattan Valley and a portion of Riverside Park. It is a residential and commercial community of multiple dwelling homes and one major housing development.", "officers": [{"url": "https://www.50-a.org/officer/WHJ5", "most_recent": 2024}, {"url": "https://www.50-a.org/officer/4JJ9", "most_recent": 2024}, {"url": "https://www.50-a.org/officer/J7Y3", "most_recent": 2023}]}

Additional context

DMalone87 added the enhancement (New feature or request) and backend labels on May 29, 2024

2 participants