Police-Data-Accessibility-Project/scrapers

Welcome!

This is the GitHub home for web scraping at the Police Data Accessibility Project.

(What do we mean by web scraping?)

How PDAP works

This repo is part of a toolkit for people all over the country to learn about our police systems. Check out our software development roadmap and high-level technical diagram to learn more about our ecosystem.

How to run a scraper

Right now, this requires some Python knowledge and patience. We're in the early stages: there's no automated scraper farm or fancy GUI yet. Scrapers can be run locally as needed.

  1. Install Python. Prefer a differently opinionated guide? Perhaps this is more your speed.
  2. Clone this repo.
  3. Find the scraper you wish to run. These are sorted geographically, so start by looking in /scrapers_library/....
  4. Follow the instructions in the scraper's README to get going. (If it's broken or simply out of date, please open an issue in this repo or submit a PR.)
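For step 3, a short script can help you see which scrapers are available before you dig into the folders by hand. The following is a minimal sketch, not part of this repo: `find_scrapers` is a hypothetical helper, and it assumes each scraper lives in its own directory with a file whose name starts with "README". It also assumes you run it from the root of your local clone, where the `scrapers_library` folder sits.

```python
from pathlib import Path


def find_scrapers(library_root):
    """Return a sorted list of directories under library_root that hold a README.

    Hypothetical helper: assumes a scraper directory is any directory
    containing a file whose name starts with "readme" (case-insensitive).
    """
    root = Path(library_root)
    if not root.is_dir():
        # Nothing to search, e.g. the script was run outside the clone.
        return []

    found = set()
    for path in root.rglob("*"):
        if path.is_file() and path.name.lower().startswith("readme"):
            found.add(path.parent)
    return sorted(found)


if __name__ == "__main__":
    # Assumes the current working directory is the root of the cloned repo.
    for scraper_dir in find_scrapers("scrapers_library"):
        print(scraper_dir)
```

Each printed directory is a candidate scraper. Open its README and follow the instructions there, since setup steps can differ from one scraper to the next.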

Sharing back to the PDAP community

If you do something cool or interesting or fun with your shiny new data, share that in our Discord. Want to kick around an idea or share something that doesn't work as expected? Discord's a great place for that, too.

How to contribute

To write a scraper, start with CONTRIBUTING.md. Be sure to check out the /utils folder!

For everything else, start with docs.pdap.io.

Resources

Here are some potentially useful tools. If you want to make additions or updates, you can edit the docs in GitHub!