Downloads paper filings from the Companies House API.
Uses Scrapy to manage downloading account filings from the Companies House API.
NOTE: This project was created as a prototype
scrapy crawl latest_paper_filing
Assumptions:
- Conda package manager installed
Steps:
Clone the project:
clone [email protected]:ONSBigData/companies_house_filing_fetcher.git
Set up the Python environment:
cd companies_house_filing_fetcher
conda env create -f environment.yml
conda activate pdf_downloader
Download a BasicCompanyDataAsOneFile csv from http://download.companieshouse.gov.uk/en_output.html.
Copy the config files to ~/config
and edit their contents:
mkdir ~/config
cp ch_api_key.example.ini ~/config/ch_api_key_example.ini
cp filing_fetcher_config.example.yml ~/config/filing_fetcher_config.yml
Review config values in spiders/settings.py
.
Run the downloader:
scrapy crawl latest_paper_filing