This DocumentCloud add-on will monitor an RSS/Atom feed for documents and upload them to your DocumentCloud account.
Note: For now, this Add-On expects that each feed entry's <link> element will point to a PDF and that its <title> element will indicate the PDF's title.
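To make that expectation concrete, here is a minimal sketch of reading the <title> and <link> of each entry with Python's standard library. It is not this Add-On's actual code: the function name `extract_documents`, the sample feed, and the example.com URLs are all hypothetical, and the sketch handles only RSS 2.0 `<item>` entries (Atom feeds store the link in an `href` attribute and would need separate handling).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; the titles and URLs are made up for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example agency reports</title>
    <item>
      <title>Annual Report 2022</title>
      <link>https://example.com/reports/annual-2022.pdf</link>
    </item>
    <item>
      <title>Budget Memo</title>
      <link>https://example.com/reports/budget-memo.pdf</link>
    </item>
  </channel>
</rss>
"""


def extract_documents(feed_xml):
    """Return one {"title", "url"} dict per RSS <item> that has a <link>."""
    root = ET.fromstring(feed_xml)
    documents = []
    for item in root.iter("item"):
        # findtext returns None when the child element is missing.
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:  # skip entries that have nothing to download
            documents.append({"title": title, "url": link})
    return documents


if __name__ == "__main__":
    for doc in extract_documents(SAMPLE_FEED):
        print(doc["title"], "->", doc["url"])
```

A feed whose entries link to an HTML landing page rather than directly to the PDF would not satisfy this assumption.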
This repository is forked from MuckRock/documentcloud-scraper-addon and reuses much of its code.
First, you'll need to have a verified MuckRock account. If you've ever uploaded documents to DocumentCloud before, you're already set. If not, register a free account here and then request verification.
Next, log in to DocumentCloud and create a new project to store the documents that this Add-On will upload.
Click on your newly created project on the left-hand side of the screen, and note the number to the right of its name. This is the project ID (in this example, 207354).
Open the Add-Ons dropdown menu, choose "Browse All Add-Ons", find "RSS Document Fetcher", click the inactive button to mark the Add-On as active, and then click Done. Open the Add-Ons dropdown menu once more and click RSS Document Fetcher, which will now be listed as active.
- Re-add the extension detection from documentcloud-scraper-addon, to enable uploading of non-PDF files.
- Re-add title detection from documentcloud-scraper-addon, and give the user the option to use the feed entry's <title> element or the detected title.
- Allow user to pass a regular expression or CSS selector instead of relying on each entry's <link> element.
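As a rough illustration of what two of these planned items could involve, the sketch below guesses a file extension from a URL and pulls document links out of an entry's HTML with a regular expression. It is not taken from documentcloud-scraper-addon or from this Add-On; the function names, the default pattern, and the list of extensions are assumptions made for the example.

```python
import os
import re
from urllib.parse import urlparse


def detect_extension(url, default="pdf"):
    """Guess a file extension from the URL path, ignoring any query string."""
    path = urlparse(url).path
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    # Fall back to the default when the path has no extension at all.
    return ext or default


def find_document_links(entry_html, pattern=r'href="([^"]+\.(?:pdf|docx?|xlsx?))"'):
    """Return every href in entry_html that matches a user-supplied pattern.

    The default pattern is only an example; it expects double-quoted href
    values that end in one of a few document extensions.
    """
    return re.findall(pattern, entry_html, flags=re.IGNORECASE)


if __name__ == "__main__":
    html = '<a href="https://example.com/a.pdf">A</a> <a href="/about">About</a>'
    for link in find_document_links(html):
        print(link, detect_extension(link))
```

A regular expression is used here only because it needs no third-party dependency; a CSS selector would require an HTML parser.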
Open an issue in this repository or email Jeremy Singer-Vine at [email protected].