Given an RSS feed where each entry's <link> element points to a document, upload those documents to DocumentCloud.

MuckRock/documentcloud-rss-fetcher-addon

 
 

DocumentCloud Add-On: RSS Document Fetcher

This DocumentCloud add-on will monitor an RSS/Atom feed for documents and upload them to your DocumentCloud account.

Note: For now, this Add-On expects each feed entry's <link> element to point to a PDF and its <title> element to give the PDF's title.
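To make the expected feed shape concrete, here is a minimal sketch of pulling (title, URL) pairs out of an RSS 2.0 feed using only Python's standard library. It is not the Add-On's actual code: the sample feed, the example.com URLs, and the `extract_entries` helper are all hypothetical, and Atom feeds (which use a different element layout) are not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real feed would be fetched over HTTP.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example agency reports</title>
    <item>
      <title>Annual Report 2021</title>
      <link>https://example.com/reports/annual-2021.pdf</link>
    </item>
    <item>
      <title>Budget Memo</title>
      <link>https://example.com/reports/budget-memo.pdf</link>
    </item>
  </channel>
</rss>"""


def extract_entries(feed_xml):
    """Return a list of (title, url) pairs, one per RSS <item>."""
    root = ET.fromstring(feed_xml)
    entries = []
    for item in root.iter("item"):
        # findtext returns None when the child element is missing
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:  # skip entries that have no document URL
            entries.append((title, link))
    return entries


if __name__ == "__main__":
    for title, url in extract_entries(SAMPLE_FEED):
        print(f"{title}: {url}")
```

Each pair corresponds to one document the Add-On would upload: the <link> value is the PDF to fetch, and the <title> value becomes the document's title.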

This repository is forked from MuckRock/documentcloud-scraper-addon and reuses much of its code. The codebase feeds from this fork.

Setup

1) Create your accounts if needed

First, you'll need to have a verified MuckRock account. If you've ever uploaded documents to DocumentCloud before, you're already set. If not, register a free account here and then request verification.

2) Create a DocumentCloud project for your documents

Next, log in to DocumentCloud and create a new project to store the documents that this Add-On will upload.
[Image: the project create button in DocumentCloud]
Click on your newly created project on the left-hand side of the screen, and note the number to the right of its name. This is the project ID (207354 in this example).

3) Run the Add-On from within DocumentCloud

Open the Add-Ons dropdown menu, choose "Browse All Add-Ons", and then select "RSS Document Fetcher". Click the inactive button to mark the Add-On as active, then hit Done. Open the Add-Ons dropdown menu once more and click RSS Document Fetcher, which will now be active.

Potential future improvements

  • Re-add the extension detection from documentcloud-scraper-addon, to enable uploading of non-PDF files.
  • Re-add title detection from documentcloud-scraper-addon, and give the user the option to use the feed entry's <title> element or the detected title.
  • Allow user to pass a regular expression or CSS selector instead of relying on each entry's <link> element.
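As a rough illustration of the first and third ideas above, the sketch below shows one way extension detection and regular-expression link matching could work. It does not reproduce the code in documentcloud-scraper-addon; the function names `detect_extension` and `find_document_url`, and the example URLs, are hypothetical.

```python
import os
import re
from urllib.parse import urlparse


def detect_extension(url):
    """Return the lowercase file extension of a URL's path, without the dot.

    Query strings and fragments are ignored; returns "" if there is none.
    """
    path = urlparse(url).path
    return os.path.splitext(path)[1].lower().lstrip(".")


def find_document_url(entry_html, pattern):
    """Return the first href in entry_html that matches the regex pattern.

    This uses a simple regex over double-quoted href attributes, which is
    enough for a sketch; a real implementation would likely use an HTML parser.
    """
    for href in re.findall(r'href="([^"]+)"', entry_html):
        if re.search(pattern, href):
            return href
    return None


if __name__ == "__main__":
    print(detect_extension("https://example.com/files/report.PDF?download=1"))
    html = (
        '<p><a href="https://example.com/about">About</a> '
        '<a href="https://example.com/docs/memo.docx">Memo</a></p>'
    )
    print(find_document_url(html, r"\.(pdf|docx)$"))
```

With helpers along these lines, an entry whose <link> points to a landing page could still yield a document URL from its body, and the detected extension could decide whether the file type is one to upload.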

Questions

Open an issue in this repository or email Jeremy Singer-Vine at [email protected].

Languages

  • Python 93.0%
  • Makefile 7.0%