This repository has been archived by the owner on May 5, 2020. It is now read-only.
That's not something that we'll be adding to our deployed version, but I can see how that would be useful for folks. That can be done trivially, by modifying constants.py to add pdf to the list of filenames.
@knowtheory, is there any reason why this list of suffixes couldn't be moved to the config file? Would simply adding pdf cause problems, such as with the code that tries to get the title of the file?
Nope, listing PDFs should function fine. The reason we'd left them out of the list by default is just that there are so many possible non-data pdfs up on sites (reports, forms, all manner of things) that it'd be a pretty noisy signal.
And as for the list of suffixes... you mean moving them into the settings.py? Could move them in there too yep.
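A minimal sketch of what an opt-in suffix list could look like, assuming the harvester decides what to collect by matching a link's file extension. The names `DEFAULT_DATA_SUFFIXES`, `data_suffixes`, and `is_data_link` are hypothetical and do not come from the project's actual constants.py or settings.py, and the default list here is only illustrative:

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical defaults; the real list in constants.py may differ.
DEFAULT_DATA_SUFFIXES = ("csv", "json", "xls", "xlsx", "xml", "zip")


def data_suffixes(extra=()):
    """Return the default suffixes plus any opt-in extras.

    Suffixes are lowercased, stripped of a leading dot, and de-duplicated
    while preserving order.
    """
    result = []
    for suffix in tuple(DEFAULT_DATA_SUFFIXES) + tuple(extra):
        suffix = suffix.lower().lstrip(".")
        if suffix and suffix not in result:
            result.append(suffix)
    return tuple(result)


def is_data_link(url, suffixes):
    """Check whether the URL's path ends in one of the given suffixes."""
    path = urlparse(url).path  # drops any query string or fragment
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return ext in suffixes


# A deployment that wants PDFs would opt in, e.g. from a settings value:
suffixes = data_suffixes(extra=["pdf"])
print(is_data_link("http://example.gov/files/report.PDF?v=2", suffixes))
```

Keeping pdf out of the defaults preserves the less noisy behaviour described above, while a single settings entry turns PDF harvesting on for those who want it.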
I'd like to set up some way that PDF metadata (using XMP) could catalog embedded or linked data, with the possibility of using annotations, bookmarks, form-data, and attached files. With an explicit manifest, you'll get less "noisy signal".
PDF is the worst kind of open data - 1-star open data.
Still, it would be useful to have the ability to optionally harvest PDFs.