Validation should fail on unrecognized file type #109

Open
konklone opened this issue Aug 4, 2014 · 2 comments

Comments

@konklone
Member

konklone commented Aug 4, 2014

Right now, validation will fail if the file_type wasn't detected (the URL has no file extension) but will not fail if the detected file_type is unknown.

Since we only have text processors for HTML and PDF files, the file_type should be either auto-detected, or set by a scraper, to html or pdf. If it's not, it should choke and force the scraper to pick one -- and if we come across a report format that isn't HTML or PDF, then it's time to extend the system to process text from that format.
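A minimal sketch of the stricter check described above, assuming report metadata is a plain dict with `url` and `file_type` keys. The names `SUPPORTED_FILE_TYPES`, `ValidationError` and `validate_file_type` are hypothetical and are not the project's actual helpers. The point is that a missing `file_type` and an unrecognized one both fail validation.

```python
# Hypothetical: the formats we currently have text processors for.
SUPPORTED_FILE_TYPES = {"html", "pdf"}


class ValidationError(Exception):
    """Raised when a report's metadata fails validation (hypothetical)."""


def validate_file_type(report):
    """Fail if file_type is missing OR is not a type we can extract text from."""
    file_type = report.get("file_type")
    url = report.get("url")

    # Existing behavior: no extension in the URL means nothing was detected.
    if not file_type:
        raise ValidationError(
            "Couldn't detect file_type for %s; the scraper must set it." % url)

    # New behavior: a detected but unsupported type (e.g. "aspx") also fails,
    # forcing the scraper to pick html or pdf explicitly.
    if file_type not in SUPPORTED_FILE_TYPES:
        raise ValidationError(
            "Unrecognized file_type %r for %s; expected one of: %s."
            % (file_type, url, ", ".join(sorted(SUPPORTED_FILE_TYPES))))
```

With this check, `{"url": "https://example.gov/report.pdf", "file_type": "pdf"}` passes, while a report whose `file_type` was auto-detected as `aspx` raises until the scraper overrides it.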

@konklone konklone mentioned this issue Aug 4, 2014
@konklone
Member Author

konklone commented Aug 4, 2014

This can build on @divergentdave's work in 1fa8f5d, but that only patches the problem. For a report whose URL ends in .aspx, the file_type field should be html, and the saved file should be report.html.
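A rough sketch of the intended outcome, assuming the scraper can pass an explicit type that takes precedence over the one guessed from the URL extension. The function names `detect_file_type`, `resolve_file_type` and `saved_filename` are hypothetical illustrations, not the project's real code.

```python
import os
from urllib.parse import urlparse


def detect_file_type(url):
    """Guess a type from the URL path's extension; None if there is none."""
    path = urlparse(url).path          # drops the query string, e.g. "?id=12"
    ext = os.path.splitext(path)[1]    # ".aspx", ".pdf", or ""
    return ext[1:].lower() if ext else None


def resolve_file_type(url, scraper_file_type=None):
    """A type set by the scraper wins over the auto-detected one."""
    return scraper_file_type or detect_file_type(url)


def saved_filename(file_type):
    """Name the downloaded file after the resolved type, not the URL."""
    return "report.%s" % file_type
```

For `https://example.gov/Reports/Display.aspx?id=12`, detection alone yields `aspx`; with the scraper supplying `html`, the resolved type is `html` and the file is saved as `report.html`.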

@audiodude
Contributor

👍 I agree this is the more correct way to do it.
