Right now, validation fails if the file_type wasn't detected (the URL has no file extension), but it does not fail if the detected file_type is not a recognized type.
Since we only have text processors for HTML and PDF files, the file_type should be either auto-detected, or set by a scraper, to html or pdf. If it's not, it should choke and force the scraper to pick one -- and if we come across a report format that isn't HTML or PDF, then it's time to extend the system to process text from that format.
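A minimal sketch of the stricter check described above, assuming a report is a plain dict with a `file_type` key. The names `SUPPORTED_FILE_TYPES` and `validate_file_type` are hypothetical and are not taken from the existing codebase:

```python
# Hypothetical names for illustration only; the real validation code may
# be organized differently.
SUPPORTED_FILE_TYPES = ("html", "pdf")


def validate_file_type(report):
    """Return an error message if file_type is missing or unsupported, else None."""
    file_type = report.get("file_type")

    # Case 1: nothing was detected (e.g. the URL has no file extension)
    # and the scraper did not set a value.
    if file_type is None:
        return "Couldn't detect file_type from the URL; the scraper must set it."

    # Case 2: something was detected, but we have no text processor for it.
    if file_type not in SUPPORTED_FILE_TYPES:
        return "Unsupported file_type %r; expected one of: %s" % (
            file_type, ", ".join(SUPPORTED_FILE_TYPES))

    return None
```

With a check like this, supporting a new report format would mean adding it to the tuple once a text processor for it exists.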
This can build on @divergentdave's work in 1fa8f5d, but that only patches the problem -- for a report whose URL ends in .aspx, the file_type field should be html, and the saved file should be report.html.
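One way to get there might be to normalize the URL's extension to a supported file_type at detection time. The sketch below is an assumption about how that could look. `detect_file_type`, `report_filename`, and the alias table are hypothetical, and treating `.htm` as HTML is an extra assumption beyond the `.aspx` case mentioned here:

```python
import os
from urllib.parse import urlparse

# Hypothetical alias table: URL extensions mapped to a supported file_type.
# Only the .aspx -> html mapping comes from this issue; .htm is an assumption.
EXTENSION_TO_FILE_TYPE = {
    "html": "html",
    "htm": "html",
    "aspx": "html",
    "pdf": "pdf",
}


def detect_file_type(url):
    """Return "html" or "pdf" based on the URL path's extension, or None."""
    path = urlparse(url).path  # ignores any query string or fragment
    extension = os.path.splitext(path)[1].lower().lstrip(".")
    return EXTENSION_TO_FILE_TYPE.get(extension)


def report_filename(file_type):
    """Name the saved file after the normalized file_type, not the URL extension."""
    return "report.%s" % file_type
```

A URL with no extension, or an unmapped one, yields `None` here, which leaves it to the scraper to set file_type explicitly.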