Deferring Text Processing #461

knowtheory · 2018-08-29T00:12:27Z

As a general matter I don't recommend this, but I've tested it in anger and it does what it's supposed to.

DocumentCloud's DocumentImport action splits into two main tasks: processing images and processing text. We can already run each of those independently (notably for reprocessing text or reprocessing images).

The changes here expose an option to skip text processing entirely. I don't generally recommend this, and some other process will need to circle back around and process the text later, but this would serve as a basis for doing so.
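To make the idea concrete, here is a minimal sketch of an import that can skip the text step and record that text is still owed. It is not DocumentCloud's code: `ImportOptions`, `run_import`, `text_deferred`, and `documents_needing_text` are hypothetical names chosen for illustration, and the real processing work is replaced by a simple record of which steps ran.

```python
from dataclasses import dataclass, field


@dataclass
class ImportOptions:
    # Hypothetical flags; the real DocumentImport options may be named differently.
    process_images: bool = True
    process_text: bool = True


@dataclass
class ImportResult:
    steps_run: list = field(default_factory=list)
    text_deferred: bool = False


def run_import(document_id: int, options: ImportOptions) -> ImportResult:
    """Run the requested import steps and note when text was skipped."""
    result = ImportResult()
    if options.process_images:
        result.steps_run.append("images")  # stand-in for real image processing
    if options.process_text:
        result.steps_run.append("text")  # stand-in for real text processing
    else:
        # Text was skipped, so a later pass has to pick this document up.
        result.text_deferred = True
    return result


def documents_needing_text(results: dict) -> list:
    """Return the ids of documents whose text processing was deferred."""
    return sorted(doc_id for doc_id, r in results.items() if r.text_deferred)
```

The `text_deferred` marker is the part that matters: without some record of which documents skipped text, the follow-up process has nothing to work from.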

Additionally, this adds a mechanism for flagging incoming import jobs as api_document_import so they can be handled separately, with priority given to document_import and large_document_import jobs.
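A rough sketch of how that kind of routing could work, assuming a single in-process priority queue. The three action names come from this PR; the `JobQueue` class, its `via_api` and `large` parameters, and the numeric priority values are assumptions for illustration, and the real job system may schedule work quite differently.

```python
import heapq
import itertools

# Assumed ordering: lower number runs first. Only the fact that
# api_document_import comes last is taken from the PR description.
PRIORITY = {
    "document_import": 0,
    "large_document_import": 1,
    "api_document_import": 2,
}


class JobQueue:
    def __init__(self):
        self._heap = []
        # Tie-breaker so jobs with equal priority keep arrival order.
        self._counter = itertools.count()

    def enqueue(self, document_id, via_api=False, large=False):
        """Pick an action name for the job and queue it at that priority."""
        if via_api:
            action = "api_document_import"
        elif large:
            action = "large_document_import"
        else:
            action = "document_import"
        entry = (PRIORITY[action], next(self._counter), action, document_id)
        heapq.heappush(self._heap, entry)
        return action

    def dequeue(self):
        """Pop the highest-priority job as (action, document_id)."""
        _, _, action, document_id = heapq.heappop(self._heap)
        return action, document_id

    def __len__(self):
        return len(self._heap)
```

With this shape, API-submitted imports only run once the other two kinds of job have drained, which matches the intent of handling them separately.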

knowtheory added 2 commits August 28, 2018 17:04

Add mechanism to queue document imports as api_document_imports at an EVEN LOWER priority.

0f5290c

Add a mechanism to skip text processing.

2ddb668
