This repository has been archived by the owner on Mar 11, 2021. It is now read-only.

Deferring Text Processing #461

Open: wants to merge 2 commits into master


knowtheory
Member

As a general matter I don't recommend this, but I've tested it out in anger and it does what it's supposed to.

DocumentCloud's DocumentImport action splits into two main tasks: processing images and processing text. We can already run each of those independently (notably for reprocessing text or reprocessing images).

The changes here expose an option to skip text processing entirely. I don't generally recommend this, and some other process will need to circle back around and process the text later, but this would serve as a basis for doing so.
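To make the idea concrete, here is a minimal sketch of deferring text processing. It is not the code in this pull request: `ImportSketch`, the `images:` and `text:` options, and `process_deferred_text` are all hypothetical names, and the processing steps are stubbed out so that only the control flow is shown.

```ruby
# Hypothetical sketch; none of these names come from the DocumentCloud codebase.
class ImportSketch
  attr_reader :log, :deferred

  def initialize
    @log = []       # records which steps ran, in order
    @deferred = []  # ids of documents still waiting for text processing
  end

  # Run an import. Both steps run by default; pass text: false to skip text.
  def import(document_id, images: true, text: true)
    process_images(document_id) if images
    if text
      process_text(document_id)
    else
      # Remember the document so a later pass can circle back to it.
      @deferred << document_id
    end
  end

  # The follow-up pass: process text for every document deferred earlier.
  def process_deferred_text
    @deferred.shift(@deferred.length).each { |id| process_text(id) }
  end

  private

  # Stand-ins for the real image and text work.
  def process_images(id)
    @log << [:images, id]
  end

  def process_text(id)
    @log << [:text, id]
  end
end
```

Calling `import(id, text: false)` runs only the image step and queues the document; `process_deferred_text` plays the role of the "other process" that has to exist before skipping text is safe.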

Additionally, I added a mechanism for flagging incoming import jobs as api_document_import so that they can be handled separately, with priority given to document_import and large_document_import jobs.
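As a rough illustration of the flagging, the sketch below picks a job name for an incoming import and orders pending jobs so that API imports run last. Only the three job names come from the description above. The `from_api:` and `large:` flags, the numeric priorities, and the sort-based ordering are assumptions; the description does not say how priority is actually enforced.

```ruby
# Hypothetical sketch. Only the three job names come from the description above.

# Pick the job name for an incoming import (both flags are assumptions).
def action_for(from_api:, large: false)
  return "api_document_import" if from_api
  large ? "large_document_import" : "document_import"
end

# Lower number means handled earlier (assumed values).
def priority_of(action)
  {
    "document_import"       => 0,
    "large_document_import" => 0,
    "api_document_import"   => 1
  }.fetch(action)
end

# Order pending jobs so API imports run last; ties keep arrival order.
def prioritize(jobs)
  jobs.each_with_index
      .sort_by { |job, index| [priority_of(job[:action]), index] }
      .map(&:first)
end
```

Sorting on the pair of priority and arrival index keeps the order stable within each class of job, so flagging an import as api_document_import only moves it behind the other two job types.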
