Allow extension with custom extractors #6
Comments
The order only becomes relevant if you have multiple extractors capable of extracting text from the same file types, so I would suggest postponing the ordering discussion until someone has a use case for it. This way the code stays simple until ordering is actually required, and you might have a better understanding of the problem once a use case presents itself.
If you decide to implement such an order-based registry, some prior art for that would be:
These options allow you to define an ordered list of handlers, which will be tried one after another. With these APIs, you can implement something like
TextExtractor.register(OCRExtractor)
TextExtractor.register_after(OCRExtractor, FirstParagraphExtractor)
Alternatively, you could also ignore the multiple-extractors-per-file-type issue in the registry altogether and instead allow registering a
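To make the ordered-handler idea from this comment concrete, here is a minimal Python sketch. It is only an illustration under assumptions, not the project's code: the `_extractors` list, the `extract` signature, the "return None to pass" convention, and the argument order of `register_after` (anchor first, new extractor second) are all guesses based on the two calls quoted in the comment.

```python
class TextExtractor:
    """Hypothetical registry holding extractor classes in priority order."""

    _extractors = []  # earlier entries are tried first

    @classmethod
    def register(cls, extractor):
        """Append an extractor class to the end of the list."""
        if extractor not in cls._extractors:
            cls._extractors.append(extractor)

    @classmethod
    def register_after(cls, anchor, extractor):
        """Insert `extractor` directly after the already registered `anchor`.

        Assumption: the first argument is the existing entry.
        """
        if extractor in cls._extractors:
            cls._extractors.remove(extractor)
        index = cls._extractors.index(anchor)  # ValueError if anchor is unknown
        cls._extractors.insert(index + 1, extractor)

    @classmethod
    def extract(cls, path):
        """Try each extractor in order and return the first non-None result."""
        for extractor in cls._extractors:
            text = extractor().extract(path)
            if text is not None:
                return text
        return None


# Stand-in extractors; real ones would read the file.
class OCRExtractor:
    def extract(self, path):
        return "ocr text" if path.endswith(".png") else None


class FirstParagraphExtractor:
    def extract(self, path):
        return "first paragraph" if path.endswith(".txt") else None


TextExtractor.register(OCRExtractor)
TextExtractor.register_after(OCRExtractor, FirstParagraphExtractor)
```

With this shape, precedence is simply list position, so a conflict between two extractors for the same file type is resolved by whichever was placed earlier.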
Feature idea, relatively low priority:
Right now you have to modify the source in order to add a new extractor class. There should be a way to do this dynamically, e.g. by calling
TextExtractor.register(SomeCustomFileHandlerClass)
or similar. We will have to find a way to determine the ordering / precedence of extractors since the list will be dynamic.
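As a rough illustration of what such a call could do, the Python sketch below keeps a class-level mapping from file extension to extractor class. Everything except the `TextExtractor.register(SomeCustomFileHandlerClass)` call itself is an assumption: the `extensions` attribute, the `for_file` helper, and the "last registration wins" rule are placeholders that sidestep the precedence question rather than answer it.

```python
import os


class TextExtractor:
    """Hypothetical base class that doubles as the extractor registry."""

    _by_extension = {}  # maps ".ext" -> extractor class

    @classmethod
    def register(cls, extractor_class):
        """Make an extractor class available at runtime.

        Assumption: each extractor lists the extensions it handles in an
        `extensions` attribute; a later registration replaces an earlier one.
        """
        for ext in extractor_class.extensions:
            cls._by_extension[ext.lower()] = extractor_class

    @classmethod
    def for_file(cls, filename):
        """Return an extractor instance for `filename`, or None if unsupported."""
        ext = os.path.splitext(filename)[1].lower()
        extractor_class = cls._by_extension.get(ext)
        return extractor_class() if extractor_class else None


class SomeCustomFileHandlerClass(TextExtractor):
    extensions = (".foo",)  # hypothetical file type

    def extract(self, filename):
        # Placeholder behavior: treat the file as plain UTF-8 text.
        with open(filename, encoding="utf-8") as handle:
            return handle.read()


# No change to the library source is needed to add the new handler.
TextExtractor.register(SomeCustomFileHandlerClass)
```

A caller would then use `TextExtractor.for_file("report.foo")` to obtain an instance of the custom handler, and gets `None` for extensions nobody has registered.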