Allow extension with custom extractors #6

jkraemer · 2018-02-10T00:48:31Z

Feature idea, relatively low priority:

Right now you have to modify the source in order to add a new extractor class. There should be a way to do this dynamically, i.e. by calling TextExtractor.register(SomeCustomFileHandlerClass) or similar. We will have to find a way to determine the ordering / precedence of extractors since the list will be dynamic.
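
To make the idea concrete, here is a minimal sketch of what such a registration call could look like. Everything in it is hypothetical: `ExtractorRegistry` is only a stand-in for the proposed `TextExtractor.register`, and the `accept?`/`extract` interface is an assumption, not the project's actual extractor API. Precedence is simply registration order.

```ruby
require "stringio"

# Hypothetical sketch; none of these names are taken from the project's code.
module ExtractorRegistry
  @extractors = []

  class << self
    # Adds an extractor class unless it is already registered.
    def register(klass)
      @extractors << klass unless @extractors.include?(klass)
      klass
    end

    # Returns a copy so callers cannot mutate the registry directly.
    def extractors
      @extractors.dup
    end

    # Returns the first registered class that accepts the content type, or nil.
    def for_content_type(content_type)
      @extractors.find { |klass| klass.accept?(content_type) }
    end
  end
end

# Example custom handler, assumed to expose `accept?` and `extract`.
class MarkdownExtractor
  def self.accept?(content_type)
    content_type == "text/markdown"
  end

  def extract(io)
    io.read
  end
end

ExtractorRegistry.register(MarkdownExtractor)
handler = ExtractorRegistry.for_content_type("text/markdown")
text = handler.new.extract(StringIO.new("# Title"))
```

With this shape, adding a handler no longer requires touching the library source; the open question from above (which extractor wins when several accept the same type) is answered only implicitly by insertion order.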

thegcat · 2018-02-10T09:07:03Z

The order only becomes relevant if multiple extractors can extract text from the same file types, so I would suggest postponing the ordering discussion until someone has a use case for it. That way the code stays simple until the extra complexity is required, and you may understand the problem better once a use case presents itself.

wielinde · 2018-02-12T12:11:26Z

While working on my last PR I thought about a registry as well. So yes, @jkraemer, the idea has merit. However, my decision ended up along the same lines as @thegcat's suggestion ("build stuff when you need it").

meineerde · 2018-02-12T15:06:54Z

If you decide to implement such an order-based registry, some prior art would be:

Rails' controller callbacks
Rackstash's FilterChain (which was originally inspired by the Rails API)

These options allow you to define an ordered list of handlers, which will be tried one after another. With these APIs, you can implement something like

TextExtractor.register(OCRExtractor)
TextExtractor.register_after(OCRExtractor, FirstParagraphExtractor)
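
The two calls above could be backed by something like the following sketch. It is an illustration only: `OrderedExtractors` and its methods are hypothetical names, not the API of Rails callbacks or of Rackstash's FilterChain, and the extractor classes are empty placeholders named after the example.

```ruby
# Hypothetical sketch of an ordered registry; not the project's or Rackstash's API.
class OrderedExtractors
  include Enumerable

  def initialize
    @list = []
  end

  # Appends an extractor to the end of the list.
  def register(extractor)
    @list << extractor
    self
  end

  # Inserts `extractor` directly after `existing`.
  def register_after(existing, extractor)
    index = @list.index(existing)
    raise ArgumentError, "#{existing} is not registered" if index.nil?

    @list.insert(index + 1, extractor)
    self
  end

  # Yields extractors in precedence order.
  def each(&block)
    @list.each(&block)
  end
end

# Empty placeholder classes named after the example in this comment.
OCRExtractor = Class.new
FirstParagraphExtractor = Class.new
PlainTextExtractor = Class.new

ordered = OrderedExtractors.new
ordered.register(OCRExtractor)
ordered.register(PlainTextExtractor)
ordered.register_after(OCRExtractor, FirstParagraphExtractor)
# ordered.to_a is now [OCRExtractor, FirstParagraphExtractor, PlainTextExtractor]
```

Raising on an unknown anchor is one possible design choice; silently appending would be another, at the cost of hiding ordering mistakes.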

Alternatively, you could ignore the multiple-extractors-per-file-type issue in the registry altogether and instead allow registering a MultiExtractor that handles this transparently for the registry. Then you could use something much simpler, such as Rackstash::ClassRegistry, which Rackstash uses to register classes under names and to build instances of them. See Rackstash::Encoder for how this can be used.
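
A rough sketch of that alternative, under stated assumptions: the `MultiExtractor` below is a hypothetical composite, the `extract(path)` interface and the "first non-empty result wins" rule are guesses at what such a class might do, and a plain Hash stands in for a name-based registry (it does not reproduce the Rackstash::ClassRegistry API).

```ruby
# Hypothetical composite; the `extract(path)` interface is an assumption.
class MultiExtractor
  def initialize(*extractors)
    @extractors = extractors
  end

  # Tries each wrapped extractor in order and returns the first non-empty result.
  def extract(path)
    @extractors.each do |extractor|
      text = extractor.extract(path)
      return text unless text.nil? || text.empty?
    end
    nil
  end
end

# Two stand-in extractors for demonstration purposes.
class EmptyExtractor
  def extract(_path)
    ""
  end
end

class FixedExtractor
  def extract(_path)
    "extracted text"
  end
end

# One entry per file type; the ordering lives inside the composite.
extractors_by_type = {
  "application/pdf" => MultiExtractor.new(EmptyExtractor.new, FixedExtractor.new)
}
result = extractors_by_type.fetch("application/pdf").extract("scan.pdf")
```

The registry then only ever maps one key to one object, and any precedence logic stays local to the composite.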

jkraemer added the enhancement label Feb 10, 2018
