Developing and deploying converters

Adam Hooper edited this page Apr 25, 2018 · 1 revision

What's a converter?

A converter is a service that converts each input file into one or more output files, along with metadata.

For instance, an HTML converter will convert an input HTML file into an output PDF, text and thumbnail. A Zip converter will convert an input Zip file into output files with unknown content-types.

The Ingest-Pipeline document describes how Overview juggles the input and output files.

Creating a converter

Converters share zero code with Overview. You can write them in any language.

A converter polls Overview's worker for tasks, via HTTP. The converter then streams its output as a multipart/form-data HTTP POST.
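
The multipart/form-data body a converter sends back can be sketched roughly as follows. This is a minimal illustration of the encoding only, not Overview's actual protocol: the part names (`metadata`, `output`), the JSON keys and the fixed boundary are all assumptions made for the example, and a real converter would stream its output rather than buffer it in memory. See overview-convert-framework for the real wire format.

```python
import json
import uuid


def encode_multipart(parts, boundary=None):
    """Encode (name, content_type, payload) triples as multipart/form-data.

    Returns (content_type_header, body_bytes).
    """
    boundary = boundary or uuid.uuid4().hex
    chunks = []
    for name, content_type, payload in parts:
        chunks.append(f"--{boundary}\r\n".encode("ascii"))
        chunks.append(
            f'Content-Disposition: form-data; name="{name}"\r\n'.encode("utf-8")
        )
        chunks.append(f"Content-Type: {content_type}\r\n\r\n".encode("ascii"))
        chunks.append(payload)
        chunks.append(b"\r\n")
    # The closing delimiter has two extra dashes after the boundary.
    chunks.append(f"--{boundary}--\r\n".encode("ascii"))
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)


# Hypothetical result of one conversion: a metadata part and an output file.
# Part names and JSON keys are placeholders, not Overview's real ones.
metadata = json.dumps({"filename": "example.pdf", "contentType": "application/pdf"})
header, body = encode_multipart(
    [
        ("metadata", "application/json", metadata.encode("utf-8")),
        ("output", "application/pdf", b"%PDF-1.4 ..."),
    ],
    boundary="example-boundary",
)
```

The returned `header` goes in the POST's Content-Type header and `body` is the request body.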

Follow instructions at overview-convert-framework to build a converter. In the end, you'll have it pushed to Docker Hub. In this example, let's call that image overview/overview-convert-thing:1.2.3.

Adding the converter to Overview

You'll need to change overview-server to handle the new file type. (We can't make plugins auto-register their supported file types, because Overview needs to know their file types before they start up.)

  1. Edit converter_versions.env: add CONVERT_THING_IMAGE=overview/overview-convert-thing:1.2.3. This will be used in docker-compose files.
  2. Edit docker-compose.yml and add a clause for convert-thing. Give it a POLL_URL of http://overview-worker:9032/Thing.
  3. Edit integration-test/docker-compose.yml and add a clause for overview-convert-thing. Also add overview-convert-thing to integration-test's depends_on array.
  4. Add a test file to integration-test/files/file-upload-spec/XXX.thing. Test that it produces the desired files in integration-test/spec/file_upload_spec.rb. (Overview's integration tests just prove that Overview invokes the converter and that the converter runs. The converter itself should test that it handles all possible inputs.)
  5. Prepare to deploy to Kubernetes: add sed -e "s@CONVERT_THING_IMAGE@$CONVERT_THING_IMAGE@" to kubernetes/common. Add apply_template convert-thing.yml to kubernetes/deploy. Then create a convert-thing.yml config file, probably by copying convert-email.yml and replacing Email with Thing, email with thing and EMAIL with THING. Set appropriate limits, minReplicas and maxReplicas.
  6. Add the converter to worker/src/main/scala/com/overviewdocs/ingest/process/Step.scala. For instance, an HttpStep of "Thing" -> 0.2 means: when the converter finishes outputting data, we are 20% closer to producing documents than we were before the converter ran. (If your converter outputs PDF+thumbnail+text with wantOcr:false and wantSplitByPage:false, then use "Thing" -> 1.0.)
  7. Alter worker/src/main/scala/com/overviewdocs/ingest/process/Decider.scala: add a NextStep.Thing and make some MIME types point to it.
  8. Alter worker/src/test/scala/com/overviewdocs/ingest/process/DeciderSpec.scala: add "Thing" to steps and write a test to convince yourself Overview chooses it.
  9. ./dev and test uploading a file manually.
  10. docker/build && integration-test/run-in-docker-compose
  11. Commit and push. Jenkins will deploy it to Kubernetes when integration tests pass.
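
Steps 7 and 8 boil down to a lookup from MIME type to next step, plus a test that the lookup picks the new converter. The idea can be sketched as below. This is written in Python rather than Scala and is not the code in Decider.scala: the MIME type `application/x-thing`, the `Pdf` step and the `Unhandled` fallback are hypothetical names chosen for illustration. Only the step name `Thing` comes from the steps above.

```python
# Hypothetical mapping from MIME type to the name of the next ingest step.
NEXT_STEP_BY_MIME_TYPE = {
    "application/pdf": "Pdf",        # placeholder for an existing step
    "application/x-thing": "Thing",  # hypothetical MIME type for the new converter
}


def next_step(mime_type, default="Unhandled"):
    """Pick the step that should handle a file with the given MIME type."""
    # Drop parameters such as "; charset=utf-8" and ignore case.
    base = mime_type.split(";", 1)[0].strip().lower()
    return NEXT_STEP_BY_MIME_TYPE.get(base, default)


# The kind of check step 8 asks for: prove the new type reaches the new step.
assert next_step("application/x-thing") == "Thing"
```

If several MIME types should reach the converter, each one gets its own entry pointing at the same step.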

Once Jenkins deploys to production, it will have pushed images to Docker Hub. Now you can use them in overview-local:

  1. Edit config/overview.defaults.env: add a CONVERT_THING_IMAGE line, and change OVERVIEW_VERSION to the version you committed in step 11.
  2. Edit config/overview.yml: add the exact clause you added to overview-server's integration-test/docker-compose.yml.
  3. ./update && ./start-after-git-pull to test.
  4. Commit and push. Users will get your new code when they ./update.

Updating a converter

  1. Release the new converter. The instructions are converter-specific, but they'll all end with a new Docker image on Docker Hub. Let's say it's overview/overview-convert-thing:1.2.4.
  2. Update overview-server:
    1. Alter converter_versions.env: CONVERT_THING_IMAGE=overview/overview-convert-thing:1.2.4
    2. integration-test/run-in-docker-compose
    3. Commit and push. Jenkins will deploy it when integration tests pass.
  3. Update overview-local:
    1. Alter config/overview.defaults.env: CONVERT_THING_IMAGE=overview/overview-convert-thing:1.2.4. You don't strictly need to change overview-local's OVERVIEW_VERSION if you're only updating a converter, but it's good practice: wait for Jenkins to finish with the overview-server commit you just pushed, and then update OVERVIEW_VERSION to match.
    2. ./update && ./start-after-git-pull to test.
    3. Commit and push. Users will get your new code when they ./update.
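
Both repositories need the same one-line change to an env file, so the edit is easy to script. The helper below is a minimal sketch, not part of Overview's tooling: `set_env_value` is a hypothetical name, and the convert-email tag in the sample text is made up for the example.

```python
import re


def set_env_value(env_text, key, value):
    """Return env_text with `key=...` replaced by `key=value`.

    Appends the line if the key is not present yet.
    """
    pattern = re.compile(rf"^{re.escape(key)}=.*$", flags=re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(env_text):
        # A function replacement avoids backslash handling in the value.
        return pattern.sub(lambda _match: new_line, env_text)
    separator = "" if env_text.endswith("\n") or not env_text else "\n"
    return f"{env_text}{separator}{new_line}\n"


# Sample file contents; the convert-email tag is a made-up placeholder.
before = (
    "CONVERT_EMAIL_IMAGE=overview/overview-convert-email:1.0.0\n"
    "CONVERT_THING_IMAGE=overview/overview-convert-thing:1.2.3\n"
)
after = set_env_value(
    before, "CONVERT_THING_IMAGE", "overview/overview-convert-thing:1.2.4"
)
```

Reading converter_versions.env or config/overview.defaults.env, passing the text through `set_env_value` and writing it back covers the env-file edits above. Running the tests and committing remain manual steps.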