Developing and deploying converters
A converter is a service that converts each input file into one or more output files, along with metadata. For instance, an html converter will convert input HTML into an output PDF, text and thumbnail. A zip converter will convert input Zip into output files with unknown content-types.
The Ingest-Pipeline document describes how Overview juggles the input and output files.
Converters share zero code with Overview. You can write them in any language.
A converter polls Overview's worker for tasks, via HTTP. The converter then streams its output as a multipart/form-data HTTP POST.
Follow the instructions at overview-convert-framework to build a converter. In the end, you'll have it pushed to Docker Hub. In this example, let's call that image overview/overview-convert-thing:1.2.3.
You'll need to change overview-server to handle the new file type. (We can't make plugins auto-register their supported file types, because Overview needs to know their file types before they start up.)
1. Edit converter_versions.env: add CONVERT_THING_IMAGE=overview/overview-convert-thing:1.2.3. This will be used in docker-compose files.
2. Edit docker-compose.yml and add a clause for convert-thing. Give it a POLL_URL of http://overview-worker:9032/Thing.
3. Edit integration-test/docker-compose.yml and add a clause for overview-convert-thing. Also add overview-convert-thing to integration-test's depends_on array.
4. Add a test file to integration-test/files/file-upload-spec/XXX.thing. Test that it produces the desired files in integration-test/spec/file_upload_spec.rb. (Overview's integration tests just prove that Overview invokes the converter and that the converter runs. The converter itself should test that it handles all possible inputs.)
5. Prepare to deploy to Kubernetes: add sed -e "s@CONVERT_THING_IMAGE@$CONVERT_THING_IMAGE@" to kubernetes/common. Add apply_template convert-thing.yml to kubernetes/deploy. Then create a convert-thing.yml config file, probably by copying convert-email.yml and replacing Email with Thing, email with thing and EMAIL with THING. Set appropriate limits, minReplicas and maxReplicas.
6. Add your converter to worker/src/main/scala/com/overviewdocs/ingest/process/Step.scala. For instance, an HttpStep of "Thing" -> 0.2 means: when the converter finishes outputting data, we are 20% closer to producing documents than we were before the converter ran. (If your converter outputs PDF+thumbnail+text with wantOcr:false and wantSplitByPage:false, then use "Thing" -> 1.0.)
7. Alter worker/src/main/scala/com/overviewdocs/ingest/process/Decider.scala: add a NextStep.Thing and make some MIME types point to it.
8. Alter worker/src/test/scala/com/overviewdocs/ingest/process/DeciderSpec.scala: add "Thing" to steps and write a test to convince yourself Overview chooses it.
9. Run ./dev and test uploading a file manually.
10. Run docker/build && integration-test/run-in-docker-compose
11. Commit and push. Jenkins will deploy it to Kubernetes when integration tests pass.
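The Kubernetes substitution in the list above can be tried in isolation before touching kubernetes/common. This is a minimal sketch, assuming the template contains the literal placeholder CONVERT_THING_IMAGE. The one-line template below is a made-up stand-in, not the real convert-thing.yml, and the real script may wire the sed expression into a longer pipeline.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for the image line of convert-thing.yml.
workdir=$(mktemp -d)
printf 'image: CONVERT_THING_IMAGE\n' > "$workdir/convert-thing.yml"

# In overview-server this value would come from converter_versions.env.
CONVERT_THING_IMAGE=overview/overview-convert-thing:1.2.3

# Same expression as in the list above. "@" is the sed delimiter, so the "/"
# characters in the image name need no escaping.
rendered=$(sed -e "s@CONVERT_THING_IMAGE@$CONVERT_THING_IMAGE@" "$workdir/convert-thing.yml")

printf '%s\n' "$rendered"
rm -r "$workdir"
```

If the placeholder survives in the output, the variable name in the template and the one in the sed expression probably differ.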
Once Jenkins deploys to production, it will have pushed images to Docker Hub. Now you can use them in overview-local:
- Edit config/overview.defaults.env: add a CONVERT_THING_IMAGE line, and change OVERVIEW_VERSION to the version you committed in step 11.
- Edit config/overview.yml: add the exact clause you added to overview-server's integration-test/docker-compose.yml.
- Run ./update && ./start-after-git-pull to test.
- Commit and push. Users will get your new code when they ./update.
- Release the new converter. The instructions are converter-specific, but they'll all end with a new Docker image on Docker Hub. Let's say it's overview/overview-convert-thing:1.2.4.
- Update overview-server:
  - Alter converter_versions.env: CONVERT_THING_IMAGE=overview/overview-convert-thing:1.2.4
  - Run integration-test/run-in-docker-compose
  - Commit and push. Users will get your new code when they ./update.
- Update overview-local:
  - Alter config/overview.defaults.env: CONVERT_THING_IMAGE=overview/overview-convert-thing:1.2.4. You don't need to edit overview-local's OVERVIEW_VERSION if you're only updating a converter; but it's good practice. Wait for Jenkins to finish with the overview-server commit you just pushed, and then update OVERVIEW_VERSION.
  - Run ./update && ./start-after-git-pull to test.
  - Commit and push. Users will get your new code when they ./update.
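The version bump in both env files is a one-line change, so it can be scripted. This is a minimal sketch, assuming the file already holds a CONVERT_THING_IMAGE= line. It works on a temporary copy with made-up contents, so the file layout here is an assumption rather than the real converter_versions.env.

```shell
#!/bin/sh
set -eu

# Hypothetical copy of the env file, holding only the line we care about.
workdir=$(mktemp -d)
env_file="$workdir/converter_versions.env"
printf 'CONVERT_THING_IMAGE=overview/overview-convert-thing:1.2.3\n' > "$env_file"

new_image=overview/overview-convert-thing:1.2.4

# Rewrite only the CONVERT_THING_IMAGE line. Write to a temp file and move it
# into place, because "sed -i" takes different arguments on GNU and BSD sed.
sed -e "s@^CONVERT_THING_IMAGE=.*@CONVERT_THING_IMAGE=$new_image@" "$env_file" > "$env_file.tmp"
mv "$env_file.tmp" "$env_file"

updated=$(cat "$env_file")
printf '%s\n' "$updated"
rm -r "$workdir"
```

The same expression should work for config/overview.defaults.env in overview-local, since the text above uses the same variable name there.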