The National Archives: Find Case Law

This repository is part of the Find Case Law project at The National Archives. For more information on the project, check the documentation.

PDF Conversion Service

When a file whose key ends in .docx is uploaded to the S3 bucket, this service creates a PDF at the same key, with .pdf in place of .docx. The conversion is performed by LibreOffice.
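The key mapping described above can be sketched in Python, the language of queue_listener. This is an illustration, not the service's actual code: the function name pdf_key_for is hypothetical, and the sketch assumes the .docx match is case-sensitive. It uses plain string slicing rather than path handling so that unusual S3 keys (for example, ones containing //) are not normalised.

```python
from typing import Optional

DOCX_SUFFIX = ".docx"
PDF_SUFFIX = ".pdf"


def pdf_key_for(docx_key: str) -> Optional[str]:
    """Hypothetical helper: map an uploaded .docx key to the key its PDF is written to.

    Returns None for keys that do not end in .docx, since only those trigger a conversion.
    """
    if not docx_key.endswith(DOCX_SUFFIX):
        return None
    # Keep everything before the extension exactly as-is and swap in .pdf.
    return docx_key[: -len(DOCX_SUFFIX)] + PDF_SUFFIX


print(pdf_key_for("eat/2022/1/eat_2022_1.docx"))  # eat/2022/1/eat_2022_1.pdf
```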

Deployment

Staging

The main branch is automatically deployed to staging with each commit.

Production

To deploy to production:

1. Create a new release.
2. Set both the tag and the release name to vX.Y.Z, following semantic versioning.
3. Publish the release.

An automated workflow then force-pushes that release to the production branch, from which it is deployed to the production environment.
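If you want to check a tag before publishing, or work out the next one, a small helper like the following may be handy. It is a sketch and not part of this repository: the names RELEASE_TAG, parse_release_tag and next_release_tag are hypothetical, and it assumes strict vX.Y.Z tags with no pre-release or build suffixes.

```python
import re
from typing import Tuple

# Hypothetical pattern: "v" followed by MAJOR.MINOR.PATCH with no leading zeros.
RELEASE_TAG = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_release_tag(tag: str) -> Tuple[int, int, int]:
    """Split a vX.Y.Z tag into integers, rejecting anything else."""
    match = RELEASE_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"{tag!r} is not of the form vX.Y.Z")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def next_release_tag(current: str, part: str) -> str:
    """Bump one semantic-versioning component and reset the lower ones."""
    major, minor, patch = parse_release_tag(current)
    if part == "major":
        return f"v{major + 1}.0.0"
    if part == "minor":
        return f"v{major}.{minor + 1}.0"
    if part == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part {part!r}")


print(next_release_tag("v1.2.3", "minor"))  # v1.3.0
```

Bump "major" for breaking changes, "minor" for backwards-compatible features, and "patch" for fixes.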

Republishing PDFs

You can republish a PDF by uploading its .docx file again, or by sending a JSON message of the form:

    {
      "Records": [
        {
          "s3": {
            "bucket": { "name": "tna-caselaw-assets" },
            "object": {
              "key": "eat/2022/1/eat_2022_1.docx",
              "eTag": "fa2ef6e8abadbd5cc5cedf3f32834f1f"
            }
          }
        }
      ]
    }

to the queue from the "Send and receive messages" page of Amazon Simple Queue Service (SQS) in the AWS console.

If you want to regenerate every PDF that has a corresponding .docx file, the script scripts/create_json_for_bulk_pdf_regeneration will generate that JSON for you.

The eTag value is arbitrary, but it should be a sensible filename fragment and must not contain a /.
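For illustration, the message above can be built programmatically. This sketch is not the bulk-regeneration script and does not reproduce its logic; build_regeneration_message is a hypothetical name. It fills eTag with a random hex UUID, which satisfies the "no /" rule. It places several keys in one Records list; whether the listener processes more than one record per message is an assumption, so sending one key per message is the conservative choice.

```python
import json
import uuid
from typing import Iterable


def build_regeneration_message(bucket: str, docx_keys: Iterable[str]) -> str:
    """Hypothetical helper: build an S3-event-shaped JSON body asking for .docx files to be reconverted."""
    records = []
    for key in docx_keys:
        records.append(
            {
                "s3": {
                    "bucket": {"name": bucket},
                    "object": {
                        "key": key,
                        # Arbitrary value; a hex UUID is a filename-safe fragment with no "/".
                        "eTag": uuid.uuid4().hex,
                    },
                }
            }
        )
    return json.dumps({"Records": records}, indent=2)


print(build_regeneration_message("tna-caselaw-assets", ["eat/2022/1/eat_2022_1.docx"]))
```

Instead of pasting the output into the console, it could be sent with boto3's SQS client, for example sqs.send_message(QueueUrl=queue_url, MessageBody=body); the queue URL depends on the environment and is not given here.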

Local setup

1. From ds-caselaw-ingester, run docker-compose up to launch the LocalStack container.
2. From ds-caselaw-pdfconversion, run scripts/setup-localstack.sh to create the queues and other resources.
3. From ds-caselaw-pdfconversion, run docker-compose up --build to launch the LibreOffice container. The --build flag rebuilds the image so that the container includes the current converter script.

Local testing

Run the unit tests with pytest queue_listener/tests.py.

For manual integration testing, first complete the Local setup steps above.

On startup, you should see output like:

    Downloading judgment.docx
    ...
    Uploaded judgment.pdf

Running scripts/upload_custom_file.sh should have no visible effect. If you then run scripts/upload_file.sh, the service should not upload a new PDF, and should instead display the message:

judgment.pdf is from custom-pdfs, not replacing
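The guard behind that message can be sketched as a check that runs before uploading a converted PDF. This README does not say how the service recognises a custom PDF, so the sketch assumes, hypothetically, that the existing object carries S3 user metadata pdfsource=custom-pdfs; the metadata key, its value and the function name should_replace_pdf are all illustrative. With boto3, such metadata would be available from the "Metadata" field of head_object's response.

```python
from typing import Mapping, Optional

# Hypothetical marker for PDFs uploaded outside the converter; the real key/value may differ.
CUSTOM_SOURCE_KEY = "pdfsource"
CUSTOM_SOURCE_VALUE = "custom-pdfs"


def should_replace_pdf(pdf_key: str, existing_metadata: Optional[Mapping[str, str]]) -> bool:
    """Decide whether a freshly converted PDF may overwrite whatever is stored at pdf_key.

    existing_metadata is None when no PDF exists at that key yet.
    """
    if existing_metadata is None:
        # Nothing there yet, so uploading cannot clobber a custom PDF.
        return True
    if existing_metadata.get(CUSTOM_SOURCE_KEY) == CUSTOM_SOURCE_VALUE:
        print(f"{pdf_key} is from {CUSTOM_SOURCE_VALUE}, not replacing")
        return False
    return True


should_replace_pdf("judgment.pdf", {"pdfsource": "custom-pdfs"})
# prints: judgment.pdf is from custom-pdfs, not replacing
```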
