This repository contains an OCR service built with Axum.
The full list of crates used can be found in the Cargo.toml file; the key ones are:
- axum - A user-friendly, modular web framework built with Tokio, Tower, and Hyper.
- tesseract-rs - Rust bindings for the Tesseract OCR engine.
- image - An image processing library for Rust.
- insta - A snapshot testing library for Rust.
- utoipa - A library for generating OpenAPI documentation in Rust.
- opentelemetry-rust - OpenTelemetry for Rust.
You can use the .env.example file or src/config/app_config.rs to view and configure the application.
The TESSDATA_PATH environment variable specifies where the Tesseract data files are stored. By default, it is set to tesseract. If you change this path, make sure to:
- Update the environment variable in your .env file
- Pass the same path as a build argument if you use Docker or Docker Compose, so your data is copied into the container correctly
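A minimal sketch of how a Rust service might resolve this setting, assuming a plain environment lookup that falls back to the documented default of tesseract. The helpers resolve_tessdata_path and tessdata_path_from_env are hypothetical; the real logic lives in src/config/app_config.rs and may differ, for example in how it treats an empty value.

```rust
use std::env;
use std::path::PathBuf;

// Documented default for TESSDATA_PATH.
const DEFAULT_TESSDATA_PATH: &str = "tesseract";

/// Hypothetical helper: turn an optional raw value into the tessdata directory.
/// Assumption: unset, empty, or whitespace-only values fall back to the default.
fn resolve_tessdata_path(raw: Option<&str>) -> PathBuf {
    match raw.map(str::trim) {
        Some(value) if !value.is_empty() => PathBuf::from(value),
        _ => PathBuf::from(DEFAULT_TESSDATA_PATH),
    }
}

/// Hypothetical helper: read TESSDATA_PATH from the process environment.
fn tessdata_path_from_env() -> PathBuf {
    let raw = env::var("TESSDATA_PATH").ok();
    resolve_tessdata_path(raw.as_deref())
}

fn main() {
    println!("tessdata directory: {}", tessdata_path_from_env().display());
}
```

Keeping the parsing in a pure function that takes an Option makes the fallback behaviour easy to test without mutating the process environment.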
The OCR service requires the Tesseract data files to be downloaded into the tesseract directory (or your custom TESSDATA_PATH). You can download them manually from the Tesseract language data repositories, or by running the scripts/download-tessdata.sh script.
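Tesseract stores each language's data in a file named <lang>.traineddata. As a sketch, assuming those files sit directly inside the TESSDATA_PATH directory, a pre-flight check that the languages you plan to request are present could look like this. The missing_languages helper is hypothetical and not part of the service.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: list requested languages whose `<lang>.traineddata`
/// file is not present directly inside `tessdata_dir`.
fn missing_languages(tessdata_dir: &Path, languages: &[&str]) -> Vec<String> {
    languages
        .iter()
        .filter(|lang| !tessdata_dir.join(format!("{lang}.traineddata")).is_file())
        .map(|lang| lang.to_string())
        .collect()
}

fn main() {
    // Assumes the default TESSDATA_PATH ("tesseract") relative to the working directory.
    let dir = PathBuf::from("tesseract");
    let missing = missing_languages(&dir, &["eng", "chi_sim"]);
    if missing.is_empty() {
        println!("all requested language data found in {}", dir.display());
    } else {
        println!("missing traineddata for: {}", missing.join(", "));
    }
}
```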
Once everything is set up, start the service with:
cargo run
Run tests:
cargo test
Run snapshot tests:
cargo insta test
Run and review snapshot tests:
cargo insta test --review
Run Clippy:
cargo clippy
Run Rustfmt:
cargo fmt
To build and run the Docker image locally:
# If using the default TESSDATA_PATH (tesseract)
docker build -t ocr-service .
docker run -p 8080:8080 ocr-service
# If using a custom TESSDATA_PATH, specify it as both a build argument and environment variable
docker build --build-arg TESSDATA_PATH=your/custom/path -t ocr-service .
docker run -p 8080:8080 -e TESSDATA_PATH=your/custom/path ocr-service
If using Docker Compose, you can specify the custom path in the compose file:
services:
  ocr_service:
    build:
      context: ../
      args:
        - TESSDATA_PATH=your/custom/path
      dockerfile: Dockerfile
Make sure the TESSDATA_PATH build argument matches the environment variable you've set in your .env file.
The API documentation is available at http://localhost:8080/api-docs when running locally.
Send an image to the /api/v1/images endpoint, as a form field named image, to run OCR on it:
curl -X POST -F "image=@./tests/images/tessdoc-introduction.png" \
"http://localhost:8080/api/v1/images?language=eng"
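The -F "image=@..." flag makes curl send a multipart/form-data request whose file part is named image. To show what goes over the wire, here is a hand-rolled sketch of such a body following the RFC 7578 layout. The multipart_image_body helper is hypothetical, assumes the filename contains no quotes or line breaks, and in practice an HTTP client library would build this for you. The request must also carry a Content-Type: multipart/form-data; boundary=<boundary> header.

```rust
/// Hypothetical helper: build a multipart/form-data body with a single file
/// part named "image", the same field name the curl example uses.
/// Assumes `filename` contains no quotes or line breaks (no escaping is done).
fn multipart_image_body(boundary: &str, filename: &str, content_type: &str, bytes: &[u8]) -> Vec<u8> {
    let mut body = Vec::new();
    // Opening boundary line for the only part.
    body.extend_from_slice(format!("--{boundary}\r\n").as_bytes());
    // Part headers: form field name, original filename, and media type.
    body.extend_from_slice(
        format!("Content-Disposition: form-data; name=\"image\"; filename=\"{filename}\"\r\n")
            .as_bytes(),
    );
    body.extend_from_slice(format!("Content-Type: {content_type}\r\n\r\n").as_bytes());
    // Raw file bytes, followed by the closing boundary (note the trailing "--").
    body.extend_from_slice(bytes);
    body.extend_from_slice(format!("\r\n--{boundary}--\r\n").as_bytes());
    body
}

fn main() {
    let boundary = "ocr-service-example-boundary";
    // Placeholder bytes stand in for the contents of a real PNG file.
    let body = multipart_image_body(boundary, "tessdoc-introduction.png", "image/png", b"...");
    println!("Content-Type: multipart/form-data; boundary={boundary}");
    println!("body is {} bytes", body.len());
}
```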
To process an image with a specific model, add the model query parameter:
curl -X POST -F "image=@./tests/images/chinese-simplified-sign.jpg" \
"http://localhost:8080/api/v1/images?language=chi_sim&model=chi_sim"
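The two image requests differ only in their query string: both pass language, and the second also adds model. A small sketch of building that URL, assuming the language and model codes are already URL-safe (as eng and chi_sim are), so no percent-encoding is applied. The images_url helper is hypothetical.

```rust
/// Hypothetical helper: build the /api/v1/images URL used in the curl examples.
/// Assumes `language` and `model` are URL-safe codes such as "eng" or "chi_sim",
/// so no percent-encoding is applied.
fn images_url(base: &str, language: &str, model: Option<&str>) -> String {
    let mut url = format!("{}/api/v1/images?language={}", base.trim_end_matches('/'), language);
    // Only append the model parameter when a specific model is requested.
    if let Some(model) = model {
        url.push_str("&model=");
        url.push_str(model);
    }
    url
}

fn main() {
    println!("{}", images_url("http://localhost:8080", "eng", None));
    println!("{}", images_url("http://localhost:8080", "chi_sim", Some("chi_sim")));
}
```

If these values ever come from untrusted input, percent-encode them before appending.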
Get all available languages and models.
curl http://localhost:8080/api/v1/languages
Check the service health:
curl http://localhost:8080/system/health