A proof of concept of a small OCR (text recognition) app using Python and tesseract-ocr, running inside Docker
docker@^24.0.3
- Clone the repo by executing
  git clone [email protected]:dmmarmol/python-ocr.git
- Navigate into the project directory
  cd python-ocr
- Run the command
  make build
  (Makefile shortcut to build the Docker image)
- Run the command
  make run
- Run the command
  make shell
  to attach a shell to the running container
- Choose between bulk processing images or processing text
Bulk image processing reads all images inside the images/source directory, extracts the text content from each of them, and puts the results together in a new file inside images/output
- Deposit any .jpg or .jpeg file inside the images/source directory
- Navigate inside the container using an attached shell and, from the app/ directory, run the command
  python3 src/process-images.py
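The source of src/process-images.py is not reproduced here, so the following is only a minimal sketch of what a bulk OCR pass of this kind might look like. It assumes the pytesseract and Pillow packages are available in the container. The names process_images and tesseract_ocr, and the way results are joined with a blank line, are hypothetical choices made for illustration.

```python
from pathlib import Path
from typing import Callable

# Extensions accepted by this sketch, matching the .jpg/.jpeg rule above.
IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def tesseract_ocr(image_path: Path) -> str:
    """Extract text from one image with Tesseract (assumes pytesseract + Pillow)."""
    # Imported lazily so the rest of the module loads without OCR dependencies.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)


def process_images(
    source_dir: Path,
    output_file: Path,
    ocr: Callable[[Path], str] = tesseract_ocr,
) -> int:
    """OCR every .jpg/.jpeg in source_dir and write the combined text to output_file.

    Returns the number of images processed.
    """
    images = sorted(
        p for p in source_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    # One text chunk per image, in file-name order.
    chunks = [ocr(path).strip() for path in images]
    output_file.parent.mkdir(parents=True, exist_ok=True)
    output_file.write_text("\n\n".join(chunks), encoding="utf-8")
    return len(images)
```

From the app/ directory this could be called as process_images(Path("images/source"), Path("images/output/output.txt")), where the output file name is a placeholder. Passing the OCR step as a parameter keeps the directory handling testable without Tesseract installed.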
Text processing reads all .txt files inside the text/source directory, normalizes the text content of each file, and puts the results together in a new file inside text/output
- Deposit any .txt file inside the text/source directory
- Navigate inside the container using an attached shell and, from the app/ directory, run the command
  python3 src/process-text.py
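What "normalize" means is not specified in this README, so the sketch below is an assumption rather than the actual behaviour of src/process-text.py. It applies Unicode NFKC normalization, rejoins words hyphenated across line breaks, and collapses extra whitespace, which are common clean-up steps for OCR output. The function names normalize_text and process_text are hypothetical.

```python
import re
import unicodedata
from pathlib import Path


def normalize_text(raw: str) -> str:
    """Hypothetical normalization rules; the real script may differ."""
    # Fold compatibility characters (e.g. ligatures) into their plain forms.
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Trim each line, then squeeze three or more newlines into one blank line.
    text = "\n".join(line.strip() for line in text.splitlines())
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def process_text(source_dir: Path, output_file: Path) -> int:
    """Normalize every .txt file in source_dir and write them all to output_file.

    Returns the number of files processed.
    """
    files = sorted(source_dir.glob("*.txt"))
    chunks = [normalize_text(p.read_text(encoding="utf-8")) for p in files]
    output_file.parent.mkdir(parents=True, exist_ok=True)
    output_file.write_text("\n\n".join(chunks), encoding="utf-8")
    return len(files)
```

From the app/ directory this could be called as process_text(Path("text/source"), Path("text/output/output.txt")), where the output file name is again a placeholder.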
Build the Docker image from the Dockerfile
make build
Run a container instance of the Docker image
make run
make stop
make remove