forked from OCR-D/core

Collection of OCR-related python tools and wrappers from @OCR-D


tdoan2010/ocrd-core

 
 


OCR-D/core

Python modules implementing OCR-D specs and related tools


Introduction

This repository contains the Python packages that form the base for tools within the OCR-D ecosystem.

All packages are also published to PyPI.

Installation

NOTE Unless you want to contribute to OCR-D/core, we recommend installation as part of ocrd_all which installs a complete stack of OCR-D-related software.

The easiest way to install is via pip:

pip install ocrd

# or just the functionality you need, e.g.

pip install ocrd_modelfactory

All Python software released by OCR-D requires Python 3.6 or higher.
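As a quick sanity check before installing, the version requirement above can be tested from within the interpreter. This is only a minimal sketch; the function name `python_is_supported` is made up for illustration and is not part of any OCR-D package.

```python
import sys

# Minimum interpreter version stated in this README.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if python_is_supported():
        print("Python version is sufficient for the OCR-D packages")
    else:
        print("Python %d.%d or higher is required" % MIN_PYTHON)
```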

NOTE Some OCR-D tools (or even test cases) might behave unexpectedly if your environment has specific modifications, such as:

  • a custom build of ImageMagick whose format delegates differ from what OCR-D expects
  • custom Python logging configurations in your personal account
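To see whether a custom logging configuration might be in play, one option is to look for a configuration file in the usual places. The sketch below assumes the file is called `ocrd_logging.conf` and is searched for in the current directory, the home directory and `/etc`; both the file name and the search locations are assumptions here and should be checked against the ocrd_utils version you have installed. The helper name `find_logging_configs` is hypothetical.

```python
import os

# Assumed file name; verify against your installed ocrd_utils.
LOGGING_CONF_NAME = "ocrd_logging.conf"


def find_logging_configs(search_dirs=None, filename=LOGGING_CONF_NAME):
    """Return the paths of all logging config files found in search_dirs."""
    if search_dirs is None:
        # Assumed search locations: current directory, home directory, /etc
        search_dirs = [os.getcwd(), os.path.expanduser("~"), "/etc"]
    candidates = [os.path.join(d, filename) for d in search_dirs]
    return [path for path in candidates if os.path.isfile(path)]


if __name__ == "__main__":
    for path in find_logging_configs():
        print("custom logging configuration found:", path)
```

If this prints a path, temporarily moving that file aside is one way to rule it out as the cause of unexpected behavior.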

Command line tools

NOTE: All OCR-D CLI tools support a --help flag which shows usage and supported flags, options and arguments.

ocrd CLI

ocrd-dummy CLI

A minimal OCR-D processor that copies from -I/--input-file-grp to -O/--output-file-grp
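A call to ocrd-dummy from Python might be assembled as follows. This is a sketch only: the file group names `OCR-D-IMG` and `OCR-D-DUMMY` are placeholders, the helper `build_dummy_command` is hypothetical, and the command is merely printed rather than executed because it needs a workspace to run in.

```python
import shutil


def build_dummy_command(input_grp, output_grp):
    """Build the argument list for copying input_grp to output_grp."""
    return ["ocrd-dummy", "-I", input_grp, "-O", output_grp]


if __name__ == "__main__":
    # Placeholder file group names; use the ones from your own workspace.
    cmd = build_dummy_command("OCR-D-IMG", "OCR-D-DUMMY")
    if shutil.which(cmd[0]) is None:
        print("ocrd-dummy not found on PATH")
    print("command:", " ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside a workspace directory.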

Packages

ocrd_utils

Contains utilities and constants, e.g. for logging, path normalization, coordinate calculation etc.

See README for ocrd_utils for further information.

ocrd_models

Contains file format wrappers for PAGE-XML, METS, EXIF metadata etc.

See README for ocrd_models for further information.

ocrd_modelfactory

Code to instantiate models from existing data.

See README for ocrd_modelfactory for further information.

ocrd_validators

Schemas and routines for validating BagIt, ocrd-tool.json, workspaces, METS, PAGE-XML, CLI parameters etc.

See README for ocrd_validators for further information.

ocrd

Depends on all of the above, also contains decorators and classes for creating OCR-D processors and CLIs.

Also contains the command line tool ocrd.

See README for ocrd for further information.

bash library

Builds a bash script that can be sourced by other bash scripts to create OCR-D-compliant CLIs.

bashlib API

ocrd__raise

Raise an error and exit.

ocrd__log

Delegate logging to ocrd log.

ocrd__minversion

Ensure a minimum version.

ocrd__dumpjson

Output ocrd-tool.json.

Requires $OCRD_TOOL_JSON and $OCRD_TOOL_NAME to be set:

export OCRD_TOOL_JSON=/path/to/ocrd-tool.json
export OCRD_TOOL_NAME=ocrd-foo-bar
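To illustrate how the two variables work together, here is a rough Python analogue of dumping a tool description. It is not the bashlib implementation; it assumes that ocrd-tool.json has a top-level `tools` object keyed by executable name, and the function name `dump_tool_json` is made up for this example.

```python
import json
import os


def dump_tool_json(env=os.environ):
    """Return the JSON description of the tool named by OCRD_TOOL_NAME."""
    path = env["OCRD_TOOL_JSON"]
    name = env["OCRD_TOOL_NAME"]
    with open(path, encoding="utf-8") as f:
        ocrd_tool = json.load(f)
    # Assumption: tools are stored under "tools", keyed by executable name.
    return json.dumps(ocrd_tool["tools"][name], indent=2)


if __name__ == "__main__":
    if "OCRD_TOOL_JSON" in os.environ and "OCRD_TOOL_NAME" in os.environ:
        print(dump_tool_json())
    else:
        print("Set OCRD_TOOL_JSON and OCRD_TOOL_NAME first")
```

Passing the environment as a parameter keeps the lookup easy to exercise with a plain dict instead of real environment variables.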

Output file resource content.

Output file resource names.

ocrd__usage

Print usage.

ocrd__parse_argv

Expects an associative array ("hash"/"dict") ocrd__argv to be defined:

declare -A ocrd__argv=()

usage: pageId=$(ocrd__input_file 3 pageId)

Testing

  • Download assets: make assets
  • Test with local files: make test
  • Test with remote assets: make test OCRD_BASEURL='https://github.com/OCR-D/assets/raw/master/data/'

See Also
