This release takes PyTorch support out of beta, makes text recognition more robust, and provides light architectures for complex tasks.

Note: doctr 0.4.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

Highlights

No more width limitation for text recognition

Some documents, such as French ID cards, include very long strings that can be challenging to transcribe.

This release enables a smart split/merge strategy for wide crops to avoid performance drops. Previously, the whole crop was analyzed at once; now it is split into reasonably sized crops, inference is performed on them as a batch, and the predictions are merged back together.

The following snippet:

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

doc = DocumentFile.from_images('path/to/img.png')
predictor = ocr_predictor(pretrained=True)
print(predictor(doc).pages[0])

used to yield:

Page(
  dimensions=(447, 640)
  (blocks): [Block(
    (lines): [Line(
      (words): [
        Word(value='1XXXXXX', confidence=0.0023),
        Word(value='1XXXX', confidence=0.0018),
      ]
    )]
    (artefacts): []
  )]
)

and now yields:

Page(
  dimensions=(447, 640)
  (blocks): [Block(
    (lines): [Line(
      (words): [
        Word(value='IDFRABERTHIER<<<<<<<<<<<<<<<<<<<<<<', confidence=0.49),
        Word(value='8806923102858CORINNE<<<<<<<6512068F6', confidence=0.22),
      ]
    )]
    (artefacts): []
  )]
)
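
The split/merge idea described in this section can be sketched in a few lines. This is only an illustration and not docTR's actual implementation: the helper names (`split_bounds`, `merge_strings`, `merge_all`), the maximum aspect ratio, the overlap fraction, and the overlap-based merging heuristic are all assumptions made for the example.

```python
from math import ceil
from typing import List, Tuple


def split_bounds(width: int, height: int, max_ratio: float = 8.0,
                 overlap: float = 0.25) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges that cover a crop of size height x width.

    A crop no wider than `max_ratio * height` is kept whole. A wider crop is
    cut into overlapping windows of that width (hypothetical default values).
    """
    window = int(max_ratio * height)
    if width <= window:
        return [(0, width)]
    # Consecutive windows share `overlap * window` columns
    step = max(1, int(window * (1 - overlap)))
    n_windows = ceil((width - window) / step) + 1
    bounds = []
    for i in range(n_windows):
        # The last window is right-aligned so that it never exceeds the crop
        start = min(i * step, width - window)
        bounds.append((start, start + window))
    return bounds


def merge_strings(left: str, right: str) -> str:
    """Concatenate two predictions, dropping the longest suffix of `left`
    that is also a prefix of `right` (a naive overlap heuristic)."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return left + right


def merge_all(chunks: List[str]) -> str:
    """Merge the per-window predictions from left to right."""
    merged = ""
    for chunk in chunks:
        merged = merge_strings(merged, chunk)
    return merged
```

In this sketch, each window returned by `split_bounds` would be sent to the recognition model in a single batch, and `merge_all` would stitch the resulting strings together. A purely textual heuristic like this one can mis-merge strings with long runs of a repeated character (such as the `<` fillers above), so a real implementation needs a more careful merging rule.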

Framework specific predictors

PyTorch support is no longer in beta, so we worked to make switching from one deep learning backend to the other seamless: the predictor interface is now the same for both 🙌 Predictors are designed to be the recommended interface for inference with your models!

0.3.1 (TensorFlow):

>>> from doctr.models import detection_predictor
>>> predictor = detection_predictor(pretrained=True)
>>> out = predictor(doc, training=False)

0.3.1 (PyTorch):

>>> from doctr.models import detection_predictor
>>> import torch
>>> predictor = detection_predictor(pretrained=True)
>>> predictor.model.eval()
>>> with torch.no_grad(): out = predictor(doc)

0.4.0:

>>> from doctr.models import detection_predictor
>>> predictor = detection_predictor(pretrained=True)
>>> out = predictor(doc)

An ever-growing model zoo 🦓

As PyTorch goes out of beta, we have bridged the gap between PyTorch & TensorFlow pretrained models' availability. Additionally, by leveraging our integration of light backbones, this release comes with lighter architectures for text detection and text recognition:

db_mobilenet_v3_large
crnn_mobilenet_v3_small
crnn_mobilenet_v3_large

The full list of supported architectures is available 👉 here
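
As a rough sketch, the light architectures listed above could be combined into a single OCR predictor as shown below. The helper `build_light_predictor` is hypothetical, and the example assumes that `ocr_predictor` accepts the architecture names through `det_arch` and `reco_arch` keyword arguments; check the documentation of your installed version before relying on it.

```python
# Architecture names taken from the list in this section
LIGHT_DET_ARCHS = ("db_mobilenet_v3_large",)
LIGHT_RECO_ARCHS = ("crnn_mobilenet_v3_small", "crnn_mobilenet_v3_large")


def build_light_predictor(det_arch: str = "db_mobilenet_v3_large",
                          reco_arch: str = "crnn_mobilenet_v3_small"):
    """Build an OCR predictor from the light architectures of this release.

    Hypothetical helper: it only checks the names, then delegates to docTR.
    """
    if det_arch not in LIGHT_DET_ARCHS:
        raise ValueError(f"unknown light detection architecture: {det_arch}")
    if reco_arch not in LIGHT_RECO_ARCHS:
        raise ValueError(f"unknown light recognition architecture: {reco_arch}")
    # Imported lazily so that the name checks above work without docTR installed
    from doctr.models import ocr_predictor
    # Assumption: `det_arch` and `reco_arch` are valid keyword arguments here
    return ocr_predictor(det_arch=det_arch, reco_arch=reco_arch, pretrained=True)
```

Calling `build_light_predictor()` with its defaults would pair the MobileNet V3 detection model with the small recognition model; passing `reco_arch="crnn_mobilenet_v3_large"` swaps in the larger one.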

Demo live on HuggingFace Spaces

If you have enjoyed the Streamlit demo but prefer not to run it on your own hardware, feel free to check out the online version on HuggingFace Spaces.

Thanks to @osanseviero for deploying it, and to HuggingFace for hosting & serving 🙏

Breaking changes

Deprecated crnn_resnet31 & sar_vgg16_bn

After going over backbone compatibility and re-assessing whether all combinations should be trained, DocTR now focuses on reproducing each architecture as its paper's authors intended, or on improving upon it. As such, we have deprecated the following recognition models (which had no pretrained params): crnn_resnet31, sar_vgg16_bn.

Deprecated models.export

Since doctr.models.export was specific to TensorFlow and it didn't bring much more value than TensorFlow tutorials, we added instructions in the documentation and deprecated the submodule.

New features

Datasets

Resources to access data in efficient ways

Added entry in vocabs for Portuguese #464 (@fmobrj), English, Spanish & German #467 (@fg-mindee), ancient Greek #500 (@fg-mindee)

IO

Features to manipulate input & outputs

Added .synthesize method to Page and Document #472 (@fg-mindee)

Models

Deep learning model building and inference

Added dynamic crop splitting for wide inputs to recognition models #465 (@charlesmindee)
Added MobileNets with rectangular pooling #483 (@fg-mindee)
Added pretrained params for db_mobilenet_v3_large #485 #487, crnn_vgg16_bn #487, db_resnet50 #489, crnn_mobilenet_v3_small & crnn_mobilenet_v3_large #517 #516 (@charlesmindee)

Utils

Utility features relevant to the library use cases.

Added automatic font resolution function #472 (@fg-mindee)

Transforms

Data transformation operations

Added RandomCrop transformation #448 (@charlesmindee)

Tests

Verifications of the package well-being before release

Added a unittest for RandomCrop #448 (@charlesmindee)
Added a unittest for crop split/merge in recognition models #465 (@charlesmindee)
Added unittests for PyTorch OCR model zoo #499 (@fg-mindee)

Documentation

Online resources for potential users

Added entry for RandomCrop #448 (@charlesmindee)
Added explanations about model export / compression #463 (@fg-mindee)
Added benchmark entry for db_mobilenet_v3_large #485 in the documentation (@charlesmindee)
Added badge with hyperlink to HuggingFace Spaces demo #501 (@osanseviero)

References

Reference training scripts

Added option to select vocab in the training of character classification and text recognition #502 (@fg-mindee)

Others

Other tools and implementations

Added CI job to validate the demo, the evaluation script and the environment collection scripts #456 (@fg-mindee), the character classification training script #457 (@fg-mindee), the analysis & evaluation scripts in PyTorch #458 (@fg-mindee), the text recognition scripts #469 (@fg-mindee), the text detection scripts #491 (@fg-mindee)
Added support of PyTorch for the analysis & evaluation scripts #458 (@fg-mindee)

Bug fixes

Datasets

Fixed submodule import #451 (@fg-mindee)
Added missing characters in French vocab #467 (@fg-mindee)

Models

Fixed PyTorch preprocessor shape resolution #453 (@charlesmindee)
Fixed Tensor cropping for channels_first format #458 #461 (@fg-mindee)
Replaced recognition models' MobileNet backbones by their rectangular pooling counterparts #483 (@fg-mindee)
Fixed crop extraction for PyTorch tensors #484 (@charlesmindee)
Fixed crop filtering on multi-page inference #497 (@fg-mindee)

Transforms

Fixed rounding errors in RandomCrop #473 (@fg-mindee)

Utils

Fixed page synthesis for characters outside of latin-1 #496 (@fg-mindee)

Documentation

Fixed READMEs of training scripts #504 #491 (@fg-mindee)

References

Fixed the requirements of the training scripts #494 #491 (@fg-mindee)

Others

Fixed the requirements of the streamlit demo #492 (@osanseviero), the API template #494 (@fg-mindee)

Improvements

Datasets

Merged DocDataset & OCRDataset #474 (@charlesmindee)
Updated DetectionDataset label format #491 (@fg-mindee)

Models

Deprecated doctr.models.export #463 (@fg-mindee)
Deprecated crnn_resnet31 & sar_vgg16_bn recognition models #468 (@fg-mindee)
Relocated DocumentBuilder to doctr.models.builder, split predictor into framework-specific objects #481 (@fg-mindee)
Added more robust argument checks in DocumentBuilder & refactored crop preparation and result processing in ocr predictors #497 (@fg-mindee)
Reflected changes of detection target formats on detection models #491 (@fg-mindee)

Utils

Improved page synthesis with dynamic font size #472 (@fg-mindee)

Documentation

Updated README badge & added release-specific documentation index #451 (@fg-mindee)
Added logo in README & documentation #459 (@charlesmindee)
Updated hyperlink to documentation in the README #462 (@fg-mindee)
Updated vocab description in the documentation #467 (@fg-mindee)
Added favicon in the documentation #466 (@fg-mindee)
Removed benchmark entry of deprecated models #468 (@fg-mindee)
Updated README of the text recognition training script #469 (@fg-mindee)
Updated performance benchmark with crop splitting #471 (@charlesmindee)
Added page synthesis example in README #472 (@fg-mindee)
Made copyright mention dynamic, improved the landing & installation pages in the documentation #475 (@fg-mindee)
Restructured the documentation #519 (@fg-mindee)

Tests

Removed legacy unittests of doctr.models.export #463 (@fg-mindee)
Removed unittests for deprecated models #468 (@fg-mindee)
Updated unittests with the new doctr.utils.font submodule #472 (@fg-mindee)
Reflected changes from predictor refactor #481 (@fg-mindee)
Extended unittest of crop extraction #484 (@charlesmindee)
Reflected changes from predictor crop preparation improvement #497 (@fg-mindee)
Reflected changes from detection target format #491 (@fg-mindee)

References

Reflected changes of detection dataset target format #491 (@fg-mindee)

Others

Specified import of file_utils #447 (@zalakbhalani)
Updated package version #451 (@fg-mindee)
Updated PIL version constraint to fix vulnerability #460 (@fg-mindee)
Updated model selection in the demo #468 (@fg-mindee)
Removed some MacOS CI jobs that were slowing down PR checks #470 (@fg-mindee)
Reflected page synthesis changes in demo #477 (@fg-mindee)
Reflected changes from predictor refactor in API & demo #481 (@fg-mindee)
Updated author_email in setup #493 (@fg-mindee)
Split CI jobs for pytest in common, pytorch & tensorflow #498 #503 #506 (@fg-mindee)
Removed unused imports #507 (@fg-mindee)

Many thanks to our contributors, we are delighted to see that there are more every week!

v0.4.0: Full support of PyTorch and a growing pretrained model zoo
