v0.4.0: Full support of PyTorch and a growing pretrained model zoo
This release brings PyTorch support out of beta, makes text recognition more robust, and provides light architectures for complex tasks.
Note: doctr 0.4.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
Highlights
No more width limitation for text recognition
Some documents, such as French ID cards, include very long strings that can be challenging to transcribe.
This release enables a smart split/merge strategy for wide crops to avoid performance drops. Previously, the whole crop was analyzed at once; now it is split into reasonably sized crops, inference is performed on them as a batch, and the predictions are merged back together.
The following snippet:
from doctr.io import DocumentFile
from doctr.models import ocr_predictor
doc = DocumentFile.from_images('path/to/img.png')
predictor = ocr_predictor(pretrained=True)
print(predictor(doc).pages[0])
used to yield:
Page(
  dimensions=(447, 640)
  (blocks): [Block(
    (lines): [Line(
      (words): [
        Word(value='1XXXXXX', confidence=0.0023),
        Word(value='1XXXX', confidence=0.0018),
      ]
    )]
    (artefacts): []
  )]
)
and now yields:
Page(
  dimensions=(447, 640)
  (blocks): [Block(
    (lines): [Line(
      (words): [
        Word(value='IDFRABERTHIER<<<<<<<<<<<<<<<<<<<<<<', confidence=0.49),
        Word(value='8806923102858CORINNE<<<<<<<6512068F6', confidence=0.22),
      ]
    )]
    (artefacts): []
  )]
)
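The split/merge idea can be sketched in a few lines of NumPy. This is an illustrative sketch only, not doctr's actual implementation: the helper names `split_wide_crop` and `merge_predictions` and the `max_ratio` threshold are assumptions made for this example, and the merge step simply concatenates the per-piece strings.

```python
import math
from typing import List

import numpy as np


def split_wide_crop(crop: np.ndarray, max_ratio: float = 8.0) -> List[np.ndarray]:
    """Split an (H, W, C) crop into horizontal pieces with width/height <= max_ratio.

    `max_ratio` is a hypothetical threshold chosen for illustration.
    """
    height, width = crop.shape[:2]
    if width <= max_ratio * height:
        # Narrow enough: keep the crop as a single piece
        return [crop]
    num_pieces = math.ceil(width / (max_ratio * height))
    # array_split tolerates widths that are not a multiple of num_pieces
    return np.array_split(crop, num_pieces, axis=1)


def merge_predictions(pieces: List[str]) -> str:
    """Merge the strings predicted for each piece, in left-to-right order."""
    return "".join(pieces)


# A very wide dummy crop, such as a machine-readable line on an ID card
crop = np.zeros((32, 1000, 3), dtype=np.uint8)
sub_crops = split_wide_crop(crop)
print([piece.shape for piece in sub_crops])
```

In a real pipeline, the sub-crops would be passed to the recognition model as one batch, and the merge step would likely need to handle characters cut at piece boundaries, which plain concatenation does not.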
Framework specific predictors
PyTorch support is no longer in beta, so we unified the inference interface to make switching from one deep learning backend to the other seamless 🙌 Predictors are the recommended interface for running inference with your models!
| 0.3.1 (TensorFlow) | 0.3.1 (PyTorch) | 0.4.0 |
|---|---|---|
| `>>> from doctr.models import detection_predictor`<br>`>>> predictor = detection_predictor(pretrained=True)`<br>`>>> out = predictor(doc, training=False)` | `>>> from doctr.models import detection_predictor`<br>`>>> import torch`<br>`>>> predictor = detection_predictor(pretrained=True)`<br>`>>> predictor.model.eval()`<br>`>>> with torch.no_grad(): out = predictor(doc)` | `>>> from doctr.models import detection_predictor`<br>`>>> predictor = detection_predictor(pretrained=True)`<br>`>>> out = predictor(doc)` |
An ever-growing model zoo 🦓
As PyTorch goes out of beta, we have bridged the gap in pretrained model availability between PyTorch & TensorFlow. Additionally, by leveraging our integration of light backbones, this release comes with lighter architectures for text detection and text recognition:
- db_mobilenet_v3_large
- crnn_mobilenet_v3_small
- crnn_mobilenet_v3_large
The full list of supported architectures is available 👉 here
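As a minimal sketch, the lighter architectures can be selected by name when building a predictor. It assumes the `det_arch` and `reco_arch` arguments of `ocr_predictor`; the `build_light_predictor` helper and the two constants are names introduced here for illustration only.

```python
# Architecture names taken from the list above
DET_ARCH = "db_mobilenet_v3_large"
RECO_ARCH = "crnn_mobilenet_v3_small"


def build_light_predictor():
    """Build an OCR predictor that uses the light detection and recognition models.

    The import is done lazily so that this module can be loaded
    even when doctr is not installed.
    """
    from doctr.models import ocr_predictor

    return ocr_predictor(det_arch=DET_ARCH, reco_arch=RECO_ARCH, pretrained=True)
```

With doctr installed, `build_light_predictor()(doc)` would run inference on a `doc` loaded through `DocumentFile`, as in the snippets above.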
Demo live on HuggingFace Spaces
If you have enjoyed the Streamlit demo, but prefer not to run it on your own hardware, feel free to check out the online version on HuggingFace Spaces:
Thanks to @osanseviero for deploying it, and to HuggingFace for hosting & serving 🙏
Breaking changes
Deprecated crnn_resnet31 & sar_vgg16_bn
After reviewing backbone compatibility and re-assessing whether all combinations should be trained, DocTR is now focusing on reproducing what the papers' authors intended, or improving upon it. As such, we have deprecated the following recognition models (which had no pretrained params): `crnn_resnet31`, `sar_vgg16_bn`.
Deprecated models.export
Since `doctr.models.export` was specific to TensorFlow and it didn't bring much more value than TensorFlow tutorials, we added instructions in the documentation and deprecated the submodule.
New features
Datasets
Resources to access data in efficient ways
- Added entry in vocabs for Portuguese #464 (@fmobrj), English, Spanish & German #467 (@fg-mindee), ancient Greek #500 (@fg-mindee)
IO
Features to manipulate input & outputs
- Added `.synthesize` method to `Page` and `Document` #472 (@fg-mindee)
Models
Deep learning model building and inference
- Added dynamic crop splitting for wide inputs to recognition models #465 (@charlesmindee)
- Added MobileNets with rectangular pooling #483 (@fg-mindee)
- Added pretrained params for `db_mobilenet_v3_large` #485 #487, `crnn_vgg16_bn` #487, `db_resnet50` #489, `crnn_mobilenet_v3_small` & `crnn_mobilenet_v3_large` #517 #516 (@charlesmindee)
Utils
Utility features relevant to the library use cases.
- Added automatic font resolution function #472 (@fg-mindee)
Transforms
Data transformation operations
- Added `RandomCrop` transformation #448 (@charlesmindee)
Tests
Verifications of the package well-being before release
- Added a unittest for `RandomCrop` #448 (@charlesmindee)
- Added a unittest for crop split/merge in recognition models #465 (@charlesmindee)
- Added unittests for PyTorch OCR model zoo #499 (@fg-mindee)
Documentation
Online resources for potential users
- Added entry for `RandomCrop` #448 (@charlesmindee)
- Added explanations about model export / compression #463 (@fg-mindee)
- Added benchmark entry for `db_mobilenet_v3_large` #485 in the documentation (@charlesmindee)
- Added badge with hyperlink to HuggingFace Spaces demo #501 (@osanseviero)
References
Reference training scripts
- Added option to select vocab in the training of character classification and text recognition #502 (@fg-mindee)
Others
Other tools and implementations
- Added CI job to validate the demo, the evaluation script and the environment collection scripts #456 (@fg-mindee), the character classification training script #457 (@fg-mindee), the analysis & evaluation scripts in PyTorch #458 (@fg-mindee), the text recognition scripts #469 (@fg-mindee), the text detection scripts #491 (@fg-mindee)
- Added support of PyTorch for the analysis & evaluation scripts #458 (@fg-mindee)
Bug fixes
Datasets
- Fixed submodule import #451 (@fg-mindee)
- Added missing characters in French vocab #467 (@fg-mindee)
Models
- Fixed PyTorch preprocessor shape resolution #453 (@charlesmindee)
- Fixed Tensor cropping for channels_first format #458 #461 (@fg-mindee)
- Replaced recognition models' MobileNet backbones by their rectangular pooling counterparts #483 (@fg-mindee)
- Fixed crop extraction for PyTorch tensors #484 (@charlesmindee)
- Fixed crop filtering on multi-page inference #497 (@fg-mindee)
Transforms
- Fixed rounding errors in `RandomCrop` #473 (@fg-mindee)
Utils
- Fixed page synthesis for characters outside of latin-1 #496 (@fg-mindee)
Documentation
- Fixed READMEs of training scripts #504 #491 (@fg-mindee)
References
- Fixed the requirements of the training scripts #494 #491 (@fg-mindee)
Others
- Fixed the requirements of the streamlit demo #492 (@osanseviero), the API template #494 (@fg-mindee)
Improvements
Datasets
- Merged `DocDataset` & `OCRDataset` #474 (@charlesmindee)
- Updated `DetectionDataset` label format #491 (@fg-mindee)
Models
- Deprecated `doctr.models.export` #463 (@fg-mindee)
- Deprecated `crnn_resnet31` & `sar_vgg16_bn` recognition models #468 (@fg-mindee)
- Relocated `DocumentBuilder` to `doctr.models.builder`, split predictor into framework-specific objects #481 (@fg-mindee)
- Added more robust argument checks in `DocumentBuilder` & refactored crop preparation and result processing in ocr predictors #497 (@fg-mindee)
- Reflected changes of detection target formats on detection models #491 (@fg-mindee)
Utils
- Improved page synthesis with dynamic font size #472 (@fg-mindee)
Documentation
- Updated README badge & added release-specific documentation index #451 (@fg-mindee)
- Added logo in README & documentation #459 (@charlesmindee)
- Updated hyperlink to documentation in the README #462 (@fg-mindee)
- Updated vocab description in the documentation #467 (@fg-mindee)
- Added favicon in the documentation #466 (@fg-mindee)
- Removed benchmark entry of deprecated models #468 (@fg-mindee)
- Updated README of the text recognition training script #469 (@fg-mindee)
- Updated performance benchmark with crop splitting #471 (@charlesmindee)
- Added page synthesis example in README #472 (@fg-mindee)
- Made copyright mention dynamic, improved the landing & installation pages in the documentation #475 (@fg-mindee)
- Restructured the documentation #519 (@fg-mindee)
Tests
- Removed legacy unittests of `doctr.models.export` #463 (@fg-mindee)
- Removed unittests for deprecated models #468 (@fg-mindee)
- Updated unittests with the new `doctr.utils.font` submodule #472 (@fg-mindee)
- Reflected changes from predictor refactor #481 (@fg-mindee)
- Extended unittest of crop extraction #484 (@charlesmindee)
- Reflected changes from predictor crop preparation improvement #497 (@fg-mindee)
- Reflected changes from detection target format #491 (@fg-mindee)
References
- Reflected changes of detection dataset target format #491 (@fg-mindee)
Others
- Specified import of file_utils #447 (@zalakbhalani)
- Updated package version #451 (@fg-mindee)
- Updated PIL version constraint to fix vulnerability #460 (@fg-mindee)
- Updated model selection in the demo #468 (@fg-mindee)
- Removed some MacOS CI jobs that were slowing down PR checks #470 (@fg-mindee)
- Reflected page synthesis changes in demo #477 (@fg-mindee)
- Reflected changes from predictor refactor in API & demo #481 (@fg-mindee)
- Updated `author_email` in setup #493 (@fg-mindee)
- Split CI jobs for pytest in `common`, `pytorch` & `tensorflow` #498 #503 #506 (@fg-mindee)
- Removed unused imports #507 (@fg-mindee)
Many thanks to our contributors; we are delighted to see that there are more every week!