[models] Add VIPTR recognition model #1867

Open
2 tasks
felixdittrich92 opened this issue Feb 5, 2025 · 3 comments

Comments


felixdittrich92 commented Feb 5, 2025

🚀 The feature

Ref.: #1826

Paper: VIPTR
Implementation: https://github.com/cxfyxl/VIPTR

  • PyTorch implementation
  • TensorFlow implementation
Hi, 

I would like to suggest adding another state-of-the-art text recognition architecture to docTR: [SVIPTR](https://paperswithcode.com/paper/viptr-a-vision-permutable-extractor-for-fast).
It promises accurate results at low latency.

Notably, the SVIPTR-T (Tiny) variant delivers highly competitive accuracy on par with other lightweight models and achieves SOTA inference speeds. Meanwhile, the SVIPTR-L (Large) attains SOTA accuracy in single-encoder-type models, while maintaining a low parameter count and favorable inference speed.

Thanks for your consideration.

Inference latency should be comparable to crnn_mobilenet_v3_large and the results are hopefully comparable to parseq.
The addition is agreed.

felixdittrich92 added the labels ext: docs, ext: tests, framework: pytorch, framework: tensorflow, module: models, topic: documentation, topic: text recognition, type: new feature on Feb 5, 2025
felixdittrich92 added this to the 1.0.0 milestone on Feb 5, 2025
felixdittrich92 self-assigned this on Feb 5, 2025
felixdittrich92 commented

If someone wants to work on this, feel free to ping here. Otherwise, I planned to start working on it once we have a strategy in place for making docTR multilingual.


lkosh commented Feb 28, 2025

Hi, if the issue is still open, I'd like to work on the PyTorch implementation of this feature. I've been using docTR quite extensively lately and I'd like to help improve this wonderful project :)


felixdittrich92 commented Feb 28, 2025

> Hi, if the issue is still open, I'd like to work on the PyTorch implementation of this feature. I've been using docTR quite extensively lately and I'd like to help improve this wonderful project :)

Hey @lkosh 👋

Sure, highly appreciated 👍

I already started on the PT implementation; maybe this would be a good starting point for you to continue:

main...felixdittrich92:doctr:viptr-torch

Some points that are still missing:

    1. Clean up the layers
    2. I had in mind to refactor the VIPBlock so that the final VIPNet can inherit from nn.Sequential, comparable to all the other classification models we have -> avoids the mixer_types condition
    3. Implement the recognition part -> VIPNet as feature extractor from the classification module + custom (linear) head + CTC loss + postprocessor + base class _VIPTR for target building
    4. Test everything (dummy run -> I can provide a dataset for testing or test it on my machine)
    5. Add the missing unit tests + documentation entries
  • Port the code to TensorFlow (optional, second PR)
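
For orientation, the postprocessor step in the list above could follow the usual CTC best-path recipe: take the argmax class per time step, collapse consecutive repeats, then drop blanks. The sketch below is framework-free and illustrative only, not docTR's actual postprocessor. The function name `ctc_greedy_decode`, the convention that the blank is the last class index (`len(vocab)`), and the use of the minimum per-step probability as the confidence are all assumptions made for this example.

```python
import math
from itertools import groupby
from typing import Optional, Sequence, Tuple


def ctc_greedy_decode(
    logits: Sequence[Sequence[float]],
    vocab: str,
    blank: Optional[int] = None,
) -> Tuple[str, float]:
    """Greedy (best-path) CTC decoding for a single sequence.

    logits: T rows of raw scores, one score per class (len(vocab) + 1 classes).
    blank:  index of the CTC blank; ASSUMED to be len(vocab) if not given.
    Returns the decoded string and a confidence score.
    """
    if blank is None:
        blank = len(vocab)  # assumption: blank is the last class

    best_path = []
    confidence = 1.0
    for step in logits:
        # Numerically stable softmax over the classes of this time step
        peak = max(step)
        exps = [math.exp(score - peak) for score in step]
        total = sum(exps)
        # Most likely class at this time step
        idx = max(range(len(step)), key=step.__getitem__)
        best_path.append(idx)
        # Illustrative choice: confidence = lowest winning probability
        confidence = min(confidence, exps[idx] / total)

    # Collapse consecutive repeats first, then remove blanks
    collapsed = [idx for idx, _ in groupby(best_path)]
    text = "".join(vocab[idx] for idx in collapsed if idx != blank)
    return text, confidence
```

Collapsing repeats before removing blanks is what lets a blank separate two identical characters, so a path like `a a <blank> a b b` decodes to `aab` rather than `ab`.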

Maybe use the vitstr PR as a reference; it shows all the required parts:

https://github.com/mindee/doctr/pull/1055/files


2 participants