A new SOTA text recognition architecture - SVIPTR #1826

milosacimovic · 2024-12-23T13:35:35Z


Hi,

I'd like to suggest adding another state-of-the-art text recognition architecture to docTR: SVIPTR.
It promises accurate results at low latency.

Notably, the SVIPTR-T (Tiny) variant delivers accuracy on par with other lightweight models while achieving SOTA inference speed. Meanwhile, SVIPTR-L (Large) attains SOTA accuracy among single-encoder models while keeping a low parameter count and favorable inference speed.

Thanks for your consideration.

felixdittrich92 (Maintainer) · 2024-12-23T18:37:15Z

Hi @milosacimovic 👋🏼,

Thanks for sharing! I'll have a look at it after the holidays 😊
It doesn't look like there are benchmarks against parseq / vitstr, right?


milosacimovic (Author) · 2024-12-24T08:32:01Z

Hi @felixdittrich92 👋,

Happy holidays! 🎄
Here's their benchmark table.
It doesn't include parseq, and SVIPTR seems to be slightly less accurate on the IC13 dataset. However, the large variant appears to outperform some very large models on average, TrOCR for example, while being much faster and smaller.


felixdittrich92 (Maintainer) · Dec 24, 2024

Thanks, I wish you a few relaxing days too 👍🏼🤗
I quickly checked the paper and benchmarks today, and it looks like a valid candidate to add to docTR, so feel free to open an issue. In terms of inference latency, it should be comparable to crnn_mobilenet_v3_large while keeping capacity comparable to parseq ☺️

felixdittrich92 (Maintainer) · Dec 24, 2024

Would you like to work on it? :)

felixdittrich92 (Maintainer) · 2025-02-05T12:36:31Z

#1867
