[TrOCR] How to run inference on multiline text image #628
Comments
@mariababich TrOCR is designed for single-line text recognition. You need to run a text detector first to get the individual text lines, then pass each line to TrOCR. |
Yes, you can combine TrOCR with CRAFT for instance:
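The snippet that originally followed is not preserved in this thread. As a rough sketch of the detect-then-recognize idea, assume the detector (CRAFT or any other) has already produced line boxes as `(left, top, right, bottom)` pixel tuples and that the page is a PIL image. The helper names `make_trocr_recognizer` and `ocr_multiline` are made up for illustration, and the `microsoft/trocr-base-handwritten` checkpoint is only an assumption for handwritten input:

```python
from typing import Callable, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def make_trocr_recognizer(model_name: str = "microsoft/trocr-base-handwritten"):
    """Build a function that maps one cropped text-line image to a string."""
    # Imported lazily so the rest of the sketch runs without transformers installed.
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_name)
    model = VisionEncoderDecoderModel.from_pretrained(model_name)

    def recognize(line_image) -> str:
        # TrOCR expects a single-line RGB image.
        pixel_values = processor(
            images=line_image.convert("RGB"), return_tensors="pt"
        ).pixel_values
        generated_ids = model.generate(pixel_values)
        return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

    return recognize


def ocr_multiline(image, line_boxes: Sequence[Box], recognize: Callable) -> str:
    """Crop each detected line, recognize it, and join the results."""
    # Naive reading order: top to bottom, then left to right.
    ordered = sorted(line_boxes, key=lambda box: (box[1], box[0]))
    lines = [recognize(image.crop(box)) for box in ordered]
    return "\n".join(lines)
```

Usage would look like `ocr_multiline(page, boxes, make_trocr_recognizer())`, where `boxes` comes from whichever detector you end up using. The simple sort only works for single-column pages with roughly horizontal lines.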
|
@NielsRogge I just tried to use CRAFT, but it's using torch < 1.0, which makes it impossible? So Bard recommended PaddleOCR. Please let me know what you think. My final goal is to do exactly this, OCR on multiline text, but my inputs are handwritten homework assignments from school kids. |
Hi @nyck33 you can try https://github.com/fcakyon/craft-text-detector which is a packaged and more up-to-date version of CRAFT |
@NielsRogge thanks! It does look more up-to-date but I was getting the
for the basic usage example in that repo and for the advanced:
|
I'll note that I tried out a bunch, and keras-ocr was the best so far at drawing bounding boxes around handwritten text in images. I also tried Donut on Hugging Face, but the results were disappointing. |
Hi, @nyck33, I am going through exactly the same project as you have done. Could you share your recent insights as to which handwritten text detector worked best for your images? I'd appreciate your help. Thank you! |
You won't like my answer, but since it's part of an app, I went with Cloud Vision on GCP. ChatGPT wrote my code to make the API calls.
|
Oh, I see, thanks @nyck33. Are you using Cloud Vision for text detection only, or for both detection and recognition? How is it doing in terms of CER (character error rate)? |
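For anyone comparing setups, CER is usually computed as the character-level edit distance between the prediction and the ground truth, divided by the length of the ground truth. A minimal pure-Python sketch (the function names are mine, not from any library):

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(
                min(
                    previous[j] + 1,  # delete ref_char
                    current[j - 1] + 1,  # insert hyp_char
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of a hypothesis against a non-empty reference."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Note that CER can exceed 1.0 when the prediction is much longer than the reference.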
I've trained a YOLOv5 model specifically to detect both handwritten and printed text. After that, I crop the detected handwritten lines out of the image and forward them to TrOCR for recognition. |
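A sketch of what the step between detection and TrOCR might look like, assuming each detection is a row of `(x1, y1, x2, y2, confidence, class_id)` as in YOLOv5's xyxy output. The class ids, the threshold, and the function name are hypothetical and depend on how the detector was trained:

```python
from typing import List, Sequence, Tuple

# One detection row: x1, y1, x2, y2, confidence, class id.
Detection = Tuple[float, float, float, float, float, int]

HANDWRITTEN = 0  # hypothetical class id for handwritten lines
PRINTED = 1  # hypothetical class id for printed lines


def select_handwritten_lines(
    detections: Sequence[Detection], min_confidence: float = 0.5
) -> List[Tuple[int, int, int, int]]:
    """Keep confident handwritten detections as integer crop boxes, top to bottom."""
    kept = [
        det
        for det in detections
        if det[5] == HANDWRITTEN and det[4] >= min_confidence
    ]
    kept.sort(key=lambda det: (det[1], det[0]))
    return [(int(det[0]), int(det[1]), int(det[2]), int(det[3])) for det in kept]
```

Each returned box can then be cropped from the page and passed to TrOCR one line at a time.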
I think with some extra work TrOCR can also be used for multiline text image,
And multiline text also means you need many more training samples than for single-line text. In some situations text line detection is hard, e.g. with curved text, so I think it is worthwhile to train a multiline version of TrOCR, which would reduce the need for text line detection. |
Hello!
I am wondering how to run TrOCR on a whole image with a lot of text. The tutorials show how the model works with single-line images. When I tried to run it on an image with a lot of text, it did not work. How can the inference be scaled?
Thanks in advance, Mariia.