
[TrOCR] How to run inference on multiline text image #628

Closed
mariababich opened this issue Feb 21, 2022 · 11 comments
@mariababich

Hello!

I am wondering how to run TrOCR on a whole image containing a lot of text. The tutorials show how the model works on single-line images. When I tried to run it on an image with a lot of text, it did not work. How can the inference be scaled?

Thanks in advance, Mariia.

@wolfshow
Contributor

@mariababich TrOCR is designed for single-line text recognition. You need to use a text detector first to get the text lines.

@NielsRogge

Yes, you can combine TrOCR with CRAFT for instance:

  • CRAFT can handle the text detection
  • TrOCR can handle the text recognition.
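A minimal sketch of that two-stage pipeline, assuming `craft-text-detector` and the `microsoft/trocr-base-handwritten` checkpoint; the `poly_to_bbox` helper and file paths are illustrative, and the model-loading part is left as comments so the sketch stays self-contained:

```python
# Sketch: CRAFT handles detection, TrOCR handles recognition per detected line.
import numpy as np

def poly_to_bbox(poly):
    # CRAFT returns one polygon per detected text line; take its axis-aligned box
    pts = np.asarray(poly)
    return (int(pts[:, 0].min()), int(pts[:, 1].min()),
            int(pts[:, 0].max()), int(pts[:, 1].max()))

# The heavy part (requires craft-text-detector, transformers, pillow):
# from craft_text_detector import Craft
# from transformers import TrOCRProcessor, VisionEncoderDecoderModel
# from PIL import Image
#
# craft = Craft(crop_type="box", cuda=False)
# result = craft.detect_text("page.png")          # result["boxes"]: one polygon per line
# processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
# model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")
# image = Image.open("page.png").convert("RGB")
# for poly in result["boxes"]:
#     line = image.crop(poly_to_bbox(poly))
#     pixel_values = processor(line, return_tensors="pt").pixel_values
#     print(processor.batch_decode(model.generate(pixel_values),
#                                  skip_special_tokens=True)[0])

bbox = poly_to_bbox([[10, 20], [110, 22], [112, 48], [9, 50]])
```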

@nyck33

nyck33 commented Jul 15, 2023

@NielsRogge I just tried to use CRAFT, but it requires torch < 1.0, which makes it impossible to use? So Bard recommended PaddleOCR. Please let me know what you think. My final goal is exactly this: OCR on multiline text, but my inputs are handwritten homework assignments for school kids.

@NielsRogge

Hi @nyck33, you can try https://github.com/fcakyon/craft-text-detector, which is a packaged and more up-to-date version of CRAFT.

@nyck33

nyck33 commented Jul 15, 2023

@NielsRogge thanks! It does look more up-to-date, but I was getting the model_urls error, so I referenced clovaai/CRAFT-pytorch#191, tried downgrading torchvision to 0.13 and deleting those 2 lines, and now I'm getting

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 4
      1 craft = Craft(output_dir=output_dir, crop_type="poly", cuda=True)
      3 # apply craft text detection and export detected regions to output directory
----> 4 prediction_result = craft.detect_text(image_path)
      6 #unload models from ram/gpu
      7 craft.unload_craftnet_model()

File /mnt/d/chatgpt/ocr/craft-text-detector/craft_text_detector/__init__.py:131, in Craft.detect_text(self, image, image_path)
    128     image = image_path
    130 # perform prediction
--> 131 prediction_result = get_prediction(
    132     image=image,
    133     craft_net=self.craft_net,
    134     refine_net=self.refine_net,
    135     text_threshold=self.text_threshold,
    136     link_threshold=self.link_threshold,
    137     low_text=self.low_text,
    138     cuda=self.cuda,
    139     long_size=self.long_size,
    140 )
    142 # arange regions
    143 if self.crop_type == "box":
...
--> 415         polys = np.array(polys)
    416         for k in range(len(polys)):
    417             if polys[k] is not None:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (31,) + inhomogeneous part.
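For what it's worth, that traceback looks like a NumPy behavior change rather than a CRAFT bug: since NumPy 1.24, building an array from a ragged list (polygons with differing point counts, as CRAFT can produce) raises this exact "inhomogeneous shape" ValueError instead of silently creating an object array. A minimal reproduction and the usual workaround, passing `dtype=object` explicitly (e.g. by patching that `np.array(polys)` line in `predict.py`):

```python
import numpy as np

# Ragged list: polygons with different numbers of points, like CRAFT's output
polys = [np.zeros((4, 2)), np.zeros((6, 2))]

# On NumPy >= 1.24 this raises:
# ValueError: setting an array element with a sequence. The requested array
# has an inhomogeneous shape after 1 dimensions. ...
try:
    np.array(polys)
except ValueError:
    pass

# Workaround: request an object array explicitly, keeping the ragged polygons
ragged = np.array(polys, dtype=object)
```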

That was with the basic usage example from that repo; with the advanced one:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 24
     21 craft_net = load_craftnet_model(cuda=True)
     23 # perform prediction
---> 24 prediction_result = get_prediction(
     25     image=image,
     26     craft_net=craft_net,
     27     refine_net=refine_net,
     28     text_threshold=0.7,
     29     link_threshold=0.4,
     30     low_text=0.4,
     31     cuda=True,
     32     long_size=1280
     33 )
     35 # export detected text regions
     36 exported_file_paths = export_detected_regions(
     37     image=image,
     38     regions=prediction_result["boxes"],
     39     output_dir=output_dir,
     40     rectify=True
     41 )

File /mnt/d/chatgpt/ocr/craft-text-detector/craft_text_detector/predict.py:91, in get_prediction(image, craft_net, refine_net, text_threshold, link_threshold, low_text, cuda, long_size, poly)
     89 # coordinate adjustment
...
--> 415         polys = np.array(polys)
    416         for k in range(len(polys)):
    417             if polys[k] is not None:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (31,) + inhomogeneous part.

@nyck33

nyck33 commented Jul 16, 2023

I'll note that I tried out a bunch of detectors, and KerasOCR was so far the best at drawing bounding boxes around handwritten text images. I also tried Donut on Hugging Face, but the results were disappointing.

@bit-scientist

Hi @nyck33, I am working on exactly the same project as you did. Could you share your recent insights as to which handwritten text detector worked best for your images? I'd appreciate your help. Thank you!

@nyck33

nyck33 commented Aug 30, 2023 via email

@bit-scientist

bit-scientist commented Aug 30, 2023

Oh, I see, thanks @nyck33. Are you using Cloud Vision for text detection only or for both detection and recognition? How is it doing in terms of CER?

@anandhuh1234

I've trained a YOLOv5 model specifically for detecting both handwritten and printed texts. After that, I extract and forward the identified handwritten lines from the image to TrOCR for processing.
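A small sketch of the glue step that approach needs: sorting the detector's line boxes into reading order and cropping them for TrOCR. The `(x1, y1, x2, y2)` box format, the synthetic page, and the TrOCR call in the comments are all assumptions, not anandhuh1234's actual code:

```python
from PIL import Image

def sort_boxes_reading_order(boxes):
    # (x1, y1, x2, y2) boxes: top-to-bottom, then left-to-right
    return sorted(boxes, key=lambda b: (b[1], b[0]))

def crop_lines(image, boxes):
    # each crop becomes one single-line input for TrOCR
    return [image.crop(tuple(b)) for b in sort_boxes_reading_order(boxes)]

page = Image.new("RGB", (200, 100), "white")     # stand-in for a homework scan
boxes = [(10, 60, 150, 90), (10, 10, 190, 40)]   # detector output, unordered
lines = crop_lines(page, boxes)
# each `line` would then go through TrOCR, roughly:
# pixel_values = processor(line, return_tensors="pt").pixel_values
# text = processor.batch_decode(model.generate(pixel_values),
#                               skip_special_tokens=True)[0]
```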

@myhub

myhub commented Mar 22, 2024

I think that with some extra work TrOCR can also be used for multiline text images. Based on my experiments in crnn_for_text_with_multiple_lines, to make TrOCR suitable for multiline text images, one needs to:

  • regenerate or label training samples with multiline text
  • retrain the model with a larger input image size (e.g. 512*512px)

Multiline text also means you need far more training samples than single-line text does. The input image and output sequence will also be larger, which means you need much more GPU capacity to do the work.

In some situations text line detection is hard, e.g. for curved text, so I think it is worthwhile to train a multiline version of TrOCR that reduces the need for text line detection.
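A back-of-envelope check on that cost claim, assuming TrOCR's encoder keeps its 16 px ViT patches and the 384*384 input side of the released checkpoints (both assumptions about the default setup):

```python
# Rough cost estimate for growing TrOCR's input from 384x384 to 512x512
PATCH = 16  # ViT patch side assumed for TrOCR's encoder

def num_patches(side, patch=PATCH):
    # a side x side image becomes (side // patch) ** 2 encoder tokens
    return (side // patch) ** 2

single_line = num_patches(384)   # tokens at the default input size
multiline = num_patches(512)     # tokens at the proposed multiline size

# encoder self-attention scales roughly with tokens squared
attn_cost_ratio = (multiline / single_line) ** 2
```

With these numbers the token count nearly doubles (576 to 1024) and the self-attention cost roughly triples, before even counting the longer output sequences, which supports the "much more GPU" point above.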
