Wrong symbol bounding box coordinates #2024

Closed
vidiecan opened this issue Oct 23, 2018 · 11 comments

Comments

@vidiecan

I am posting this here because it might be related to #1712 and #1192. What the right solution should be is debatable, because some might prefer the current behaviour.
Can be closed at any time.

Behaviour: OCR using LSTM with a specific model returns an invalid character bounding box when calling PageIterator::BoundingBox.

Reproducibility: very likely not reproducible, because the model is not available.

Input image (without the coloured lines):
image

Before LSTM recognition, fake words are created that contain correct blobs based on outlines. After LSTM recognition, in RecodeBeamSearch::InitializeWord, the blobs are computed here

for (int i = word_start; i < word_end; ++i) {
using x positions based on the timestep. The x position where the model started to recognise 2 (plus an estimated window) is the start of the green line in the picture. It can be argued that the span starts too early and stops too soon (very roughly, before the end of the 2 has been seen). Moreover, the computed width span is reduced here
min_half_width = xcoords[i] - xcoords[i - 1];

Then, when constructing the symbol bounding box in PAGE_RES_IT::ReplaceCurrentWord this condition is not met

src_b_it.data()->bounding_box().x_middle() < end_x) {
because the computed end_x is too far to the left. For the record, the ends are computed at
blob_end = (blob_box.right() + blob_it.data()->bounding_box().left()) / 2;
The result is that the bbox is "uninitialised" (max int values for left, -max int for right).
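
To illustrate how a box can end up in that state, here is a minimal, self-contained sketch. `Blob`, `Box` and `UnionBlobsBefore` are hypothetical stand-ins and not Tesseract's `TBOX` or the real loop in `PAGE_RES_IT::ReplaceCurrentWord`. The only things carried over from the description above are the assumptions that an empty box starts from sentinel extremes and that it grows only when a source blob's x-middle lies left of `end_x`.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical horizontal extent of one source blob (not a Tesseract type).
struct Blob {
  int left;
  int right;
  int XMiddle() const { return (left + right) / 2; }
};

// Hypothetical box that starts "empty": left at +max, right at -max.
struct Box {
  int left = INT_MAX;
  int right = -INT_MAX;
  bool IsEmpty() const { return left > right; }
  void Include(const Blob& blob) {
    assert(blob.left <= blob.right);
    left = std::min(left, blob.left);
    right = std::max(right, blob.right);
  }
};

// Unions every blob whose middle lies left of end_x, mirroring the
// condition quoted above. If end_x is too far to the left, no blob
// qualifies and the returned box keeps its sentinel values.
Box UnionBlobsBefore(const std::vector<Blob>& blobs, int end_x) {
  Box box;
  for (const Blob& blob : blobs) {
    if (blob.XMiddle() < end_x) {
      box.Include(blob);
    }
  }
  return box;
}
```

With a single blob spanning x = 113..136 (middle 124), an `end_x` of 100 leaves the box empty, while an `end_x` of 130 yields a box of 113..136.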

The code that should handle this situation is below at

for (int i = 0; i < box_word->length(); ++i) {

However, it fails too because the cblobs are further to the right than the wrongly computed blob end.

The uninitialised bounding box gets "fixed" when calling BoundingBox at

*left = ClipToRange(static_cast<int>(box.left()), 0, pix_width);

setting the coordinates to 0, 0, image height/width.
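
Because the clipped result looks like a valid rectangle, a caller cannot detect it by checking for sentinel values. The following is a minimal defensive sketch, assuming the symptom is exactly as described above (a symbol box that spans the whole image). `SymbolBox` and `CoversWholeImage` are hypothetical names used for illustration and are not part of the Tesseract API.

```cpp
#include <cassert>

// Hypothetical container for the four values a caller gets back for a symbol.
struct SymbolBox {
  int left;
  int top;
  int right;
  int bottom;
};

// Returns true when the box spans the entire image, which for a single
// symbol is a strong hint that it was an empty box clipped to the image.
bool CoversWholeImage(const SymbolBox& box, int image_width, int image_height) {
  assert(image_width > 0 && image_height > 0);
  return box.left <= 0 && box.top <= 0 &&
         box.right >= image_width && box.bottom >= image_height;
}
```

A caller could fill `SymbolBox` from the four output arguments of `PageIterator::BoundingBox` at the symbol level and skip or flag any symbol for which the check returns true. The image dimensions are assumed to be known to the caller.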

@amitdo
Collaborator

amitdo commented Oct 23, 2018

Do you have a fix that will work well, even in the presence of ligatures and noise?
Also, it should work for all supported langs.

@vidiecan
Author

@amitdo I do not, I would have sent a PR if I had :)

@zdenop
Contributor

zdenop commented Jul 1, 2019

Is this issue fixed in 4.1/current code?

@stweil
Contributor

stweil commented Jul 3, 2019

@noahmetzger, do you get better character boxes for this example with your latest code?

@noahmetzger
Contributor

I think so. Will test this later.

@StephenRUK

I would also be curious if this is fixed. Working with v4.0, there is often trouble with wider characters, which then causes an offset of all following character bounding boxes. Any news on this?

@zdenop
Contributor

zdenop commented Jul 12, 2019

Did you try 4.1.0?

@StephenRUK

I tried 4.1.0-rc1 at the time, with slightly different but still not fixed results. OK, so 4.1.0 is a final release now, great, thanks! I will give the Windows installer a try in combination with the tesserocr Python package if possible.

@stweil
Contributor

stweil commented Jul 16, 2019

This should be fixed by pull request #2576.

@Shreeshrii
Collaborator

tesseract wrong.png  - -l eng  --tessdata-dir ~/tessdata_fast  --oem 1 --psm 6 makebox

1 32 15 40 50 0
2 113 16 136 51 0
9 139 15 162 51 0
3 167 15 186 51 0
. 193 16 200 21 0
0 206 15 227 51 0
0 232 15 254 51 0
tesseract -v

tesseract 5.0.0-alpha-322-g74ac
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.4.4 : libopenjp2 2.3.0

Issue is fixed in master branch.
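
One way to check output like the above mechanically is to parse the box lines and confirm that each character box starts at or after the end of the previous one. This is only a sketch under stated assumptions: a single text line, a left-to-right script, and the usual box-file field order of symbol, left, bottom, right, top, page. `BoxEntry`, `ParseBoxLines` and `BoxesAdvanceLeftToRight` are hypothetical helper names, not Tesseract functions.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One line of makebox output: symbol left bottom right top page.
struct BoxEntry {
  std::string symbol;
  int left;
  int bottom;
  int right;
  int top;
  int page;
};

// Parses makebox text; lines that do not have all six fields are skipped.
std::vector<BoxEntry> ParseBoxLines(const std::string& text) {
  std::vector<BoxEntry> entries;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    BoxEntry entry;
    if (fields >> entry.symbol >> entry.left >> entry.bottom >>
        entry.right >> entry.top >> entry.page) {
      entries.push_back(entry);
    }
  }
  return entries;
}

// True if every box has positive width and no box starts before the
// previous box ends.
bool BoxesAdvanceLeftToRight(const std::vector<BoxEntry>& entries) {
  for (std::size_t i = 0; i < entries.size(); ++i) {
    if (entries[i].left >= entries[i].right) return false;
    if (i > 0 && entries[i].left < entries[i - 1].right) return false;
  }
  return true;
}
```

Applied to the seven lines above, the check passes. Touching or kerned glyphs can legitimately overlap by a pixel or two, so a real check might need a small tolerance.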

@zdenop Please close.

zdenop closed this as completed Jul 19, 2019

@RicketyRick

RicketyRick commented Dec 12, 2019

I am not sure if the following observation is covered by this ticket, but maybe it needs to be reopened.
We had several issues with bounding box coordinates and tried makebox and hOCR output with the executable, and even the API. We have now tested the often-used "thousand Billion" image with the latest 5.0.0-alpha version on Windows to check whether it was fixed, but again got the same starting coordinates for some letters:

   <span class='ocrx_cinfo' title='x_bboxes 210 22 218 52; x_conf 99.543304'>B</span>
   <span class='ocrx_cinfo' title='x_bboxes 210 23 234 52; x_conf 99.536743'>i</span>

We have this issue on all of our example files and on all 4.* versions of tesseract, too.
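
For anyone who wants to scan hOCR output for this symptom, here is a rough sketch that pulls the four `x_bboxes` numbers out of a `title` attribute and flags a character whose box starts before the previous character's box ends. `CharBox`, `ParseXBboxes` and `OverlapsPrevious` are hypothetical helpers written for this illustration. The field order left, top, right, bottom is an assumption based on the spans above, and a real tool should use a proper HTML parser rather than string searching.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical character box in image coordinates.
struct CharBox {
  int left = 0;
  int top = 0;
  int right = 0;
  int bottom = 0;
};

// Extracts the four integers that follow "x_bboxes" in an hOCR title
// string. Returns false if the key or any of the numbers is missing.
bool ParseXBboxes(const std::string& title, CharBox* box) {
  const std::string key = "x_bboxes";
  const std::string::size_type pos = title.find(key);
  if (pos == std::string::npos) return false;
  std::istringstream in(title.substr(pos + key.size()));
  return static_cast<bool>(in >> box->left >> box->top >> box->right >>
                           box->bottom);
}

// True if the current character starts before the previous one ends.
bool OverlapsPrevious(const CharBox& previous, const CharBox& current) {
  return current.left < previous.right;
}
```

For the two spans above, both boxes parse with a left edge of 210, and the check reports that the "i" overlaps the "B".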

Thanks for all your work and have a great Christmas time!
