improve polygons of bottom lines #544

Open
dstoekl opened this issue Sep 9, 2023 · 15 comments

Comments

@dstoekl

dstoekl commented Sep 9, 2023

typical case:
[three images attached]

@colibrisson
Contributor

I know this is a recurrent issue. It's funny because I've never had this problem with columns in Chinese. Is it an issue with the way Shapely computes polygons?

@colibrisson
Contributor

Which Shapely version do you have installed?

@dstoekl
Author

dstoekl commented Sep 11, 2023

@lauxley what is on the stack on msia pls?

@colibrisson
Contributor

colibrisson commented Sep 11, 2023

Last time, Shapely>=2.0 gave me overlapping polygons (see #430). However, this affected all polygons, not just the last line. Until it got fixed in eScriptorium, we used to export the dataset and recalculate all the polygons using another Shapely version.
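Since the behaviour discussed here seems to depend on the Shapely major version, a quick way to check what is installed in a given environment might look like the sketch below. It only uses the standard library; the helper names `installed_version` and `is_at_least_major` are made up for illustration and are not part of kraken or Shapely.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_at_least_major(version: Optional[str], major: int) -> bool:
    """True if the leading numeric component of `version` is >= `major`."""
    if version is None:
        return False
    head = version.split(".")[0]
    return head.isdigit() and int(head) >= major


if __name__ == "__main__":
    v = installed_version("shapely")
    print("shapely:", v, "| 2.x or newer:", is_at_least_major(v, 2))
```

Running this inside the same virtual environment as kraken or eScriptorium shows which Shapely release is actually used for the polygon computation.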

@PonteIneptique
Contributor

Just adding a little note: given that all or a lot of the data produced during the early days of eScriptorium had masks like that (top ones and bottom ones), I am wondering whether the new model, when we are using a new model, simply learned that bottom and top lines have higher masks. I.e., is it the data or the polygon :)

@colibrisson
Contributor

@PonteIneptique only the baseline coordinates and the image feature (and optionally other polygons) are taken into account during the polygon computation:

def calculate_polygonal_environment(im: PIL.Image.Image = None,
                                    baselines: Sequence[Sequence[Tuple[int, int]]] = None,
                                    suppl_obj: Sequence[Sequence[Tuple[int, int]]] = None,
                                    im_feats: np.ndarray = None,
                                    scale: Tuple[int, int] = None,
                                    topline: bool = False,
                                    raise_on_error: bool = False):

@PonteIneptique
Contributor

PonteIneptique commented Sep 11, 2023

I stand corrected then ;) Thanks @colibrisson

@rohanchn

rohanchn commented Oct 13, 2023

This needs work as such cases yield poor recognition results in comparison to other lines on the same page.

Whenever I have such cases, I adjust the baseline until I get the correct masks. As such adjusted baselines are retained during further training, I usually see some improvement in automatic baselines. However, this is slow, and the masks just won't budge in cases where the line is very close to the border box, such as in the image here:

afshan_1876

@alexislitvine

@rohanchn - is there a resolution to this issue? I also notice the same issue.

@dstoekl
Author

dstoekl commented May 5, 2024

I have a hacky API workaround that (1) calculates the average line distance, (2) creates dummy lines above the top line and below the bottom line of each region at that same distance, (3) repolygonizes the top and bottom lines, and (4) finally deletes the dummy lines.
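Steps (1) and (2) of the workaround above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the actual script: baselines are assumed to be lists of `(x, y)` points with y growing downward, and the helper names (`baseline_height`, `average_line_distance`, `shift`, `with_dummy_lines`) are hypothetical. Steps (3) and (4), repolygonizing and deleting the dummy lines, depend on the API being used and are left out.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Point = Tuple[int, int]
Baseline = Sequence[Point]


def baseline_height(baseline: Baseline) -> float:
    """Mean y coordinate of a baseline (image y grows downward)."""
    return mean(y for _, y in baseline)


def average_line_distance(baselines: Sequence[Baseline]) -> float:
    """Mean vertical gap between consecutive baselines, ordered top to bottom."""
    heights = sorted(baseline_height(b) for b in baselines)
    if len(heights) < 2:
        raise ValueError("need at least two baselines to estimate a distance")
    return mean(b - a for a, b in zip(heights, heights[1:]))


def shift(baseline: Baseline, dy: float) -> List[Point]:
    """Copy of a baseline moved vertically by dy pixels."""
    return [(x, int(round(y + dy))) for x, y in baseline]


def with_dummy_lines(baselines: Sequence[Baseline]) -> List[List[Point]]:
    """Region baselines plus one dummy line above the top and one below the bottom."""
    ordered = sorted(baselines, key=baseline_height)
    gap = average_line_distance(ordered)
    top_dummy = shift(ordered[0], -gap)
    bottom_dummy = shift(ordered[-1], gap)
    return [top_dummy, *[list(b) for b in ordered], bottom_dummy]
```

The extended list would then be handed to whatever polygonizer is in use, and the first and last entries discarded afterwards. Dummy lines near the page edge can end up with negative or out-of-image y values, so clamping to the image bounds may be needed.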

@alexislitvine

ahah - thanks for this @dstoekl ! Seems like a proper solution is needed but would be happy to use your temporary solution if you are willing to share it :-)

@dstoekl
Author

dstoekl commented May 5, 2024

Sure. Just shared a Colab with you. Have a look at the following function among the complex ones: restrict_first_and_last_line_polygon_according_to_average_line_height.
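The Colab function itself is not shown in this thread, so the following is not its implementation. It is only a rough sketch of one idea its name suggests: clipping an overly tall line polygon to a horizontal band around the baseline, sized by an average line height. It uses a Sutherland-Hodgman clip against two horizontal lines in pure Python. The names `restrict_to_band`, `line_height` and `below_fraction` are assumptions made for this illustration, and a roughly horizontal baseline is assumed.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _clip_horizontal(polygon: List[Point], y_limit: float, keep_below: bool) -> List[Point]:
    """Sutherland-Hodgman clip of a polygon against the line y = y_limit.

    keep_below=True keeps the part with y <= y_limit, otherwise y >= y_limit.
    """
    def inside(p: Point) -> bool:
        return p[1] <= y_limit if keep_below else p[1] >= y_limit

    def crossing(p: Point, q: Point) -> Point:
        # Only called when p and q lie on opposite sides, so q[1] != p[1].
        t = (y_limit - p[1]) / (q[1] - p[1])
        return (p[0] + t * (q[0] - p[0]), y_limit)

    out: List[Point] = []
    for i, cur in enumerate(polygon):
        prev = polygon[i - 1]
        if inside(cur):
            if not inside(prev):
                out.append(crossing(prev, cur))
            out.append(cur)
        elif inside(prev):
            out.append(crossing(prev, cur))
    return out


def restrict_to_band(polygon: List[Point], baseline_y: float,
                     line_height: float, below_fraction: float = 0.3) -> List[Point]:
    """Clip a line polygon to [baseline_y - line_height, baseline_y + below_fraction * line_height]."""
    y_min = baseline_y - line_height
    y_max = baseline_y + below_fraction * line_height
    clipped = _clip_horizontal(polygon, y_min, keep_below=False)
    return _clip_horizontal(clipped, y_max, keep_below=True)
```

Here `line_height` could be an estimate of the average line distance in the region, and `below_fraction` leaves some room for descenders. For slanted or curved baselines a band that follows the baseline would be more appropriate than a flat one.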

@alexislitvine

Thanks so much! @mittagessen - is there any chance we could do something more robust to solve this problem?

@mittagessen
Owner

mittagessen commented May 5, 2024 via email

@mirkh

mirkh commented May 6, 2024

Hi, we mostly have problems with top lines, and I am looking for any tips on avoiding overlapping polygons, for example a script that recalculates overlapping polygons, or advice on using other versions of modules. Sorry about posting in what is probably the wrong thread. Edit: to add that we use Kraken 4.2.
Skärmbild 2024-05-06 084812
/ Maria
