Parameters to improve region training #41
Are there any parameters in the config.yaml file that I can change to improve the training of regions?

Right now I have reached a total_loss of 0.1351 after 20,000 iterations on 307 training images and 96 validation images, and simply increasing the number of max iterations or lowering the learning rate doesn't influence the result much. Also, the total_loss seems really good, yet when trying to recognize the regions inside the Loghi pipeline, the output is far from ideal.
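(For reference, the two knobs mentioned here would look roughly like this in a detectron2-style config; Laypa builds on detectron2, but the exact key names and values below are assumptions, so check your own generated config.yaml.)

```yaml
# Sketch of the training knobs mentioned above, assuming
# detectron2-style keys; the values are hypothetical examples.
SOLVER:
  MAX_ITER: 20000   # total training iterations
  BASE_LR: 0.0002   # learning rate
```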
Can you try to visualize the results using visualizer.py? That way we can better see what the results are doing, as well as what the GT looks like. Maybe I can give suggestions based on that. Also, when you say "far from ideal", what sort of issues are you running into?
Thank you very much for the images. I'm not sure why the visualization tool doesn't work; it might be because the Docker container doesn't have a graphical interface. In that case, run with the --save flag (I think; check the -h flag to be certain). It will give similar images to what you have sent.

For the problem of regions being both purple and red: is there anything that distinguishes these classes except the textual data? If the only way to tell them apart is through the text, Laypa alone will not be enough, since it doesn't actually read the text and works mainly based on layout. For example, could you tell these regions apart when squinting? If not, you'll probably need to do something with the text itself. You could, for instance, combine these region classes and then split them in post-processing. If they are visually distinct, we'll have to look deeper at the problem.

For the problem of the regions not being kept apart, there might be something we can do. First, have a look at the GT and check whether a lot of whitespace is labeled as being part of a class. This has proven to be a major reason why whitespace is incorrectly assigned. But also know that this is just a problem that can occur when pixels are labeled incorrectly. If you know the documents will all look like this, you should also have a look at instance segmentation. That is unfortunately not completely finished in Laypa, but it might work very well for separated blocks of text.
Ok, if that is the context, then it seems plausible to me that you can indeed make these predictions without the text information. But what you are doing seems correct, so I can't pinpoint something that will definitely improve the model. I have seen this mixing of regions before, but in that case they were only really different in the type of text they contained.
Just so I'm clear, what would the GT for this type of data look like?
That is quite a lot of whitespace. But considering it is mostly horizontal, I'm not sure how much impact it will have. Also, the boundaries of the box are fairly well defined due to the black border. You could experiment with assigning less whitespace, but considering how much work that might be, we can maybe first try something else. Another idea I had was to change the scale at which the prediction is done. I think you are currently using 1024 for the smallest side? At least, that was the default value. You can try to experiment with this value, or change the resize mode.
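(As a concrete sketch of that scale experiment, assuming detectron2-style INPUT keys; the names Laypa actually uses in config.yaml may differ, so verify against your own config:)

```yaml
# Sketch of the scale/resize experiment, assuming detectron2-style
# keys; check your generated config.yaml for the actual names.
INPUT:
  MIN_SIZE_TRAIN: (1024,)  # smallest side during training
  MIN_SIZE_TEST: 1024      # smallest side at prediction; try e.g. 1536 or 2048
  MAX_SIZE_TEST: 4096      # hypothetical cap on the longest side
```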
Nice to see that Kraken performs so well 👍 Maybe it's more suited to your particular problem. I don't think the methods are that dissimilar, though, so why it performs better is something I'm very interested in. But that's for me to figure out 😄
Yes, this is done using the …
Hi again, I have now experimented with the RESIZE parameter, which unfortunately didn't influence the result. I would be grateful for any other ideas :) Some further observations: Kraken's segmentation performs better in terms of detecting regions, but is considerably worse than Laypa in terms of drawing correct baselines.
Thank you for your observations. I'm not sure what to do next to improve the results on your side. Maybe it's possible to combine the results from Kraken and Laypa. I am going to see if implementing the loss method used in Kraken (multiple BCE loss) might improve the region results of Laypa. However, I'm not sure when I'll have time for this or when it will be finished (vacation is also coming up 😄).
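(For anyone reading along: the core idea of a "multiple BCE" loss is one independent sigmoid channel per region class, each trained with binary cross-entropy, instead of a single softmax over mutually exclusive classes. Below is a minimal PyTorch sketch of that idea, not Kraken's or Laypa's actual code.)

```python
import torch
import torch.nn as nn

# Minimal sketch of a "multiple BCE" region loss: one sigmoid channel
# per region class, each with its own binary cross-entropy term.
# Illustration of the idea only, not Kraken's or Laypa's implementation.
num_classes = 4                                      # hypothetical class count
logits = torch.randn(2, num_classes, 256, 256)       # model output: B x C x H x W
targets = torch.randint(0, 2, logits.shape).float()  # one binary mask per class

# Unlike softmax + cross-entropy, the classes are independent here,
# so a pixel can belong to several region classes (or to none).
loss = nn.BCEWithLogitsLoss()(logits, targets)
print(loss.item())
```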
Vacation is important :) Thanks for all your help and have a good rest!