Training Suggestions for Cyrillic + English #82

Open · 3 comments

@eschaffn opened this issue Apr 13, 2023
Would it be possible to get training recommendations w.r.t. data and parameters?

I'm trying to retrain parseq with a new character set consisting of both Latin (English alphabet) and Cyrillic (Russian alphabet) characters.

I have about 3500 custom image samples that I created by running detection and then cropping out the text. Example:
[image: example cropped text sample]

I have a few questions:

1. Is this a suitable training image?
2. If I have ~3500 images like these, both in English and Russian, how much synthetic data should I augment them with? Or do I need more real data too?
3. My charset is ~160 characters; what should I set the embedding dimension to? Is 384 large enough?

Thank you, and I appreciate any suggestions you can give!

@baudm (Owner) commented Apr 14, 2023

> Is this a suitable training image?

Yes, this works. Just be careful about data augmentation. You might want to reduce the magnitudes first.
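
For illustration, here is a minimal sketch of what "reducing the magnitudes" could look like, using torchvision's RandAugment. This is not parseq's actual augmentation pipeline (the repo has its own augmentation code); the transform choice and magnitude values are assumptions, shown only to make the idea concrete:

```python
# Illustrative only: a gentler augmentation pipeline for a small real dataset.
# torchvision's RandAugment defaults to magnitude=9; lowering it keeps the
# augmented crops closer to the original images.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandAugment(num_ops=2, magnitude=3),  # reduced from the default 9
    transforms.Resize((32, 128)),                    # parseq-style input size (H x W)
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
])
```

With only ~3500 real samples, heavy geometric or color distortion can push the training distribution away from your test crops, which is why gentler magnitudes tend to be safer here.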

> If I have ~3500 images like these, both in English and Russian, how much synthetic data should I augment them with? Or do I need more real data too?

Real data is much better. Or rather, the closer the training data distribution is to the test data distribution, the better. Try using the pretrained weights, at least for the encoder.

> My charset is ~160 characters; what should I set the embedding dimension to? Is 384 large enough?

The depth of the encoder has a much bigger effect on model performance compared to the embedding dimension. But if you can use a larger number, use it. Larger models are easier to work with in the experimentation phase.
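
To make the two knobs concrete: parseq's encoder builds on timm's VisionTransformer, so depth and embedding dimension show up roughly as below. This is a hedged sketch with ViT-Small-like values (and an assumed patch size), not a recommendation; it needs a recent timm release:

```python
# Sketch: where depth and embed_dim live in a ViT-style encoder (via timm).
# depth (number of transformer blocks) tends to matter more than embed_dim.
from timm.models.vision_transformer import VisionTransformer

encoder = VisionTransformer(
    img_size=(32, 128),   # height x width of the input crops
    patch_size=(4, 8),    # assumption: small patches for low-resolution text
    in_chans=3,
    embed_dim=384,        # the dimension asked about (384 = ViT-Small width)
    depth=12,             # encoder depth: the bigger lever per the comment above
    num_heads=6,
    num_classes=0,        # no classification head; tokens feed a decoder
    global_pool='',       # keep the full token sequence
    class_token=False,    # assumption: no [CLS] token needed for recognition
)
```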

@eschaffn (Author) commented Apr 14, 2023

> Real data is much better. Or rather, the closer the training data distribution is to the test data distribution, the better. Try using the pretrained weights, at least for the encoder.

1. How do I use just the pretrained encoder weights?
2. I was thinking of using ~10M synthetic images generated with synthtiger (https://github.com/clovaai/synthtiger). Does that seem sufficient?

> The depth of the encoder has a much bigger effect on model performance compared to the embedding dimension. But if you can use a larger number, use it. Larger models are easier to work with in the experimentation phase.

What do you suggest setting the depth and embedding dimension to?

@baudm (Owner) commented Apr 28, 2023

> How do I use just the pretrained encoder weights?

Take a look at the examples for fine-tuning with PyTorch. In a nutshell, you load the model and discard the layers you want to replace (in this case, the decoder).
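
For example, a minimal sketch using the torch.hub entry point from the parseq README. The `'encoder.'` key prefix is an assumption about the checkpoint layout, so inspect `state_dict().keys()` on your checkout before relying on it:

```python
import torch

# Load the pretrained model via the torch.hub entry point from the README.
pretrained = torch.hub.load('baudm/parseq', 'parseq', pretrained=True)

# Keep only the encoder parameters, selected by key prefix.
# (The 'encoder.' prefix is an assumption; verify against your state_dict keys.)
prefix = 'encoder.'
enc_state = {
    k[len(prefix):]: v
    for k, v in pretrained.state_dict().items()
    if k.startswith(prefix)
}

# Then, on your new model configured with the ~160-char charset, load just
# these weights; the decoder and output head remain randomly initialized:
# my_model.encoder.load_state_dict(enc_state)
```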
