Custom Trained OCR model: Mismatch Error in Character Set Size Leading #1226
Comments
Could the error come from the CTCLabelConverter, given that I use Attn instead of CTC for prediction and therefore trained with the AttnLabelConverter from the deep-text-recognition-benchmark repository? I looked into this further, and it seems that the `def get_recognizer(recog_network, network_params, character,
Can't I use my TPS-ResNet-BiLSTM-Attn model, trained with the deep-text-recognition-benchmark repository, out of the box with the easyocr package using easy.ocr(recog_network=...)?
I've trained an OCR model on a specialized dataset by following the methodology outlined in the README of the deep-text-recognition-benchmark repository. My setup includes the model's architecture defined in `my_model.py`, alongside the `my_model.pth` and `my_model.yaml` files.
Currently, I'm encountering an issue related to the character set used for training, which consists of 44 characters. Specifying the identical character set in the `.yaml` file triggers a `RuntimeError` when initializing the reader with `easyocr.Reader(['en'], recog_network='my_model')`, pointing to torch tensor dimensions that differ by one.
I found a workaround by enlarging the character set with an additional character (including a leading whitespace), which allows the model to generate predictions. However, this adjustment reduces prediction accuracy compared to using my model directly with the deep-text-recognition-benchmark.
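An off-by-one like this can arise from how label converters size the output layer. The sketch below is illustrative only. It assumes a CTC-style converter reserves one extra class (a blank) and an attention-style converter reserves two (`[GO]` and `[s]`), as the converters in deep-text-recognition-benchmark do. The helper names and the placeholder character set are hypothetical, not taken from either code base.

```python
# Illustrative sketch, not EasyOCR or benchmark code: helper names are hypothetical.

def ctc_vocab(characters):
    """CTC-style vocabulary: one reserved blank class, then the characters."""
    return ["[blank]"] + list(characters)


def attn_vocab(characters):
    """Attention-style vocabulary: [GO] and [s] tokens, then the characters."""
    return ["[GO]", "[s]"] + list(characters)


# Placeholder 44-character set (assumption; the real set is not shown in the issue).
characters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-./:,_#+"

ctc_classes = len(ctc_vocab(characters))    # characters + 1 blank
attn_classes = len(attn_vocab(characters))  # characters + [GO] + [s]
print(ctc_classes, attn_classes)

# Padding the character set by one makes the sizes agree,
# but the class indices still mean different things.
padded = ctc_vocab(" " + characters)
print(len(padded) == attn_classes)
print(repr(padded[1]), "vs", repr(attn_vocab(characters)[1]))
```

Under these assumptions the padded vocabulary has the right size to load the weights, yet index 1, which the attention model uses as its stop token, is read as an ordinary character.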
The modification causes the model to produce significantly longer strings than expected most of the time. For instance, the image text `AA BB 123` is predicted as `AA BB 123 123 123 123` using `reader.recognize(img, allowlist=allow_list, detail=1)[0]`. In contrast, direct predictions with my model output the correct `AA BB 123`.
This issue leads me to believe that the adjustment might be interfering with the recognition of the stop character.
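The suspicion about the stop character can be illustrated with a small sketch. It assumes an attention decoder that emits a fixed number of steps and relies on post-processing to cut the text at the first `[s]` token, which is how deep-text-recognition-benchmark treats attention outputs. The function names, the tiny vocabularies, and the index sequence are made up for illustration.

```python
# Illustrative sketch: names, vocabularies, and indices are hypothetical.

ATTN_VOCAB = ["[GO]", "[s]"] + list("AB 123")      # index 1 is the stop token
NO_STOP_VOCAB = ["[blank]", " "] + list("AB 123")  # same size, no stop token


def decode_raw(indices, vocab):
    """Map every predicted index to its symbol, with no truncation."""
    return "".join(vocab[i] for i in indices)


def decode_until_stop(indices, vocab, stop="[s]"):
    """Decode, then keep only the text before the first stop token."""
    text = decode_raw(indices, vocab)
    end = text.find(stop)
    return text if end == -1 else text[:end]


# "AA BB 123", then the stop token (1), then leftover decoder steps.
predicted = [2, 2, 4, 3, 3, 4, 5, 6, 7, 1, 4, 5, 6, 7, 4, 5, 6, 7]

with_stop = decode_until_stop(predicted, ATTN_VOCAB)
without_stop = decode_raw(predicted, NO_STOP_VOCAB)
print(repr(with_stop))
print(repr(without_stop))
```

When the decoding side has no notion of a stop token, the steps after it leak into the result, which would be consistent with the repeated trailing `123` seen above.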
Could anyone provide insights or suggestions on how to address this problem?
Any guidance or advice would be greatly appreciated.