createTRAINING batch command #1149

Open
ap-mps opened this issue Aug 2, 2024 · 1 comment
Labels: question

Comments

ap-mps commented Aug 2, 2024

When running this command, I noticed that for a certain PDF in the 'directory of input files', no training files are generated for the header model.

Why is that? More generally, are there criteria that determine which per-model output files are generated for a given input PDF?
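To see which input PDFs are affected, a small script can compare the input directory with the output directory. This is a minimal sketch, assuming that createTraining names the header training file `<pdf name without .pdf>.training.header.tei.xml` (check this against your own output directory and adjust `HEADER_SUFFIX` if needed). The function name `pdfs_without_header_training` is hypothetical and not part of Grobid.

```python
import sys
from pathlib import Path
from typing import List

# Assumption: header training files are named "<stem>.training.header.tei.xml".
HEADER_SUFFIX = ".training.header.tei.xml"


def pdfs_without_header_training(input_dir: str, output_dir: str,
                                 suffix: str = HEADER_SUFFIX) -> List[str]:
    """Return the names of input PDFs that have no header training file."""
    out = Path(output_dir)
    missing = []
    for pdf in sorted(Path(input_dir).iterdir()):
        # Accept both ".pdf" and ".PDF"; skip directories and other files.
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue
        # pdf.stem drops only the final extension, e.g. "paper.v2.pdf" -> "paper.v2".
        if not (out / (pdf.stem + suffix)).exists():
            missing.append(pdf.name)
    return missing


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python this_script.py <input dir> <output dir>
    for name in pdfs_without_header_training(sys.argv[1], sys.argv[2]):
        print(name)
```

Pass it the same input and output directories that were given to createTraining; each printed name is a PDF worth inspecting.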

kermitt2 (Owner) commented Aug 2, 2024

Hello!

Normally this means that the PDF is image-only: Grobid does not include OCR, so OCR has to be applied as a pre-processing step. Other possible explanations are an encrypted or a corrupted PDF. Finally, it is also possible that the segmentation model, which is applied first, detects no header, in which case there is no header zone for the header model to work on. In that last case, the corrected segmentation training file has to be added to the segmentation training data first, and the segmentation model retrained.
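The first three causes can be screened for before running createTraining. The sketch below is a rough, hypothetical byte-level heuristic using only the Python standard library; it is not part of Grobid and does not reproduce Grobid's own PDF parsing. It flags a file as corrupted if it lacks a `%PDF-` header or a trailing `%%EOF` marker, as encrypted if it contains an `/Encrypt` entry, and as likely image-only if no `/Font` reference appears anywhere, including inside Flate-compressed streams. A file that passes all checks but still yields no header training file points to the last cause, the segmentation model.

```python
import re
import zlib
from pathlib import Path

# Captures the data of each stream object, between "stream" and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def _searchable_bytes(data: bytes) -> bytes:
    """Raw file bytes plus every stream that inflates with zlib (FlateDecode)."""
    chunks = [data]
    for match in STREAM_RE.finditer(data):
        try:
            chunks.append(zlib.decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate-compressed, encrypted, or truncated
    return b"\n".join(chunks)


def triage_pdf(data: bytes) -> str:
    """Heuristic guess: "corrupted", "encrypted", "image-only" or "ok"."""
    # Header near the start and %%EOF near the end are basic structural markers.
    if b"%PDF-" not in data[:1024] or b"%%EOF" not in data[-1024:]:
        return "corrupted"
    # Encrypted files declare an /Encrypt entry in the trailer dictionary.
    if b"/Encrypt" in data:
        return "encrypted"
    # A text layer needs fonts; scanned, image-only PDFs usually reference none.
    if b"/Font" not in _searchable_bytes(data):
        return "image-only"
    # Looks fine: if header data is still missing, suspect the segmentation model.
    return "ok"


def triage_file(path: str) -> str:
    """Convenience wrapper that reads a PDF from disk."""
    return triage_pdf(Path(path).read_bytes())
```

Since these are byte-level checks, treat the result as a hint only: a scanned PDF that already carries an OCR text layer correctly counts as having fonts, and unusual but valid files can be misclassified. A dedicated tool such as Poppler's `pdffonts` gives a more reliable font listing.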

lfoppiano added the question label on Sep 22, 2024