Tika server 2.9.1 Pdf tesseract Ocr #406

Open
Tarik37 opened this issue Mar 30, 2024 · 0 comments

Tarik37 commented Mar 30, 2024

Hello,
I'm a beginner and need your help. I use Tika server to extract metadata and text with the OCR strategy set to auto. On native PDF documents there is no problem, since the processing time is low, but on scanned PDF files (hundreds of pages) I hit the request timeout, whether I call the server through Python or curl.
Is there a way to configure the tika-config.yml file so that OCR processes all the pages with strategy auto?
Thanks in advance.
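For reference, the kind of call described above can be sketched roughly as follows. This is only an illustration and not the actual script: the URL `http://localhost:9998/tika`, the helper names `build_ocr_request` and `extract_text`, and the one-hour client timeout are assumptions, and the `X-Tika-PDFOcrStrategy` header should be checked against the Tika server documentation for the version in use. Raising the client-side timeout only stops the client from giving up early; any limit enforced by the server itself is separate and is not changed by this code.

```python
import urllib.request

# Assumed address of a locally running tika-server (hypothetical default).
TIKA_URL = "http://localhost:9998/tika"


def build_ocr_request(pdf_bytes: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request asking Tika to extract text with OCR strategy 'auto'."""
    headers = {
        "Content-Type": "application/pdf",
        "Accept": "text/plain",
        # Per-request OCR strategy header; verify the name for your Tika version.
        "X-Tika-PDFOcrStrategy": "auto",
    }
    return urllib.request.Request(url, data=pdf_bytes, headers=headers, method="PUT")


def extract_text(pdf_path: str, timeout_s: float = 3600.0) -> str:
    """Send the PDF and wait up to timeout_s seconds (client side only)."""
    with open(pdf_path, "rb") as f:
        req = build_ocr_request(f.read())
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage would be something like `text = extract_text("scanned.pdf")`, where `scanned.pdf` is a placeholder file name.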
