OCRmyPDF is the leading command-line "PDF to OCR-PDF" tool. Most image-to-text tools, such as EasyOCR and Tesseract, handle only image-to-text conversion and avoid the complexities of PDF, so if you have a PDF that you want to apply OCR to, you first have to convert it manually to some other format. OCRmyPDF takes care of all those conversion details, using lossless conversions and meticulous attention to edge cases.
To date, OCRmyPDF has used Tesseract OCR almost exclusively. I've now created a plugin that adds support for using EasyOCR as the OCR engine.
When the plugin is installed into the same virtual environment as OCRmyPDF, EasyOCR is used instead of Tesseract where possible.
Currently, Tesseract is still used for page deskew determination, page rotation detection, and a few other functions. If anyone has thoughts on how to use EasyOCR or related ML models for the functions above without Tesseract, I'd be happy to incorporate them.
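If you want to confirm that the plugin is visible in the same environment as OCRmyPDF, something like the following may help. It is only a sketch: it assumes OCRmyPDF discovers installed plugins through a Python entry-point group named `ocrmypdf`, and the helper name `installed_ocrmypdf_plugins` is made up for illustration.

```python
from __future__ import annotations

from importlib.metadata import entry_points

# Assumption: OCRmyPDF finds installed plugins through this entry-point group.
PLUGIN_GROUP = "ocrmypdf"


def installed_ocrmypdf_plugins(group: str = PLUGIN_GROUP) -> list[str]:
    """Return the sorted names of entry points registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns an EntryPoints object with select().
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a dict keyed by group name.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    plugins = installed_ocrmypdf_plugins()
    print(plugins or "no OCRmyPDF plugins found in this environment")
```

Run it with the Python interpreter from the virtual environment that contains OCRmyPDF. An empty result suggests the plugin was installed somewhere else, or that the entry-point assumption above does not hold for your version.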