ImportError: DLL load failed while importing tesserocr: The specified module could not be found. #332

Open
joseavegaa opened this issue Oct 7, 2023 · 6 comments

@joseavegaa

joseavegaa commented Oct 7, 2023

I used to be able to import tesserocr and use it from any script. All of a sudden, it now fails with the error:

ImportError: DLL load failed while importing tesserocr: The specified module could not be found.

I have created clean new environments with Python 3.11 and 3.12, and both fail with the same error.

Screenshot 2023-10-07 152753
Screenshot 2023-10-07 152640

I also created a new Python file that only contains the line import tesserocr and it gives the same error.

I have checked and Tesseract works as expected:

Screenshot 2023-10-07 153226

Did something change recently that broke backwards compatibility, or is there another reason this error is showing up?

EDIT: I tried on a different machine running Windows 11, with a fresh copy of Miniforge3 and a new env in which I installed only Python 3.12, Tesseract, and tesserocr. It still fails with the same error. I also tried following the instructions to compile it manually, but that fails too. Any help?
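
For anyone trying to reproduce this, a minimal sketch of an import check that captures the full error text instead of a screenshot. The helper name check_import is made up for illustration; it only relies on the standard library.

```python
import importlib
import sys


def check_import(module_name):
    """Try to import a module and report the outcome.

    Returns (True, path_of_loaded_module) on success, or
    (False, "<ExceptionType>: <message>") if the import fails.
    check_import is a hypothetical helper, not part of tesserocr.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # "DLL load failed ..." is raised as ImportError on Windows.
        return False, f"{type(exc).__name__}: {exc}"
    return True, getattr(module, "__file__", "<built-in>")


if __name__ == "__main__":
    ok, detail = check_import("tesserocr")
    print("Python:", sys.version)
    print("import tesserocr:", "ok" if ok else "failed")
    print(detail)
```

Running it in each environment and pasting the output makes it easier to compare Python versions and error messages across machines.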

@joseavegaa
Author

joseavegaa commented Oct 9, 2023

Ok, after some testing I found that there is a compatibility issue between the latest version of Tesseract in conda-forge and tesserocr.

The builds on the left are the latest ones and produce ImportError: DLL load failed while importing tesserocr: The specified module could not be found. The builds on the right work:

libarchive 3.7.2-h6f8411a_0 --> 3.6.2-h6f8411a_1
tesseract 5.3.2-hb328096_1 --> 5.3.2-hae9691c_0

If you use conda install tesseract=5.3.2=hae9691c_0 to specifically install that build, the issue is gone.

This is a temporary fix, but I am not sure whether the error comes from the Tesseract build in conda-forge or whether it is a problem with tesserocr itself.
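
To compare which builds are actually installed in a given env, here is a rough sketch that reads the output of conda list --json. It assumes each entry in that output carries the keys name, version and build_string, and the function names parse_builds and installed_builds are made up for this example.

```python
import json
import shutil
import subprocess


def parse_builds(conda_list_json, names):
    """Return {package: "version-build"} for the requested package names.

    Assumes the JSON is a list of objects with "name", "version" and
    "build_string" keys, as produced by `conda list --json`.
    """
    wanted = set(names)
    builds = {}
    for pkg in json.loads(conda_list_json):
        if pkg.get("name") in wanted:
            builds[pkg["name"]] = f'{pkg.get("version")}-{pkg.get("build_string")}'
    return builds


def installed_builds(names):
    """Run `conda list --json` in the active env and extract the builds."""
    conda = shutil.which("conda")  # resolves conda.bat / conda.exe on Windows
    if conda is None:
        raise RuntimeError("conda executable not found on PATH")
    result = subprocess.run(
        [conda, "list", "--json"], capture_output=True, text=True, check=True
    )
    return parse_builds(result.stdout, names)
```

Calling installed_builds(["tesseract", "libarchive", "tesserocr"]) in a working and a failing env should show whether the two differ in the builds listed above.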

@sirfz
Owner

sirfz commented Oct 10, 2023

I guess you're installing tesserocr via conda-forge? Unfortunately the tesserocr build there has been broken for a while (since 2 or 3 versions ago). The original maintainer isn't active on it and I'm no conda user myself. If anyone wants to take up maintenance responsibilities, that would be great.

@icanhasmath

I also found that libarchive is required to run Tesseract on Windows. I think as long as you have a working libarchive on PATH (and its dependencies; I had to add openssl as well), your tesseract.exe should work.
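
A quick way to check whether a DLL is reachable through PATH is sketched below. The DLL file names in the usage note are placeholders I am assuming for illustration; the real names depend on how libarchive and openssl were built. find_on_path is a made-up helper.

```python
import os


def find_on_path(filename, path=None):
    """Return every full path where `filename` exists in a PATH-style string.

    `path` defaults to the current PATH environment variable.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits
```

For example, find_on_path("archive.dll") or find_on_path("libcrypto-3-x64.dll") (both names are assumptions) returns an empty list when the file is not on PATH. This mirrors how tesseract.exe would locate the DLLs. For the Python extension module it is less conclusive: since Python 3.8 on Windows, PATH is not searched for an extension module's DLL dependencies by default, and os.add_dll_directory is the documented way to add a search directory.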

If you want to try our build of the Tesserocr stack you can pull it from here. I've tested it on Linux and Windows using the Post Install steps.

With big thanks to @sirfz for that documentation.

@zdenop
Contributor

zdenop commented Mar 29, 2024

libarchive and curl (which needs openssl) are not needed: they are optional dependencies of tesseract.
libarchive can be used for compressed traineddata, but hardly anybody uses it. People prefer speed over saving space.
curl is used by the tesseract executable for opening online images, and that executable is not wrapped by tesserocr.
Both features (saving space and reading online images) can be replaced by native Python functions, so adding them as dependencies of tesserocr makes no sense.

@icanhasmath

Tesseract has a build option to disable libarchive, DISABLE_ARCHIVE, which is set to off by default. If libarchive is not present at build time the build doesn't throw an error, but tesseract.exe expects the dependency to be available at start-up. I will try a rebuild without these dependencies.

@icanhasmath

Confirmed that these settings removed the need to ship extra dependencies.

For CMake:

DISABLE_ARCHIVE=ON
DISABLE_CURL=ON

For Autotools:

--without-archive
--without-curl
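
For reference, a small sketch of how these settings might be turned into configure command lines. It assumes the CMake options are passed as -D cache definitions; the source and build directory values in the usage note are placeholders, and both function names are made up for illustration.

```python
def cmake_configure_command(source_dir, build_dir, options):
    """Build a `cmake` configure command from {option_name: bool}."""
    cmd = ["cmake", "-S", source_dir, "-B", build_dir]
    for name, enabled in options.items():
        # CMake cache entries are passed as -D<NAME>=<VALUE>.
        cmd.append(f"-D{name}={'ON' if enabled else 'OFF'}")
    return cmd


def autotools_configure_command(disabled_features):
    """Build a `./configure` command with one --without-<feature> per entry."""
    return ["./configure"] + [f"--without-{name}" for name in disabled_features]
```

cmake_configure_command(".", "build", {"DISABLE_ARCHIVE": True, "DISABLE_CURL": True}) gives a list that can be handed to subprocess.run from the Tesseract source tree, and autotools_configure_command(["archive", "curl"]) does the same for the Autotools build.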
