Version of OCR that can run entirely offline #2

Open
simonw opened this issue Mar 30, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@simonw
Owner

simonw commented Mar 30, 2024

Currently https://tools.simonwillison.net/ocr loads assets from a CDN.

A version that can run offline would be fantastic. It would be a tiny bit tricky to get versions of PDF.js and Tesseract.js (and their supporting files) that work like that, but it should absolutely be possible.

Ideally offer this as a zip file for people to download and run locally.

Could it be done such that it works from opening an HTML file in a browser, rather than needing a localhost web server? I don't think that works right now, but it may be possible with a bit more thought or some weird bundler magic.

@simonw added the bug and enhancement labels, then removed the bug label, on Mar 30, 2024
@steren

steren commented Mar 30, 2024

For maintenance and hosting simplicity, consider vendoring these dependencies.

@Lewiscowles1986

It would be a tiny bit tricky to get versions of PDF.js and Tesseract.js (and their supporting files) that work like that

  • Chrome -> Network tab -> save as HAR -> use a tool to extract HAR -> files
  • Firefox -> Network tab -> save as HAR -> use a tool to extract HAR -> files
  • {Browser} -> Network tab -> save as HAR -> use a tool to extract HAR -> files

Then link to those files.
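
For the "use a tool to extract HAR -> files" step, here is a minimal sketch in Python, not an existing tool. It assumes a standard HAR 1.2 export, where each entry stores the request URL under `request.url` and the body under `response.content.text` (base64-encoded when `response.content.encoding` says so). The function name `extract_har` and the host/path output layout are illustrative choices, not anything from this project.

```python
import base64
import json
from pathlib import Path
from urllib.parse import urlsplit


def extract_har(har_path, out_dir):
    """Write every response body stored in a HAR file under out_dir.

    Returns a dict mapping each request URL to the local Path written.
    (Hypothetical helper; the output layout is an assumption.)
    """
    # "utf-8-sig" reads the file whether or not it starts with a BOM.
    har = json.loads(Path(har_path).read_text(encoding="utf-8-sig"))
    out_root = Path(out_dir)
    written = {}

    for entry in har["log"]["entries"]:
        url = entry["request"]["url"]
        content = entry["response"].get("content", {})
        text = content.get("text")
        if text is None:
            continue  # the browser did not record a body for this request

        if content.get("encoding") == "base64":
            body = base64.b64decode(text)
        else:
            body = text.encode("utf-8")

        parts = urlsplit(url)
        # Drop empty, "." and ".." segments so nothing lands outside out_dir.
        segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
        if not segments or parts.path.endswith("/"):
            segments.append("index.html")

        # Mirror host and path on disk; the query string is ignored.
        host = parts.netloc.replace(":", "_")
        target = out_root.joinpath(host, *segments)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(body)
        written[url] = target

    return written
```

Calling `extract_har("ocr.har", "vendor")` (both names are placeholders) gives a URL-to-file mapping that can be used to rewrite the script and asset URLs by hand. A HAR only contains requests made while recording, so anything fetched lazily, such as worker scripts, WebAssembly files, or language data, would need the OCR to actually run once before saving.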

I think I must be missing something here. Is this like polyfill.js, where the remote CDN is detecting the browser and serving slightly altered payloads?

@matsklevstad

Is it possible to create a version that can handle images that are upside down or rotated? If so, how?

@Lewiscowles1986

@matsklevstad that feels like a valid but separate issue from getting this running offline.
