Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extending to manhwa #2

Open
levavft opened this issue Apr 24, 2023 · 8 comments
Open

Extending to manhwa #2

levavft opened this issue Apr 24, 2023 · 8 comments

Comments

@levavft
Copy link

levavft commented Apr 24, 2023

Hi~ I was about to embark on creating something similar for manhwa's and then I found this very nice project.
backend-wise it should be really easy to extend this, for example using pytesseract.
I would love to create anything you need for the backend, I've created a local version of manga-ocr that uses pytesseract and its simple enough, I just don't really know how to embed it into your project as it is a bit more involved.

using pytesseract it should be possible to extend to other languages as well, easily. of course, its not as good as manga-ocr's ocr (I couldn't found good databases that I could use to copy their approach) but setting pytesseract as a default when there isn't anything better should be great :)

so, basically, tell me how to contribute and I'll have a pull request ready in no time ;p

@rDarge
Copy link

rDarge commented May 25, 2023

I've been thinking about trying to add additional backend options too - like adding an option to translate with DeepL instead of ChatGPT. If you're not sure how to contribute, but you have a stable fork of manga-ocr that uses pytesseract, can you upload it to a public repo? I'd be happy to take a look and provide some suggestions/support.

@levavft
Copy link
Author

levavft commented May 26, 2023

Alright, I'll make my version a bit more stable and clean, and upload it ^^ should take a few days at most.
I think it might be a good idea to also add a google cloud ocr option for those who have a google cloud key, I'll see what I can do ;p

@rDarge
Copy link

rDarge commented Jun 1, 2023

@levavft Just wanted to check in on this - Have you made some headway in your pytesseract fork?

@K-RT-Dev
Copy link
Owner

K-RT-Dev commented Jun 2, 2023

Thank you very much for your enthusiasm in contributing to the project :)

Unfortunately, I have had very little time to work on this side project. But I can tell you that in the next version, I will add:

  • Support for DeepL (using an API Key) and Google (without an API Key) as translators
  • Options to perform the same translation in multiple translation engines simultaneously
  • Option for GPT to take translations generated by different engines and combine them into an improved one

@rDarge The improvement to incorporate DeepL is almost ready. If you haven't started development yet, don't waste time on that.

@levavft A while ago, I had another project similar to this one (which I closed) that used pytesseract. I ended up abandoning it because creating an installable version with a moderately small size was impossible, very difficult to achieve. It would be very interesting and a great contribution if you manage to generate a Python installable that has pytesseract as a dependency.

@rDarge
Copy link

rDarge commented Jun 2, 2023

@K-RT-Dev Great! I'll create some additional issues for the other changes I've been working on so I can make sure we've got alignment before I put up a PR

@levavft
Copy link
Author

levavft commented Jun 3, 2023

Hey @K-RT-Dev and @rDarge sorry for the delayed response. I've been testing what I have against less clean text, and its awful. (My personal use case is pretty clean text). Specifically - Korean manhwa text often has bubbly letters, which pytesseract simply can't read. So to be honest I'm feeling like spending more time on tesseract might be a waste of time. Instead, it might be good to use google ocr, especially since you essentially get to use it for free if you're not planning on making money from it. I have no idea if it does better on such text (I haven't tested it at all) but at the very least it should be much easier to use and install.

The things you're currently working on sound great! I can't wait to see them in action.
I'll list some ideas that I had while playing with my tesseract version, and if something catches your eye I might spend some time on it (though, like you I seem to have somewhat reduced capacity for side projects ><)

Adding an option to view a page and create a list of bounding boxes to it, similar to this:
https://github.com/manisandro/gImageReader

assuming you like the previous idea, you can use chatgpt on the text as a whole. which should allow it to be much more natural (especially if you specifically ask chatgpt to make it sound natural)

using a spell checker to rate the quality of different ocr results (from different engines / with different pre-processing steps) and choosing the best one.

Anyhow, keep us updates and I'll update if I'll have something worth sharing ^^

@Kromtar
Copy link

Kromtar commented Jun 3, 2023

@levavft Could you share the set of images you're using to test text extraction? I have some models I could try to see their performance.

Adding an option to view a page and create a list of bounding boxes to it, similar to this: https://github.com/manisandro/gImageReader

We are aligned in our ideas. Precisely, the second mode of operation I am planning to integrate into the system consists of this.
My idea is as follows:

  1. From a page, be able to manually select the order of the texts.
  2. Have an option to write free text to describe what is happening in the accompanying images.
  3. Send the corresponding text sequence to GPT and add the personalized context if it is present.

I have conducted manual tests using this method, and the results are incredible. When GPT has the complete text from one or more narrators, it can infer dialogue exchanges much better. Additionally, if it has context describing what is happening (for example, "People are talking while they see a landscape"), it helps in identifying pronouns and verb tenses more accurately.

@levavft
Copy link
Author

levavft commented Jun 4, 2023

@Kromtar Sure! Here are the ones I had the most trouble with:
https://github.com/levavft/manhwa-ocr-test-files

Good to see people are on the same page, this could become a very nice tool for translators or just those who want to read manhwa ^^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants