Hanashi

A library for extracting text from manga pages.

OCR is performed by Tesseract.

Code Example

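import pytesseract
from hanashi.processor import page_processor

# Path to the manga page image (placeholder; replace with your own file)
filename = "page.png"

masks, lines, rectangles = page_processor.process(filename)
results = page_processor.extract_text(masks)
# The first element of each result is the extracted text block
blocks_of_text = [result[0] for result in results]
print("\n----\n".join(blocks_of_text))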
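The joining step in the example above can be wrapped in a small helper that also skips empty blocks. This is only a sketch: `join_blocks` and the sample data are hypothetical and not part of Hanashi, and it assumes each result is a sequence whose first element is the text, as the example suggests.

```python
from typing import Iterable, Sequence


def join_blocks(results: Iterable[Sequence], separator: str = "\n----\n") -> str:
    """Join the first element (the text) of each result, skipping blank blocks."""
    texts = []
    for result in results:
        if not result:
            continue  # ignore empty results
        text = str(result[0]).strip()
        if text:
            texts.append(text)
    return separator.join(texts)


# Hypothetical stand-in for the output of page_processor.extract_text(masks)
sample_results = [("Hello!", None), ("   ", None), ("Where are you going?", None)]
page_text = join_blocks(sample_results)
print(page_text)
```

With real data, the call would presumably be `join_blocks(page_processor.extract_text(masks))`.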

Prerequisites

$ sudo apt-get update
$ sudo apt-get install tesseract-ocr
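
To confirm that the Tesseract binary is reachable before running the library, a quick check like the following may help. It is a minimal sketch using only the Python standard library; `tesseract_available` is a hypothetical helper name, and it assumes the executable is called `tesseract`, as installed by the package above.

```python
import shutil


def tesseract_available(binary: str = "tesseract") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(binary) is not None


if not tesseract_available():
    print("tesseract was not found on PATH; install tesseract-ocr first")
```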

Installing

First, create a working directory, change into it, and clone the repo:

$ mkdir -p ./github
$ cd ./github
$ git clone https://github.com/Filo01/Hanashi.git

Create and activate a virtual environment (venv), then install the package with setup.py from the repository root:

$ python setup.py install

Running Tests

To run all tests

$ python -m unittest

To run specific tests

$ python -m unittest tests.<module_name>

License

This project is licensed under the MIT License; see the LICENSE file for details.
