EXTRACT is an optical character recognition (OCR) engine for various operating systems. It extracts text from an image and converts it to plain text.
The model is a very primitive version of Google's original Tesseract engine.
- os
- numpy
- PIL
- sys
- keras
- cropyble
- cv2
- shutil
a) Extracts text from an input image.
b) Works on lowercase and uppercase letters, numbers, and special characters.
c) Saves the output to output.txt so the text can be searched.
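As an illustration of point c), here is a minimal sketch of saving recognized text and searching it afterwards. The helper names `save_text` and `search_text` are hypothetical and are not taken from tesseract.py; only the file name output.txt comes from this README.

```python
from __future__ import annotations

from pathlib import Path


def save_text(text: str, path: str = "output.txt") -> None:
    """Write the recognized text to a plain-text file (UTF-8)."""
    Path(path).write_text(text, encoding="utf-8")


def search_text(query: str, path: str = "output.txt") -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that contain the query, ignoring case."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    needle = query.lower()
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if needle in line.lower()
    ]
```

Calling `search_text("invoice")` after a run would list every line of output.txt that mentions "invoice", together with its line number.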
NOTE1: The trained model is not provided, so run the script as it is the very first time to train the model. Once the model is trained, comment out 'Train_Model()' and then run the script for further use.
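If you would rather not edit the script between runs, one possible alternative is to train only when no saved model exists. This is a sketch under assumptions, not how tesseract.py currently works: the file name `model.h5` and the `train_fn` / `load_fn` callables are hypothetical, and you would need to wrap `Train_Model()` so that it saves the model to the given path.

```python
import os

MODEL_PATH = "model.h5"  # hypothetical file name; use whatever the script saves


def get_model(train_fn, load_fn, model_path=MODEL_PATH):
    """Train on the first run, then reuse the saved model on later runs.

    train_fn(model_path) must train, save to model_path, and return the model.
    load_fn(model_path) must load and return a previously saved model
    (with Keras, keras.models.load_model can play this role).
    """
    if os.path.exists(model_path):
        return load_fn(model_path)
    return train_fn(model_path)
```

With this guard, the first run trains and saves the model, and every later run skips training automatically.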
Run the script from your terminal: 'python3 tesseract.py'
Input Image | Output |
---|---|
- Akarsh Malik
- Angad Ripudaman Singh Bajwa
- To add characters of your own, add them to both the train and test datasets.
- Change the output size of the softmax layer in the Train_Model function to the total number of trained characters.
- Re-train the model (restore the 'Train_Model()' call if you commented it out).
- Test your image.
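The softmax size from the steps above can be derived instead of counted by hand. The sketch below assumes the datasets are laid out as one sub-folder per character, which this README does not state; adjust it if your layout differs. The function names are hypothetical.

```python
from pathlib import Path


def class_names(dataset_dir):
    """Return the sorted sub-folder names, assuming one folder per character."""
    return sorted(p.name for p in Path(dataset_dir).iterdir() if p.is_dir())


def softmax_units(train_dir, test_dir):
    """Return the number of classes, checking that train and test agree."""
    train = class_names(train_dir)
    test = class_names(test_dir)
    if train != test:
        mismatched = sorted(set(train) ^ set(test))
        raise ValueError(f"Classes differ between train and test: {mismatched}")
    return len(train)
```

The returned value is the number to use as the output size of the softmax layer in Train_Model. The check fails early if a new character was added to only one of the two datasets.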