Scanned pages of old books with ground truth, formerly used for OCR studies. The set is available in several versions that differ in resolution and binarization. Noised and denoised sets, produced by several methods, will be uploaded eventually.
text
dataset
ground-truth
text-data
binarization
ocr-database
text-database
old-books
old-documents
books-dataset
ocr-dataset
binarized-dataset
groundtruth
Updated Aug 25, 2017 - HTML