bbz-segment

00_demo_data contains sample data for running the script in 02_preprocessing. The full annotated data used in the paper is available on Dropbox.
01_selection contains a random page selection script.
02_preprocessing contains the full pipeline used to postprocess the ground truth (before DNN training).
03_training contains the code used to train the DNNs. Note that train.py contains AdamW optimizer code copied from https://github.com/OverLordGoldDragon/keras-adamw.
04_evaluation contains various scripts for evaluating performance, as well as our raw data (as sacred runs, see 04_evaluation/data).
05_prediction contains scripts for running our final models for prediction (see the Demo Page section below for the demo result). To run it yourself on this or other document images, first download the models from Dropbox and move them to 05_prediction/data/models. Then run 05_prediction/src/main.py to predict the files in 05_prediction/data/pages. Note that you need to have numpy, tensorflow and segmentation_models installed.
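
Before running the prediction script, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch of such a check and is not part of this repository: the function names (`missing_packages`, `is_populated`, `check_setup`) are hypothetical, and it only assumes the paths and package names mentioned above.

```python
import importlib.util
from pathlib import Path

# Packages the prediction step needs, as listed in this README.
REQUIRED_PACKAGES = ("numpy", "tensorflow", "segmentation_models")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the packages from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def is_populated(directory):
    """Return True if `directory` exists and contains at least one entry."""
    path = Path(directory)
    return path.is_dir() and any(path.iterdir())


def check_setup(repo_root="."):
    """Collect human-readable problems; an empty list means nothing obvious is missing."""
    prediction = Path(repo_root) / "05_prediction"
    problems = [f"package not installed: {name}" for name in missing_packages()]
    if not is_populated(prediction / "data" / "models"):
        problems.append("no models found in 05_prediction/data/models")
    if not is_populated(prediction / "data" / "pages"):
        problems.append("no page images found in 05_prediction/data/pages")
    if not (prediction / "src" / "main.py").is_file():
        problems.append("05_prediction/src/main.py not found")
    return problems


if __name__ == "__main__":
    # Hypothetical usage: run from the repository root.
    issues = check_setup(".")
    for issue in issues:
        print(issue)
    if not issues:
        print("Setup looks complete; run 05_prediction/src/main.py next.")
```

Run from the repository root, this prints one line per missing prerequisite. It does not verify that the downloaded model files are the right ones, only that the directories are not empty.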

Demo Page

Legend. Red: Background, Orange: Horizontal Separators, Green: Vertical Separators, Blue: Table Column Separators.

Legend. Red: Background, Blue: Text Region, Orange: Table Region, Green: Illustrations/Borders.
