Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks

hiarindam/document-image-classification-TL-SG

Contributors: Arindam Das, Saikat Roy, Ujjwal Bhattacharya, S.K. Parui

This research work has been made available here.

This page provides region-based pre-trained models for document image classification, for use in document structure learning. If you use the weight matrices, please note that we used Theano as the backend for all our experiments, so everything follows Theano's ordering conventions.

Please cite our work if you find it useful for your research.

@inproceedings{das2018document,
  title={Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks},
  author={Das, Arindam and Roy, Saikat and Bhattacharya, Ujjwal and Parui, Swapan K},
  booktitle={2018 24th International Conference on Pattern Recognition (ICPR)},
  pages={3180--3185},
  year={2018},
  organization={IEEE}
}

Theano to TensorFlow Weight Converter

Several users have been unable to load the weights properly in TensorFlow, with or without a converter, because the versions of Theano and Keras used for this project are quite old (late 2017/early 2018). Please also look at the section on preprocessing the input. This section deals with converting the weights from Theano to TensorFlow. The conversion module was developed by Auke Zijlstra. Although he was unable to replicate our exact results with this script, he did get the models working. We provide excerpts from his communication with us on the usage of the script.

"... Although I have not been able to fully replicate your results, I have been able to achieve 0.87 accuracy score on the RVL-CDIP test set using your holistic model weights with a Keras+tensorflow setup. My steps to convert your Theano ordered weights into Tensorflow ordering were as follows:

Hopefully this gives a way forward for people who have trouble using our weights with newer versions of Keras, Theano, TensorFlow and the like.
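The quoted steps are not reproduced on this page. As a general illustration only, and not the script mentioned above, the NumPy sketch below shows three transformations that commonly come up when Keras weights saved under a Theano backend are moved to TensorFlow: flipping the spatial axes of convolution kernels, reordering kernel axes, and reordering the first dense layer after a flatten. The function names are hypothetical, and which of these steps (if any) the released weight files need is an assumption you should verify against the actual weight shapes.

```python
import numpy as np


def flip_conv_kernel(kernel):
    """Flip the two spatial axes of a kernel stored as (rows, cols, in, out).

    Theano applies true convolution while TensorFlow applies
    cross-correlation, so kernels are commonly flipped spatially.
    """
    kernel = np.asarray(kernel)
    if kernel.ndim != 4:
        raise ValueError("expected a 4-D convolution kernel")
    return np.copy(kernel[::-1, ::-1, :, :])


def oihw_to_hwio(kernel):
    """Reorder a kernel from (out, in, rows, cols) to (rows, cols, in, out).

    Only relevant if the weight file stores kernels in the older
    channels-first kernel layout (an assumption, check the shapes).
    """
    return np.transpose(np.asarray(kernel), (2, 3, 1, 0))


def reorder_dense_after_flatten(weights, channels, rows, cols):
    """Reorder the first dense layer after Flatten.

    A channels-first feature map flattens as (channels, rows, cols);
    a channels-last one flattens as (rows, cols, channels). The dense
    weight rows must be permuted to match.
    """
    weights = np.asarray(weights)
    units = weights.shape[1]
    w = weights.reshape(channels, rows, cols, units)
    w = np.transpose(w, (1, 2, 0, 3))
    return w.reshape(rows * cols * channels, units)
```

A quick sanity check for the dense reordering is that a channels-first feature map flattened and multiplied by the original weights gives the same result as the channels-last version of that map multiplied by the reordered weights.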

Detailed Guide for Tensorflow 2.0

Martin H. Normark was nice enough to provide a detailed guide for running the models with Tensorflow 2.0.

Dataset

RVL-CDIP has been used to validate the proposed methodology. This dataset consists of 400,000 scanned grayscale images distributed among 16 categories. The collection is subdivided into training, validation and test sets containing 320,000, 40,000 and 40,000 images respectively.

Preprocessing

Please look at this comment for a small example of how to preprocess the input for the networks.

Proposed Architecture

Experimental Results

Performance Comparison with State-of-the-art Approaches

| Method | Accuracy (%) | Comments |
| --- | --- | --- |
| Harley et al. [1] | 89.90 | Document region based DCNN models with transfer learning |
| Tensmeyer et al. [2] | 89.31 | Spatial pyramidal pooling based AlexNet without transfer learning |
| Tensmeyer et al. [2] | 90.94 | Same model as above with increased image dimension (384×384), keeping the aspect ratio the same |
| Csurka et al. [3] | 90.70 | GoogLeNet with weights transferred from ImageNet |
| Afzal et al. [4] | 90.97 | VGG-16 with weights transferred from ImageNet |
| Kölsch et al. [5] | 90.05 | Weights transferred from ImageNet to VGG-16, with an ELM in place of the MLP |
| Proposed | 91.11 | VGG-16 model trained on holistic samples with weights transferred from ImageNet |
| Proposed | 92.21 | Inter- and intra-domain transfer learning on region based DCNNs and MLNN based stacking |
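
The best result above combines several region based DCNNs through stacking. As a rough illustration of the general idea, and not the authors' code, the sketch below concatenates the class probabilities emitted by a few base models into one meta-feature vector per document and passes it to a meta-classifier. The number of base models, the random probabilities, and the single softmax layer with random weights are all placeholders; the paper uses a trained MLNN as the meta-classifier.

```python
import numpy as np

NUM_CLASSES = 16  # number of RVL-CDIP categories


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def build_meta_features(base_probs):
    """Concatenate per-model class probabilities along the feature axis."""
    return np.concatenate(base_probs, axis=1)


def meta_predict(meta_features, weights, bias):
    """Placeholder meta-classifier: one softmax layer, untrained here."""
    return softmax(meta_features @ weights + bias).argmax(axis=1)


rng = np.random.default_rng(0)
num_models, num_docs = 3, 4  # placeholder sizes, not from the paper

# Stand-ins for the softmax outputs of the base DCNNs.
base_probs = [
    softmax(rng.normal(size=(num_docs, NUM_CLASSES)))
    for _ in range(num_models)
]

meta_x = build_meta_features(base_probs)  # one row per document
weights = rng.normal(size=(num_models * NUM_CLASSES, NUM_CLASSES))
bias = np.zeros(NUM_CLASSES)
pred = meta_predict(meta_x, weights, bias)
```

In practice the meta-classifier would be trained on base-model outputs for held-out data, so that it learns how much to trust each region model per class.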

Pre-trained Models

The trained models from this publication have been made available here. Please note that all weight matrices are formatted for the Theano backend, not TensorFlow. This also applies to the input, which uses Theano-style dimension ordering.
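
Theano-style dimension ordering places the channel axis before the spatial axes. A minimal sketch of converting a channels-last batch to that layout with NumPy follows; the function name and the 224×224×3 batch shape are placeholders for illustration, not values taken from this repository.

```python
import numpy as np


def to_channels_first(batch):
    """Convert a batch from (N, rows, cols, channels) to (N, channels, rows, cols)."""
    batch = np.asarray(batch)
    if batch.ndim != 4:
        raise ValueError("expected a 4-D batch of images")
    return np.transpose(batch, (0, 3, 1, 2))


# Placeholder batch: 2 images, 224x224 pixels, 3 channels (assumed sizes).
batch_nhwc = np.zeros((2, 224, 224, 3), dtype=np.float32)
batch_nchw = to_channels_first(batch_nhwc)
```

The transpose only changes the axis order; any resizing or normalization the networks expect still has to be applied separately.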

References

[1] A. W. Harley, A. Ufkes, and K. G. Derpanis, “Evaluation of deep convolutional nets for document image classification and retrieval,” in Document Analysis and Recognition (ICDAR), 2015 13th International Conference on. IEEE, 2015, pp. 991–995.

[2] C. Tensmeyer and T. Martinez, “Analysis of convolutional neural networks for document image classification,” arXiv preprint arXiv:1708.03273, 2017.

[3] G. Csurka, D. Larlus, A. Gordo, and J. Almazan, “What is the right way to represent document images?” arXiv preprint arXiv:1603.01076, 2016.

[4] M. Z. Afzal, A. Kölsch, S. Ahmed, and M. Liwicki, “Cutting the error by half: Investigation of very deep CNN and advanced training strategies for document image classification,” arXiv preprint arXiv:1704.03557, 2017.

[5] A. Kölsch, M. Z. Afzal, M. Ebbecke, and M. Liwicki, “Real-time document image classification using deep CNN and extreme learning machines,” in Document Analysis and Recognition (ICDAR), 2017 14th IAPR International Conference on. IEEE, 2017.