ABSTRACT
When digitizing a document into an image, it is common to include a surrounding border region to visually indicate that the entire document is present in the image. However, this border should be removed prior to automated processing. In this work, we present a deep learning system, PageNet, which identifies the main page region in an image in order to segment content from both textual and non-textual border noise. In PageNet, a Fully Convolutional Network produces a pixel-wise segmentation, which is post-processed into a quadrilateral region. We evaluate PageNet on four collections of historical handwritten documents, obtaining over 94% mean intersection over union on all datasets and approaching human performance on two collections. Additionally, we show that PageNet can segment documents that are overlaid on top of other documents.
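The evaluation metric named in the abstract, mean intersection over union, can be illustrated with a minimal sketch. It assumes that predicted and ground-truth page regions are available as binary masks of equal size (1 for page pixels, 0 for border); the function names and the list-of-rows mask representation are hypothetical choices made for illustration and are not taken from the PageNet implementation.

```python
from typing import Iterable, Sequence, Tuple

# Assumed representation: a mask is a list of rows of 0/1 pixels,
# where 1 marks a pixel inside the page region.
Mask = Sequence[Sequence[int]]


def intersection_over_union(predicted: Mask, truth: Mask) -> float:
    """Pixel-wise IoU between two binary masks of identical shape."""
    if len(predicted) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(predicted, truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for p_row, t_row in zip(predicted, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                intersection += 1  # pixel is page in both masks
            if p or t:
                union += 1  # pixel is page in at least one mask

    # Convention assumed here: two empty masks agree perfectly.
    return intersection / union if union else 1.0


def mean_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Average IoU over (predicted, truth) mask pairs, one pair per image."""
    scores = [intersection_over_union(p, t) for p, t in pairs]
    if not scores:
        raise ValueError("at least one mask pair is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    predicted = [[0, 1, 1],
                 [0, 1, 1]]
    truth = [[1, 1, 0],
             [1, 1, 0]]
    # 2 shared page pixels out of 6 covered by either mask.
    print(intersection_over_union(predicted, truth))
    print(mean_iou([(predicted, truth), (truth, truth)]))
```

A score of 1.0 means the predicted region matches the annotation exactly. Averaging per-image scores, as done here, is one common reading of "mean IoU"; the paper may aggregate differently.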
PageNet: Page Boundary Extraction in Historical Handwritten Documents