DOI: 10.1145/3151509.3151522
research-article

PageNet: Page Boundary Extraction in Historical Handwritten Documents

Published: 10 November 2017

ABSTRACT

When digitizing a document into an image, it is common to include a surrounding border region to visually indicate that the entire document is present in the image. However, this border should be removed prior to automated processing. In this work, we present a deep learning system, PageNet, which identifies the main page region in an image in order to segment content from both textual and non-textual border noise. In PageNet, a Fully Convolutional Network obtains a pixel-wise segmentation, which is post-processed into a quadrilateral region. We evaluate PageNet on four collections of historical handwritten documents, obtaining over 94% mean intersection over union on all datasets and approaching human performance on two of them. Additionally, we show that PageNet can segment documents that are overlaid on top of other documents.
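
To make the evaluation metric concrete, the following is a minimal sketch of intersection over union (IoU) between a predicted page mask and a ground-truth mask, together with a deliberately simplified post-processing step. It is not the paper's implementation: the function names, the toy masks, and the use of an axis-aligned bounding box in place of a general quadrilateral are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Mask = List[List[int]]  # 1 = page pixel, 0 = border or background
Quad = List[Tuple[int, int]]  # four (x, y) corners, clockwise from top-left


def mask_to_quad(mask: Mask) -> Optional[Quad]:
    """Return the axis-aligned box enclosing all foreground pixels.

    Assumption: a real system would fit a general (possibly rotated)
    quadrilateral; an axis-aligned box is used here only to keep the
    sketch short.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None  # no page pixels were predicted
    top, bottom = min(rows), max(rows)
    left, right = min(cols), max(cols)
    return [(left, top), (right, top), (right, bottom), (left, bottom)]


def rasterize_box(quad: Quad, height: int, width: int) -> Mask:
    """Turn the axis-aligned box from mask_to_quad back into a filled mask."""
    (left, top), _, (right, bottom), _ = quad
    return [
        [1 if top <= y <= bottom and left <= x <= right else 0
         for x in range(width)]
        for y in range(height)
    ]


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks of equal size."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0  # two empty masks agree fully


def mean_iou(pairs: Sequence[Tuple[Mask, Mask]]) -> float:
    """Average IoU over (prediction, ground truth) pairs."""
    scores = [iou(pred, truth) for pred, truth in pairs]
    return sum(scores) / len(scores)


# Toy example: a noisy pixel-wise prediction with two missing page pixels.
truth = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
pred = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
quad = mask_to_quad(pred)
cleaned = rasterize_box(quad, height=len(pred), width=len(pred[0]))
print(iou(pred, truth), iou(cleaned, truth))
```

In this toy case the raw prediction overlaps the ground truth on 6 of 8 pixels, while the box fitted around it recovers the full page region, which suggests why reducing a pixel-wise output to a simple region can help when the true page is close to a quadrilateral.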


Published in

HIP '17: Proceedings of the 4th International Workshop on Historical Document Imaging and Processing
November 2017
129 pages
ISBN: 9781450353908
DOI: 10.1145/3151509

Copyright © 2017 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article, Research, Refereed limited

Acceptance Rates

HIP '17 paper acceptance rate: 19 of 33 submissions, 58%. Overall acceptance rate: 52 of 90 submissions, 58%.
