DOI: 10.1145/3151509.3151521
research-article

Robust Heartbeat-based Line Segmentation Methods for Regular Texts and Paratextual Elements

Published: 10 November 2017

ABSTRACT

We have developed a simple but powerful extension for two well-known line segmentation methods that makes them more robust on historical manuscripts with almost regular line spacing. Contrary to the intuitive impression that such manuscripts are easy to handle, existing methods and tools fail to correctly segment some columns, mainly because of empty or nearly empty lines. Since lines in historical documents frequently occur at regular intervals, it is advisable to take this knowledge into account: this regular rhythm, or heartbeat, can be used to filter out irregular line candidates and then to bridge any resulting gaps. Based on our literature review, our method appears to be the only one that can detect intentionally empty lines between text lines, i.e., lines skipped by the scribe. Such paratextual information can be of great importance for understanding the layout of documents. We tested our approach, which uses the appearance heartbeat of the text lines, on two well-known line segmentation methods and show that it significantly improves the quality of the results and simplifies parameter tuning.
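To make the heartbeat idea more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that line candidates have already been reduced to vertical positions (for example, row centers taken from a projection profile), estimates the line period as the median gap between candidates, keeps only candidates that sit on the resulting grid, and bridges empty grid slots with predicted positions that are flagged as possibly empty lines. The function names (`estimate_period`, `grid_offset`, `apply_heartbeat`) and the tolerance parameter `tol` are hypothetical.

```python
from statistics import median

# Hypothetical sketch: NOT the method from the paper. It only illustrates how a
# regular line period ("heartbeat") can filter and bridge line candidates.


def estimate_period(ys):
    """Robust line-spacing estimate: median gap between sorted candidate positions."""
    ys = sorted(ys)
    if len(ys) < 2:
        raise ValueError("need at least two line candidates")
    period = median([b - a for a, b in zip(ys, ys[1:])])
    if period <= 0:
        raise ValueError("degenerate line spacing")
    return period


def grid_offset(y, anchor, period):
    """Return (slot index k, distance from y to the grid position anchor + k * period)."""
    k = round((y - anchor) / period)
    return k, abs(y - (anchor + k * period))


def apply_heartbeat(ys, tol=0.25):
    """Keep on-grid candidates and bridge empty grid slots.

    Returns (period, lines), where lines is a list of (y, bridged) tuples.
    bridged=True marks a predicted line with no candidate, e.g. a line that
    the scribe may have deliberately left empty.
    """
    if not 0 < tol < 0.5:
        raise ValueError("tol must lie in (0, 0.5)")
    period = estimate_period(ys)
    limit = tol * period

    # Anchor the grid at the candidate whose grid explains the most candidates.
    def support(anchor):
        return sum(1 for y in ys if grid_offset(y, anchor, period)[1] <= limit)

    anchor = max(ys, key=support)

    # Assign on-grid candidates to slots; off-grid ones are discarded as irregular.
    slots = {}
    for y in ys:
        k, off = grid_offset(y, anchor, period)
        if off <= limit and (k not in slots or off < slots[k][1]):
            slots[k] = (y, off)

    # Walk every slot between the first and last kept line; fill gaps with predictions.
    lines = []
    for k in range(min(slots), max(slots) + 1):
        if k in slots:
            lines.append((slots[k][0], False))
        else:
            lines.append((anchor + k * period, True))
    return period, lines


if __name__ == "__main__":
    # 175 is a spurious candidate; the line expected near 190 has no candidate.
    period, lines = apply_heartbeat([100, 130, 160, 175, 220, 250])
    print(period)  # 30
    print(lines)
    # [(100, False), (130, False), (160, False), (190, True), (220, False), (250, False)]
```

The median gap is used because it tolerates a minority of spurious or missing candidates; a real system would more likely estimate the period from the image itself (for example, via autocorrelation of the projection profile), and the fixed tolerance here stands in for whatever parameters the published method actually uses.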

      Published in

        ACM Other conferences
        HIP '17: Proceedings of the 4th International Workshop on Historical Document Imaging and Processing
        November 2017
        129 pages
        ISBN:9781450353908
        DOI:10.1145/3151509

        Copyright © 2017 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        HIP '17 Paper Acceptance Rate: 19 of 33 submissions, 58%. Overall Acceptance Rate: 52 of 90 submissions, 58%.
