ABSTRACT
We have developed a simple but powerful extension for two well-known line segmentation methods that makes them more robust on historical manuscripts with almost regular line spacing. Contrary to the intuitive impression that such manuscripts are easy to handle, existing methods and tools fail to segment some columns correctly, mainly because of empty or nearly empty lines. Since lines in historical documents frequently occur at regular intervals, it is advisable to take this knowledge into account. According to our literature review, our method appears to be the only one that can detect intentionally empty lines between text lines, i.e., lines skipped by the scribe. Such paratextual information can be of immense importance for understanding the layout of documents. We refer to the regular occurrence of text lines as their heartbeat; it can be used to filter out irregular line candidates and then to bridge any resulting gaps. We tested our approach, which uses the appearance heartbeat of the text lines, on two well-known line segmentation methods and show that it significantly improves the quality of the results and simplifies parameter tuning.
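The filter-then-bridge idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that candidate lines are given as vertical positions in pixels, that the heartbeat can be approximated by the median gap between consecutive candidates, and that the first candidate is a trustworthy anchor. The names `estimate_heartbeat`, `regularize`, and the `tolerance` parameter are hypothetical.

```python
from statistics import median


def estimate_heartbeat(positions):
    """Return the median gap between consecutive candidate line positions."""
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two candidate lines")
    beat = median(gaps)
    if beat <= 0:
        raise ValueError("candidate lines must not all coincide")
    return beat


def regularize(positions, tolerance=0.25):
    """Drop off-beat candidates, then bridge gaps with inferred lines.

    Returns (position, inferred) pairs; inferred=True marks a line that was
    not detected but is implied by the heartbeat (possibly a skipped line).
    """
    ordered = sorted(positions)
    beat = estimate_heartbeat(ordered)

    # Filtering: keep a candidate only if it lies close to a whole number
    # of beats after the last kept line. The first candidate is trusted
    # as the anchor (an assumption of this sketch).
    kept = [ordered[0]]
    for y in ordered[1:]:
        delta = y - kept[-1]
        beats = round(delta / beat)
        if beats >= 1 and abs(delta - beats * beat) <= tolerance * beat:
            kept.append(y)

    # Bridging: where two kept lines are several beats apart, insert
    # evenly spaced inferred lines between them.
    result = [(kept[0], False)]
    for prev, cur in zip(kept, kept[1:]):
        beats = round((cur - prev) / beat)
        step = (cur - prev) / beats
        for k in range(1, beats):
            result.append((prev + k * step, True))
        result.append((cur, False))
    return result


print(regularize([100, 150, 200, 215, 250, 350, 400]))
```

With the sample input, the median gap is 50, so the candidate at 215 is rejected as off-beat and a line at 300 is inferred between 250 and 350. A median is used here only because it tolerates a few spurious candidates; any robust period estimate could be substituted.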
Index Terms
- Robust Heartbeat-based Line Segmentation Methods for Regular Texts and Paratextual Elements