research-article

Ground truth creation for handwriting recognition in historical documents

Authors:
Andreas Fischer, Institute of Computer Science and Applied Mathematics, Bern, Switzerland
Emanuel Indermühle, Institute of Computer Science and Applied Mathematics, Bern, Switzerland
Horst Bunke, Institute of Computer Science and Applied Mathematics, Bern, Switzerland
Gabriel Viehhauser, Institut für Germanistik, CH, Bern
Michael Stolz, Institut für Germanistik, CH, Bern

DAS '10: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, June 2010, Pages 3–10, https://doi.org/10.1145/1815330.1815331

Published: 09 June 2010

ABSTRACT

Handwriting recognition in historical documents is vital for the creation of digital libraries. Readily available ground truth data plays a central role in the development of new recognition technologies. For historical documents, ground truth creation is more difficult and time-consuming than for modern documents. In this paper, we present a semi-automatic ground truth creation procedure for historical documents that takes into account noisy backgrounds and transcription alignment. The proposed procedure is demonstrated on the IAM Historical Handwriting Database (IAM-HistDB), which is currently under construction and will include several hundred Old German manuscripts. We show how laypersons can efficiently create a ground truth for handwriting recognition using a small set of algorithmic tools and few manual interactions.

Index Terms

1. Applied computing
  1. Document management and text processing

Published in
DAS '10: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
June 2010
490 pages
ISBN: 9781605587738
DOI: 10.1145/1815330
General Chairs: David Doermann (University of Maryland, College Park), Venu Govindaraju (University at Buffalo, SUNY), Daniel Lopresti (Lehigh University), Prem Natarajan (Raytheon BBN Technologies)
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher: Association for Computing Machinery, New York, NY, United States