DOI: 10.1145/1815330.1815331
research-article

Ground truth creation for handwriting recognition in historical documents

Published: 9 June 2010

ABSTRACT

Handwriting recognition in historical documents is vital for the creation of digital libraries. The creation of readily available ground truth data plays a central role in the development of new recognition technologies. For historical documents, ground truth creation is more difficult and time-consuming than for modern documents. In this paper, we present a semi-automatic ground truth creation procedure for historical documents that takes into account noisy backgrounds and transcription alignment. The proposed procedure is demonstrated on the IAM Historical Handwriting Database (IAM-HistDB), which is currently under construction and will include several hundred Old German manuscripts. We show how laypersons can efficiently create a ground truth for handwriting recognition with a small set of algorithmic tools and few manual interactions.


    • Published in

      ACM Other conferences
      DAS '10: Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
      June 2010
      490 pages
      ISBN:9781605587738
      DOI:10.1145/1815330

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
