ABSTRACT
Handwriting recognition in historical documents is vital for the creation of digital libraries. The creation of readily available ground truth data plays a central role in the development of new recognition technologies. For historical documents, ground truth creation is more difficult and time-consuming than for modern documents. In this paper, we present a semi-automatic ground truth creation procedure for historical documents that takes into account noisy backgrounds and transcription alignment. The proposed procedure is demonstrated on the IAM Historical Handwriting Database (IAM-HistDB), which is currently under construction and will include several hundred Old German manuscripts. We show how laypersons can efficiently create a ground truth for handwriting recognition with a small set of algorithmic tools and few manual interactions.
Index Terms
- Ground truth creation for handwriting recognition in historical documents