DOI: 10.1145/1577802.1577810
research-article

Optical character recognition of Gurmukhi script using multiple classifiers

Published: 25 July 2009

ABSTRACT

In this paper, we present a robust, font-independent Gurmukhi OCR system that also performs reasonably well on old documents. The OCR is based on four classifiers operating in serial and parallel modes. The results of the classifiers operating in parallel mode are combined using a corpus-based weighted voting method. Combining multiple classifiers so that their individual weaknesses are compensated while their individual strengths are preserved results in significantly better performance than can be achieved with a single classifier. The problem of broken characters, which frequently appear in old documents, is also tackled using an algorithm based on structural features.
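As a rough illustration of the weighted voting idea mentioned in the abstract, the following Python sketch combines the outputs of classifiers running in parallel. It is not the paper's method: the abstract does not say how the corpus-based weights are computed, so the classifier names, the weight values, and the `weighted_vote` function are all hypothetical. The weights are simply assumed to have been estimated beforehand, for example from each classifier's accuracy on a labeled corpus.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the highest total weight.

    predictions: dict mapping classifier name -> predicted label
    weights: dict mapping classifier name -> non-negative weight
             (assumed to be estimated in advance; hypothetical here)
    """
    scores = defaultdict(float)
    for name, label in predictions.items():
        scores[label] += weights[name]
    # Iterate labels in sorted order so ties are broken deterministically.
    return max(sorted(scores), key=lambda label: scores[label])


# Hypothetical classifier names, weights, and labels, for illustration only.
weights = {"clf_a": 0.5, "clf_b": 0.3, "clf_c": 0.3}
predictions = {"clf_a": "ka", "clf_b": "kha", "clf_c": "kha"}
print(weighted_vote(predictions, weights))
```

In this example the two lower-weighted classifiers agree and together outweigh the single higher-weighted one, so their label is chosen. With plain majority voting every weight would be equal.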


Published in

MOCR '09: Proceedings of the International Workshop on Multilingual OCR
July 2009
139 pages
ISBN: 9781605586984
DOI: 10.1145/1577802

Copyright © 2009 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 17 of 34 submissions, 50%
