DOI: 10.1145/2505377
MOCR '13: Proceedings of the 4th International Workshop on Multilingual OCR
ACM 2013 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
MOCR '13: 4th International Workshop on Multilingual OCR Washington D.C. USA 24 August 2013
ISBN:
978-1-4503-2114-3
Published:
24 August 2013
Sponsors:
BBN Technologies

Abstract

It is our great pleasure to welcome all participants to the 2013 International Workshop on Multilingual OCR (MOCR 2013). This is the 4th edition of the series, and it emphasizes the importance of multilingual OCR in digitizing the enormous amounts of text material in multiple languages from across the world that are not born-digital, and in making that material accessible for search and information retrieval. The scope of this field now extends to non-traditional documents, such as scene images containing text captured by digital cameras, as well as online recognition of finger- or stylus-based text input from touchscreen desktops, tablets and other mobile platforms.

In keeping with the tradition of earlier editions of this Workshop, MOCR '13 is being held in conjunction with the 12th International Conference on Document Analysis and Recognition (ICDAR 2013) in Washington, DC, USA, on August 24, 2013. The Workshop is organized as a single-track, one-day event with an all-oral presentation format. All papers underwent the standard peer review process, and the acceptance rate was about 50%. The official workshop proceedings are being published in the ACM International Conference Proceedings Series, available online as part of the ACM Digital Library. Please note that any citations should reference the online proceedings and not the unofficial hardcopy proceedings distributed at the workshop.

research-article
Multilingual OCR research and applications: an overview

This paper offers an overview of the current approaches to research in the field of off-line multilingual OCR. Typically, off-line OCR systems are designed for a particular script or language. However, the ideal approach to multilingual OCR would likely ...

SESSION: Script recognition & word spotting
research-article
HMM-based script identification for OCR

While current OCR systems are able to recognize text in an increasing number of scripts and languages, typically they still need to be told in advance what those scripts and languages are. We propose an approach that repurposes the same HMM-based system ...

research-article
A bilingual Gurmukhi-English OCR based on multiple script identifiers and language models

English words are frequently encountered in Gurmukhi texts. A monolingual Gurmukhi OCR will recognize such words as garbage. It becomes necessary to add bilingual capability to the Gurmukhi OCR to recognize English text too. But adding bilingual ...

research-article
Word level script recognition for Uighur document mixed with English script

Script recognition is one of the key technologies in Uighur OCR research, as it is common to find English words or sentences in Uighur documents, especially in scientific documents. A word-level script recognition method is presented in this paper. The ...

research-article
Bag-of-features HMMs for segmentation-free Bangla word spotting

In this paper we present how Bag-of-Features Hidden Markov Models can be applied to printed Bangla word spotting. These statistical models allow for an easy adaption to different problem domains. This is possible due to the integration of automatically ...

SESSION: Recognition 1
research-article
Low resolution Arabic recognition with multidimensional recurrent neural networks

OCR of multi-font Arabic text is difficult due to large variations in character shapes from one font to another. It becomes even more challenging if the text is rendered at very low resolution. This paper describes a multi-font, low resolution, and open ...

research-article
Recognition of Nastalique Urdu ligatures

There has been considerable work on Arabic OCR. However, all that work is based on the Naskh style. Urdu script is based on the Arabic alphabet, but uses the Nastalique style. The Nastalique style makes OCR in general, and character segmentation in particular, a ...

research-article
An approach for Bangla and Devanagari video text recognition

Extraction and recognition of Bangla text from video frame images is challenging due to variation in font type and style, complex color backgrounds, low resolution, low contrast, etc. In this paper, we propose an algorithm for extraction and recognition of ...

research-article
Can we build language-independent OCR using LSTM networks?

Language models or recognition dictionaries are usually considered an essential step in OCR. However, using a language model complicates training of OCR systems, and it also narrows the range of texts that an OCR system can be used with. Recent results ...

SESSION: Document analysis
research-article
Re-targeting of multi-script document images for handheld devices

We propose here a technique for transforming the layout of a printed document image to a new user-conducive layout. Its objective is to provide a better display on a low-resolution screen, giving the viewer comfort and convenience while reading. ...

research-article
Ruling-based table analysis for noisy handwritten documents

Table analysis can be a valuable step in document image analysis. In the case of noisy handwritten documents, various artifacts complicate the task of locating tables on a page and segmenting them into cells. Our ruling-based approach first detects line ...

research-article
A robust table registration method for batch table OCR processing

A robust table registration method is proposed in this paper for a better understanding of structured information from scanned table images. Scanned images can be heavily degraded because of scanning effects, binarization or the document itself. For ...

research-article
Text graphic separation in Indian newspapers

Digitization of newspaper articles is important for registering historical events. Layout analysis of Indian newspapers is a challenging task due to the presence of different font sizes, font styles and random placement of text and non-text regions. In ...

research-article
Multi-script robust reading competition in ICDAR 2013

A competition was organized by the authors to detect text from scene images. The motivation was to look for script-independent algorithms that detect the text and extract it from the scene images, which may be applied directly to an unknown script. The ...

SESSION: Recognition 2
research-article
Unconstrained handwritten Devanagari character recognition using convolutional neural networks

In this paper, we introduce a novel offline strategy for recognition of online handwritten Devanagari characters entered in an unconstrained manner. Unlike the previous approaches based on standard classifiers - SVM, HMM, ANN and trained on statistical, ...

research-article
Global and local features for recognition of online handwritten numerals and Tamil characters

Feature extraction is a key step in the recognition of online handwritten data and is well investigated in the literature. In the case of Tamil online handwritten characters, global features such as those derived from discrete Fourier transform (DFT), ...

research-article
Levenshtein distance metric based holistic handwritten word recognition

The rapid spread of pen-based digital devices and touch screen devices coupled with their affordability, and capability to take technology and digitization of data to the grassroots, has made online handwriting recognition an active field of research. ...

research-article
Recognition of offline handwritten numerals using an ensemble of MLPs combined by Adaboost

In this article, we present our recent study of offline recognition of handwritten numerals of three Indian scripts -- Devanagari, Bangla and Oriya. Here, we propose a novel approach to combining multiple MLP classifiers with varying number of ...

Contributors
  • University at Buffalo, The State University of New York
  • Amazon.com, Inc.
  • Indian Institute of Technology Jodhpur
  • Lehigh University
  • BBN Technologies

Acceptance Rates

MOCR '13 Paper Acceptance Rate: 17 of 34 submissions, 50%
Overall Acceptance Rate: 17 of 34 submissions, 50%

Year        Submitted  Accepted  Rate
MOCR '13    34         17        50%
Overall     34         17        50%