It is our great pleasure to welcome all participants to the 2013 International Workshop on Multilingual OCR (MOCR 2013). This is the 4th edition of the series, and it emphasizes the importance of multilingual OCR in digitizing the enormous amounts of text material in multiple languages from across the world that are not born-digital, and in making that material accessible for search and information retrieval. The scope of this field now extends to non-traditional documents such as scene images containing text captured by digital cameras, as well as online recognition of finger- or stylus-based text input on touchscreen desktops, tablets, and other mobile platforms.
In keeping with the tradition of earlier editions of this Workshop, MOCR '13 is being held in conjunction with the 12th International Conference on Document Analysis and Recognition (ICDAR 2013) in Washington, DC, USA, on August 24, 2013. The Workshop is organized as a single-track, one-day event with an all-oral presentation format. All papers underwent the standard peer review process, and the acceptance rate was about 50%. The official workshop proceedings are being published in the ACM International Conference Proceedings Series, available online as part of the ACM Digital Library. Please note that any citations should reference the online proceedings and not the unofficial hardcopy proceedings distributed at the workshop.
Proceeding Downloads
Multilingual OCR research and applications: an overview
This paper offers an overview of the current approaches to research in the field of off-line multilingual OCR. Typically, off-line OCR systems are designed for a particular script or language. However, the ideal approach to multilingual OCR would likely ...
HMM-based script identification for OCR
While current OCR systems are able to recognize text in an increasing number of scripts and languages, typically they still need to be told in advance what those scripts and languages are. We propose an approach that repurposes the same HMM-based system ...
A bilingual Gurmukhi-English OCR based on multiple script identifiers and language models
English words are frequently encountered in Gurmukhi texts. A monolingual Gurmukhi OCR will recognize such words as garbage. It becomes necessary to add bilingual capability to the Gurmukhi OCR to recognize English text too. But adding bilingual ...
Word level script recognition for Uighur document mixed with English script
Script recognition is one of the key technologies in Uighur OCR research, as it is common to find English words or sentences in Uighur documents, especially in scientific documents. A word-level script recognition method is presented in this paper. The ...
Bag-of-features HMMs for segmentation-free Bangla word spotting
In this paper we present how Bag-of-Features Hidden Markov Models can be applied to printed Bangla word spotting. These statistical models allow for an easy adaption to different problem domains. This is possible due to the integration of automatically ...
Low resolution Arabic recognition with multidimensional recurrent neural networks
OCR of multi-font Arabic text is difficult due to large variations in character shapes from one font to another. It becomes even more challenging if the text is rendered at very low resolution. This paper describes a multi-font, low resolution, and open ...
Recognition of Nastalique Urdu ligatures
There has been considerable work on Arabic OCR. However, all that work is based on the Naskh style. Urdu script is based on the Arabic alphabet, but uses the Nastalique style. The Nastalique style makes OCR in general, and character segmentation in particular, a ...
An approach for Bangla and Devanagari video text recognition
Extraction and recognition of Bangla text from video frame images is challenging due to variation in font type and style, complex color backgrounds, low resolution, low contrast, etc. In this paper, we propose an algorithm for extraction and recognition of ...
Can we build language-independent OCR using LSTM networks?
Language models or recognition dictionaries are usually considered an essential step in OCR. However, using a language model complicates training of OCR systems, and it also narrows the range of texts that an OCR system can be used with. Recent results ...
Re-targeting of multi-script document images for handheld devices
We propose here a technique for transforming the layout of a printed document image to a new user-conducive layout. Its objective is to provide a better display on a low-resolution screen, giving the viewer comfort and convenience while reading. ...
Ruling-based table analysis for noisy handwritten documents
Table analysis can be a valuable step in document image analysis. In the case of noisy handwritten documents, various artifacts complicate the task of locating tables on a page and segmenting them into cells. Our ruling-based approach first detects line ...
A robust table registration method for batch table OCR processing
A robust table registration method is proposed in this paper for a better understanding of structured information from scanned table images. Scanned images can be heavily degraded because of scanning effects, binarization, or the document itself. For ...
Text graphic separation in Indian newspapers
Digitization of newspaper articles is important for registering historical events. Layout analysis of Indian newspapers is a challenging task due to the presence of different font sizes, font styles, and random placement of text and non-text regions. In ...
Multi-script robust reading competition in ICDAR 2013
A competition was organized by the authors to detect text from scene images. The motivation was to look for script-independent algorithms that detect the text and extract it from the scene images, which may be applied directly to an unknown script. The ...
Unconstrained handwritten Devanagari character recognition using convolutional neural networks
In this paper, we introduce a novel offline strategy for recognition of online handwritten Devanagari characters entered in an unconstrained manner. Unlike the previous approaches based on standard classifiers - SVM, HMM, ANN and trained on statistical, ...
Global and local features for recognition of online handwritten numerals and Tamil characters
Feature extraction is a key step in the recognition of online handwritten data and is well investigated in literature. In the case of Tamil online handwritten characters, global features such as those derived from discrete Fourier transform (DFT), ...
Levenshtein distance metric based holistic handwritten word recognition
The rapid spread of pen-based digital devices and touch screen devices coupled with their affordability, and capability to take technology and digitization of data to the grassroots, has made online handwriting recognition an active field of research. ...
Recognition of offline handwritten numerals using an ensemble of MLPs combined by Adaboost
In this article, we present our recent study of offline recognition of handwritten numerals of three Indian scripts -- Devanagari, Bangla and Oriya. Here, we propose a novel approach to combination of multiple MLP classifiers with varying number of ...
Cited By
- Saout T, Lardeux F and Saubion F An Overview of Data Extraction From Invoices, IEEE Access, 10.1109/ACCESS.2024.3360528, 12, (19872-19886)
- Ringger E, Lamiroy B, Soheili M, Kabir E and Stricker D (2015). Clustering of Farsi sub-word images for whole-book recognition IS&T/SPIE Electronic Imaging, 10.1117/12.2075931, (94020C), Online publication date: 8-Feb-2015.
- Coüasnon B, Ringger E, Banerjee P and Chaudhuri B (2013). Video text localization using wavelet and shearlet transforms IS&T/SPIE Electronic Imaging, 10.1117/12.2036077, (90210B), Online publication date: 27-Dec-2013.
Index Terms
- Proceedings of the 4th International Workshop on Multilingual OCR
Acceptance Rates
Year | Submitted | Accepted | Rate |
---|---|---|---|
MOCR '13 | 34 | 17 | 50% |
Overall | 34 | 17 | 50% |