ABSTRACT
In this paper, we provide an overview of the FIRE 2013 track on transliterated search and describe the datasets released as part of the track. This was the first year that the track was organized. The track comprised two subtasks. In the first subtask, offered for Hindi, Bangla, and Gujarati, participants had to devise an algorithm that labels the language of each word in a sentence. In addition, whenever a non-English word was identified, the algorithm had to provide its transliteration in the native script. The second subtask was retrieval-based: mixed-script documents had to be retrieved and ranked by relevance in response to ad hoc queries. The queries in our dataset were Bollywood Hindi song lyrics written in Roman script. We received a total of 25 run submissions from five teams across the world (three from India and two from abroad). Conducting this track helped us raise awareness of the importance of transliteration in the context of Indian languages. The results show that there is considerable scope for improving transliteration accuracy for the languages studied.
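To make the first subtask concrete, the following minimal sketch labels each Roman-script token and attaches a native-script form to the non-English ones. It is an illustration only: the lookup tables, the function name `label_tokens`, and the label strings `E`, `H`, and `UNK` are assumptions made for this example and do not reflect the track's data format or any participant's system.

```python
# Toy lexicons: hypothetical stand-ins, NOT the track's released data.
ENGLISH_WORDS = {"i", "love", "this", "song"}

# Roman-script Hindi word -> Devanagari form (hand-picked illustrative pairs).
HINDI_TRANSLITERATIONS = {
    "mujhe": "मुझे",
    "pyar": "प्यार",
    "hai": "है",
}


def label_tokens(sentence):
    """Return (token, label, native_form) triples for a Roman-script sentence.

    Labels are assumed for this sketch: "H" for Hindi, "E" for English,
    "UNK" for tokens found in neither toy lexicon.
    """
    results = []
    for token in sentence.split():
        key = token.lower()
        if key in HINDI_TRANSLITERATIONS:
            # Non-English word: also emit its native-script transliteration.
            results.append((token, "H", HINDI_TRANSLITERATIONS[key]))
        elif key in ENGLISH_WORDS:
            results.append((token, "E", None))
        else:
            results.append((token, "UNK", None))
    return results


if __name__ == "__main__":
    for row in label_tokens("I love this song mujhe pyar hai"):
        print(row)
```

A dictionary lookup is used here purely for brevity; a real system would need to handle spelling variation in Roman-script Hindi and words that are valid in more than one language, which a fixed table cannot do.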