ABSTRACT
The recent availability of commercial online machine translation (MT) systems makes it possible for lay Web users to apply MT to cross-language information retrieval (CLIR). To study the effectiveness of MT for query translation, we conducted a set of experiments using Google Translate, an online MT system provided by Google, to translate queries in CLIR. The experiments show that MT is an excellent tool for the query translation task and that, with the help of relevance feedback, it can achieve significant improvement over the monolingual baseline. MT-based query translation not only works for long queries but is also effective for short Web queries.
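The pipeline the abstract describes (translate the query, retrieve, then refine with relevance feedback) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the setup used in the experiments: `TOY_MT` is a hypothetical two-entry lookup standing in for an MT system, `rank` uses a simple TF-IDF score rather than any particular retrieval engine, and `expand_with_feedback` uses pseudo-relevance feedback (the top-ranked documents are treated as relevant) as one possible form of relevance feedback.

```python
import math
from collections import Counter

# Hypothetical stand-in for an MT system: a tiny lookup table.
# A real setup would send the query to an MT service instead.
TOY_MT = {"traduction": "translation", "automatique": "machine"}


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def translate_query(query):
    """Translate a source-language query token by token (toy MT)."""
    return [TOY_MT.get(tok, tok) for tok in tokenize(query)]


def rank(query_terms, docs):
    """Rank documents by a simple TF-IDF score; best document first."""
    n = len(docs)
    doc_tokens = [Counter(tokenize(d)) for d in docs]
    df = Counter()
    for tokens in doc_tokens:
        df.update(tokens.keys())
    scores = []
    for i, tokens in enumerate(doc_tokens):
        score = sum(
            tokens[t] * math.log(1 + n / df[t])
            for t in query_terms
            if t in tokens
        )
        scores.append((score, i))
    # Sort by score descending, breaking ties by document index.
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]


def expand_with_feedback(query_terms, docs, ranking, top_docs=1, extra_terms=2):
    """Pseudo-relevance feedback: add frequent terms from the top documents."""
    counts = Counter()
    for i in ranking[:top_docs]:
        counts.update(t for t in tokenize(docs[i]) if t not in query_terms)
    # Deterministic choice: highest count first, then alphabetical.
    best = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:extra_terms]
    return query_terms + [t for t, _ in best]


docs = [
    "machine translation systems translate text between languages",
    "query expansion improves retrieval with feedback terms",
    "cooking recipes for pasta and sauce",
]
translated = translate_query("traduction automatique")
first_pass = rank(translated, docs)
expanded = expand_with_feedback(translated, docs, first_pass)
second_pass = rank(expanded, docs)
```

In this sketch the second retrieval pass reuses the same ranking function with the expanded query, so the translation step and the feedback step stay independent of each other and either could be swapped for a different component.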
Index Terms
A study of using an out-of-box commercial MT system for query translation in CLIR