Iterative translation disambiguation for cross-language information retrieval

Published: 15 August 2005
DOI: 10.1145/1076034.1076123

Abstract

Finding a proper distribution of translation probabilities is one of the most important factors impacting the effectiveness of a cross-language information retrieval system. In this paper we present a new approach that computes translation probabilities for a given query by using only a bilingual dictionary and a monolingual corpus in the target language. The algorithm combines term association measures with an iterative machine learning approach based on expectation maximization. Our approach considers only pairs of translation candidates and is therefore less sensitive to data-sparseness issues than approaches using higher-order n-grams. The learned translation probabilities are used as query term weights and integrated into a vector-space retrieval system. Results for English-German cross-lingual retrieval show substantial improvements over a baseline using dictionary lookup without term weighting.
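To make the idea in the abstract concrete, here is a minimal Python sketch of iterative, pairwise translation re-weighting. The abstract does not state the update rule, so everything below is an assumption for illustration: the function names (`disambiguate`, `to_weighted_query`), the additive update followed by per-term normalization, the convergence threshold, and the association scores in the example, which are made up rather than estimated from a corpus. It should not be read as the authors' exact algorithm.

```python
from typing import Dict, List, Tuple

Candidates = Dict[str, List[str]]      # source term -> target-language candidates
Assoc = Dict[Tuple[str, str], float]   # candidate pair -> association strength


def assoc_score(assoc: Assoc, a: str, b: str) -> float:
    """Symmetric lookup of a pairwise association score; unseen pairs get 0.0."""
    return assoc.get((a, b), assoc.get((b, a), 0.0))


def disambiguate(candidates: Candidates, assoc: Assoc,
                 max_iter: int = 50, tol: float = 1e-6) -> Dict[str, Dict[str, float]]:
    """Iteratively re-weight translation candidates (assumed update rule).

    Each candidate gains weight in proportion to its association with the
    currently weighted candidates of the *other* query terms. Only pairs of
    candidates are ever scored, never longer n-grams.
    """
    # Start from a uniform distribution over each term's dictionary entries.
    weights = {s: {t: 1.0 / len(ts) for t in ts} for s, ts in candidates.items()}
    for _ in range(max_iter):
        new_weights: Dict[str, Dict[str, float]] = {}
        delta = 0.0
        for s, ts in candidates.items():
            raw = {}
            for t in ts:
                # Support from candidates of all other source terms.
                support = sum(
                    assoc_score(assoc, t, t2) * weights[s2][t2]
                    for s2, ts2 in candidates.items() if s2 != s
                    for t2 in ts2
                )
                raw[t] = weights[s][t] + support
            total = sum(raw.values())
            # Normalize so the weights for one source term sum to 1.
            new_weights[s] = {t: v / total for t, v in raw.items()}
            delta += sum(abs(new_weights[s][t] - weights[s][t]) for t in ts)
        weights = new_weights
        if delta < tol:  # stop once the weights no longer change noticeably
            break
    return weights


def to_weighted_query(weights: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Flatten per-term distributions into one weighted target-language query."""
    query: Dict[str, float] = {}
    for dist in weights.values():
        for t, w in dist.items():
            query[t] = query.get(t, 0.0) + w
    return query


if __name__ == "__main__":
    # Illustrative English-to-German query; association scores are invented.
    cands = {"bank": ["Bank", "Ufer"], "river": ["Fluss"]}
    scores = {("Ufer", "Fluss"): 0.8, ("Bank", "Fluss"): 0.1}
    print(to_weighted_query(disambiguate(cands, scores)))
```

In this toy query the candidate "Ufer" ends up with more weight than "Bank" because it is more strongly associated with "Fluss". The resulting dictionary can serve as the term-weight vector of a query in a vector-space retrieval system.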

Published In

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
August 2005
708 pages
ISBN: 1595930345
DOI: 10.1145/1076034
Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. cross-language retrieval
2. query formulation
3. term co-occurrence measures
4. term weighting
5. translation disambiguation

Cited By

  • (2024) State of the Art and Potentialities of Graph-level Learning. ACM Computing Surveys. DOI: 10.1145/3695863. Online publication date: 12-Sep-2024
  • (2022) Cross-Lingual Product Retrieval in E-Commerce Search. Advances in Knowledge Discovery and Data Mining (458-471). DOI: 10.1007/978-3-031-05936-0_36. Online publication date: 16-May-2022
  • (2021) Developing a Cross-lingual Semantic Word Similarity Corpus for English–Urdu Language Pair. ACM Transactions on Asian and Low-Resource Language Information Processing 21:2 (1-16). DOI: 10.1145/3472618. Online publication date: 18-Nov-2021
  • (2021) Semantic morphological variant selection and translation disambiguation for cross-lingual information retrieval. Multimedia Tools and Applications 82:6 (8197-8212). DOI: 10.1007/s11042-021-11074-w. Online publication date: 11-Jun-2021
  • (2019) A survey of semantic relatedness evaluation datasets and procedures. Artificial Intelligence Review. DOI: 10.1007/s10462-019-09796-3. Online publication date: 23-Dec-2019
  • (2017) Dimension Projection Among Languages Based on Pseudo-Relevant Documents for Query Translation. Advances in Information Retrieval (493-499). DOI: 10.1007/978-3-319-56608-5_39. Online publication date: 8-Apr-2017
  • (2016) SS4MCT: A Statistical Stemmer for Morphologically Complex Texts. Experimental IR Meets Multilinguality, Multimodality, and Interaction (201-207). DOI: 10.1007/978-3-319-44564-9_16. Online publication date: 23-Aug-2016
  • (2015) A parallel cross-language retrieval system for patent documents. 2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS) (672-676). DOI: 10.1109/ICSESS.2015.7339147. Online publication date: Sep-2015
  • (2014) Learning to rank for determining relevant document in Indonesian-English cross language information retrieval using BM25. 2014 International Conference on Advanced Computer Science and Information System (309-314). DOI: 10.1109/ICACSIS.2014.7065896. Online publication date: Oct-2014
  • (2013) Finding synonyms and other semantically-similar terms from coselection data. Proceedings of the First Australasian Web Conference - Volume 144 (35-42). DOI: 10.5555/2527208.2527213. Online publication date: 29-Jan-2013
