DOI: 10.1145/1321440.1321491
research-article

Extending query translation to cross-language query expansion with Markov chain models

Published: 06 November 2007

Abstract

Dictionary-based approaches to query translation have been widely used in Cross-Language Information Retrieval (CLIR) experiments. However, translation is limited not only by the coverage of the dictionary but also by translation ambiguity. In this paper we propose a novel method of query translation that combines other types of term relations to complement dictionary-based translation. This extends the literal query translation to related words, which has a beneficial query-expansion effect in CLIR. We model query translation with Markov chains (MC), viewing it as a process of expanding query terms to their semantically similar terms in a different language. In the MC model, terms and their relationships are represented as a directed graph, and query translation is performed as a random walk on the graph that propagates probabilities to related terms. This framework lets us incorporate different types of term relations, whether between the two languages or within the source or target language. In addition, the iterative training process of the MC model lets us assign higher probabilities to target terms that are more closely related to the original query, which offers a solution to the translation ambiguity problem. We evaluated our method on three CLIR benchmark collections and obtained significant improvements over traditional dictionary-based approaches.
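To make the random-walk idea concrete, the sketch below builds a tiny bilingual term graph and propagates probability outward from the query terms. Everything in it is an illustrative assumption rather than the authors' model: the terms (an `en:`/`fr:` prefix marks source and target language), the edge weights, the restart probability `alpha`, and the fixed number of steps are all hypothetical. The update is a random walk with restart in the style of PageRank, `p_next = alpha * p0 + (1 - alpha) * P^T p`, where `P` is the row-normalized edge-weight matrix and `p0` is uniform over the query terms. The sketch covers only probability propagation; the paper's iterative training of the chain is not reproduced.

```python
"""Toy random walk with restart over a bilingual term graph.

All terms, weights, and parameters are hypothetical illustrations,
not values or formulas taken from the paper.
"""
from collections import defaultdict

# Hypothetical weighted edges (from_term, to_term, weight).
# "en:" marks source-language terms, "fr:" target-language terms.
EDGES = [
    # Dictionary translation edges; "bank" is ambiguous.
    ("en:bank", "fr:banque", 1.0),
    ("en:bank", "fr:rive", 1.0),
    ("en:money", "fr:argent", 1.0),
    # Source-language relations (e.g. co-occurrence).
    ("en:bank", "en:money", 0.5),
    ("en:money", "en:bank", 0.5),
    # Target-language relations.
    ("fr:banque", "fr:argent", 0.5),
    ("fr:argent", "fr:banque", 0.5),
    ("fr:rive", "fr:fleuve", 0.5),
]


def transition_matrix(edges):
    """Row-normalize outgoing edge weights into transition probabilities."""
    out = defaultdict(dict)
    for src, dst, weight in edges:
        out[src][dst] = out[src].get(dst, 0.0) + weight
    matrix = {}
    for src, nbrs in out.items():
        total = sum(nbrs.values())
        matrix[src] = {dst: w / total for dst, w in nbrs.items()}
    return matrix


def random_walk(query, edges, alpha=0.3, steps=50):
    """Propagate probability from the query terms, restarting with prob. alpha."""
    P = transition_matrix(edges)
    nodes = {n for src, dst, _ in edges for n in (src, dst)}
    start = {n: (1.0 / len(query) if n in query else 0.0) for n in nodes}
    p = dict(start)
    for _ in range(steps):
        nxt = {n: alpha * start[n] for n in nodes}
        for src, mass in p.items():
            nbrs = P.get(src)
            if not nbrs:
                # Dangling term (no outgoing edges): send its mass back to the query.
                for n in nodes:
                    nxt[n] += (1 - alpha) * mass * start[n]
                continue
            for dst, prob in nbrs.items():
                nxt[dst] += (1 - alpha) * mass * prob
        p = nxt
    return p


def target_distribution(p, prefix="fr:"):
    """Keep target-language terms and renormalize, highest probability first."""
    target = {t: v for t, v in p.items() if t.startswith(prefix)}
    total = sum(target.values())
    ranked = sorted(target.items(), key=lambda kv: kv[1], reverse=True)
    return {t: v / total for t, v in ranked}


if __name__ == "__main__":
    dist = target_distribution(random_walk({"en:bank", "en:money"}, EDGES))
    for term, prob in dist.items():
        print(f"{term}\t{prob:.3f}")
```

With these toy weights, the ambiguous dictionary entry for `en:bank` splits its mass equally between `fr:banque` and `fr:rive`, but the second query term `en:money` routes extra mass to `fr:argent` and from there to `fr:banque`, so `fr:banque` ends up ranked well above `fr:rive`. A target term with no direct dictionary link (`fr:fleuve`) also receives some mass, which is the query-expansion effect the abstract describes.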




    Information

    Published In

    CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
    November 2007
    1048 pages
    ISBN:9781595938039
    DOI:10.1145/1321440
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. cross-language information retrieval
    2. Markov chain
    3. query expansion
    4. query translation
    5. random walk

    Conference

    CIKM '07

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Article Metrics

    • Downloads (last 12 months): 1
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 18 Feb 2025

    Cited By

    • (2024) A Survey of Source Code Search: A 3-Dimensional Perspective. ACM Transactions on Software Engineering and Methodology 33(6): 1-51. DOI: 10.1145/3656341. Online publication date: 28 Jun 2024.
    • (2021) Web Object Ranking for Location-Based Web Object Search. Advances in Smart Communication and Imaging Systems, pp. 151-165. DOI: 10.1007/978-981-15-9938-5_16. Online publication date: 14 Apr 2021.
    • (2018) Query translation based on visual information. 2018 Tenth International Conference on Advanced Computational Intelligence (ICACI), pp. 563-567. DOI: 10.1109/ICACI.2018.8377521. Online publication date: Mar 2018.
    • (2017) Query expansion based on term selection for Hindi – English cross lingual IR. Journal of King Saud University - Computer and Information Sciences. DOI: 10.1016/j.jksuci.2017.09.002. Online publication date: Sep 2017.
    • (2016) Transfer Learning for Cross-Lingual Sentiment Classification with Weakly Shared Deep Neural Networks. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pp. 245-254. DOI: 10.1145/2911451.2911490. Online publication date: 7 Jul 2016.
    • (2016) A study of user profile representation for personalized cross-language information retrieval. Aslib Journal of Information Management 68(4): 448-477. DOI: 10.1108/AJIM-06-2015-0091. Online publication date: 18 Jul 2016.
    • (2015) A Pheromone-Like Model for Semantic Context Extraction from Collaborative Networks. Proceedings of the 2015 IEEE / WIC / ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), Volume 01, pp. 540-547. DOI: 10.1109/WI-IAT.2015.21. Online publication date: 6 Dec 2015.
    • (2014) Semantic Heuristic Search in Collaborative Networks. Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), Volume 01, pp. 141-148. DOI: 10.1109/WI-IAT.2014.27. Online publication date: 11 Aug 2014.
    • (2014) Heuristic semantic walk for concept chaining in collaborative networks. International Journal of Web Information Systems 10(1): 85-103. DOI: 10.1108/IJWIS-11-2013-0031. Online publication date: 14 Apr 2014.
    • (2014) Heuristics for Semantic Path Search in Wikipedia. Proceedings of the 14th International Conference on Computational Science and Its Applications (ICCSA 2014), Volume 8584, pp. 327-340. DOI: 10.1007/978-3-319-09153-2_25. Online publication date: 30 Jun 2014.
