DOI: 10.1145/1008992.1009024
Article

Dependence language model for information retrieval

Published: 25 July 2004

Abstract

This paper presents a new dependence language modeling approach to information retrieval. The approach extends the basic unigram language modeling approach by relaxing its independence assumption. We incorporate the linkage of a query as a hidden variable; the linkage expresses the term dependencies within the query as an acyclic, planar, undirected graph. We then assume that a query is generated from a document in two stages: the linkage is generated first, and then each term is generated in turn, conditioned on the related terms to which the linkage connects it. We also present a smoothing method for model parameter estimation and an approach to learning the linkage of a sentence in an unsupervised manner. The new approach is compared with the classical probabilistic retrieval model and with previously proposed language models, both with and without term dependencies. Results show that our model achieves substantial and significant improvements on TREC collections.
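The two-stage generation process described above can be written out as a short derivation. This is a sketch under stated assumptions, not necessarily the paper's exact formulation: the sum over linkages is approximated by its single most probable linkage $L$; $L$ is treated as a tree over the query terms $q_1, \dots, q_m$, oriented away from a head term $q_h$ so that each edge $(i, j)$ points from a term $q_i$ to the term $q_j$ generated from it; and a term's marginal probability is assumed not to depend on the linkage, i.e. $P(q_i \mid L, D) = P(q_i \mid D)$.

```latex
\begin{aligned}
P(Q \mid D) &= \sum_{L} P(L \mid D)\, P(Q \mid L, D) \;\approx\; P(L \mid D)\, P(Q \mid L, D) \\
P(Q \mid L, D) &= P(q_h \mid D) \prod_{(i,j) \in L} P(q_j \mid q_i, L, D)
  = P(q_h \mid D) \prod_{(i,j) \in L} \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid D)} \\
  &= \prod_{k=1}^{m} P(q_k \mid D) \prod_{(i,j) \in L}
     \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid D)\, P(q_j \mid D)} \\
\log P(Q \mid D) &\approx \log P(L \mid D) + \sum_{k=1}^{m} \log P(q_k \mid D)
  + \sum_{(i,j) \in L} \log \frac{P(q_i, q_j \mid L, D)}{P(q_i \mid D)\, P(q_j \mid D)}
\end{aligned}
```

Every term except the head appears exactly once as the child $q_j$ of an edge, so multiplying and dividing each edge factor by $P(q_j \mid D)$ turns the product into a unigram query likelihood times one ratio per linked pair. Under these assumptions the score is a unigram score plus a mutual-information correction for each linked pair, which is positive exactly when the linked terms co-occur more often than independence predicts.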
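Assuming the query linkage is a tree and only its most probable linkage is used, the score decomposes into smoothed unigram log-probabilities plus one mutual-information term per linked pair. The Python sketch below computes that score for one document, additionally assuming that log P(L|D) is the same for every document and can be dropped. The smoothing here is a swapped-in choice, not necessarily the paper's method: unigrams use Jelinek-Mercer interpolation with collection frequencies (weight `lam`), and pair probabilities come from co-occurrence within a fixed `window`, interpolated with the independence estimate (weight `mu`). All function and parameter names are hypothetical, and every query term is assumed to occur somewhere in the collection so that no probability is zero.

```python
import math
from collections import Counter


def unigram_prob(term, doc_tf, doc_len, coll_tf, coll_len, lam):
    """Jelinek-Mercer smoothed P(term | D) (a swapped-in smoothing choice)."""
    return (1 - lam) * doc_tf[term] / doc_len + lam * coll_tf[term] / coll_len


def window_pairs(tokens, window):
    """Count unordered term pairs whose positions are at most `window` apart."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            pairs[tuple(sorted((left, right)))] += 1
    return pairs


def dependence_score(query, linkage, doc, coll_tf, coll_len, lam=0.5, mu=0.5, window=3):
    """Hypothetical score: sum_k log P(q_k|D) plus one MI term per linked pair.

    `linkage` is a list of (i, j) index pairs into `query`; log P(L|D) is omitted.
    Assumes a non-empty document and that every query term occurs in the collection.
    """
    doc_tf = Counter(doc)
    doc_pairs = window_pairs(doc, window)
    n_pairs = sum(doc_pairs.values()) or 1  # guard against one-token documents

    def p(term):
        return unigram_prob(term, doc_tf, len(doc), coll_tf, coll_len, lam)

    score = sum(math.log(p(q)) for q in query)
    for i, j in linkage:
        a, b = query[i], query[j]
        independent = p(a) * p(b)
        p_pair_doc = doc_pairs[tuple(sorted((a, b)))] / n_pairs
        # Interpolate with the independence estimate (hypothetical weight mu): a pair
        # unseen in the document contributes log(mu), a co-occurring pair more than that.
        p_pair = (1 - mu) * p_pair_doc + mu * independent
        score += math.log(p_pair / independent)
    return score


if __name__ == "__main__":
    near = ["information", "retrieval", "x", "y", "z", "w"]
    far = ["information", "x", "y", "z", "w", "retrieval"]
    coll_tf = Counter(near + far)
    coll_len = sum(coll_tf.values())
    query, linkage = ["information", "retrieval"], [(0, 1)]
    for name, doc in [("near", near), ("far", far)]:
        print(name, dependence_score(query, linkage, doc, coll_tf, coll_len))
```

Both toy documents contain each query term once and have the same length, so their unigram parts are identical; only the linked pair separates them, and the document in which the two terms are adjacent receives the higher score.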
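The abstract also mentions learning the linkage without supervision. This page does not describe the procedure, so the following is a swapped-in greedy approximation rather than the paper's algorithm. It assumes an association score for pairs of query positions is available (for example, a co-occurrence statistic computed elsewhere; the `scores` input and the function names are hypothetical). Edges are added from strongest to weakest, and any edge that would close a cycle or cross an existing edge is skipped, so the output is acyclic and planar in the sense of non-crossing arcs drawn above the word sequence.

```python
def crosses(e1, e2):
    """True if arcs e1 and e2, drawn above the word sequence, intersect."""
    (a, b), (c, d) = sorted(e1), sorted(e2)
    return a < c < b < d or c < a < d < b


def greedy_linkage(n, scores):
    """Greedy non-crossing spanning tree over word positions 0..n-1.

    `scores` maps (i, j) with i < j to a hypothetical association strength;
    missing pairs default to 0.0. Returns the chosen edges, sorted.
    """
    parent = list(range(n))  # union-find forest used to detect cycles

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    pairs.sort(key=lambda e: scores.get(e, 0.0), reverse=True)  # strongest first

    edges = []
    for i, j in pairs:
        if len(edges) == n - 1:
            break  # a spanning tree over n nodes has n - 1 edges
        ri, rj = find(i), find(j)
        if ri == rj or any(crosses((i, j), e) for e in edges):
            continue  # would close a cycle or break planarity
        parent[ri] = rj
        edges.append((i, j))
    return sorted(edges)


if __name__ == "__main__":
    terms = ["dependence", "language", "model", "retrieval"]
    demo = {(0, 2): 3.0, (1, 3): 2.9, (0, 1): 1.0, (2, 3): 0.5}  # made-up scores
    for i, j in greedy_linkage(len(terms), demo):
        print(terms[i], "--", terms[j])
```

An edge between adjacent positions can never cross another edge, so the greedy pass always ends with a spanning tree of n - 1 edges. It is not, however, guaranteed to find the highest-scoring planar tree; that would require an exact search.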

    Published In

    SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
    July 2004
    624 pages
    ISBN:1581138814
    DOI:10.1145/1008992
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dependence
    2. information retrieval
    3. language model
    4. parser

    Conference

    SIGIR04

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 33
    • Downloads (last 6 weeks): 4
    Reflects downloads up to 18 Feb 2025

    Cited By

    • (2024) Conversational recommender based on graph sparsification and multi-hop attention. Intelligent Data Analysis 28:1, pp. 99-119. DOI: 10.3233/IDA-230148. Online publication date: 3-Feb-2024.
    • (2024) Automated Commit Message Generation With Large Language Models: An Empirical Study and Beyond. IEEE Transactions on Software Engineering 50:12, pp. 3208-3224. DOI: 10.1109/TSE.2024.3478317. Online publication date: 1-Dec-2024.
    • (2023) Information Retrieval: Recent Advances and Beyond. IEEE Access 11, pp. 76581-76604. DOI: 10.1109/ACCESS.2023.3295776. Online publication date: 2023.
    • (2022) Early Stage Sparse Retrieval with Entity Linking. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pp. 4464-4469. DOI: 10.1145/3511808.3557588. Online publication date: 17-Oct-2022.
    • (2022) Semantic Models for the First-Stage Retrieval: A Comprehensive Review. ACM Transactions on Information Systems 40:4, pp. 1-42. DOI: 10.1145/3486250. Online publication date: 24-Mar-2022.
    • (2022) CASMS. Information and Software Technology 147:C. DOI: 10.1016/j.infsof.2022.106906. Online publication date: 1-Jul-2022.
    • (2022) Methods for Domain Adaptation of Automated Systems for Aspect Annotation of Customer Review Texts. High-Performance Computing Systems and Technologies in Scientific Research, Automation of Control and Production, pp. 325-337. DOI: 10.1007/978-3-030-94141-3_26. Online publication date: 17-Jan-2022.
    • (2021) Deep Query Likelihood Model for Information Retrieval. Advances in Information Retrieval, pp. 463-470. DOI: 10.1007/978-3-030-72240-1_49. Online publication date: 30-Mar-2021.
    • (2020) A Quantum Interference Inspired Neural Matching Model for Ad-hoc Retrieval. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 19-28. DOI: 10.1145/3397271.3401070. Online publication date: 25-Jul-2020.
    • (2019) A novel model for phrase searching based-on Minimum Weighted Relocation Model. Signal and Data Processing 15:4, pp. 71-84. DOI: 10.29252/jsdp.15.4.71. Online publication date: 1-Mar-2019.
