skip to main content
article

Extraction of coherent relevant passages using hidden Markov models

Published:01 July 2006Publication History
Skip Abstract Section

Abstract

In information retrieval, retrieving relevant passages, as opposed to whole documents, not only directly benefits the end user by filtering out the irrelevant information within a long relevant document, but also improves retrieval accuracy in general. A critical problem in passage retrieval is to extract coherent relevant passages accurately from a document, which we refer to as passage extraction. While much work has been done on passage retrieval, the passage extraction problem has not been seriously studied. Most existing work tends to rely on presegmenting documents into fixed-length passages which are unlikely optimal because the length of a relevant passage is presumably highly sensitive to both the query and document.In this article, we present a new method for accurately detecting coherent relevant passages of variable lengths using hidden Markov models (HMMs). The HMM-based method naturally captures the topical boundaries between passages relevant and nonrelevant to the query. Pseudo-feedback mechanisms can be naturally incorporated into such an HMM-based framework to improve parameter estimation. We show that with appropriate parameter estimation, the HMM method outperforms a number of strong baseline methods on two datasets. We further show how the HMM method can be applied on top of any basic passage extraction method to improve passage boundaries.

References

  1. Allan, J. 2003. Hard track overview in trec 2003: High accuracy retrieval from documents. In Proceedings of the 12th Text REtrieval Conference. 24--37.Google ScholarGoogle Scholar
  2. Callan, J. P. 1994. Passage-level evidence in document retrieval. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Dublin. Springer Verlag, New York, NY, 302--310. Google ScholarGoogle Scholar
  3. Clarke, C. L. A. and Terra, E. L. 2003. Passage retrieval vs. document retrieval for factoid question answering. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Toronto, Canada. ACM Press, New York, NY, 427--428. Google ScholarGoogle Scholar
  4. Conroy, J. and O'Leary, D. P. 2001. Text summarization via hidden Markov models and pivoted QR matrix decomposition. Tech. Rep., University of Maryland, College Park.Google ScholarGoogle Scholar
  5. Cormack, G. V., Clarke, C. L. A., Palmer, C. R., and To, S. S. L. 1998. Passage-based refinement (MultiText experiments for TREC-6). In Proceedings of the 6th Text REtrieval Conference. 303--320.Google ScholarGoogle Scholar
  6. Corrada-Emmanuel, A. and Croft, W. B. 2004. Answer models for question answering passage retrieval. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Sheffield, UK. ACM Press, New York, NY, 516--517. Google ScholarGoogle Scholar
  7. Denoyer, L. and Zaragoza, H. 2001. HMM-based passage models for document classification and ranking. In Proceedings of the 23rd BCS European Annual Colloquium on Information Retrieval.Google ScholarGoogle Scholar
  8. Freitag, D. and McCallum, A. 2000. Information extraction with HMM structure learned by stochastic optimization. In Proceedings of the 18th Conference on Artifitical Intelligence (AAAI). 584--589. Google ScholarGoogle Scholar
  9. Fung, P., Ngai, G., and Cheung, P. 2003. Combining optimal clustering and hidden Markov models for extractive summarization. In Proceedings of the ACL Workshop on Multilingual Summarization. Google ScholarGoogle Scholar
  10. He, D., Demner-Fushman, D., Oard, D. W., Karakos, D., and Khudanpur, S. 2004. Improving passage retrieval using interactive elicitation and statistical modeling. In Proceedings of the 13th Text REtrieval Conference.Google ScholarGoogle Scholar
  11. Hearst, M. A. 1997. TextTiling: Segmenting text into multi-paragraph subtopic passages. Comput. Linguist. 23, 1, 33--64. Google ScholarGoogle Scholar
  12. Jiang, J. and Zhai, C. 2004. UIUC in HARD 2004--Passage retrieval using HMMs. In Proceedings of the 13th Text REtrieval Conference.Google ScholarGoogle Scholar
  13. Kaszkiel, M. and Zobel, J. 1997. Passage retrieval revisited. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Philadelphia, PA. ACM Press, New York, NY, 178--185. Google ScholarGoogle Scholar
  14. Kaszkiel, M. and Zobel, J. 2001. Effective ranking with arbitrary passages. J. American Society Inf. Sci. 52, 4, 344--364. Google ScholarGoogle Scholar
  15. Knaus, D., Mittendorf, E., Schäuble, P., and Sheridan, P. 1996. Highlighting relevant passages for users of the interactive SPIDER retrieval system. In Proceedings of the 4th Text REtrieval Conference.Google ScholarGoogle Scholar
  16. Lavrenko, V. and Croft, W. B. 2001. Relevance-based language models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New Orleans, LA. ACM Press, New York, NY, 120--127. Google ScholarGoogle Scholar
  17. Liu, X. and Croft, W. B. 2002. Passage retrieval based on language models. In Proceedings of the 11th International Conference on Information and Knowledge Management. McLean, VA. ACM Press, New York, NY, 375--382. Google ScholarGoogle Scholar
  18. Mittendorf, E. and Schäuble, P. 1994. Document and passage retrieval based on hidden Markov models. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Dublin. Springer Verlag, New York, 318--327. Google ScholarGoogle Scholar
  19. Rabiner, L. R. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 2, 257--286.Google ScholarGoogle Scholar
  20. Salton, G., Allan, J., and Buckley, C. 1993. Approaches to passage retrieval in full text information systems. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Pittsburgh, PA. ACM Press, New York, NY, 49--58. Google ScholarGoogle Scholar
  21. Tellex, S., Katz, B., Lin, J., Fernandes, A., and Marton, G. 2003. Quantitative evaluation of passage retrieval algorithms for question answering. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Toronto, Canada. ACM Press, New York, NY, 41--47. Google ScholarGoogle Scholar
  22. Zajic, D., Dorr, B., and Schwartz, R. 2005. Headline generation for written and broadcast news. Tech. Rep. LAMP-TR-120, CS-TR-4698, UMIACS-TR-2005-07, University of Maryland, College Park.Google ScholarGoogle Scholar
  23. Zhai, C. and Lafferty, J. 2001a. Model-based feedback in the language modeling approach to information retrieval. In Proceedings of the 10th International Conference on Information and Knowledge Management. Atlanta, GA. ACM Press, New York, NY, 403--410. Google ScholarGoogle Scholar
  24. Zhai, C. and Lafferty, J. 2001b. A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New Orleans, LA, 334--342. Google ScholarGoogle Scholar

Index Terms

  1. Extraction of coherent relevant passages using hidden Markov models

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Information Systems
      ACM Transactions on Information Systems  Volume 24, Issue 3
      July 2006
      110 pages
      ISSN:1046-8188
      EISSN:1558-2868
      DOI:10.1145/1165774
      Issue’s Table of Contents

      Copyright © 2006 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 July 2006
      Published in tois Volume 24, Issue 3

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader