DOI: 10.1145/2065003.2065019
poster

Utilizing sub-topical structure of documents for information retrieval

Published: 28 October 2011

ABSTRACT

Text segmentation in natural language processing typically refers to the process of decomposing a document into its constituent subtopics. Our work centers on applying text segmentation techniques to information retrieval (IR) tasks. Examples include scoring a document by combining the retrieval scores of its constituent segments, exploiting the proximity of query terms in documents for ad-hoc search, and question answering (QA), where retrieved passages from multiple documents are aggregated and presented to the searcher as a single document. Feedback in the ad-hoc IR task is shown to benefit from using sentences extracted from the pseudo-relevant documents, rather than terms, for query expansion. Retrieval effectiveness for the patent prior art search task is enhanced by applying text segmentation to the patent queries. Another aspect of our work involves augmenting text segmentation techniques to produce segments that are more readable and contain fewer unresolved anaphora. This is particularly useful for QA and snippet generation tasks, where the objective is, on the one hand, to aggregate relevant and novel information from multiple documents to satisfy the user's information need and, on the other, to ensure that the automatically generated content presented to the user is easily readable without reference to the original source documents.
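As an illustration of scoring a document by combining the retrieval scores of its segments, the following is a minimal sketch and not the method used in this work. The abstract does not specify the segment scoring function or the combination rule, so the term-overlap score, the interpolation between the best and the mean segment score, the parameter `alpha`, and the function names `score_segment` and `score_document` are all assumptions made for this example. The segments are assumed to have been produced beforehand by some text segmentation step.

```python
from collections import Counter


def score_segment(query_terms, segment_tokens):
    """Toy relevance score (assumed): share of segment tokens matching a query term."""
    if not segment_tokens:
        return 0.0
    counts = Counter(segment_tokens)
    matches = sum(counts[term] for term in set(query_terms))
    return matches / len(segment_tokens)


def score_document(query_terms, segments, alpha=0.5):
    """Combine segment scores (assumed rule): interpolate best and mean segment score.

    `segments` is a list of token lists, one per subtopic segment.
    `alpha` is a hypothetical weight: 1.0 uses only the best segment,
    0.0 uses only the mean over all segments.
    """
    scores = [score_segment(query_terms, seg) for seg in segments]
    if not scores:
        return 0.0
    best = max(scores)
    mean = sum(scores) / len(scores)
    return alpha * best + (1 - alpha) * mean


if __name__ == "__main__":
    # Two pre-segmented subtopics of a hypothetical document.
    doc_segments = [
        ["patent", "search", "query"],
        ["weather", "today", "sunny", "warm"],
    ]
    print(score_document(["patent", "query"], doc_segments))
```

Weighting the best segment lets a document with one highly relevant subtopic rank well even when the rest of its text is off-topic, which is the intuition behind segment-level scoring; other combination rules (sum, max only, length-weighted mean) would fit the same interface.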


Published in

PIKM '11: Proceedings of the 4th workshop on Workshop for Ph.D. students in information & knowledge management
October 2011
100 pages
ISBN: 9781450309530
DOI: 10.1145/2065003

Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 25 of 62 submissions, 40%
