poster

Utilizing sub-topical structure of documents for information retrieval

Authors:
Debasis Ganguly, Dublin City University, Dublin, Ireland
Johannes Leveling, Dublin City University, Dublin, Ireland
Gareth J.F. Jones, Dublin City University, Dublin, Ireland

PIKM '11: Proceedings of the 4th workshop on Workshop for Ph.D. students in information & knowledge management, October 2011, Pages 75–78. https://doi.org/10.1145/2065003.2065019

Published: 28 October 2011

ABSTRACT

Text segmentation in natural language processing typically refers to the process of decomposing a document into its constituent subtopics. Our work centers on the application of text segmentation techniques to information retrieval (IR) tasks. Examples include scoring a document by combining the retrieval scores of its constituent segments, exploiting the proximity of query terms in documents for ad-hoc search, and question answering (QA), where retrieved passages from multiple documents are aggregated and presented to the searcher as a single document. Feedback in the ad-hoc IR task is shown to benefit from using sentences extracted from the pseudo-relevant documents, instead of terms, for query expansion. Retrieval effectiveness for the patent prior art search task is enhanced by applying text segmentation to the patent queries. Another aspect of our work involves augmenting text segmentation techniques to produce segments that are more readable and contain fewer unresolved anaphora. This is particularly useful for QA and snippet generation tasks, where the objective is, on the one hand, to aggregate relevant and novel information from multiple documents to satisfy the user's information need and, on the other, to ensure that the automatically generated content presented to the user is easily readable without reference to the original source document.
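
As a rough illustration of the first use case named in the abstract (scoring a document by combining the retrieval scores of its constituent segments), the following minimal Python sketch may help. It assumes the segments have already been produced by some text segmenter, and it uses a log-damped term-frequency score purely as a stand-in for a real retrieval model. The function names (`tokenize`, `segment_score`, `document_score`) and the `max`/`mean` combination options are hypothetical choices for this sketch, not the method described in the paper.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def segment_score(query_terms, segment_tokens):
    """Illustrative score: sum of log-damped query term frequencies.

    This is a placeholder for a real retrieval model (an assumption).
    """
    counts = Counter(segment_tokens)
    return sum(math.log(1 + counts[t]) for t in query_terms)

def document_score(query, segments, combine="max"):
    """Score a pre-segmented document by combining its segment scores."""
    query_terms = tokenize(query)
    scores = [segment_score(query_terms, tokenize(s)) for s in segments]
    if not scores:
        return 0.0
    if combine == "max":    # the best-matching segment represents the document
        return max(scores)
    if combine == "mean":   # every segment contributes equally
        return sum(scores) / len(scores)
    raise ValueError(f"unknown combination method: {combine}")

# Hypothetical two-segment document and query.
segments = ["Patent search uses long queries.", "Cats sleep all day."]
best = document_score("patent queries", segments, combine="max")
average = document_score("patent queries", segments, combine="mean")
```

With `combine="max"`, a document is rewarded for having one strongly matching subtopic, whereas `combine="mean"` penalizes documents whose other segments are off-topic. Which behavior is preferable depends on the task and is left open here.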

Index Terms

1. Information systems
  1. Information retrieval
2. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Abstraction


Published in
PIKM '11: Proceedings of the 4th workshop on Workshop for Ph.D. students in information & knowledge management
October 2011
100 pages
ISBN: 9781450309530
DOI: 10.1145/2065003
Program Chairs:
Anisoara Nica, Sybase, An SAP Company, Canada
Fabian M. Suchanek, INRIA, France
Copyright © 2011 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
document segmentation
query segmentation
Qualifiers
- poster
Acceptance Rates
Overall Acceptance Rate: 25 of 62 submissions, 40%