Generic text summarization using relevance measure and latent semantic analysis

Authors:
Yihong Gong (NEC USA, San Jose, CA)
Xin Liu (NEC USA, San Jose, CA)

SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, September 2001, Pages 19–25
https://doi.org/10.1145/383952.383955

Published: 01 September 2001

ABSTRACT

In this paper, we propose two generic text summarization methods that create summaries by ranking and extracting sentences from the original documents. The first method uses standard IR methods to rank sentences by relevance, while the second uses latent semantic analysis to identify semantically important sentences. Both methods strive to select sentences that are highly ranked and different from each other, in an attempt to create a summary with wider coverage of the document's main content and less redundancy. We evaluate the two methods by comparing their output with manual summaries generated by three independent human evaluators. The evaluations also study the influence of different VSM weighting schemes on summarization performance. Finally, we investigate the causes of the large disparities among the evaluators' manual summaries and discuss human text summarization patterns.
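
The idea behind the first method, picking sentences that are highly ranked yet different from each other, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `tokenize` and `summarize_by_relevance`, the raw term-frequency weighting, the inner-product relevance score, and the step that removes a chosen sentence's terms from the document vector are all assumptions made for illustration. Other VSM weighting schemes or normalizations could be substituted.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def summarize_by_relevance(sentences, k):
    """Pick up to k sentences that are relevant to the document yet mutually different.

    Assumption: raw term frequency is the weight and an inner product is the
    relevance score; this is an illustrative choice, not the paper's exact scheme.
    """
    sentence_vectors = [Counter(tokenize(s)) for s in sentences]

    # The document vector starts as the sum of all sentence vectors.
    document_vector = Counter()
    for vector in sentence_vectors:
        document_vector.update(vector)

    def relevance(i):
        # Inner product between sentence i and what is left of the document.
        return sum(tf * document_vector[t] for t, tf in sentence_vectors[i].items())

    candidates = set(range(len(sentences)))
    chosen = []
    while candidates and len(chosen) < k:
        # Ties go to the earliest sentence because candidates are scanned in order.
        best = max(sorted(candidates), key=relevance)
        if relevance(best) == 0:
            break  # every remaining sentence repeats already-covered terms
        chosen.append(best)
        candidates.remove(best)
        # Drop the chosen sentence's terms so later picks cover other content.
        for term in sentence_vectors[best]:
            document_vector.pop(term, None)

    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Removing the covered terms after each pick is what pushes later selections toward content the summary does not yet contain. Because the score is unnormalized, longer sentences tend to be favored; a length-normalized score would be a reasonable variation.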


Published in
SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
September 2001
454 pages
ISBN:1581133316
DOI:10.1145/383952
Chairmen:
Donald H. Kraft (Louisiana State Univ.)
W. Bruce Croft (University of Massachusetts; for the Americas)
David J. Harper (The Robert Gordon University; for Europe and Africa)
Justin Zobel (RMIT University; for Asia and Australasia)
Copyright © 2001 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
generic text summarization
relevance measure
semantic analysis
Qualifiers
- Article
Conference

Acceptance Rates
SIGIR '01 Paper Acceptance Rate: 47 of 201 submissions, 23%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%