A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization
Source: Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Seattle, Washington, USA
SESSION: Summarization: multidocuments and new applications
Pages: 573 - 580  
Year of Publication: 2006
ISBN:1-59593-369-7
Authors
Ani Nenkova  Stanford University
Lucy Vanderwende  Microsoft Research
Kathleen McKeown  Columbia University
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1148170.1148269

ABSTRACT

The usual approach to automatic summarization is sentence extraction, where key sentences from the input documents are selected based on a suite of features. Although word frequency is often used as a feature in summarization, its impact on system performance has not been isolated. In this paper, we study the contribution to summarization of three factors related to frequency: content word frequency, composition functions for estimating sentence importance from word frequency, and adjustment of frequency weights based on context. We carry out our analysis using datasets from the Document Understanding Conferences, studying not only the impact of these features on automatic summarizers, but also their role in human summarization. Our research shows that a frequency-based summarizer can achieve performance comparable to that of state-of-the-art systems, but only with a good composition function; context sensitivity improves performance and significantly reduces repetition.
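
The three factors named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the composition functions or the context adjustment, so the averaging and summing functions, the squaring of probabilities for words already covered, the tiny stopword list, and all function names below are assumptions made for illustration only.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not the one used in the paper.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "was",
             "were", "on", "for", "with", "that", "this", "it", "by", "at", "as"}


def content_words(sentence):
    """Lowercase the sentence and keep only non-stopword tokens."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def word_probabilities(sentences):
    """Factor 1: content word frequency, normalized over the whole input."""
    counts = Counter(w for s in sentences for w in content_words(s))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def average_score(words, prob):
    """Factor 2 (assumed choice): sentence weight = mean word probability."""
    return sum(prob[w] for w in words) / len(words) if words else 0.0


def sum_score(words, prob):
    """Alternative composition function: sum of word probabilities."""
    return sum(prob[w] for w in words)


def summarize(sentences, max_sentences=2, compose=average_score,
              context_sensitive=True):
    """Greedily pick the highest-scoring sentences.

    Factor 3 (assumed form): after a sentence is chosen, the probability of
    each word it contains is squared, so later picks favor uncovered content.
    """
    prob = word_probabilities(sentences)
    remaining = list(sentences)
    summary = []
    while remaining and len(summary) < max_sentences:
        best = max(remaining, key=lambda s: compose(content_words(s), prob))
        summary.append(best)
        remaining.remove(best)
        if context_sensitive:
            for w in set(content_words(best)):
                prob[w] = prob[w] ** 2
    return summary


if __name__ == "__main__":
    docs = ["Storm hits coast.",
            "Storm hits coast again.",
            "Rescue teams arrive."]
    print(summarize(docs, context_sensitive=True))
    print(summarize(docs, context_sensitive=False))
```

On this toy input, the context-sensitive run selects the storm sentence and then the rescue sentence, while the run without context adjustment selects two near-duplicate storm sentences. That mirrors, on a very small scale, the abstract's claim that context sensitivity reduces repetition.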


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
M. Banko and L. Vanderwende. Using n-grams to understand the nature of summaries. In Proceedings of HLT/NAACL'04, 2004.
 
2
3
 
4
J. Conroy, J. Schlesinger, J. Goldstein, and D. O'Leary. Left-brain/right-brain multi-document summarization. In Proceedings of the 4th Document Understanding Conference (DUC'04), 2004.
 
5
T. Copeck and S. Szpakowicz. Vocabulary agreement among model summaries and source documents. In Proceedings of the Document Understanding Conference DUC'04, 2004.
 
6
H. Daumé III and D. Marcu. Bayesian multi-document summarization at MSE. In Proceedings of the Workshop on Multilingual Summarization Evaluation (MSE), Ann Arbor, MI, June 29 2005.
 
7
D. K. Elson. Project logline: Rhetorical categorization for multidocument news summarization. Master's thesis, Columbia University, 2005.
 
8
D. K. Evans and K. McKeown. Identifying similarities and differences across English and Arabic news. In Proceedings of the International Conference on Intelligence Analysis, 2005.
9
 
10
 
11
12
 
13
C.-Y. Lin. ROUGE: a package for automatic evaluation of summaries. In Proceedings of the Workshop in Text Summarization, ACL'04, 2004.
 
14
C.-Y. Lin and E. Hovy. Automated multi-document summarization in NEATS. In Proceedings of the Human Language Technology Conference (HLT 2002), 2002.
 
15
 
16
C.-Y. Lin and F. J. Och. Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004), 2004.
 
17
H. P. Luhn. The automatic creation of literature abstracts. IBM Journal of Research and Development, 2(2):159--165, 1958.
 
18
D. Marcu and L. Gerber. An inquiry into the nature of multidocument abstracts, extracts, and their evaluation. In Proceedings of the NAACL-2001 Workshop on Automatic Summarization, 2001.
 
19
A. Nenkova and R. Passonneau. Evaluating content selection in summarization: The pyramid method. In Proceedings of HLT/NAACL 2004, 2004.
 
20
E. Newman, W. Doran, N. Stokes, J. Carthy, and J. Dunnion. Comparing redundancy removal techniques for multi-document summarisation. In Proceedings of STAIRS, pages 223--228, 2004.
 
21
P. Over and J. Yen. An introduction to DUC 2004 intrinsic evaluation of generic news text summarization systems. In Proceedings of DUC 2004, 2004.
 
22
R. Passonneau, A. Nenkova, K. McKeown, and S. Sigelman. Pyramid evaluation of DUC 2005. In Proceedings of the Document Understanding Conference (DUC'05), 2005.
 
23
 
24
G. J. Rath, A. Resnick, and R. Savage. The formation of abstracts by the selection of sentences: Part 1: sentence selection by man and machines. American Documentation, 12(2):139--208, 1961.
 
25
N. Schenker and J. Gentleman. On judging the significance of differences by examining the overlap between confidence intervals. The American Statistician, 55(3):182--186, 2001.
 
26
B. Schiffman, A. Nenkova, and K. McKeown. Experiments in multidocument summarization. In Proceedings of the Human Language Technology Conference, 2002.
 
27
 
28
L. Vanderwende, M. Banko, and A. Menezes. Event-centric summary generation. In Proceedings of the Document Understanding Conference (DUC'04), 2004.
 
29
L. Vanderwende and H. Suzuki. Frequency-based summarizer and a language modeling extension. In MSE 2005 common data task evaluation, 2005.

