DOI: 10.1145/1835449.1835532
research-article

EUSUM: extracting easy-to-understand English summaries for non-native readers

Published: 19 July 2010

Abstract

In this paper we investigate a novel and important problem in multi-document summarization: how to extract an easy-to-understand English summary for non-native readers. Existing summarization systems extract the same kind of English summaries from English news documents for both native and non-native readers. However, non-native readers differ in their English reading skills because they have different English education and learning backgrounds. An English summary that native readers understand easily may be hard for non-native readers to understand. We propose adding the dimension of reading easiness (or difficulty) to multi-document summarization; the proposed EUSUM system produces easy-to-understand summaries according to the English reading skills of the readers. Sentence-level reading easiness (or difficulty) is predicted using SVM regression, and the reading easiness score of each sentence is then incorporated into the summarization process. An empirical evaluation and a user study demonstrate that EUSUM produces summaries that are easier for non-native readers to understand than those of existing summarization systems, with very little sacrifice of the summary's informativeness.
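
The abstract does not say how the easiness score enters the summarization process, so the following is only a minimal sketch under stated assumptions, not the EUSUM method itself. It assumes each sentence already carries an informativeness score from some summarizer and a predicted easiness score (for example, from a regression model), both scaled to [0, 1], and it blends them linearly before a greedy, length-limited selection. The names `Sentence`, `combined_score`, `select_summary` and the parameter `easiness_weight` are hypothetical.

```python
class Sentence:
    """A candidate sentence with two precomputed scores (both hypothetical inputs)."""

    def __init__(self, text, informativeness, easiness):
        self.text = text
        self.informativeness = informativeness  # assumed to lie in [0, 1]
        self.easiness = easiness                # assumed to lie in [0, 1]


def combined_score(sentence, easiness_weight):
    """Linear blend of the two scores (an assumption, not taken from the paper)."""
    return ((1.0 - easiness_weight) * sentence.informativeness
            + easiness_weight * sentence.easiness)


def select_summary(sentences, max_words, easiness_weight=0.3):
    """Greedily pick the highest-scoring sentences that fit the word budget."""
    ranked = sorted(
        sentences,
        key=lambda s: combined_score(s, easiness_weight),
        reverse=True,
    )
    summary = []
    used_words = 0
    for sentence in ranked:
        length = len(sentence.text.split())
        if used_words + length <= max_words:
            summary.append(sentence)
            used_words += length
    return summary


if __name__ == "__main__":
    candidates = [
        Sentence("The legislature ratified the controversial appropriations amendment.", 0.9, 0.2),
        Sentence("Lawmakers passed the new budget bill.", 0.8, 0.9),
        Sentence("It rained.", 0.1, 1.0),
    ]
    for weight in (0.0, 0.5):
        chosen = select_summary(candidates, max_words=7, easiness_weight=weight)
        print(weight, [s.text for s in chosen])
```

With `easiness_weight` set to 0 the ranking reduces to informativeness alone; raising it trades a little informativeness for easier sentences, which mirrors the trade-off the abstract describes.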



Published In

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
July 2010
944 pages
ISBN: 9781450301534
DOI: 10.1145/1835449
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. EUSUM
  2. multi-document summarization
  3. reading easiness

Qualifiers

  • Research-article

Conference

SIGIR '10

Acceptance Rates

SIGIR '10 Paper Acceptance Rate 87 of 520 submissions, 17%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 1
Reflects downloads up to 19 Feb 2025

Cited By

  • (2024) Boosting Non-Native Speaker Engagement: Simplifying Text with Large Language Models. Collaboration Technologies and Social Computing, pp. 274-281. DOI: 10.1007/978-3-031-67998-8_22. Online publication date: 11-Sep-2024
  • (2022) Interactive IR User Study Design, Evaluation, and Reporting. Online publication date: 10-Mar-2022
  • (2020) Is cross‐lingual readability assessment possible? Journal of the Association for Information Science and Technology 71:6, pp. 644-656. DOI: 10.1002/asi.24293. Online publication date: 7-May-2020
  • (2019) Interactive IR User Study Design, Evaluation, and Reporting. Synthesis Lectures on Information Concepts, Retrieval, and Services 11:2, pp. i-75. DOI: 10.2200/S00923ED1V01Y201905ICR067. Online publication date: 3-Jun-2019
  • (2014) Linear model incorporating feature ranking for Chinese documents readability. The 9th International Symposium on Chinese Spoken Language Processing, pp. 29-33. DOI: 10.1109/ISCSLP.2014.6936601. Online publication date: Sep-2014
  • (2014) An Ordinal Multi-class Classification Method for Readability Assessment of Chinese Documents. Knowledge Science, Engineering and Management, pp. 61-72. DOI: 10.1007/978-3-319-12096-6_6. Online publication date: 2014
  • (2013) Self reinforcement for important passage retrieval. Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, pp. 845-848. DOI: 10.1145/2484028.2484134. Online publication date: 28-Jul-2013
  • (2013) The notion of diversity in graphical entity summarisation on semantic knowledge graphs. Journal of Intelligent Information Systems 41:2, pp. 109-149. DOI: 10.1007/s10844-013-0239-6. Online publication date: 1-Oct-2013
  • (2011) Improving Readability of Dyslexic Learners through Document Summarization. Proceedings of the 2011 IEEE International Conference on Technology for Education, pp. 246-249. DOI: 10.1109/T4E.2011.49. Online publication date: 14-Jul-2011
  • (2010) Cross-language document summarization based on machine translation quality prediction. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 917-926. DOI: 10.5555/1858681.1858775. Online publication date: 11-Jul-2010
