research-article

EUSUM: extracting easy-to-understand English summaries for non-native readers

Authors:

Jianguo Xiao

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval

Pages 491 - 498

https://doi.org/10.1145/1835449.1835532

Published: 19 July 2010

Abstract

In this paper we investigate a novel and important problem in multi-document summarization: how to extract an easy-to-understand English summary for non-native readers. Existing summarization systems extract the same kind of English summaries from English news documents for both native and non-native readers. However, non-native readers differ in their English reading skills because they have different English education and learning backgrounds. An English summary that native readers understand easily may be hard for non-native readers to understand. We propose adding the dimension of reading easiness (or difficulty) to multi-document summarization; the proposed EUSUM system produces easy-to-understand summaries according to the English reading skills of the readers. Sentence-level reading easiness is predicted using SVM regression, and the reading easiness score of each sentence is then incorporated into the summarization process. An empirical evaluation and a user study demonstrate that EUSUM produces summaries that are easier for non-native readers to understand than those of existing summarization systems, with very little sacrifice of the summary's informativeness.
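The abstract does not say how the easiness score enters the summarization process. As a rough illustration only, the sketch below assumes each candidate sentence already carries an informativeness score (from any extractive ranker) and an easiness score (for example, from a trained SVM regressor), both scaled to [0, 1]. It blends the two with a weight and greedily fills a word budget. The names `Sentence`, `combined_score`, `select_summary`, and `ease_weight`, the linear blend, and the toy scores are all assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    informativeness: float  # assumed output of an extractive ranker, in [0, 1]
    easiness: float         # assumed output of a trained regressor, in [0, 1]


def combined_score(s, ease_weight):
    """Linear blend of the two scores (an illustrative assumption)."""
    return (1.0 - ease_weight) * s.informativeness + ease_weight * s.easiness


def select_summary(sentences, ease_weight, max_words):
    """Greedily keep the highest-scoring sentences that fit the word budget."""
    ranked = sorted(
        sentences,
        key=lambda s: combined_score(s, ease_weight),
        reverse=True,
    )
    summary, used = [], 0
    for s in ranked:
        n_words = len(s.text.split())
        if used + n_words <= max_words:
            summary.append(s)
            used += n_words
    return summary


# Toy candidates with made-up scores.
candidates = [
    Sentence("The committee promulgated unprecedented remediation directives.", 0.9, 0.2),
    Sentence("The committee announced new cleanup rules.", 0.8, 0.9),
    Sentence("Officials met on Monday.", 0.3, 0.95),
]

info_only = select_summary(candidates, ease_weight=0.0, max_words=6)
balanced = select_summary(candidates, ease_weight=0.5, max_words=6)
print([s.text for s in info_only])
print([s.text for s in balanced])
```

With `ease_weight=0.0` the selection depends on informativeness alone. Raising the weight shifts the choice toward the simpler sentence, at a small cost in informativeness, which is the trade-off the abstract describes.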

Published In

SIGIR '10: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval

July 2010

944 pages

ISBN:9781450301534

DOI:10.1145/1835449

General Chairs: Fabio Crestani (University of Lugano, CH), Stéphane Marchand-Maillet (University of Geneva, CH)

Program Chairs: Hsin-Hsi Chen (National Taiwan University, TW), Efthimis N. Efthimiadis (University of Washington, USA), Jacques Savoy (University of Neuchatel, CH)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGIR '10

Sponsor:

SIGIR

SIGIR '10: The 33rd International ACM SIGIR conference on research and development in Information Retrieval

July 19 - 23, 2010

Geneva, Switzerland

Acceptance Rates

SIGIR '10 Paper Acceptance Rate 87 of 520 submissions, 17%;

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Total Citations: 11

Total Downloads: 599

Downloads (last 12 months): 9

Downloads (last 6 weeks): 1

Reflects downloads up to 19 Feb 2025

Citations

Cited By

Pituxcoosuvarn M, Murakami Y (2024). Boosting Non-Native Speaker Engagement: Simplifying Text with Large Language Models. Collaboration Technologies and Social Computing, pp. 274-281. DOI: 10.1007/978-3-031-67998-8_22. Online publication date: 11-Sep-2024
Liu J, Shah C (2022). Interactive IR User Study Design, Evaluation, and Reporting. Online publication date: 10-Mar-2022
Madrazo Azpiazu I, Pera M (2020). Is cross-lingual readability assessment possible? Journal of the Association for Information Science and Technology 71:6, pp. 644-656. DOI: 10.1002/asi.24293. Online publication date: 7-May-2020
Liu J, Shah C (2019). Interactive IR User Study Design, Evaluation, and Reporting. Synthesis Lectures on Information Concepts, Retrieval, and Services 11:2, pp. i-75. DOI: 10.2200/S00923ED1V01Y201905ICR067. Online publication date: 3-Jun-2019
Sun G, Jiang Z, Gu Q, Chen D (2014). Linear model incorporating feature ranking for Chinese documents readability. The 9th International Symposium on Chinese Spoken Language Processing, pp. 29-33. DOI: 10.1109/ISCSLP.2014.6936601. Online publication date: Sep-2014
Jiang Z, Sun G, Gu Q, Chen D (2014). An Ordinal Multi-class Classification Method for Readability Assessment of Chinese Documents. Knowledge Science, Engineering and Management, pp. 61-72. DOI: 10.1007/978-3-319-12096-6_6. Online publication date: 2014
Ribeiro R, Marujo L, Martins de Matos D, Neto J, Gershman A, Carbonell J, Jones G, Sheridan P, Kelly D, de Rijke M, Sakai T (2013). Self reinforcement for important passage retrieval. Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, pp. 845-848. DOI: 10.1145/2484028.2484134. Online publication date: 28-Jul-2013
Sydow M, Pikuła M, Schenkel R (2013). The notion of diversity in graphical entity summarisation on semantic knowledge graphs. Journal of Intelligent Information Systems 41:2, pp. 109-149. DOI: 10.1007/s10844-013-0239-6. Online publication date: 1-Oct-2013
Nandhini K, Balasundaram S (2011). Improving Readability of Dyslexic Learners through Document Summarization. Proceedings of the 2011 IEEE International Conference on Technology for Education, pp. 246-249. DOI: 10.1109/T4E.2011.49. Online publication date: 14-Jul-2011
Wan X, Li H, Xiao J, Hajič J (2010). Cross-language document summarization based on machine translation quality prediction. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 917-926. DOI: 10.5555/1858681.1858775. Online publication date: 11-Jul-2010