DOI: 10.1145/1456536.1456540
research-article

A corpus analysis of simple account texts and the proposal of simplification strategies: first steps towards text simplification systems

Published: 22 September 2008

Abstract

In this paper we investigate the main linguistic phenomena that can make texts complex and how they could be simplified. We focus on a corpus analysis of simple account texts available on the web for Brazilian Portuguese (BP). This study illustrates the need for text simplification to make information more accessible to poor readers and to people with cognitive disabilities. It also highlights features of simplification for BP, which may differ from those of other languages. Moreover, we propose simplification strategies and a Simplification Annotation Editor. This study is the first step towards building BP text simplification systems. One scenario in which these systems could be used is the reading of electronic texts produced, e.g., by the Brazilian government or by news agencies.



Published In

SIGDOC '08: Proceedings of the 26th annual ACM international conference on Design of communication
September 2008
303 pages
ISBN: 9781605580838
DOI: 10.1145/1456536

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. corpus analysis
  2. natural language processing
  3. text simplification


Conference

SIGDOC '08

Acceptance Rates

Overall acceptance rate: 355 of 582 submissions (61%)

Cited By

  • (2018) Identifying signs of syntactic complexity for rule-based sentence simplification. Natural Language Engineering 25:1, pp. 69-119. DOI: 10.1017/S1351324918000384. Online publication date: 31-Oct-2018
  • (2014) Simple or Not Simple? A Readability Question. Language Production, Cognition, and the Lexicon, pp. 379-398. DOI: 10.1007/978-3-319-08043-7_22. Online publication date: 12-Nov-2014
  • (2013) Enhancing readability of web documents by text augmentation for deaf people. Proceedings of the 3rd International Conference on Web Intelligence, Mining and Semantics, pp. 1-10. DOI: 10.1145/2479787.2479808. Online publication date: 12-Jun-2013
  • (2010) Fostering digital inclusion and accessibility. Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas, pp. 46-53. DOI: 10.5555/1868701.1868708. Online publication date: 6-Jun-2010
  • (2008) Towards Brazilian Portuguese automatic text simplification systems. Proceedings of the eighth ACM symposium on Document engineering, pp. 240-248. DOI: 10.1145/1410140.1410191. Online publication date: 16-Sep-2008
