DOI: 10.1145/1410140.1410191
research-article

Towards Brazilian Portuguese automatic text simplification systems

Published: 16 September 2008

Abstract

In this paper we investigate the main linguistic phenomena that make texts complex and how such texts can be simplified. We analyze a corpus of simple account texts available on the web for Brazilian Portuguese and propose simplification strategies for this language. The study illustrates the need for text simplification to make information more accessible to readers with poor literacy and, potentially, to people with cognitive disabilities. It also highlights characteristics of simplification for Portuguese that may differ from those of other languages. This study is a first step towards building Brazilian Portuguese text simplification systems. One scenario in which such systems could be used is the reading of electronic texts produced, for example, by the Brazilian government or by relevant news agencies.


Published In

DocEng '08: Proceedings of the eighth ACM symposium on Document engineering
September 2008
312 pages
ISBN:9781605580814
DOI:10.1145/1410140

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Brazilian Portuguese
  2. corpus analysis
  3. natural language processing
  4. poor literacy readers
  5. text simplification

Qualifiers

  • Research-article

Conference

DocEng '08
DocEng '08: ACM Symposium on Document Engineering
September 16 - 19, 2008
São Paulo, Brazil

Acceptance Rates

DocEng '08 Paper Acceptance Rate 21 of 62 submissions, 34%;
Overall Acceptance Rate 194 of 564 submissions, 34%

