DOI: 10.1145/1296843.1296877
Article

Corpus studies in word prediction

Published: 15 October 2007

Abstract

Word prediction can be used to enhance the communication rate of people with disabilities who use Augmentative and Alternative Communication (AAC) devices. Our word prediction system uses statistical methods trained on a corpus, and we measure its efficacy by calculating the theoretical keystroke savings on held-out data. Ideally, training and testing would be done on a large corpus of AAC text covering a variety of topics, but no such corpus exists. We therefore discuss training and testing on a wide variety of corpora meant to approximate text from AAC users. We show that training on a combination of in-domain and out-of-domain data is often more beneficial than training on either data set alone, and that advanced language modeling techniques such as topic modeling are portable even when applied to very different text.
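
To make the evaluation measure concrete, the sketch below computes keystroke savings for a toy predictor. It is an illustration under stated assumptions, not the system described in the paper: the unigram model, the prediction-list size `n`, the one-keystroke cost of selecting a prediction (assumed to also insert the trailing space), and all function names are hypothetical. Keystroke savings is taken here as the percentage of keystrokes avoided relative to typing every character of the held-out text.

```python
from collections import Counter

def train_unigram(corpus_tokens):
    """Count word frequencies in the training corpus (assumed toy model)."""
    return Counter(corpus_tokens)

def predict(model, prefix, n=5):
    """Return up to n known words starting with prefix, most frequent first."""
    candidates = [(w, c) for w, c in model.items() if w.startswith(prefix)]
    candidates.sort(key=lambda wc: (-wc[1], wc[0]))  # ties broken alphabetically
    return [w for w, _ in candidates[:n]]

def keystrokes_for_word(model, word, n=5):
    """Keystrokes needed to enter word followed by a space.

    Assumption: the user types letters until the word shows up in the
    prediction list, then selects it with one keystroke.
    """
    for typed in range(len(word) + 1):
        if word in predict(model, word[:typed], n):
            return typed + 1
    return len(word) + 1  # never predicted: every letter plus the space

def keystroke_savings(model, test_tokens, n=5):
    """Percentage of keystrokes saved on held-out tokens."""
    chars = sum(len(w) + 1 for w in test_tokens)  # letters plus a space each
    if chars == 0:
        return 0.0
    keys = sum(keystrokes_for_word(model, w, n) for w in test_tokens)
    return 100.0 * (chars - keys) / chars

if __name__ == "__main__":
    model = train_unigram("the cat sat on the mat".split())
    print(keystroke_savings(model, ["the", "cat"], n=5))
```

Because the measure is computed by simulating an ideal user on held-out text, it gives an upper bound on savings rather than an observed typing rate, which is why the abstract calls it theoretical.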

Published In

Assets '07: Proceedings of the 9th international ACM SIGACCESS conference on Computers and accessibility
October 2007
282 pages
ISBN: 9781595935731
DOI: 10.1145/1296843
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. corpora
  2. language modeling
  3. statistical methods
  4. word prediction
