DOI: 10.1145/1753326.1753584
research-article

Effects of automated transcription quality on non-native speakers' comprehension in real-time computer-mediated communication

Published: 10 April 2010

ABSTRACT

Real-time transcription has been shown to be valuable in facilitating non-native speakers' comprehension in real-time communication. Automated speech recognition (ASR) technology is a critical ingredient for its practical deployment. This paper presents a series of studies investigating how the quality of transcripts generated by an ASR system affects user comprehension and subjective evaluation. Experiments are first presented comparing performance across three transcription conditions: no transcript, a perfect transcript, and a transcript with a Word Error Rate (WER) of 20%. We found that a WER of 20% was most likely the critical point at which transcripts are just acceptable and useful. We then examined a lower WER of 10% (a lower bound for today's state-of-the-art systems) using the same experimental design. The results indicated that at 10% WER, comprehension performance was significantly improved compared to the no-transcript condition. Finally, implications for further system development and design are discussed.
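The abstract does not define WER. Under the conventional definition, WER is the number of word substitutions, deletions, and insertions needed to turn the ASR output into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that definition only: the function name `word_error_rate`, the whitespace tokenization, and the sample sentences are assumptions made for illustration, and this is not the scoring procedure used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper: tokenization is plain whitespace splitting,
    with no case folding or punctuation handling.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("the" -> "a") and one deletion
# ("today") against a 10-word reference.
reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown fox jumps over a lazy dog"
print(f"WER = {word_error_rate(reference, hypothesis):.0%}")
```

With two errors against a 10-word reference, this made-up pair has a WER of 20%, the level the study identifies as the likely threshold of acceptability.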


    • Published in

      CHI '10: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
      April 2010
      2690 pages
      ISBN: 9781605589299
      DOI: 10.1145/1753326

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
