research-article

Effects of automated transcription quality on non-native speakers' comprehension in real-time computer-mediated communication

Authors:
Yingxin Pan, IBM Research-China, Beijing, China
Danning Jiang, IBM Research-China, Beijing, China
Lin Yao, Chinese Academy of Science, Beijing, China
Michael Picheny, IBM Research - Watson, Yorktown Heights, USA
Yong Qin, IBM Research-China, Beijing, China

CHI '10: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, April 2010, Pages 1725–1734
https://doi.org/10.1145/1753326.1753584

Published: 10 April 2010

ABSTRACT

Real-time transcription has been shown to be valuable in facilitating non-native speakers' comprehension in real-time communication. Automated speech recognition (ASR) technology is a critical ingredient for its practical deployment. This paper presents a series of studies investigating how the quality of transcripts generated by an ASR system affects user comprehension and subjective evaluation. Experiments are first presented comparing performance across three transcription conditions: no transcript, a perfect transcript, and a transcript with a Word Error Rate (WER) of 20%. We found that a 20% WER was most likely the critical point at which transcripts are just acceptable and useful. We then examined a lower WER of 10% (a lower bound for today's state-of-the-art systems) using the same experimental design. The results indicated that at 10% WER, comprehension performance was significantly improved compared to the no-transcript condition. Finally, implications for further system development and design are discussed.
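
For readers unfamiliar with the metric, the sketch below illustrates how a WER figure such as the 20% and 10% levels mentioned above is conventionally obtained: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of reference words. The abstract does not describe the authors' scoring procedure, so this is only the standard definition under simple assumptions (whitespace tokenization, case-sensitive matching); the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words.

    Assumption: words are split on whitespace and compared exactly;
    real scoring tools usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical 10-word reference with one substitution and one deletion.
reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown fox jumps over a lazy dog"
print(word_error_rate(reference, hypothesis))
```

In this made-up example, two of the ten reference words are wrong or missing in the hypothesis, which corresponds to the 20% level studied in the first set of experiments.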

Index Terms

1. Human-centered computing
  1. Human computer interaction (HCI)

Published in
CHI '10: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
April 2010
2690 pages
ISBN: 9781605589299
DOI: 10.1145/1753326
General Chair: Elizabeth Mynatt (Georgia Institute of Technology)
Program Chairs: Geraldine Fitzpatrick (Vienna University of Technology), Scott Hudson (Carnegie Mellon University), Keith Edwards (Georgia Tech), Tom Rodden (University of Nottingham)
Copyright © 2010 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
automated speech recognition
cmc
experiment
non-native speakers
real-time transcription