ABSTRACT
Real-time transcription has been shown to help non-native speakers' comprehension in real-time communication. Automated speech recognition (ASR) technology is a critical ingredient for its practical deployment. This paper presents a series of studies investigating how the quality of transcripts generated by an ASR system affects user comprehension and subjective evaluation. We first present experiments comparing performance across three transcription conditions: no transcript, a perfect transcript, and a transcript with a Word Error Rate (WER) of 20%. We found that 20% WER was most likely the critical point at which transcripts are just acceptable and useful. We then examined a lower WER of 10% (a lower bound for today's state-of-the-art systems) using the same experimental design. The results indicated that at 10% WER, comprehension performance was significantly improved compared to the no-transcript condition. Finally, we discuss implications for further system development and design.
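For readers unfamiliar with the metric, the sketch below shows how WER is conventionally computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of reference words. The abstract does not say how the study's transcripts were scored, so the function name `word_error_rate`, the whitespace tokenization, and the example sentences are illustrative assumptions rather than the authors' procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative sketch: tokens are split on whitespace, with no text
    normalization (casing, punctuation), which real scoring tools usually apply.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: two substituted words in a ten-word reference.
reference = "the meeting will start at nine in the main room"
hypothesis = "the meeting will start at night in the mean room"
print(f"WER = {word_error_rate(reference, hypothesis):.0%}")
```

Under this definition, a 20% WER transcript contains roughly one error for every five reference words, and a 10% WER transcript roughly one in ten.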
Index Terms
- Effects of automated transcription quality on non-native speakers' comprehension in real-time computer-mediated communication