DOI: 10.1145/1124772.1124848
Article

The effect of speech recognition accuracy rates on the usefulness and usability of webcast archives

Published: 22 April 2006

Abstract

The widespread availability of broadband connections has led to an increase in the use of Internet broadcasting (webcasting). Most webcasts are archived and accessed numerous times retrospectively. In the absence of transcripts of what was said, users have difficulty searching and scanning for specific topics. This research investigates user needs for transcription accuracy in webcast archives, and measures how the quality of transcripts affects user performance in a question-answering task, and how quality affects overall user experience. We tested 48 subjects in a within-subjects design under 4 conditions: perfect transcripts, transcripts with 25% Word Error Rate (WER), transcripts with 45% WER, and no transcript. Our data reveals that speech recognition accuracy linearly influences both user performance and experience, shows that transcripts with 45% WER are unsatisfactory, and suggests that transcripts having a WER of 25% or less would be useful and usable in webcast archives.
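The Word Error Rate used to define the transcript conditions is conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used in the study; the function name `word_error_rate` and the lowercase, whitespace-based tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are compared case-insensitively after splitting on
    whitespace; real scoring tools apply more careful text normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("webcasts are archived online",
                          "webcasts are archive online"))  # 0.25
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.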


      Published In

      CHI '06: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
      April 2006
      1353 pages
      ISBN: 1595933727
      DOI: 10.1145/1124772

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. automatic speech recognition
      2. navigational tools
      3. text transcripts
      4. webcast systems

      Qualifiers

      • Article

      Conference

      CHI06: CHI 2006 Conference on Human Factors in Computing Systems
      April 22 - 27, 2006
      Montréal, Québec, Canada

      Acceptance Rates

      Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

      Article Metrics

      • Downloads (last 12 months): 17
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 14 Feb 2025

      Cited By

      • (2024) SafeEar: Content Privacy-Preserving Audio Deepfake Detection. Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, 3585-3599. DOI: 10.1145/3658644.3670285. Online publication date: 2-Dec-2024.
      • (2023) Creating Design Resources to Scaffold the Ideation of AI Concepts. Proceedings of the 2023 ACM Designing Interactive Systems Conference, 2326-2346. DOI: 10.1145/3563657.3596058. Online publication date: 10-Jul-2023.
      • (2023) Analysis of a Hinglish ASR System’s Performance for Fraud Detection. Speech and Computer, 46-58. DOI: 10.1007/978-3-031-48312-7_4. Online publication date: 22-Nov-2023.
      • (2022) A Personalized Visual Aid for Selections of Appearance Building Products with Long-term Effects. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 1-18. DOI: 10.1145/3491102.3517659. Online publication date: 29-Apr-2022.
      • (2021) Meaning Error Rate. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 458-466. DOI: 10.1145/3447548.3467372. Online publication date: 14-Aug-2021.
      • (2021) Effective Microphone Array Placement in Interactive Whiteboards for Smart Meeting Rooms. 2021 IEEE 7th World Forum on Internet of Things (WF-IoT), 852-856. DOI: 10.1109/WF-IoT51360.2021.9595803. Online publication date: 14-Jun-2021.
      • (2020) Transcripts and Accessibility: Student Views from Using Webinars in Built Environment Education. European Journal of Open, Distance and E-Learning 23:2, 37-50. DOI: 10.2478/eurodl-2020-0009. Online publication date: 10-Dec-2020.
      • (2020) Vocabulary Size Influences Spontaneous Speech in Native Language Users: Validating the Use of Automatic Speech Recognition in Individual Differences Research. Language and Speech 64:1, 35-51. DOI: 10.1177/0023830920911079. Online publication date: 30-Mar-2020.
      • (2019) Neural models of text normalization for speech applications. Computational Linguistics 45:2, 293-337. DOI: 10.1162/coli_a_00349. Online publication date: 1-Jun-2019.
      • (2019) Effects of WER on ASR Correction Interfaces for Mobile Text Entry. Proceedings of the 21st International Conference on Human-Computer Interaction with Mobile Devices and Services, 1-6. DOI: 10.1145/3338286.3344404. Online publication date: 1-Oct-2019.
