DOI: 10.1145/1180995.1181005

Automatic speech recognition for webcasts: how good is good enough and what to do when it isn't

Published: 02 November 2006

Abstract

The increased availability of broadband connections has recently led to an increase in the use of Internet broadcasting (webcasting). Most webcasts are archived and accessed numerous times retrospectively. One challenge to skimming and browsing through such archives is the lack of text transcripts of the webcast's audio channel. This paper describes a procedure for prototyping an Automatic Speech Recognition (ASR) system that generates realistic transcripts at any desired Word Error Rate (WER), thus overcoming the drawbacks of both prototype-based and Wizard of Oz simulations. We used such a system in a user study showing that transcripts with WERs below 25% are acceptable for use in webcast archives. As current ASR systems can, under realistic conditions, deliver WERs of only around 45%, we also describe a solution for reducing the WER of such transcripts: engaging users to collaborate, in a "wiki" fashion, on editing the imperfect transcripts obtained through ASR.
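For context, WER, the metric the study varies, is the word-level edit distance between the ASR hypothesis and a reference transcript, divided by the reference length. A minimal sketch of computing it via dynamic programming (not from the paper; the sample sentences are illustrative):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate = (substitutions + deletions + insertions) / reference length,
    computed as a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# 1 substitution ("sat" -> "sit") + 1 deletion ("the") over 6 reference words
print(wer("the cat sat on the mat", "the cat sit on mat"))  # 0.333...
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why very noisy transcripts can be worse than useless for browsing.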




Published In

ICMI '06: Proceedings of the 8th international conference on Multimodal interfaces
November 2006
404 pages
ISBN: 159593541X
DOI: 10.1145/1180995

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. automatic speech recognition
  2. collaboration
  3. webcasts

Qualifiers

  • Article


Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)

Cited By
  • (2021) "Can we talk? Design Implications for the Questionnaire-Driven Self-Report of Health and Wellbeing via Conversational Agent," in Proc. of the 3rd Conference on Conversational User Interfaces, pp. 1-11, 27 Jul 2021. DOI: 10.1145/3469595.3469600
  • (2017) "Designing Speech, Acoustic and Multimodal Interactions," in Proc. of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pp. 601-608, 6 May 2017. DOI: 10.1145/3027063.3027086
  • (2017) "Extracting audio summaries to support effective spoken document search," Journal of the Association for Information Science and Technology, 68(9), pp. 2101-2115, 1 Sep 2017. DOI: 10.1002/asi.23831
  • (2016) "Designing Speech and Multimodal Interactions for Mobile, Wearable, and Pervasive Applications," in Proc. of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pp. 3612-3619, 7 May 2016. DOI: 10.1145/2851581.2856506
  • (2015) "Speech-based Interaction," in Proc. of the 20th International Conference on Intelligent User Interfaces, pp. 437-438, 18 Mar 2015. DOI: 10.1145/2678025.2716263
  • (2015) "An approach for automated video indexing and video search in large lecture video archives," in 2015 International Conference on Pervasive Computing (ICPC), pp. 1-5, Jan 2015. DOI: 10.1109/PERVASIVE.2015.7087169
  • (2015) "Optimized searching of video based on speech and video text content," in 2015 International Conference on Soft-Computing and Networks Security (ICSNS), pp. 1-4, Feb 2015. DOI: 10.1109/ICSNS.2015.7292369
  • (2014) "Designing speech and language interactions," in CHI '14 Extended Abstracts on Human Factors in Computing Systems, pp. 75-78, 26 Apr 2014. DOI: 10.1145/2559206.2559228
  • (2014) "Content Based Lecture Video Retrieval Using Speech and Video Text Information," IEEE Transactions on Learning Technologies, 7(2), pp. 142-154, Apr 2014. DOI: 10.1109/TLT.2014.2307305
  • (2013) "We need to talk," in CHI '13 Extended Abstracts on Human Factors in Computing Systems, pp. 2459-2464, 27 Apr 2013. DOI: 10.1145/2468356.2468803
