research-article

Collaborative editing for improved usefulness and usability of transcript-enhanced webcasts

Authors:

Cosmin Munteanu,

Gerald Penn

CHI '08: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems

Pages 373 - 382

https://doi.org/10.1145/1357054.1357117

Published: 06 April 2008

Abstract

One challenge in facilitating skimming or browsing through archives of on-line recordings of webcast lectures is the lack of text transcripts of the recorded lectures. Ideally, transcripts would be obtainable through Automatic Speech Recognition (ASR). However, in realistic lecture conditions, current ASR systems can only deliver a Word Error Rate of around 45%, well above the accepted threshold of 25%. In this paper, we present the iterative design of a webcast extension that engages users to collaborate in a wiki-like manner on editing the imperfect ASR-produced transcripts, and show that this is a feasible solution for improving the quality of lecture transcripts. We also present the findings of a field study, carried out in a real lecture environment, investigating how students use and edit the transcripts.
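
The abstract's figures rest on Word Error Rate (WER), conventionally the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name, the whitespace tokenization, and the sample sentences are assumptions made for illustration, and only the 25% threshold comes from the abstract.

```python
# Illustrative sketch of the standard WER definition (not the paper's code).
ACCEPTED_WER_THRESHOLD = 0.25  # threshold quoted in the abstract


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()   # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over six words.
reference = "the lecture covers speech recognition today"
hypothesis = "the lecture covers peach recognition"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.2f}, acceptable = {wer <= ACCEPTED_WER_THRESHOLD}")
```

Under this definition, each user correction that turns a misrecognized word back into the spoken word lowers the edit distance, which is how collaborative editing would be expected to move a transcript toward the 25% threshold.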

Index Terms

Collaborative editing for improved usefulness and usability of transcript-enhanced webcasts
1. Human-centered computing
  1. Human computer interaction (HCI)
2. Information systems
  1. Information systems applications
    1. Multimedia information systems

Published In

CHI '08: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems

April 2008

1870 pages

ISBN:9781605580111

DOI:10.1145/1357054

General Chairs:
Mary Czerwinski (Microsoft Research, USA), Arnie Lund (Microsoft, USA)

Program Chair:
Desney Tan (Microsoft Research, USA)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CHI '08

CHI '08: CHI Conference on Human Factors in Computing Systems

April 5 - 10, 2008

Florence, Italy

Acceptance Rates

CHI '08 Paper Acceptance Rate 157 of 714 submissions, 22%;

Overall Acceptance Rate 6,199 of 26,314 submissions, 24%
