DOI: 10.1145/1124772.1124823
Article

Error correction of voicemail transcripts in SCANMail

Published: 22 April 2006

Abstract

Despite its widespread use, voicemail presents numerous usability challenges: people must listen to messages in their entirety, they cannot search by keyword, and audio files do not naturally support visual skimming. SCANMail overcomes these flaws by automatically generating text transcripts of voicemail messages and presenting them in an email-like interface. Transcripts facilitate quick browsing and permanent archiving. However, errors from automatic speech recognition (ASR) reduce the usefulness of the transcripts. The work presented here addresses these problems by evaluating user-initiated error correction of transcripts. User studies of two editor interfaces, a grammar-assisted menu and simple replacement by typing, reveal reduced audio playback times and an emphasis on editing important words with the menu, suggesting its value in mobile environments where limited input capabilities are the norm and user privacy is essential. The study also adds to the scarce body of work on ASR confidence shading, suggesting that shading may be more helpful than previously reported.



Published In

CHI '06: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
April 2006
1353 pages
ISBN:1595933727
DOI:10.1145/1124772
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. confidence shading
  2. editor interfaces
  3. error correction
  4. speech recognition
  5. voicemail

Qualifiers

  • Article

Conference

CHI06: CHI 2006 Conference on Human Factors in Computing Systems
April 22 - 27, 2006
Montréal, Québec, Canada

Acceptance Rates

Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

Cited By

  • (2022) Transparent-AI Blueprint: Developing a Conceptual Tool to Support the Design of Transparent AI Agents. International Journal of Human–Computer Interaction 38:18-20 (1846-1873). DOI: 10.1080/10447318.2022.2093773. Online publication date: 17-Jul-2022
  • (2021) Capturing the Trends, Applications, Issues, and Potential Strategies of Designing Transparent AI Agents. Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems (1-8). DOI: 10.1145/3411763.3451819. Online publication date: 8-May-2021
  • (2020) Deconstructing Human-assisted Video Transcription and Annotation for Legislative Proceedings. Digital Government: Research and Practice 1:3 (1-24). DOI: 10.1145/3395316. Online publication date: 18-Nov-2020
  • (2020) Progressive Disclosure. ACM Transactions on Interactive Intelligent Systems 10:4 (1-32). DOI: 10.1145/3374218. Online publication date: 16-Oct-2020
  • (2016) Improving Query Reformulation in Voice Search System. Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval (365-367). DOI: 10.1145/2854946.2854951. Online publication date: 13-Mar-2016
  • (2016) Investigating Critical Speech Recognition Errors in Spoken Short Messages. Situated Dialog in Speech-Based Human-Computer Interaction (71-82). DOI: 10.1007/978-3-319-21834-2_7. Online publication date: 21-Apr-2016
  • (2015) Error Correction Using Long Context Match for Smartphone Speech Recognition. IEICE Transactions on Information and Systems E98.D:11 (1932-1942). DOI: 10.1587/transinf.2015EDP7179. Online publication date: 2015
  • (2014) An efficient error correction interface for speech recognition on mobile touchscreen devices. 2014 IEEE Spoken Language Technology Workshop (SLT) (454-459). DOI: 10.1109/SLT.2014.7078617. Online publication date: Dec-2014
  • (2012) Markup as you talk. Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work (349-358). DOI: 10.1145/2145204.2145260. Online publication date: 11-Feb-2012
  • (2010) Third-party error detection support mechanisms for dictation speech recognition. Interacting with Computers 22:5 (375-388). DOI: 10.1016/j.intcom.2010.02.002. Online publication date: 1-Sep-2010
