DOI: 10.1145/1449715.1449738
research-article

Search Vox: leveraging multimodal refinement and partial knowledge for mobile voice search

Published: 19 October 2008

Abstract

Internet usage on mobile devices continues to grow as users seek anytime, anywhere access to information. Because users frequently search for businesses, directory assistance has been the focus of many voice search applications utilizing speech as the primary input modality. Unfortunately, mobile settings often contain noise which degrades performance. As such, we present Search Vox, a mobile search interface that not only facilitates touch and text refinement whenever speech fails, but also allows users to assist the recognizer via text hints. Search Vox can also take advantage of any partial knowledge users may have about the business listing by letting them express their uncertainty in an intuitive way using verbal wildcards. In simulation experiments conducted on real voice search data, leveraging multimodal refinement resulted in a 28% relative reduction in error rate. Providing text hints along with the spoken utterance resulted in even greater relative reduction, with dramatic gains in recovery for each additional character.
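
The two ideas in the abstract, text hints and verbal wildcards, can be illustrated with a small sketch. This is not the Search Vox implementation: the function names, the use of an n-best list of recognizer hypotheses, and the choice of "something" as the wildcard word are all assumptions made for illustration. A typed hint is treated as a prefix filter over candidate listings, and a wildcard word is treated as "one or more unknown words" when matching a query against listings.

```python
import re


def filter_by_hint(n_best, hint):
    """Keep hypotheses that start with the typed hint (case-insensitive).

    Assumption: the recognizer exposes an n-best list of listing names.
    """
    prefix = hint.casefold()
    return [h for h in n_best if h.casefold().startswith(prefix)]


def wildcard_to_regex(query, wildcard="something"):
    """Compile a query into a regex, treating the wildcard word as
    one or more unknown words. The wildcard word is a hypothetical choice.
    """
    parts = [
        r"\S+(?: \S+)*" if word.casefold() == wildcard else re.escape(word)
        for word in query.split()
    ]
    return re.compile(" ".join(parts), re.IGNORECASE)


def match_listings(listings, query):
    """Return the listings that match the whole query pattern."""
    pattern = wildcard_to_regex(query)
    return [name for name in listings if pattern.fullmatch(name)]


if __name__ == "__main__":
    # Hypothetical recognizer output and business listings.
    n_best = ["Home Depot", "Home Deport", "Comb Depot"]
    print(filter_by_hint(n_best, "ho"))

    listings = ["Black Angus Restaurant", "Black Bear Diner", "Blue Angus Restaurant"]
    print(match_listings(listings, "black something restaurant"))
```

Each additional hint character shrinks the candidate set further, which is one intuitive reading of why recovery improves with each character typed. A production system would more likely feed the hint into the recognizer's search itself than filter results afterwards.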

Published In

UIST '08: Proceedings of the 21st annual ACM symposium on User interface software and technology
October 2008
308 pages
ISBN: 9781595939753
DOI: 10.1145/1449715

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. mobile search
2. multimodal
3. speech recognition

Acceptance Rates

Overall acceptance rate: 561 of 2,567 submissions (22%)
