
Improving human-robot interaction through adaptation to the auditory scene

Published: 10 March 2007

Abstract

Effective communication with a mobile robot using speech is a difficult problem even when the auditory scene can be controlled. Robot ego-noise, echoes, and human interference are all common sources of decreased intelligibility. In real-world environments, these common problems are compounded by many other types of background noise. For instance, military scenarios might be punctuated by high-decibel aircraft noise and bursts from weaponry that mask parts of the robot's speech output. Even in non-military settings, fans, computers, alarms, and transportation noise can cause enough interference to render a traditional speech interface unintelligible. In this work, we seek to overcome these problems by applying a robot's advantages in sensing and mobility to a text-to-speech interface. Using perspective-taking skills to predict how the human user is being affected by new sound sources, a robot can adjust its speaking patterns and/or reposition itself within the environment to limit the negative impact on intelligibility, making the speech interface easier to use.
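As a rough illustration of the adaptation loop described above, the sketch below estimates the noise level reaching the listener from known sound sources and then chooses between speaking normally, raising the speech level, or pausing/repositioning. This is a minimal sketch only: the free-field propagation model, the SNR threshold, the level limits, and all names are illustrative assumptions, not the authors' implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class SoundSource:
    x: float          # position in metres
    y: float
    level_db: float   # sound pressure level measured at 1 m (dB SPL)

def level_at(src: SoundSource, px: float, py: float) -> float:
    """Level of a point source at (px, py) under free-field spherical
    spreading (-6 dB per doubling of distance) - a simplifying assumption."""
    d = max(math.hypot(src.x - px, src.y - py), 0.1)
    return src.level_db - 20.0 * math.log10(d)

def combined_noise_db(sources, px, py) -> float:
    """Combine incoherent noise sources on an energy basis."""
    power = sum(10.0 ** (level_at(s, px, py) / 10.0) for s in sources)
    return 10.0 * math.log10(power) if power > 0 else -120.0

def choose_adaptation(speech_db_at_listener, sources, listener_xy,
                      min_snr_db=15.0, max_speech_db=75.0) -> str:
    """Pick a coarse adaptation: speak, speak louder, or move/pause.
    Thresholds are illustrative, not taken from the paper."""
    noise_db = combined_noise_db(sources, *listener_xy)
    snr = speech_db_at_listener - noise_db
    if snr >= min_snr_db:
        return "speak normally"
    needed = noise_db + min_snr_db
    if needed <= max_speech_db:
        return f"raise speech level to about {needed:.0f} dB"
    return "pause, or reposition closer to the listener"

# Example: a loud fan 3 m from the origin, robot speech arriving at 62 dB.
noise = [SoundSource(3.0, 0.0, 85.0)]
print(choose_adaptation(62.0, noise, (1.0, 0.0)))
```

With a loud fan near the listener, the example falls back to pausing or repositioning, reflecting the abstract's point that mobility can compensate when adjusting speech alone is not enough.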




    Published In

    HRI '07: Proceedings of the ACM/IEEE international conference on Human-robot interaction
    March 2007
    392 pages
    ISBN:9781595936172
    DOI:10.1145/1228716


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. acoustics
    2. auditory perspective taking
    3. auditory scene
    4. human-robot interaction


    Conference

HRI '07: International Conference on Human-Robot Interaction
March 10-12, 2007
Arlington, Virginia, USA

    Acceptance Rates

HRI '07 Paper Acceptance Rate: 22 of 101 submissions (22%)
Overall Acceptance Rate: 268 of 1,124 submissions (24%)



    Cited By

• (2024) No More Mumbles: Enhancing Robot Intelligibility Through Speech Adaptation. IEEE Robotics and Automation Letters, 9(7), 6162-6169. DOI: 10.1109/LRA.2024.3401117
• (2023) A Literature Survey of How to Convey Transparency in Co-Located Human–Robot Interaction. Multimodal Technologies and Interaction, 7(3), 25. DOI: 10.3390/mti7030025
• (2022) Examining Audio Communication Mechanisms for Supervising Fleets of Agricultural Robots. 2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 293-300. DOI: 10.1109/RO-MAN53752.2022.9900859
• (2019) Proactive Robots With the Perception of Nonverbal Human Behavior: A Review. IEEE Access, 7, 77308-77327. DOI: 10.1109/ACCESS.2019.2921986
• (2017) Spoken Document Retrieval Based on Confusion Network with Syllable Fragments. International Journal of Advanced Robotic Systems, 9(5). DOI: 10.5772/52454
• (2016) Fuzzy system to adapt web voice interfaces dynamically in a vehicle sensor tracking application definition. Soft Computing, 20(8), 3321-3334. DOI: 10.1007/s00500-015-1709-2
• (2013) Auditory Perspective Taking. IEEE Transactions on Cybernetics, 43(3), 957-969. DOI: 10.1109/TSMCB.2012.2219524
• (2011) Towards a formalization of social spaces for socially aware robots. Proceedings of the 10th International Conference on Spatial Information Theory, 283-303. DOI: 10.5555/2040205.2040225
• (2011) Towards a Formalization of Social Spaces for Socially Aware Robots. Spatial Information Theory, 283-303. DOI: 10.1007/978-3-642-23196-4_16
• (2010) Using reinforcement learning to create communication channel management strategies for diverse users. Proceedings of the NAACL HLT 2010 Workshop on Speech and Language Processing for Assistive Technologies, 53-61. DOI: 10.5555/1867750.1867757
