The cocktail party robot: sound source separation and localisation with an active binaural head

Authors:
Antoine Deleforge, INRIA, Grenoble, France
Radu Horaud, INRIA, Grenoble, France

HRI '12: Proceedings of the seventh annual ACM/IEEE international conference on Human-Robot Interaction, March 2012, Pages 431–438. https://doi.org/10.1145/2157689.2157834

Published: 05 March 2012

ABSTRACT

Human-robot communication is often faced with the difficult problem of interpreting ambiguous auditory data. For example, the acoustic signals perceived by a humanoid with its on-board microphones contain a mix of sounds, such as speech, music, and electronic devices, all in the presence of attenuation and reverberation. In this paper we propose a novel method, based on a generative probabilistic model and on active binaural hearing, that allows a robot to robustly perform sound-source separation and localization. We show how interaural spectral cues can be used within a constrained mixture model specifically designed to capture the richness of the data gathered with two microphones mounted onto a human-like artificial head. We describe in detail a novel EM algorithm, analyse its initialization, speed of convergence and complexity, and assess its performance with both simulated and real data.
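The abstract does not define its interaural spectral cues. As a rough illustration only, the sketch below assumes they resemble the interaural level difference (ILD) and interaural phase difference (IPD) commonly used in binaural processing, computed per time-frequency bin from the two microphone signals. The function names `stft` and `interaural_cues`, the frame length, and the hop size are hypothetical choices, not taken from the paper.

```python
import numpy as np

def stft(signal, frame_len=512, hop=256):
    """Short-time Fourier transform with a Hann window.

    Returns a complex array of shape (n_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def interaural_cues(left, right, frame_len=512, hop=256, eps=1e-12):
    """Per-bin ILD (in dB) and IPD (in radians) of a two-channel recording.

    Assumption: ILD/IPD stand in for the paper's "interaural spectral cues".
    """
    spec_left = stft(left, frame_len, hop)
    spec_right = stft(right, frame_len, hop)
    # Level difference: ratio of magnitudes on a decibel scale.
    # eps guards against division by zero in silent bins.
    ild = 20.0 * np.log10((np.abs(spec_left) + eps) / (np.abs(spec_right) + eps))
    # Phase difference: angle of the cross-spectrum, wrapped to (-pi, pi].
    ipd = np.angle(spec_left * np.conj(spec_right))
    return ild, ipd
```

For instance, calling `interaural_cues(left, 0.5 * left)` on a noise signal gives an ILD of about 6 dB in every bin and an IPD of zero, since the right channel is only attenuated and not delayed. A source-separation model of the kind the abstract describes would then group time-frequency bins whose cues are consistent with the same source position.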

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition
2. Mathematics of computing
  1. Probability and statistics

Published in
HRI '12: Proceedings of the seventh annual ACM/IEEE international conference on Human-Robot Interaction
March 2012
518 pages
ISBN: 9781450310635
DOI: 10.1145/2157689
General Chairs: Holly Yanco (University of Massachusetts Lowell, USA), Aaron Steinfeld (Carnegie Mellon University, USA)
Program Chairs: Vanessa Evers (University of Amsterdam, The Netherlands), Odest Chadwicke Jenkins (Brown University, USA)
Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
blind source separation
computational auditory scene analysis
EM algorithm
learning
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 242 of 1,000 submissions, 24%
Article Metrics
- Total Citations: 26
- Total Downloads: 376
- Downloads (last 12 months): 24
- Downloads (last 6 weeks): 3