ABSTRACT
Human-robot communication often requires interpreting ambiguous auditory data. For example, the acoustic signals perceived by a humanoid through its on-board microphones contain a mix of sounds such as speech, music, and noise from electronic devices, all subject to attenuation and reverberation. In this paper we propose a novel method, based on a generative probabilistic model and on active binaural hearing, that allows a robot to robustly perform sound-source separation and localisation. We show how interaural spectral cues can be used within a constrained mixture model specifically designed to capture the richness of the data gathered with two microphones mounted on a human-like artificial head. We describe a novel EM algorithm in detail, analyse its initialization, speed of convergence and complexity, and assess its performance on both simulated and real data.
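As a rough illustration of what interaural spectral cues look like in practice, the sketch below computes two commonly used ones, the interaural level difference (ILD) and the interaural phase difference (IPD), from a two-channel recording. The abstract does not say which cues the paper uses or how they are computed, so the function names (`stft`, `interaural_cues`), the frame length, the hop size and the synthetic test signal are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Short-time Fourier transform with a Hann window; returns (freq, time)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T

def interaural_cues(left, right, eps=1e-12):
    """Return (ILD in dB, IPD in radians) for every time-frequency point."""
    L = stft(left)
    R = stft(right)
    # Level difference: ratio of right to left magnitude, in decibels.
    ild = 20.0 * np.log10((np.abs(R) + eps) / (np.abs(L) + eps))
    # Phase difference: phase of right relative to left, wrapped to [-pi, pi].
    ipd = np.angle(R * np.conj(L))
    return ild, ipd

# Hypothetical test signal: the right channel is an attenuated, slightly
# delayed copy of white noise in the left channel.
fs = 16000
rng = np.random.default_rng(0)
left = rng.standard_normal(fs)
right = 0.5 * np.roll(left, 4)

ild, ipd = interaural_cues(left, right)
```

In a real binaural recording these cues vary with frequency and with the source direction, which is why they can be used both to assign time-frequency points to sources and to localise them.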
Index Terms
- The cocktail party robot: sound source separation and localisation with an active binaural head