DOI: 10.1145/2157689.2157834
research-article

The cocktail party robot: sound source separation and localisation with an active binaural head

Published: 05 March 2012

ABSTRACT

Human-robot communication is often faced with the difficult problem of interpreting ambiguous auditory data. For example, the acoustic signals perceived by a humanoid with its on-board microphones contain a mix of sounds such as speech, music, and electronic devices, all in the presence of attenuation and reverberations. In this paper we propose a novel method, based on a generative probabilistic model and on active binaural hearing, allowing a robot to robustly perform sound-source separation and localization. We show how interaural spectral cues can be used within a constrained mixture model specifically designed to capture the richness of the data gathered with two microphones mounted onto a human-like artificial head. We describe in detail a novel EM algorithm, analyse its initialization, speed of convergence and complexity, and assess its performance with both simulated and real data.

Published in

HRI '12: Proceedings of the seventh annual ACM/IEEE international conference on Human-Robot Interaction
March 2012
518 pages
ISBN: 9781450310635
DOI: 10.1145/2157689

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 242 of 1,000 submissions, 24%
