Multimodal multispeaker probabilistic tracking in meetings
Source: International Conference on Multimodal Interfaces
Proceedings of the 7th International Conference on Multimodal Interfaces
Trento, Italy
Session: Recognizing communication patterns
Pages: 183-190
Year of Publication: 2005
ISBN: 1-59593-028-0
Authors
Daniel Gatica-Perez  IDIAP Research Institute, Martigny, Switzerland
Guillaume Lathoud  IDIAP Research Institute, Martigny, Switzerland
Jean-Marc Odobez  IDIAP Research Institute, Martigny, Switzerland
Iain McCowan  eHealth Research Centre, Brisbane, Australia
Sponsors
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1088463.1088496

ABSTRACT

Tracking speakers in multiparty conversations is a fundamental task in automatic meeting analysis. In this paper, we present a probabilistic approach to jointly track the location and speaking activity of multiple speakers in a multisensor meeting room equipped with a small microphone array and multiple uncalibrated cameras. Our framework is based on a mixed-state dynamic graphical model defined on a multiperson state-space, which includes the explicit definition of a proximity-based interaction model. The model integrates audio-visual (AV) data through a novel observation model. Audio observations are derived from a source localization algorithm. Visual observations are based on models of the shape and spatial structure of human heads. Approximate inference in our model, needed given its complexity, is performed with a Markov Chain Monte Carlo particle filter (MCMC-PF), which results in high sampling efficiency. We present results, based on an objective evaluation procedure, showing that our framework (1) is capable of locating and tracking the position and speaking activity of multiple meeting participants engaged in real conversations with good accuracy; (2) can deal with cases of visual clutter and partial occlusion; and (3) significantly outperforms a traditional sampling-based approach.
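To make the inference idea concrete, the following is a minimal toy sketch of one way an MCMC-based particle filter over a mixed (continuous position, discrete speaking flag) multiperson state could look. It is not the authors' model: the 1-D positions, the function names (`gauss`, `interaction`, `likelihood`, `dynamics`, `mcmc_pf_step`, `run_demo`), the observation format, and every numeric constant are assumptions chosen only for illustration.

```python
import math
import random

def gauss(x, mu, sigma):
    # Unnormalised Gaussian; the constants cancel in the acceptance ratio.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def interaction(state, min_dist=0.5):
    # Assumed proximity-based pairwise potential: penalise overlapping people.
    w = 1.0
    for i in range(len(state)):
        for j in range(i + 1, len(state)):
            if abs(state[i][0] - state[j][0]) < min_dist:
                w *= 0.05
    return w

def likelihood(state, visual, audio):
    # Toy AV model: one visual position per person, one audio source location.
    w = 1.0
    for (x, speaking), z in zip(state, visual):
        w *= gauss(z, x, 0.3)
        w *= (gauss(audio, x, 0.4) + 0.01) if speaking else 0.2
    return w

def dynamics(state, prev, sigma=0.2, p_switch=0.1):
    # Random-walk position and a "sticky" speaking flag, independent per person.
    w = 1.0
    for (x, s), (px, ps) in zip(state, prev):
        w *= gauss(x, px, sigma) * ((1 - p_switch) if s == ps else p_switch)
    return w

def target(state, particles, visual, audio):
    # Posterior up to a constant: likelihood * interaction * mixture prior
    # built from the previous (unweighted) particle set.
    prior = sum(dynamics(state, p) for p in particles) / len(particles)
    return likelihood(state, visual, audio) * interaction(state) * prior

def mcmc_pf_step(particles, visual, audio, rng, burn_in=200, thin=5):
    # One time step: a Metropolis-Hastings chain that changes one person per
    # move, using symmetric proposals (flag flip or position random walk).
    n = len(particles)
    state = list(rng.choice(particles))
    score = target(state, particles, visual, audio)
    samples = []
    for it in range(burn_in + n * thin):
        i = rng.randrange(len(state))
        x, s = state[i]
        if rng.random() < 0.3:
            cand = (x, 1 - s)
        else:
            cand = (x + rng.gauss(0.0, 0.2), s)
        proposal = state[:i] + [cand] + state[i + 1:]
        new_score = target(proposal, particles, visual, audio)
        if new_score >= score or rng.random() * score < new_score:
            state, score = proposal, new_score
        if it >= burn_in and (it - burn_in) % thin == 0:
            samples.append(tuple(state))
    return samples

def run_demo(seed=0, steps=8, n=50):
    # Two people near 1.0 and 3.0; the audio source sits at 1.0 throughout.
    rng = random.Random(seed)
    particles = [((0.8, 0), (3.2, 0))] * n
    for _ in range(steps):
        particles = mcmc_pf_step(particles, [1.0, 3.0], 1.0, rng)
    means = [sum(p[i][0] for p in particles) / n for i in range(2)]
    speak = [sum(p[i][1] for p in particles) / n for i in range(2)]
    return means, speak

if __name__ == "__main__":
    print(run_demo())
```

Calling `run_demo()` returns the estimated mean position and the fraction of samples flagged as speaking for each person. Because the chain produces unweighted samples, no resampling step is needed between time steps, which is the usual motivation for replacing importance sampling with MCMC moves in a joint multiperson state space.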



