DOI: 10.1145/1180995.1181037
Article

EM detection of common origin of multi-modal cues

Published: 02 November 2006

ABSTRACT

Content analysis of clips containing people speaking involves processing informative cues coming from different modalities. These cues are usually the words extracted from the audio modality and the identities of the persons appearing in the video modality of the clip. To assign these cues efficiently to the person who produced them, we propose a Bayesian network model that utilizes the extracted feature characteristics, their relations, and their temporal patterns. We use the EM algorithm, in which the E-step estimates the expectation of the complete-data log-likelihood with respect to the hidden variables, that is, the identities of the speakers and the visible persons. In the M-step, the person models that maximize this expectation are computed. This framework produces excellent results, exhibiting exceptional robustness when dealing with low-quality data.
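
The alternation between the E-step and the M-step can be illustrated with a minimal sketch. It is not the Bayesian network proposed in the paper: it assumes a toy setting in which each cue is a single number and each of two hidden "persons" is modeled by a 1-D Gaussian. The function name `em_two_persons`, the initialization at the data extremes, and the variance floor are all hypothetical choices made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_persons(features, n_iter=50):
    """Toy EM: each 1-D cue is assumed to come from one of two hidden persons.

    Illustrative stand-in only; the paper's model also uses relations
    between features and temporal patterns, which are omitted here.
    """
    # Assumed initialization: person models start at the data extremes.
    means = [min(features), max(features)]
    variances = [1.0, 1.0]
    priors = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each hidden identity for each cue.
        resp = []
        for x in features:
            joint = [priors[k] * gaussian_pdf(x, means[k], variances[k])
                     for k in range(2)]
            total = sum(joint) or 1e-300  # guard against numerical underflow
            resp.append([j / total for j in joint])
        # M-step: person models that maximize the expected
        # complete-data log-likelihood under those posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # no cue assigned to this person; keep the old model
            means[k] = sum(r[k] * x for r, x in zip(resp, features)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2
                         for r, x in zip(resp, features)) / nk
            variances[k] = max(spread, 1e-6)  # floor avoids a zero variance
            priors[k] = nk / len(features)
    return means, variances, priors, resp
```

Calling `em_two_persons` on cues drawn from two well-separated groups returns one mean per group, together with the per-cue posteriors over the hidden identity, which play the role of the cue-to-person assignment.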


    • Published in

      ICMI '06: Proceedings of the 8th international conference on Multimodal interfaces
      November 2006
      404 pages
      ISBN: 159593541X
      DOI: 10.1145/1180995

      Copyright © 2006 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate: 453 of 1,080 submissions, 42%
