Article

Extracting information from multimedia meeting collections

Authors:

Daniel Gatica-Perez,

Samy Bengio

MIR '05: Proceedings of the 7th ACM SIGMM international workshop on Multimedia information retrieval

Pages 245 - 252

https://doi.org/10.1145/1101826.1101865

Published: 10 November 2005

Abstract

Multimedia meeting collections, composed of unedited audio and video streams, handwritten notes, slides, and electronic documents that jointly constitute a raw record of complex human interaction processes in the workplace, have attracted interest due to the increasing feasibility of recording them in large quantities, the opportunities for information access and retrieval applications derived from the automatic extraction of relevant meeting information, and the challenges that the extraction of semantic information from real human activities entails. In this paper, we present a succinct overview of recent approaches in this field, largely influenced by our own experiences. We first review some of the existing and potential needs of users of multimedia meeting information systems. We then summarize recent work in various research areas addressing some of these requirements. In more detail, we describe our work on automatic analysis of human interaction patterns from audio-visual sensors, discussing open issues in this domain.

Index Terms

Extracting information from multimedia meeting collections
1. Information systems
  1. Information retrieval
    1. Document representation
    2. Search engine architectures and scalability
      1. Search engine indexing

Published In

MIR '05: Proceedings of the 7th ACM SIGMM international workshop on Multimedia information retrieval

November 2005

274 pages

ISBN: 1595932445

DOI: 10.1145/1101826

General Chairs:
Hongjiang Zhang, Microsoft Research Advanced Technology Center, China
John Smith, IBM T. J. Watson Research Center, Hawthorne, NY, USA
Qi Tian, University of Texas at San Antonio, San Antonio, TX, USA

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM&Sec '05: Multimedia and Security Workshop 2005

November 10 - 11, 2005

Hilton, Singapore
