DOI: 10.1145/1101826.1101865
Article

Extracting information from multimedia meeting collections

Published: 10 November 2005

Abstract

Multimedia meeting collections, composed of unedited audio and video streams, handwritten notes, slides, and electronic documents that jointly constitute a raw record of complex human interaction processes in the workplace, have attracted interest due to the increasing feasibility of recording them in large quantities, the opportunities for information access and retrieval applications derived from the automatic extraction of relevant meeting information, and the challenges that the extraction of semantic information from real human activities entails. In this paper, we present a succinct overview of recent approaches in this field, largely influenced by our own experience. We first review some of the existing and potential needs of users of multimedia meeting information systems. We then summarize recent work on various research areas addressing some of these requirements. In more detail, we describe our work on the automatic analysis of human interaction patterns from audio-visual sensors, and discuss open issues in this domain.
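The automatic analysis of group interaction patterns mentioned in the abstract is typically cast as segmenting a meeting into a sequence of group actions (e.g. discussion, monologue, presentation) with hidden Markov models over audio-visual features. The sketch below is purely illustrative: the states, observation symbols, and all probabilities are hypothetical and not taken from the paper; a real system would learn such parameters from annotated corpora and use continuous features.

```python
# Hypothetical sketch: segmenting a meeting into group actions with a
# discrete HMM and Viterbi decoding. All names and numbers below are
# illustrative assumptions, not values from the paper.
import math

STATES = ["discussion", "monologue", "presentation"]

# Illustrative model parameters (prior, transition, emission).
start = {"discussion": 0.5, "monologue": 0.3, "presentation": 0.2}
trans = {
    "discussion":   {"discussion": 0.8, "monologue": 0.1, "presentation": 0.1},
    "monologue":    {"discussion": 0.2, "monologue": 0.7, "presentation": 0.1},
    "presentation": {"discussion": 0.1, "monologue": 0.1, "presentation": 0.8},
}
emit = {
    "discussion":   {"many_speakers": 0.7, "one_speaker": 0.2, "slide_change": 0.1},
    "monologue":    {"many_speakers": 0.1, "one_speaker": 0.8, "slide_change": 0.1},
    "presentation": {"many_speakers": 0.1, "one_speaker": 0.3, "slide_change": 0.6},
}

def viterbi(observations):
    """Return the most likely group-action sequence for the observations."""
    # Log-probability of the best path ending in each state at t=0.
    v = [{s: math.log(start[s]) + math.log(emit[s][observations[0]])
          for s in STATES}]
    back = []  # backpointers: back[t][s] = best predecessor of s at t+1
    for o in observations[1:]:
        scores, ptrs = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: v[-1][p] + math.log(trans[p][s]))
            scores[s] = (v[-1][best_prev] + math.log(trans[best_prev][s])
                         + math.log(emit[s][o]))
            ptrs[s] = best_prev
        v.append(scores)
        back.append(ptrs)
    # Trace the best path backwards from the final column.
    path = [max(STATES, key=lambda s: v[-1][s])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return list(reversed(path))

obs = ["one_speaker", "one_speaker", "slide_change",
       "many_speakers", "many_speakers"]
print(viterbi(obs))
```

The transition matrix's strong self-loops encode the assumption that group actions persist over several observation frames, which is what turns per-frame classification into a temporal segmentation.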




Published In

cover image ACM Conferences
MIR '05: Proceedings of the 7th ACM SIGMM international workshop on Multimedia information retrieval
November 2005
274 pages
ISBN:1595932445
DOI:10.1145/1101826
  • General Chairs:
  • Hongjiang Zhang,
  • John Smith,
  • Qi Tian
Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. graphical models
  2. human interaction modeling
  3. meeting
  4. semantic

Qualifiers

  • Article

Conference

MM&Sec '05
MM&Sec '05: Multimedia and Security Workshop 2005
November 10 - 11, 2005
Hilton, Singapore


Cited By

  • (2012) "Inferring competitive role patterns in reality TV show through nonverbal analysis," Multimedia Tools and Applications, 56(1):207-226. DOI: 10.1007/s11042-010-0545-8
  • (2009) "Automatic nonverbal analysis of social interaction in small groups," Image and Vision Computing, 27(12):1775-1787. DOI: 10.1016/j.imavis.2009.01.004
  • (2008) "Audio analysis for multimedia retrieval from a ubiquitous home," in Proc. 14th Int. Conf. on Advances in Multimedia Modeling, pp. 466-476. DOI: 10.5555/1785794.1785846
  • (2008) "Audio Analysis for Multimedia Retrieval from a Ubiquitous Home," Advances in Multimedia Modeling, pp. 466-476. DOI: 10.1007/978-3-540-77409-9_44
  • (2007) "Speakers Role Recognition in Multiparty Audio Recordings Using Social Network Analysis and Duration Distribution Modeling," IEEE Transactions on Multimedia, 9(6):1215-1226. DOI: 10.1109/TMM.2007.902882
