ABSTRACT
Multimedia meeting collections, composed of unedited audio and video streams, handwritten notes, slides, and electronic documents that jointly constitute a raw record of complex human interaction processes in the workplace, have attracted interest for three reasons: the increasing feasibility of recording them in large quantities, the opportunities for information access and retrieval applications derived from the automatic extraction of relevant meeting information, and the challenges that extracting semantic information from real human activities entails. In this paper, we present a succinct overview of recent approaches in this field, largely influenced by our own experiences. We first review some of the existing and potential needs of users of multimedia meeting information systems. We then summarize recent work in various research areas that address some of these requirements. Finally, we describe in more detail our work on the automatic analysis of human interaction patterns from audio-visual sensors, and discuss open issues in this domain.