DOI: 10.1145/1180995.1181039
Article

Prototyping novel collaborative multimodal systems: simulation, data collection and analysis tools for the next decade

Published: 02 November 2006

Abstract

To support research and development of next-generation multimodal interfaces for complex collaborative tasks, a comprehensive new infrastructure has been created for collecting and analyzing time-synchronized audio, video, and pen-based data during multi-party meetings. Such an infrastructure must be unobtrusive; it must capture rich data from multiple information sources with high temporal fidelity, so that simulation-driven studies of natural human-human-computer interaction can be both collected and annotated; and it must be flexibly extensible to facilitate exploratory research. This paper describes both the infrastructure put in place to record, encode, play back, and annotate the meeting-related media data, and the simulation environment used to prototype novel system concepts.



Published In

ICMI '06: Proceedings of the 8th International Conference on Multimodal Interfaces
November 2006, 404 pages
ISBN: 159593541X
DOI: 10.1145/1180995

Publisher

Association for Computing Machinery, New York, NY, United States



Author Tags

  1. annotation tools
  2. data collection infrastructure
  3. meeting
  4. multi-party
  5. multimodal interfaces
  6. simulation studies
  7. synchronized media


Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions, 42%


Cited By

  • (2022) A Comprehensive Literature Review on Children’s Databases for Machine Learning Applications. IEEE Access, 10: 12262--12285. DOI: 10.1109/ACCESS.2022.3146008
  • (2021) I Know What You Know: What Hand Movements Reveal about Domain Expertise. ACM Transactions on Interactive Intelligent Systems, 11(1): 1--26. DOI: 10.1145/3423049
  • (2019) Dynamic Adaptive Gesturing Predicts Domain Expertise in Mathematics. 2019 International Conference on Multimodal Interaction, 105--113. DOI: 10.1145/3340555.3353726
  • (2018) Dynamic Handwriting Signal Features Predict Domain Expertise. ACM Transactions on Interactive Intelligent Systems, 8(3): 1--21. DOI: 10.1145/3213309
  • (2018) Multimodal learning analytics. The Handbook of Multimodal-Multisensor Interfaces, 331--374. DOI: 10.1145/3107990.3108003
  • (2017) Multimodal speech and pen interfaces. The Handbook of Multimodal-Multisensor Interfaces, 403--447. DOI: 10.1145/3015783.3015795
  • (2016) Optimal Modality Selection for Cooperative Human–Robot Task Completion. IEEE Transactions on Cybernetics, 46(12): 3388--3400. DOI: 10.1109/TCYB.2015.2506985
  • (2015) The Paradigm Shift to Multimodality in Contemporary Computer Interfaces. Synthesis Lectures on Human-Centered Informatics, 8(3): 1--243. DOI: 10.2200/S00636ED1V01Y201503HCI030
  • (2015) Spoken Interruptions Signal Productive Problem Solving and Domain Expertise in Mathematics. Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, 311--318. DOI: 10.1145/2818346.2820743
  • (2015) Tool Design Jam. Proceedings of the 2015 Annual Symposium on Computer-Human Interaction in Play, 827--831. DOI: 10.1145/2793107.2810263
