DOI: 10.1145/1180995.1181040
Article

Combining audio and video to predict helpers' focus of attention in multiparty remote collaboration on physical tasks

Published: 02 November 2006

Abstract

The increasing interest in supporting multiparty remote collaboration has created both opportunities and challenges for the research community. The research reported here aims to develop tools to support multiparty remote collaboration and to study human behavior with these tools. In this paper we first introduce an experimental multimedia (video and audio) system with which an expert can collaborate with several novices. We then use this system to study helpers' focus of attention (FOA) during a collaborative circuit-assembly task. We investigate the relationship between FOA and both language and activities using multimodal (audio and video) data, and we use machine learning methods to predict helpers' FOA. We process each modality separately and fuse the results to make a final decision. We employ a sliding-window delayed-labeling method to predict changes in FOA automatically in real time, using only the dialogue between the helper and the workers. We apply adaptive background subtraction and a support vector machine to recognize the workers' activities from the video. To predict the helper's FOA, we make decisions using information about joint project boundaries and the workers' recent activities. The overall prediction accuracy is 79.52% using audio alone and 81.79% using audio and video combined.

Published In

ICMI '06: Proceedings of the 8th international conference on Multimodal interfaces
November 2006
404 pages
ISBN: 159593541X
DOI: 10.1145/1180995

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. computer-supported cooperative work
  2. focus of attention
  3. multimodal integration
  4. remote collaborative physical tasks

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%
