DOI: 10.1145/1180995.1181040
Article

Combining audio and video to predict helpers' focus of attention in multiparty remote collaboration on physical tasks

Published: 02 November 2006

Abstract

The increasing interest in supporting multiparty remote collaboration has created both opportunities and challenges for the research community. The research reported here aims to develop tools to support multiparty remote collaboration and to study human behavior with these tools. In this paper we first introduce an experimental multimedia (video and audio) system with which an expert can collaborate with several novices. We then use this system to study helpers' focus of attention (FOA) during a collaborative circuit-assembly task. We investigate the relationship between FOA and both language and activities using multimodal (audio and video) data, and we use machine learning methods to predict helpers' FOA. We process each modality separately and fuse the results to make a final decision. We employ a sliding-window delayed-labeling method to predict changes in FOA automatically in real time, using only the dialogue between the helper and the workers. We apply adaptive background subtraction and a support vector machine to recognize the workers' activities from the video. To predict the helper's FOA, we make decisions using information about joint project boundaries and the workers' recent activities. The overall prediction accuracy is 79.52% using audio alone and 81.79% using audio and video combined.

Published In

ICMI '06: Proceedings of the 8th international conference on Multimodal interfaces
November 2006
404 pages
ISBN: 159593541X
DOI: 10.1145/1180995

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. computer-supported cooperative work
  2. focus of attention
  3. multimodal integration
  4. remote collaborative physical tasks

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%
