Contextual recognition of head gestures

Published: 04 October 2005
DOI: 10.1145/1088463.1088470

Abstract

Head pose and gesture offer several key conversational grounding cues and are used extensively in face-to-face interaction among people. We investigate how dialog context from an embodied conversational agent (ECA) can improve visual recognition of user gestures. We present a recognition framework which (1) extracts contextual features from an ECA's dialog manager, (2) computes a prediction of head nods and head shakes, and (3) integrates the contextual predictions with the visual observations of a vision-based head gesture recognizer. We found a subset of lexical, punctuation, and timing features that are easily available in most ECA architectures and can be used to learn how to predict user feedback. Using a discriminative approach to contextual prediction and multi-modal integration, we were able to improve the performance of head gesture detection even when the topic of the test set was significantly different from that of the training set.
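
The three-stage framework described in the abstract can be made concrete with a small late-fusion sketch. Everything below is an illustrative assumption rather than the authors' implementation: hypothetical lexical, punctuation, and timing features stand in for the dialog-manager context, a logistic-regression model stands in for the discriminative contextual predictor, and a fixed weighted sum stands in for the learned multi-modal integration.

```python
# Illustrative sketch only: a discriminative contextual predictor fused with
# a visual head-gesture score. The feature definitions, training data, and
# fixed fusion weights are assumptions for illustration; the paper learns
# the multi-modal integration discriminatively.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-utterance contextual features from the dialog manager:
# [is_yes_no_question, ends_with_question_mark, seconds_since_agent_spoke]
X_ctx = np.array([
    [1.0, 1.0, 0.4],   # yes/no question, short pause -> nod likely
    [0.0, 0.0, 2.0],   # plain statement, long pause  -> nod unlikely
    [1.0, 1.0, 0.6],
    [0.0, 1.0, 1.5],
])
y = np.array([1, 0, 1, 0])  # 1 = user produced a head nod

ctx_model = LogisticRegression().fit(X_ctx, y)

def fuse(ctx_features, vision_score, w_ctx=0.4, w_vis=0.6):
    """Late fusion of the contextual prediction with the probability-like
    confidence of a vision-based head-gesture recognizer."""
    p_ctx = ctx_model.predict_proba(np.array([ctx_features]))[0, 1]
    return w_ctx * p_ctx + w_vis * vision_score

# Ambiguous visual evidence (0.5) can be pushed over a decision threshold
# when the dialog context makes a nod likely.
print(fuse([1.0, 1.0, 0.3], vision_score=0.5))
```

The point of the sketch is only to show where the contextual features enter the pipeline: the dialog-side prediction is computed before any gesture is observed, and the visual recognizer's score is adjusted by it at decision time.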

Published In

ICMI '05: Proceedings of the 7th International Conference on Multimodal Interfaces
October 2005
344 pages
ISBN: 1595930280
DOI: 10.1145/1088463

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. context-based recognition
  2. dialog context
  3. embodied conversational agent
  4. head gestures
  5. human-computer interaction

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions, 42%

