
From conversational tooltips to grounded discourse: head pose tracking in interactive dialog systems

Published: 13 October 2004

Abstract

Head pose and gesture offer several key conversational grounding cues and are used extensively in face-to-face interaction among people. While the machine interpretation of these cues has previously been limited to output modalities, recent advances in face-pose tracking allow for systems that are robust and accurate enough to sense natural grounding gestures. We present the design of a module that detects these cues and show examples of its integration in three different conversational agents with varying degrees of discourse model complexity. Using a scripted discourse model and off-the-shelf animation and speech-recognition components, we demonstrate the use of this module in a novel "conversational tooltip" task, where additional information is spontaneously provided by an animated character when users attend to various physical objects or characters in the environment. We further describe the integration of our module in two systems where animated and robotic characters interact with users based on rich discourse and semantic models.
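The "conversational tooltip" behavior described above can be sketched as a dwell-time trigger on estimated head pose: when the tracked pose points toward a known object for long enough, the agent volunteers information about it. The sketch below is a hypothetical illustration only, not the paper's implementation; the target layout, angle threshold, and dwell time are all assumed values.

```python
import math

# Hypothetical targets the agent can comment on: name -> direction (yaw, pitch) in degrees.
TARGETS = {"picture_frame": (30.0, 5.0), "robot": (-25.0, 0.0)}

ANGLE_THRESHOLD = 10.0   # assumed max angular distance (degrees) to count as "attending"
DWELL_FRAMES = 15        # assumed consecutive frames (~0.5 s at 30 fps) before a tooltip fires


def angular_distance(a, b):
    """Euclidean distance between two (yaw, pitch) pairs, in degrees."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


class TooltipTrigger:
    """Fire a tooltip once the tracked head pose dwells on a target."""

    def __init__(self):
        self.current = None   # target the user is currently looking toward
        self.frames = 0       # consecutive frames spent on that target
        self.shown = set()    # targets already explained (each tooltip fires once)

    def update(self, yaw, pitch):
        """Feed one frame of head pose; return a target name when a tooltip should fire."""
        pose = (yaw, pitch)
        hit = None
        for name, direction in TARGETS.items():
            if angular_distance(pose, direction) < ANGLE_THRESHOLD:
                hit = name
                break
        if hit != self.current:
            # Gaze moved to a new target (or away): restart the dwell counter.
            self.current, self.frames = hit, 0
        elif hit is not None:
            self.frames += 1
            if self.frames >= DWELL_FRAMES and hit not in self.shown:
                self.shown.add(hit)
                return hit
        return None
```

In use, each frame of head-pose output from the tracker is fed to `update()`; a returned name would cue the animated character to speak about that object. The once-per-target guard mirrors the idea that a tooltip is supplied spontaneously rather than repeated every time the user glances back.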


Published In

ICMI '04: Proceedings of the 6th international conference on Multimodal interfaces
October 2004
368 pages
ISBN:1581139950
DOI:10.1145/1027933

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. conversational tooltips
  2. grounding
  3. head gesture recognition
  4. head pose tracking
  5. human-computer interaction
  6. interactive dialog system

Qualifiers

  • Article

Conference

ICMI '04

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)

Cited By

  • (2011) Robust stereoscopic head pose estimation in human-computer interaction and a unified evaluation framework. Proceedings of the 16th International Conference on Image Analysis and Processing: Part I, pp. 227-236. DOI: 10.5555/2042620.2042649. Online publication date: 14 Sep 2011.
  • (2011) Robust Stereoscopic Head Pose Estimation in Human-Computer Interaction and a Unified Evaluation Framework. Image Analysis and Processing – ICIAP 2011, pp. 227-236. DOI: 10.1007/978-3-642-24085-0_24. Online publication date: 2011.
  • (2010) A realistic, virtual head for human-computer interaction. Interacting with Computers, 22(3), pp. 176-192. DOI: 10.1016/j.intcom.2009.12.002. Online publication date: 1 May 2010.
  • (2007) Mapping the demographics of virtual humans. Proceedings of the 21st British HCI Group Annual Conference on People and Computers: HCI...but not as we know it – Volume 2, pp. 149-152. DOI: 10.5555/1531407.1531446. Online publication date: 3 Sep 2007.
  • (2006) The effect of head-nod recognition in human-robot conversation. Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction, pp. 290-296. DOI: 10.1145/1121241.1121291. Online publication date: 2 Mar 2006.
  • (2005) Multimodal Human-Computer Interaction. Real-Time Vision for Human-Computer Interaction, pp. 269-283. DOI: 10.1007/0-387-27890-7_16. Online publication date: 2005.
