
From conversational tooltips to grounded discourse: head pose tracking in interactive dialog systems

Published: 13 October 2004

Abstract

Head pose and gesture offer several key conversational grounding cues and are used extensively in face-to-face interaction among people. While the machine interpretation of these cues has previously been limited to output modalities, recent advances in face-pose tracking allow for systems that are robust and accurate enough to sense natural grounding gestures. We present the design of a module that detects these cues and show examples of its integration in three different conversational agents with varying degrees of discourse model complexity. Using a scripted discourse model and off-the-shelf animation and speech-recognition components, we demonstrate the use of this module in a novel "conversational tooltip" task, where additional information is spontaneously provided by an animated character when users attend to various physical objects or characters in the environment. We further describe the integration of our module in two systems where animated and robotic characters interact with users based on rich discourse and semantic models.
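The "conversational tooltip" behavior described above can be sketched as a dwell-time trigger on estimated head pose: when the tracked pose points toward a known object for long enough, the agent volunteers information about it. The sketch below is a hypothetical illustration only, not the paper's implementation; the target layout, angle threshold, and dwell time are all assumed values.

```python
import math

# Hypothetical targets the agent can comment on: name -> direction (yaw, pitch) in degrees.
TARGETS = {"picture_frame": (30.0, 5.0), "robot": (-25.0, 0.0)}

ANGLE_THRESHOLD = 10.0   # assumed max angular distance (degrees) to count as "attending"
DWELL_FRAMES = 15        # assumed consecutive frames (~0.5 s at 30 fps) before a tooltip fires


def angular_distance(a, b):
    """Euclidean distance between two (yaw, pitch) pairs, in degrees."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


class TooltipTrigger:
    """Fire a tooltip once the tracked head pose dwells on a target."""

    def __init__(self):
        self.current = None   # target the user is currently looking toward
        self.frames = 0       # consecutive frames spent on that target
        self.shown = set()    # targets already explained (each tooltip fires once)

    def update(self, yaw, pitch):
        """Feed one frame of head pose; return a target name when a tooltip should fire."""
        pose = (yaw, pitch)
        hit = None
        for name, direction in TARGETS.items():
            if angular_distance(pose, direction) < ANGLE_THRESHOLD:
                hit = name
                break
        if hit != self.current:
            # Gaze moved to a new target (or away): restart the dwell counter.
            self.current, self.frames = hit, 0
        elif hit is not None:
            self.frames += 1
            if self.frames >= DWELL_FRAMES and hit not in self.shown:
                self.shown.add(hit)
                return hit
        return None
```

In use, each frame of head-pose output from the tracker is fed to `update()`; a returned name would cue the animated character to speak about that object. The once-per-target guard mirrors the idea that a tooltip is supplied spontaneously rather than repeated every time the user glances back.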


Published In

ICMI '04: Proceedings of the 6th international conference on Multimodal interfaces
October 2004
368 pages
ISBN:1581139950
DOI:10.1145/1027933

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. conversational tooltips
  2. grounding
  3. head gesture recognition
  4. head pose tracking
  5. human-computer interaction
  6. interactive dialog system

Qualifiers

  • Article

Conference

ICMI '04

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)

Cited By

  • (2011) Robust stereoscopic head pose estimation in human-computer interaction and a unified evaluation framework. Proceedings of the 16th International Conference on Image Analysis and Processing: Part I, pp. 227-236. DOI: 10.5555/2042620.2042649. Online publication date: 14 Sep 2011.
  • (2011) Robust Stereoscopic Head Pose Estimation in Human-Computer Interaction and a Unified Evaluation Framework. Image Analysis and Processing – ICIAP 2011, pp. 227-236. DOI: 10.1007/978-3-642-24085-0_24. Online publication date: 2011.
  • (2010) A realistic, virtual head for human-computer interaction. Interacting with Computers, 22(3), pp. 176-192. DOI: 10.1016/j.intcom.2009.12.002. Online publication date: 1 May 2010.
  • (2007) Mapping the demographics of virtual humans. Proceedings of the 21st British HCI Group Annual Conference on People and Computers: HCI...but not as we know it – Volume 2, pp. 149-152. DOI: 10.5555/1531407.1531446. Online publication date: 3 Sep 2007.
  • (2006) The effect of head-nod recognition in human-robot conversation. Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction, pp. 290-296. DOI: 10.1145/1121241.1121291. Online publication date: 2 Mar 2006.
  • (2005) Multimodal Human-Computer Interaction. Real-Time Vision for Human-Computer Interaction, pp. 269-283. DOI: 10.1007/0-387-27890-7_16. Online publication date: 2005.
