From conversational tooltips to grounded discourse: head pose tracking in interactive dialog systems
Source: International Conference on Multimodal Interfaces
Proceedings of the 6th International Conference on Multimodal Interfaces
State College, PA, USA
SESSION: Multimodal conversational agents
Pages: 32 - 37  
Year of Publication: 2004
ISBN: 1-58113-995-0
Authors
Louis-Philippe Morency  MIT, Cambridge, MA
Trevor Darrell  MIT, Cambridge, MA
Sponsors
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1027933.1027940

ABSTRACT

Head pose and gesture offer several key conversational grounding cues and are used extensively in face-to-face interaction among people. While the machine interpretation of these cues has previously been limited to output modalities, recent advances in face-pose tracking allow for systems which are robust and accurate enough to sense natural grounding gestures. We present the design of a module that detects these cues and show examples of its integration in three different conversational agents with varying degrees of discourse model complexity. Using a scripted discourse model and off-the-shelf animation and speech-recognition components, we demonstrate the use of this module in a novel "conversational tooltip" task, where additional information is spontaneously provided by an animated character when users attend to various physical objects or characters in the environment. We further describe the integration of our module in two systems where animated and robotic characters interact with users based on rich discourse and semantic models.



