A framework for evaluating multimodal integration by humans and a role for embodied conversational agents
Source: Proceedings of the 6th International Conference on Multimodal Interfaces
State College, PA, USA
SESSION: Multimodal conversational agents
Pages: 24 - 31  
Year of Publication: 2004
ISBN:1-58113-995-0
Author
Dominic W. Massaro  University of California, Santa Cruz, Santa Cruz, CA
Sponsors
SIGCHI: ACM Special Interest Group on Computer-Human Interaction
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1027933.1027939

ABSTRACT

One of the implicit assumptions of multimodal interfaces is that human-computer interaction is significantly facilitated by providing multiple input and output modalities. Surprisingly, however, there is very little theoretical or empirical research testing this assumption in terms of the presentation of multimodal displays to the user. The goal of this paper is to provide both a theoretical and an empirical framework for addressing this important issue. Two contrasting models of human information processing are formulated and compared in experimental tests. According to integration models, multiple sensory influences are continuously combined during categorization, leading to perceptual experience and action. The Fuzzy Logical Model of Perception (FLMP), an integration model, assumes that processing occurs in three successive but overlapping stages: evaluation, integration, and decision (Massaro, 1998). According to nonintegration models, any perceptual experience and action results from only a single sensory influence. These models are tested in expanded factorial designs, in which two input modalities are varied independently of one another and each modality is also presented alone. Results from a variety of experiments on speech, emotion, and gesture support the predictions of the FLMP. Baldi, an embodied conversational agent, is described, and implications for applications of multimodal interfaces are discussed.
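
The contrast between the two model classes can be made concrete with a small sketch. The abstract does not give equations, so what follows is an illustration under stated assumptions rather than the paper's own implementation. It uses the standard two-alternative form of the FLMP (multiplicative integration followed by a relative-goodness decision) and, as one example of a nonintegration model, a single-channel model in which each response is driven by only one modality. The function names, the `p_auditory` parameter, and the support values are hypothetical.

```python
from itertools import product


def flmp(a, v):
    """Two-alternative FLMP prediction.

    a, v: support (strictly between 0 and 1) that the auditory and visual
    sources give to one response alternative; these are the outputs of
    the evaluation stage.
    Integration multiplies the supports; decision normalizes by the
    total support for both alternatives (relative goodness rule).
    """
    match = a * v
    mismatch = (1.0 - a) * (1.0 - v)
    return match / (match + mismatch)


def single_channel(a, v, p_auditory=0.5):
    """Illustrative nonintegration model (an assumption for this sketch).

    On each trial only one modality drives the response: the auditory
    source with probability p_auditory, otherwise the visual source.
    """
    return p_auditory * a + (1.0 - p_auditory) * v


def expanded_factorial(levels_a, levels_v):
    """Conditions of an expanded factorial design.

    Every auditory level is crossed with every visual level, and each
    modality is also presented alone (None marks the absent modality).
    """
    bimodal = list(product(levels_a, levels_v))
    auditory_alone = [(a, None) for a in levels_a]
    visual_alone = [(None, v) for v in levels_v]
    return bimodal + auditory_alone + visual_alone


def predict(model, a, v):
    """Predicted response probability for one condition."""
    if a is None:
        return v  # visual alone: only that source contributes
    if v is None:
        return a  # auditory alone
    return model(a, v)


if __name__ == "__main__":
    # Hypothetical support values, chosen only for illustration.
    for a, v in expanded_factorial([0.3, 0.7], [0.2, 0.8]):
        print(a, v,
              round(predict(flmp, a, v), 3),
              round(predict(single_channel, a, v), 3))
```

Under these assumptions the two models separate in the bimodal cells: with supports of 0.7 and 0.8, the FLMP predicts a response probability of about 0.90, above either source alone, while the single-channel model predicts 0.75, which always lies between the two unimodal values. The unimodal conditions are what make the design "expanded": they constrain the support values independently of the bimodal cells.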


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
Anastasio, T. J., & Patton, P. E. (2004). Analysis and modeling of multisensory enhancement in the deep superior colliculus. In G. Calvert, C. Spence & B. E. Stein (Eds.), Handbook of Multisensory Processes (pp. 265--283). Cambridge, MA: MIT Press.
 
2
Andre, E. (2004). Lessons Learned from Evaluating Animated Presentation Agents. Workshop on Evaluating Embodied Conversational Agents, Schloß Dagstuhl, Germany.
 
3
 
4
Bosseler, A. & Massaro, D.W. (2003). Development and Evaluation of a Computer-Animated Tutor for Vocabulary and Language Learning for Children with Autism. Journal of Autism and Developmental Disorders, 33, 653--672.
 
5
Campbell, C. S., Schwarzer, G. & Massaro, D. W. (2001). Face perception: An information processing perspective. In M.J. Wenger & J.T. Townsend (Eds.), Computational, geometric, and process perspectives on facial cognition: Contexts and challenges (pp. 285--345). Mahwah, NJ: Lawrence Erlbaum Associates.
 
6
 
7
Cohen, M.M., Beskow, J. & Massaro, D.W. (1998). Recent developments in facial animation: An inside view. AVSP '98 (Dec 4-6, 1998, Sydney, Australia). http://mambo.ucsc.edu/psl/avsp98/11.doc
 
8
 
9
 
10
de Gelder, B. & Vroomen, J. (2000). Perceiving Emotions by Ear and by Eye. Cognition & Emotion 14, 289--311.
 
11
Erber, N. P. (1972). Auditory, visual, and auditory-visual recognition of consonants by children with normal and impaired hearing. Journal of Speech and Hearing Research, 15, 413--422.
 
12
13
 
14
Jesse, A., Vrignaud, N. & Massaro, D.W. (2000/01). The processing of information from multiple sources in simultaneous interpreting. Interpreting, 5, 95--115.
 
15
Lederman, S. J. & Klatzky, R. L. (2004). Multisensory texture perception. In G. Calvert, C. Spence & B. E. Stein (Eds.), Handbook of Multisensory Processes. (pp. 107--122). Cambridge, MA: MIT Press.
 
16
Lewkowicz, D. J. & Kraebel, K. S. (2004). The value of multisensory redundancy in the development of intersensory perception. In G. Calvert, C. Spence & B. E. Stein (Eds.), Handbook of Multisensory Processes (pp. 655--678). Cambridge, MA: MIT Press.
 
17
Massaro, D.W. (1984). Children's perception of visual and auditory speech. Child Development, 55, 1777--1788.
 
18
Massaro, D.W. (1987). Speech perception by ear and eye: A Paradigm for psychological inquiry. Hillsdale, NJ: Erlbaum.
 
19
Massaro, D.W. (1988). Ambiguity in perception and experimentation. Journal of Experimental Psychology: General, 117, 417--421.
 
20
Massaro, D.W. (1989). Testing between the TRACE model and the Fuzzy Logical Model of speech perception. Cognitive Psychology 21, 398--421.
 
21
Massaro, D.W. (1998). Perceiving talking faces: From speech perception to a behavioral principle. Cambridge, MA: MIT Press.
 
22
Massaro, D.W. (1999). From theory to practice: Rewards and challenges. In Proceedings of the International Congress of Phonetic Sciences (pp. 1289--1292). San Francisco, CA.
 
23
Massaro, D.W. (2000). From "Speech is Special" to Talking Heads in Language Learning. In Proceedings of Integrating speech technology in the (language) learning and assistive interface, (InSTIL 2000) (pp.153--161). University of Abertay Dundee, Scotland.
 
24
Massaro, D.W. (2002). Multimodal Speech Perception: A Paradigm for Speech Science. In B. Granstrom, D. House & I. Karlsson (Eds.), Multimodality in language and speech systems (pp. 45--71). The Netherlands: Kluwer Academic Publishers.
25
 
26
Massaro, D.W. & Bosseler, A. (2003). Perceiving Speech by Ear and Eye: Multimodal Integration by Children with Autism. Journal of Developmental and Learning Disorders, 7, 111--144.
 
27
 
28
Massaro, D.W. & Cohen, M.M. (1999). Speech perception in hearing-impaired perceivers: Synergy of multiple modalities. Journal of Speech, Language, and Hearing Research, 42, 21--41.
 
29
Massaro, D.W. & Cohen, M.M. (2000). Fuzzy logical model of bimodal emotion perception: Comment on "The perception of emotions by ear and by eye" by de Gelder and Vroomen. Cognition and Emotion, 14(3), 313--320.
 
30
Massaro, D.W., Cohen, M.M., Tabain, M., Beskow, J. & Clark, R. (in press). Animated speech: Research progress and applications. In E. Vatikiotis-Bateson, G. Bailly & P. Perrier (Eds.), Audiovisual Speech Processing. Cambridge, MA: MIT Press.
 
31
Massaro, D.W. & Friedman, D. (1990). Models of integration given multiple sources of information. Psychological Review, 97(2) 225--252.
 
32
Massaro, D.W. & Light, J. (2003). Read My Tongue Movements: Bimodal Learning To Perceive And Produce Non-Native Speech /r/ and /l/. In Proceedings of Eurospeech '03-Switzerland (Interspeech). 8th European Conference on Speech Communication and Technology. Geneva, Switzerland.
 
33
Massaro, D.W. & Light, J. (in press). Using Visible Speech for Training Perception and Production of Speech for Hard of Hearing Individuals. Volta Review.
 
34
Massaro, D.W. & Stork, D. G. (1998). Sensory integration and speechreading by humans and machines. American Scientist, 86, 236--244.
 
35
McNeill, D. (1985). So you think gestures are nonverbal? Psychological Review, 92, 350--371.
 
36
Mesulam, M.M. (1998). From sensation to cognition. Brain, 121, 1013--1052.
 
37
Moore, M. & Calvert, S. (2000). Brief Report: Vocabulary acquisition for children with autism: Teacher or computer instruction. Journal of Autism and Developmental Disorders, 30, 359--362.
 
38
Movellan, J. R. & McClelland, J. L. (2001). The Morton-Massaro law of information integration: Implications for models of perception. Psychological Review, 108, 113--148.
 
39
Munhall, K., & Vatikiotis-Bateson, E. (2004). Spatial and Temporal Constraints on Audiovisual Speech Perception. In G. Calvert, C. Spence & B. E. Stein (Eds.), Handbook of Multisensory Processes (pp. 177--188). Cambridge, MA: MIT Press.
 
40
Ouni, S., Massaro, D.W., Cohen, M.M. & Young, K. (2003) Internationalization of a talking head. 15th International Congress of Phonetic Sciences. Barcelona, Spain.
41
 
42
Pashler, H. E. (1998). The psychology of attention. Cambridge, MA: MIT Press.
 
43
Potamianos, G., Neti, C., Gravier, G. & Garg, A. (2003). Automatic recognition of audio-visual speech: Recent progress and challenges. Proceedings of the IEEE, 91(9), 1306--1326.
 
44
Stein, B. E., & Meredith, M. A. (1993). The merging of the senses. Cambridge, MA: MIT Press.
 
45
Thompson, L.A. & Massaro, D.W. (1994). Children's Integration of Speech and Pointing Gestures in Comprehension. Journal of Experimental Child Psychology, 57, 327--354.
 
46
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338--353.