Article · DOI: 10.1145/1027933.1028012

Multimodal response generation in GIS

Published: 13 October 2004

Abstract

Advances in computer hardware and software technologies have enabled sophisticated information visualization techniques and new interaction opportunities in the development of GIS (Geographical Information Systems) applications. In particular, research in computer vision and natural language processing has enabled users to interact with computer applications through natural speech and gestures, which has proven effective for interacting with dynamic maps [1, 6]. Pen-based mobile devices and gesture recognition systems let system designers define application-specific gestures for carrying out particular tasks, and a force-feedback mouse has been proposed for giving visually impaired people access to GIS [4]. These are exciting new opportunities that hold the promise of advancing interaction with computers to a completely new level. The ultimate aim, however, should be to facilitate human-computer communication; that is, equal emphasis should be given to both understanding and generation of multimodal behavior. My proposed research will provide a conceptual framework and a computational model for generating multimodal responses that communicate spatial information along with dynamically generated maps. The model will eventually lead to the development of a computational agent that can reason about distributing the semantic and pragmatic content of an intended response message among speech, deictic gestures and visual information. In other words, the system will be able to select the most natural and effective mode(s) of communicating back to the user.
Any research in computer science that investigates direct interaction between computers and humans should place human factors at center stage. This work will therefore follow a multi-disciplinary approach, integrating ideas from prior research in Psychology, Cognitive Science, Linguistics, Cartography, Geographical Information Science (GIScience) and Computer Science to identify and address the human, cartographic and computational issues involved in response planning, and to assist users in their spatial decision making by facilitating their visual thinking and reducing their cognitive load. The methodology will be integrated into the design of the DAVE_G [7] prototype: a natural, multimodal, mixed-initiative dialogue interface to GIS. The system is currently capable of recognizing, interpreting and fusing users' naturally occurring speech and gesture requests, and of generating natural speech output. Communication between the system and the user is modeled following collaborative discourse theory [2] and maintains a Recipe Graph [5] structure, based on SharedPlan theory [3], to represent the intentional structure of the discourse between user and system. One major concern in generating speech responses for dynamic maps is that spatial information cannot be effectively communicated by speech alone. Altering perceptual attributes (e.g. color, size, pattern) of the visual data to direct the user's attention to a particular location on the map is not usually effective either, since each attribute bears an inherent semantic meaning; such attributes should be modified only when the system judges that they are not crucial to the user's understanding of the situation at that stage of the task.
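To make the intentional-structure idea concrete, the following is a minimal, hypothetical sketch of a Recipe Graph-style node, not DAVE_G's actual implementation: each discourse goal is decomposed into subgoals contributed by the user or the system, and a goal counts as satisfied only when it and all of its subgoals are complete. All class and field names here are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class RecipeNode:
    """One node of a hypothetical Recipe Graph: a goal plus its
    subgoal decomposition (the 'recipe') being jointly executed."""
    goal: str                    # e.g. "locate flood-risk areas"
    contributor: str             # "user" or "system"
    subgoals: list["RecipeNode"] = field(default_factory=list)
    completed: bool = False

    def add_subgoal(self, node: "RecipeNode") -> "RecipeNode":
        self.subgoals.append(node)
        return node

    def is_satisfied(self) -> bool:
        # Satisfied only when this goal is complete and every
        # subgoal in its recipe is recursively satisfied.
        return self.completed and all(s.is_satisfied() for s in self.subgoals)

# Usage: a map request decomposed into a system-side subgoal.
root = RecipeNode("locate flood-risk areas", "user")
layer = root.add_subgoal(RecipeNode("display elevation layer", "system"))
layer.completed = True
root.completed = True
print(root.is_satisfied())  # True once every subgoal is done
```

The recursive satisfaction check mirrors how a plan-based dialogue manager can tell which parts of the shared plan still need a contribution from either participant.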
Gesticulation, on the other hand, is powerful for conveying the location and form of spatially oriented information [6] without manipulating the map, and has the added benefit of facilitating speech production. My research aims at designing a feasible, extensible and effective multimodal response generation model, covering both content planning and modality allocation. A plan-based reasoning algorithm integrated with the Recipe Graph structure has the potential to achieve these goals.
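The modality-allocation trade-off described above can be sketched as a simple decision rule. This is a hypothetical illustration under my own assumptions (the function name, content types and modality labels are invented, not part of DAVE_G): for location content, prefer a deictic gesture when altering perceptual attributes would disturb meaning the user still needs, and fall back to visual highlighting only when those attributes are judged non-crucial.

```python
def allocate_modality(content_type: str, attributes_crucial: bool) -> list[str]:
    """Choose output modalities for one piece of response content.

    attributes_crucial: whether the map's perceptual attributes
    (color, size, pattern) still carry meaning the user needs,
    so they must not be altered for highlighting.
    """
    if content_type == "location":
        if attributes_crucial:
            # Point at the map without changing it.
            return ["speech", "deictic_gesture"]
        # Safe to recolor/resize to draw attention.
        return ["speech", "visual_highlight"]
    # Non-spatial content is communicated by speech alone.
    return ["speech"]

print(allocate_modality("location", attributes_crucial=True))
# ['speech', 'deictic_gesture']
```

A full model would of course condition on the discourse state in the Recipe Graph rather than a single boolean, but the rule captures the core constraint the abstract argues for.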

References

[1]
Cohen, P.R., Johnston, M., McGee, D.R., Oviatt, S.L., Clow, J., Smith, I. The Efficiency of Multimodal Interaction: A Case Study. Proc. of the Int'l Conference on Spoken Language Processing (ICSLP'98), Nov 30-Dec 4, 249--252, 1998
[2]
Grosz, B.J., Sidner, C.L. Attention, Intentions, and the Structure of Discourse. Computational Linguistics, 12, 175--204, 1986
[3]
Grosz, B.J., Kraus, S. Collaborative Plans for Complex Group Action. Artificial Intelligence, 2, 269--357, 1996
[4]
Jacobson, R.D. Representing Spatial Information Through Multimodal Interfaces: Overview and Results in Non-visual Interfaces. 6th International Conference on Information Visualization: Symposium on Spatial/Geographic Data Visualization, IEEE Proceedings, 10-12 July, 730--734
[5]
Lochbaum, K.E. A Collaborative Planning Model of Intentional Structure. Computational Linguistics, 4, 525--572, 1994
[6]
Oviatt, S.L. Multimodal Interfaces to Dynamic Interactive Maps. Proc. of the Conference on Human Factors in Computing Systems (CHI'96)
[7]
Rauschert, I., Agrawal, P., Fuhrmann, S., Brewer, I., Wang, H., Sharma, R., Cai, G., MacEachren, A. Designing a Human-Centered, Multimodal GIS Interface to Support Emergency Management. ACM GIS'02
Published In

ICMI '04: Proceedings of the 6th international conference on Multimodal interfaces, October 2004, 368 pages. ISBN: 1581139950. DOI: 10.1145/1027933. Publisher: Association for Computing Machinery, New York, NY, United States.