Article · DOI: 10.1145/1027933.1028012

Multimodal response generation in GIS

Published: 13 October 2004

Abstract

Advances in computer hardware and software technologies have enabled sophisticated information visualization techniques and new interaction opportunities in the development of GIS (Geographical Information Systems) applications. In particular, research in computer vision and natural language processing has enabled users to interact with computer applications through natural speech and gestures, which has proven effective for interacting with dynamic maps [1, 6]. Pen-based mobile devices and gesture recognition systems let system designers define application-specific gestures for carrying out particular tasks, and a force-feedback mouse has been proposed for giving visually impaired people access to GIS [4]. These are exciting new opportunities that hold the promise of advancing interaction with computers to a completely new level. The ultimate aim, however, should be to facilitate human-computer communication; that is, equal emphasis should be given to both understanding and generation of multimodal behavior. My proposed research will provide a conceptual framework and a computational model for generating multimodal responses that communicate spatial information along with dynamically generated maps. The model will eventually lead to the development of a computational agent that can reason about distributing the semantic and pragmatic content of an intended response message among speech, deictic gestures and visual information. In other words, the system will be able to select the most natural and effective mode(s) of communicating back to the user.
Any research in computer science that investigates direct interaction between computers and humans should place human factors at center stage. This work will therefore follow a multi-disciplinary approach, integrating ideas from prior research in Psychology, Cognitive Science, Linguistics, Cartography, Geographical Information Science (GIScience) and Computer Science to identify and address the human, cartographic and computational issues involved in response planning, and to assist users in their spatial decision making by facilitating their visual thinking and reducing their cognitive load. The methodology will be integrated into the design of the DAVE_G [7] prototype: a natural, multimodal, mixed-initiative dialogue interface to GIS. The system is currently capable of recognizing, interpreting and fusing users' naturally occurring speech and gesture requests, and of generating natural speech output. Communication between the system and the user is modeled following collaborative discourse theory [2] and maintains a Recipe Graph [5] structure, based on SharedPlan theory [3], to represent the intentional structure of the discourse between user and system. One major concern in generating speech responses for dynamic maps is that spatial information cannot be effectively communicated by speech alone. Altering perceptual attributes (e.g. color, size, pattern) of the visual data to direct the user's attention to a particular location on the map is not usually effective either, since each attribute bears an inherent semantic meaning; such attributes should be modified only when the system judges that they are not crucial to the user's understanding of the situation at that stage of the task.
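To make the intentional-structure idea concrete, the following is a minimal, hypothetical sketch of a Recipe Graph-style node, not DAVE_G's actual implementation: each discourse goal is decomposed into subgoals contributed by the user or the system, and a goal counts as satisfied only when it and all of its subgoals are complete. All class and field names here are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class RecipeNode:
    """One node of a hypothetical Recipe Graph: a goal plus its
    subgoal decomposition (the 'recipe') being jointly executed."""
    goal: str                    # e.g. "locate flood-risk areas"
    contributor: str             # "user" or "system"
    subgoals: list["RecipeNode"] = field(default_factory=list)
    completed: bool = False

    def add_subgoal(self, node: "RecipeNode") -> "RecipeNode":
        self.subgoals.append(node)
        return node

    def is_satisfied(self) -> bool:
        # Satisfied only when this goal is complete and every
        # subgoal in its recipe is recursively satisfied.
        return self.completed and all(s.is_satisfied() for s in self.subgoals)

# Usage: a map request decomposed into a system-side subgoal.
root = RecipeNode("locate flood-risk areas", "user")
layer = root.add_subgoal(RecipeNode("display elevation layer", "system"))
layer.completed = True
root.completed = True
print(root.is_satisfied())  # True once every subgoal is done
```

The recursive satisfaction check mirrors how a plan-based dialogue manager can tell which parts of the shared plan still need a contribution from either participant.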
Gesticulation, on the other hand, is powerful for conveying the location and form of spatially oriented information [6] without manipulating the map, and has the added benefit of facilitating speech production. My research aims at designing a feasible, extensible and effective multimodal response generation model, covering both content planning and modality allocation. A plan-based reasoning algorithm integrated with the Recipe Graph structure has the potential to achieve these goals.
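The modality-allocation trade-off described above can be sketched as a simple decision rule. This is a hypothetical illustration under my own assumptions (the function name, content types and modality labels are invented, not part of DAVE_G): for location content, prefer a deictic gesture when altering perceptual attributes would disturb meaning the user still needs, and fall back to visual highlighting only when those attributes are judged non-crucial.

```python
def allocate_modality(content_type: str, attributes_crucial: bool) -> list[str]:
    """Choose output modalities for one piece of response content.

    attributes_crucial: whether the map's perceptual attributes
    (color, size, pattern) still carry meaning the user needs,
    so they must not be altered for highlighting.
    """
    if content_type == "location":
        if attributes_crucial:
            # Point at the map without changing it.
            return ["speech", "deictic_gesture"]
        # Safe to recolor/resize to draw attention.
        return ["speech", "visual_highlight"]
    # Non-spatial content is communicated by speech alone.
    return ["speech"]

print(allocate_modality("location", attributes_crucial=True))
# ['speech', 'deictic_gesture']
```

A full model would of course condition on the discourse state in the Recipe Graph rather than a single boolean, but the rule captures the core constraint the abstract argues for.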

References

[1]
Cohen, P.R., Johnston, M., McGee, D.R., Oviatt, S.L., Clow, J., Smith, I. The Efficiency of Multimodal Interaction: A Case Study. Proc. of the Int'l Conference on Spoken Language Processing (ICSLP'98), Nov 30-Dec 4, 249--252, 1998
[2]
Grosz, B.J., Sidner, C.L. Attention, Intentions, and the Structure of Discourse. Computational Linguistics, 12, 175--204, 1986
[3]
Grosz, B.J., Kraus, S. Collaborative Plans for Complex Group Action. Artificial Intelligence, 2, 269--357, 1996
[4]
Jacobson, R.D. Representing Spatial Information Through Multimodal Interfaces: Overview and Results in Non-visual Interfaces. 6th International Conference on Information Visualization: Symposium on Spatial/Geographic Data Visualization, IEEE Proceedings, 10-12 July, 730--734
[5]
Lochbaum, K.E. A Collaborative Planning Model of Intentional Structure. Computational Linguistics, 4, 525--572, 1994
[6]
Oviatt, S.L. Multimodal Interfaces to Dynamic Interactive Maps. Proc. of the Conference on Human Factors in Computing Systems (CHI'96)
[7]
Rauschert, I., Agrawal, P., Fuhrmann, S., Brewer, I., Wang, H., Sharma, R., Cai, G., MacEachren, A. Designing a Human-Centered, Multimodal GIS Interface to Support Emergency Management. ACM GIS'02
Published In

ICMI '04: Proceedings of the 6th international conference on Multimodal interfaces, October 2004, 368 pages. ISBN: 1581139950. DOI: 10.1145/1027933. Publisher: Association for Computing Machinery, New York, NY, United States.