
A study of manual gesture-based selection for the PEMMI multimodal transport management interface

Published: 04 October 2005
DOI: 10.1145/1088463.1088510

Abstract

Operators of traffic control rooms are often required to respond quickly to critical incidents using a complex array of keyboards, mice, very large screen monitors and other peripheral equipment. To support the aim of finding more natural interfaces for this challenging application, this paper presents PEMMI (Perceptually Effective Multimodal Interface), a transport management system control prototype that takes video-based manual gesture and speech recognition as inputs. A specific theme within this research is determining the optimum strategy for gesture input, in terms of both single-point input selection and suitable multimodal feedback for selection. Users were found to prefer larger selection areas for targets in gesture interfaces, and to select within 44% of a target's selection radius. The minimum effective target size for 'device-free' gesture interfaces was found to be 80 pixels (on a 1280x1024 screen). This paper also shows that feedback on gesture input via large screens is enhanced by the use of both audio and visual cues to guide the user's multimodal input. Audio feedback in particular improved user response time by an average of 20% over existing gesture selection strategies for multimodal tasks.
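As a concrete illustration of the target-size finding, the short Python sketch below implements a circular hit test sized to the reported 80-pixel minimum on a 1280x1024 display. This is a minimal sketch, not code from the paper: the function names (min_target_radius, is_selected) are hypothetical, and the linear scaling to other resolutions is an assumption for illustration only.

import math

# The paper reports a minimum effective target size of 80 pixels for
# 'device-free' gesture selection, measured on a 1280x1024 display.
MIN_TARGET_DIAMETER_PX = 80
REFERENCE_WIDTH_PX = 1280

def min_target_radius(screen_width_px: int) -> float:
    # Scale the reported minimum to other horizontal resolutions.
    # Linear scaling is an assumption for illustration, not a result
    # from the paper.
    return (MIN_TARGET_DIAMETER_PX / 2) * (screen_width_px / REFERENCE_WIDTH_PX)

def is_selected(pointer_xy, target_xy, radius_px) -> bool:
    # Circular hit test: accept the selection when the tracked pointer
    # falls within the target's selection radius. The paper found users
    # tend to select well inside this radius (within about 44% of it).
    dx = pointer_xy[0] - target_xy[0]
    dy = pointer_xy[1] - target_xy[1]
    return math.hypot(dx, dy) <= radius_px

if __name__ == "__main__":
    r = min_target_radius(1280)                    # 40.0 px on the reference screen
    print(is_selected((660, 530), (640, 512), r))  # True: pointer is ~26.9 px from centre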




Published In

ICMI '05: Proceedings of the 7th international conference on Multimodal interfaces
October 2005
344 pages
ISBN:1595930280
DOI:10.1145/1088463

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. manual gesture
  2. multimodal fusion
  3. multimodal interaction
  4. multimodal output generation
  5. speech

Qualifiers

  • Article

Conference

ICMI05

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%


Cited By

  • (2017) Analysis on the Role of Transport Equipment Standardization in the Development of Multimodal Transport. International Symposium for Intelligent Transportation and Smart City (ITASC) 2017 Proceedings, 139-148. DOI: 10.1007/978-981-10-3575-3_15. Online publication date: 7-Apr-2017.
  • (2016) Effects of auditory, haptic and visual feedback on performing gestures by gaze or by hand. Behaviour & Information Technology, 35(12), 1044-1062. DOI: 10.1080/0144929X.2016.1194477. Online publication date: 1-Dec-2016.
  • (2007) An FPGA-based smart camera for gesture recognition in HCI applications. Proceedings of the 8th Asian Conference on Computer Vision - Volume Part I, 718-727. DOI: 10.5555/1775614.1775699. Online publication date: 18-Nov-2007.
  • (2007) An input-parsing algorithm supporting integration of deictic gesture in natural language interface. Proceedings of the 12th International Conference on Human-Computer Interaction: Intelligent Multimodal Interaction Environments, 206-215. DOI: 10.5555/1769590.1769613. Online publication date: 22-Jul-2007.
  • (2007) Multimodal Human-Machine Interface and User Cognitive Load Measurement. IFAC Proceedings Volumes, 40(16), 200-205. DOI: 10.3182/20070904-3-KR-2922.00035. Online publication date: 2007.
  • (2007) Multimedia Reasoning with Natural Language Support. Proceedings of the International Conference on Semantic Computing, 413-420. DOI: 10.1109/ICSC.2007.61. Online publication date: 17-Sep-2007.
  • (2007) Multimodal user interface for traffic incident management in control room. IET Intelligent Transport Systems, 1(1), 27. DOI: 10.1049/iet-its:20060038. Online publication date: 2007.
  • (2007) An FPGA-Based Smart Camera for Gesture Recognition in HCI Applications. Computer Vision - ACCV 2007, 718-727. DOI: 10.1007/978-3-540-76386-4_68. Online publication date: 2007.
  • (2007) An Input-Parsing Algorithm Supporting Integration of Deictic Gesture in Natural Language Interface. Human-Computer Interaction. HCI Intelligent Multimodal Interaction Environments, 206-215. DOI: 10.1007/978-3-540-73110-8_22. Online publication date: 2007.
  • (2006) QuickFusion. Proceedings of the 2005 NICTA-HCSNet Multimodal User Interaction Workshop - Volume 57, 51-54. DOI: 10.5555/1151804.1151813. Online publication date: 1-Apr-2006.
