DOI: 10.1145/1647314.1647337

Multi-modal features for real-time detection of human-robot interaction categories

Published: 02 November 2009

Abstract

Social interactions unfold over time, at multiple time scales, and can be observed through multiple sensory modalities. In this paper, we propose a machine learning framework for selecting and combining low-level sensory features from different modalities to produce high-level characterizations of human-robot social interactions in real time.
We introduce a novel set of fast, multi-modal, spatio-temporal features for audio sensors, touch sensors, floor sensors, laser range sensors, and the time-series history of the robot's own behaviors. A subset of these features is automatically selected and combined using GentleBoost, an ensemble machine learning technique, allowing the robot to estimate the current interaction category every 100 milliseconds. This estimate can then be used either by the robot to make decisions autonomously, or by a remote human operator who can modify the robot's behavior manually (i.e., semi-autonomous operation).
We demonstrate the technique on an information-kiosk robot deployed in a busy train station, focusing on the problem of detecting interaction breakdowns (i.e., failures of the robot to engage in a good interaction). We show that despite the varied and unscripted nature of human-robot interactions in the real-world train-station setting, the robot achieves highly accurate predictions of interaction breakdowns at the same instant that human observers become aware of them.


Cited By

  • (2012) Using group history to identify character-directed utterances in multi-child interactions. Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 207-216. DOI: 10.5555/2392800.2392838. Published online: 5 July 2012.


Published In

ICMI-MLMI '09: Proceedings of the 2009 international conference on Multimodal interfaces
November 2009
374 pages
ISBN: 9781605587721
DOI: 10.1145/1647314


Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. human-robot interaction
  2. multi-modal features

Qualifiers

  • Poster

Conference

ICMI-MLMI '09

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)


