DOI: 10.1145/1647314.1647337

Multi-modal features for real-time detection of human-robot interaction categories

Published: 02 November 2009

Abstract

Social interactions unfold over time, at multiple time scales, and can be observed through multiple sensory modalities. In this paper, we propose a machine learning framework for selecting and combining low-level sensory features from different modalities to produce high-level characterizations of human-robot social interactions in real time.
We introduce a novel set of fast, multi-modal, spatio-temporal features for audio sensors, touch sensors, floor sensors, laser range sensors, and the time-series history of the robot's own behaviors. A subset of these features is automatically selected and combined using GentleBoost, an ensemble machine learning technique, allowing the robot to estimate the current interaction category every 100 milliseconds. This estimate can then be used either by the robot to make decisions autonomously, or by a remote human operator who can modify the robot's behavior manually (i.e., semi-autonomous operation).
We demonstrate the technique on an information-kiosk robot deployed in a busy train station, focusing on the problem of detecting interaction breakdowns (i.e., failures of the robot to engage in a good interaction). We show that despite the varied and unscripted nature of human-robot interactions in the real-world train-station setting, the robot achieves highly accurate predictions of interaction breakdowns at the same instant that human observers become aware of them.


Cited By

  • (2012) Using group history to identify character-directed utterances in multi-child interactions. Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 207-216. DOI: 10.5555/2392800.2392838. Published online: 5 July 2012.


Published In

ICMI-MLMI '09: Proceedings of the 2009 international conference on Multimodal interfaces
November 2009
374 pages
ISBN: 9781605587721
DOI: 10.1145/1647314


Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. human-robot interaction
  2. multi-modal features

Qualifiers

  • Poster

Conference

ICMI-MLMI '09

Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions (42%)


