DOI: 10.1145/2522848.2532592

A multi modal approach to gesture recognition from audio and video data

Published: 09 December 2013

ABSTRACT

In this paper we describe our approach to the multi-modal gesture recognition challenge organized by ChaLearn in conjunction with the ICMI 2013 conference. The competition's task was to learn a vocabulary of 20 types of Italian gestures performed by different persons and to detect them in sequences. We develop an algorithm that finds the gesture intervals in the audio data, extract audio features from those intervals, and train two different models. We engineer features from the skeleton data and use the gesture intervals in the training data to train a model that we then apply to the test sequences with a sliding window. We combine the models through weighted averaging, and we find that combining information from these two different sources boosts the models' performance significantly.
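The fusion step described above, a weighted average of the two models' per-class outputs, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the array shapes, and the weight value are assumptions, since the abstract does not state the exact weights used.

```python
import numpy as np

def fuse_predictions(p_audio, p_skeleton, w_audio=0.6):
    """Weighted average of per-class probabilities from two modalities.

    p_audio, p_skeleton: arrays of shape (n_windows, n_classes),
    e.g. probability outputs of the audio model and the skeleton model
    for each sliding-window position. The weight w_audio is a
    hypothetical tuning parameter chosen here for illustration.
    """
    p_audio = np.asarray(p_audio)
    p_skeleton = np.asarray(p_skeleton)
    fused = w_audio * p_audio + (1.0 - w_audio) * p_skeleton
    # predicted gesture label per window
    return fused.argmax(axis=1)

# Toy example: 2 windows, 3 gesture classes.
pa = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]])
ps = np.array([[0.4, 0.5, 0.1], [0.2, 0.2, 0.6]])
labels = fuse_predictions(pa, ps, w_audio=0.6)  # array([0, 1])
```

In practice the weight would be tuned on held-out data; with probability outputs this is the same "soft voting" scheme used by common ensemble tools.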


Published in

ICMI '13: Proceedings of the 15th ACM International Conference on Multimodal Interaction
December 2013
630 pages
ISBN: 9781450321297
DOI: 10.1145/2522848

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Qualifiers

        • research-article

        Acceptance Rates

ICMI '13 Paper Acceptance Rate: 49 of 133 submissions, 37%. Overall Acceptance Rate: 453 of 1,080 submissions, 42%.
