ABSTRACT
The paper proposes a technique for Human Action Recognition (HAR) based on a Convolutional Neural Network (CNN). Depth data sequences from motion-sensing devices are converted into images and fed to a CNN, rather than being processed with conventional or statistical methods. Data were obtained from 10 actions performed by six subjects, captured with the Kinect v2 sensor, and from 20 actions performed by seven subjects in the MSR Action3D data set. A custom CNN architecture consisting of three convolutional layers, each followed by max pooling, and a fully connected layer was used. Training, validation, and testing were carried out on a total of 39,715 images. The model achieved an accuracy of 97.23% on the Kinect data set and 87.1% on the MSR data set.
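The described architecture (three convolutional layers, each followed by max pooling, then a fully connected classifier) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the input resolution (64×64 grayscale depth images), channel widths, and kernel sizes are assumptions not stated in the abstract.

```python
import torch
import torch.nn as nn

class DepthActionCNN(nn.Module):
    """Sketch of a three-conv / three-maxpool CNN for depth-image HAR.

    Hyperparameters (input size 1x64x64, filter counts 32/64/128,
    3x3 kernels) are illustrative assumptions, not the paper's values.
    """

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 32 -> 16
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 16 -> 8
        )
        # Fully connected layer maps the flattened feature map to class logits.
        self.classifier = nn.Linear(128 * 8 * 8, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = DepthActionCNN(num_classes=10)  # 10 Kinect actions
logits = model(torch.randn(4, 1, 64, 64))  # batch of 4 depth images
print(logits.shape)  # torch.Size([4, 10])
```

For the MSR Action3D experiments, the same network would simply be instantiated with `num_classes=20`.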