
Imitation Learning: A Survey of Learning Methods

Published: 06 April 2017

Abstract

Imitation learning techniques aim to mimic human behavior in a given task. An agent (a learning machine) is trained to perform a task from demonstrations by learning a mapping between observations and actions. The idea of teaching by imitation has been around for many years; however, the field has recently gained attention due to advances in computing and sensing, as well as rising demand for intelligent applications. The paradigm of learning by imitation is gaining popularity because it facilitates teaching complex tasks with minimal expert knowledge of the tasks. Generic imitation learning methods could potentially reduce the problem of teaching a task to that of providing demonstrations, without the need for explicit programming or task-specific reward functions. Modern sensors collect and transmit high volumes of data rapidly, and processors with high computational power allow fast processing that maps the sensory data to actions in a timely manner. This opens the door for many potential AI applications that require real-time perception and reaction, such as humanoid robots, self-driving vehicles, human-computer interaction, and computer games. However, specialized algorithms are needed to learn models effectively and robustly, as learning by imitation poses its own set of challenges. In this article, we survey imitation learning methods and present design options for the different steps of the learning process. We introduce the background and motivation for the field and highlight challenges specific to the imitation problem. Methods for designing and evaluating imitation learning tasks are categorized and reviewed. Special attention is given to learning methods in robotics and games, as these domains are the most popular in the literature and provide a wide array of problems and methodologies.
We extensively discuss combining imitation learning approaches that use different sources and methods, as well as incorporating other motion learning methods to enhance imitation. We also discuss the potential impact on industry, present major applications, and highlight current and future research directions.
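The core idea in the abstract, learning a mapping from observations to actions from demonstrations, can be illustrated with a minimal sketch of its simplest instantiation, behavioral cloning via supervised learning. The nearest-neighbor policy, the toy observations, and the action labels below are purely illustrative assumptions, not a method from the survey:

```python
# Minimal behavioral-cloning sketch: demonstrations are
# (observation, action) pairs, and the learned "policy" is just a
# supervised predictor over them. Here a 1-nearest-neighbor rule
# stands in for the learner; real systems use richer models.

def nearest_neighbor_policy(demonstrations):
    """Return a policy copying the action of the closest demonstrated observation."""
    def policy(observation):
        # Pick the demonstration whose observation has the smallest
        # squared Euclidean distance to the query observation.
        _, best_action = min(
            demonstrations,
            key=lambda pair: sum((o - x) ** 2 for o, x in zip(pair[0], observation)),
        )
        return best_action
    return policy

# Toy demonstrations: 2-D observations labelled with expert actions.
demos = [((0.0, 0.0), "stop"), ((1.0, 0.0), "right"), ((0.0, 1.0), "up")]
policy = nearest_neighbor_policy(demos)
print(policy((0.9, 0.1)))  # closest demonstration is (1.0, 0.0) -> "right"
```

Note how no reward function or explicit program is specified: the task is defined entirely by the demonstrations, which is exactly the reduction the abstract describes.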

References

  1. Pieter Abbeel, Adam Coates, Morgan Quigley, and Andrew Y. Ng. 2007. An application of reinforcement learning to aerobatic helicopter flight. Advances in Neural Information Processing Systems 19 (2007), 1.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Pieter Abbeel and Andrew Y. Ng. 2004. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the 21st International Conference on Machine Learning. ACM, 1. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Ricardo Aler, Oscar Garcia, and José María Valls. 2005. Correcting and improving imitation models of humans for robosoccer agents. In Proceedings of the 2005 IEEE Congress on Evolutionary Computation, 2005, Vol. 3. IEEE, 2402--2409. Google ScholarGoogle ScholarCross RefCross Ref
  4. Brenna Argall, Brett Browning, and Manuela Veloso. 2007. Learning by demonstration with critique from a human teacher. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction. ACM, 57--64. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Brenna D. Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. 2009. A survey of robot learning from demonstration. Robotics and Autonomous Systems 57, 5 (2009), 469--483. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Tamim Asfour, Pedram Azad, Florian Gyarfas, and Rüdiger Dillmann. 2008. Imitation learning of dual-arm manipulation tasks in humanoid robots. International Journal of Humanoid Robotics 5, 2 (2008), 183--202. Google ScholarGoogle ScholarCross RefCross Ref
  7. Paul Bakker and Yasuo Kuniyoshi. 1996. Robot see, robot do: An overview of robot imitation. In Proceedings of the Workshop on Learning in Robots and Animals (AISB’96). 3--11.Google ScholarGoogle Scholar
  8. Juan Pedro Bandera Rubio. 2010. Vision-Based Gesture Recognition in a Robot Learning by Imitation Framework. Ph.D. Dissertation. Universidad de Málaga.Google ScholarGoogle Scholar
  9. Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. 2012. The arcade learning environment: An evaluation platform for general agents. arXiv preprint arXiv:1207.4708 (2012).Google ScholarGoogle Scholar
  10. Roger Bemelmans, Gert Jan Gelderblom, Pieter Jonker, and Luc De Witte. 2012. Socially assistive robots in elderly care: A systematic review into effects and effectiveness. Journal of the American Medical Directors Association 13, 2 (2012), 114--120. Google ScholarGoogle ScholarCross RefCross Ref
  11. Yoshua Bengio. 2009. Learning deep architectures for AI. Foundations and Trends in Machine Learning 2, 1 (2009), 1--127. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Darrin C. Bentivegna, Christopher G. Atkeson, and Gordon Cheng. 2004. Learning tasks from observation and practice. Robotics and Autonomous Systems 47, 2 (2004), 163--169. Google ScholarGoogle ScholarCross RefCross Ref
  13. Erik Berger, Heni Ben Amor, David Vogt, and Bernhard Jung. 2008. Towards a simulator for imitation learning with kinesthetic bootstrapping. In Workshop Proceedings of International Conference on Simulation, Modeling and Programming for Autonomous Robots (SIMPAR’08). 167--173.Google ScholarGoogle Scholar
  14. Aude Billard, Sylvain Calinon, RŘdiger Dillmann, and Stefan Schaal. 2008. Robot programming by demonstration. In Springer Handbook of Robotics. Springer, 1371--1394.Google ScholarGoogle Scholar
  15. Aude Billard and Maja J. Matarić. 2001. Learning human arm movements by imitation: Evaluation of a biologically inspired connectionist architecture. Robotics and Autonomous Systems 37, 2 (2001), 145--160. Google ScholarGoogle ScholarCross RefCross Ref
  16. Josh C. Bongard and Gregory S. Hornby. 2013. Combining fitness-based search and user modeling in evolutionary robotics. In Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation. ACM, 159--166. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Tim Brys, Anna Harutyunyan, Halit Bener Suay, Sonia Chernova, Matthew E. Taylor, and Ann Nowé. 2015a. Reinforcement learning from demonstration through shaping. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’15).Google ScholarGoogle Scholar
  18. Tim Brys, Anna Harutyunyan, Matthew E. Taylor, and Ann Nowé. 2015b. Policy transfer using reward shaping. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 181--188.Google ScholarGoogle Scholar
  19. Jonas Buchli, Freek Stulp, Evangelos Theodorou, and Stefan Schaal. 2011. Learning variable impedance control. International Journal of Robotics Research 30, 7 (2011), 820--833. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Lucian Busoniu, Robert Babuska, and Bart De Schutter. 2008. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 38, 2 (2008), 156--172. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Sylvain Calinon and Aude Billard. 2007a. Incremental learning of gestures by imitation in a humanoid robot. In Proceedings of the ACM/IEEE International Conference on Human-robot Interaction. ACM, 255--262. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Sylvain Calinon and Aude Billard. 2008. A framework integrating statistical and social cues to teach a humanoid robot new skills. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA’08), Workshop on Social Interaction with Intelligent Indoor Robots.Google ScholarGoogle Scholar
  23. Sylvain Calinon and Aude G. Billard. 2007b. What is the teachers role in robot programming by demonstration?: Toward benchmarks for improved learning. Interaction Studies 8, 3 (2007), 441--464. Google ScholarGoogle ScholarCross RefCross Ref
  24. Sylvain Calinon, Zhibin Li, Tohid Alizadeh, Nikos G. Tsagarakis, and Darwin G. Caldwell. 2012. Statistical dynamical systems for skills acquisition in humanoids. In Proceedings of the 2012 12th IEEE-RAS International Conference on Humanoid Robots (Humanoids’12). IEEE, 323--329. Google ScholarGoogle ScholarCross RefCross Ref
  25. Luigi Cardamone, Daniele Loiacono, and Pier Luca Lanzi. 2009. Learning drivers for TORCS through imitation using supervised methods. In Proceedings of the IEEE Symposium on Computational Intelligence and Games, 2009 (CIG’09). IEEE, 148--155. Google ScholarGoogle ScholarCross RefCross Ref
  26. Nutan Chen, Justin Bayer, Sebastian Urban, and Patrick Van Der Smagt. 2015. Efficient movement representation by embedding dynamic movement primitives in deep autoencoders. In 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids’15). IEEE, 434--440. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Ran Cheng and Yaochu Jin. 2015. A social learning particle swarm optimization algorithm for scalable optimization. Information Sciences 291 (2015), 43--60. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Sonia Chernova and Manuela Veloso. 2007a. Confidence-based policy learning from demonstration using gaussian mixture models. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems. ACM, 233.Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Sonia Chernova and Manuela Veloso. 2007b. Confidence-based policy learning from demonstration using Gaussian mixture models. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems. ACM, 233. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Sonia Chernova and Manuela Veloso. 2008. Teaching collaborative multi-robot tasks through demonstration. In Proceedings of the 8th IEEE-RAS International Conference on Humanoid Robots, 2008 (Humanoids’08). IEEE, 385--390. Google ScholarGoogle ScholarCross RefCross Ref
  31. Dan Ciresan, Ueli Meier, and Jürgen Schmidhuber. 2012. Multi-column deep neural networks for image classification. In 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’12). IEEE, 3642--3649. Google ScholarGoogle ScholarCross RefCross Ref
  32. Christopher Clark and Amos Storkey. 2015. Training deep convolutional neural networks to play go. In Proceedings of the 32nd International Conference on Machine Learning (ICML’15). 1766--1774.Google ScholarGoogle Scholar
  33. Adam Coates, Pieter Abbeel, and Andrew Y. Ng. 2008. Learning for control from multiple demonstrations. In Proceedings of the 25th International Conference on Machine Learning. ACM, 144--151. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. William Curran, Tim Brys, Matthew Taylor, and William Smart. 2015. Using PCA to efficiently represent state spaces. arXiv Preprint arXiv:1505.00322 (2015).Google ScholarGoogle Scholar
  35. David B. DAmbrosio and Kenneth O. Stanley. 2013. Scalable multiagent learning through indirect encoding of policy geometry. Evolutionary Intelligence 6, 1 (2013), 1--26.Google ScholarGoogle ScholarCross RefCross Ref
  36. Hal Daumé Iii, John Langford, and Daniel Marcu. 2009. Search-based structured prediction. Machine Learning 75, 3 (2009), 297--325. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Kerstin Dautenhahn and Chrystopher L. Nehaniv. 2002. The Correspondence Problem. MIT Press.Google ScholarGoogle Scholar
  38. Agostino De Santis, Bruno Siciliano, Alessandro De Luca, and Antonio Bicchi. 2008. An atlas of physical human--robot interaction. Mechanism and Machine Theory 43, 3 (2008), 253--270. Google ScholarGoogle ScholarCross RefCross Ref
  39. Yiannis Demiris and Anthony Dearden. 2005. From motor babbling to hierarchical learning by imitation: A robot developmental pathway. In Proc. of the 5th International Workshop on Epigenetic Robotics. 31--37.Google ScholarGoogle Scholar
  40. Kevin R. Dixon and Pradeep K. Khosla. 2004. Learning by observation with mobile robots: A computational approach. In Proceedings of the 2004 IEEE International Conference on Robotics and Automation, 2004 (ICRA’04). Vol. 1. IEEE, 102--107. Google ScholarGoogle ScholarCross RefCross Ref
  41. Alain Droniou, Serena Ivaldi, and Olivier Sigaud. 2014. Learning a repertoire of actions with deep neural networks. In Proceedings of the 2014 Joint IEEE International Conferences on Development and Learning and Epigenetic Robotics (ICDL-Epirob’14). IEEE, 229--234. Google ScholarGoogle ScholarCross RefCross Ref
  42. Haitham El-Hussieny, Samy F. M. Assal, A. A. Abouelsoud, Said M. Megahed, and Tsukasa Ogasawara. 2015. Incremental learning of reach-to-grasp behavior: A PSO-based Inverse optimal control approach. In Proceedings of the 2015 7th International Conference of Soft Computing and Pattern Recognition (SoCPaR’15). IEEE, 129--135. Google ScholarGoogle ScholarCross RefCross Ref
  43. David Feil-Seifer and Maja J. Mataric. 2005. Defining socially assistive robotics. In Proceedings of the 9th International Conference on Rehabilitation Robotics, 2005 (ICORR’05). IEEE, 465--468.Google ScholarGoogle Scholar
  44. Benjamin Geisler. 2002. An Empirical Study of Machine Learning Algorithms Applied to Modeling Player Behavior in a First Person Shooter Video Game. Ph.D. Dissertation. Citeseer.Google ScholarGoogle Scholar
  45. Tao Geng, Mark Lee, and Martin Hülse. 2011. Transferring human grasping synergies to a robot. Mechatronics 21, 1 (2011), 272--284. Google ScholarGoogle ScholarCross RefCross Ref
  46. Miguel González-Fierro, Carlos Balaguer, Nicola Swann, and Thrishantha Nanayakkara. 2013. A humanoid robot standing up through learning from demonstration using a multimodal reward function. In Proceedings of the 2013 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids’13). IEEE, 74--79. Google ScholarGoogle ScholarCross RefCross Ref
  47. Bernard Gorman. 2009. Imitation Learning Through Games: Theory, Implementation and Evaluation. Ph.D. Dissertation. Dublin City University.Google ScholarGoogle Scholar
  48. Daniel H. Grollman and Aude G. Billard. 2012. Robot learning from failed demonstrations. International Journal of Social Robotics 4, 4 (2012), 331--342. Google ScholarGoogle ScholarCross RefCross Ref
  49. Frederic Gruau and Kameel Quatramaran. 1997. Cellular encoding for interactive evolutionary robotics. In 4th European Conference on Artificial Life. MIT Press, 368--377.Google ScholarGoogle Scholar
  50. Florent Guenter, Micha Hersch, Sylvain Calinon, and Aude Billard. 2007. Reinforcement learning for imitating constrained reaching movements. Advanced Robotics 21, 13 (2007), 1521--1544.Google ScholarGoogle ScholarCross RefCross Ref
  51. Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard L. Lewis, and Xiaoshi Wang. 2014. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning. In Advances in Neural Information Processing Systems. 3338--3346.Google ScholarGoogle Scholar
  52. He He, Jason Eisner, and Hal Daume. 2012. Imitation learning by coaching. In Advances in Neural Information Processing Systems. 3149--3157.Google ScholarGoogle Scholar
  53. Philip Hingston. 2012. Believable bots. Can Computers Play Like People (2012). Google ScholarGoogle ScholarCross RefCross Ref
  54. Chih-Lyang Hwang, Bo-Lin Chen, Huei-Ting Syu, Chao-Kuei Wang, and Mansour Karkoub. 2016. Humanoid robot’s visual imitation of 3-D motion of a human subject using neural-network-based inverse kinematics. IEEE Systems Journal 10, 2 (2016), 685--696. Google ScholarGoogle ScholarCross RefCross Ref
  55. Auke Jan Ijspeert, Jun Nakanishi, Heiko Hoffmann, Peter Pastor, and Stefan Schaal. 2013. Dynamical movement primitives: Learning attractor models for motor behaviors. Neural Computation 25, 2 (2013), 328--373. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Auke Jan Ijspeert, Jun Nakanishi, and Stefan Schaal. 2002a. Learning Attractor Landscapes for Learning Motor Primitives. Technical Report.Google ScholarGoogle Scholar
  57. Auke Jan Ijspeert, Jun Nakanishi, and Stefan Schaal. 2002b. Learning rhythmic movements by demonstration using nonlinear oscillators. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’02). 958--963. Google ScholarGoogle ScholarCross RefCross Ref
  58. Shuhei Ikemoto, Heni Ben Amor, Takashi Minato, Bernhard Jung, and Hiroshi Ishiguro. 2012. Physical human-robot interaction: Mutual learning and adaptation. IEEE Robotics 8 Automation Magazine 19, 4 (2012), 24--35.Google ScholarGoogle Scholar
  59. Shuo Jin, Chengkai Dai, Yang Liu, and Charlie CL Wang. 2016. Motion imitation based on sparsely sampled correspondence. arXiv Preprint arXiv:1607.04907 (2016).Google ScholarGoogle Scholar
  60. Kshitij Judah, Alan Fern, and Thomas G Dietterich. 2012. Active imitation learning via reduction to iid active learning. arXiv Preprint arXiv:1210.4876 (2012).Google ScholarGoogle Scholar
  61. S. Mohammad Khansari-Zadeh and Aude Billard. 2011. Learning stable nonlinear dynamical systems with gaussian mixture models. IEEE Transactions on Robotics 27, 5 (2011), 943--957. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Beomjoon Kim, Amir Massoud Farahmand, Joelle Pineau, and Doina Precup. 2013. Learning from limited demonstrations. In Advances in Neural Information Processing Systems. 2859--2867.Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Jens Kober, J. Andrew Bagnell, and Jan Peters. 2013. Reinforcement learning in robotics: A survey. International Journal of Robotics Research 32, 11 (2013), 1238--1274. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Jens Kober, Betty Mohler, and Jan Peters. 2008. Learning perceptual coupling for motor primitives. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2008 (IROS’08). IEEE, 834--839. Google ScholarGoogle ScholarCross RefCross Ref
  65. Jens Kober and Jan Peters. 2009a. Learning motor primitives for robotics. In Proceedings of the IEEE International Conference on Robotics and Automation, 2009 (ICRA’09). IEEE, 2112--2118. Google ScholarGoogle ScholarCross RefCross Ref
  66. Jens Kober and Jan Peters. 2010. Imitation and reinforcement learning. IEEE Robotics 8 Automation Magazine 17, 2 (2010), 55--62.Google ScholarGoogle ScholarCross RefCross Ref
  67. Jens Kober and Jan Peters. 2014. Movement templates for learning of hitting and batting. In Learning Motor Skills. Springer, 69--82. Google ScholarGoogle ScholarCross RefCross Ref
  68. Jens Kober and Jan R. Peters. 2009b. Policy search for motor primitives in robotics. In Advances in Neural Information Processing Systems. 849--856.Google ScholarGoogle Scholar
  69. Jens Kober and Jan R. Peters. 2009c. Policy search for motor primitives in robotics. In Advances in Neural Information Processing Systems. 849--856.Google ScholarGoogle Scholar
  70. Nate Kohl and Peter Stone. 2004. Policy gradient reinforcement learning for fast quadrupedal locomotion. In Proceedings of the 2004 IEEE International Conference on Robotics and Automation, 2004 (ICRA’04), Vol. 3. IEEE, 2619--2624. Google ScholarGoogle ScholarCross RefCross Ref
  71. Jan Koutník, Giuseppe Cuccu, Jürgen Schmidhuber, and Faustino Gomez. 2013. Evolving large-scale neural networks for vision-based reinforcement learning. In Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation. ACM, 1061--1068. Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097--1105.Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. Gregory Kuhlmann and Peter Stone. 2007. Graph-based domain mapping for transfer learning in general games. In Machine Learning (ECML’07). Springer, 188--200. Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. Hoang M. Le, Andrew Kang, Yisong Yue, and Peter Carr. 2016. Smooth Imitation Learning for Online Sequence Prediction. In Proceedings of the 33rd International Conference on Machine Learning.Google ScholarGoogle Scholar
  75. Geoffrey Lee, Min Luo, Fabio Zambetta, and Xiaodong Li. 2014. Learning a super mario controller from examples of human play. In Proceedings of the 2014 IEEE Congress on Evolutionary Computation (CEC’14). IEEE, 1--8. Google ScholarGoogle ScholarCross RefCross Ref
  76. Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. 2015. End-to-end training of deep visuomotor policies. arXiv Preprint arXiv:1504.00702 (2015).Google ScholarGoogle ScholarDigital LibraryDigital Library
  77. Sergey Levine and Vladlen Koltun. 2013. Guided policy search. In Proceedings of the 30th International Conference on Machine Learning. 1--9.Google ScholarGoogle Scholar
  78. Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. arXiv Preprint arXiv:1509.02971 (2015).Google ScholarGoogle Scholar
  79. Hsien-I Lin, Yu-Cheng Liu, and Chi-Li Chen. 2011. Evaluation of human-robot arm movement imitation. In Proceedings of the 2011 8th Asian Control Conference (ASCC’11). IEEE, 287--292.Google ScholarGoogle Scholar
  80. Long Ji Lin. 1991. Programming robots using reinforcement learning and teaching. In Proceedings of the Ninth National Conference on Artificial Intelligence - Volume 2 (AAAI’91). 781--786.Google ScholarGoogle Scholar
  81. Long-Ji Lin. 1992. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning 8, 3--4 (1992), 293--321.Google ScholarGoogle ScholarDigital LibraryDigital Library
  82. Maja J. Mataric. 2000a. Getting humanoids to move and imitate. IEEE Intelligent Systems 15, 4 (2000), 18--24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. Maja J. Mataric. 2000b. Sensory-motor primitives as a basis for imitation: Linking perception to action and biology to robotics. In Imitation in Animals and Artifacts. Citeseer.Google ScholarGoogle Scholar
  84. Hermann Mayer, Faustino Gomez, Daan Wierstra, Istvan Nagy, Alois Knoll, and Jürgen Schmidhuber. 2008. A system for robotic heart surgery that learns to tie knots using recurrent neural networks. Advanced Robotics 22, 13--14 (2008), 1521--1537.Google ScholarGoogle ScholarCross RefCross Ref
  85. Hua-Qing Min, Jin-Hui Zhu, and Xi-Jing Zheng. 2005. Obstacle avoidance with multi-objective optimization by PSO in dynamic environment. In Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, Vol. 5. IEEE, 2950--2956.Google ScholarGoogle Scholar
  86. Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. arXiv Preprint arXiv:1602.01783 (2016).Google ScholarGoogle Scholar
  87. Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. 2015. Human-level control through deep reinforcement learning. Nature 518, 7540 (2015), 529--533. Google ScholarGoogle ScholarCross RefCross Ref
  88. Yasser Mohammad and Toyoaki Nishida. 2012. Fluid imitation. International Journal of Social Robotics 4, 4 (2012), 369--382. Google ScholarGoogle ScholarCross RefCross Ref
  89. Yasser Mohammad and Toyoaki Nishida. 2013. Tackling the correspondence problem. In Proceedings of the International Conference on Active Media Technology. Springer, 84--95. Google ScholarGoogle ScholarDigital LibraryDigital Library
  90. Katharina Mülling, Jens Kober, Oliver Kroemer, and Jan Peters. 2013. Learning to select and generalize striking movements in robot table tennis. International Journal of Robotics Research 32, 3 (2013), 263--279. Google ScholarGoogle ScholarDigital LibraryDigital Library
  91. Jorge Munoz, German Gutierrez, and Araceli Sanchis. 2009. Controller for torcs created by imitation. In Proceedings of the IEEE Symposium on Computational Intelligence and Games, 2009 (CIG’09). IEEE, 271--278.Google ScholarGoogle ScholarCross RefCross Ref
  92. Jorge Muñoz, German Gutierrez, and Araceli Sanchis. 2010. A human-like TORCS controller for the simulated car racing championship. In Proceedings of the 2010 IEEE Symposium on Computational Intelligence and Games (CIG’10). IEEE, 473--480.Google ScholarGoogle ScholarCross RefCross Ref
  93. Jun Nakanishi, Jun Morimoto, Gen Endo, Gordon Cheng, Stefan Schaal, and Mitsuo Kawato. 2004. Learning from demonstration and adaptation of biped locomotion. Robotics and Autonomous Systems 47, 2 (2004), 79--91. Google ScholarGoogle ScholarCross RefCross Ref
  94. Andrew Y. Ng, Adam Coates, Mark Diel, Varun Ganapathi, Jamie Schulte, Ben Tse, Eric Berger, and Eric Liang. 2006. Autonomous inverted helicopter flight via reinforcement learning. In Experimental Robotics IX. Springer, 363--372. Google ScholarGoogle ScholarCross RefCross Ref
  95. Andrew Y. Ng, Daishi Harada, and Stuart Russell. 1999. Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the Sixteenth International Conference on Machine Learning, Vol. 99. 278--287.Google ScholarGoogle Scholar
  96. Monica N. Nicolescu and Maja J. Mataric. 2003. Natural methods for robot task learning: Instructive demonstrations, generalization and practice. In Proceedings of the 2nd International Joint Conference on Autonomous Agents and Multiagent Systems. ACM, 241--248. Google ScholarGoogle ScholarDigital LibraryDigital Library
  97. Scott Niekum, Sachin Chitta, Andrew G. Barto, Bhaskara Marthi, and Sarah Osentoski. 2013. Incremental semantically grounded learning from demonstration. In Robotics: Science and Systems, Vol. 9. Google ScholarGoogle ScholarCross RefCross Ref
  98. Stefano Nolfi and Dario Floreano. 2000. Evolutionary Robotics: The Biology, Intelligence, and Technology. MIT Press, Cambridge, MA, USA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  99. Mark Ollis, Wesley H. Huang, and Michael Happold. 2007. A Bayesian approach to imitation learning for robot navigation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007 (IROS’07). IEEE, 709--714. Google ScholarGoogle ScholarCross RefCross Ref
  100. Juan Ortega, Noor Shaker, Julian Togelius, and Georgios N. Yannakakis. 2013. Imitating human playing styles in super mario bros. Entertainment Computing 4, 2 (2013), 93--104. Google ScholarGoogle ScholarCross RefCross Ref
  101. Erhan Oztop and Michael A. Arbib. 2002. Schema design and implementation of the grasp-related mirror neuron system. Biological Cybernetics 87, 2 (2002), 116--140. Google ScholarGoogle ScholarCross RefCross Ref
  102. Sinno Jialin Pan and Qiang Yang. 2010. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22, 10 (2010), 1345--1359. Google ScholarGoogle ScholarDigital LibraryDigital Library
  103. Peter Pastor, Mrinal Kalakrishnan, Sachin Chitta, Evangelos Theodorou, and Stefan Schaal. 2011. Skill learning and task outcome prediction for manipulation. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation (ICRA’11). IEEE, 3828--3834. Google ScholarGoogle ScholarCross RefCross Ref
  104. Peter Pastor, Mrinal Kalakrishnan, Franziska Meier, Freek Stulp, Jonas Buchli, Evangelos Theodorou, and Stefan Schaal. 2013. From dynamic movement primitives to associative skill memories. Robotics and Autonomous Systems 61, 4 (2013), 351--361. Google ScholarGoogle ScholarDigital LibraryDigital Library
  105. Jan Peters and Stefan Schaal. 2008. Reinforcement learning of motor skills with policy gradients. Neural Networks 21, 4 (2008), 682--697. Google ScholarGoogle ScholarDigital LibraryDigital Library
  106. Dean Pomerleau. 1995. Neural network vision for robot driving. In The Handbook of Brain Theory and Neural Networks, M. Arbib (Ed.).Google ScholarGoogle Scholar
  107. Polly K. Pook and Dana H. Ballard. 1993. Recognizing teleoperated manipulations. In Proceedings of the 1993 IEEE International Conference on Robotics and Automation, 1993. IEEE, 578--585. Google ScholarGoogle ScholarCross RefCross Ref
  108. Bob Price and Craig Boutilier. 1999. Implicit imitation in multiagent reinforcement learning. In Proceedings of the Sixteenth International Conference on Machine Learning (ICML’99). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 325--334.Google ScholarGoogle ScholarDigital LibraryDigital Library
  109. Rouhollah Rahmatizadeh, Pooya Abolghasemi, and Ladislau Bölöni. 2016. Learning manipulation trajectories using recurrent neural networks. arXiv Preprint arXiv:1603.03833 (2016).Google ScholarGoogle Scholar
  110. Jette Randlov and Preben Alstrom. 1998. Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the 15th International Conference on Machine Learning. 463--471.Google ScholarGoogle Scholar
  111. Nathan Ratliff, David Bradley, J. Andrew Bagnell, and Joel Chestnutt. 2007. Boosting structured prediction for imitation learning. Robotics Institute (2007), 54.Google ScholarGoogle Scholar
  112. Saleha Raza, Sajjad Haider, and M.-A. Williams. 2012. Teaching coordinated strategies to soccer robots via imitation. In Proceedings of the 2012 IEEE International Conference on Robotics and Biomimetics (ROBIO’12). IEEE, 1434--1439. Google ScholarGoogle ScholarCross RefCross Ref
  113. Nizar Rokbani, Abdallah Zaidi, and Adel M. Alimi. 2012. Prototyping a biped robot using an educational robotics kit. In Proceedings of the 2012 International Conference on Education and e-Learning Innovations (ICEELI’12). IEEE, 1--4. Google ScholarGoogle ScholarCross RefCross Ref
  114. Stéphane Ross and Drew Bagnell. 2010. Efficient reductions for imitation learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics. 661--668.Google ScholarGoogle Scholar
  115. Stéphane Ross, Geoffrey J. Gordon, and J. Andrew Bagnell. 2010. A reduction of imitation learning and structured prediction to no-regret online learning. arXiv Preprint arXiv:1011.0686 (2010).Google ScholarGoogle Scholar
  116. Leonel Rozo, Danilo Bruno, Sylvain Calinon, and Darwin G. Caldwell. 2015. Learning optimal controllers in human-robot cooperative transportation tasks with position and force constraints. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS’15). IEEE, 1024--1030. Google ScholarGoogle ScholarCross RefCross Ref
  117. Leonel Rozo, Pablo Jiménez, and Carme Torras. 2013. A robot learning from demonstration framework to perform force-based manipulation tasks. Intelligent Service Robotics 6, 1 (2013), 33--51. Google ScholarGoogle ScholarDigital LibraryDigital Library
  118. Leonel Rozo Castañeda, Sylvain Calinon, Darwin Caldwell, Pablo Jimenez Schlegl, and Carme Torras. 2013. Learning collaborative impedance-based robot behaviors. In Proceedings of the 27th AAAI Conference on Artificial Intelligence. 1422--1428.Google ScholarGoogle ScholarDigital LibraryDigital Library
  119. Stuart J. Russell and Peter Norvig. 2003. Artificial Intelligence: A Modern Approach (2nd ed.). Pearson Education.Google ScholarGoogle ScholarDigital LibraryDigital Library
  120. Claude Sammut, Scott Hurst, Dana Kedzier, Donald Michie, and others. 1992. Learning to fly. In Proceedings of the 9th International Workshop on Machine Learning. 385--393.
  121. Joe Saunders, Chrystopher L. Nehaniv, and Kerstin Dautenhahn. 2006. Teaching robots by moulding behavior and scaffolding the environment. In Proceedings of the 1st ACM SIGCHI/SIGART Conference on Human-Robot Interaction. ACM, 118--125.
  122. Stefan Schaal. 1999. Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences 3, 6 (1999), 233--242.
  123. Stefan Schaal. 1997. Learning from demonstration. In Advances in Neural Information Processing Systems 9, M. C. Mozer, M. I. Jordan, and T. Petsche (Eds.). MIT Press, 1040--1046.
  124. Stefan Schaal, Auke Ijspeert, and Aude Billard. 2003. Computational approaches to motor learning by imitation. Philosophical Transactions of the Royal Society B: Biological Sciences 358, 1431 (2003), 537--547.
  125. Stefan Schaal, Peyman Mohajerian, and Auke Ijspeert. 2007. Dynamics systems vs. optimal control: A unifying view. Progress in Brain Research 165 (2007), 425--445.
  126. Tom Schaul, Julian Togelius, and Jürgen Schmidhuber. 2011. Measuring intelligence through games. arXiv preprint arXiv:1109.1314 (2011).
  127. Yoav Shoham, Rob Powers, and Trond Grenager. 2003. Multi-Agent Reinforcement Learning: A Critical Survey. Technical Report. Stanford University.
  128. Aaron P. Shon, David B. Grimes, Chris L. Baker, Matthew W. Hoffman, Shengli Zhou, and Rajesh P. N. Rao. 2005. Probabilistic gaze imitation and saliency learning in a robotic head. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation. IEEE, 2865--2870.
  129. David Silver, James Bagnell, and Anthony Stentz. 2008. High performance outdoor navigation from overhead data using imitation learning. In Robotics: Science and Systems IV.
  130. David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529, 7587 (2016), 484--489.
  131. Tsung-Ying Sun, Chih-Li Huo, Shang-Jeng Tsai, and Chan-Cheng Liu. 2008. Optimal UAV flight path planning using skeletonization and particle swarm optimizer. In Proceedings of the 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence). IEEE, 1183--1188.
  132. Huan Tan. 2015. A behavior generation framework for robots to learn from demonstrations. In Proceedings of the 2015 IEEE International Conference on Systems, Man, and Cybernetics (SMC'15). IEEE, 947--953.
  133. Adriana Tapus, Cristian Tapus, and Maja J. Mataric. 2009. The use of socially assistive robots in the design of intelligent cognitive therapies for people with dementia. In Proceedings of the IEEE International Conference on Rehabilitation Robotics (ICORR'09). IEEE, 924--929.
  134. Christian Thurau, Christian Bauckhage, and Gerhard Sagerer. 2004a. Imitation learning at all levels of game-AI. In Proceedings of the International Conference on Computer Games, Artificial Intelligence, Design and Education, Vol. 5.
  135. Christian Thurau, Christian Bauckhage, and Gerhard Sagerer. 2004b. Learning human-like movement behavior for computer games. In Proceedings of the International Conference on the Simulation of Adaptive Behavior. 315--323.
  136. Julian Togelius, Renzo De Nardi, and Simon M. Lucas. 2007. Towards automatic personalised content creation for racing games. In Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG'07). IEEE, 252--259.
  137. Lisa Torrey and Jude Shavlik. 2009. Transfer learning. Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques 1 (2009), 242.
  138. Lisa Torrey, Trevor Walker, Jude Shavlik, and Richard Maclin. 2005. Using advice to transfer knowledge acquired in one reinforcement learning task to another. In Machine Learning (ECML'05). Springer, 412--424.
  139. Aleš Ude, Christopher G. Atkeson, and Marcia Riley. 2004. Programming full-body movements for humanoid robots by observation. Robotics and Autonomous Systems 47, 2 (2004), 93--108.
  140. Andreas Vlachos. 2012. An investigation of imitation learning algorithms for structured prediction. In Proceedings of the European Workshop on Reinforcement Learning (EWRL). 143--154.
  141. David Vogt, Heni Ben Amor, Erik Berger, and Bernhard Jung. 2014. Learning two-person interaction models for responsive synthetic humanoids. Journal of Virtual Reality and Broadcasting 11, 1 (2014).
  142. Markus Wulfmeier, Peter Ondruska, and Ingmar Posner. 2015. Maximum entropy deep inverse reinforcement learning. arXiv preprint arXiv:1507.04888 (2015).
  143. Chao Zhang, Ziyang Zhen, Daobo Wang, and Meng Li. 2010. UAV path planning method based on ant colony optimization. In Proceedings of the 2010 Chinese Control and Decision Conference. IEEE, 3790--3792.
  144. Marvin Zhang, Zoe McCarthy, Chelsea Finn, Sergey Levine, and Pieter Abbeel. 2016. Learning deep neural network policies with continuous memory states. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA'16). IEEE, 520--527.
  145. Yudong Zhang, Shuihua Wang, and Genlin Ji. 2015. A comprehensive survey on particle swarm optimization algorithm and its applications. Mathematical Problems in Engineering 2015 (2015).
  146. Brian D. Ziebart, Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. 2008. Maximum entropy inverse reinforcement learning. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI'08). 1433--1438.


Published in

ACM Computing Surveys, Volume 50, Issue 2 (March 2018), 567 pages.
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/3071073
Editor: Sartaj Sahni

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 6 April 2017
• Accepted: 1 January 2017
• Revised: 1 December 2016
• Received: 1 April 2016


