Abstract
Imitation learning techniques aim to mimic human behavior in a given task. An agent (a learning machine) is trained to perform the task from demonstrations by learning a mapping between observations and actions. The idea of teaching by imitation has been around for many years, but the field has recently gained attention due to advances in computing and sensing as well as rising demand for intelligent applications. Learning by imitation facilitates teaching complex tasks with minimal expert knowledge of those tasks: generic imitation learning methods could potentially reduce the problem of teaching a task to that of providing demonstrations, without the need for explicit programming or task-specific reward design. Modern sensors collect and transmit high volumes of data rapidly, and processors with high computational power allow the sensory data to be mapped to actions in a timely manner. This opens the door to many potential AI applications that require real-time perception and reaction, such as humanoid robots, self-driving vehicles, human-computer interaction, and computer games, to name a few. However, learning by imitation poses its own set of challenges, so specialized algorithms are needed to learn models effectively and robustly. In this article, we survey imitation learning methods and present design options for the different steps of the learning process. We provide background and motivation for the field and highlight challenges specific to the imitation problem. Methods for designing and evaluating imitation learning tasks are categorized and reviewed. Special attention is given to learning methods in robotics and games, as these domains are the most popular in the literature and offer a wide array of problems and methodologies.
We extensively discuss combining imitation learning approaches that use different sources and methods, as well as incorporating other motion learning methods to enhance imitation. We also discuss the potential impact on industry, present major applications, and highlight current and future research directions.
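The "mapping between observations and actions" at the heart of imitation learning can be illustrated by reducing it to supervised learning, as in behavioral cloning. The sketch below is a minimal, hypothetical example (not taken from the survey): a toy one-dimensional task, a scripted stand-in for the human expert, and a trivial majority-vote learner standing in for a real supervised model.

```python
# Minimal behavioral-cloning sketch: imitation learning reduced to
# supervised learning of an observation -> action mapping.
# The task, expert, and learner here are all hypothetical toy stand-ins.
from collections import Counter, defaultdict
import random

def expert_policy(obs):
    # Hypothetical "expert" on a 1-D line: step toward the goal at position 5.
    return +1 if obs < 5 else -1

def collect_demonstrations(n=200, seed=0):
    # Record (observation, action) pairs by querying the expert.
    rng = random.Random(seed)
    return [(o, expert_policy(o)) for o in (rng.randint(0, 10) for _ in range(n))]

def fit_policy(demos):
    # A trivial supervised learner: majority vote of the expert's action
    # for each observation seen in the demonstrations.
    votes = defaultdict(Counter)
    for obs, act in demos:
        votes[obs][act] += 1
    return {obs: c.most_common(1)[0][0] for obs, c in votes.items()}

demos = collect_demonstrations()
policy = fit_policy(demos)
agreement = sum(policy[o] == expert_policy(o) for o, _ in demos) / len(demos)
print(agreement)
```

Because the toy expert is deterministic, the learned map reproduces it exactly on the demonstrated observations; the hard problems the survey addresses (generalizing beyond demonstrated states, compounding errors, the correspondence problem) begin where this reduction ends.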
- Marvin Zhang, Zoe McCarthy, Chelsea Finn, Sergey Levine, and Pieter Abbeel. 2016. Learning deep neural network policies with continuous memory states. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA’16). IEEE, 520--527. Google ScholarDigital Library
- Yudong Zhang, Shuihua Wang, and Genlin Ji. 2015. A comprehensive survey on particle swarm optimization algorithm and its applications. Mathematical Problems in Engineering 2015 (2015). Google ScholarCross Ref
- Brian D. Ziebart, Andrew L. Maas, J. Andrew Bagnell, and Anind K. Dey. 2008. Maximum entropy inverse reinforcement learning. In 23rd AAAI Conference on Artificial Intelligence and the 20th Innovative Applications of Artificial Intelligence Conference, AAAI-08/IAAI-08. 1433--1438.Google Scholar