ABSTRACT
This paper investigates the conditions under which cues from social signals can be used for user adaptation (or user tracking) by a learning agent. We consider the case of Reinforcement Learning (RL) of a dialogue management module. Social signals (gaze, posture, emotion, etc.) play an important role in human interaction and can serve as an additional, user-dependent (subjective) reinforcement signal during learning. The Kalman Temporal Differences (KTD) framework is combined with a potential-based reward shaping method to integrate this social information into the optimisation procedure and adapt the policy to user profiles. In a second step, we show that the method can track a new user profile (after the user's own learning or a switch to a new user). Experiments carried out in simulation with a state-of-the-art goal-oriented dialogue management framework support these claims.
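To make the potential-based shaping idea mentioned in the abstract concrete, the following is a minimal sketch of the standard shaping form, in which the shaped reward is r + gamma * Phi(s') - Phi(s). It is not the paper's implementation: the state names, the `social_score` table, and the choice of a social-signal score as the potential Phi are hypothetical and used only for illustration. The sketch also assumes an episodic setting in which the potential of a terminal state is taken to be zero.

```python
from typing import Callable, Hashable

State = Hashable


def shaped_reward(
    env_reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.95,
    terminal: bool = False,
) -> float:
    """Return r + gamma * Phi(s') - Phi(s), the potential-based shaping form."""
    # Assumption: the potential of a terminal state is zero (episodic setting).
    next_potential = 0.0 if terminal else potential(next_state)
    return env_reward + gamma * next_potential - potential(state)


# Hypothetical potential: a social-signal score in [0, 1] per dialogue state.
# These states and values are invented for illustration only.
social_score = {"greeting": 0.2, "clarifying": 0.1, "confirming": 0.8}


def potential(state: State) -> float:
    """Look up the illustrative social-signal score; unknown states score 0."""
    return social_score.get(state, 0.0)


if __name__ == "__main__":
    # A per-turn penalty of -1 combined with a move towards a state that
    # (by assumption) elicits more positive social signals.
    r = shaped_reward(-1.0, "clarifying", "confirming", potential, gamma=0.9)
    print(f"shaped reward: {r:.2f}")
```

Because the shaping term depends only on a potential over states, it changes how quickly a learner is guided towards socially preferred states without, in the standard analysis, changing which policy is optimal for the underlying task reward. How the potential is actually derived from social signals and combined with KTD is specific to the paper and not reproduced here.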
Index Terms
- Social signal and user adaptation in reinforcement learning-based dialogue management