Social signal and user adaptation in reinforcement learning-based dialogue management

Authors:
Emmanuel Ferreira, University of Avignon, Avignon, France
Fabrice Lefèvre, University of Avignon, Avignon, France

MLIS '13: Proceedings of the 2nd Workshop on Machine Learning for Interactive Systems: Bridging the Gap Between Perception, Action and Communication
August 2013, Pages 61–69
https://doi.org/10.1145/2493525.2493535

Published: 4 August 2013

ABSTRACT

This paper investigates the conditions under which cues from social signals can be used for user adaptation (or user tracking) of a learning agent. We consider the case of Reinforcement Learning (RL) of a dialogue management module. Social signals (gazes, postures, emotions, etc.) play an important role in human interactions and can be used as an additional, user-dependent (subjective) reinforcement signal during learning. The Kalman Temporal Differences (KTD) framework is employed in combination with a potential-based reward shaping method to integrate the social information in the optimisation procedure and adapt the policy to user profiles. In a second step, the ability of the method to track a new user profile (after self-learning by the user or a switch to a new user) is shown. Experiments carried out in simulation with a state-of-the-art goal-oriented dialogue management framework support our claims.
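
The abstract does not spell out the shaping term, so the following is only a minimal sketch of the general potential-based form, in which the shaped reward is r + γΦ(s') − Φ(s). It is not the authors' implementation. The potential values (imagined here as a per-state social-signal score such as estimated user satisfaction), the environment rewards, the discount factor and all function names are hypothetical and chosen for illustration. The sketch checks the telescoping property that makes this form attractive: over an episode, the shaped return differs from the plain return only by γ^T Φ(s_T) − Φ(s_0), a quantity that does not depend on the actions taken in between.

```python
from typing import List, Sequence


def shaped_reward(reward: float, phi_s: float, phi_next: float, gamma: float) -> float:
    """Environment reward plus the potential-based term gamma * Phi(s') - Phi(s)."""
    return reward + gamma * phi_next - phi_s


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over one episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical episode with three transitions (all numbers are illustrative).
gamma = 0.95
potentials: List[float] = [0.2, 0.5, 0.4, 0.9]  # assumed Phi(s_0) .. Phi(s_3)
rewards: List[float] = [-1.0, -1.0, 20.0]       # assumed environment rewards

# Apply the shaping term to every transition of the episode.
shaped = [
    shaped_reward(r, potentials[t], potentials[t + 1], gamma)
    for t, r in enumerate(rewards)
]

plain_return = discounted_return(rewards, gamma)
shaped_return = discounted_return(shaped, gamma)

# The shaping terms telescope to gamma**T * Phi(s_T) - Phi(s_0).
telescoped = gamma ** len(rewards) * potentials[-1] - potentials[0]

print(shaped_return - plain_return, telescoped)
```

The two printed values agree up to floating-point error. How the potential is actually derived from social signals, and how the shaped reward is fed to the KTD learner, is left to the paper itself.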

Index Terms

1. Computing methodologies
  1. Machine learning

Published in
MLIS '13: Proceedings of the 2nd Workshop on Machine Learning for Interactive Systems: Bridging the Gap Between Perception, Action and Communication
August 2013
70 pages
ISBN: 9781450320191
DOI: 10.1145/2493525
Editors:
Heriberto Cuayáhuitl, Heriot-Watt University, Edinburgh, UK
Lutz Frommberger, University of Bremen, Germany
Nina Dethlefs, Heriot-Watt University, Edinburgh, UK
Martijn van Otterlo, Radboud University Nijmegen, The Netherlands
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
dialogue management
reinforcement learning
reward shaping
social signals
user adaptation
value function approximation
Qualifiers
- research-article
Acceptance Rates
MLIS '13 Paper Acceptance Rate: 10 of 14 submissions, 71%
Overall Acceptance Rate: 10 of 14 submissions, 71%