DOI: 10.1145/2493525.2493535
research-article

Social signal and user adaptation in reinforcement learning-based dialogue management

Published: 4 August 2013

ABSTRACT

This paper investigates the conditions under which cues from social signals can be used for user adaptation (or user tracking) by a learning agent. We consider the case of Reinforcement Learning (RL) of a dialogue management module. Social signals (gazes, postures, emotions, etc.) play an important role in human interactions and can serve as an additional, user-dependent (subjective) reinforcement signal during learning. The Kalman Temporal Differences (KTD) framework is combined with a potential-based reward shaping method to integrate the social information into the optimisation procedure and adapt the policy to user profiles. In a second step, we show the ability of the method to track a new user profile (after self-learning by the user or a switch to a new user). Experiments carried out in simulation with a state-of-the-art goal-oriented dialogue management framework support our claims.
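As a rough illustration of the potential-based shaping idea mentioned in the abstract, the sketch below adds the standard shaping term gamma * Phi(s') - Phi(s) (Ng, Harada, and Russell, 1999) to an environment reward. The abstract does not specify the potential function, the state representation, or the discount factor used in the paper, so the state labels, the `SOCIAL_POTENTIAL` table, and the function names here are hypothetical choices made only for this example.

```python
from typing import Callable, Hashable

State = Hashable

# Hypothetical potential: how favourable the observed social signal is.
# These labels and values are assumptions, not taken from the paper.
SOCIAL_POTENTIAL = {
    "user_frowning": -1.0,
    "user_neutral": 0.0,
    "user_smiling": 1.0,
}


def social_potential(state: State) -> float:
    """Return the assumed potential Phi(state); unknown states map to 0."""
    return SOCIAL_POTENTIAL.get(state, 0.0)


def shaped_reward(
    env_reward: float,
    state: State,
    next_state: State,
    potential: Callable[[State], float],
    gamma: float = 0.95,
    terminal: bool = False,
) -> float:
    """Environment reward plus the potential-based shaping term.

    The shaping term is gamma * Phi(next_state) - Phi(state). The potential
    of a terminal state is taken to be 0, a common convention for episodic
    tasks.
    """
    next_phi = 0.0 if terminal else potential(next_state)
    return env_reward + gamma * next_phi - potential(state)


if __name__ == "__main__":
    # The user goes from neutral to smiling: the shaping term is positive.
    print(shaped_reward(0.0, "user_neutral", "user_smiling",
                        social_potential, gamma=0.5))
```

Because the shaping term is a difference of potentials, it changes how quickly a learner is steered toward favourable social signals without changing which policy is optimal for the underlying task reward. The subjective signal can therefore vary from one user profile to another while the task objective stays fixed.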


    • Published in

      ACM Other conferences
      MLIS '13: Proceedings of the 2nd Workshop on Machine Learning for Interactive Systems: Bridging the Gap Between Perception, Action and Communication
      August 2013
      70 pages
      ISBN:9781450320191
      DOI:10.1145/2493525

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      MLIS '13 Paper Acceptance Rate: 10 of 14 submissions, 71%
      Overall Acceptance Rate: 10 of 14 submissions, 71%
