DOI: 10.1145/2522848.2522866
poster

Speaker-adaptive multimodal prediction model for listener responses

Published: 9 December 2013

ABSTRACT

The goal of this paper is to analyze and model the variability in speaking styles in dyadic interactions and to build a predictive algorithm for listener responses that is able to adapt to these different styles. The end result of this research will be a virtual human able to automatically respond to a human speaker with proper listener responses (e.g., head nods). Our novel speaker-adaptive prediction model is created from a corpus of dyadic interactions in which speaker variability is analyzed to identify a subset of prototypical speaker styles. During a live interaction our prediction model automatically identifies the closest prototypical speaker style and predicts listener responses based on this "communicative style". Central to our approach is the idea of a "speaker profile", which uniquely identifies each speaker and enables the matching between prototypical speakers and new speakers. The paper demonstrates the merits of our speaker-adaptive listener response prediction model by showing improvement over a state-of-the-art approach that does not adapt to the speaker. Beyond the merits of speaker adaptation, our experiments highlight the importance of using multimodal features when comparing speakers to select the closest prototypical speaker style.
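The core adaptation step described above — representing each speaker as a multimodal "speaker profile" and matching a new speaker to the closest prototypical style — can be sketched as a nearest-prototype lookup. The feature values, style names, and the Euclidean distance below are illustrative assumptions; the paper's actual profile features and similarity measure are not given in the abstract.

```python
import numpy as np

# Hypothetical multimodal speaker profiles: each prototype is a fixed-length
# vector of summary statistics (e.g., prosodic and visual features).
# The style names and numbers here are made up for illustration.
prototype_profiles = {
    "style_a": np.array([0.8, 0.1, 0.3]),
    "style_b": np.array([0.2, 0.9, 0.5]),
    "style_c": np.array([0.5, 0.4, 0.9]),
}

def closest_prototype(speaker_profile: np.ndarray) -> str:
    """Return the prototypical style whose profile is nearest (Euclidean)
    to the new speaker's profile; its prediction model would then be used
    to generate listener responses during the live interaction."""
    return min(
        prototype_profiles,
        key=lambda style: np.linalg.norm(prototype_profiles[style] - speaker_profile),
    )

# A new speaker whose profile resembles style_a:
new_speaker = np.array([0.75, 0.2, 0.35])
print(closest_prototype(new_speaker))
```

In a full system, each prototypical style would carry its own trained listener-response predictor (e.g., a CRF-family model), and the matched style's predictor would be applied to the incoming multimodal stream.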


Published in

ICMI '13: Proceedings of the 15th ACM International Conference on Multimodal Interaction
December 2013, 630 pages
ISBN: 9781450321297
DOI: 10.1145/2522848

        Copyright © 2013 ACM


Publisher

Association for Computing Machinery, New York, NY, United States




Acceptance Rates

ICMI '13 paper acceptance rate: 49 of 133 submissions (37%). Overall acceptance rate: 453 of 1,080 submissions (42%).
