ABSTRACT
To facilitate referential communication between humans and robots and to mediate differences in how they represent the shared environment, we are exploring embodied collaborative models for referring expression generation (REG). Instead of generating a single minimal description of a target object, these models generate episodes of expressions based on human feedback during human-robot interaction. We particularly investigate the role of embodiment, such as robot gesture behaviors (i.e., pointing to an object) and human gaze feedback (i.e., looking at a particular object), in the collaborative process. This paper examines different strategies for incorporating embodiment and collaboration in REG and discusses their possibilities and challenges in enabling human-robot referential communication.
Index Terms
- Embodied Collaborative Referring Expression Generation in Situated Human-Robot Interaction