DOI: 10.1145/1647314.1647320

Multimodal floor control shift detection

Published: 02 November 2009

Abstract

Floor control is the scheme by which people organize speaking turns in multi-party conversations. Identifying floor control shifts is important for understanding a conversation's structure and would be helpful for building more natural human-computer interaction systems. Although people tend to use both verbal and nonverbal cues to manage floor control shifts, most previous investigations of speaking-turn prediction have used only audio cues, e.g., lexical and prosodic cues. In this paper, we present a statistical model that automatically detects floor control shifts using both verbal and nonverbal cues. Our experimental results show that combining verbal and nonverbal cues yields more accurate detection.
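The model the abstract describes is a statistical classifier over verbal and nonverbal cues. As a hedged illustration only, and not a reproduction of the paper's actual features or classifier, the following minimal Python sketch trains a logistic-regression stand-in on synthetic pause, pitch-slope, syntactic-completion, and gaze features to label candidate points as floor "hold" vs. "shift"; every feature name and data value here is hypothetical.

```python
# Hypothetical sketch of multimodal floor-control shift detection.
# Features and data are synthetic illustrations, not the paper's model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400

# Verbal cues: pause length (s), pitch slope at the utterance end,
# and whether the utterance ends in a syntactically complete unit.
pause = rng.exponential(0.5, n)
pitch_slope = rng.normal(0.0, 1.0, n)
complete = rng.integers(0, 2, n)

# Nonverbal cue: whether the speaker gazes at a listener near the
# utterance end (gaze is a classic turn-yielding signal).
gaze_at_listener = rng.integers(0, 2, n)

# Synthetic labels: shifts are more likely with long pauses, falling
# pitch, syntactic completion, and speaker gaze toward a listener.
logit = (1.5 * pause - 0.8 * pitch_slope + 1.0 * complete
         + 1.2 * gaze_at_listener - 2.0)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X = np.column_stack([pause, pitch_slope, complete, gaze_at_listener])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te),
                            target_names=["hold", "shift"]))
```

In a real system, the verbal features would come from time-aligned transcripts and prosodic analysis and the nonverbal features from gesture and gaze annotations of the recordings, rather than from synthetic draws as above.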




Published In

ICMI-MLMI '09: Proceedings of the 2009 international conference on Multimodal interfaces
November 2009
374 pages
ISBN:9781605587721
DOI:10.1145/1647314

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. floor control
  2. language model
  3. multimodal fusion
  4. nonverbal communication
  5. prosody

Qualifiers

  • Research-article

Conference

ICMI-MLMI '09

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%


Cited By

  • (2024) "Is It Possible to Recognize a Speaker Without Listening? Unraveling Conversation Dynamics in Multi-Party Interactions Using Continuous Eye Gaze." IEEE Robotics and Automation Letters, 9(11):9923-9929. DOI: 10.1109/LRA.2024.3440844
  • (2022) "Virtual Intelligence in the Post-Pandemic Era." In Post-Pandemic Talent Management Models in Knowledge Organizations, pages 140-170. DOI: 10.4018/978-1-6684-3894-7.ch007
  • (2022) "Trimodal prediction of speaking and listening willingness to help improve turn-changing modeling." Frontiers in Psychology, 13. DOI: 10.3389/fpsyg.2022.774547
  • (2021) "Improved Gazing Transition Patterns for Predicting Turn-Taking in Multiparty Conversation." Proceedings of the 2021 5th International Conference on Video and Image Processing, pages 215-219. DOI: 10.1145/3511176.3511208
  • (2021) "Estimation of Empathy Skill Level and Personal Traits Using Gaze Behavior and Dialogue Act During Turn-Changing." HCI International 2021 - Late Breaking Papers: Multimodality, eXtended Reality, and Artificial Intelligence, pages 44-57. DOI: 10.1007/978-3-030-90963-5_4
  • (2019) "Prediction of Who Will Be Next Speaker and When Using Mouth-Opening Pattern in Multi-Party Conversation." Multimodal Technologies and Interaction, 3(4):70. DOI: 10.3390/mti3040070
  • (2018) "Analyzing Gaze Behavior and Dialogue Act during Turn-taking for Estimating Empathy Skill Level." Proceedings of the 20th ACM International Conference on Multimodal Interaction, pages 31-39. DOI: 10.1145/3242969.3242978
  • (2018) "Predicting Turn-Taking by Compact Gazing Transition Patterns in Multiparty Conversation." Image and Video Technology, pages 437-447. DOI: 10.1007/978-3-319-75786-5_35
  • (2017) "Analyzing gaze behavior during turn-taking for estimating empathy skill level." Proceedings of the 19th ACM International Conference on Multimodal Interaction, pages 365-373. DOI: 10.1145/3136755.3136786
  • (2017) "Prediction of Next-Utterance Timing using Head Movement in Multi-Party Meetings." Proceedings of the 5th International Conference on Human Agent Interaction, pages 181-187. DOI: 10.1145/3125739.3125765
