DOI: 10.1145/1647314.1647320

Multimodal floor control shift detection

Published: 02 November 2009

Abstract

Floor control is the scheme by which people organize speaking turns in multi-party conversations. Identifying floor control shifts is important for understanding a conversation's structure and would be helpful for building more natural human-computer interaction systems. Although people tend to use both verbal and nonverbal cues to manage floor control shifts, most previous investigations of speaking-turn prediction have used only audio cues, e.g., lexical and prosodic cues. In this paper, we present a statistical model that automatically detects floor control shifts using both verbal and nonverbal cues. Our experimental results show that combining verbal and nonverbal cues yields more accurate detection.
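The model the abstract describes is a statistical classifier over verbal and nonverbal cues. As a hedged illustration only, and not a reproduction of the paper's actual features or classifier, the following minimal Python sketch trains a logistic-regression stand-in on synthetic pause, pitch-slope, syntactic-completion, and gaze features to label candidate points as floor "hold" vs. "shift"; every feature name and data value here is hypothetical.

```python
# Hypothetical sketch of multimodal floor-control shift detection.
# Features and data are synthetic illustrations, not the paper's model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400

# Verbal cues: pause length (s), pitch slope at the utterance end,
# and whether the utterance ends in a syntactically complete unit.
pause = rng.exponential(0.5, n)
pitch_slope = rng.normal(0.0, 1.0, n)
complete = rng.integers(0, 2, n)

# Nonverbal cue: whether the speaker gazes at a listener near the
# utterance end (gaze is a classic turn-yielding signal).
gaze_at_listener = rng.integers(0, 2, n)

# Synthetic labels: shifts are more likely with long pauses, falling
# pitch, syntactic completion, and speaker gaze toward a listener.
logit = (1.5 * pause - 0.8 * pitch_slope + 1.0 * complete
         + 1.2 * gaze_at_listener - 2.0)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X = np.column_stack([pause, pitch_slope, complete, gaze_at_listener])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te),
                            target_names=["hold", "shift"]))
```

In a real system, the verbal features would come from time-aligned transcripts and prosodic analysis and the nonverbal features from gesture and gaze annotations of the recordings, rather than from synthetic draws as above.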




Published In

ICMI-MLMI '09: Proceedings of the 2009 international conference on Multimodal interfaces
November 2009
374 pages
ISBN:9781605587721
DOI:10.1145/1647314

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. floor control
  2. language model
  3. multimodal fusion
  4. nonverbal communication
  5. prosody

Qualifiers

  • Research-article

Conference

ICMI-MLMI '09

Acceptance Rates

Overall Acceptance Rate 453 of 1,080 submissions, 42%


Cited By

  • (2024) "Is It Possible to Recognize a Speaker Without Listening? Unraveling Conversation Dynamics in Multi-Party Interactions Using Continuous Eye Gaze." IEEE Robotics and Automation Letters, 9(11):9923-9929. DOI: 10.1109/LRA.2024.3440844
  • (2022) "Virtual Intelligence in the Post-Pandemic Era." In Post-Pandemic Talent Management Models in Knowledge Organizations, pages 140-170. DOI: 10.4018/978-1-6684-3894-7.ch007
  • (2022) "Trimodal prediction of speaking and listening willingness to help improve turn-changing modeling." Frontiers in Psychology, 13. DOI: 10.3389/fpsyg.2022.774547
  • (2021) "Improved Gazing Transition Patterns for Predicting Turn-Taking in Multiparty Conversation." Proceedings of the 2021 5th International Conference on Video and Image Processing, pages 215-219. DOI: 10.1145/3511176.3511208
  • (2021) "Estimation of Empathy Skill Level and Personal Traits Using Gaze Behavior and Dialogue Act During Turn-Changing." HCI International 2021 - Late Breaking Papers: Multimodality, eXtended Reality, and Artificial Intelligence, pages 44-57. DOI: 10.1007/978-3-030-90963-5_4
  • (2019) "Prediction of Who Will Be Next Speaker and When Using Mouth-Opening Pattern in Multi-Party Conversation." Multimodal Technologies and Interaction, 3(4):70. DOI: 10.3390/mti3040070
  • (2018) "Analyzing Gaze Behavior and Dialogue Act during Turn-taking for Estimating Empathy Skill Level." Proceedings of the 20th ACM International Conference on Multimodal Interaction, pages 31-39. DOI: 10.1145/3242969.3242978
  • (2018) "Predicting Turn-Taking by Compact Gazing Transition Patterns in Multiparty Conversation." Image and Video Technology, pages 437-447. DOI: 10.1007/978-3-319-75786-5_35
  • (2017) "Analyzing gaze behavior during turn-taking for estimating empathy skill level." Proceedings of the 19th ACM International Conference on Multimodal Interaction, pages 365-373. DOI: 10.1145/3136755.3136786
  • (2017) "Prediction of Next-Utterance Timing using Head Movement in Multi-Party Meetings." Proceedings of the 5th International Conference on Human Agent Interaction, pages 181-187. DOI: 10.1145/3125739.3125765
