Article
DOI: 10.1145/1180995.1181035

Using maximum entropy (ME) model to incorporate gesture cues for SU detection

Published: 02 November 2006

Abstract

Accurate identification of sentence units (SUs) in spontaneous speech has been found to improve the accuracy of speech recognition, as well as downstream applications such as parsing. In recent multimodal investigations, gestural features were utilized, in addition to lexical and prosodic cues from the speech channel, for detecting SUs in conversational interactions using a hidden Markov model (HMM) approach. Although this approach is computationally efficient and provides a convenient way to modularize the knowledge sources, it has two drawbacks for our SU task. First, standard HMM training methods maximize the joint probability of observations and hidden events, as opposed to the posterior probability of a hidden event given observations, a criterion more closely related to SU classification error. Second, integrating gestural features is challenging because their absence supports neither SU events nor non-events; it is only the co-timing of gestures with the speech channel that should impact our model. To address these problems, a maximum entropy (ME) model is used to combine multimodal cues for SU estimation. Experiments carried out on VACE multi-party meetings confirm that the ME modeling approach provides a solid framework for multimodal integration.
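
To make the modeling contrast concrete, below is a minimal sketch of ME-style SU boundary classification. It assumes scikit-learn's LogisticRegression (a conditionally trained log-linear model, equivalent to a binary ME classifier) as a stand-in for the toolkit the paper actually uses; the feature names and toy examples are invented for illustration. It demonstrates the abstract's second point: a gesture feature fires only when a gesture co-times with a word boundary, so an absent gesture contributes no weight toward either class.

# A minimal sketch, assuming scikit-learn in place of the paper's ME toolkit;
# feature names and training examples are invented for illustration only.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each candidate word boundary is a bag of active binary features. A gesture
# feature appears only when a gesture co-times with the boundary; when no
# gesture is observed, nothing fires, so the absent cue pushes toward
# neither "SU" nor "non-SU" (the ME property the abstract highlights).
train_feats = [
    {"word=okay": 1, "pause>0.5s": 1, "pitch_reset": 1, "gesture_hold": 1},
    {"word=okay": 1, "pause>0.5s": 1, "pitch_reset": 1},   # no gesture observed
    {"word=the": 1},                                       # mid-sentence boundary
    {"word=and": 1, "gesture_stroke": 1},
]
train_labels = ["SU", "SU", "non-SU", "non-SU"]

vec = DictVectorizer()
X = vec.fit_transform(train_feats)

# Conditional training of a log-linear classifier maximizes the posterior
# P(label | observations) directly, the criterion the abstract contrasts
# with the joint likelihood optimized by standard HMM training.
clf = LogisticRegression(max_iter=1000).fit(X, train_labels)

test = vec.transform([{"word=okay": 1, "pause>0.5s": 1}])
print(clf.predict(test), clf.predict_proba(test))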

Published In

ICMI '06: Proceedings of the 8th international conference on Multimodal interfaces
November 2006
404 pages
ISBN: 1-59593-541-X
DOI: 10.1145/1180995

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. gesture
  2. language models
  3. meetings
  4. multimodal fusion
  5. prosody
  6. sentence boundary detection


Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions, 42%

Cited By

  • Utilizing gestures to improve sentence boundary detection. Multimedia Tools and Applications, 51(3):1035-1067, Feb 2011. DOI: 10.1007/s11042-009-0436-z
  • Gesture salience as a hidden variable for coreference resolution and keyframe extraction. Journal of Artificial Intelligence Research, 31(1):353-398, Feb 2008. DOI: 10.5555/1622655.1622666
  • The Recognition and Comprehension of Hand Gestures - A Review and Research Agenda. Modeling Communication with Robots and Virtual Humans, pages 38-56, 2008. DOI: 10.1007/978-3-540-79037-2_3
  • The recognition and comprehension of hand gestures. Proc. of the 2nd ZiF research group international conference on Modeling Communication with Robots and Virtual Humans, pages 38-56, Apr 2006. DOI: 10.5555/1794517.1794520
