Article
DOI: 10.1145/1180995.1181035

Using maximum entropy (ME) model to incorporate gesture cues for SU detection

Published: 02 November 2006

Abstract

Accurate identification of sentence units (SUs) in spontaneous speech has been found to improve the accuracy of speech recognition, as well as downstream applications such as parsing. In recent multimodal investigations, gestural features were utilized, in addition to lexical and prosodic cues from the speech channel, for detecting SUs in conversational interactions using a hidden Markov model (HMM) approach. Although this approach is computationally efficient and provides a convenient way to modularize the knowledge sources, it has two drawbacks for our SU task. First, standard HMM training methods maximize the joint probability of observations and hidden events, as opposed to the posterior probability of a hidden event given observations, a criterion more closely related to SU classification error. Second, integrating gestural features is challenging because their absence supports neither SU events nor non-events; it is only the co-timing of gestures with the speech channel that should impact our model. To address these problems, a maximum entropy (ME) model is used to combine multimodal cues for SU estimation. Experiments carried out on VACE multi-party meetings confirm that the ME modeling approach provides a solid framework for multimodal integration.
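
To make the modeling contrast concrete, below is a minimal sketch of ME-style SU boundary classification. It assumes scikit-learn's LogisticRegression (a conditionally trained log-linear model, equivalent to a binary ME classifier) as a stand-in for the toolkit the paper actually uses; the feature names and toy examples are invented for illustration. It demonstrates the abstract's second point: a gesture feature fires only when a gesture co-times with a word boundary, so an absent gesture contributes no weight toward either class.

# A minimal sketch, assuming scikit-learn in place of the paper's ME toolkit;
# feature names and training examples are invented for illustration only.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each candidate word boundary is a bag of active binary features. A gesture
# feature appears only when a gesture co-times with the boundary; when no
# gesture is observed, nothing fires, so the absent cue pushes toward
# neither "SU" nor "non-SU" (the ME property the abstract highlights).
train_feats = [
    {"word=okay": 1, "pause>0.5s": 1, "pitch_reset": 1, "gesture_hold": 1},
    {"word=okay": 1, "pause>0.5s": 1, "pitch_reset": 1},   # no gesture observed
    {"word=the": 1},                                       # mid-sentence boundary
    {"word=and": 1, "gesture_stroke": 1},
]
train_labels = ["SU", "SU", "non-SU", "non-SU"]

vec = DictVectorizer()
X = vec.fit_transform(train_feats)

# Conditional training of a log-linear classifier maximizes the posterior
# P(label | observations) directly, the criterion the abstract contrasts
# with the joint likelihood optimized by standard HMM training.
clf = LogisticRegression(max_iter=1000).fit(X, train_labels)

test = vec.transform([{"word=okay": 1, "pause>0.5s": 1}])
print(clf.predict(test), clf.predict_proba(test))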

Published In

ICMI '06: Proceedings of the 8th international conference on Multimodal interfaces
November 2006
404 pages
ISBN: 1-59593-541-X
DOI: 10.1145/1180995

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. gesture
  2. language models
  3. meetings
  4. multimodal fusion
  5. prosody
  6. sentence boundary detection


Acceptance Rates

Overall acceptance rate: 453 of 1,080 submissions, 42%

Cited By

  • Utilizing gestures to improve sentence boundary detection. Multimedia Tools and Applications, 51(3):1035-1067, Feb 2011. DOI: 10.1007/s11042-009-0436-z
  • Gesture salience as a hidden variable for coreference resolution and keyframe extraction. Journal of Artificial Intelligence Research, 31(1):353-398, Feb 2008. DOI: 10.5555/1622655.1622666
  • The Recognition and Comprehension of Hand Gestures - A Review and Research Agenda. Modeling Communication with Robots and Virtual Humans, pages 38-56, 2008. DOI: 10.1007/978-3-540-79037-2_3
  • The recognition and comprehension of hand gestures. Proc. of the 2nd ZiF research group international conference on Modeling Communication with Robots and Virtual Humans, pages 38-56, Apr 2006. DOI: 10.5555/1794517.1794520
