DOI: 10.1145/1719970.1719989

Usage patterns and latent semantic analyses for task goal inference of multimodal user interactions

Published: 07 February 2010

Abstract

This paper describes our work on usage pattern analysis and the development of a latent semantic analysis framework for interpreting multimodal user input consisting of speech and pen gestures. We have designed and collected a multimodal corpus of navigational inquiries. Each modality carries semantics related to the domain-specific task goal, and each inquiry is manually annotated with a task goal based on these semantics. Multimodal input usually has a simpler syntactic structure than unimodal input, and the order of semantic constituents differs between the two; we therefore propose to derive the latent semantics of multimodal inputs using latent semantic modeling (LSM). To achieve this, we parse the recognized Chinese spoken input for spoken locative references (SLRs) and align each SLR with its corresponding pen gesture(s). We then characterize each cross-modal integration pattern as a 3-tuple multimodal term consisting of the SLR, the pen gesture type, and their temporal relation. The inquiry-by-multimodal-term matrix is decomposed using singular value decomposition (SVD) to derive the latent semantics automatically. Task goal inference based on these latent semantics achieves 99% accuracy on a disjoint test set.
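
To make the pipeline above concrete, the following is a minimal sketch of the latent semantic modeling step, assuming each inquiry has already been reduced to a bag of 3-tuple multimodal terms (spoken locative reference, pen gesture type, temporal relation). The term strings, task-goal labels, and the nearest-neighbour inference rule are illustrative assumptions, not the authors' implementation.

    # Minimal LSM sketch (illustrative; not the authors' code).
    # Each inquiry is a bag of 3-tuple multimodal terms encoded as
    # "SLR|gesture_type|temporal_relation" strings (made up for this example).
    import numpy as np

    train_inquiries = [
        (["this_place|POINT|overlap", "here|POINT|pen_precedes"], "SEARCH_NEARBY"),
        (["from_here|POINT|overlap", "to_there|POINT|speech_precedes"], "ROUTE_QUERY"),
        (["this_area|CIRCLE|overlap"], "AREA_SEARCH"),
    ]

    # 1. Build the inquiry-by-multimodal-term count matrix.
    vocab = sorted({t for terms, _ in train_inquiries for t in terms})
    col = {t: i for i, t in enumerate(vocab)}

    def term_vector(terms):
        v = np.zeros(len(vocab))
        for t in terms:
            if t in col:
                v[col[t]] += 1.0
        return v

    X = np.vstack([term_vector(terms) for terms, _ in train_inquiries])

    # 2. Truncated SVD yields the latent semantic space; the training
    #    inquiries are represented by the rows of Uk.
    k = 2
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

    # 3. Fold a new inquiry into the latent space (q V_k S_k^{-1}) and infer
    #    its task goal from the most similar training inquiry (cosine similarity).
    def infer_goal(terms):
        q = term_vector(terms) @ Vtk.T / sk
        sims = Uk @ q / (np.linalg.norm(Uk, axis=1) * np.linalg.norm(q) + 1e-12)
        return train_inquiries[int(np.argmax(sims))][1]

    print(infer_goal(["from_here|POINT|overlap"]))  # -> "ROUTE_QUERY"

The fold-in step is the standard LSA projection of an unseen inquiry onto the truncated singular vectors; the paper's actual task goal classifier may differ from the simple nearest-neighbour rule used here.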


Cited By

  • (2010) Goal detection from natural language queries. Proceedings of the Natural Language Processing and Information Systems, and 15th International Conference on Applications of Natural Language to Information Systems, pp. 157-168. DOI: 10.5555/1894525.1894547. Online publication date: 23-Jun-2010.
  • (2010) Goal Detection from Natural Language Queries. Natural Language Processing and Information Systems, pp. 157-168. DOI: 10.1007/978-3-642-13881-2_16. Online publication date: 2010.

Published In

IUI '10: Proceedings of the 15th international conference on Intelligent user interfaces
February 2010
460 pages
ISBN:9781605585154
DOI:10.1145/1719970
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 February 2010

Author Tags

  1. latent semantic modeling
  2. multimodal input
  3. pen gesture
  4. singular value decomposition
  5. spoken input
  6. task goal inference

Qualifiers

  • Research-article

Conference

IUI '10

Acceptance Rates

Overall Acceptance Rate 746 of 2,811 submissions, 27%
