DOI: 10.1145/1647314.1647331
Poster

Cache-based language model adaptation using visual attention for ASR in meeting scenarios

Published: 02 November 2009

Abstract

In a typical group meeting involving discussion and collaboration, people look at one another, at shared information resources such as presentation material, and sometimes at nothing in particular. In this work we investigate whether knowledge of what a person is looking at can improve the performance of Automatic Speech Recognition (ASR). A framework for cache Language Model (LM) adaptation is proposed in which the cache is built from a person's Visual Attention (VA) sequence. The framework attempts to measure the appropriateness of adaptation from the characteristics of the VA sequence. Evaluation on the AMI Meeting Corpus shows reduced LM perplexity. This work demonstrates the potential of cache-based LM adaptation using VA information for large-vocabulary ASR deployed in meeting scenarios.
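The adaptation idea described in the abstract follows the classic cache language model of Kuhn and De Mori: a background LM probability is interpolated with a unigram distribution over a cache of recently relevant words. The sketch below is a minimal illustration of that interpolation, not the paper's actual implementation; the assumption that the cache is filled with words drawn from the current visual-attention target (e.g. text on a slide being looked at), and the names `cache_lm_prob` and `lam`, are hypothetical.

```python
from collections import Counter

def cache_lm_prob(word, cache_words, background_prob, lam=0.2):
    """Interpolate a unigram cache model with a background LM probability.

    cache_words: words assumed to come from the current visual-attention
        target (e.g. slide text) -- a hypothetical cache-filling policy.
    background_prob: P(word | history) from the static background LM.
    lam: interpolation weight given to the cache component.
    """
    counts = Counter(cache_words)
    total = sum(counts.values())
    # Relative frequency in the cache; zero if the cache is empty.
    p_cache = counts[word] / total if total else 0.0
    return lam * p_cache + (1.0 - lam) * background_prob

# Example: "budget" appears twice in a four-word cache, so the cache
# component boosts it well above its background probability.
cache = ["agenda", "budget", "budget", "slide"]
p = cache_lm_prob("budget", cache, background_prob=0.001, lam=0.2)
print(p)  # 0.2*0.5 + 0.8*0.001 = 0.1008
```

A perplexity reduction, as reported in the abstract, would follow whenever the cache distribution assigns higher probability than the background LM to the words actually spoken; choosing `lam` (and deciding when adaptation is appropriate at all, which the paper gauges from VA sequence characteristics) is the crux of the method.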


Cited By

  • (2014) Exploiting a 'gaze-Lombard effect' to improve ASR performance in acoustically noisy settings. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1754-1758. DOI: 10.1109/ICASSP.2014.6853899. Online publication date: May 2014.
  • (2013) Improving the Accuracy of Large Vocabulary Continuous Speech Recognizer Using Dependency Parse Tree and Chomsky Hierarchy in Lattice Rescoring. In Proceedings of the 2013 International Conference on Asian Language Processing, pages 167-170. DOI: 10.1109/IALP.2013.53. Online publication date: 17 August 2013.



    Published In

ICMI-MLMI '09: Proceedings of the 2009 international conference on Multimodal interfaces
November 2009, 374 pages
ISBN: 9781605587721
DOI: 10.1145/1647314

    Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. multimodal
    2. visual attention


Conference

ICMI-MLMI '09

    Acceptance Rates

    Overall Acceptance Rate 453 of 1,080 submissions, 42%
