DOI: 10.1145/1647314.1647331
Poster

Cache-based language model adaptation using visual attention for ASR in meeting scenarios

Published: 02 November 2009

Abstract

In a typical group meeting involving discussion and collaboration, people look at one another, at shared information resources such as presentation material, and sometimes at nothing in particular. In this work we investigate whether knowledge of what a person is looking at can improve the performance of Automatic Speech Recognition (ASR). A framework for cache Language Model (LM) adaptation is proposed in which the cache is built from a person's Visual Attention (VA) sequence. The framework attempts to measure the appropriateness of adaptation from the characteristics of the VA sequence. Evaluation on the AMI Meeting Corpus shows reduced LM perplexity. This work demonstrates the potential of cache-based LM adaptation using VA information for large-vocabulary ASR deployed in meeting scenarios.
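The adaptation idea described in the abstract follows the classic cache language model of Kuhn and De Mori: a background LM probability is interpolated with a unigram distribution over a cache of recently relevant words. The sketch below is a minimal illustration of that interpolation, not the paper's actual implementation; the assumption that the cache is filled with words drawn from the current visual-attention target (e.g. text on a slide being looked at), and the names `cache_lm_prob` and `lam`, are hypothetical.

```python
from collections import Counter

def cache_lm_prob(word, cache_words, background_prob, lam=0.2):
    """Interpolate a unigram cache model with a background LM probability.

    cache_words: words assumed to come from the current visual-attention
        target (e.g. slide text) -- a hypothetical cache-filling policy.
    background_prob: P(word | history) from the static background LM.
    lam: interpolation weight given to the cache component.
    """
    counts = Counter(cache_words)
    total = sum(counts.values())
    # Relative frequency in the cache; zero if the cache is empty.
    p_cache = counts[word] / total if total else 0.0
    return lam * p_cache + (1.0 - lam) * background_prob

# Example: "budget" appears twice in a four-word cache, so the cache
# component boosts it well above its background probability.
cache = ["agenda", "budget", "budget", "slide"]
p = cache_lm_prob("budget", cache, background_prob=0.001, lam=0.2)
print(p)  # 0.2*0.5 + 0.8*0.001 = 0.1008
```

A perplexity reduction, as reported in the abstract, would follow whenever the cache distribution assigns higher probability than the background LM to the words actually spoken; choosing `lam` (and deciding when adaptation is appropriate at all, which the paper gauges from VA sequence characteristics) is the crux of the method.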


Cited By

  • (2014) Exploiting a 'gaze-Lombard effect' to improve ASR performance in acoustically noisy settings. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1754-1758. DOI: 10.1109/ICASSP.2014.6853899. Online publication date: May 2014.
  • (2013) Improving the Accuracy of Large Vocabulary Continuous Speech Recognizer Using Dependency Parse Tree and Chomsky Hierarchy in Lattice Rescoring. In Proceedings of the 2013 International Conference on Asian Language Processing, pages 167-170. DOI: 10.1109/IALP.2013.53. Online publication date: 17 August 2013.



    Published In

ICMI-MLMI '09: Proceedings of the 2009 international conference on Multimodal interfaces
November 2009, 374 pages
ISBN: 9781605587721
DOI: 10.1145/1647314

    Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. multimodal
    2. visual attention


Conference

ICMI-MLMI '09

    Acceptance Rates

    Overall Acceptance Rate 453 of 1,080 submissions, 42%
