A phonotactic-semantic paradigm for automatic spoken document classification

Authors:

Haizhou Li

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 369 - 376

https://doi.org/10.1145/1076034.1076098

Published: 15 August 2005

Abstract

We demonstrate a phonotactic-semantic paradigm for spoken document categorization. In this framework, we define a set of acoustic words instead of lexical words to represent acoustic activities in spoken languages. The strategy for acoustic vocabulary selection is studied by comparing different feature selection methods. With an appropriate acoustic vocabulary, a voice tokenizer converts a spoken document into a text-like document of acoustic words. Thus, a spoken document can be represented by a count vector, named a bag-of-sounds vector, which characterizes a spoken document's semantic domain. We study two phonotactic-semantic classifiers, the support vector machine classifier and the latent semantic analysis classifier, and their properties. The phonotactic-semantic framework constitutes a new paradigm in spoken document classification, as demonstrated by its success in the spoken language identification task. It achieves 18.2% error reduction over state-of-the-art benchmark performance on the 1996 NIST Language Recognition Evaluation database.
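
The pipeline described in the abstract (tokenize speech into acoustic words, count them into a bag-of-sounds vector, then classify) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the token strings are made-up stand-ins for voice tokenizer output, an "acoustic word" is taken to be any unigram or bigram of tokens, the function names are hypothetical, and a simple cosine nearest-centroid rule stands in for the support vector machine and latent semantic analysis classifiers that the paper actually studies.

```python
from collections import Counter
from math import sqrt

def acoustic_words(tokens, max_n=2):
    """Yield every n-gram (n = 1..max_n) of the token sequence as one acoustic word.
    Treating n-grams as acoustic words is an assumption of this sketch."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def bag_of_sounds(tokens, vocabulary, max_n=2):
    """Count vector over a fixed acoustic vocabulary; out-of-vocabulary words are ignored."""
    counts = Counter(acoustic_words(tokens, max_n))
    return [counts[w] for w in vocabulary]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroids(labelled_vectors):
    """Sum the training vectors of each class; cosine only depends on direction."""
    sums = {}
    for label, vec in labelled_vectors:
        acc = sums.setdefault(label, [0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return sums

def classify(vec, class_centroids):
    """Return the label whose centroid is most similar to vec (stand-in classifier)."""
    return max(class_centroids, key=lambda label: cosine(vec, class_centroids[label]))

# Toy data: invented token sequences standing in for tokenizer output.
train = [
    ("lang_A", "a b a b c".split()),
    ("lang_A", "b a b a".split()),
    ("lang_B", "c d c d a".split()),
    ("lang_B", "d c d c".split()),
]
vocabulary = sorted({w for _, toks in train for w in acoustic_words(toks)})
model = centroids([(lab, bag_of_sounds(toks, vocabulary)) for lab, toks in train])
prediction = classify(bag_of_sounds("a b a".split(), vocabulary), model)
print(prediction)
```

In a real system the vocabulary would be pruned by a feature selection method, as the abstract notes, and the count vectors would be passed to a trained classifier instead of a centroid comparison.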

Index Terms

1. Applied computing
  1. Arts and humanities
    1. Sound and music computing
2. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        1. Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees

Information & Contributors

Information

Published In

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

August 2005

708 pages

ISBN:1595930345

DOI:10.1145/1076034

General Chairs:
Ricardo Baeza-Yates (University of Chile, Chile), Nivio Ziviani (Federal University of Minas Gerais, Brazil)

Program Chairs:
Gary Marchionini (University of North Carolina, USA), Alistair Moffat (University of Melbourne, Australia), John Tait (University of Sunderland, UK)

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SIGIR05

Sponsor:

SIGIR

SIGIR05: The 28th ACM/SIGIR International Symposium on Information Retrieval 2005

August 15 - 19, 2005

Salvador, Brazil

Acceptance Rates

Overall Acceptance Rate 546 of 2,834 submissions, 19%
