A phonotactic-semantic paradigm for automatic spoken document classification
Source Annual ACM Conference on Research and Development in Information Retrieval
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Salvador, Brazil
SESSION: Multimedia
Pages: 369 - 376  
Year of Publication: 2005
ISBN:1-59593-034-5
Authors
Bin Ma  Institute for Infocomm Research, Keng Terrace, Singapore
Haizhou Li  Institute for Infocomm Research, Keng Terrace, Singapore
Sponsor
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1076034.1076098

ABSTRACT

We demonstrate a phonotactic-semantic paradigm for spoken document categorization. In this framework, we define a set of acoustic words instead of lexical words to represent acoustic activities in spoken languages. The strategy for acoustic vocabulary selection is studied by comparing different feature selection methods. With an appropriate acoustic vocabulary, a voice tokenizer converts a spoken document into a text-like document of acoustic words. Thus, a spoken document can be represented by a count vector, named a bag-of-sounds vector, which characterizes a spoken document's semantic domain. We study two phonotactic-semantic classifiers, the support vector machine classifier and the latent semantic analysis classifier, and their properties. The phonotactic-semantic framework constitutes a new paradigm in spoken document classification, as demonstrated by its success in the spoken language identification task. It achieves 18.2% error reduction over state-of-the-art benchmark performance on the 1996 NIST Language Recognition Evaluation database.
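
The bag-of-sounds representation described in the abstract can be illustrated with a minimal sketch. It assumes the voice tokenizer has already produced a sequence of acoustic-word labels; the labels "a1", "a2", "a3", the function name bag_of_sounds, and the choice to count both unigrams and bigrams are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def bag_of_sounds(tokens, vocabulary, max_n=2):
    """Count n-grams (n = 1..max_n) of acoustic words in a token sequence.

    Returns the ordered list of n-gram terms and the matching count vector.
    Counting n-grams up to max_n is an assumption made for illustration.
    """
    # Every possible n-gram over the vocabulary defines one vector dimension.
    terms = [
        term
        for n in range(1, max_n + 1)
        for term in product(vocabulary, repeat=n)
    ]
    # Slide a window of each length over the tokenized document.
    counts = Counter(
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    )
    # n-grams containing out-of-vocabulary tokens are simply never looked up.
    return terms, [counts[term] for term in terms]


# Hypothetical tokenizer output for one spoken document.
vocabulary = ["a1", "a2", "a3"]
tokens = ["a1", "a2", "a1", "a2", "a3"]

terms, vector = bag_of_sounds(tokens, vocabulary)
for term, count in zip(terms, vector):
    if count:
        print(" ".join(term), count)
```

A vector of this kind is the sort of input that a support vector machine or a latent semantic analysis classifier, the two classifiers named in the abstract, would take; the classifier itself is outside this sketch.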

