ABSTRACT
The words spoken in an audio stream form an obvious descriptor essential to most audio-visual metadata standards. When derived using automatic speech recognition (ASR) systems, however, the spoken content fits into neither the low-level (representative) nor the high-level (semantic) metadata category. This makes it difficult to design a representation that supports interoperability between different extraction and application utilities while remaining robust to the limitations of the extraction process. In this paper, we discuss the issues encountered in the design of the MPEG-7 spoken content descriptor and their applicability to other metadata standards.
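One way such a representation copes with recognizer error is to retain competing ASR hypotheses in a lattice rather than committing to a single transcript. The sketch below is an illustrative data structure only, not the normative MPEG-7 SpokenContent schema; all class and method names are hypothetical, and it assumes each edge carries a label (word or phone), a type, and a recognizer confidence.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Hypothesis:
    """One recognizer hypothesis on a lattice edge (hypothetical structure)."""
    label: str          # recognized word or phone symbol
    kind: str           # "word" or "phone"
    confidence: float   # recognizer posterior in [0, 1]

@dataclass
class Edge:
    start: int          # source lattice node
    end: int            # destination lattice node
    hyp: Hypothesis

@dataclass
class Lattice:
    """Keeps competing ASR hypotheses so later retrieval can recover from errors."""
    edges: list = field(default_factory=list)

    def add(self, start, end, label, kind, confidence):
        self.edges.append(Edge(start, end, Hypothesis(label, kind, confidence)))

    def best_path_labels(self):
        # Greedy decode: from each node, follow the highest-confidence edge.
        labels, node = [], 0
        while True:
            out = [e for e in self.edges if e.start == node]
            if not out:
                return labels
            best = max(out, key=lambda e: e.hyp.confidence)
            labels.append(best.hyp.label)
            node = best.end

# Competing hypotheses for two time spans; only the best path is decoded,
# but the alternatives remain available to a retrieval engine.
lat = Lattice()
lat.add(0, 1, "recognise", "word", 0.62)
lat.add(0, 1, "wreck a nice", "word", 0.31)
lat.add(1, 2, "speech", "word", 0.85)
lat.add(1, 2, "beach", "word", 0.12)
```

Because lower-confidence alternatives are kept, a query term misrecognized in the one-best transcript can still be matched against the surviving hypotheses, which is the robustness property discussed above.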
Index Terms
- Spoken content metadata and MPEG-7