ABSTRACT
The words spoken in an audio stream form an obvious descriptor essential to most audio-visual metadata standards. When derived using automatic speech recognition (ASR) systems, however, the spoken content fits into neither the low-level (representative) nor the high-level (semantic) metadata category. This makes it difficult to design a representation that supports interoperability between different extraction and application utilities while remaining robust to the limitations of the extraction process. In this paper, we discuss the issues encountered in the design of the MPEG-7 spoken content descriptor and their applicability to other metadata standards.
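One way such a representation copes with recognizer error is to retain competing ASR hypotheses in a lattice rather than committing to a single transcript. The sketch below is an illustrative data structure only, not the normative MPEG-7 SpokenContent schema; all class and method names are hypothetical, and it assumes each edge carries a label (word or phone), a type, and a recognizer confidence.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Hypothesis:
    """One recognizer hypothesis on a lattice edge (hypothetical structure)."""
    label: str          # recognized word or phone symbol
    kind: str           # "word" or "phone"
    confidence: float   # recognizer posterior in [0, 1]

@dataclass
class Edge:
    start: int          # source lattice node
    end: int            # destination lattice node
    hyp: Hypothesis

@dataclass
class Lattice:
    """Keeps competing ASR hypotheses so later retrieval can recover from errors."""
    edges: list = field(default_factory=list)

    def add(self, start, end, label, kind, confidence):
        self.edges.append(Edge(start, end, Hypothesis(label, kind, confidence)))

    def best_path_labels(self):
        # Greedy decode: from each node, follow the highest-confidence edge.
        labels, node = [], 0
        while True:
            out = [e for e in self.edges if e.start == node]
            if not out:
                return labels
            best = max(out, key=lambda e: e.hyp.confidence)
            labels.append(best.hyp.label)
            node = best.end

# Competing hypotheses for two time spans; only the best path is decoded,
# but the alternatives remain available to a retrieval engine.
lat = Lattice()
lat.add(0, 1, "recognise", "word", 0.62)
lat.add(0, 1, "wreck a nice", "word", 0.31)
lat.add(1, 2, "speech", "word", 0.85)
lat.add(1, 2, "beach", "word", 0.12)
```

Because lower-confidence alternatives are kept, a query term misrecognized in the one-best transcript can still be matched against the surviving hypotheses, which is the robustness property discussed above.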
Index Terms
- Spoken content metadata and MPEG-7