Article

Assembling personal speech collections by monologue scene detection from a news video archive

Authors:

Tomokazu Takahashi,

Hiroshi Murase

MIR '06: Proceedings of the 8th ACM international workshop on Multimedia information retrieval

Pages 223-230

https://doi.org/10.1145/1178677.1178708

Published: 26 October 2006

Abstract

Monologue scenes in news shows are important because they contain non-verbal information that cannot be expressed through text media. In this paper, we propose a method that detects monologue scenes by individuals appearing in news shows (news subjects) without external or prior knowledge about the show. The method first detects monologue scene candidates by face detection in the frame images, and then excludes scenes that overlap with speech by anchorpersons or reporters (news persons), who are modeled dynamically according to clues obtained from the closed-caption text and from the audio stream. As an application of monologue scene detection, we also propose a method that assembles a personal speech collection for each individual who appears in the news. Although the methods still need further improvement for practical use, we confirmed the effectiveness of employing multimodal information for these tasks, and also observed interesting outputs in the automatically assembled speech collections.
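The abstract describes a three-stage pipeline: face-based candidate detection, exclusion of candidates that overlap anchorperson or reporter speech, and grouping of the remaining scenes per individual. The Python sketch below illustrates that flow under simplifying assumptions and is not the authors' implementation. It assumes per-frame face counts are already available from some face detector, and that news-person speech intervals have already been estimated; the closed-caption and audio modeling described in the abstract is not reproduced. The names `Shot`, `detect_monologues`, `assemble_collections`, the `speaker_name` label, and the thresholds 0.8 and 0.5 are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (start_sec, end_sec); hypothetical time representation for this sketch
Interval = Tuple[float, float]


@dataclass
class Shot:
    start: float
    end: float
    # Number of faces found in each sampled frame (assumed to come from any face detector)
    face_counts: List[int]
    # Hypothetical identity label for the on-screen speaker, if one is known
    speaker_name: Optional[str] = None


def is_monologue_candidate(shot: Shot, min_single_face_ratio: float = 0.8) -> bool:
    """Candidate if at least `min_single_face_ratio` of sampled frames show exactly one face."""
    if not shot.face_counts:
        return False
    single = sum(1 for n in shot.face_counts if n == 1)
    return single / len(shot.face_counts) >= min_single_face_ratio


def overlap(a: Interval, b: Interval) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def news_person_ratio(shot: Shot, news_person_speech: List[Interval]) -> float:
    """Fraction of the shot covered by speech attributed to anchorpersons or reporters.

    Assumes the intervals in `news_person_speech` do not overlap one another;
    the result is capped at 1.0.
    """
    duration = shot.end - shot.start
    if duration <= 0:
        return 0.0
    covered = sum(overlap((shot.start, shot.end), iv) for iv in news_person_speech)
    return min(1.0, covered / duration)


def detect_monologues(shots: List[Shot], news_person_speech: List[Interval],
                      max_news_person_ratio: float = 0.5) -> List[Shot]:
    """Keep face-based candidates that are not dominated by news-person speech."""
    return [
        s for s in shots
        if is_monologue_candidate(s)
        and news_person_ratio(s, news_person_speech) <= max_news_person_ratio
    ]


def assemble_collections(monologues: List[Shot]) -> Dict[str, List[Interval]]:
    """Group detected monologue scenes by speaker, in chronological order."""
    grouped: Dict[str, List[Interval]] = defaultdict(list)
    for s in monologues:
        if s.speaker_name is not None:
            grouped[s.speaker_name].append((s.start, s.end))
    return {name: sorted(clips) for name, clips in grouped.items()}


# Toy input: made-up shots and news-person speech intervals, for illustration only
shots = [
    Shot(0.0, 10.0, [1, 1, 1, 1, 1]),                # single face, but an anchor is speaking
    Shot(10.0, 25.0, [1, 1, 1, 1, 0], "Person A"),   # 4 of 5 frames with one face
    Shot(25.0, 30.0, [2, 2, 3, 2, 2], "Person B"),   # group shot, not a candidate
    Shot(30.0, 42.0, [1, 1, 1, 1, 1], "Person A"),   # brief overlap with a reporter
]
news_person_speech = [(0.0, 10.0), (28.0, 31.0)]

monologues = detect_monologues(shots, news_person_speech)
speech_collections = assemble_collections(monologues)
print(speech_collections)
```

On this toy input the first shot is rejected because it is fully covered by anchor speech, the group shot never becomes a candidate, and the script prints `{'Person A': [(10.0, 25.0), (30.0, 42.0)]}`. Keeping the overlap test as a separate ratio makes it easy to swap in whatever news-person model is available.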


Cited By

I. Ide (2014). Report on the analyses and the applications of a large-scale news video archive: NII TV-RECS. Progress in Informatics (9). Online publication date: March 2014. https://doi.org/10.2201/NiiPi.2014.11.3

Z. Ji and Y. Su (2007). News monologue shot detection using conditional random fields. In 2007 International Conference on Machine Learning and Cybernetics, pages 2657-2661. Online publication date: August 2007. https://doi.org/10.1109/ICMLC.2007.4370598

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Video summarization
2. Information systems
  1. Information retrieval
    1. Document representation



Published In


MIR '06: Proceedings of the 8th ACM international workshop on Multimedia information retrieval

October 2006

344 pages

ISBN:1595934952

DOI:10.1145/1178677

General Chairs:
James Z. Wang, The Pennsylvania State University
Nozha Boujemaa, INRIA Rocquencourt, France
Yixin Chen, The University of Mississippi

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 October 2006


Qualifiers

Article

Conference

MM06

Sponsor:

MM06: The 14th ACM International Conference on Multimedia 2006

October 26-27, 2006

Santa Barbara, California, USA


Bibliometrics & Citations

Article Metrics

Total Citations: 2
Total Downloads: 237
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 15 Feb 2025
