DOI: 10.1145/1178677.1178708
Article

Assembling personal speech collections by monologue scene detection from a news video archive

Published: 26 October 2006

Abstract

Monologue scenes in news shows are important because they carry non-verbal information that cannot be expressed through text media. In this paper, we propose a method that detects monologue scenes by individuals appearing in news shows (news subjects) without external or prior knowledge of the show. The method first detects monologue scene candidates by face detection in the frame images, and then excludes scenes that overlap with speech by anchor-persons or reporters (news persons), who are modeled dynamically using clues obtained from the closed-caption text and the audio stream. As an application of monologue scene detection, we also propose a method that assembles a personal speech collection for each individual who appears in the news. Although the methods need further improvement for realistic use, we confirmed the effectiveness of employing multimodal information for these tasks, and also observed interesting outputs from the automatically assembled speech collections.
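
The two-stage idea in the abstract (face-based candidates, then exclusion of scenes overlapping news-person speech) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Interval` type, the `detect_monologue_scenes` function, the example timings, and the rule that any temporal overlap disqualifies a candidate are all assumptions made for illustration. The face detection and speaker modeling steps are treated as upstream components that simply supply time intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A time span in seconds, treated as half-open: [start, end)."""
    start: float
    end: float

    def overlaps(self, other: "Interval") -> bool:
        # Two half-open intervals overlap when each starts before the other ends.
        return self.start < other.end and other.start < self.end


def detect_monologue_scenes(face_scenes, news_person_speech):
    """Keep candidate scenes that do not overlap any news-person speech.

    face_scenes: candidate intervals from a face detector (assumed input).
    news_person_speech: intervals attributed to anchors or reporters
    (assumed input, e.g. from caption and audio clues).
    """
    return [
        scene
        for scene in face_scenes
        if not any(scene.overlaps(speech) for speech in news_person_speech)
    ]


# Hypothetical example data; real intervals would come from the detectors.
candidates = [Interval(0.0, 8.0), Interval(12.0, 20.0), Interval(30.0, 41.0)]
anchor_speech = [Interval(0.0, 10.0), Interval(40.0, 45.0)]
monologues = detect_monologue_scenes(candidates, anchor_speech)
```

With these example inputs, only the candidate from 12 to 20 seconds survives, since the other two overlap anchor speech. A stricter or looser rule (for instance, a minimum overlap ratio) could replace the any-overlap test; the abstract does not specify which criterion is used.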

Cited By

  • (2014) Report on the analyses and the applications of a large-scale news video archive: NII TV-RECS. Progress in Informatics, 10.2201/NiiPi.2014.11.3 (9). Online publication date: Mar-2014
  • (2007) News Monologue Shot Detection using Conditional Random Fields. 2007 International Conference on Machine Learning and Cybernetics, 10.1109/ICMLC.2007.4370598 (2657-2661). Online publication date: Aug-2007

Published In

MIR '06: Proceedings of the 8th ACM international workshop on Multimedia information retrieval
October 2006
344 pages
ISBN:1595934952
DOI:10.1145/1178677
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. closed-caption text
  2. dynamic speech modeling
  3. face detection
  4. personal name annotation

Qualifiers

  • Article

Conference

MM06: The 14th ACM International Conference on Multimedia 2006
October 26 - 27, 2006
Santa Barbara, California, USA
