Learning rich semantics from news video archives by style analysis

Published: 01 May 2006

Abstract

We propose a generic and robust framework for news video indexing, founded on a broadcast news production model. We identify within this model four production phases, each providing useful metadata for annotation. In contrast to semi-automatic indexing approaches, which exploit this information at production time, we adhere to an automatic, data-driven approach. To that end, we analyze a digital news video using a separate set of multimodal detectors for each production phase. By combining the resulting production-derived features into a statistical classifier ensemble, the framework facilitates robust classification of several rich semantic concepts in news video; rich meaning that the concepts share many similarities in their production process. Experiments on an archive of 120 hours of news video from the 2003 TRECVID benchmark show that a combined analysis of production phases yields the best results. In addition, we demonstrate that the accuracy of the proposed style analysis framework for the classification of several rich semantic concepts is state-of-the-art.
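To make the ensemble idea in the abstract concrete, the following is a minimal sketch (not the authors' code): one probabilistic classifier is trained per production phase on that phase's style features, and the per-phase outputs are fused by a stacked meta-classifier. The phase names, feature dimensions, and data are hypothetical placeholders, and scikit-learn is assumed purely for brevity.

    # Minimal sketch of a per-phase classifier ensemble with stacked fusion.
    # All names, dimensions, and data below are hypothetical placeholders,
    # not the detectors or features used in the paper.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_shots = 200
    # Hypothetical style features per video shot, one block per production phase.
    phases = {
        "layout":  rng.normal(size=(n_shots, 8)),   # e.g., shot length, silence
        "content": rng.normal(size=(n_shots, 16)),  # e.g., faces, overlaid text
        "capture": rng.normal(size=(n_shots, 6)),   # e.g., camera distance, motion
        "context": rng.normal(size=(n_shots, 4)),   # e.g., named entities in ASR
    }
    y = rng.integers(0, 2, size=n_shots)  # 1 = shot contains the target concept

    # Stage 1: one probabilistic SVM per production phase
    # (probability=True applies Platt scaling internally).
    phase_probs = []
    for name, X in phases.items():
        clf = SVC(probability=True, random_state=0).fit(X, y)
        phase_probs.append(clf.predict_proba(X)[:, 1])

    # Stage 2: a stacked meta-classifier fuses the per-phase probabilities
    # (cf. stacked generalization, Wolpert 1992).
    meta_X = np.column_stack(phase_probs)
    meta = LogisticRegression().fit(meta_X, y)
    print("combined-analysis accuracy on the toy data:", meta.score(meta_X, y))

In practice the meta-level would be trained on held-out (e.g., cross-validated) phase predictions rather than on the same samples used to fit the phase classifiers.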

Supplementary Material

Snoek Appendix (p91-snoek-apndx.pdf)
Online appendix to "Learning rich semantics from news video archives by style analysis." The appendix supports the information in the article, which begins on page 91.



Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 2, Issue 2
May 2006
82 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/1142020

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2006
Published in TOMM Volume 2, Issue 2


Author Tags

  1. Benchmark evaluation
  2. multimedia understanding
  3. multimodal detectors
  4. news video indexing
  5. semantic classification
  6. statistical pattern recognition
  7. style analysis

Qualifiers

  • Article

Cited By

  • (2023) Smart Multimedia Information Retrieval. Analytics 2(1), 198-224. DOI: 10.3390/analytics2010011. Online publication date: 20-Feb-2023.
  • (2018) Video Content Analysis. Encyclopedia of Database Systems, 4381-4388. DOI: 10.1007/978-1-4614-8265-9_1018. Online publication date: 7-Dec-2018.
  • (2017) Video Content Analysis. Encyclopedia of Database Systems, 1-8. DOI: 10.1007/978-1-4899-7993-3_1018-2. Online publication date: 19-Jan-2017.
  • (2015) A generic framework for semantic video indexing based on visual concepts/contexts detection. Multimedia Tools and Applications 74(4), 1397-1421. DOI: 10.1007/s11042-014-1955-9. Online publication date: 1-Feb-2015.
  • (2015) Extended conceptual feedback for semantic multimedia indexing. Multimedia Tools and Applications 74(4), 1225-1248. DOI: 10.1007/s11042-014-1937-y. Online publication date: 1-Feb-2015.
  • (2014) A proposal for a taxonomy of semantic editing devices to support semantic classification. Proceedings of the 2014 Conference on Research in Adaptive and Convergent Systems, 34-39. DOI: 10.1145/2663761.2664225. Online publication date: 5-Oct-2014.
  • (2014) Broadcasting Oneself: Visual Discovery of Vlogging Styles. IEEE Transactions on Multimedia 16(1), 201-215. DOI: 10.1109/TMM.2013.2284893. Online publication date: Jan-2014.
  • (2014) Bag-of-Words Image Representation: Key Ideas and Further Insight. Fusion in Computer Vision, 29-52. DOI: 10.1007/978-3-319-05696-8_2. Online publication date: 26-Mar-2014.
  • (2013) [Invited Paper] Explain This to Me! ITE Transactions on Media Technology and Applications 1(2), 101-117. DOI: 10.3169/mta.1.101. Online publication date: 2013.
  • (2012) Active learning with multiple classifiers for multimedia indexing. Multimedia Tools and Applications 60(2), 403-417. DOI: 10.1007/s11042-010-0599-7. Online publication date: 1-Sep-2012.