Learning rich semantics from news video archives by style analysis

Published: 01 May 2006

Abstract

We propose a generic and robust framework for news video indexing, founded on a broadcast news production model. We identify within this model four production phases, each providing useful metadata for annotation. In contrast to semi-automatic indexing approaches, which exploit this information at production time, we adhere to an automatic, data-driven approach. To that end, we analyze a digital news video using a separate set of multimodal detectors for each production phase. By combining the resulting production-derived features into a statistical classifier ensemble, the framework facilitates robust classification of several rich semantic concepts in news video; rich meaning that the concepts share many similarities in their production process. Experiments on an archive of 120 hours of news video from the 2003 TRECVID benchmark show that a combined analysis of production phases yields the best results. In addition, we demonstrate that the accuracy of the proposed style analysis framework for the classification of several rich semantic concepts is state-of-the-art.
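To make the ensemble idea in the abstract concrete, the following is a minimal sketch (not the authors' code): one probabilistic classifier is trained per production phase on that phase's style features, and the per-phase outputs are fused by a stacked meta-classifier. The phase names, feature dimensions, and data are hypothetical placeholders, and scikit-learn is assumed purely for brevity.

    # Minimal sketch of a per-phase classifier ensemble with stacked fusion.
    # All names, dimensions, and data below are hypothetical placeholders,
    # not the detectors or features used in the paper.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_shots = 200
    # Hypothetical style features per video shot, one block per production phase.
    phases = {
        "layout":  rng.normal(size=(n_shots, 8)),   # e.g., shot length, silence
        "content": rng.normal(size=(n_shots, 16)),  # e.g., faces, overlaid text
        "capture": rng.normal(size=(n_shots, 6)),   # e.g., camera distance, motion
        "context": rng.normal(size=(n_shots, 4)),   # e.g., named entities in ASR
    }
    y = rng.integers(0, 2, size=n_shots)  # 1 = shot contains the target concept

    # Stage 1: one probabilistic SVM per production phase
    # (probability=True applies Platt scaling internally).
    phase_probs = []
    for name, X in phases.items():
        clf = SVC(probability=True, random_state=0).fit(X, y)
        phase_probs.append(clf.predict_proba(X)[:, 1])

    # Stage 2: a stacked meta-classifier fuses the per-phase probabilities
    # (cf. stacked generalization, Wolpert 1992).
    meta_X = np.column_stack(phase_probs)
    meta = LogisticRegression().fit(meta_X, y)
    print("combined-analysis accuracy on the toy data:", meta.score(meta_X, y))

In practice the meta-level would be trained on held-out (e.g., cross-validated) phase predictions rather than on the same samples used to fit the phase classifiers.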

Supplementary Material

Snoek Appendix (p91-snoek-apndx.pdf)
Online appendix to "Learning rich semantics from news video archives by style analysis." The appendix supports the information in the article, which begins on page 91.



Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 2, Issue 2
May 2006
82 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/1142020

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 May 2006
Published in TOMM Volume 2, Issue 2


Author Tags

  1. Benchmark evaluation
  2. multimedia understanding
  3. multimodal detectors
  4. news video indexing
  5. semantic classification
  6. statistical pattern recognition
  7. style analysis

Qualifiers

  • Article

Cited By

  • (2023) Smart Multimedia Information Retrieval. Analytics 2(1), 198-224. DOI: 10.3390/analytics2010011. Online publication date: 20-Feb-2023.
  • (2018) Video Content Analysis. Encyclopedia of Database Systems, 4381-4388. DOI: 10.1007/978-1-4614-8265-9_1018. Online publication date: 7-Dec-2018.
  • (2017) Video Content Analysis. Encyclopedia of Database Systems, 1-8. DOI: 10.1007/978-1-4899-7993-3_1018-2. Online publication date: 19-Jan-2017.
  • (2015) A generic framework for semantic video indexing based on visual concepts/contexts detection. Multimedia Tools and Applications 74(4), 1397-1421. DOI: 10.1007/s11042-014-1955-9. Online publication date: 1-Feb-2015.
  • (2015) Extended conceptual feedback for semantic multimedia indexing. Multimedia Tools and Applications 74(4), 1225-1248. DOI: 10.1007/s11042-014-1937-y. Online publication date: 1-Feb-2015.
  • (2014) A proposal for a taxonomy of semantic editing devices to support semantic classification. Proceedings of the 2014 Conference on Research in Adaptive and Convergent Systems, 34-39. DOI: 10.1145/2663761.2664225. Online publication date: 5-Oct-2014.
  • (2014) Broadcasting Oneself: Visual Discovery of Vlogging Styles. IEEE Transactions on Multimedia 16(1), 201-215. DOI: 10.1109/TMM.2013.2284893. Online publication date: Jan-2014.
  • (2014) Bag-of-Words Image Representation: Key Ideas and Further Insight. Fusion in Computer Vision, 29-52. DOI: 10.1007/978-3-319-05696-8_2. Online publication date: 26-Mar-2014.
  • (2013) [Invited Paper] Explain This to Me! ITE Transactions on Media Technology and Applications 1(2), 101-117. DOI: 10.3169/mta.1.101. Online publication date: 2013.
  • (2012) Active learning with multiple classifiers for multimedia indexing. Multimedia Tools and Applications 60(2), 403-417. DOI: 10.1007/s11042-010-0599-7. Online publication date: 1-Sep-2012.