DOI: 10.1145/1772690.1772783

What are the most eye-catching and ear-catching features in the video?: implications for video summarization

Published: 26 April 2010


Video summarization is a mechanism for generating short summaries of a video to help people quickly make sense of its content before downloading it or seeking more detailed information. To produce reliable automatic video summarization algorithms, it is essential first to understand how people create video summaries manually. This paper focuses on a corpus of instructional documentary video and seeks to improve automatic video summaries by identifying which features in the video catch the eyes and ears of human assessors, then using these findings to inform automatic summarization algorithms. The paper contributes a thorough and valuable methodology for performing automatic video summarization, which can be extended to inform summarization of other video corpora.




Information & Contributors


Published In

WWW '10: Proceedings of the 19th international conference on World wide web
April 2010
1407 pages


Association for Computing Machinery

New York, NY, United States


Author Tags

  1. audio salience
  2. video summarization
  3. visual salience


  • Research-article


WWW '10
WWW '10: The 19th International World Wide Web Conference
April 26 - 30, 2010
Raleigh, North Carolina, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%



Bibliometrics & Citations


Article Metrics

  • Downloads (Last 12 months): 9
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 17 Feb 2025



Cited By

  • (2019) Interacting with Heterogeneous Information Ecologies. Proceedings of the 2019 Conference on Human Information Interaction and Retrieval, pages 445-448. DOI: 10.1145/3295750.3298967. Online publication date: 8-Mar-2019
  • (2013) Beyond audio and video retrieval: topic-oriented multimedia summarization. International Journal of Multimedia Information Retrieval, 2:2, pages 131-144. DOI: 10.1007/s13735-012-0028-y. Online publication date: 4-Jan-2013
  • (2012) Advanced Mobile Lecture Viewing. International Journal of Handheld Computing Research, 3:2, pages 58-72. DOI: 10.4018/jhcr.2012040104. Online publication date: 1-Apr-2012
  • (2012) "You've got video". Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 565-568. DOI: 10.1145/2207676.2207755. Online publication date: 5-May-2012
