DOI: 10.1145/319463.319691

Auto-summarization of audio-video presentations

Published: 30 October 1999

ABSTRACT

As streaming audio-video technology becomes widespread, the amount of multimedia content available on the net is increasing dramatically. Users face a new challenge: how to examine large amounts of multimedia content quickly. One technique that can enable a quick overview of multimedia is the video summary, that is, a shorter version assembled by picking important segments from the original.

We evaluate three techniques for automatic creation of summaries for online audio-video presentations. These techniques exploit information in the audio signal (e.g., pitch and pause information), knowledge of slide transition points in the presentation, and information about access patterns of previous users. We report a user study that compares automatically generated summaries that are 20%-25% of the length of the full presentations to author-generated summaries. Users learn from the computer-generated summaries, although less than from authors' summaries. They initially find computer-generated summaries less coherent, but quickly grow accustomed to them.
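
The abstract does not specify how segments are chosen, so the following is only a minimal sketch of the general idea of assembling a summary by picking important segments within a length budget (here 25% of the original). The `Segment` type, the `score` field, and the greedy selection rule are all assumptions made for illustration; they are not the authors' method, and in practice the score might be derived from cues such as pitch, pauses, slide transitions, or prior access patterns.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the presentation
    end: float    # seconds
    score: float  # hypothetical importance score (assumption, not from the paper)

    @property
    def duration(self) -> float:
        return self.end - self.start


def build_summary(segments, total_length, target_fraction=0.25):
    """Greedily pick the highest-scoring segments that fit in the time budget.

    Returns the chosen segments in their original temporal order.
    """
    budget = total_length * target_fraction
    chosen = []
    used = 0.0
    for seg in sorted(segments, key=lambda s: s.score, reverse=True):
        if used + seg.duration <= budget:
            chosen.append(seg)
            used += seg.duration
    # Play the summary in presentation order, not score order.
    return sorted(chosen, key=lambda s: s.start)


# Illustrative data only: a 400-second talk split into four segments.
segments = [
    Segment(0, 40, 0.9),
    Segment(40, 160, 0.2),
    Segment(160, 200, 0.7),
    Segment(200, 400, 0.4),
]
summary = build_summary(segments, total_length=400.0)
```

With a 25% budget on a 400-second talk (100 seconds), this sketch keeps the two short, high-scoring segments and skips the longer ones that would exceed the budget.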

Published in

MULTIMEDIA '99: Proceedings of the seventh ACM international conference on Multimedia (Part 1)
October 1999
516 pages
ISBN: 1581131518
DOI: 10.1145/319463

Copyright © 1999 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall acceptance rate: 995 of 4,171 submissions, 24%
