ABSTRACT
As streaming audio-video technology becomes widespread, the amount of multimedia content available on the net is increasing dramatically. Users face a new challenge: how to examine large amounts of multimedia content quickly. One technique that can enable a quick overview of multimedia is the video summary; that is, a shorter version assembled by picking important segments from the original.
We evaluate three techniques for automatic creation of summaries for online audio-video presentations. These techniques exploit information in the audio signal (e.g., pitch and pause information), knowledge of slide transition points in the presentation, and information about access patterns of previous users. We report a user study that compares automatically generated summaries that are 20%-25% of the length of full presentations to author-generated summaries. Users learn from the computer-generated summaries, although less than from authors' summaries. They initially find computer-generated summaries less coherent, but quickly grow accustomed to them.
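To make the idea of "picking important segments" under a length budget concrete, here is a minimal sketch. It is not the method evaluated in the paper: the `Segment` type, the `score` field, and the greedy selection rule are all assumptions introduced for illustration. The score stands in for whatever evidence a real system might combine (pitch, pauses, slide transitions, or prior viewers' access patterns); only the 20%-25% target length comes from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A candidate piece of the talk (hypothetical structure)."""
    start: float  # seconds from the beginning of the presentation
    end: float    # seconds from the beginning of the presentation
    score: float  # assumed importance score; higher means more important

    @property
    def duration(self) -> float:
        return self.end - self.start


def assemble_summary(segments, total_length, target_ratio=0.25):
    """Greedily pick the highest-scoring segments that fit the budget.

    The budget is target_ratio * total_length seconds. The chosen
    segments are returned in chronological order so the summary plays
    back in the same order as the original presentation.
    """
    budget = total_length * target_ratio
    chosen = []
    used = 0.0
    for seg in sorted(segments, key=lambda s: s.score, reverse=True):
        if used + seg.duration <= budget:
            chosen.append(seg)
            used += seg.duration
    return sorted(chosen, key=lambda s: s.start)


if __name__ == "__main__":
    # Made-up segments for a 400-second talk; the scores are illustrative.
    talk = [
        Segment(0, 60, 0.2),
        Segment(60, 120, 0.9),
        Segment(120, 200, 0.5),
        Segment(200, 240, 0.8),
        Segment(240, 400, 0.1),
    ]
    for seg in assemble_summary(talk, total_length=400, target_ratio=0.25):
        print(f"{seg.start:.0f}s to {seg.end:.0f}s (score {seg.score})")
```

A greedy pick by score is the simplest rule that respects a fixed budget. It ignores coherence between neighboring segments, which the abstract reports as a weakness users notice in computer-generated summaries.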
Index Terms
- Auto-summarization of audio-video presentations