Auto-summarization of audio-video presentations

Authors:
Liwei He, Microsoft Research, Redmond, WA
Elizabeth Sanocki, Microsoft Research, Redmond, WA
Anoop Gupta, Microsoft Research, Redmond, WA
Jonathan Grudin, Microsoft Research, Redmond, WA

MULTIMEDIA '99: Proceedings of the seventh ACM international conference on Multimedia (Part 1), October 1999, Pages 489–498, https://doi.org/10.1145/319463.319691

Published: 30 October 1999

ABSTRACT

As streaming audio-video technology becomes widespread, the amount of multimedia content available on the net is increasing dramatically. Users face a new challenge: how to examine large amounts of multimedia content quickly. One technique that enables a quick overview of multimedia is the video summary, that is, a shorter version assembled by picking important segments from the original.

We evaluate three techniques for automatically creating summaries of online audio-video presentations. These techniques exploit information in the audio signal (e.g., pitch and pause information), knowledge of slide transition points in the presentation, and information about access patterns of previous users. We report a user study that compares automatically generated summaries that are 20%-25% the length of full presentations to author-generated summaries. Users learn from the computer-generated summaries, although less than from authors' summaries. They initially find computer-generated summaries less coherent, but quickly grow accustomed to them.
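
To make the idea of "a shorter version assembled by picking important segments" concrete, the following is a minimal sketch, not the authors' method. It assumes the presentation has already been split into segments and that each segment carries a single importance score; how such a score would be derived from pitch, pauses, slide transitions, or access logs is not specified in the abstract and is left out. The names `Segment`, `select_summary`, and `target_fraction` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # segment start time, in seconds
    end: float    # segment end time, in seconds
    score: float  # hypothetical importance score (higher = more important)

    @property
    def duration(self) -> float:
        return self.end - self.start


def select_summary(segments, target_fraction=0.25):
    """Greedily pick the highest-scoring segments that fit a time budget.

    The budget is target_fraction of the total duration (the abstract
    reports summaries of 20%-25% of the full length). Chosen segments
    are returned in chronological order so the summary plays in sequence.
    """
    total = sum(s.duration for s in segments)
    budget = total * target_fraction
    chosen, used = [], 0.0
    for seg in sorted(segments, key=lambda s: s.score, reverse=True):
        if used + seg.duration <= budget:
            chosen.append(seg)
            used += seg.duration
    return sorted(chosen, key=lambda s: s.start)


if __name__ == "__main__":
    # Illustrative, made-up segments for a four-minute talk.
    talk = [
        Segment(0, 60, 0.2),
        Segment(60, 120, 0.9),
        Segment(120, 180, 0.5),
        Segment(180, 240, 0.1),
    ]
    for seg in select_summary(talk, target_fraction=0.25):
        print(f"{seg.start:.0f}s-{seg.end:.0f}s (score {seg.score})")
```

A greedy pass is used here only because it is the simplest way to respect a length budget; any real system would also need to decide segment boundaries and how to combine the different signals into one score.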

Index Terms

Auto-summarization of audio-video presentations
1. Information systems
  1. Information systems applications
    1. Multimedia information systems

Published in
MULTIMEDIA '99: Proceedings of the seventh ACM international conference on Multimedia (Part 1)
October 1999
516 pages
ISBN: 1581131518
DOI: 10.1145/319463
Chairmen: John Buford (GTE Laboratories), Scott Stevens (Carnegie Mellon Univ.), Dick Bulterman (CWI), Kevin Jeffay (Univ. of North Carolina, Chapel Hill), HongJiang Zhang (Microsoft Research, China)
Copyright © 1999 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
corporate training
digital library
streaming media
user evaluation
user log analysis
video on-demand
video summarization