A utility framework for the automatic generation of audio-visual skims
Source: International Multimedia Conference
Proceedings of the tenth ACM international conference on Multimedia
Juan-les-Pins, France
SESSION: Session 6: student best paper contest
Pages: 189 - 198  
Year of Publication: 2002
ISBN:1-58113-620-X
Authors
Hari Sundaram  Columbia University, New York, New York
Lexing Xie  Columbia University, New York, New York
Shih-Fu Chang  Columbia University, New York, New York
Sponsors
SIGGRAPH: ACM Special Interest Group on Computer Graphics and Interactive Techniques
SIGCOMM: ACM Special Interest Group on Data Communication
SIGMULTIMEDIA: ACM Special Interest Group on Multimedia
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 7,   Downloads (12 Months): 71,   Citation Count: 18
DOI: http://doi.acm.org/10.1145/641007.641042
ABSTRACT

In this paper, we present a novel algorithm for generating audio-visual skims from computable scenes. Skims are useful for browsing digital libraries and for on-demand summaries in set-top boxes. A computable scene is a chunk of data that exhibits consistencies with respect to chromaticity, lighting and sound. There are three key aspects to our approach: (a) visual complexity and grammar, (b) robust audio segmentation and (c) a utility model for skim generation. We define a measure of the visual complexity of a shot, and map complexity to the minimum time needed to comprehend the shot. We then analyze the underlying visual grammar, since it makes the shot sequence meaningful. We segment the audio data into four classes, and then detect significant phrases in the speech segments. The utility functions are defined in terms of the complexity and duration of each segment. The target skim is created using a general constrained utility maximization procedure that maximizes the information content and the coherence of the resulting skim. The objective function is subject to multimedia synchronization constraints, visual syntax constraints and penalty functions on audio and video segments. The user study results indicate that the optimal skims show statistically significant differences from other skims at compression rates up to 90%.
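
The constrained utility maximization described in the abstract can be illustrated with a small sketch. This is not the authors' formulation: the abstract does not give the utility function, the complexity measure, or the constraint set, so everything named here is an assumption made for illustration. The `Shot` class, the linear `min_time` map, the exponential `utility` shape, and the greedy `build_skim` allocator are all hypothetical, and the sketch models only a total-duration budget, leaving out audio segmentation, synchronization constraints, and visual syntax.

```python
import math
from dataclasses import dataclass


@dataclass
class Shot:
    name: str
    duration: float    # original shot length, in seconds
    complexity: float  # hypothetical visual-complexity score in [0, 1]


def min_time(shot: Shot, base: float = 1.0, scale: float = 2.0) -> float:
    # Assumed linear map from complexity to minimum comprehension time,
    # capped at the shot's original duration.
    return min(shot.duration, base + scale * shot.complexity)


def utility(shot: Shot, t: float) -> float:
    # Assumed utility: zero below the comprehension time, then increasing
    # and concave in the retained duration t, weighted by complexity.
    if t < min_time(shot):
        return 0.0
    return shot.complexity * (1.0 - math.exp(-t / shot.duration))


def build_skim(shots, budget, step=0.25):
    # Phase 1: admit shots at their minimum time, most complex first,
    # as long as they still fit in the duration budget.
    alloc = {s.name: 0.0 for s in shots}
    remaining = budget
    for s in sorted(shots, key=lambda s: s.complexity, reverse=True):
        need = min_time(s)
        if need <= remaining:
            alloc[s.name] = need
            remaining -= need
    # Phase 2: spend the leftover budget in small steps, each time on the
    # admitted shot with the largest marginal utility gain.
    by_name = {s.name: s for s in shots}
    while remaining >= step:
        best, best_gain = None, 0.0
        for name, t in alloc.items():
            s = by_name[name]
            if t == 0.0 or t + step > s.duration:
                continue  # shot was dropped, or is already at full length
            gain = utility(s, t + step) - utility(s, t)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:
            break
        alloc[best] += step
        remaining -= step
    return alloc


# Example: 24 seconds of source material skimmed to a 6-second budget.
shots = [Shot("a", 10.0, 0.9), Shot("b", 6.0, 0.2), Shot("c", 8.0, 0.5)]
skim = build_skim(shots, budget=6.0)
print(skim)
```

In this sketch a shot is either dropped or kept for at least its assumed comprehension time, which mirrors the abstract's idea that complexity sets a lower bound on how long a shot must be shown. A greedy allocator is used only because it is short; the paper describes a general constrained maximization procedure, which this does not reproduce.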


