ABSTRACT
In this paper, we present a novel algorithm for generating audio-visual skims from computable scenes. Skims are useful for browsing digital libraries and for on-demand summaries in set-top boxes. A computable scene is a chunk of data that exhibits consistencies with respect to chromaticity, lighting and sound. There are three key aspects to our approach: (a) visual complexity and grammar, (b) robust audio segmentation and (c) a utility model for skim generation. We define a measure of the visual complexity of a shot, and map complexity to the minimum time needed to comprehend the shot. We then analyze the underlying visual grammar, since it is what makes the shot sequence meaningful. We segment the audio data into four classes, and then detect significant phrases in the speech segments. The utility functions are defined in terms of the complexity and duration of each segment. The target skim is created using a general constrained utility maximization procedure that maximizes the information content and the coherence of the resulting skim. The maximization is subject to multimedia synchronization constraints and visual syntax constraints, and to penalty functions on audio and video segments. User study results indicate that the optimal skims differ statistically significantly from other skims at compression rates up to 90%.
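To make the idea of constrained utility maximization concrete, the following is a minimal sketch rather than the procedure used in the paper. It covers only the video side: each shot is either dropped or kept for at least a minimum comprehension time, and a time budget is spread greedily across the kept shots. The `Shot` class, the linear `min_time` mapping with its `slope` and `offset` values, the logarithmic `utility` function, and the greedy `allocate` routine are all assumptions made for illustration. Audio segmentation, visual syntax, synchronization constraints and penalty functions are omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Shot:
    complexity: float  # hypothetical visual complexity score in [0, 1]
    duration: float    # original shot length in seconds (assumed > 0)


def min_time(shot, slope=2.0, offset=1.0):
    """Hypothetical linear map from complexity to minimum comprehension time.

    The result is capped at the shot's original duration.
    """
    return min(shot.duration, slope * shot.complexity + offset)


def utility(shot, t):
    """Hypothetical concave utility: scales with complexity, saturates in time."""
    return shot.complexity * math.log1p(t)


def allocate(shots, budget, step=0.25):
    """Greedily assign a duration to each shot so the total stays within budget.

    A shot is either dropped (duration 0) or kept for at least min_time(shot)
    and at most its original duration.
    """
    durations = [0.0] * len(shots)
    remaining = budget

    # Pass 1: admit shots at their minimum time, best utility per second first.
    def density(i):
        t = min_time(shots[i])
        return utility(shots[i], t) / t

    for i in sorted(range(len(shots)), key=density, reverse=True):
        t = min_time(shots[i])
        if t <= remaining:
            durations[i] = t
            remaining -= t

    # Pass 2: hand out the leftover time in small steps to whichever kept shot
    # gains the most utility from the extra step.
    while remaining >= step:
        best, best_gain = None, 0.0
        for i, shot in enumerate(shots):
            t = durations[i]
            if t > 0 and t + step <= shot.duration:
                gain = utility(shot, t + step) - utility(shot, t)
                if gain > best_gain:
                    best, best_gain = i, gain
        if best is None:
            break
        durations[best] += step
        remaining -= step

    return durations


if __name__ == "__main__":
    shots = [Shot(0.9, 10.0), Shot(0.2, 6.0), Shot(0.6, 8.0)]
    print(allocate(shots, budget=6.0))
```

With a 6-second budget over 24 seconds of source material, the low-complexity shot is dropped because its minimum time no longer fits once the two more complex shots have been admitted. A greedy pass like this is not guaranteed to find the optimum; it is meant only to show how a lower bound per shot and a global budget interact.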