ABSTRACT
Given an image (or video clip, or audio song), how do we automatically assign keywords to it? The general problem is to find correlations across the media in a collection of multimedia objects, such as video clips with colors, and/or motion, and/or audio, and/or text scripts. We propose a novel, graph-based approach, "MMG", to discover such cross-modal correlations. Our "MMG" method requires no tuning, no clustering, and no user-determined constants; it can be applied to any multimedia collection, as long as we have a similarity function for each medium; and it scales linearly with the database size. We report auto-captioning experiments on the "standard" Corel image database of 680 MB, where MMG outperforms domain-specific, fine-tuned methods by up to 10 percentage points in captioning accuracy (50% relative improvement).
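The abstract does not spell out the mechanics of the graph-based approach. As an illustration only, the sketch below links images, their regions, and their caption words in one graph, adds a link between two regions that a similarity function judges close, and ranks caption words for an uncaptioned image with a random walk with restart. The choice of random walk with restart, the toy node names, the `restart` value of 0.15, and the function names are all assumptions of this sketch, not details taken from the abstract.

```python
from collections import defaultdict


def build_graph(edges):
    """Build undirected adjacency sets from (node, node) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def random_walk_with_restart(adj, start, restart=0.15, iters=100):
    """Approximate steady-state visit probabilities by power iteration.

    At each step the walker jumps back to `start` with probability
    `restart`; otherwise it moves to a uniformly random neighbor.
    """
    nodes = list(adj)
    score = {n: 0.0 for n in nodes}
    score[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            share = (1.0 - restart) * score[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        nxt[start] += restart
        score = nxt
    return score


def caption(adj, image, words, k=2, restart=0.15):
    """Return the k word nodes with the highest affinity to `image`."""
    score = random_walk_with_restart(adj, image, restart)
    return sorted(words, key=lambda w: score[w], reverse=True)[:k]


# Hypothetical toy collection: img1 and img2 are captioned, img3 is not.
edges = [
    ("img1", "sky"), ("img1", "sea"), ("img1", "r1"),
    ("img2", "sky"), ("img2", "grass"), ("img2", "r2"),
    ("img3", "r3"),
    ("r3", "r1"),  # assumed output of a region similarity function
]
words = ["sky", "sea", "grass"]
adj = build_graph(edges)
scores = random_walk_with_restart(adj, "img3")
ranked = caption(adj, "img3", words, k=3)
print(ranked)
```

In this toy graph the uncaptioned image reaches words only through its region's similarity link, so words attached to the most similar captioned image rank highest. Each medium needs only its own similarity function to contribute links, which is the property the abstract emphasizes.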