Automatic multimedia cross-modal correlation discovery
Source: Conference on Knowledge Discovery in Data
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Seattle, WA, USA
POSTER SESSION: Research track posters
Pages: 653 - 658  
Year of Publication: 2004
ISBN:1-58113-888-1
Authors
Jia-Yu Pan  Carnegie Mellon University, Pittsburgh, PA
Hyung-Jeong Yang  Carnegie Mellon University, Pittsburgh, PA
Christos Faloutsos  Carnegie Mellon University, Pittsburgh, PA
Pinar Duygulu  Carnegie Mellon University, Pittsburgh, PA
Sponsors
SIGMOD: ACM Special Interest Group on Management of Data
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1014052.1014135

ABSTRACT

Given an image (or video clip, or audio song), how do we automatically assign keywords to it? The general problem is to find correlations across the media in a collection of multimedia objects, such as video clips with colors, and/or motion, and/or audio, and/or text scripts. We propose a novel, graph-based approach, "MMG", to discover such cross-modal correlations. Our "MMG" method requires no tuning, no clustering, and no user-determined constants; it can be applied to any multimedia collection, as long as we have a similarity function for each medium; and it scales linearly with the database size. We report auto-captioning experiments on the "standard" Corel image database of 680 MB, where it outperforms domain-specific, fine-tuned methods by up to 10 percentage points in captioning accuracy (50% relative improvement).
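To make the graph-based idea concrete, here is a minimal Python sketch of one way such a captioning scheme could work. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the graph layout or the scoring rule, so the node types (images and caption terms), the image-to-image similarity links, the use of a random walk with restart, and all names (`build_graph`, `random_walk_with_restart`, `suggest_terms`, the `restart` value) are hypothetical choices made for the example.

```python
from collections import defaultdict


def build_graph(captions, similar_pairs):
    """Build an undirected graph with image nodes and term nodes.

    captions: dict mapping image id -> list of caption terms (assumed input).
    similar_pairs: pairs of image ids judged similar by some per-medium
    similarity function (assumed to be computed elsewhere).
    """
    adj = defaultdict(set)
    for image, terms in captions.items():
        for term in terms:
            adj[("img", image)].add(("term", term))
            adj[("term", term)].add(("img", image))
    for a, b in similar_pairs:
        adj[("img", a)].add(("img", b))
        adj[("img", b)].add(("img", a))
    return adj


def random_walk_with_restart(adj, start, restart=0.15, iterations=100):
    """Approximate visiting probabilities of a walk that restarts at `start`.

    The restart probability and iteration count are illustrative values.
    """
    scores = {node: 0.0 for node in adj}
    scores[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in adj}
        for node, mass in scores.items():
            # Spread the non-restarting mass evenly over the neighbours.
            share = (1.0 - restart) * mass / len(adj[node])
            for neighbour in adj[node]:
                nxt[neighbour] += share
        # The restarting mass jumps back to the query node.
        nxt[start] += restart
        scores = nxt
    return scores


def suggest_terms(adj, image, k=2):
    """Rank term nodes not already attached to `image` by walk score."""
    query = ("img", image)
    scores = random_walk_with_restart(adj, query)
    ranked = sorted(
        ((score, node[1]) for node, score in scores.items()
         if node[0] == "term" and node not in adj[query]),
        reverse=True,
    )
    return [term for _, term in ranked[:k]]


# Toy collection: "query" has no caption but resembles img1 and img2.
captions = {
    "img1": ["sea", "sun"],
    "img2": ["sea", "sand"],
    "img3": ["tiger", "grass"],
}
graph = build_graph(captions, [("query", "img1"), ("query", "img2")])
print(suggest_terms(graph, "query", k=1))
```

In this toy collection the term shared by both similar images receives the most walk mass, so it is ranked first, while terms attached only to the unrelated image receive none. Each iteration touches every edge once, which is in line with the abstract's emphasis on scaling with collection size, although the sketch makes no claim about the paper's actual cost analysis.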


