Article

Automatic multimedia cross-modal correlation discovery

Authors: Hyung-Jeong Yang, Christos Faloutsos, Pinar Duygulu

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 653 - 658

https://doi.org/10.1145/1014052.1014135

Published: 22 August 2004

Abstract

Given an image (or video clip, or audio song), how do we automatically assign keywords to it? The general problem is to find correlations across the media in a collection of multimedia objects such as video clips, which carry colors, motion, audio, and/or text scripts. We propose a novel, graph-based approach, "MMG", to discover such cross-modal correlations. Our "MMG" method requires no tuning, no clustering, and no user-determined constants; it can be applied to any multimedia collection, as long as we have a similarity function for each medium; and it scales linearly with the database size. We report auto-captioning experiments on the "standard" Corel image database of 680 MB, where it outperforms domain-specific, fine-tuned methods by up to 10 percentage points in captioning accuracy (50% relative improvement).
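
The abstract does not spell out how a graph turns into captions, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a toy graph whose nodes are images, image features, and caption words, and it swaps in a random walk with restart as the scoring technique; that choice, the node names, the edges, and the `restart_prob` value are all illustrative assumptions. The single feature-to-feature edge stands in for the per-medium similarity function mentioned above.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def random_walk_with_restart(graph, start, restart_prob=0.15,
                             tol=1e-12, max_iter=1000):
    """Approximate steady-state visit probabilities of a walker that,
    at each step, jumps back to `start` with probability `restart_prob`
    and otherwise moves to a uniformly chosen neighbor."""
    scores = {node: 0.0 for node in graph}
    scores[start] = 1.0
    for _ in range(max_iter):
        updated = {node: 0.0 for node in graph}
        for node, mass in scores.items():
            # Spread the non-restarting mass evenly over the neighbors.
            share = (1.0 - restart_prob) * mass / len(graph[node])
            for neighbor in graph[node]:
                updated[neighbor] += share
        updated[start] += restart_prob
        change = sum(abs(updated[n] - scores[n]) for n in graph)
        scores = updated
        if change < tol:
            break
    return scores


# Hypothetical toy collection: two captioned images and one uncaptioned query.
EDGES = [
    ("img_a", "feat_blue_1"), ("img_a", "word:sea"), ("img_a", "word:sky"),
    ("img_b", "feat_orange_1"), ("img_b", "word:tiger"), ("img_b", "word:sky"),
    ("img_q", "feat_blue_2"),
    # Assumed output of a similarity function: the two blue features are alike.
    ("feat_blue_1", "feat_blue_2"),
]

graph = build_graph(EDGES)
scores = random_walk_with_restart(graph, "img_q")
ranked_words = sorted(
    (node for node in graph if node.startswith("word:")),
    key=lambda node: scores[node],
    reverse=True,
)
print(ranked_words)
```

In this sketch the words reached most easily from the query image rank first, so a word attached only to the dissimilar image ends up last. Each iteration touches every edge once, which is in the spirit of the linear-scaling claim, though the toy example says nothing about the real system's cost.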

Index Terms

Automatic multimedia cross-modal correlation discovery
1. Information systems
  1. Information systems applications
    1. Data mining

Published In

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

August 2004

874 pages

ISBN:1581138881

DOI:10.1145/1014052

General Chairs: Won Kim (Cyber Database Solutions), Ronny Kohavi (Amazon.com)

Program Chairs: Johannes Gehrke (Cornell University), William DuMouchel (AT&T Labs Research)

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

KDD04: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 22 - 25, 2004

Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

Total Citations: 346
Total Downloads: 1,922
Downloads (Last 12 months): 51
Downloads (Last 6 weeks): 5

Reflects downloads up to 07 Jan 2025