Automatic multimedia cross-modal correlation discovery

Published: 22 August 2004

Abstract

Given an image (or video clip, or audio song), how do we automatically assign keywords to it? The general problem is to find correlations across the media in a collection of multimedia objects, such as video clips with colors, motion, audio, and/or text scripts. We propose a novel, graph-based approach, "MMG", to discover such cross-modal correlations. Our "MMG" method requires no tuning, no clustering, and no user-determined constants; it can be applied to any multimedia collection, as long as we have a similarity function for each medium; and it scales linearly with the database size. We report auto-captioning experiments on the "standard" Corel image database of 680 MB, where it outperforms domain-specific, fine-tuned methods by up to 10 percentage points in captioning accuracy (50% relative improvement).
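The abstract does not spell out the algorithm, so the following is only a rough sketch of what a graph-based cross-modal captioner could look like. It assumes a graph with one node per object, per region, and per caption word, nearest-neighbor links between regions derived from the per-medium similarity function, and a random walk with restart to score words for an uncaptioned query. The function names, the toy one-dimensional features, the restart probability 0.35, and `k=1` are all hypothetical choices made for illustration, not values taken from the paper.

```python
from collections import defaultdict


def build_graph(objects, similarity, k=1):
    """Build an undirected graph over object, region, and word nodes.

    objects: dict name -> {"regions": {region_id: feature}, "words": [str]}
    similarity: function(feature, feature) -> number, larger means closer.
    """
    adj = defaultdict(set)

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    regions = {}
    for name, obj in objects.items():
        for rid, feat in obj["regions"].items():
            regions[rid] = feat
            link(("obj", name), ("region", rid))
        for word in obj.get("words", []):
            link(("obj", name), ("word", word))

    # Connect every region to its k most similar regions.
    for rid, feat in regions.items():
        others = sorted(
            (r for r in regions if r != rid),
            key=lambda r: similarity(feat, regions[r]),
            reverse=True,
        )
        for r in others[:k]:
            link(("region", rid), ("region", r))
    return adj


def random_walk_with_restart(adj, start, restart=0.35, iters=100):
    """Approximate steady-state visit probabilities of a walk restarting at `start`."""
    score = {n: 0.0 for n in adj}
    score[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for node, s in score.items():
            share = (1.0 - restart) * s / len(adj[node])
            for nb in adj[node]:
                nxt[nb] += share
        nxt[start] += restart  # total probability mass stays at 1
        score = nxt
    return score


def caption(adj, query, n=2):
    """Return the n word nodes with the highest score for the query object."""
    score = random_walk_with_restart(adj, ("obj", query))
    words = [node for node in adj if node[0] == "word"]
    words.sort(key=lambda node: score[node], reverse=True)
    return [node[1] for node in words[:n]]


# Toy collection: three captioned images and one uncaptioned query "q".
objects = {
    "img1": {"regions": {"r1": 0.10, "r2": 0.90}, "words": ["sea", "sun"]},
    "img2": {"regions": {"r3": 0.15}, "words": ["sea"]},
    "img3": {"regions": {"r4": 0.85}, "words": ["tiger"]},
    "q": {"regions": {"r5": 0.12}, "words": []},
}
adj = build_graph(objects, similarity=lambda a, b: -abs(a - b), k=1)
ranked = caption(adj, "q", n=3)
print(ranked)
```

In this sketch the only medium-specific ingredient is the `similarity` function, which mirrors the abstract's requirement of one similarity function per medium; everything else operates on the graph alone.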

    Published In

    KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2004
    874 pages
    ISBN:1581138881
    DOI:10.1145/1014052
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. automatic image captioning
    2. cross-modal correlation
    3. graph-based model

    Qualifiers

    • Article

    Conference

    KDD04

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Article Metrics

    • Downloads (last 12 months): 51
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 07 Jan 2025.

    Cited By

    • (2025) Plant leaf image segmentation in natural scenes: a multi-layer graph queries propagation approach. Pattern Analysis & Applications 28:1. DOI: 10.1007/s10044-024-01380-y. Online publication date: 1-Mar-2025.
    • (2024) Random walk with restart on multilayer networks: from node prioritisation to supervised link prediction and beyond. BMC Bioinformatics 25:1. DOI: 10.1186/s12859-024-05683-z. Online publication date: 14-Feb-2024.
    • (2024) AIM: Attributing, Interpreting, Mitigating Data Unfairness. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2014-2025). DOI: 10.1145/3637528.3671797. Online publication date: 25-Aug-2024.
    • (2024) A Mini-Review of Single-Cell Hi-C Embedding Methods. Computational and Structural Biotechnology Journal. DOI: 10.1016/j.csbj.2024.11.002. Online publication date: Nov-2024.
    • (2024) 1D CNNs and Face-Based Random Walks: A Powerful Combination to Enhance Mesh Understanding and 3D Semantic Segmentation. Computer Aided Geometric Design (102379). DOI: 10.1016/j.cagd.2024.102379. Online publication date: Aug-2024.
    • (2024) Finding future associations in complex networks using multiple network features. The Journal of Supercomputing 81:1. DOI: 10.1007/s11227-024-06544-5. Online publication date: 19-Oct-2024.
    • (2024) Association Analysis: Basic Concepts and Algorithms. Association Analysis Techniques and Applications in Bioinformatics (9-53). DOI: 10.1007/978-981-99-8251-6_2. Online publication date: 26-Apr-2024.
    • (2023) Strong and Weak Supervision Combined with CLIP for Water Surface Garbage Detection. Water 15:17 (3156). DOI: 10.3390/w15173156. Online publication date: 4-Sep-2023.
    • (2023) Attributed Graph Embedding with Random Walk Regularization and Centrality-Based Attention. Mathematics 11:8 (1830). DOI: 10.3390/math11081830. Online publication date: 12-Apr-2023.
    • (2023) Link Prediction with Continuous-Time Classical and Quantum Walks. Entropy 25:5 (730). DOI: 10.3390/e25050730. Online publication date: 28-Apr-2023.
