Graph indexing: a frequent structure-based approach

Published: 13 June 2004

Abstract

Graphs have become increasingly important in modelling complicated structures and schemaless data such as proteins, chemical compounds, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via graph-based indices. In this paper, we investigate the issues of indexing graphs and propose a novel solution by applying a graph mining technique. Unlike the existing path-based methods, our approach, called gIndex, makes use of frequent substructures as the basic indexing feature. Frequent substructures are ideal candidates since they explore the intrinsic characteristics of the data and are relatively stable under database updates. To reduce the size of the index structure, two techniques, size-increasing support constraint and discriminative fragments, are introduced. Our performance study shows that gIndex has an index 10 times smaller, yet achieves 3 to 10 times better performance than a typical path-based method, GraphGrep. The gIndex approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. Furthermore, the concepts developed here can be applied to indexing sequences, trees, and other complicated structures as well.
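
The filtering idea behind a frequent-feature index can be illustrated with a toy sketch. This is not the gIndex implementation: it assumes, purely for illustration, that a graph is a list of labeled edges and that single labeled edges stand in for the frequent substructures the abstract describes. The names `edge_features`, `build_index`, `candidates`, and `min_support` are hypothetical. The sketch covers only the pruning step; a real system would still have to verify each candidate with a subgraph isomorphism test.

```python
from collections import defaultdict

# Toy assumption: a graph is a list of edges, each edge a pair of vertex
# labels. Single labeled edges stand in for frequent subgraph features.
def edge_features(graph):
    """Return the set of undirected labeled edges that occur in the graph."""
    return {tuple(sorted(edge)) for edge in graph}

def build_index(database, min_support):
    """Map each frequent feature to the ids of the graphs that contain it."""
    postings = defaultdict(set)
    for gid, graph in database.items():
        for feat in edge_features(graph):
            postings[feat].add(gid)
    # Keep only features contained in at least min_support graphs.
    return {f: ids for f, ids in postings.items() if len(ids) >= min_support}

def candidates(index, database, query):
    """Intersect posting lists of the indexed features found in the query.

    The result is a superset of the true answers: every graph that contains
    the query must contain each of the query's features.
    """
    result = set(database)
    for feat in edge_features(query):
        if feat in index:
            result &= index[feat]
    return result

database = {
    "g1": [("C", "C"), ("C", "O"), ("C", "N")],
    "g2": [("C", "C"), ("C", "O")],
    "g3": [("C", "C"), ("C", "N")],
    "g4": [("C", "S")],
}
index = build_index(database, min_support=2)
query = [("O", "C"), ("C", "N")]
print(sorted(candidates(index, database, query)))  # ['g1']
```

Features that fall below the support threshold are simply absent from the index, so a query made only of such features gets no pruning and every graph remains a candidate. That trade-off between index size and pruning power is what the support threshold controls in this sketch.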

Published In

SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data
June 2004
988 pages
ISBN:1581138598
DOI:10.1145/1007568
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS04
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
