skip to main content
10.1145/2247596.2247650acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article

Finding top-k similar graphs in graph databases

Published: 27 March 2012 Publication History

Abstract

Querying similar graphs in graph databases has been widely studied in graph query processing in recent years. Existing works mainly focus on subgraph similarity search and supergraph similarity search. In this paper, we study the problem of finding top-k graphs in a graph database that are most similar to a query graph. This problem has many applications, such as image retrieval and chemical compound structure search. Regarding the similarity measure, feature based and kernel based similarity measures have been used in the literature. But such measures are rough and may lose the connectivity information among substructures. In this paper, we introduce a new similarity measure based on the maximum common subgraph (MCS) of two graphs. We show that this measure can better capture the common and different structures of two graphs. Since computing the MCS of two graphs is NP-hard, we propose an algorithm to answer the top-k graph similarity query using two distance lower bounds with different computational costs, in order to reduce the number of MCS computations. We further introduce an indexing technique, which can better make use of the triangle property of similarities among graphs in the database to get tighter lower bounds. Three different indexing methods are proposed with different tradeoffs between pruning power and construction cost. We conducted extensive performance studies on large real datasets to evaluate the performance of our approaches.

References

[1]
F. N. Abu-Khzam, N. F. Samatova, M. A. Rizk, and M. A. Langston. The maximum common subgraph problem: Faster solutions via vertex cover. In AICCSA, pages 367--373, 2007.
[2]
E. Balas and C. S. Yu. Finding a maximum clique in an arbitrary graph. SIAM J. Comput., 15(4):1054--1068, 1986.
[3]
I. M. Bomze, M. Budinich, M. Pelillo, and C. Rossi. Annealed replication: a new heuristic for the maximum clique problem. Discrete Applied Mathematics, 121(1--3):27--49, 2002.
[4]
H. Bunke and K. Shearer. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters, 19(3--4):255--259, 1998.
[5]
C. Chen, X. Yan, P. S. Yu, J. Han, D.-Q. Zhang, and X. Gu. Towards graph containment search and indexing. In VLDB, pages 926--937, 2007.
[6]
J. Cheng, Y. Ke, A. W.-C. Fu, and J. X. Yu. Fast graph query processing with a low-cost index. The VLDB Journal, pages 1--19, 2010.
[7]
J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards verification-free query processing on graph databases. In SIGMOD, pages 857--872, 2007.
[8]
W. Han, J. Lee, M. Pham, and J. Yu. igraph: a framework for comparisons of disk-based graph indexing techniques. PVLDB, 3(1--2):449--459, 2010.
[9]
H. He and A. Singh. Closure-tree: An index structure for graph queries. In ICDE, pages 38--38, 2006.
[10]
T. Horváth, T. Gärtner, and S. Wrobel. Cyclic pattern kernels for predictive graph mining. In KDD, pages 158--167, 2004.
[11]
H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In ICML, pages 321--328, 2003.
[12]
E. B. Krissinel and K. Henrick. Common subgraph isomorphism detection by backtracking search. Softw., Pract. Exper., 34(6):591--607, 2004.
[13]
J. McGregor. Backtrack search algorithms and the maximal common subgraph problem. Softw., Pract. Exper., 12(1):23--34, 1982.
[14]
J. Raymond, E. Gardiner, and P. Willett. Rascal: Calculation of graph similarity using maximum common edge subgraphs. The Computer Journal, 45(6):631, 2002.
[15]
H. Shang, X. Lin, Y. Zhang, J. X. Yu, and W. Wang. Connected substructure similarity search. In SIGMOD, pages 903--914, 2010.
[16]
H. Shang, Y. Zhang, X. Lin, and J. X. Yu. Taming verification hardness: an efficient algorithm for testing subgraph isomorphism. PVLDB, 1(1):364--375, 2008.
[17]
H. Shang, K. Zhu, X. Lin, Y. Zhang, and R. Ichise. Similarity search on supergraph containment. In ICDE, pages 637--648, 2010.
[18]
D. Shasha, J. T.-L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In PODS, pages 39--52, 2002.
[19]
W. Suters, F. Abu-Khzam, Y. Zhang, C. Symons, N. Samatova, and M. Langston. A new approach and faster exact methods for the maximum common subgraph problem. Computing and combinatorics, pages 717--727, 2005.
[20]
X. Wang, A. M. Smalter, J. Huan, and G. H. Lushington. G-hash: towards fast kernel-based similarity search in large graph databases. In EDBT, pages 472--480, 2009.
[21]
P. Willett, J. Barnard, and G. Downs. Chemical similarity searching. Journal of Chemical Information and Computer Sciences, 38(6):983--996, 1998.
[22]
D. W. Williams, J. Huan, and W. Wang. Graph database indexing using structured graph decomposition. In ICDE, pages 976--985, 2007.
[23]
X. Yan, P. Yu, and J. Han. Substructure similarity search in graph databases. In SIGMOD, pages 766--777, 2005.
[24]
X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In SIGMOD, pages 335--346, 2004.
[25]
S. Zhang, M. Hu, and J. Yang. Treepi: A novel graph indexing method. In ICDE, pages 966--975, 2007.
[26]
S. Zhang, J. Li, H. Gao, and Z. Zou. A novel approach for efficient supergraph query processing on graph databases. In EDBT, pages 204--215, 2009.
[27]
P. Zhao, J. X. Yu, and P. S. Yu. Graph indexing: Tree + delta >= graph. In VLDB, pages 938--949, 2007.
[28]
L. Zou, L. Chen, J. X. Yu, and Y. Lu. A novel spectral coding in a large graph database. In EDBT, pages 181--192, 2008.

Cited By

View all
  • (2024)Searching COVID-19 Clinical Research Using Graph Queries: Algorithm Development and ValidationJournal of Medical Internet Research10.2196/5265526(e52655)Online publication date: 30-May-2024
  • (2024)Neural Similarity Search on Supergraph ContainmentIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.327992036:1(281-295)Online publication date: Jan-2024
  • (2022)LAN: Learning-based Approximate k-Nearest Neighbor Search in Graph Databases2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00233(2508-2521)Online publication date: May-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Other conferences
EDBT '12: Proceedings of the 15th International Conference on Extending Database Technology
March 2012
643 pages
ISBN:9781450307901
DOI:10.1145/2247596
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 March 2012

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. graph database
  2. similarity search
  3. top-k

Qualifiers

  • Research-article

Funding Sources

Conference

EDBT '12

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)9
  • Downloads (Last 6 weeks)0
Reflects downloads up to 06 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Searching COVID-19 Clinical Research Using Graph Queries: Algorithm Development and ValidationJournal of Medical Internet Research10.2196/5265526(e52655)Online publication date: 30-May-2024
  • (2024)Neural Similarity Search on Supergraph ContainmentIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.327992036:1(281-295)Online publication date: Jan-2024
  • (2022)LAN: Learning-based Approximate k-Nearest Neighbor Search in Graph Databases2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00233(2508-2521)Online publication date: May-2022
  • (2021)Top-k Graph Similarity Search Based on Hierarchical Inverted Index2021 11th International Conference on Information Science and Technology (ICIST)10.1109/ICIST52614.2021.9440632(422-431)Online publication date: 21-May-2021
  • (2021)Collaborative filtering over evolution provenance data for interactive visual data explorationInformation Systems10.1016/j.is.2020.10162095(101620)Online publication date: Jan-2021
  • (2020)Graph-Based Node Finding in Big Complex Contextual Social GraphsComplexity10.1155/2020/79098262020Online publication date: 26-Feb-2020
  • (2020)Deep Neural Matching Models for Graph RetrievalProceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval10.1145/3397271.3401216(1701-1704)Online publication date: 25-Jul-2020
  • (2020)Answering Top-$k$ Graph Similarity Queries in Graph DatabasesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2019.290660832:8(1459-1474)Online publication date: 1-Aug-2020
  • (2020)Strong Social Graph Based Trust-Oriented Graph Pattern Matching With Multiple ConstraintsIEEE Transactions on Emerging Topics in Computational Intelligence10.1109/TETCI.2019.29204044:5(675-685)Online publication date: Oct-2020
  • (2020)Searching Correlated Patterns From Graph StreamsIEEE Access10.1109/ACCESS.2020.29647858(106690-106704)Online publication date: 2020
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media