ABSTRACT
We focus on large graphs where nodes have attributes, such as a social network where the nodes are labelled with each person's job title. In such a setting, we want to find subgraphs that match a user query pattern. For example, a "star" query would be, "find a CEO who has strong interactions with a Manager, a Lawyer,and an Accountant, or another structure as close to that as possible". Similarly, a "loop" query could help spot a money laundering ring.
Traditional SQL-based methods, as well as more recent graph indexing methods, will return no answer when an exact match does not exist. This is the first main feature of our method. It can find exact-, as well as near-matches, and it will present them to the user in our proposed "goodness" order. For example, our method tolerates indirect paths between, say, the "CEO" and the "Accountant" of the above sample query, when direct paths don't exist. Its second feature is scalability. In general, if the query has nq nodes and the data graph has n nodes, the problem needs polynomial time complexity O(n n q), which is prohibitive. Our G-Ray ("Graph X-Ray") method finds high-quality subgraphs in time linear on the size of the data graph.
Experimental results on the DLBP author-publication graph (with 356K nodes and 1.9M edges) illustrate both the effectiveness and scalability of our approach. The results agree with our intuition, and the speed is excellent. It takes 4 seconds on average fora 4-node query on the DBLP graph.
- B. Aleman-Meza, C. Halaschek-Wiener, S. Sahoo, A. Sheth, and I. Arpinar. Lecture Notes in Computer Science, volume 3495, chapter Template Based Semantic Similarity for Security Applications, pages 621-622. Springer, 2005. Google ScholarDigital Library
- G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using banks. In ICDE '02: Proceedings of the 18th International Conference on Data Engineering, pages 431--440, 2002. Google ScholarDigital Library
- H. Blau, N. Immerman, and D. Jensen. A visual language for querying and updating graphs. Technical Report 2002-037, Department of Computer Science, University of Massacheusetts, Amherst, 2002.Google Scholar
- S. Chakrabarti. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan-Kauffman, 2002. Google ScholarDigital Library
- T. Coffman, S. Greenblatt, and S. Marcus. Graph-based technologies for intelligence analysis. Communications of the ACM, Special Issue on Emerging Technologies for Homeland Security, 47(3):45--47, 2004. Google ScholarDigital Library
- D. J. Cook and L. B. Holder. Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research (JAIR), 1:231--255, 1994.Google Scholar
- C. Faloutsos, K. S. McCurley, and A. Tomkins. Fast discovery of connection subgraphs. In KDD '04: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, page 118--Ü127, 2004. Google ScholarDigital Library
- D. Fogaras and B. Racz. Towards scaling fully personalized pagerank. In Proc. WAW, pages 105--117, 2004.Google ScholarCross Ref
- B. Gallagher. Matching structure and semantics: A survey on graph-based pattern matching. In AAAI FS '06: Papers from the 2006 AAAI Fall Symposium on Capturing and Using Patterns for Evidence Detection, pages 45--53, 2006.Google Scholar
- N. Guarino, C. Masolo, and G. Vetere. Ontoseek: Content-based access to the web. IEEE Intelligent Systems, 14(3):70--80, 1999. Research Track Paper Google ScholarDigital Library
- R. Jin, C. Wang, D. Polshakov, S. Parthasarathy, and G. Agrawal. Discovering frequent topological structures from graph datasets. In KDD '05: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 606--611, 2005. Google ScholarDigital Library
- S. Kamvar, T. Haveliwala, C. Manning, and G. Golub. Exploiting the block structure of the web for computing pagerank. In Stanford University Technical Report, 2003.Google Scholar
- Y. Koren, S. North, and C. Volinsky. Measuring and extracting proximity in networks. In KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 245--255, 2006. Google ScholarDigital Library
- N. Koudas, C. Li, A. Tung, and R. Vernica. Relaxing join and selection queries. In VLDB '06: Proceedings of the 32nd International Conference on Very Large Data Bases, pages 199--210, 2006. Google ScholarDigital Library
- M. Kuramochi and G. Karypis. Finding frequent patterns in a large sparse graph. Data Mining and Knowledge Discovery, 11(3):243--271, 2005. Google ScholarDigital Library
- J.-Y. Pan, H.-J. Yang, C. Faloutsos, and P. Duygulu. Automatic multimedia cross-modal correlation discovery. In KDD, pages 653--658, 2004. Google ScholarDigital Library
- J. Pei, D. Jiang, and A. Zhang. On mining cross-graph quasi-cliques. In KDD '05: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2005. Google ScholarDigital Library
- L. Shapiro and R. Haralick. Structural descriptions and inexact matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3:504--519, 1981.Google ScholarDigital Library
- H. Tong and C. Faloutsos. Center-piece subgraphs: Problem definition and fast solutions. In KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 404--413, 2006. Google ScholarDigital Library
- H. Tong, C. Faloutsos, and J.-Y. Pan. Fast random walk with restart and its applications. In ICDM '06: Proceedings of the 6th IEEE International Conference on Data Mining, pages 613--622, 2006. Google ScholarDigital Library
- W.-H. Tsai and K.-S. Fu. Error-correcting isomorphisms of attributed relational graphs for pattern analysis. IEEE Transactions on Systems, Man and Cybernetics, 9:757--768, 1979.Google ScholarCross Ref
- M. Wolverton, P. Berry, I. W. Harrison, J. D. Lowrance, D. Morley, A. C. Rodriguez, E. H. Ruspini, and J. Thoméré. LAW: A workbench for approximate pattern matching in relational data. In IAAI '03: Proceedings of the Fifteenth Conference on Innovative Applications of Artificial Intelligence, pages 143--150, 2003.Google Scholar
- X. Yan and J. Han. gspan: Graph-based substructure pattern mining. In ICDM '02: Proceedings of the 2nd IEEE International Conference on Data Mining, pages 721Ü--724, 2002. Google ScholarDigital Library
- X. Yan, P. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In ICDM '04: Proceedings of the 4th International Conference on Data Mining, pages 335--346, 2004. Research Track Paper Google ScholarDigital Library
Index Terms
Fast best-effort pattern matching in large attributed graphs
Recommendations
A fast deterministic detection of small pattern graphs in graphs without large cliques
AbstractWe show that for several pattern graphs on four vertices (e.g., C 4), their induced copies in host graphs with n vertices and no clique on k + 1 vertices can be deterministically detected in O ( n 2.5719 k 0.3176 + n 2 k 2 ) time for k ...
Answering pattern match queries in large graph databases via graph embedding
The growing popularity of graph databases has generated interesting data management problems, such as subgraph search, shortest path query, reachability verification, and pattern matching. Among these, a pattern match query is more flexible compared ...
Minimal 2-matching-covered graphs
A perfect 2-matching M of a graph G is a spanning subgraph of G such that each component of M is either an edge or a cycle. A graph G is said to be 2-matching-covered if every edge of G lies in some perfect 2-matching of G. A 2-matching-covered graph is ...
Comments