skip to main content
10.1145/1281192.1281271acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

Fast best-effort pattern matching in large attributed graphs

Published:12 August 2007Publication History

ABSTRACT

We focus on large graphs where nodes have attributes, such as a social network where the nodes are labelled with each person's job title. In such a setting, we want to find subgraphs that match a user query pattern. For example, a "star" query would be, "find a CEO who has strong interactions with a Manager, a Lawyer,and an Accountant, or another structure as close to that as possible". Similarly, a "loop" query could help spot a money laundering ring.

Traditional SQL-based methods, as well as more recent graph indexing methods, will return no answer when an exact match does not exist. This is the first main feature of our method. It can find exact-, as well as near-matches, and it will present them to the user in our proposed "goodness" order. For example, our method tolerates indirect paths between, say, the "CEO" and the "Accountant" of the above sample query, when direct paths don't exist. Its second feature is scalability. In general, if the query has nq nodes and the data graph has n nodes, the problem needs polynomial time complexity O(n n q), which is prohibitive. Our G-Ray ("Graph X-Ray") method finds high-quality subgraphs in time linear on the size of the data graph.

Experimental results on the DLBP author-publication graph (with 356K nodes and 1.9M edges) illustrate both the effectiveness and scalability of our approach. The results agree with our intuition, and the speed is excellent. It takes 4 seconds on average fora 4-node query on the DBLP graph.

References

  1. B. Aleman-Meza, C. Halaschek-Wiener, S. Sahoo, A. Sheth, and I. Arpinar. Lecture Notes in Computer Science, volume 3495, chapter Template Based Semantic Similarity for Security Applications, pages 621-622. Springer, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using banks. In ICDE '02: Proceedings of the 18th International Conference on Data Engineering, pages 431--440, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. H. Blau, N. Immerman, and D. Jensen. A visual language for querying and updating graphs. Technical Report 2002-037, Department of Computer Science, University of Massacheusetts, Amherst, 2002.Google ScholarGoogle Scholar
  4. S. Chakrabarti. Mining the Web: Discovering Knowledge from Hypertext Data. Morgan-Kauffman, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. T. Coffman, S. Greenblatt, and S. Marcus. Graph-based technologies for intelligence analysis. Communications of the ACM, Special Issue on Emerging Technologies for Homeland Security, 47(3):45--47, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D. J. Cook and L. B. Holder. Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research (JAIR), 1:231--255, 1994.Google ScholarGoogle Scholar
  7. C. Faloutsos, K. S. McCurley, and A. Tomkins. Fast discovery of connection subgraphs. In KDD '04: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, page 118--Ü127, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. D. Fogaras and B. Racz. Towards scaling fully personalized pagerank. In Proc. WAW, pages 105--117, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  9. B. Gallagher. Matching structure and semantics: A survey on graph-based pattern matching. In AAAI FS '06: Papers from the 2006 AAAI Fall Symposium on Capturing and Using Patterns for Evidence Detection, pages 45--53, 2006.Google ScholarGoogle Scholar
  10. N. Guarino, C. Masolo, and G. Vetere. Ontoseek: Content-based access to the web. IEEE Intelligent Systems, 14(3):70--80, 1999. Research Track Paper Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. R. Jin, C. Wang, D. Polshakov, S. Parthasarathy, and G. Agrawal. Discovering frequent topological structures from graph datasets. In KDD '05: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 606--611, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. S. Kamvar, T. Haveliwala, C. Manning, and G. Golub. Exploiting the block structure of the web for computing pagerank. In Stanford University Technical Report, 2003.Google ScholarGoogle Scholar
  13. Y. Koren, S. North, and C. Volinsky. Measuring and extracting proximity in networks. In KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 245--255, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. N. Koudas, C. Li, A. Tung, and R. Vernica. Relaxing join and selection queries. In VLDB '06: Proceedings of the 32nd International Conference on Very Large Data Bases, pages 199--210, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. Kuramochi and G. Karypis. Finding frequent patterns in a large sparse graph. Data Mining and Knowledge Discovery, 11(3):243--271, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J.-Y. Pan, H.-J. Yang, C. Faloutsos, and P. Duygulu. Automatic multimedia cross-modal correlation discovery. In KDD, pages 653--658, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Pei, D. Jiang, and A. Zhang. On mining cross-graph quasi-cliques. In KDD '05: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. L. Shapiro and R. Haralick. Structural descriptions and inexact matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3:504--519, 1981.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. H. Tong and C. Faloutsos. Center-piece subgraphs: Problem definition and fast solutions. In KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 404--413, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. H. Tong, C. Faloutsos, and J.-Y. Pan. Fast random walk with restart and its applications. In ICDM '06: Proceedings of the 6th IEEE International Conference on Data Mining, pages 613--622, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. W.-H. Tsai and K.-S. Fu. Error-correcting isomorphisms of attributed relational graphs for pattern analysis. IEEE Transactions on Systems, Man and Cybernetics, 9:757--768, 1979.Google ScholarGoogle ScholarCross RefCross Ref
  22. M. Wolverton, P. Berry, I. W. Harrison, J. D. Lowrance, D. Morley, A. C. Rodriguez, E. H. Ruspini, and J. Thoméré. LAW: A workbench for approximate pattern matching in relational data. In IAAI '03: Proceedings of the Fifteenth Conference on Innovative Applications of Artificial Intelligence, pages 143--150, 2003.Google ScholarGoogle Scholar
  23. X. Yan and J. Han. gspan: Graph-based substructure pattern mining. In ICDM '02: Proceedings of the 2nd IEEE International Conference on Data Mining, pages 721Ü--724, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. X. Yan, P. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In ICDM '04: Proceedings of the 4th International Conference on Data Mining, pages 335--346, 2004. Research Track Paper Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Fast best-effort pattern matching in large attributed graphs

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
        August 2007
        1080 pages
        ISBN:9781595936097
        DOI:10.1145/1281192

        Copyright © 2007 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 August 2007

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • Article

        Acceptance Rates

        KDD '07 Paper Acceptance Rate111of573submissions,19%Overall Acceptance Rate1,133of8,635submissions,13%

        Upcoming Conference

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader