Article

ANF: a fast and scalable tool for data mining in massive graphs

Authors:
Christopher R. Palmer

Carnegie Mellon University, Pittsburgh, PA

Carnegie Mellon University, Pittsburgh, PA
View Profile

,
Phillip B. Gibbons

Intel Research Pittsburgh, Pittsburgh, PA

Intel Research Pittsburgh, Pittsburgh, PA
View Profile

,
Christos Faloutsos

Carnegie Mellon University, Pittsburgh, PA

Carnegie Mellon University, Pittsburgh, PA
View Profile

KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data miningJuly 2002Pages 81–90https://doi.org/10.1145/775047.775059

Published:23 July 2002Publication History

KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 81–90

ABSTRACT

Graphs are an increasingly important data source, with such important graphs as the Internet and the Web. Other familiar graphs include CAD circuits, phone records, gene sequences, city streets, social networks and academic citations. Any kind of relationship, such as actors appearing in movies, can be represented as a graph. This work presents a data mining tool, called ANF, that can quickly answer a number of interesting questions on graph-represented data, such as the following. How robust is the Internet to failures? What are the most influential database papers? Are there gender differences in movie appearance patterns? At its core, ANF is based on a fast and memory-efficient approach for approximating the complete "neighbourhood function" for a graph. For the Internet graph (268K nodes), ANF's highly-accurate approximation is more than 700 times faster than the exact computation. This reduces the running time from nearly a day to a matter of a minute or two, allowing users to perform ad hoc drill-down tasks and to repeatedly answer questions about changing data sources. To enable this drill-down, ANF employs new techniques for approximating neighbourhood-type functions for graphs with distinguished nodes and/or edges. When compared to the best existing approximation, ANF's approach is both faster and more accurate, given the same resources. Additionally, unlike previous approaches, ANF scales gracefully to handle disk resident graphs. Finally, we present some of our results from mining large graphs using ANF.

References

L. A. Adamic. The small world Web. In Proceedings of the European Conf. on Digital Libraries, 1999.]] Google ScholarDigital Library
S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1--7):107--117, 1998.]] Google ScholarDigital Library
A. Broder, R. Kumar, F. Maghoul, P. Raghavan, and R. Stata. Graph structure in the Web. In Proceedings of the 9th International World Wide Web Conference, pages 247--256, 2000.]] Google ScholarDigital Library
E. Cohen. Size-estimation framework with applications to transitive closure and reachability. Journal of Computer and System Sciences, 55(3):441--453, December 1997.]] Google ScholarDigital Library
Cook and Holder. Graph-based data mining. ISTA: Intelligent Systems & their applications, 15, 2000.]] Google ScholarDigital Library
CORA search engine. http://www.cora.whizbang.com.]]Google Scholar
P. Domingos and M. Richardson. Mining the network value of customers. In KDD-2001, pages 57--66, 2001.]] Google ScholarDigital Library
M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM, 1999.]] Google ScholarDigital Library
P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31:182--209, 1985.]] Google ScholarDigital Library
IMDB. http://www.imdb.com.]]Google Scholar
A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In PDKK, pages 13--23, 2000.]] Google ScholarDigital Library
http://cs.bell-labs.com/who/ches/map/.]]Google Scholar
S. R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. The Web as a graph. In ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 1--10, 2000.]] Google ScholarDigital Library
R. J. Lipton and J. F. Naughton. Estimating the size of generalized transitive closures. In Proceedings of 15th International Conference on Very Large Data Bases, pages 315--326, 1989.]] Google ScholarDigital Library
M. H. Nodine, M. T. Goodrich, and J. S. Vitter. Blocking for external graph searching. In Proc. ACM PODS Conference (PODS-93), pages 222--232, 1993.]] Google ScholarDigital Library
C. R. Palmer, G. Siganos, M. Faloutsos, C. Faloutsos, and P. Gibbons. The connectivity and fault-tolerance of the Internet topology. In Workshop on Network-Related Data Management (NRDM-2001), 2001.]]Google Scholar
C. R. Palmer and J. G. Steffan. Generating network toplogies that obey power laws. In IEEE Globecom 2000, 2000.]]Google Scholar
G. Salton and M. MeGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.]] Google ScholarDigital Library
http://www.isi.edu/scan/mercator/maps.html.]]Google Scholar
S. L. Tauro, C. Palmer, G. Siganos, and M. Faloutsos. A simple conceptual model for the Internet topology. In IEEE Globecom 2001, 2001.]]Google Scholar

Index Terms

ANF: a fast and scalable tool for data mining in massive graphs

Recommendations

On the Multichromatic Number of s-Stable Kneser Graphs

For positive integers n and s, a subset Sï [n] is s-stable if sï |i-j|ï n-s for distinct i,j∈S . The s-stable r-uniform Kneser hypergraph KGrn,ks-stable is the r-uniform hypergraph that has the collection of all s-stable k-element subsets of [n] as ...
Read More
Adjacent vertex-distinguishing edge and total chromatic numbers of hypercubes

An adjacent vertex-distinguishing edge coloring of a simple graph G is a proper edge coloring of G such that incident edge sets of any two adjacent vertices are assigned different sets of colors. A total coloring of a graph G is a coloring of both the ...
Read More
Forbidden Subgraphs and Weak Locally Connected Graphs

A graph is called H-free if it has no induced subgraph isomorphic to H. A graph is called $$N^i$$Ni-locally connected if $$G[\{ x\in V(G): 1\le d_G(w, x)\le i\}]$$G[{x?V(G):1≤dG(w,x)≤i}] is connected and $$N_2$$N2-locally connected if $$G[\{uv: \{uw, vw\...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
July 2002
719 pages
ISBN:158113567X
DOI:10.1145/775047
Conference Chair:
Osmar R. Zaïane
University of Alberta, Canada
,
General Chair:
Randy Goebel
University of Alberta, Canada
,
Program Chairs:
David Hand
Imperial College, UK
,
Daniel Keim
AT&T
,
Raymond Ng
University of British Columbia, Canada
Copyright © 2002 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 July 2002
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
KDD '02 Paper Acceptance Rate44of307submissions,14%Overall Acceptance Rate1,133of8,635submissions,13%
More
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 168
  Total Citations
  View Citations
- 1,738
  Total Downloads
- Downloads (Last 12 months)37
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

ANF: a fast and scalable tool for data mining in massive graphs

KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

ABSTRACT

References

Cited By

Index Terms

Recommendations

On the Multichromatic Number of s-Stable Kneser Graphs

Adjacent vertex-distinguishing edge and total chromatic numbers of hypercubes

Forbidden Subgraphs and Weak Locally Connected Graphs