ABSTRACT
Graphs are an increasingly important data source, with such important graphs as the Internet and the Web. Other familiar graphs include CAD circuits, phone records, gene sequences, city streets, social networks and academic citations. Any kind of relationship, such as actors appearing in movies, can be represented as a graph. This work presents a data mining tool, called ANF, that can quickly answer a number of interesting questions on graph-represented data, such as the following. How robust is the Internet to failures? What are the most influential database papers? Are there gender differences in movie appearance patterns? At its core, ANF is based on a fast and memory-efficient approach for approximating the complete "neighbourhood function" for a graph. For the Internet graph (268K nodes), ANF's highly-accurate approximation is more than 700 times faster than the exact computation. This reduces the running time from nearly a day to a matter of a minute or two, allowing users to perform ad hoc drill-down tasks and to repeatedly answer questions about changing data sources. To enable this drill-down, ANF employs new techniques for approximating neighbourhood-type functions for graphs with distinguished nodes and/or edges. When compared to the best existing approximation, ANF's approach is both faster and more accurate, given the same resources. Additionally, unlike previous approaches, ANF scales gracefully to handle disk resident graphs. Finally, we present some of our results from mining large graphs using ANF.
- L. A. Adamic. The small world Web. In Proceedings of the European Conf. on Digital Libraries, 1999.]] Google ScholarDigital Library
- S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1--7):107--117, 1998.]] Google ScholarDigital Library
- A. Broder, R. Kumar, F. Maghoul, P. Raghavan, and R. Stata. Graph structure in the Web. In Proceedings of the 9th International World Wide Web Conference, pages 247--256, 2000.]] Google ScholarDigital Library
- E. Cohen. Size-estimation framework with applications to transitive closure and reachability. Journal of Computer and System Sciences, 55(3):441--453, December 1997.]] Google ScholarDigital Library
- Cook and Holder. Graph-based data mining. ISTA: Intelligent Systems & their applications, 15, 2000.]] Google ScholarDigital Library
- CORA search engine. http://www.cora.whizbang.com.]]Google Scholar
- P. Domingos and M. Richardson. Mining the network value of customers. In KDD-2001, pages 57--66, 2001.]] Google ScholarDigital Library
- M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM, 1999.]] Google ScholarDigital Library
- P. Flajolet and G. N. Martin. Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31:182--209, 1985.]] Google ScholarDigital Library
- IMDB. http://www.imdb.com.]]Google Scholar
- A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In PDKK, pages 13--23, 2000.]] Google ScholarDigital Library
- http://cs.bell-labs.com/who/ches/map/.]]Google Scholar
- S. R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. The Web as a graph. In ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 1--10, 2000.]] Google ScholarDigital Library
- R. J. Lipton and J. F. Naughton. Estimating the size of generalized transitive closures. In Proceedings of 15th International Conference on Very Large Data Bases, pages 315--326, 1989.]] Google ScholarDigital Library
- M. H. Nodine, M. T. Goodrich, and J. S. Vitter. Blocking for external graph searching. In Proc. ACM PODS Conference (PODS-93), pages 222--232, 1993.]] Google ScholarDigital Library
- C. R. Palmer, G. Siganos, M. Faloutsos, C. Faloutsos, and P. Gibbons. The connectivity and fault-tolerance of the Internet topology. In Workshop on Network-Related Data Management (NRDM-2001), 2001.]]Google Scholar
- C. R. Palmer and J. G. Steffan. Generating network toplogies that obey power laws. In IEEE Globecom 2000, 2000.]]Google Scholar
- G. Salton and M. MeGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.]] Google ScholarDigital Library
- http://www.isi.edu/scan/mercator/maps.html.]]Google Scholar
- S. L. Tauro, C. Palmer, G. Siganos, and M. Faloutsos. A simple conceptual model for the Internet topology. In IEEE Globecom 2001, 2001.]]Google Scholar
Index Terms
- ANF: a fast and scalable tool for data mining in massive graphs
Recommendations
On the Multichromatic Number of s-Stable Kneser Graphs
For positive integers n and s, a subset Sï [n] is s-stable if sï |i-j|ï n-s for distinct i,j∈S . The s-stable r-uniform Kneser hypergraph KGrn,ks-stable is the r-uniform hypergraph that has the collection of all s-stable k-element subsets of [n] as ...
Adjacent vertex-distinguishing edge and total chromatic numbers of hypercubes
An adjacent vertex-distinguishing edge coloring of a simple graph G is a proper edge coloring of G such that incident edge sets of any two adjacent vertices are assigned different sets of colors. A total coloring of a graph G is a coloring of both the ...
Forbidden Subgraphs and Weak Locally Connected Graphs
A graph is called H-free if it has no induced subgraph isomorphic to H. A graph is called $$N^i$$Ni-locally connected if $$G[\{ x\in V(G): 1\le d_G(w, x)\le i\}]$$G[{x?V(G):1≤dG(w,x)≤i}] is connected and $$N_2$$N2-locally connected if $$G[\{uv: \{uw, vw\...
Comments