ABSTRACT
With applications in biology, the world-wide web, and several other areas, mining of graph-structured objects has received significant interest recently. One of the major research directions in this field is concerned with predictive data mining in graph databases where each instance is represented by a graph. Some of the proposed approaches for this task rely on the excellent classification performance of support vector machines. To control the computational cost of these approaches, the underlying kernel functions are based on frequent patterns. In contrast to these approaches, we propose a kernel function based on a natural set of cyclic and tree patterns independent of their frequency, and discuss its computational aspects. To practically demonstrate the effectiveness of our approach, we use the popular NCI-HIV molecule dataset. Our experimental results show that cyclic pattern kernels can be computed quickly and offer predictive performance superior to recent graph kernels based on frequent patterns.
- R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, Chapter 12, pages 307 -- 328. AAAI/MIT Press, Cambridge, USA, 1996. Google ScholarDigital Library
- H. Alt, U. Fuchs, and K. Kriegel. On the number of simple cycles in planar graphs. Combinatorics, Probability & Computing, 8(5):397--405, 1999. Google ScholarDigital Library
- Asai, Arimura, Uno, and Nakano. Discovering frequent substructures in large unordered trees. In Proc. of the 6th International Conference on Discovery Science, volume 2843 of LNAI, pages 47--61. Springer Verlag, 2003.Google Scholar
- C. Borgelt and M. R. Berthold. Mining molecular fragments: Finding relevant substructures of molecules. In Proc. of the 2002 IEEE International Conference on Data Mining, pages 51--58. IEEE Computer Society, 2002. Google ScholarDigital Library
- B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, Proc. of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144--152. ACM Press, 1992. Google ScholarDigital Library
- M. Collins and N. Duffy. Convolution kernels for natural language. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems, volume 14. MIT Press, 2002.Google Scholar
- C. Cortes, P. Haffner, and M. Mohri. Positive definite rational kernels. In Learning Theory and Kernel Machines, 16th Annual Conference on Learning Theory and 7th Kernel Workshop, Proceedings. volume 2843 of LNAI, pages 41--56. Springer Verlag, 2003.Google Scholar
- M. Deshpande, M. Kuramochi, and G. Karypis. Automated approaches for classifying structures. In Proc. of the 2nd ACM SIGKDD Workshop on Data Mining in Bioinformatics, 2002.Google ScholarCross Ref
- M. Deshpande, M. Kuramochi, and G. Karypis. Frequent sub-structure based approaches for classifying chemical compounds. In Proc. of the 3rd IEEE International Conference on Data Mining, pages 35--42. IEEE Computer Society, 2003. Google ScholarDigital Library
- R. Diestel. Graph theory. 2nd edition, Springer Verlag, 2000.Google Scholar
- H.-D. Ebbinghaus and J. Flum. Finite Model Theory. Perspectives in Mathematical Logic. 2nd edition, Springer Verlag, 1999.Google Scholar
- M. Fürer. Graph isomorphism testing without numberics for graphs of bounded eigenvalue multiplicity. In Proc. of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 624--631. ACM Press, 1995. Google ScholarDigital Library
- T. Gärtner. Exponential and geometric kernels for graphs. In NIPS Workshop on Unreal Data: Principles of Modeling Nonvectorial Data, 2002.Google Scholar
- T. Gärtner, K. Driessens, and J. Ramon. Graph kernels and gaussian processes for relational reinforcement learning. In Proc. of the 13th International Conference on Inductive Logic Programming, volume 2835 of LNAI, pages 146--163. Springer Verlag, 2003.Google ScholarCross Ref
- T. Gärtner, P. A. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. In Learning Theory and Kernel Machines, 16th Annual Conference on Learning Theory and 7th Kernel Workshop, Proceedings. volume 2843 of LNAI, pages 129--143. Springer Verlag, 2003.Google Scholar
- T. Graepel. PAC-Bayesian Pattern Classification with Kernels. PhD thesis, TU Berlin, 2002.Google Scholar
- D. Haussler. Convolution kernels on discrete structures. Technical report, Department of Computer Science, University of California at Santa Cruz, 1999.Google Scholar
- T. Joachims. Making large--scale SVM learning practical. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods --- Support Vector Learning, pages 169--184. MIT Press, 1999. Google ScholarDigital Library
- H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In Proc. of the 20th International Conference on Machine Learning, pages 321--328. AAAI Press, 2003.Google Scholar
- S. Kramer, L. D. Raedt, and C. Helma. Molecular feature mining in HIV data. In Proc. of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 136--143. ACM Press, 2001. Google ScholarDigital Library
- S. R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. The web as a graph. In Proc. of the 19th ACM Symposium on Principles of Database Systems, pages 1--10. ACM Press, 2000. Google ScholarDigital Library
- M. Kuramochi and G. Karypis. Frequent subgraph discovery. In Proc. of the IEEE International Conference on Data Mining, pages 313--320, IEEE Computer Society, 2001. Google ScholarDigital Library
- C. Leslie and R. Kuang. Fast kernels for inexact string matching. In Learning Theory and Kernel Machines, 16th Annual Conference on Learning Theory and 7th Kernel Workshop, Proceedings. volume 2843 of LNAI, pages 114--128. Springer Verlag, 2003.Google Scholar
- H. Lodhi, J. Shawe-Taylor, N. Christianini, and C. Watkins. Text classification using string kernels. In T. Leen, T. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems, volume 13. MIT Press, 2001.Google Scholar
- F. Provost, T. Fawcett, and R. Kohavi. The case against accuracy estimation for comparing induction algorithms. In Proc. of the 15th International Conference on Machine Learning, pages 445--453. Morgan Kaufmann, 1998. Google ScholarDigital Library
- F. J. Provost and T. Fawcett. Robust classification for imprecise environments. Machine Learning, 42(3):203--231, 2001. Google ScholarDigital Library
- R. C. Read and R. E. Tarjan. Bounds on backtrack algorithms for listing cycles, paths, and spanning trees. Networks, 5(3):237--252, 1975.Google ScholarDigital Library
- B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2002.Google Scholar
- R. Tarjan. Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1(2):146--160, 1972.Google ScholarDigital Library
- L. G. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410--421, 1979.Google ScholarDigital Library
- V. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, 1995. Google ScholarDigital Library
- P. Vismara. Union of all the minimum cycle bases of a graph. The Electronic Journal of Combinatorics, 4(1):73--87, 1997.Google ScholarCross Ref
- C. Watkins. Kernels from matching operations. Technical report, Department of Computer Science, Royal Holloway, University of London, 1999.Google Scholar
- M. Zaki. Efficiently mining frequent trees in a forest. In Proc. of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 71--80. ACM Press, 2002. Google ScholarDigital Library
Index Terms
Cyclic pattern kernels for predictive graph mining
Recommendations
Mining frequent cross-graph quasi-cliques
Joint mining of multiple datasets can often discover interesting, novel, and reliable patterns which cannot be obtained solely from any single source. For example, in bioinformatics, jointly mining multiple gene expression datasets obtained by different ...
Mining Frequent Subgraph Patterns from Uncertain Graph Data
In many real applications, graph data is subject to uncertainties due to incompleteness and imprecision of data. Mining such uncertain graph data is semantically different from and computationally more challenging than mining conventional exact graph ...
On mining cross-graph quasi-cliques
KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data miningJoint mining of multiple data sets can often discover interesting, novel, and reliable patterns which cannot be obtained solely from any single source. For example, in cross-market customer segmentation, a group of customers who behave similarly in ...
Comments