skip to main content
10.1145/1014052.1014072acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

Cyclic pattern kernels for predictive graph mining

Published:22 August 2004Publication History

ABSTRACT

With applications in biology, the world-wide web, and several other areas, mining of graph-structured objects has received significant interest recently. One of the major research directions in this field is concerned with predictive data mining in graph databases where each instance is represented by a graph. Some of the proposed approaches for this task rely on the excellent classification performance of support vector machines. To control the computational cost of these approaches, the underlying kernel functions are based on frequent patterns. In contrast to these approaches, we propose a kernel function based on a natural set of cyclic and tree patterns independent of their frequency, and discuss its computational aspects. To practically demonstrate the effectiveness of our approach, we use the popular NCI-HIV molecule dataset. Our experimental results show that cyclic pattern kernels can be computed quickly and offer predictive performance superior to recent graph kernels based on frequent patterns.

References

  1. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, Chapter 12, pages 307 -- 328. AAAI/MIT Press, Cambridge, USA, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. H. Alt, U. Fuchs, and K. Kriegel. On the number of simple cycles in planar graphs. Combinatorics, Probability & Computing, 8(5):397--405, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Asai, Arimura, Uno, and Nakano. Discovering frequent substructures in large unordered trees. In Proc. of the 6th International Conference on Discovery Science, volume 2843 of LNAI, pages 47--61. Springer Verlag, 2003.Google ScholarGoogle Scholar
  4. C. Borgelt and M. R. Berthold. Mining molecular fragments: Finding relevant substructures of molecules. In Proc. of the 2002 IEEE International Conference on Data Mining, pages 51--58. IEEE Computer Society, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In D. Haussler, editor, Proc. of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144--152. ACM Press, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Collins and N. Duffy. Convolution kernels for natural language. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems, volume 14. MIT Press, 2002.Google ScholarGoogle Scholar
  7. C. Cortes, P. Haffner, and M. Mohri. Positive definite rational kernels. In Learning Theory and Kernel Machines, 16th Annual Conference on Learning Theory and 7th Kernel Workshop, Proceedings. volume 2843 of LNAI, pages 41--56. Springer Verlag, 2003.Google ScholarGoogle Scholar
  8. M. Deshpande, M. Kuramochi, and G. Karypis. Automated approaches for classifying structures. In Proc. of the 2nd ACM SIGKDD Workshop on Data Mining in Bioinformatics, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  9. M. Deshpande, M. Kuramochi, and G. Karypis. Frequent sub-structure based approaches for classifying chemical compounds. In Proc. of the 3rd IEEE International Conference on Data Mining, pages 35--42. IEEE Computer Society, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. Diestel. Graph theory. 2nd edition, Springer Verlag, 2000.Google ScholarGoogle Scholar
  11. H.-D. Ebbinghaus and J. Flum. Finite Model Theory. Perspectives in Mathematical Logic. 2nd edition, Springer Verlag, 1999.Google ScholarGoogle Scholar
  12. M. Fürer. Graph isomorphism testing without numberics for graphs of bounded eigenvalue multiplicity. In Proc. of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 624--631. ACM Press, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. T. Gärtner. Exponential and geometric kernels for graphs. In NIPS Workshop on Unreal Data: Principles of Modeling Nonvectorial Data, 2002.Google ScholarGoogle Scholar
  14. T. Gärtner, K. Driessens, and J. Ramon. Graph kernels and gaussian processes for relational reinforcement learning. In Proc. of the 13th International Conference on Inductive Logic Programming, volume 2835 of LNAI, pages 146--163. Springer Verlag, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  15. T. Gärtner, P. A. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. In Learning Theory and Kernel Machines, 16th Annual Conference on Learning Theory and 7th Kernel Workshop, Proceedings. volume 2843 of LNAI, pages 129--143. Springer Verlag, 2003.Google ScholarGoogle Scholar
  16. T. Graepel. PAC-Bayesian Pattern Classification with Kernels. PhD thesis, TU Berlin, 2002.Google ScholarGoogle Scholar
  17. D. Haussler. Convolution kernels on discrete structures. Technical report, Department of Computer Science, University of California at Santa Cruz, 1999.Google ScholarGoogle Scholar
  18. T. Joachims. Making large--scale SVM learning practical. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods --- Support Vector Learning, pages 169--184. MIT Press, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. H. Kashima, K. Tsuda, and A. Inokuchi. Marginalized kernels between labeled graphs. In Proc. of the 20th International Conference on Machine Learning, pages 321--328. AAAI Press, 2003.Google ScholarGoogle Scholar
  20. S. Kramer, L. D. Raedt, and C. Helma. Molecular feature mining in HIV data. In Proc. of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 136--143. ACM Press, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. The web as a graph. In Proc. of the 19th ACM Symposium on Principles of Database Systems, pages 1--10. ACM Press, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Kuramochi and G. Karypis. Frequent subgraph discovery. In Proc. of the IEEE International Conference on Data Mining, pages 313--320, IEEE Computer Society, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. C. Leslie and R. Kuang. Fast kernels for inexact string matching. In Learning Theory and Kernel Machines, 16th Annual Conference on Learning Theory and 7th Kernel Workshop, Proceedings. volume 2843 of LNAI, pages 114--128. Springer Verlag, 2003.Google ScholarGoogle Scholar
  24. H. Lodhi, J. Shawe-Taylor, N. Christianini, and C. Watkins. Text classification using string kernels. In T. Leen, T. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems, volume 13. MIT Press, 2001.Google ScholarGoogle Scholar
  25. F. Provost, T. Fawcett, and R. Kohavi. The case against accuracy estimation for comparing induction algorithms. In Proc. of the 15th International Conference on Machine Learning, pages 445--453. Morgan Kaufmann, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. F. J. Provost and T. Fawcett. Robust classification for imprecise environments. Machine Learning, 42(3):203--231, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. R. C. Read and R. E. Tarjan. Bounds on backtrack algorithms for listing cycles, paths, and spanning trees. Networks, 5(3):237--252, 1975.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2002.Google ScholarGoogle Scholar
  29. R. Tarjan. Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1(2):146--160, 1972.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. L. G. Valiant. The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3):410--421, 1979.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. V. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. P. Vismara. Union of all the minimum cycle bases of a graph. The Electronic Journal of Combinatorics, 4(1):73--87, 1997.Google ScholarGoogle ScholarCross RefCross Ref
  33. C. Watkins. Kernels from matching operations. Technical report, Department of Computer Science, Royal Holloway, University of London, 1999.Google ScholarGoogle Scholar
  34. M. Zaki. Efficiently mining frequent trees in a forest. In Proc. of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 71--80. ACM Press, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Cyclic pattern kernels for predictive graph mining

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
          August 2004
          874 pages
          ISBN:1581138881
          DOI:10.1145/1014052

          Copyright © 2004 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 22 August 2004

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate1,133of8,635submissions,13%

          Upcoming Conference

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader