DOI: 10.1145/1601966.1601981
research-article

OcVFDT: one-class very fast decision tree for one-class classification of data streams

Published: 28 June 2009

ABSTRACT

Current research on data stream classification focuses mainly on supervised learning, which requires a fully labeled data stream for training. However, fully labeled data streams are expensive to obtain, which makes the supervised approach difficult to apply in real-life applications. In this paper, we model applications such as credit fraud detection and intrusion detection as a one-class data stream classification problem. This reduces the cost of labeling, because users only need to provide some positive samples together with the unlabeled samples to the learner. Building on VFDT and POSC4.5, we propose the OcVFDT (One-class Very Fast Decision Tree) algorithm. An experimental study on both synthetic and real-life datasets shows that OcVFDT has excellent classification performance: even when 80% of the samples in the data stream are unlabeled, its classification performance remains very close to that of VFDT trained on the fully labeled stream.
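The abstract names two building blocks, VFDT and POSC4.5. The following is our own minimal sketch of those two ingredients, not the authors' code: VFDT's Hoeffding-bound test, epsilon = sqrt(R^2 ln(1/delta) / (2n)), which decides when a leaf has seen enough examples to split, and a POSC4.5-style information gain that estimates class proportions from positive and unlabeled counts alone, given an assumed positive-class prior. All function and parameter names (`pos_level`, `tau`, `pu_info_gain`, and so on) are hypothetical, weighting branches by unlabeled counts and taking n as positive plus unlabeled examples at the leaf are our assumptions, and how OcVFDT chooses the prior and combines these pieces is not described in this abstract.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """VFDT's Hoeffding bound: with probability 1 - delta, the mean of n
    independent observations of a variable with range `value_range` lies
    within the returned epsilon of its true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a two-class node whose positive fraction is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def estimated_positive_fraction(pos_node: int, pos_total: int,
                                unl_node: int, unl_total: int,
                                pos_level: float) -> float:
    """POSC4.5-style estimate of the positive fraction at a node, computed from
    positive/unlabeled counts and an assumed positive prior `pos_level`
    (hypothetical parameter name)."""
    if pos_node == 0:
        return 0.0  # no labeled positives reach this node
    if unl_node == 0:
        return 1.0  # only labeled positives reach this node
    estimate = (pos_node / pos_total) * (unl_total / unl_node) * pos_level
    return min(estimate, 1.0)  # clip: the ratio estimate can exceed 1


def pu_info_gain(children, pos_total, unl_total, pos_level):
    """Information gain of a candidate split. `children` holds one
    (positive_count, unlabeled_count) pair per branch; branches are weighted
    by unlabeled counts (an assumption of this sketch)."""
    pos_node = sum(p for p, _ in children)
    unl_node = sum(u for _, u in children)
    parent = binary_entropy(estimated_positive_fraction(
        pos_node, pos_total, unl_node, unl_total, pos_level))
    weighted = 0.0
    for p, u in children:
        if u == 0:
            continue  # a branch with no unlabeled examples gets zero weight
        frac = estimated_positive_fraction(p, pos_total, u, unl_total, pos_level)
        weighted += (u / unl_node) * binary_entropy(frac)
    return parent - weighted


def should_split(gains, n, delta=1e-7, tau=0.05, value_range=1.0):
    """VFDT split test for a leaf that has seen n examples: split on the best
    attribute when it beats the runner-up by more than the Hoeffding bound, or
    when the bound drops below the tie-breaking threshold tau. value_range=1.0
    because two-class information gain lies in [0, 1] bits."""
    ranked = sorted(gains, reverse=True) + [0.0, 0.0]  # pad if < 2 candidates
    best, second = ranked[0], ranked[1]
    epsilon = hoeffding_bound(value_range, delta, n)
    return best - second > epsilon or epsilon < tau


# Toy counts: 100 labeled positives and 400 unlabeled examples at a leaf,
# with an assumed positive prior of 0.25.
gain_a = pu_info_gain([(80, 100), (20, 300)], 100, 400, 0.25)
gain_b = pu_info_gain([(50, 200), (50, 200)], 100, 400, 0.25)
print(round(gain_a, 3), round(gain_b, 3))  # 0.366 0.0
print(should_split([gain_a, gain_b], n=500))  # True
```

In this toy example the first attribute separates labeled positives from the unlabeled mass and receives a clear gain, while the second splits both counts evenly and receives none; with 500 examples the Hoeffding bound (about 0.127) is already small enough to commit to the split.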


Published in
SensorKDD '09: Proceedings of the Third International Workshop on Knowledge Discovery from Sensor Data
June 2009
150 pages
ISBN: 9781605586687
DOI: 10.1145/1601966

Copyright © 2009 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
