skip to main content
10.1145/342009.335372acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article
Free Access

Mining frequent patterns without candidate generation

Authors Info & Claims
Published:16 May 2000Publication History

ABSTRACT

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns.

In this study, we propose a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent pattern mining methods.

References

  1. 1.R. Agarwal, C. Aggarwal, and V. V. V. Prasad. Depth-first generation of large itemsets for association rules. IBM Tech. Report RC21538, July 1999.Google ScholarGoogle Scholar
  2. 2.R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. In J, Parallel and Distributed Computing, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3.R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In VLDB'9#, pp. 487-499. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4.R. Agrawal and R. Srikant. Mining sequential patterns. In ICDE'95, pp. 3-14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5.R. J. Bayardo. Efficiently mining long patterns from databases. In SIGMOD'98, pp. 85-93. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6.S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing association rules to correlations. In SIGMOD'97, pp. 265-276. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7.G. Dong and J. Li. Efficient mining of emerging patterns: Discovering trends and differences. In KDD'99, pp. 43-52. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8.G. Grahne, L. Lakshmanan, and X. Wang. Efficient mining of constrained correlated sets. In ICDE'00.Google ScholarGoogle ScholarCross RefCross Ref
  9. 9.J. Han, G. Dong, and Y. Yin. Efficient mining of partial periodic patterns in time series database. In ICDE'99, pp. 106-115. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10.J. Han, J. Pei, and Y. Yin. Mining partial periodicity using frequent pattern trees. In GS Tech, Rep, 99-10, Simon Fraser University, July 1999.Google ScholarGoogle Scholar
  11. 11.M. Kamber, J. Han, and J. Y. Chiang. Metaruleguided mining of multi-dimensional association rules using data cubes. In KDD'97, pp. 207-210.Google ScholarGoogle Scholar
  12. 12.M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A.I. Verkamo. Finding interesting rules from large sets of discovered association rules. In CIKM'9#, pp. 401-408. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. 13.B. Lent, A. Swami, and J. Widom. Clustering association rules. In ICDE'97, pp. 220-231. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. 14.H. Mannila, H Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1:259-289, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. 15.R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. In SIGMOD'98, pp. 13-24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. 16.J.S. Park, M.S. Chen, and P.S. Yu. An effective hash-based algorithm for mining association rules. In SIGMOD'95, pp. 175-186. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. 17.S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systerns: Alternatives and implications. In SIGMOD'98, pp. 343-354. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. 18.A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In VLDB'95, pp. 432-443. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. 19.C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining causal structures. In VLDB'98, pp. 594-605. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. 20.R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. In KDD'97, pp. 67-73.Google ScholarGoogle Scholar

Index Terms

  1. Mining frequent patterns without candidate generation

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data
        May 2000
        604 pages
        ISBN:1581132174
        DOI:10.1145/342009

        Copyright © 2000 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 16 May 2000

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • Article

        Acceptance Rates

        SIGMOD '00 Paper Acceptance Rate42of248submissions,17%Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader