Mining frequent patterns without candidate generation

Authors:
Jiawei Han

School of Computing Science, Simon Fraser University

School of Computing Science, Simon Fraser University
View Profile

,
Jian Pei

School of Computing Science, Simon Fraser University

School of Computing Science, Simon Fraser University
View Profile

,
Yiwen Yin

School of Computing Science, Simon Fraser University

School of Computing Science, Simon Fraser University
View Profile

SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of dataMay 2000Pages 1–12https://doi.org/10.1145/342009.335372

Published:16 May 2000Publication History

SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data

Pages 1–12

ABSTRACT

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist prolific patterns and/or long patterns.

In this study, we propose a novel frequent pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a highly condensed, much smaller data structure, which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts a pattern fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent pattern mining methods.

References

1.R. Agarwal, C. Aggarwal, and V. V. V. Prasad. Depth-first generation of large itemsets for association rules. IBM Tech. Report RC21538, July 1999.Google Scholar
2.R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. In J, Parallel and Distributed Computing, 2000. Google ScholarDigital Library
3.R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In VLDB'9#, pp. 487-499. Google ScholarDigital Library
4.R. Agrawal and R. Srikant. Mining sequential patterns. In ICDE'95, pp. 3-14. Google ScholarDigital Library
5.R. J. Bayardo. Efficiently mining long patterns from databases. In SIGMOD'98, pp. 85-93. Google ScholarDigital Library
6.S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing association rules to correlations. In SIGMOD'97, pp. 265-276. Google ScholarDigital Library
7.G. Dong and J. Li. Efficient mining of emerging patterns: Discovering trends and differences. In KDD'99, pp. 43-52. Google ScholarDigital Library
8.G. Grahne, L. Lakshmanan, and X. Wang. Efficient mining of constrained correlated sets. In ICDE'00.Google ScholarCross Ref
9.J. Han, G. Dong, and Y. Yin. Efficient mining of partial periodic patterns in time series database. In ICDE'99, pp. 106-115. Google ScholarDigital Library
10.J. Han, J. Pei, and Y. Yin. Mining partial periodicity using frequent pattern trees. In GS Tech, Rep, 99-10, Simon Fraser University, July 1999.Google Scholar
11.M. Kamber, J. Han, and J. Y. Chiang. Metaruleguided mining of multi-dimensional association rules using data cubes. In KDD'97, pp. 207-210.Google Scholar
12.M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A.I. Verkamo. Finding interesting rules from large sets of discovered association rules. In CIKM'9#, pp. 401-408. Google ScholarDigital Library
13.B. Lent, A. Swami, and J. Widom. Clustering association rules. In ICDE'97, pp. 220-231. Google ScholarDigital Library
14.H. Mannila, H Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1:259-289, 1997. Google ScholarDigital Library
15.R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. In SIGMOD'98, pp. 13-24. Google ScholarDigital Library
16.J.S. Park, M.S. Chen, and P.S. Yu. An effective hash-based algorithm for mining association rules. In SIGMOD'95, pp. 175-186. Google ScholarDigital Library
17.S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systerns: Alternatives and implications. In SIGMOD'98, pp. 343-354. Google ScholarDigital Library
18.A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In VLDB'95, pp. 432-443. Google ScholarDigital Library
19.C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining causal structures. In VLDB'98, pp. 594-605. Google ScholarDigital Library
20.R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. In KDD'97, pp. 67-73.Google Scholar

Index Terms

Mining frequent patterns without candidate generation
1. Information systems
  1. Data management systems
    1. Database design and models
  2. Information systems applications
    1. Data mining

Recommendations

Mining frequent patterns without candidate generation

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test ...
Read More
Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test ...
Read More
Mining frequent closed itemsets without candidate generation
ISPA'05: Proceedings of the Third international conference on Parallel and Distributed Processing and Applications

Mining frequent closed itemsets provides complete and non-redundant result for the analysis of frequent pattern. Most of the previous studies adopted the FP-tree based conditional FP-tree generation and candidate itemsets generation-and-test approaches. ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data
May 2000
604 pages
ISBN:1581132174
DOI:10.1145/342009
Chairmen:
Maggie Dunham
Southern Methodist Univ.
,
Jeffrey F. Naughton
Univ. of Wisconsin-Madison
,
Weidong Chen
Southern Methodist Univ.
,
Nick Koudas
AT &T Labs
ACM SIGMOD Record Volume 29, Issue 2
June 2000
609 pages
ISSN:0163-5808
DOI:10.1145/335191
Editors:
Weidong Chen
Southern Methodist Univ., Dallas, TX
,
Jeffrey Naughton
Univ. of Wisconsin-Madison, Madison
,
Philip A. Bernstein
Microsoft
Issue’s Table of Contents
Copyright © 2000 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 16 May 2000
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- Article
Conference

Acceptance Rates
SIGMOD '00 Paper Acceptance Rate42of248submissions,17%Overall Acceptance Rate785of4,003submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 4,910
  Total Citations
  View Citations
- 7,564
  Total Downloads
- Downloads (Last 12 months)2,449
- Downloads (Last 6 weeks)368
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Mining frequent patterns without candidate generation

SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Mining frequent patterns without candidate generation

Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach

Mining frequent closed itemsets without candidate generation