DOI: 10.1145/1141277.1141404

A probability analysis for candidate-based frequent itemset algorithms

Published: 23 April 2006

Abstract

This paper explores the generation of candidates, which is an important step in frequent itemset mining algorithms, from a theoretical point of view. Important notions in our probabilistic analysis are success (a candidate that is frequent), and failure (a candidate that is infrequent). For a selection of candidate-based frequent itemset mining algorithms, the probabilities of these events are studied for the shopping model where all the shoppers are independent and each combination of items has its own probability, so any correlation between items is possible. The Apriori Algorithm is considered in detail; for AIS, Eclat, FP-growth and the Fast Completion Apriori Algorithm, the main principles are sketched. The results of the analysis are used to compare the behaviour of the algorithms for a variety of data distributions.
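
To make the notions of success and failure concrete, the following is a minimal sketch of a level-wise, Apriori-style run that counts, per level, how many generated candidates turn out to be frequent (successes) and how many turn out to be infrequent (failures). The function name `apriori_success_failure`, the toy transactions, and the support threshold are illustrative assumptions and are not taken from the paper; the paper's probabilistic shopping model itself is not implemented here.

```python
from itertools import combinations


def apriori_success_failure(transactions, min_support):
    """Level-wise Apriori sketch (hypothetical helper, not the paper's code).

    Returns (frequent, stats), where frequent maps each frequent itemset to
    its support count and stats maps level k to (successes, failures).
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    # Level 1: every single item is treated as a candidate.
    candidates = [frozenset([item]) for item in items]
    frequent, stats = {}, {}
    k = 1
    while candidates:
        level_frequent = []
        successes = failures = 0
        for cand in candidates:
            # Support = number of transactions containing the candidate.
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                frequent[cand] = support
                level_frequent.append(cand)
                successes += 1  # candidate is frequent
            else:
                failures += 1   # candidate is infrequent
        stats[k] = (successes, failures)
        # Generate (k+1)-candidates: join frequent k-itemsets, then keep
        # only those whose k-subsets are all frequent.
        level_set = set(level_frequent)
        next_candidates = set()
        for a, b in combinations(level_frequent, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in level_set for sub in combinations(union, k)
            ):
                next_candidates.add(union)
        candidates = sorted(next_candidates, key=sorted)
        k += 1
    return frequent, stats


# Toy data (assumed for illustration only).
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent, stats = apriori_success_failure(toy, min_support=3)
print(stats)
```

On this toy data every single item and every pair is a success, while the one 3-item candidate is a failure: all of its subsets are frequent, so it must be generated and counted, yet it appears in only two transactions.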


Published In

SAC '06: Proceedings of the 2006 ACM symposium on Applied computing
April 2006
1967 pages
ISBN:1595931082
DOI:10.1145/1141277

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SAC06

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Cited By

  • (2010) Mining uncertain data for frequent itemsets that satisfy aggregate constraints. Proceedings of the 2010 ACM Symposium on Applied Computing, pages 1034-1038. DOI: 10.1145/1774088.1774305. Online publication date: 22-Mar-2010
  • (2009) Scalable APRIORI-Based Frequent Pattern Discovery. Proceedings of the 2009 International Conference on Computational Science and Engineering - Volume 01, pages 48-55. DOI: 10.1109/CSE.2009.51. Online publication date: 29-Aug-2009
  • (2007) The complexity of satisfying constraints on databases of transactions. Acta Informatica, 44:7, pages 591-624. DOI: 10.1007/s00236-007-0060-1. Online publication date: 9-Nov-2007
  • (2006) Peak-Jumping Frequent Itemset Mining Algorithms. Knowledge Discovery in Databases: PKDD 2006, pages 487-494. DOI: 10.1007/11871637_47. Online publication date: 2006