Article

Computational complexity of itemset frequency satisfiability

Author:

Toon CaldersAuthors Info & Claims

PODS '04: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Pages 143 - 154

https://doi.org/10.1145/1055558.1055580

Published: 14 June 2004 Publication History

Abstract

Computing frequent itemsets is one of the most prominent problems in data mining. We introduce a new, related problem, called FREQSAT: given some itemset-interval pairs, does there exist a database such that for every pair the frequency of the itemset falls in the interval? It is shown in this paper that FREQSAT is not finitely axiomatizable and that it is NP-complete. We also study cases in which other characteristics of the database are given as well. These characteristics can complicate FREQSAT even more. For example, when the maximal number of duplicates of a transaction is known, FREQSAT becomes PP-hard. We describe applications of FREQSAT in frequent itemset mining algorithms and privacy in data mining.

References

[1]

R. Agrawal, T. Imilienski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. ACM SIGMOD, pages 207--216, 1993.

Digital Library

[2]

R. Agrawal, and R. Srikant. Privacy-preserving data mining. In Proc. ACM SIGMOD, pages 439--450, 2000.

Digital Library

[3]

Y. Bastide, R. Taouil, N. Pasquier, G. Stumme, and L. Lakhal. Mining frequent patterns with counting inference. SIGKDD Explorations, 2(2):66--75, 2000.

Digital Library

[4]

R. J. Bayardo, Efficiently mining long patterns from databases. In Proc. ACM SIGMOD, pages 85--93, 1998.

Digital Library

[5]

T. Calders, Deducing bounds on the support of itemsets. In Database Support for Data Mining Applications, LNCS 2682, Springer, to appear 2004.

[6]

T. Calders and B. Goethals. Mining all non-derivable frequent itemsets. In Proc. PKDD Int. Conf. Principles of Data Mining and Knowledge Discovery, pages 74--85. Springer, 2002.

Digital Library

[7]

V. Chvátal. Recognizing intersection patterns. Annals of Discrete Mathematics - Combinatorics 79, 8(I):249--251, 1980.

[8]

M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-completeness. Freeman, New York, 1979.

Digital Library

[9]

G. Georgakopoulos, D. Kavvadias, and C. H. Papadimitriou. Probabilistic satisfiability. Journal of Complexity, 4:1--11, 1988.

Digital Library

[10]

J. Hipp, U. Güntzer, and G. Nakhaeizadeh. Algorithms for association rule mining - a general survey and comparison. SIGKDD Explorations, 2(1):58--64, 2000.

Digital Library

[11]

T. Lukasiewicz. Probabilistic logic programming with conditional constraints. ACM Transactions on Computational Logic, 2(3):289--339, 2001.

Digital Library

[12]

H. Mannila and H. Toivonen. Multiple uses of frequent sets and condensed representations. In Proc. KDD Int. Conf. Knowledge Discovery in Databases, 1996.

[13]

T. Mielikäinen. On inverse frequent set mining. In Workshop on Privacy Preserving Data Mining, 2003.

[14]

N. Nilsson. Probabilistic logic. Artificial Intelligence, 28:71--87, 1986.

Digital Library

[15]

C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.

Cited By

Manco GRitacco ERullo ASaccà DSerra E(2022)Machine learning methods for generating high dimensional discrete datasetsWIREs Data Mining and Knowledge Discovery10.1002/widm.145012:2Online publication date: 18-Jan-2022
https://doi.org/10.1002/widm.1450
Saccá DSerra ERullo A(2019)Extending inverse frequent itemsets mining to generate realistic datasets: complexity, accuracy and emerging applicationsData Mining and Knowledge Discovery10.1007/s10618-019-00643-133:6(1736-1774)Online publication date: 1-Nov-2019
https://dl.acm.org/doi/10.1007/s10618-019-00643-1
Wong IDobbie GKoh Y(2019)Items2Data: Generating Synthetic Boolean Datasets from ItemsetsDatabases Theory and Applications10.1007/978-3-030-12079-5_6(79-90)Online publication date: 23-Jan-2019
https://doi.org/10.1007/978-3-030-12079-5_6
Show More Cited By

Recommendations

Itemset frequency satisfiability: Complexity and axiomatization

Computing frequent itemsets is one of the most prominent problems in data mining. We study the following related problem, called FREQSAT, in depth: given some itemset-interval pairs, does there exist a database such that for every pair the frequency of ...
Non-derivable itemset mining

All frequent itemset mining algorithms rely heavily on the monotonicity principle for pruning. This principle allows for excluding candidate itemsets from the expensive counting phase. In this paper, we present sound and complete deduction rules to ...
Approximate Inverse Frequent Itemset Mining: Privacy, Complexity, and Approximation
ICDM '05: Proceedings of the Fifth IEEE International Conference on Data Mining

In order to generate synthetic basket datasets for better benchmark testing, it is important to integrate characteristics from real-life databases into the synthetic basket datasets. The characteristics that could be used for this purpose include the ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

PODS '04: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

June 2004

350 pages

ISBN:158113858X

DOI:10.1145/1055558

Conference Chair:
Catriel Beeri
Hebrew University of Jerusalem

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2004

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Qualifiers

Article

Conference

SIGMOD/PODS04

Sponsor:

SIGMOD/PODS04: International Conference on Management of Data and Symposium on Principles Database and Systems

June 14 - 16, 2004

Paris, France

Acceptance Rates

Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

35
Total Citations
View Citations
345
Total Downloads

Downloads (Last 12 months)1
Downloads (Last 6 weeks)0

Reflects downloads up to 31 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Manco GRitacco ERullo ASaccà DSerra E(2022)Machine learning methods for generating high dimensional discrete datasetsWIREs Data Mining and Knowledge Discovery10.1002/widm.145012:2Online publication date: 18-Jan-2022
https://doi.org/10.1002/widm.1450
Saccá DSerra ERullo A(2019)Extending inverse frequent itemsets mining to generate realistic datasets: complexity, accuracy and emerging applicationsData Mining and Knowledge Discovery10.1007/s10618-019-00643-133:6(1736-1774)Online publication date: 1-Nov-2019
https://dl.acm.org/doi/10.1007/s10618-019-00643-1
Wong IDobbie GKoh Y(2019)Items2Data: Generating Synthetic Boolean Datasets from ItemsetsDatabases Theory and Applications10.1007/978-3-030-12079-5_6(79-90)Online publication date: 23-Jan-2019
https://doi.org/10.1007/978-3-030-12079-5_6
Thanabal MGanesan L(2016)A Contrary Recurrent Item Set Hierarchy in PPDDMNational Academy Science Letters10.1007/s40009-016-0460-239:4(283-286)Online publication date: 15-Apr-2016
https://doi.org/10.1007/s40009-016-0460-2
Stavropoulos EVerykios VKagklis V(2016)A transversal hypergraph approach for the frequent itemset hiding problemKnowledge and Information Systems10.1007/s10115-015-0862-347:3(625-645)Online publication date: 1-Jun-2016
https://dl.acm.org/doi/10.1007/s10115-015-0862-3
Bhuiyan MHasan M(2016)Interactive knowledge discovery from hidden data through sampling of frequent patternsStatistical Analysis and Data Mining10.1002/sam.113229:4(205-229)Online publication date: 1-Aug-2016
https://dl.acm.org/doi/10.1002/sam.11322
Guzzo AMoccia LSaccà DSerra E(2013)Solving inverse frequent itemset mining with infrequency constraints via large-scale linear programsACM Transactions on Knowledge Discovery from Data10.1145/2541268.25412717:4(1-39)Online publication date: 25-Dec-2013
https://dl.acm.org/doi/10.1145/2541268.2541271
Verykios V(2013)Association rule hiding methodsWiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery10.1002/widm.10823:1(28-36)Online publication date: 1-Jan-2013
https://dl.acm.org/doi/10.1002/widm.1082
Espil M(2012)Tractable reasoning problems with fully-characterized association rulesProceedings of the 16th East European conference on Advances in Databases and Information Systems10.1007/978-3-642-33074-2_21(282-295)Online publication date: 18-Sep-2012
https://dl.acm.org/doi/10.1007/978-3-642-33074-2_21
Saccà DSerra EGuzzo A(2012)Count constraints and the inverse OLAP problemProceedings of the 7th international conference on Foundations of Information and Knowledge Systems10.1007/978-3-642-28472-4_20(352-369)Online publication date: 5-Mar-2012
https://dl.acm.org/doi/10.1007/978-3-642-28472-4_20
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten