ACM Home Page
Please provide us with feedback. Feedback
COFI approach for mining frequent itemsets revisited
Full text PdfPdf (267 KB)
Source Data Mining And Knowledge Discovery archive
Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery table of contents
Paris, France
SESSION: Short papers table of contents
Pages: 70 - 75  
Year of Publication: 2004
ISBN:1-58113-908-X
Authors
Mohammad El-Hajj  University of Alberta, Edmonton, AB, Canada
Osmar R. Zaïane  University of Alberta, Edmonton, AB, Canada
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 2,   Downloads (12 Months): 46,   Citation Count: 0
Additional Information:

abstract   references   collaborative colleagues   peer to peer  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1008694.1008706
What is a DOI?

ABSTRACT

The COFI approach for mining frequent itemsets, introduced recently, is an efficient algorithm that was demonstrated to outperform state-of-the-art algorithms on synthetic data. For instance, COFI is not only one order of magnitude faster and requires significantly less memory than the popular FP-Growth, it is also very effective with extremely large datasets, better than any reported algorithm. However, COFI has a significant drawback when mining dense transactional databases which is the case with some real datasets. The algorithm performs poorly in these cases because it ends up generating too many local candidates that are doomed to be infrequent. In this paper, we present a new algorithm COFI* for mining frequent itemsets. This novel algorithm uses the same data structure COFI-tree as its predecessor, but partitions the patterns in such a way to avoid the drawbacks of COFI. Moreover, its approach uses a pseudo-Oracle to pinpoint the maximal itemsets, from which all frequent itemsets are derived and counted, avoiding the generation of candidates fated infrequent. Our implementation tested on real and synthetic data shows that COFI* algorithm outperforms state-of-the-art algorithms, among them COFI itself.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
3
4
 
5
6
 
7
M. El-Hajj and O. R. Zaïane. Non recursive generation of frequent k-itemsets from frequent pattern tree representations. In In Proc. of 5th International Conference on Data Warehousing and Knowledge Discovery (DaWak'2003), pages 371--380, September 2003.
 
8
B. Goethals. Frequent pattern mining implementations. http://www.cs.helsinki.fi/u/goethals/software/index.html.
 
9
B. Goethals and M. Zaki. Advances in frequent itemset mining implementations: Introduction to fimi03. In Workshop on Frequent Itemset Mining Implementations (FIMI'03) in conjunction with IEEE-ICDM, 2003.
 
10
11
 
12
IBM_Almaden. Quest synthetic data generation code. http://www.almaden.ibm.com/cs/quest/syndata.html.
 
13
 
14
A. Rungsawang, A. Tangpong, P. Laohawee, and T. Khampachua. Novel query expansion technique using apriori algorithm. In TREC, Gaithersburg, Maryland, 1999.
 
15
Collaborative Colleagues:
Mohammad El-Hajj: colleagues
Osmar R. Zaïane: colleagues

Peer to Peer - Readers of this Article have also read: