ACM Home Page
Please provide us with feedback. Feedback
Inverted matrix: efficient discovery of frequent items in large datasets in the context of interactive mining
Full text PdfPdf (198 KB)
Source Conference on Knowledge Discovery in Data archive
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining table of contents
Washington, D.C.
SESSION: Research track table of contents
Pages: 109 - 118  
Year of Publication: 2003
ISBN:1-58113-737-0
Authors
Mohammad El-Hajj  University of Alberta, Edmonton, AB, Canada
Osmar R. Zaïane  University of Alberta, Edmonton, AB, Canada
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 4,   Downloads (12 Months): 54,   Citation Count: 3
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues   peer to peer  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/956750.956766
What is a DOI?

ABSTRACT

Existing association rule mining algorithms suffer from many problems when mining massive transactional datasets. One major problem is the high memory dependency: either the gigantic data structure built is assumed to fit in main memory, or the recursive mining process is too voracious in memory resources. Another major impediment is the repetitive and interactive nature of any knowledge discovery process. To tune parameters, many runs of the same algorithms are necessary leading to the building of these huge data structures time and again. This paper proposes a new disk-based association rule mining algorithm called Inverted Matrix, which achieves its efficiency by applying three new ideas. First, transactional data is converted into a new database layout called Inverted Matrix that prevents multiple scanning of the database during the mining phase, in which finding frequent patterns could be achieved in less than a full scan with random access. Second, for each frequent item, a relatively small independent tree is built summarizing co-occurrences. Finally, a simple and non-recursive mining process reduces the memory requirements as minimum candidacy generation and counting is needed. Experimental studies reveal that our Inverted Matrix approach outperform FP-Tree especially in mining very large transactional databases with a very large number of unique items. Our random access disk-based approach is particularly advantageous in a repetitive and interactive setting.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
 
2
 
3
IBM. Almaden. Quest synthetic data generation code. http://www.almaden.ibm.com/cs/quest/syndata.html.
 
4
 
5
C. Borgelt. Apriori implementation. http://fuzzy.cs.uni- magdeburg.de/~borgelt/apriori/apriori.html.
6
 
7
 
8
9
10
 
11
12
13
 
14
 
15
M. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In 3rd Intl. Conf. on Knowledge Discovery and Data Mining, 1997.
 
16


Collaborative Colleagues:
Mohammad El-Hajj: colleagues
Osmar R. Zaïane: colleagues

Peer to Peer - Readers of this Article have also read: