| Inverted matrix: efficient discovery of frequent items in large datasets in the context of interactive mining |
| Full text |
Pdf
(198 KB)
|
| Source
|
Conference on Knowledge Discovery in Data
archive
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
table of contents
Washington, D.C.
SESSION: Research track
table of contents
Pages: 109 - 118
Year of Publication: 2003
ISBN:1-58113-737-0
|
|
Authors
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 4, Downloads (12 Months): 54, Citation Count: 3
|
|
|
ABSTRACT
Existing association rule mining algorithms suffer from many problems when mining massive transactional datasets. One major problem is the high memory dependency: either the gigantic data structure built is assumed to fit in main memory, or the recursive mining process is too voracious in memory resources. Another major impediment is the repetitive and interactive nature of any knowledge discovery process. To tune parameters, many runs of the same algorithms are necessary leading to the building of these huge data structures time and again. This paper proposes a new disk-based association rule mining algorithm called Inverted Matrix, which achieves its efficiency by applying three new ideas. First, transactional data is converted into a new database layout called Inverted Matrix that prevents multiple scanning of the database during the mining phase, in which finding frequent patterns could be achieved in less than a full scan with random access. Second, for each frequent item, a relatively small independent tree is built summarizing co-occurrences. Finally, a simple and non-recursive mining process reduces the memory requirements as minimum candidacy generation and counting is needed. Experimental studies reveal that our Inverted Matrix approach outperform FP-Tree especially in mining very large transactional databases with a very large number of unique items. Our random access disk-based approach is particularly advantageous in a repetitive and interactive setting.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Rakesh Agrawal , Tomasz Imieliński , Arun Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.207-216, May 25-28, 1993, Washington, D.C., United States
|
| |
2
|
|
| |
3
|
IBM. Almaden. Quest synthetic data generation code. http://www.almaden.ibm.com/cs/quest/syndata.html.
|
| |
4
|
|
| |
5
|
C. Borgelt. Apriori implementation. http://fuzzy.cs.uni- magdeburg.de/~borgelt/apriori/apriori.html.
|
 |
6
|
Sergey Brin , Rajeev Motwani , Jeffrey D. Ullman , Shalom Tsur, Dynamic itemset counting and implication rules for market basket data, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.255-264, May 11-15, 1997, Tucson, Arizona, United States
|
| |
7
|
|
| |
8
|
|
 |
9
|
Jiawei Han , Jian Pei , Yiwen Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.1-12, May 15-18, 2000, Dallas, Texas, United States
|
 |
10
|
|
| |
11
|
|
 |
12
|
Junqiang Liu , Yunhe Pan , Ke Wang , Jiawei Han, Mining frequent item sets by opportunistic projection, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
[doi> 10.1145/775047.775081]
|
 |
13
|
Jong Soo Park , Ming-Syan Chen , Philip S. Yu, An effective hash-based algorithm for mining association rules, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.175-186, May 22-25, 1995, San Jose, California, United States
|
| |
14
|
|
| |
15
|
M. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In 3rd Intl. Conf. on Knowledge Discovery and Data Mining, 1997.
|
| |
16
|
|
Peer to Peer - Readers of this Article have also read:
-
Data structures for quadtree approximation and compression
Communications of the ACM
28, 9
Hanan Samet
-
A hierarchical single-key-lock access control using the Chinese remainder theorem
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee
, Huizhu Lu
, D. D. Fisher
-
The GemStone object database management system
Communications of the ACM
34, 10
Paul Butterworth
, Allen Otis
, Jacob Stein
-
Putting innovation to work: adoption strategies for multimedia communication systems
Communications of the ACM
34, 12
Ellen Francik
, Susan Ehrlich Rudman
, Donna Cooper
, Stephen Levine
-
An intelligent component database for behavioral synthesis
Proceedings of the 27th ACM/IEEE conference on Design automation
Gwo-Dong Chen
, Daniel D. Gajski
|