|
ABSTRACT
A number of vertical mining algorithms have been proposed recently for association mining, which have shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is support for fast frequency counting via intersection operations on transaction ids (tids) and automatic pruning of irrelevant data. The main problem with these approaches is when intermediate results of vertical tid lists become too large for memory, thus affecting the algorithm scalability.In this paper we present a novel vertical data representation called Diffset, that only keeps track of differences in the tids of a candidate pattern from its generating frequent patterns. We show that diffsets drastically cut down the size of memory required to store intermediate results. We show how diffsets, when incorporated into previous vertical mining methods, increase the performance significantly.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
Rakesh Agrawal , Hiekki Mannila , Ramakrishnan Srikant , Hannu Toivonen , A. Inkeri Verkamo, Fast discovery of association rules, Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, Menlo Park, CA, 1996
|
 |
2
|
Ramesh C. Agarwal , Charu C. Aggarwal , V. V. V. Prasad, Depth first generation of long patterns, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, p.108-118, August 20-23, 2000, Boston, Massachusetts, United States
[doi> 10.1145/347090.347114]
|
 |
3
|
Jay Ayres , Jason Flannick , Johannes Gehrke , Tomi Yiu, Sequential PAttern mining using a bitmap representation, Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, Alberta, Canada
[doi> 10.1145/775047.775109]
|
 |
4
|
|
 |
5
|
Sergey Brin , Rajeev Motwani , Jeffrey D. Ullman , Shalom Tsur, Dynamic itemset counting and implication rules for market basket data, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.255-264, May 11-15, 1997, Tucson, Arizona, United States
|
| |
6
|
|
| |
7
|
|
| |
8
|
|
| |
9
|
|
| |
10
|
|
 |
11
|
Jiawei Han , Jian Pei , Yiwen Yin, Mining frequent patterns without candidate generation, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.1-12, May 15-18, 2000, Dallas, Texas, United States
|
| |
12
|
|
| |
13
|
|
 |
14
|
Jong Soo Park , Ming-Syan Chen , Philip S. Yu, An effective hash-based algorithm for mining association rules, Proceedings of the 1995 ACM SIGMOD international conference on Management of data, p.175-186, May 22-25, 1995, San Jose, California, United States
|
| |
15
|
|
| |
16
|
J. Pei, J. Han, and R. Mao. Closet: An efficient algorithm for mining frequent closed itemsets. In SIGMOD Int'l Workshop on Data Mining and Knowledge Discovery, May 2000.
|
 |
17
|
Sunita Sarawagi , Shiby Thomas , Rakesh Agrawal, Integrating association rule mining with relational database systems: alternatives and implications, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.343-354, June 01-04, 1998, Seattle, Washington, United States
|
| |
18
|
|
| |
19
|
|
 |
20
|
Pradeep Shenoy , Jayant R. Haritsa , S. Sundarshan , Gaurav Bhalotia , Mayank Bawa , Devavrat Shah, Turbo-charging vertical mining of large databases, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.22-33, May 15-18, 2000, Dallas, Texas, United States
|
 |
21
|
|
| |
22
|
|
| |
23
|
M. J. Zaki and C.-J. Hsiao. CHARM: An efficient algorithm for closed itemset mining. In 2nd SIAM Int'l Conf. on Data Mining, April 2002.
|
| |
24
|
M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery of association rules. In 3rd Intl. Conf. on Knowledge Discovery and Data Mining, August 1997.
|
CITED BY 26
|
|
|
|
|
|
|
|
Guangzhu Yu , Keqing Li , Shihuang Shao, Mining high utility itemsets in large high dimensional data, Proceedings of the 1st international conference on Forensic applications and techniques in telecommunications, information, and multimedia and workshop, January 21-23, 2008, Adelaide, Australia
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Xin Zhang , Nikos Mamoulis , David W. Cheung , Yutao Shou, Fast mining of spatial collocations, Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, August 22-25, 2004, Seattle, WA, USA
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Peer to Peer - Readers of this Article have also read:
-
Data structures for quadtree approximation and compression
Communications of the ACM
28, 9
Hanan Samet
-
A hierarchical single-key-lock access control using the Chinese remainder theorem
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee
, Huizhu Lu
, D. D. Fisher
-
The GemStone object database management system
Communications of the ACM
34, 10
Paul Butterworth
, Allen Otis
, Jacob Stein
-
Putting innovation to work: adoption strategies for multimedia communication systems
Communications of the ACM
34, 12
Ellen Francik
, Susan Ehrlich Rudman
, Donna Cooper
, Stephen Levine
-
An intelligent component database for behavioral synthesis
Proceedings of the 27th ACM/IEEE conference on Design automation
Gwo-Dong Chen
, Daniel D. Gajski
|