|
ABSTRACT
SQL-based data mining algorithms are rarely used in practice today. Most performance experiments have shown that SQL-based approaches are inferior to main-memory algorithms. Nevertheless, database vendors try to integrate analysis functionalities to some extent into their query execution and optimization components in order to narrow the gap between data and processing. Such a database support is particularly important when data mining applicatons need to analyze very large datasets or when they need access current data, not a possibly outdated copy of it.We investigate approaches based on SQL for the problem of finding frequent itemsets in a transaction table, including an algorithm that we recently proposed, called Quiver, which employs universal and existential quantifications. This approach employs a table schema for itemsets that is similar to the commonly used vertical layout for transactions: each item of an itemset is stored in a separate row. We argue that expressing the frequent itemset discovery problem using quantifications offers interesting opportunities to process such queries using set containment join or set containment division operators, which are not yet available in commercial database systems. Initial performance experiments reveal that Quiver cannot be processed efficiently by commercial DBMS. However, our experiments with query execution plans that use operators realizing set containment tests suggest that an efficient processing of Quiver is possible.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
| |
3
|
Jochen Van den Bercken , Björn Blohsfeld , Jens-Peter Dittrich , Jürgen Krämer , Tobias Schäfer , Martin Schneider , Bernhard Seeger, XXL - A Library Approach to Supporting Efficient Implementations of Advanced Database Queries, Proceedings of the 27th International Conference on Very Large Data Bases, p.39-48, September 11-14, 2001
|
| |
4
|
J. Chen and D. DeWitt. Dynamic Re-grouping of Continuous Queries. In Proceedings VLDB, Hong Kong, China, pages 430--441, August 2002.
|
| |
5
|
M. Gimbel, M. Klein, and P. Lockemann. Interactivity, Scalability and Resource Control for Efficient KDD Support in DBMS. In Proceedings DTDM, Prague, Czech Republic, pages 37--50, March 2002.
|
 |
6
|
|
| |
7
|
|
| |
8
|
S. Helmer. Performance Enhancements for Advanced Database Management Systems. PhD thesis, University of Mannheim, Germany, December 2000.
|
| |
9
|
S. Helmer and G. Moerkotte. Compiling Away Set Containment and Intersection Joins. Technical Report, University of Mannheim, Germany.
|
| |
10
|
|
 |
11
|
|
| |
12
|
|
| |
13
|
H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient Algorithms for Discovering Association Rules. In AAAI Workshop on Knowledge and Discovery in Databases, Seattle, Washington, USA, pages 181--192, July 1994.
|
| |
14
|
|
 |
15
|
|
| |
16
|
|
| |
17
|
|
| |
18
|
|
| |
19
|
R. Rantzau. Frequent Itemset Discovery with SQL Using Universal Quantification. In P. Lanzi and R. Meo, editors, Database Support for Data Mining Applications, volume 2682 of LNCS. Springer, 2003. To appear.
|
| |
20
|
|
 |
21
|
Sunita Sarawagi , Shiby Thomas , Rakesh Agrawal, Integrating association rule mining with relational database systems: alternatives and implications, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.343-354, June 01-04, 1998, Seattle, Washington, United States
|
| |
22
|
S. Sarawagi, S. Thomas, and R. Agrawal. Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications. Research Report RJ 10107 (91923), IBM Almaden Research Center, San Jose, California, USA, March 1998.
|
| |
23
|
|
| |
24
|
|
 |
25
|
Chun Zhang , Jeffrey Naughton , David DeWitt , Qiong Luo , Guy Lohman, On supporting containment queries in relational database management systems, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.425-436, May 21-24, 2001, Santa Barbara, California, United States
|
 |
26
|
|
Peer to Peer - Readers of this Article have also read:
-
Data structures for quadtree approximation and compression
Communications of the ACM
28, 9
Hanan Samet
-
A hierarchical single-key-lock access control using the Chinese remainder theorem
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee
, Huizhu Lu
, D. D. Fisher
-
The GemStone object database management system
Communications of the ACM
34, 10
Paul Butterworth
, Allen Otis
, Jacob Stein
-
Putting innovation to work: adoption strategies for multimedia communication systems
Communications of the ACM
34, 12
Ellen Francik
, Susan Ehrlich Rudman
, Donna Cooper
, Stephen Levine
-
An intelligent component database for behavioral synthesis
Proceedings of the 27th ACM/IEEE conference on Design automation
Gwo-Dong Chen
, Daniel D. Gajski
|