Processing frequent itemset discovery queries by division and set containment join operators
Source Data Mining And Knowledge Discovery
Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
San Diego, California
SESSION: DB integration
Pages: 20 - 27  
Year of Publication: 2003
Author
Ralf Rantzau  University of Stuttgart, Universitätsstraße, Stuttgart, Germany
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/882082.882089

ABSTRACT

SQL-based data mining algorithms are rarely used in practice today. Most performance experiments have shown that SQL-based approaches are inferior to main-memory algorithms. Nevertheless, database vendors try to integrate analysis functionalities to some extent into their query execution and optimization components in order to narrow the gap between data and processing. Such database support is particularly important when data mining applications need to analyze very large datasets or when they need to access current data, not a possibly outdated copy of it.

We investigate approaches based on SQL for the problem of finding frequent itemsets in a transaction table, including an algorithm that we recently proposed, called Quiver, which employs universal and existential quantifications. This approach employs a table schema for itemsets that is similar to the commonly used vertical layout for transactions: each item of an itemset is stored in a separate row. We argue that expressing the frequent itemset discovery problem using quantifications offers interesting opportunities to process such queries using set containment join or set containment division operators, which are not yet available in commercial database systems. Initial performance experiments reveal that Quiver cannot be processed efficiently by commercial DBMS. However, our experiments with query execution plans that use operators realizing set containment tests suggest that efficient processing of Quiver is possible.
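To make the idea of quantification over a vertical layout concrete, here is a minimal sketch in Python with SQLite. It is not the Quiver query from the paper; the table names (transactions, candidates), column names, and the function count_support are hypothetical, chosen only for illustration. Both tables store one item per row, and support is counted with a doubly nested NOT EXISTS, a common SQL formulation of universal quantification ("there is no item of the itemset that the transaction lacks").

```python
import sqlite3


def count_support(transactions, candidates, min_support=1):
    """Count, per candidate itemset, the transactions that contain all its items.

    Illustrative assumption: both inputs are dicts mapping an integer id
    to a set of items. They are stored in a vertical layout, one item per row.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (tid INTEGER, item TEXT)")
    conn.execute("CREATE TABLE candidates (itemset INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO transactions VALUES (?, ?)",
        [(tid, item) for tid, items in transactions.items() for item in items],
    )
    conn.executemany(
        "INSERT INTO candidates VALUES (?, ?)",
        [(cid, item) for cid, items in candidates.items() for item in items],
    )
    # A transaction t supports itemset c if no item of c is missing from t.
    rows = conn.execute(
        """
        SELECT c.itemset, COUNT(*) AS support
        FROM (SELECT DISTINCT itemset FROM candidates) AS c,
             (SELECT DISTINCT tid FROM transactions) AS t
        WHERE NOT EXISTS (
            SELECT 1 FROM candidates AS c2
            WHERE c2.itemset = c.itemset
              AND NOT EXISTS (
                  SELECT 1 FROM transactions AS t2
                  WHERE t2.tid = t.tid AND t2.item = c2.item))
        GROUP BY c.itemset
        HAVING COUNT(*) >= ?
        """,
        (min_support,),
    ).fetchall()
    conn.close()
    # Itemsets that no transaction contains do not appear in the result.
    return dict(rows)


transactions = {1: {"a", "b", "c"}, 2: {"a", "c"}, 3: {"b"}}
candidates = {10: {"a", "c"}, 20: {"b", "c"}, 30: {"d"}}
support = count_support(transactions, candidates)
frequent = count_support(transactions, candidates, min_support=2)
```

A general-purpose engine typically evaluates such a query with nested correlated subqueries over the cross product of itemsets and transactions, which hints at why a dedicated set containment operator could be attractive.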




