Extracting predicates from mining models for efficient query evaluation
Source: ACM Transactions on Database Systems (TODS), Volume 29, Issue 3 (September 2004)
Pages: 508-544
Year of Publication: 2004
ISSN: 0362-5915
Authors
Surajit Chaudhuri, Microsoft Corporation, Redmond, WA
Vivek Narasayya, Microsoft Corporation, Redmond, WA
Sunita Sarawagi, IIT Bombay, Mumbai, India
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/1016028.1016031

ABSTRACT

Modern relational database systems are beginning to support ad hoc queries on mining models. In this article, we explore novel techniques for optimizing queries that contain predicates on the results of applying mining models to relational data. For such queries, we use the internal structure of the mining model to automatically derive traditional database predicates. We present algorithms for deriving such predicates for a large class of popular discrete mining models: decision trees, naive Bayes, clustering, and linear support vector machines. Our experiments on Microsoft SQL Server demonstrate that these derived predicates can significantly reduce the cost of evaluating such queries.
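The abstract does not spell out how a predicate is derived. As a minimal sketch for the decision-tree case only, assume a binary tree with numeric threshold splits (the `Leaf`/`Split` classes and the functions `paths_to_label`, `to_sql`, `predict`, and `holds` are hypothetical names). Under that assumption, "the model predicts class c" is equivalent to the disjunction of the root-to-leaf paths that end in a leaf labelled c. The sketch is this textbook path-enumeration idea, not necessarily the authors' algorithm, which may simplify or approximate the resulting predicate.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

@dataclass
class Leaf:
    label: str  # class predicted at this leaf

@dataclass
class Split:
    attr: str         # column tested at this node
    threshold: float  # rows with attr <= threshold go left, others go right
    left: "Node"
    right: "Node"

Node = Union[Leaf, Split]
Cond = Tuple[str, str, float]  # (column, "<=" or ">", constant)

def paths_to_label(node: Node, label: str,
                   prefix: Optional[List[Cond]] = None) -> List[List[Cond]]:
    """Collect every root-to-leaf path whose leaf predicts `label`."""
    prefix = prefix or []
    if isinstance(node, Leaf):
        return [prefix] if node.label == label else []
    return (paths_to_label(node.left, label, prefix + [(node.attr, "<=", node.threshold)])
            + paths_to_label(node.right, label, prefix + [(node.attr, ">", node.threshold)]))

def to_sql(paths: List[List[Cond]]) -> str:
    """Render the paths as an OR of ANDs (DNF) suitable for a WHERE clause."""
    if not paths:
        return "1 = 0"  # no leaf predicts the label, so no row can qualify
    if any(not p for p in paths):
        return "1 = 1"  # the root itself is a matching leaf, so every row qualifies
    return " OR ".join(
        "(" + " AND ".join(f"{a} {op} {c:g}" for a, op, c in p) + ")" for p in paths)

def predict(node: Node, row: dict) -> str:
    """Apply the tree to one row (the mining-model side of the query)."""
    while isinstance(node, Split):
        node = node.left if row[node.attr] <= node.threshold else node.right
    return node.label

def holds(paths: List[List[Cond]], row: dict) -> bool:
    """Evaluate the derived predicate on one row, mirroring the SQL text."""
    test = {"<=": lambda x, c: x <= c, ">": lambda x, c: x > c}
    return any(all(test[op](row[a], c) for a, op, c in p) for p in paths)

# Hypothetical model over columns age and income.
tree = Split("age", 30,
             Split("income", 80000, Leaf("no"), Leaf("yes")),
             Split("income", 50000, Leaf("no"), Leaf("yes")))
yes_paths = paths_to_label(tree, "yes")
print(to_sql(yes_paths))
# (age <= 30 AND income > 80000) OR (age > 30 AND income > 50000)
```

Because the derived predicate mentions only ordinary columns, a query that filters on the model's prediction being "yes" could also carry this predicate, giving the optimizer a chance to use, for example, an index on income instead of applying the model to every row. The naive Bayes, clustering, and SVM cases named in the abstract need different derivations that this sketch does not cover.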


