ABSTRACT
Modern relational database systems are beginning to support ad hoc queries on mining models. In this article, we explore novel techniques for optimizing queries that contain predicates on the results of applying mining models to relational data. For such queries, we use the internal structure of the mining model to automatically derive traditional database predicates. We present algorithms for deriving such predicates for a large class of popular discrete mining models: decision trees, naive Bayes, clustering, and linear support vector machines. Our experiments on Microsoft SQL Server demonstrate that these derived predicates can significantly reduce the cost of evaluating such queries.
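To make the idea of a derived predicate concrete, the following is a minimal sketch for the decision tree case only. It is an illustration under stated assumptions, not the algorithm presented in the article: the tree representation (`Leaf`, `Split`), the function `derive_predicate`, and the column names `age` and `income` are all hypothetical. The sketch collects the conditions along every root-to-leaf path that ends in the requested class label and joins them into an ordinary SQL-style predicate that a database could evaluate, for example with an index, before applying the model.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    label: str  # class label predicted at this leaf


@dataclass
class Split:
    column: str       # hypothetical column name tested at this node
    threshold: float  # rows with column <= threshold go left, others go right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def derive_predicate(tree: Node, label: str) -> str:
    """Return a SQL-style predicate that holds for rows the tree labels `label`.

    Each root-to-leaf path ending in `label` becomes a conjunction of its
    split conditions; the paths are then combined with OR.
    """
    paths: List[List[str]] = []

    def walk(node: Node, conds: List[str]) -> None:
        if isinstance(node, Leaf):
            if node.label == label:
                paths.append(conds)
            return
        walk(node.left, conds + [f"{node.column} <= {node.threshold}"])
        walk(node.right, conds + [f"{node.column} > {node.threshold}"])

    walk(tree, [])

    if not paths:
        return "1 = 0"  # no leaf predicts this label, so no row can qualify

    def conjunction(conds: List[str]) -> str:
        # An empty path means the tree is a single leaf: every row qualifies.
        return "(" + " AND ".join(conds) + ")" if conds else "(1 = 1)"

    return " OR ".join(conjunction(p) for p in paths)


# Toy tree (assumed for illustration): young customers and low earners are
# labelled "low"; everyone else is labelled "high".
toy_tree = Split(
    column="age",
    threshold=30,
    left=Leaf("low"),
    right=Split(column="income", threshold=50000,
                left=Leaf("low"), right=Leaf("high")),
)

high_predicate = derive_predicate(toy_tree, "high")
low_predicate = derive_predicate(toy_tree, "low")
```

In this sketch, a query that asks for rows whose predicted label is "high" could add `high_predicate` to its WHERE clause, letting the optimizer filter on `age` and `income` directly. For a decision tree the derived predicate is exact; for the other model types named in the abstract a derived predicate would generally need to be a superset condition, which this sketch does not attempt.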