skip to main content
10.1145/1141277.1141403acmconferencesArticle/Chapter ViewAbstractPublication PagessacConference Proceedingsconference-collections
Article

SMART-TV: a fast and scalable nearest neighbor based classifier for data mining

Published: 23 April 2006 Publication History

Abstract

K-nearest neighbors (KNN) is the simplest method for classification. Given a set of objects in a multi-dimensional feature space, the method assigns a category to an unclassified object based on the plurality of category of the k-nearest neighbors. The closeness between objects is determined using a distance measure, e.g. Euclidian distance. Despite its simplicity, KNN also has some drawbacks: 1) it suffers from expensive computational cost in training when the training set contains millions of objects; 2) its classification time is linear to the size of the training set. The larger the training set, the longer it takes to search for the k-nearest neighbors. In this paper, we propose a new algorithm, called SMART-TV (Small Absolute difference of Total Variation), that approximates a set of potential candidates of nearest neighbors by examining the absolute difference of total variation between each data object in the training set and the unclassified object. Then, the k-nearest neighbors are searched from that candidate set. We empirically evaluate the performance of our algorithm on both real and synthetic datasets and find that SMART-TV is fast and scalable. The classification accuracy of SMART-TV is high and comparable to the accuracy of the traditional KNN algorithm.

References

[1]
Abidin, T., et al. Vertical Set Squared Distance: A Fast and Scalable Technique to Compute Total Variation in Large Datasets. Proceeding of the International Conference on Computers and Their Applications (CATA), 2005, 60--65.
[2]
Ankerst, M., et al. OPTICS: Ordering Points to Identify the Clustering Structure. Proceeding of the ACM SIGMOD, 1999, 49--60.
[3]
Cover, T. M. and Hart, P. E. Nearest Neighbor Pattern Classification. IEEE Trans. on Info. Theory, 13, 1967, 21--27.
[4]
Ding, Q., Khan, M., Roy, A., and Perrizo, W. The P-tree Algebra, Proceedings of the ACMSAC, 2002, 426--431.
[5]
Grother, P. J., Candela, G. T., and Blue, J. L. Fast Implementations of Nearest Neighbor Classifiers. Pattern Recognition, 30, 1997, 459--465.
[6]
Hettich, S. and Bay, S. D. The UCI KDD Archive http://kdd.ics.uci.edu. Irvine, University of California, CA., Department of Information and Computer Science, 1999.
[7]
Iris Dataset, http://www.ailab.si/orange/doc/datasets/iris.htm.
[8]
Khan, M., Ding, Q., and Perrizo, W. K-Nearest Neighbor Classification of Spatial Data Streams using P-trees, Proceedings of the PAKDD, 2002, 517--528.
[9]
Mitchell, H. B. and Schaefer, P. A. A Soft K-Nearest Neighbor Voting Scheme. International Journal of Intelligent Systems, 16, 2001, 459--469.
[10]
Perrizo, W. Peano Count Tree Technology, Technical Report NDSU-CSOR-TR-01-1, 2001.
[11]
Serazi, M., et al. DataMIME#8482;. Proceeding of the ACM SIGMOD, 2004, 923--924.
[12]
Vries, A. P., et al. Efficient k-NN Search on Vertically Decomposed Data. Proceedings of the ACM SIGMOD, 2002, 322--333.

Cited By

View all
  • (2021)Indirect Estimation of Vertical Ground Reaction Force from a Body-Mounted INS/GPS Using Machine LearningSensors10.3390/s2104155321:4(1553)Online publication date: 23-Feb-2021
  • (2019)Using centrality measures to extract core pattern of brain dynamics during the resting state.Computer Methods and Programs in Biomedicine10.1016/j.cmpb.2019.104985(104985)Online publication date: Jul-2019
  • (2017)Sentiment Trends and Classifying Stocks Using P-TreesFrom Social Data Mining and Analysis to Prediction and Community Detection10.1007/978-3-319-51367-6_5(103-121)Online publication date: 22-Mar-2017
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SAC '06: Proceedings of the 2006 ACM symposium on Applied computing
April 2006
1967 pages
ISBN:1595931082
DOI:10.1145/1141277
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 April 2006

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. k-nearest neighbors classification
  2. vertical data structure
  3. vertical set squared distance

Qualifiers

  • Article

Conference

SAC06
Sponsor:

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Upcoming Conference

SAC '25
The 40th ACM/SIGAPP Symposium on Applied Computing
March 31 - April 4, 2025
Catania , Italy

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 20 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2021)Indirect Estimation of Vertical Ground Reaction Force from a Body-Mounted INS/GPS Using Machine LearningSensors10.3390/s2104155321:4(1553)Online publication date: 23-Feb-2021
  • (2019)Using centrality measures to extract core pattern of brain dynamics during the resting state.Computer Methods and Programs in Biomedicine10.1016/j.cmpb.2019.104985(104985)Online publication date: Jul-2019
  • (2017)Sentiment Trends and Classifying Stocks Using P-TreesFrom Social Data Mining and Analysis to Prediction and Community Detection10.1007/978-3-319-51367-6_5(103-121)Online publication date: 22-Mar-2017
  • (2016)Prediction and analysis of Rheumatic heart disease using kNN classification with ACO2016 International Conference on Data Mining and Advanced Computing (SAPIENCE)10.1109/SAPIENCE.2016.7684132(68-73)Online publication date: Mar-2016
  • (2016)KNN-based dynamic query-driven sample rescaling strategy for class imbalance learningNeurocomputing10.1016/j.neucom.2016.01.043191:C(363-373)Online publication date: 26-May-2016
  • (2015)Classifying Stocks using P-Trees and Investor SentimentProceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 201510.1145/2808797.2808845(1362-1367)Online publication date: 25-Aug-2015
  • (2015)An Efficient Method to Face and Emotion Detection2015 Fifth International Conference on Communication Systems and Network Technologies10.1109/CSNT.2015.155(493-497)Online publication date: Apr-2015
  • (2008)Extensions of the k Nearest Neighbour methods for classification problemsProceedings of the 26th IASTED International Conference on Artificial Intelligence and Applications10.5555/1712759.1712765(23-28)Online publication date: 6-Feb-2008
  • (2006)Parameter optimized, vertical, nearest-neighbor-vote and boundary-based classificationACM SIGKDD Explorations Newsletter10.1145/1233321.12333298:2(63-69)Online publication date: 1-Dec-2006
  • (2006)Efficient Image Classification on Vertically Decomposed DataProceedings of the 22nd International Conference on Data Engineering Workshops10.1109/ICDEW.2006.52Online publication date: 3-Apr-2006

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media