
Fast Approximate kNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection

Published: 01 December 2009

Abstract

Nearest neighbor graphs are widely used in data mining and machine learning. A brute-force method to compute the exact kNN graph takes Θ(dn²) time for n data points in the d-dimensional Euclidean space. We propose two divide-and-conquer methods for computing an approximate kNN graph in Θ(dnᵗ) time for high dimensional data (large d). The exponent t ∈ (1,2) is an increasing function of an internal parameter α which governs the size of the common region in the divide step. Experiments show that a high quality graph can usually be obtained with small overlaps, that is, for small values of t. A few of the practical details of the algorithms are as follows. First, the divide step uses an inexpensive Lanczos procedure to perform recursive spectral bisection. After each conquer step, an additional refinement step is performed to improve the accuracy of the graph. Finally, a hash table is used to avoid repeating distance calculations during the divide-and-conquer process. The combination of these techniques is shown to yield quite effective algorithms for building kNN graphs.
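To make the divide-and-conquer structure concrete, the following is a minimal Python sketch of an approximate kNN graph builder in this spirit. It is an illustration, not the authors' implementation: the function and parameter names (knn_graph, alpha, leaf_size) are hypothetical, the split uses a dense SVD of the centered subset as a stand-in for the inexpensive Lanczos bisection, the role given to alpha here only approximates the paper's definition of the overlap, and the post-conquer refinement step described above is omitted.

import numpy as np
from itertools import combinations

def knn_graph(X, k, alpha=0.1, leaf_size=50):
    """Approximate kNN graph via recursive spectral bisection (illustrative sketch).

    X is an (n, d) array, k the number of neighbors per point, alpha a
    hypothetical knob for the fractional overlap between the two halves,
    and leaf_size the subset size at which brute force takes over.
    """
    n = X.shape[0]
    candidates = {i: [] for i in range(n)}
    cache = {}  # hash table of already computed pairwise distances

    def dist(i, j):
        key = (i, j) if i < j else (j, i)
        if key not in cache:
            cache[key] = float(np.linalg.norm(X[i] - X[j]))
        return cache[key]

    def brute_force(idx):
        # Conquer step: exact all-pairs distances within a small subset.
        for i, j in combinations(idx, 2):
            d = dist(i, j)
            candidates[i].append((d, j))
            candidates[j].append((d, i))

    def bisect(idx):
        if len(idx) <= leaf_size:
            brute_force(idx)
            return
        # Divide step: project onto the dominant singular direction of the
        # centered subset (a dense stand-in for the Lanczos procedure) and
        # split at the median, keeping a common region of points near the cut.
        Xc = X[idx] - X[idx].mean(axis=0)
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        order = np.argsort(Xc @ vt[0])
        half = len(idx) // 2
        overlap = int(alpha * len(idx) / 2)
        bisect([idx[t] for t in order[: half + overlap]])
        bisect([idx[t] for t in order[half - overlap:]])

    bisect(list(range(n)))
    # Keep the k closest distinct candidates found for each point.
    return {i: sorted(set(c))[:k] for i, c in candidates.items()}

For example, knn_graph(np.random.rand(2000, 100), k=10) returns, for each point, up to k approximate nearest neighbors found among the brute-force leaves. The shared distance cache plays the role of the hash table mentioned in the abstract: when a pair of points falls in both halves of an overlapping split, its distance is computed only once.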



Published In

The Journal of Machine Learning Research, Volume 10 (December 2009)
2936 pages
ISSN: 1532-4435
EISSN: 1533-7928

Publisher

JMLR.org

Publication History

Published: 01 December 2009
Published in JMLR Volume 10

Qualifiers

• Article


Cited By

• (2025) On Graph Representation for Attributed Hypergraph Clustering. Proceedings of the ACM on Management of Data, 3(1):1-26. DOI: 10.1145/3709741. Online publication date: 11-Feb-2025.
• (2024) AN Identification and Prediction Model Based on PSO. International Journal of Cognitive Informatics and Natural Intelligence, 18(1):1-15. DOI: 10.4018/IJCINI.344023. Online publication date: 21-Jun-2024.
• (2024) L-FNNG: Accelerating Large-Scale KNN Graph Construction on CPU-FPGA Heterogeneous Platform. ACM Transactions on Reconfigurable Technology and Systems, 17(3):1-29. DOI: 10.1145/3652609. Online publication date: 14-Mar-2024.
• (2024) MMPOI: A Multi-Modal Content-Aware Framework for POI Recommendations. Proceedings of the ACM Web Conference 2024, 3454-3463. DOI: 10.1145/3589334.3645449. Online publication date: 13-May-2024.
• (2024) SPACE: Self-Supervised Dual Preference Enhancing Network for Multimodal Recommendation. IEEE Transactions on Multimedia, 26:8849-8859. DOI: 10.1109/TMM.2024.3382889. Online publication date: 28-Mar-2024.
• (2024) CMC-MMR: multi-modal recommendation model with cross-modal correction. Journal of Intelligent Information Systems, 62(5):1187-1211. DOI: 10.1007/s10844-024-00848-x. Online publication date: 1-Oct-2024.
• (2024) Towards user-specific multimodal recommendation via cross-modal attention-enhanced graph convolution network. Applied Intelligence, 55(1). DOI: 10.1007/s10489-024-06061-1. Online publication date: 18-Nov-2024.
• (2023) Efficient Distributed Approximate k-Nearest Neighbor Graph Construction by Multiway Random Division Forest. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1097-1106. DOI: 10.1145/3580305.3599327. Online publication date: 6-Aug-2023.
• (2023) GoldFinger: Fast & Approximate Jaccard for Efficient KNN Graph Constructions. IEEE Transactions on Knowledge and Data Engineering, 35(11):11461-11475. DOI: 10.1109/TKDE.2022.3232689. Online publication date: 1-Nov-2023.
• (2023) Deep Learning for Approximate Nearest Neighbour Search: A Survey and Future Directions. IEEE Transactions on Knowledge and Data Engineering, 35(9):8997-9018. DOI: 10.1109/TKDE.2022.3220683. Online publication date: 1-Sep-2023.
