
Fast Approximate kNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection

Published: 01 December 2009

Abstract

Nearest neighbor graphs are widely used in data mining and machine learning. A brute-force method to compute the exact kNN graph takes Θ(dn²) time for n data points in the d-dimensional Euclidean space. We propose two divide-and-conquer methods for computing an approximate kNN graph in Θ(dnᵗ) time for high dimensional data (large d). The exponent t ∈ (1,2) is an increasing function of an internal parameter α which governs the size of the common region in the divide step. Experiments show that a high quality graph can usually be obtained with small overlaps, that is, for small values of t. A few of the practical details of the algorithms are as follows. First, the divide step uses an inexpensive Lanczos procedure to perform recursive spectral bisection. After each conquer step, an additional refinement step is performed to improve the accuracy of the graph. Finally, a hash table is used to avoid repeating distance calculations during the divide-and-conquer process. The combination of these techniques is shown to yield quite effective algorithms for building kNN graphs.
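To make the divide-and-conquer structure concrete, the following is a minimal Python sketch of an approximate kNN graph builder in this spirit. It is an illustration, not the authors' implementation: the function and parameter names (knn_graph, alpha, leaf_size) are hypothetical, the split uses a dense SVD of the centered subset as a stand-in for the inexpensive Lanczos bisection, the role given to alpha here only approximates the paper's definition of the overlap, and the post-conquer refinement step described above is omitted.

import numpy as np
from itertools import combinations

def knn_graph(X, k, alpha=0.1, leaf_size=50):
    """Approximate kNN graph via recursive spectral bisection (illustrative sketch).

    X is an (n, d) array, k the number of neighbors per point, alpha a
    hypothetical knob for the fractional overlap between the two halves,
    and leaf_size the subset size at which brute force takes over.
    """
    n = X.shape[0]
    candidates = {i: [] for i in range(n)}
    cache = {}  # hash table of already computed pairwise distances

    def dist(i, j):
        key = (i, j) if i < j else (j, i)
        if key not in cache:
            cache[key] = float(np.linalg.norm(X[i] - X[j]))
        return cache[key]

    def brute_force(idx):
        # Conquer step: exact all-pairs distances within a small subset.
        for i, j in combinations(idx, 2):
            d = dist(i, j)
            candidates[i].append((d, j))
            candidates[j].append((d, i))

    def bisect(idx):
        if len(idx) <= leaf_size:
            brute_force(idx)
            return
        # Divide step: project onto the dominant singular direction of the
        # centered subset (a dense stand-in for the Lanczos procedure) and
        # split at the median, keeping a common region of points near the cut.
        Xc = X[idx] - X[idx].mean(axis=0)
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        order = np.argsort(Xc @ vt[0])
        half = len(idx) // 2
        overlap = int(alpha * len(idx) / 2)
        bisect([idx[t] for t in order[: half + overlap]])
        bisect([idx[t] for t in order[half - overlap:]])

    bisect(list(range(n)))
    # Keep the k closest distinct candidates found for each point.
    return {i: sorted(set(c))[:k] for i, c in candidates.items()}

For example, knn_graph(np.random.rand(2000, 100), k=10) returns, for each point, up to k approximate nearest neighbors found among the brute-force leaves. The shared distance cache plays the role of the hash table mentioned in the abstract: when a pair of points falls in both halves of an overlapping split, its distance is computed only once.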



Published In

The Journal of Machine Learning Research, Volume 10 (December 2009)
2936 pages
ISSN: 1532-4435
EISSN: 1533-7928

Publisher

JMLR.org

Publication History

Published: 01 December 2009
Published in JMLR Volume 10

Qualifiers

• Article


Cited By

• (2025) On Graph Representation for Attributed Hypergraph Clustering. Proceedings of the ACM on Management of Data, 3(1):1-26. DOI: 10.1145/3709741. Online publication date: 11-Feb-2025.
• (2024) AN Identification and Prediction Model Based on PSO. International Journal of Cognitive Informatics and Natural Intelligence, 18(1):1-15. DOI: 10.4018/IJCINI.344023. Online publication date: 21-Jun-2024.
• (2024) L-FNNG: Accelerating Large-Scale KNN Graph Construction on CPU-FPGA Heterogeneous Platform. ACM Transactions on Reconfigurable Technology and Systems, 17(3):1-29. DOI: 10.1145/3652609. Online publication date: 14-Mar-2024.
• (2024) MMPOI: A Multi-Modal Content-Aware Framework for POI Recommendations. Proceedings of the ACM Web Conference 2024, 3454-3463. DOI: 10.1145/3589334.3645449. Online publication date: 13-May-2024.
• (2024) SPACE: Self-Supervised Dual Preference Enhancing Network for Multimodal Recommendation. IEEE Transactions on Multimedia, 26:8849-8859. DOI: 10.1109/TMM.2024.3382889. Online publication date: 28-Mar-2024.
• (2024) CMC-MMR: multi-modal recommendation model with cross-modal correction. Journal of Intelligent Information Systems, 62(5):1187-1211. DOI: 10.1007/s10844-024-00848-x. Online publication date: 1-Oct-2024.
• (2024) Towards user-specific multimodal recommendation via cross-modal attention-enhanced graph convolution network. Applied Intelligence, 55(1). DOI: 10.1007/s10489-024-06061-1. Online publication date: 18-Nov-2024.
• (2023) Efficient Distributed Approximate k-Nearest Neighbor Graph Construction by Multiway Random Division Forest. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1097-1106. DOI: 10.1145/3580305.3599327. Online publication date: 6-Aug-2023.
• (2023) GoldFinger: Fast & Approximate Jaccard for Efficient KNN Graph Constructions. IEEE Transactions on Knowledge and Data Engineering, 35(11):11461-11475. DOI: 10.1109/TKDE.2022.3232689. Online publication date: 1-Nov-2023.
• (2023) Deep Learning for Approximate Nearest Neighbour Search: A Survey and Future Directions. IEEE Transactions on Knowledge and Data Engineering, 35(9):8997-9018. DOI: 10.1109/TKDE.2022.3220683. Online publication date: 1-Sep-2023.
