Randomized Embeddings with Slack and High-Dimensional Approximate Nearest Neighbor

Published: 16 April 2018

Abstract

Approximate nearest neighbor search (ϵ-ANN) in high dimensions has been mainly addressed by Locality Sensitive Hashing (LSH), which offers polynomial dependence on the dimension and sublinear query time, but subquadratic (superlinear) space requirements. We introduce a new “low-quality” embedding for metric spaces, requiring only that, for any query, there exists an approximate nearest neighbor among the pre-images of its k > 1 approximate nearest neighbors in the target space. In Euclidean spaces, we employ random projections to a dimension inversely proportional to k.
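As a concrete illustration of the dimension-reduction step, the following sketch projects a point set with a Gaussian random matrix in the Johnson-Lindenstrauss style; the target dimension and all names are illustrative assumptions, not the authors' implementation (which is in C++).

```python
import numpy as np

def random_projection(points, target_dim, seed=0):
    """Project an (n, D) point set to target_dim dimensions with a
    Gaussian random matrix, scaled so Euclidean distances are preserved
    up to small distortion with high probability."""
    rng = np.random.default_rng(seed)
    _, D = points.shape
    # Each entry is N(0, 1); dividing by sqrt(target_dim) makes the map
    # an isometry in expectation.
    G = rng.standard_normal((D, target_dim)) / np.sqrt(target_dim)
    return points @ G

# Pairwise distances survive the projection approximately.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 960))      # n = 100 points in D = 960
Y = random_projection(X, target_dim=64)  # reduced to d = 64
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
```

The point of the "low-quality" regime is that the target dimension can be taken much smaller than the classical Johnson-Lindenstrauss bound, at the price of having to inspect k candidate pre-images per query instead of one.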
Our approach extends to the decision problem with witness of checking whether there exists an approximate near neighbor; this also implies a solution for ϵ-ANN. After dimension reduction, we store points in a uniform grid of side length ϵ/√d, where d is the reduced dimension. Given a query, we explore cells intersecting the unit ball around the query. This data structure requires linear space and query time O(dn^ρ), where ρ ≈ 1 − ϵ²/log(1/ϵ), n denotes the input cardinality, and d the space dimension. Bounds are improved for doubling subsets via r-nets.
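The grid construction above can be sketched as follows. This is a toy low-dimensional version with illustrative names; the exhaustive scan over neighboring cells is only for exposition, whereas the paper's analysis bounds the number of cells actually visited.

```python
import itertools
import math
from collections import defaultdict

def build_grid(points, eps):
    """Bucket d-dimensional points into a uniform grid of side eps/sqrt(d)."""
    d = len(points[0])
    side = eps / math.sqrt(d)
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        cell = tuple(math.floor(c / side) for c in p)
        grid[cell].append(idx)
    return grid, side

def query_candidates(grid, side, q, radius=1.0):
    """Collect point indices from all cells within `radius` of the
    query's cell (an exhaustive scan, feasible only for small d)."""
    reach = math.ceil(radius / side)  # cells to scan per axis
    base = tuple(math.floor(c / side) for c in q)
    hits = []
    for offset in itertools.product(range(-reach, reach + 1), repeat=len(q)):
        cell = tuple(b + o for b, o in zip(base, offset))
        hits.extend(grid.get(cell, []))
    return hits

# Toy example in d = 2: two points near the origin, one far away.
pts = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0)]
grid, side = build_grid(pts, eps=0.5)
near = query_candidates(grid, side, (0.05, 0.05))  # indices 0 and 1 only
```

The scan touches on the order of (2·radius/side + 1)^d cells, which is why the grid is built only after the dimension reduction: the cost is exponential in the reduced dimension d, not the ambient one.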
We present our C++ implementation for ϵ-ANN and experiments for d ≤ 960, n ≤ 10⁶, using synthetic and real datasets, which confirm the theoretical analysis and typically yield better practical performance. We compare to FALCONN, the state-of-the-art implementation of multi-probe LSH: our prototype software is essentially comparable in terms of preprocessing, query time, and storage usage.


Cited By

(2023) Near-neighbor preserving dimension reduction via coverings for doubling subsets of ℓ1. Theoretical Computer Science 942:C (2023), 169--179. DOI: 10.1016/j.tcs.2022.11.031. Online publication date: 9 January 2023.



    Published In

    ACM Transactions on Algorithms  Volume 14, Issue 2
    April 2018
    339 pages
    ISSN:1549-6325
    EISSN:1549-6333
    DOI:10.1145/3196491

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 16 April 2018
    Accepted: 01 December 2017
    Revised: 01 April 2017
    Received: 01 May 2016
    Published in TALG Volume 14, Issue 2


    Author Tags

    1. Approximate nearest neighbor
    2. Johnson-Lindenstrauss Lemma
    3. curse of dimensionality
    4. doubling dimension
    5. experimental study
    6. randomized embeddings

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

• European Union's Horizon 2020 research and innovation programme
    • “Human Resources Development, Education and Lifelong Learning”
    • European Social Fund and the Greek Government
    • State Scholarships Foundation of Greece, financed by action “Supporting human resources in research through the implementation of doctoral research”

