research-article

Data-dependent hashing via nonlinear spectral gaps

Authors:
Alexandr Andoni

Columbia University, USA

Columbia University, USA
View Profile

,
Assaf Naor

Princeton University, USA

Princeton University, USA
View Profile

,
Aleksandar Nikolov

University of Toronto, Canada

University of Toronto, Canada
View Profile

,
Ilya Razenshteyn

Microsoft Research, USA

Microsoft Research, USA
View Profile

,
Erik Waingarten

Columbia University, USA

Columbia University, USA
View Profile

STOC 2018: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of ComputingJune 2018Pages 787–800https://doi.org/10.1145/3188745.3188846

Published:20 June 2018Publication History

STOC 2018: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing

Pages 787–800

ABSTRACT

We establish a generic reduction from _nonlinear spectral gaps_ of metric spaces to data-dependent Locality-Sensitive Hashing, yielding a new approach to the high-dimensional Approximate Near Neighbor Search problem (ANN) under various distance functions. Using this reduction, we obtain the following results:

* For _general_ d-dimensional normed spaces and n-point datasets, we obtain a _cell-probe_ ANN data structure with approximation O(logd/ε²), space d^O(1) n^1+ε, and d^O(1)n^ε cell probes per query, for any ε>0. No non-trivial approximation was known before in this generality other than the O(√d) bound which follows from embedding a general norm into ℓ₂. * For ℓ_p and Schatten-p norms, we improve the data structure further, to obtain approximation O(p) and sublinear query _time_. For ℓ_p, this improves upon the previous best approximation 2^O(p) (which required polynomial as opposed to near-linear in n space). For the Schatten-p norm, no non-trivial ANN data structure was known before this work.

Previous approaches to the ANN problem either exploit the low dimensionality of a metric, requiring space exponential in the dimension, or circumvent the curse of dimensionality by embedding a metric into a ”tractable” space, such as ℓ₁. Our new generic reduction proceeds differently from both of these approaches using a novel partitioning method.

Supplemental Material

6a-1.mp4

mp4

27.4 MB

Download

References

Alexandr Andoni. 2009.Google Scholar
Nearest Neighbor Search: the Old, the New, and the Impossible. Ph.D. Dissertation. MIT.Google Scholar
Alexandr Andoni. 2010. Nearest Neighbor Search in high-dimensional spaces. (2010).Google Scholar
Invited talk at the Workshop on Barriers in Computational Complexity II, http://www.mit.edu/~andoni/nnsbarriers.pdf.Google Scholar
Alexandr Andoni and Piotr Indyk. 2006.Google Scholar
Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS ’2006). 459–468. Google ScholarDigital Library
Alexandr Andoni and Piotr Indyk. 2017. Nearest Neighbors in High-Dimensional Spaces. In Handbook of Discrete and Computational Geometry, Jacob E. Goodman, Joseph O’Rourke, and Csaba D. Tóth (Eds.). CRC Press LLC, 1133–1153.Google Scholar
Alexandr Andoni, Piotr Indyk, and Robert Krauthgamer. 2009. Overcoming the ℓ 1 Non-Embeddability Barrier: Algorithms for Product Metrics. In Proceedings of the 20th ACM-SIAM Symposium on Discrete Algorithms (SODA ’2009). 865–874. Google ScholarDigital Library
Alexandr Andoni, Piotr Indyk, Huy L. Nguyen, and Ilya Razenshteyn. 2014. Beyond Locality-Sensitive Hashing. In Proceedings of the 25th ACM-SIAM Symposium on Discrete Algorithms (SODA ’2014). 1018–1028. Available as arXiv:1306.1547. Google ScholarDigital Library
Alexandr Andoni, Piotr Indyk, and Ilya Razenshteyn. 2018. Approximate Nearest Neighbor Search in High Dimensions. In Proceedings of ICM 2018 (to appear).Google Scholar
Alexandr Andoni, Robert Krauthgamer, and Ilya Razenshteyn. 2015.Google Scholar
Sketching and Embedding are Equivalent for Norms. In Proceedings of the 47th ACM Symposium on the Theory of Computing (STOC ’2015). 479–488. Available as arXiv:1411.2577. Google ScholarDigital Library
Alexandr Andoni, Thijs Laarhoven, Ilya Razenshteyn, and Erik Waingarten. 2017.Google Scholar
Optimal Hashing-based Time–Space Trade-offs for Approximate Near Neighbors. In Proceedings of the 28th ACM-SIAM Symposium on Discrete Algorithms (SODA ’2017). 47–66. Available as arXiv:1608.03580. Google ScholarDigital Library
Alexandr Andoni, Huy L. Nguyen, Aleksandar Nikolov, Ilya Razenshteyn, and Erik Waingarten. 2017.Google Scholar
Approximate Near Neighbors for General Symmetric Norms. In Proceedings of the 49th ACM Symposium on the Theory of Computing (STOC ’2017). 902–913. Available as arXiv:1611.06222. Google ScholarDigital Library
Alexandr Andoni and Ilya Razenshteyn. 2015. Optimal Data-Dependent Hashing for Approximate Near Neighbors. In Proceedings of the 47th ACM Symposium on the Theory of Computing (STOC ’2015). 793–801. Available as arXiv:1501.01062. Google ScholarDigital Library
Alexandr Andoni and Ilya Razenshteyn. 2016.Google Scholar
Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing. In Proceedings of the 32nd International Symposium on Computational Geometry (SoCG ’2016). 9:1–9:11. Available as arXiv:1507.04299.Google Scholar
Sanjeev Arora, Elad Hazan, and Satyen Kale. 2012. The Multiplicative Weights Update Method: a Meta-Algorithm and Applications. Theory of Computing 8, 1 (2012), 121–164.Google ScholarCross Ref
Keith Ball. 1997.Google Scholar
An Elementary Introduction to Modern Convex Geometry. MSRI Publications, Vol. 31. Cambridge University Press.Google Scholar
Yair Bartal and Lee-Ad Gottlieb. 2015. Approximate Nearest Neighbor Search for ℓ p -Spaces ( 2 < p < ∞) via Embeddings. (2015). Available as arXiv:1512.01775.Google Scholar
Alina Beygelzimer, Sham Kakade, and John Langford. 2006.Google Scholar
Cover Trees for Nearest Neighbor. In Proceedings of the 23rd International Conference on Machine Learning (ICML ’2006). 97–104. Google ScholarDigital Library
Jaroslaw Blasiok, Vladimir Braverman, Stephen R. Chestnut, Robert Krauthgamer, and Lin F. Yang. 2017. Streaming Symmetric Norms via Measure Concentration. In Proceedings of the 49th ACM Symposium on the Theory of Computing (STOC ’2017). Available as arXiv:1511.01111. Google ScholarDigital Library
Moses Charikar. 2002. Similarity Estimation Techniques from Rounding Algorithms. In Proceedings of the 34th ACM Symposium on the Theory of Computing (STOC ’2002). 380–388. Google ScholarDigital Library
Jeff Cheeger. 1969. A Lower Bound for the Smallest Eigenvalue of the Laplacian. In Proceedings of the Princeton conference in honor of Professor S. Bochner. 195–199.Google Scholar
F. R. K. Chung. 1996. Laplacians of graphs and Cheeger’s inequalities. In Combinatorics, Paul Erdős is eighty, Vol. 2 (Keszthely, 1993). Bolyai Soc. Math. Stud., Vol. 2. János Bolyai Math. Soc., Budapest, 157–172.Google Scholar
Kenneth L. Clarkson. 1999. Nearest Neighbor Queries in Metric Spaces. Discrete and Computational Geometry 22, 1 (1999), 63–93.Google ScholarCross Ref
Efim D. Gluskin. 1981. Diameter of the Minkowski Compactum is Approximately Equal to n. Functional Analysis and Its Applications 15, 1 (1981), 57–58.Google ScholarCross Ref
Moritz Hardt and Guy N. Rothblum. 2010. A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, Las Vegas, Nevada, USA. IEEE Computer Society, 61–70. Google ScholarDigital Library
Piotr Indyk. 2001.Google Scholar
On Approximate Nearest Neighbors under ℓ ∞ Norm. J. Comput. System Sci. 63, 4 (2001), 627–638. Google ScholarDigital Library
Piotr Indyk. 2002. Approximate Nearest Neighbor Algorithms for Fréchet Distance via Product Metrics. In Proceedings of the 18th ACM Symposium on Computational Geometry (SoCG ’2002). 102–106. Google ScholarDigital Library
Piotr Indyk. 2004. Approximate Nearest Neighbor under Edit Distance via Product Metrics. In Proceedings of the 15th ACM-SIAM Symposium on Discrete Algorithms (SODA ’2004). 646–650. Google ScholarDigital Library
Piotr Indyk and Rajeev Motwani. 1998. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In Proceedings of the 30th ACM Symposium on the Theory of Computing (STOC ’1998). 604–613. Google ScholarDigital Library
Piotr Indyk and Nitin Thaper. 2003. Fast Color Image Retrieval via Embeddings. (2003).Google Scholar
Workshop on Statistical and Computational Theories of Vision (at ICCV).Google Scholar
Fritz John. 1948. Extremum Problems with Inequalities as Subsidiary Conditions. In Studies and Essays Presented to R. Courant on his 60th Birthday, January 8, 1948.Google Scholar
Interscience Publishers, Inc., New York, N. Y., 187–204.Google Scholar
David R. Karger and Matthias Ruhl. 2002. Finding Nearest Neighbors in Growth-Restricted Metrics. In Proceedings of the 34th ACM Symposium on the Theory of Computing (STOC ’2002). 741–750. Google ScholarDigital Library
Robert Krauthgamer and James R. Lee. 2004. Navigating Nets: Simple Algorithms for Proximity Search. In Proceedings of the 15th ACM-SIAM Symposium on Discrete Algorithms (SODA ’2004). 798–807. Google ScholarDigital Library
Yi Li, Huy L. Nguyên, and David P. Woodruff. 2014. On Sketching Matrix Norms and the Top Singular Vector. In Proceedings of the 25th ACM-SIAM Symposium on Discrete Algorithms (SODA ’2014). 1562–1581. Google ScholarDigital Library
Yi Li and David P. Woodruff. 2016. On Approximating Functions of the Singular Values in a Stream. In Proceedings of the 48th ACM Symposium on the Theory of Computing (STOC ’2016). 726–739. Google ScholarDigital Library
Yi Li and David P. Woodruff. 2016. Tight Bounds for Sketching the Operator Norm, Schatten Norms, and Subspace Embeddings. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 19th International Workshop, APPROX ’2016, and 20th International Workshop, RANDOM ’2016. 39:1– 39:11.Google Scholar
Yi Li and David P. Woodruff. 2017. Embeddings of Schatten Norms with Applications to Data Streams. In Proceedings of the 44th International Colloquium on Automata, Languages and Programming (ICALP ’2017). 60:1–60:14.Google Scholar
Jiří Matoušek. 1997. On Embedding Expanders into ℓ p Spaces. Israel Journal of Mathematics 102 (1997), 189–197.Google ScholarCross Ref
Manor Mendel and Assaf Naor. 2014. Nonlinear Spectral Calculus and Super-Expanders. Publications Mathématiques de l’IHÉS 119, 1 (2014), 1–95.Google ScholarCross Ref
Manor Mendel and Assaf Naor. 2015. Expanders with Respect to Hadamard Spaces and Random Graphs. Duke Mathematical Journal 164, 8 (2015), 1471–1548.Google ScholarCross Ref
Peter Bro Miltersen. 1999. Cell Probe Complexity – a Survey. In Advances in Data Structures.Google Scholar
Assaf Naor. 2017. A Spectral Gap Precludes Low-Dimensional Embeddings. In Proceedings of the 33rd International Symposium on Computational Geometry (SoCG ’2017). 50:1–50:16.Google Scholar
Assaf Naor. 2018. Metric Dimension Reduction: a Snapshot of the Ribe Program. In Proceedings of ICM 2018 (to appear).Google Scholar
Assaf Naor, Gilles Pisier, and Gideon Schechtman. 2018.Google Scholar
Impossibility of Dimension Reduction in the Nuclear Norm. In Proceedings of the 29th ACM-SIAM Symposium on Discrete Algorithms (SODA ’2018). Google ScholarDigital Library
Assaf Naor and Yuval Rabani. 2006. On Approximate Nearest Neighbor Search in ℓ p, p > 2. (2006). Manuscript, available on request.Google Scholar
Assaf Naor and Gideon Schechtman. 2007. Planar Earthmover is not in L 1. SIAM J. Comput. 37, 3 (2007), 804–826. Google ScholarDigital Library
Huy L. Nguyên. 2014.Google Scholar
Algorithms for High Dimensional Data. Ph.D. Dissertation. Princeton University. Available as http://arks.princeton.edu/ark: /88435/dsp01b8515q61f.Google Scholar
Rafail Ostrovsky and Yuval Rabani. 2007. Low Distortion Embedding for Edit Distance. J. ACM 54, 5 (2007), 23:1–23:16. Google ScholarDigital Library
Ilya Razenshteyn. 2017.Google Scholar
High-Dimensional Similarity Search and Sketching: Algorithms and Hardness. Ph.D. Dissertation. Massachusetts Institute of Technology.Google Scholar
Éric Ricard. 2015. Hölder Estimates for the Noncommutative Mazur Map. Archiv der Mathematik 104, 1 (2015), 37–45.Google ScholarCross Ref
Daniel A. Spielman. 2015. Conductance, the Normalized Laplacian, and Cheeger’s Inequality. Lecture Notes. http://www.cs.yale.edu/homes/spielman/561/lect06- 15. pdfGoogle Scholar
Andrew Chi-Chih Yao. 1981.Google Scholar

Index Terms

Data-dependent hashing via nonlinear spectral gaps
1. Mathematics of computing
  1. Probability and statistics
    1. Statistical paradigms
      1. Dimensionality reduction
2. Theory of computation
  1. Design and analysis of algorithms
  2. Randomness, geometry and discrete structures
    1. Computational geometry
    2. Random projections and metric embeddings

Recommendations

Optimal Data-Dependent Hashing for Approximate Near Neighbors
STOC '15: Proceedings of the forty-seventh annual ACM symposium on Theory of Computing

We show an optimal data-dependent hashing scheme for the approximate near neighbor problem. For an n-point dataset in a d-dimensional space our data structure achieves query time O(d ⋅ n^ρ+o(1)) and space O(n^1+ρ+o(1) + d ⋅ n), where ρ=1/(2c²-1) for the ...
Read More
Query-aware locality-sensitive hashing scheme for $$l_p$$lp norm

The problem of c-Approximate Nearest Neighbor (c-ANN) search in high-dimensional space is fundamentally important in many applications, such as image database and data mining. Locality-Sensitive Hashing (LSH) and its variants are the well-known indexing ...
Read More
Confirmation Sampling for Exact Nearest Neighbor Search
Similarity Search and Applications
Abstract
Locality-sensitive hashing (LSH), introduced by Indyk and Motwani in STOC ’98, has been an extremely influential framework for nearest neighbor search in high-dimensional data sets. While theoretical work has focused on the approximate nearest ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
STOC 2018: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing
June 2018
1332 pages
ISBN:9781450355599
DOI:10.1145/3188745
General Chairs:
Ilias Diakonikolas
University of Southern California, USA
,
David Kempe
University of Southern California, USA
,
Program Chair:
Monika Henzinger
University of Vienna, Austria
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 June 2018
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Nearest neighbor search
locality-sensitive hashing
nonlinear spectral gaps
randomized space partitions
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate1,469of4,586submissions,32%
Upcoming Conference
STOC '24

Sponsor:

sigact

56th Annual ACM Symposium on Theory of Computing (STOC 2024)

June 24 - 28, 2024

Vancouver , BC , Canada
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 12
  Total Citations
  View Citations
- 414
  Total Downloads
- Downloads (Last 12 months)22
- Downloads (Last 6 weeks)4
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Data-dependent hashing via nonlinear spectral gaps

STOC 2018: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Optimal Data-Dependent Hashing for Approximate Near Neighbors

Query-aware locality-sensitive hashing scheme for $$l_p$$lp norm

Confirmation Sampling for Exact Nearest Neighbor Search

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Data-dependent hashing via nonlinear spectral gaps

STOC 2018: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Optimal Data-Dependent Hashing for Approximate Near Neighbors

Query-aware locality-sensitive hashing scheme for $$l_p$$lp norm

Confirmation Sampling for Exact Nearest Neighbor Search

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media