skip to main content
10.1145/3188745.3188846acmconferencesArticle/Chapter ViewAbstractPublication PagesstocConference Proceedingsconference-collections
research-article

Data-dependent hashing via nonlinear spectral gaps

Published:20 June 2018Publication History

ABSTRACT

We establish a generic reduction from _nonlinear spectral gaps_ of metric spaces to data-dependent Locality-Sensitive Hashing, yielding a new approach to the high-dimensional Approximate Near Neighbor Search problem (ANN) under various distance functions. Using this reduction, we obtain the following results:

* For _general_ d-dimensional normed spaces and n-point datasets, we obtain a _cell-probe_ ANN data structure with approximation O(logd2), space dO(1) n1+ε, and dO(1)nε cell probes per query, for any ε>0. No non-trivial approximation was known before in this generality other than the O(√d) bound which follows from embedding a general norm into ℓ2. * For ℓp and Schatten-p norms, we improve the data structure further, to obtain approximation O(p) and sublinear query _time_. For ℓp, this improves upon the previous best approximation 2O(p) (which required polynomial as opposed to near-linear in n space). For the Schatten-p norm, no non-trivial ANN data structure was known before this work.

Previous approaches to the ANN problem either exploit the low dimensionality of a metric, requiring space exponential in the dimension, or circumvent the curse of dimensionality by embedding a metric into a ”tractable” space, such as ℓ1. Our new generic reduction proceeds differently from both of these approaches using a novel partitioning method.

Skip Supplemental Material Section

Supplemental Material

6a-1.mp4

mp4

27.4 MB

References

  1. Alexandr Andoni. 2009.Google ScholarGoogle Scholar
  2. Nearest Neighbor Search: the Old, the New, and the Impossible. Ph.D. Dissertation. MIT.Google ScholarGoogle Scholar
  3. Alexandr Andoni. 2010. Nearest Neighbor Search in high-dimensional spaces. (2010).Google ScholarGoogle Scholar
  4. Invited talk at the Workshop on Barriers in Computational Complexity II, http://www.mit.edu/~andoni/nnsbarriers.pdf.Google ScholarGoogle Scholar
  5. Alexandr Andoni and Piotr Indyk. 2006.Google ScholarGoogle Scholar
  6. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS ’2006). 459–468. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Alexandr Andoni and Piotr Indyk. 2017. Nearest Neighbors in High-Dimensional Spaces. In Handbook of Discrete and Computational Geometry, Jacob E. Goodman, Joseph O’Rourke, and Csaba D. Tóth (Eds.). CRC Press LLC, 1133–1153.Google ScholarGoogle Scholar
  8. Alexandr Andoni, Piotr Indyk, and Robert Krauthgamer. 2009. Overcoming the ℓ 1 Non-Embeddability Barrier: Algorithms for Product Metrics. In Proceedings of the 20th ACM-SIAM Symposium on Discrete Algorithms (SODA ’2009). 865–874. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Alexandr Andoni, Piotr Indyk, Huy L. Nguyen, and Ilya Razenshteyn. 2014. Beyond Locality-Sensitive Hashing. In Proceedings of the 25th ACM-SIAM Symposium on Discrete Algorithms (SODA ’2014). 1018–1028. Available as arXiv:1306.1547. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Alexandr Andoni, Piotr Indyk, and Ilya Razenshteyn. 2018. Approximate Nearest Neighbor Search in High Dimensions. In Proceedings of ICM 2018 (to appear).Google ScholarGoogle Scholar
  11. Alexandr Andoni, Robert Krauthgamer, and Ilya Razenshteyn. 2015.Google ScholarGoogle Scholar
  12. Sketching and Embedding are Equivalent for Norms. In Proceedings of the 47th ACM Symposium on the Theory of Computing (STOC ’2015). 479–488. Available as arXiv:1411.2577. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Alexandr Andoni, Thijs Laarhoven, Ilya Razenshteyn, and Erik Waingarten. 2017.Google ScholarGoogle Scholar
  14. Optimal Hashing-based Time–Space Trade-offs for Approximate Near Neighbors. In Proceedings of the 28th ACM-SIAM Symposium on Discrete Algorithms (SODA ’2017). 47–66. Available as arXiv:1608.03580. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Alexandr Andoni, Huy L. Nguyen, Aleksandar Nikolov, Ilya Razenshteyn, and Erik Waingarten. 2017.Google ScholarGoogle Scholar
  16. Approximate Near Neighbors for General Symmetric Norms. In Proceedings of the 49th ACM Symposium on the Theory of Computing (STOC ’2017). 902–913. Available as arXiv:1611.06222. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Alexandr Andoni and Ilya Razenshteyn. 2015. Optimal Data-Dependent Hashing for Approximate Near Neighbors. In Proceedings of the 47th ACM Symposium on the Theory of Computing (STOC ’2015). 793–801. Available as arXiv:1501.01062. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Alexandr Andoni and Ilya Razenshteyn. 2016.Google ScholarGoogle Scholar
  19. Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing. In Proceedings of the 32nd International Symposium on Computational Geometry (SoCG ’2016). 9:1–9:11. Available as arXiv:1507.04299.Google ScholarGoogle Scholar
  20. Sanjeev Arora, Elad Hazan, and Satyen Kale. 2012. The Multiplicative Weights Update Method: a Meta-Algorithm and Applications. Theory of Computing 8, 1 (2012), 121–164.Google ScholarGoogle ScholarCross RefCross Ref
  21. Keith Ball. 1997.Google ScholarGoogle Scholar
  22. An Elementary Introduction to Modern Convex Geometry. MSRI Publications, Vol. 31. Cambridge University Press.Google ScholarGoogle Scholar
  23. Yair Bartal and Lee-Ad Gottlieb. 2015. Approximate Nearest Neighbor Search for ℓ p -Spaces ( 2 < p < ∞) via Embeddings. (2015). Available as arXiv:1512.01775.Google ScholarGoogle Scholar
  24. Alina Beygelzimer, Sham Kakade, and John Langford. 2006.Google ScholarGoogle Scholar
  25. Cover Trees for Nearest Neighbor. In Proceedings of the 23rd International Conference on Machine Learning (ICML ’2006). 97–104. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Jaroslaw Blasiok, Vladimir Braverman, Stephen R. Chestnut, Robert Krauthgamer, and Lin F. Yang. 2017. Streaming Symmetric Norms via Measure Concentration. In Proceedings of the 49th ACM Symposium on the Theory of Computing (STOC ’2017). Available as arXiv:1511.01111. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Moses Charikar. 2002. Similarity Estimation Techniques from Rounding Algorithms. In Proceedings of the 34th ACM Symposium on the Theory of Computing (STOC ’2002). 380–388. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Jeff Cheeger. 1969. A Lower Bound for the Smallest Eigenvalue of the Laplacian. In Proceedings of the Princeton conference in honor of Professor S. Bochner. 195–199.Google ScholarGoogle Scholar
  29. F. R. K. Chung. 1996. Laplacians of graphs and Cheeger’s inequalities. In Combinatorics, Paul Erdős is eighty, Vol. 2 (Keszthely, 1993). Bolyai Soc. Math. Stud., Vol. 2. János Bolyai Math. Soc., Budapest, 157–172.Google ScholarGoogle Scholar
  30. Kenneth L. Clarkson. 1999. Nearest Neighbor Queries in Metric Spaces. Discrete and Computational Geometry 22, 1 (1999), 63–93.Google ScholarGoogle ScholarCross RefCross Ref
  31. Efim D. Gluskin. 1981. Diameter of the Minkowski Compactum is Approximately Equal to n. Functional Analysis and Its Applications 15, 1 (1981), 57–58.Google ScholarGoogle ScholarCross RefCross Ref
  32. Moritz Hardt and Guy N. Rothblum. 2010. A Multiplicative Weights Mechanism for Privacy-Preserving Data Analysis. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, Las Vegas, Nevada, USA. IEEE Computer Society, 61–70. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Piotr Indyk. 2001.Google ScholarGoogle Scholar
  34. On Approximate Nearest Neighbors under ℓ ∞ Norm. J. Comput. System Sci. 63, 4 (2001), 627–638. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Piotr Indyk. 2002. Approximate Nearest Neighbor Algorithms for Fréchet Distance via Product Metrics. In Proceedings of the 18th ACM Symposium on Computational Geometry (SoCG ’2002). 102–106. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Piotr Indyk. 2004. Approximate Nearest Neighbor under Edit Distance via Product Metrics. In Proceedings of the 15th ACM-SIAM Symposium on Discrete Algorithms (SODA ’2004). 646–650. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Piotr Indyk and Rajeev Motwani. 1998. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In Proceedings of the 30th ACM Symposium on the Theory of Computing (STOC ’1998). 604–613. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Piotr Indyk and Nitin Thaper. 2003. Fast Color Image Retrieval via Embeddings. (2003).Google ScholarGoogle Scholar
  39. Workshop on Statistical and Computational Theories of Vision (at ICCV).Google ScholarGoogle Scholar
  40. Fritz John. 1948. Extremum Problems with Inequalities as Subsidiary Conditions. In Studies and Essays Presented to R. Courant on his 60th Birthday, January 8, 1948.Google ScholarGoogle Scholar
  41. Interscience Publishers, Inc., New York, N. Y., 187–204.Google ScholarGoogle Scholar
  42. David R. Karger and Matthias Ruhl. 2002. Finding Nearest Neighbors in Growth-Restricted Metrics. In Proceedings of the 34th ACM Symposium on the Theory of Computing (STOC ’2002). 741–750. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Robert Krauthgamer and James R. Lee. 2004. Navigating Nets: Simple Algorithms for Proximity Search. In Proceedings of the 15th ACM-SIAM Symposium on Discrete Algorithms (SODA ’2004). 798–807. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Yi Li, Huy L. Nguyên, and David P. Woodruff. 2014. On Sketching Matrix Norms and the Top Singular Vector. In Proceedings of the 25th ACM-SIAM Symposium on Discrete Algorithms (SODA ’2014). 1562–1581. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Yi Li and David P. Woodruff. 2016. On Approximating Functions of the Singular Values in a Stream. In Proceedings of the 48th ACM Symposium on the Theory of Computing (STOC ’2016). 726–739. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Yi Li and David P. Woodruff. 2016. Tight Bounds for Sketching the Operator Norm, Schatten Norms, and Subspace Embeddings. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 19th International Workshop, APPROX ’2016, and 20th International Workshop, RANDOM ’2016. 39:1– 39:11.Google ScholarGoogle Scholar
  47. Yi Li and David P. Woodruff. 2017. Embeddings of Schatten Norms with Applications to Data Streams. In Proceedings of the 44th International Colloquium on Automata, Languages and Programming (ICALP ’2017). 60:1–60:14.Google ScholarGoogle Scholar
  48. Jiří Matoušek. 1997. On Embedding Expanders into ℓ p Spaces. Israel Journal of Mathematics 102 (1997), 189–197.Google ScholarGoogle ScholarCross RefCross Ref
  49. Manor Mendel and Assaf Naor. 2014. Nonlinear Spectral Calculus and Super-Expanders. Publications Mathématiques de l’IHÉS 119, 1 (2014), 1–95.Google ScholarGoogle ScholarCross RefCross Ref
  50. Manor Mendel and Assaf Naor. 2015. Expanders with Respect to Hadamard Spaces and Random Graphs. Duke Mathematical Journal 164, 8 (2015), 1471–1548.Google ScholarGoogle ScholarCross RefCross Ref
  51. Peter Bro Miltersen. 1999. Cell Probe Complexity – a Survey. In Advances in Data Structures.Google ScholarGoogle Scholar
  52. Assaf Naor. 2017. A Spectral Gap Precludes Low-Dimensional Embeddings. In Proceedings of the 33rd International Symposium on Computational Geometry (SoCG ’2017). 50:1–50:16.Google ScholarGoogle Scholar
  53. Assaf Naor. 2018. Metric Dimension Reduction: a Snapshot of the Ribe Program. In Proceedings of ICM 2018 (to appear).Google ScholarGoogle Scholar
  54. Assaf Naor, Gilles Pisier, and Gideon Schechtman. 2018.Google ScholarGoogle Scholar
  55. Impossibility of Dimension Reduction in the Nuclear Norm. In Proceedings of the 29th ACM-SIAM Symposium on Discrete Algorithms (SODA ’2018). Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Assaf Naor and Yuval Rabani. 2006. On Approximate Nearest Neighbor Search in ℓ p, p > 2. (2006). Manuscript, available on request.Google ScholarGoogle Scholar
  57. Assaf Naor and Gideon Schechtman. 2007. Planar Earthmover is not in L 1. SIAM J. Comput. 37, 3 (2007), 804–826. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Huy L. Nguyên. 2014.Google ScholarGoogle Scholar
  59. Algorithms for High Dimensional Data. Ph.D. Dissertation. Princeton University. Available as http://arks.princeton.edu/ark: /88435/dsp01b8515q61f.Google ScholarGoogle Scholar
  60. Rafail Ostrovsky and Yuval Rabani. 2007. Low Distortion Embedding for Edit Distance. J. ACM 54, 5 (2007), 23:1–23:16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Ilya Razenshteyn. 2017.Google ScholarGoogle Scholar
  62. High-Dimensional Similarity Search and Sketching: Algorithms and Hardness. Ph.D. Dissertation. Massachusetts Institute of Technology.Google ScholarGoogle Scholar
  63. Éric Ricard. 2015. Hölder Estimates for the Noncommutative Mazur Map. Archiv der Mathematik 104, 1 (2015), 37–45.Google ScholarGoogle ScholarCross RefCross Ref
  64. Daniel A. Spielman. 2015. Conductance, the Normalized Laplacian, and Cheeger’s Inequality. Lecture Notes. http://www.cs.yale.edu/homes/spielman/561/lect06- 15. pdfGoogle ScholarGoogle Scholar
  65. Andrew Chi-Chih Yao. 1981.Google ScholarGoogle Scholar

Index Terms

  1. Data-dependent hashing via nonlinear spectral gaps

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            STOC 2018: Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing
            June 2018
            1332 pages
            ISBN:9781450355599
            DOI:10.1145/3188745

            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 20 June 2018

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            Overall Acceptance Rate1,469of4,586submissions,32%

            Upcoming Conference

            STOC '24
            56th Annual ACM Symposium on Theory of Computing (STOC 2024)
            June 24 - 28, 2024
            Vancouver , BC , Canada

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader