DOI: 10.1145/3097983.3098195

Bolt: Accelerated Data Mining with Fast Vector Compression

Published: 13 August 2017

ABSTRACT

Vectors of data are at the heart of machine learning and data mining. Recently, vector quantization methods have shown great promise in reducing both the time and space costs of operating on vectors. We introduce a vector quantization algorithm that can compress vectors over 12x faster than existing techniques while also accelerating approximate vector operations such as distance and dot product computations by up to 10x. Because it can encode over 2GB of vectors per second, it makes vector quantization cheap enough to employ in many more circumstances. For example, using our technique to compute approximate dot products in a nested loop can multiply matrices faster than a state-of-the-art BLAS implementation, even when our algorithm must first compress the matrices. In addition to showing the above speedups, we demonstrate that our approach can accelerate nearest neighbor search and maximum inner product search by over 100x compared to floating point operations and 10x compared to other vector quantization methods. Our approximate Euclidean distance and dot product computations are not only faster than those of related algorithms with slower encodings, but also faster than Hamming distance computations, which have direct hardware support on the tested platforms. We also assess the errors of our algorithm's approximate distances and dot products, and find that it is competitive with existing, slower vector quantization algorithms.
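
To make the abstract's lookup-table idea concrete, here is a minimal product-quantization-style sketch in NumPy. It illustrates the general family of techniques, not Bolt itself: the paper's speedups come from additionally quantizing the lookup tables to single bytes and scanning them with SIMD instructions, which this sketch omits. Every name and parameter below (fit_codebooks, encode, dot_lut, adot, M = 8 codebooks, K = 16 centroids) is an illustrative assumption, not the paper's API.

    # A product-quantization-style sketch of lookup-table-based approximate
    # dot products (illustrative; all names and parameters are assumptions).
    import numpy as np

    M, K = 8, 16  # 8 codebooks, 16 centroids each (hypothetical choices)

    def fit_codebooks(X, iters=10):
        # Learn K centroids per disjoint subspace with Lloyd's k-means.
        sub = X.shape[1] // M
        rng = np.random.default_rng(0)
        books = []
        for m in range(M):
            Xm = X[:, m * sub:(m + 1) * sub]
            C = Xm[rng.choice(len(Xm), K, replace=False)]
            for _ in range(iters):
                assign = np.argmin(((Xm[:, None] - C[None]) ** 2).sum(-1), axis=1)
                for k in range(K):
                    if np.any(assign == k):
                        C[k] = Xm[assign == k].mean(axis=0)
            books.append(C)
        return books

    def encode(X, books):
        # Compress each row to M one-byte centroid indices.
        sub = X.shape[1] // M
        codes = np.empty((len(X), M), dtype=np.uint8)
        for m, C in enumerate(books):
            Xm = X[:, m * sub:(m + 1) * sub]
            codes[:, m] = np.argmin(((Xm[:, None] - C[None]) ** 2).sum(-1), axis=1)
        return codes

    def dot_lut(q, books):
        # Table of partial dot products: one row per codebook, one entry per centroid.
        sub = len(q) // M
        return np.stack([C @ q[m * sub:(m + 1) * sub] for m, C in enumerate(books)])

    def adot(codes, lut):
        # Approximate dot products: M table lookups and adds per encoded vector.
        return lut[np.arange(M), codes].sum(axis=1)

    # Usage: encode the database once, then answer each query with lookups
    # instead of floating-point arithmetic.
    X = np.random.randn(10000, 64).astype(np.float32)
    q = np.random.randn(64).astype(np.float32)
    books = fit_codebooks(X)
    codes = encode(X, books)                 # 8 bytes per original 256-byte row
    approx = adot(codes, dot_lut(q, books))  # ~ X @ q

Encoding once and then answering each query with M lookup-adds per vector is also what makes the abstract's matrix-multiplication claim plausible: multiplying A by an encoded B amounts to one dot_lut/adot pass per row of A, so if encoding is fast enough the whole pipeline, compression included, can compete with a dense BLAS call.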

Published in

  KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
  August 2017
  2240 pages
  ISBN: 9781450348874
  DOI: 10.1145/3097983

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

  Association for Computing Machinery, New York, NY, United States


Acceptance Rates

  KDD '17 paper acceptance rate: 64 of 748 submissions, 9%
  Overall acceptance rate: 1,133 of 8,635 submissions, 13%
