research-article

One-Pass Ranking Models for Low-Latency Product Recommendations

Authors:
Antonino Freno

Zalando, Berlin, Germany

Zalando, Berlin, Germany
View Profile

,
Martin Saveski

MIT Media Lab, Cambridge, MA, USA

MIT Media Lab, Cambridge, MA, USA
View Profile

,
Rodolphe Jenatton

Amazon, Berlin, Germany

Amazon, Berlin, Germany
View Profile

,
Cédric Archambeau

Amazon, Berlin, Germany

Amazon, Berlin, Germany
View Profile

KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data MiningAugust 2015Pages 1789–1798https://doi.org/10.1145/2783258.2788579

Published:10 August 2015Publication History

KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Pages 1789–1798

ABSTRACT

Purchase logs collected in e-commerce platforms provide rich information about customer preferences. These logs can be leveraged to improve the quality of product recommendations by feeding them to machine-learned ranking models. However, a variety of deployment constraints limit the naive applicability of machine learning to this problem. First, the amount and the dimensionality of the data make in-memory learning simply not possible. Second, the drift of customers' preference over time require to retrain the ranking model regularly with freshly collected data. This limits the time that is available for training to prohibitively short intervals. Third, ranking in real-time is necessary whenever the query complexity prevents us from caching the predictions. This constraint requires to minimize prediction time (or equivalently maximize the data throughput), which in turn may prevent us from achieving the accuracy necessary in web-scale industrial applications. In this paper, we investigate how the practical challenges faced in this setting can be tackled via an online learning to rank approach. Sparse models will be the key to reduce prediction latency, whereas one-pass stochastic optimization will minimize the training time and restrict the memory footprint. Interestingly, and perhaps surprisingly, extensive experiments show that one-pass learning preserves most of the predictive performance. Additionally, we study a variety of online learning algorithms that enforce sparsity and provide insights to help the practitioner make an informed decision about which approach to pick. We report results on a massive purchase log dataset from the Amazon retail website, as well as on several benchmarks from the LETOR corpus.

Supplemental Material

p1789.mp4

mp4

140.8 MB

Download

References

F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1):1--106, 2011. Google ScholarDigital Library
L. Bottou and Y. LeCun. Large scale online learning. In Advances in Neural Information Processing Systems, volume 16, pages 217--224, 2004.Google Scholar
C. J. C. Burges, R. Ragno, and Q. V. Le. Learning to rank with nonsmooth cost functions. In Advances in Neural Information Processing Systems (NIPS), pages 193--200, 2006.Google Scholar
C. J. C. Burges, K. M. Svore, P. N. Bennett, A. Pastusiak, and Q. Wu. Learning to Rank Using an Ensemble of Lambda-Gradient Models. In Proceedings of the Yahoo! Learning to Rank Challenge, held at ICML 2010, Haifa, Israel, June 25, 2010, pages 25--35, 2011.Google ScholarDigital Library
S. Büttcher, C. L. Clarke, and G. V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines. MIT Press, 2010. Google ScholarDigital Library
Z. Cao, T. Qin, T.-Y. Liu, M.-F. Tsai, and H. Li. Learning to rank: from pairwise approach to listwise approach. In Proceedings of the 24th International Conference on Machine learning (ICML 2007), pages 129--136, New York, NY, USA, 2007. ACM. Google ScholarDigital Library
B. Croft, D. Metzler, and T. Strohman. Search Engines: Information Retrieval in Practice. Addison-Wesley, Boston (MA), 2009. Google ScholarDigital Library
A. Defazio, F. Bach, and S. Lacoste-Julien. Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. Technical report, preprint arXiv:1407.0202, 2014.Google Scholar
J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 12:2121--2159, 2011. Google ScholarDigital Library
J. C. Duchi and Y. Singer. Efficient online and batch learning using forward backward splitting. Journal of Machine Learning Research, 10:2899--2934, 2009. Google ScholarDigital Library
Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An Efficient Boosting Algorithm for Combining Preferences. Journal of Maching Learning Research, 4:933--969, 2003. Google ScholarDigital Library
R. Herbrich, T. Graepel, and K. Obermayer. Large Margin Rank Boundaries for Ordinal Regression. In Smola, Bartlett, Schölkopf, and Schuurmans, editors, Advances in Large Margin Classifiers, chapter 7, pages 115--132. MIT Press, 2000.Google Scholar
M. Hoffman, B. Shahriari, and N. de Freitas. On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), pages 365--374, 2014.Google Scholar
C. Hu, J. T. Kwok, and W. Pan. Accelerated gradient methods for stochastic optimization and online learning. In Advances in Neural Information Processing Systems, 2009.Google ScholarDigital Library
F. Hutter, H. H. Hoos, and K. Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In Proceedings of LION-5, page 507--523, 2011. Google ScholarDigital Library
T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 133--142, New York, NY, USA, 2002. ACM. Google ScholarDigital Library
D. R. Jones, M. Schonlau, and W. J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global optimization, 13(4):455--492, 1998. Google ScholarDigital Library
H. Lai, Y. Pan, C. Liu, L. Lin, and J. Wu. Sparse Learning-to-Rank via an Efficient Primal-Dual Algorithm. IEEE Transactions on Computers, 62:1221--1233, 2013. Google ScholarDigital Library
J. Langford, L. Li, and T. Zhang. Sparse Online Learning via Truncated Gradient. Journal of Machine Learning Research, 10:777--801, 2009. Google ScholarDigital Library
L. Laporte, R. Flamary, S. Canu, S. Déjean, and J. Mothe. Nonconvex Regularizations for Feature Selection in Ranking With Sparse SVM. IEEE Trans. Neural Netw. Learning Syst., 25(6):1118--1130, 2014.Google ScholarCross Ref
T.-Y. Liu. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225--331, 2009. Google ScholarDigital Library
C. D. Manning, P. Raghavan, and H. Schutze. Introduction to Information Retrieval. Cambridge University Press, 2008. Google ScholarCross Ref
B. H. McMahan. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2011, Fort Lauderdale, USA, April 11--13, 2011, pages 525--533, 2011.Google Scholar
H. B. McMahan, G. Holt, D. Sculley, M. Young, D. Ebner, J. Grady, L. Nie, T. Phillips, E. Davydov, D. Golovin, et al. Ad click prediction: a view from the trenches. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1222--1230. ACM, 2013. Google ScholarDigital Library
A. Mohan, Z. Chen, and K. Q. Weinberger. Web-Search Ranking with Initialized Gradient Boosted Regression Trees. In Yahoo! Learning to Rank Challenge, pages 77--89, 2011.Google ScholarDigital Library
S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. In Advances in Neural Information Processing Systems, 2009.Google Scholar
M. Y. Park and T. Hastie. L1-regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society. Series B, 69(4):659--677, 2007.Google ScholarCross Ref
T. Qin, T.-Y. Liu, and H. Li. A general approximation framework for direct optimization of information retrieval measures. Information Retrieval, 13(4):375--397, 2010. Google ScholarDigital Library
D. Sculley. Large scale learning to rank. In NIPS Workshop on Advances in Ranking. 2009.Google Scholar
J. Snoek, H. Larochelle, and R. P. Adams. Practical bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2960--2968, 2012.Google ScholarDigital Library
Z. Sun, T. Qin, Q. Tao, and J. Wang. Robust sparse rank learning for non-smooth ranking measures. In Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009, Boston, MA, USA, July 19--23, 2009, pages 259--266. ACM, 2009. Google ScholarDigital Library
J. Weston, S. Bengio, and N. Usunier. WSABIE: Scaling Up to Large Vocabulary Image Annotation. In IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16--22, 2011, pages 2764--2770, 2011. Google ScholarDigital Library
L. Xiao. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization. Journal of Machine Learning Research, 11:2543--2596, 2010. Google ScholarDigital Library
J. Xu and H. Li. Adarank: A boosting algorithm for information retrieval. In SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 391--398, New York, NY, USA, 2007. ACM. Google ScholarDigital Library
H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society. Series B, 67(2):301--320, 2005.Google ScholarCross Ref

Index Terms

One-Pass Ranking Models for Low-Latency Product Recommendations
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction

Recommendations

Adaptive, Personalized Diversity for Visual Discovery
RecSys '16: Proceedings of the 10th ACM Conference on Recommender Systems

Search queries are appropriate when users have explicit intent, but they perform poorly when the intent is difficult to express or if the user is simply looking to be inspired. Visual browsing systems allow e-commerce platforms to address these scenarios ...
Read More
Optimizing Similar Item Recommendations in a Semi-structured Marketplace to Maximize Conversion
RecSys '16: Proceedings of the 10th ACM Conference on Recommender Systems

This paper tackles the problem of recommendations in eBay's large semi-structured marketplace. eBay's variable inventory and lack of structured information about listings makes traditional collaborative filtering algorithms difficult to use. We discuss ...
Read More
E-commerce in Your Inbox: Product Recommendations at Scale
KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

In recent years online advertising has become increasingly ubiquitous and effective. Advertisements shown to visitors fund sites and apps that publish digital content, manage social networks, and operate e-mail services. Given such large variety of ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2015
2378 pages
ISBN:9781450336642
DOI:10.1145/2783258
General Chairs:
Longbing Cao
University of Technology, Sydney
,
Chengqi Zhang
University of Technology, Sydney
,
Program Chairs:
Thorsten Joachims
Cornell University
,
Geoff Webb
Monash University
,
Dragos D. Margineantu
Boeing Research
,
Graham Williams
Australian Taxation Office
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 10 August 2015
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
one-pass learning
online learning to rank
recommendation systems
sparse models
Qualifiers
- research-article
Conference

Acceptance Rates
KDD '15 Paper Acceptance Rate160of819submissions,20%Overall Acceptance Rate1,133of8,635submissions,13%
More
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

KDD '24: The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 8
  Total Citations
  View Citations
- 712
  Total Downloads
- Downloads (Last 12 months)2
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

One-Pass Ranking Models for Low-Latency Product Recommendations

KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Adaptive, Personalized Diversity for Visual Discovery

Optimizing Similar Item Recommendations in a Semi-structured Marketplace to Maximize Conversion

E-commerce in Your Inbox: Product Recommendations at Scale