ABSTRACT
Purchase logs collected in e-commerce platforms provide rich information about customer preferences. These logs can be leveraged to improve the quality of product recommendations by feeding them to machine-learned ranking models. However, a variety of deployment constraints limit the naive applicability of machine learning to this problem. First, the amount and the dimensionality of the data make in-memory learning simply not possible. Second, the drift of customers' preference over time require to retrain the ranking model regularly with freshly collected data. This limits the time that is available for training to prohibitively short intervals. Third, ranking in real-time is necessary whenever the query complexity prevents us from caching the predictions. This constraint requires to minimize prediction time (or equivalently maximize the data throughput), which in turn may prevent us from achieving the accuracy necessary in web-scale industrial applications. In this paper, we investigate how the practical challenges faced in this setting can be tackled via an online learning to rank approach. Sparse models will be the key to reduce prediction latency, whereas one-pass stochastic optimization will minimize the training time and restrict the memory footprint. Interestingly, and perhaps surprisingly, extensive experiments show that one-pass learning preserves most of the predictive performance. Additionally, we study a variety of online learning algorithms that enforce sparsity and provide insights to help the practitioner make an informed decision about which approach to pick. We report results on a massive purchase log dataset from the Amazon retail website, as well as on several benchmarks from the LETOR corpus.
Supplemental Material
- F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Optimization with sparsity-inducing penalties. Foundations and Trends in Machine Learning, 4(1):1--106, 2011. Google ScholarDigital Library
- L. Bottou and Y. LeCun. Large scale online learning. In Advances in Neural Information Processing Systems, volume 16, pages 217--224, 2004.Google Scholar
- C. J. C. Burges, R. Ragno, and Q. V. Le. Learning to rank with nonsmooth cost functions. In Advances in Neural Information Processing Systems (NIPS), pages 193--200, 2006.Google Scholar
- C. J. C. Burges, K. M. Svore, P. N. Bennett, A. Pastusiak, and Q. Wu. Learning to Rank Using an Ensemble of Lambda-Gradient Models. In Proceedings of the Yahoo! Learning to Rank Challenge, held at ICML 2010, Haifa, Israel, June 25, 2010, pages 25--35, 2011.Google ScholarDigital Library
- S. Büttcher, C. L. Clarke, and G. V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines. MIT Press, 2010. Google ScholarDigital Library
- Z. Cao, T. Qin, T.-Y. Liu, M.-F. Tsai, and H. Li. Learning to rank: from pairwise approach to listwise approach. In Proceedings of the 24th International Conference on Machine learning (ICML 2007), pages 129--136, New York, NY, USA, 2007. ACM. Google ScholarDigital Library
- B. Croft, D. Metzler, and T. Strohman. Search Engines: Information Retrieval in Practice. Addison-Wesley, Boston (MA), 2009. Google ScholarDigital Library
- A. Defazio, F. Bach, and S. Lacoste-Julien. Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. Technical report, preprint arXiv:1407.0202, 2014.Google Scholar
- J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. The Journal of Machine Learning Research, 12:2121--2159, 2011. Google ScholarDigital Library
- J. C. Duchi and Y. Singer. Efficient online and batch learning using forward backward splitting. Journal of Machine Learning Research, 10:2899--2934, 2009. Google ScholarDigital Library
- Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An Efficient Boosting Algorithm for Combining Preferences. Journal of Maching Learning Research, 4:933--969, 2003. Google ScholarDigital Library
- R. Herbrich, T. Graepel, and K. Obermayer. Large Margin Rank Boundaries for Ordinal Regression. In Smola, Bartlett, Schölkopf, and Schuurmans, editors, Advances in Large Margin Classifiers, chapter 7, pages 115--132. MIT Press, 2000.Google Scholar
- M. Hoffman, B. Shahriari, and N. de Freitas. On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), pages 365--374, 2014.Google Scholar
- C. Hu, J. T. Kwok, and W. Pan. Accelerated gradient methods for stochastic optimization and online learning. In Advances in Neural Information Processing Systems, 2009.Google ScholarDigital Library
- F. Hutter, H. H. Hoos, and K. Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In Proceedings of LION-5, page 507--523, 2011. Google ScholarDigital Library
- T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 133--142, New York, NY, USA, 2002. ACM. Google ScholarDigital Library
- D. R. Jones, M. Schonlau, and W. J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global optimization, 13(4):455--492, 1998. Google ScholarDigital Library
- H. Lai, Y. Pan, C. Liu, L. Lin, and J. Wu. Sparse Learning-to-Rank via an Efficient Primal-Dual Algorithm. IEEE Transactions on Computers, 62:1221--1233, 2013. Google ScholarDigital Library
- J. Langford, L. Li, and T. Zhang. Sparse Online Learning via Truncated Gradient. Journal of Machine Learning Research, 10:777--801, 2009. Google ScholarDigital Library
- L. Laporte, R. Flamary, S. Canu, S. Déjean, and J. Mothe. Nonconvex Regularizations for Feature Selection in Ranking With Sparse SVM. IEEE Trans. Neural Netw. Learning Syst., 25(6):1118--1130, 2014.Google ScholarCross Ref
- T.-Y. Liu. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225--331, 2009. Google ScholarDigital Library
- C. D. Manning, P. Raghavan, and H. Schutze. Introduction to Information Retrieval. Cambridge University Press, 2008. Google ScholarCross Ref
- B. H. McMahan. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2011, Fort Lauderdale, USA, April 11--13, 2011, pages 525--533, 2011.Google Scholar
- H. B. McMahan, G. Holt, D. Sculley, M. Young, D. Ebner, J. Grady, L. Nie, T. Phillips, E. Davydov, D. Golovin, et al. Ad click prediction: a view from the trenches. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1222--1230. ACM, 2013. Google ScholarDigital Library
- A. Mohan, Z. Chen, and K. Q. Weinberger. Web-Search Ranking with Initialized Gradient Boosted Regression Trees. In Yahoo! Learning to Rank Challenge, pages 77--89, 2011.Google ScholarDigital Library
- S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. In Advances in Neural Information Processing Systems, 2009.Google Scholar
- M. Y. Park and T. Hastie. L1-regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society. Series B, 69(4):659--677, 2007.Google ScholarCross Ref
- T. Qin, T.-Y. Liu, and H. Li. A general approximation framework for direct optimization of information retrieval measures. Information Retrieval, 13(4):375--397, 2010. Google ScholarDigital Library
- D. Sculley. Large scale learning to rank. In NIPS Workshop on Advances in Ranking. 2009.Google Scholar
- J. Snoek, H. Larochelle, and R. P. Adams. Practical bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pages 2960--2968, 2012.Google ScholarDigital Library
- Z. Sun, T. Qin, Q. Tao, and J. Wang. Robust sparse rank learning for non-smooth ranking measures. In Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2009, Boston, MA, USA, July 19--23, 2009, pages 259--266. ACM, 2009. Google ScholarDigital Library
- J. Weston, S. Bengio, and N. Usunier. WSABIE: Scaling Up to Large Vocabulary Image Annotation. In IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16--22, 2011, pages 2764--2770, 2011. Google ScholarDigital Library
- L. Xiao. Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization. Journal of Machine Learning Research, 11:2543--2596, 2010. Google ScholarDigital Library
- J. Xu and H. Li. Adarank: A boosting algorithm for information retrieval. In SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 391--398, New York, NY, USA, 2007. ACM. Google ScholarDigital Library
- H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society. Series B, 67(2):301--320, 2005.Google ScholarCross Ref
Index Terms
- One-Pass Ranking Models for Low-Latency Product Recommendations
Recommendations
Adaptive, Personalized Diversity for Visual Discovery
RecSys '16: Proceedings of the 10th ACM Conference on Recommender SystemsSearch queries are appropriate when users have explicit intent, but they perform poorly when the intent is difficult to express or if the user is simply looking to be inspired. Visual browsing systems allow e-commerce platforms to address these scenarios ...
Optimizing Similar Item Recommendations in a Semi-structured Marketplace to Maximize Conversion
RecSys '16: Proceedings of the 10th ACM Conference on Recommender SystemsThis paper tackles the problem of recommendations in eBay's large semi-structured marketplace. eBay's variable inventory and lack of structured information about listings makes traditional collaborative filtering algorithms difficult to use. We discuss ...
E-commerce in Your Inbox: Product Recommendations at Scale
KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data MiningIn recent years online advertising has become increasingly ubiquitous and effective. Advertisements shown to visitors fund sites and apps that publish digital content, manage social networks, and operate e-mail services. Given such large variety of ...
Comments