ABSTRACT
Click-through rate prediction is an essential task in industrial applications, such as online advertising. Recently deep learning based models have been proposed, which follow a similar Embedding&MLP paradigm. In these methods large scale sparse input features are first mapped into low dimensional embedding vectors, and then transformed into fixed-length vectors in a group-wise manner, finally concatenated together to fed into a multilayer perceptron (MLP) to learn the nonlinear relations among features. In this way, user features are compressed into a fixed-length representation vector, in regardless of what candidate ads are. The use of fixed-length vector will be a bottleneck, which brings difficulty for Embedding&MLP methods to capture user's diverse interests effectively from rich historical behaviors. In this paper, we propose a novel model: Deep Interest Network (DIN) which tackles this challenge by designing a local activation unit to adaptively learn the representation of user interests from historical behaviors with respect to a certain ad. This representation vector varies over different ads, improving the expressive ability of model greatly. Besides, we develop two techniques: mini-batch aware regularization and data adaptive activation function which can help training industrial deep networks with hundreds of millions of parameters. Experiments on two public datasets as well as an Alibaba real production dataset with over 2 billion samples demonstrate the effectiveness of proposed approaches, which achieve superior performance compared with state-of-the-art methods. DIN now has been successfully deployed in the online display advertising system in Alibaba, serving the main traffic.
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate Proceedings of the 3rd International Conference on Learning Representations.Google Scholar
- Ducharme Réjean Bengio Yoshua et al. 2003. A neural probabilistic language model. Journal of Machine Learning Research (2003), 1137--1155. Google ScholarDigital Library
- Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 191--198. Google ScholarDigital Library
- Cheng H. et al. 2016a. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM. Google ScholarDigital Library
- Qu Y. et al. 2016b. Product-Based Neural Networks for User Response Prediction Proceedings of the 16th International Conference on Data Mining. IEEE.Google Scholar
- Zhu H. et al. 2017. Optimized Cost per Click in Taobao Display Advertising Proceedings of the 23rd International Conference on Knowledge Discovery and Data Mining. ACM, 2191--2200. Google ScholarDigital Library
- Tom Fawcett. 2006. An introduction to ROC analysis. Pattern recognition letters Vol. 27, 8 (2006), 861--874. Google ScholarDigital Library
- Kun Gai, Xiaoqiang Zhu, et almbox. 2017. Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction. arXiv preprint arXiv:1704.05194 (2017).Google Scholar
- Huifeng Guo, Ruiming Tang, et almbox. 2017. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction Proceedings of the 26th International Joint Conference on Artificial Intelligence. 1725--1731. Google ScholarDigital Library
- F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems Vol. 5, 4 (2015). Google ScholarDigital Library
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision. 1026--1034. Google ScholarDigital Library
- Ruining He and Julian McAuley. 2016. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In Proceedings of the 25th International Conference on World Wide Web. 507--517. Google ScholarDigital Library
- Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. 2017. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2261--2269.Google Scholar
- Diederik Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations.Google Scholar
- Mu Li, Ziqi Liu, Alexander J Smola, and Yu-Xiang Wang. 2016. DiFacto: Distributed factorization machines. In Proceedings of the 9th ACM International Conference on Web Search and Data Mining. 377--386. Google ScholarDigital Library
- Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research Vol. 9, Nov (2008), 2579--2605.Google Scholar
- Julian Mcauley, Christopher Targett, Qinfeng Shi, and Van Den Hengel Anton. 2015. Image-Based Recommendations on Styles and Substitutes Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 43--52. Google ScholarDigital Library
- H. Brendan Mcmahan, H. Brendan Holt, et almbox. 2014. Ad Click Prediction: a View from the Trenches. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1222--1230. Google ScholarDigital Library
- Steffen Rendle. 2010. Factorization machines. In Proceedings of the 10th International Conference on Data Mining. IEEE, 995--1000. Google ScholarDigital Library
- Ying Shan, T Ryan Hoens, Jian Jiao, Haijing Wang, Dong Yu, and JC Mao. 2016. Deep Crossing: Web-scale modeling without manually crafted combinatorial features Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 255--262. Google ScholarDigital Library
- Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research Vol. 15, 1 (2014), 1929--1958. Google ScholarDigital Library
- Andreas Veit, Balazs Kovacs, et almbox. 2015. Learning Visual Clothing Style With Heterogeneous Dyadic Co-Occurrences Proceedings of the IEEE International Conference on Computer Vision. Google ScholarDigital Library
- Ronald J Williams and David Zipser. 1989. A learning algorithm for continually running fully recurrent neural networks. Neural computation (1989), 270--280. Google ScholarDigital Library
- Ling Yan, Wu-jun Li, Gui-Rong Xue, and Dingyi Han. 2014. Coupled group lasso for web-scale ctr prediction in display advertising Proceedings of the 31th International Conference on Machine Learning. 802--810. Google ScholarDigital Library
- Shuangfei Zhai, Keng-hao Chang, Ruofei Zhang, and Zhongfei Mark Zhang. 2016. Deepintent: Learning attentions for online advertising with recurrent neural networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1295--1304. Google ScholarDigital Library
Index Terms
- Deep Interest Network for Click-Through Rate Prediction
Recommendations
Click-through rate prediction with the user memory network
DLP-KDD '19: Proceedings of the 1st International Workshop on Deep Learning Practice for High-Dimensional Sparse DataClick-through rate (CTR) prediction is a critical task in online advertising systems. Models like Deep Neural Networks (DNNs) are simple but stateless. They consider each target ad independently and cannot directly extract useful information contained ...
An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy
ADKDD'17: Proceedings of the ADKDD'17Etsy1 is a global marketplace where people across the world connect to make, buy and sell unique goods. Sellers at Etsy can promote their product listings via advertising campaigns similar to traditional sponsored search ads. Click-Through Rate (CTR) ...
Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction
WWW '19: The World Wide Web ConferenceClick-Through Rate prediction is an important task in recommender systems, which aims to estimate the probability of a user to click on a given item. Recently, many deep models have been proposed to learn low-order and high-order feature interactions ...
Comments