skip to main content
10.1145/3219819.3219823acmotherconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Deep Interest Network for Click-Through Rate Prediction

Published:19 July 2018Publication History

ABSTRACT

Click-through rate prediction is an essential task in industrial applications, such as online advertising. Recently deep learning based models have been proposed, which follow a similar Embedding&MLP paradigm. In these methods large scale sparse input features are first mapped into low dimensional embedding vectors, and then transformed into fixed-length vectors in a group-wise manner, finally concatenated together to fed into a multilayer perceptron (MLP) to learn the nonlinear relations among features. In this way, user features are compressed into a fixed-length representation vector, in regardless of what candidate ads are. The use of fixed-length vector will be a bottleneck, which brings difficulty for Embedding&MLP methods to capture user's diverse interests effectively from rich historical behaviors. In this paper, we propose a novel model: Deep Interest Network (DIN) which tackles this challenge by designing a local activation unit to adaptively learn the representation of user interests from historical behaviors with respect to a certain ad. This representation vector varies over different ads, improving the expressive ability of model greatly. Besides, we develop two techniques: mini-batch aware regularization and data adaptive activation function which can help training industrial deep networks with hundreds of millions of parameters. Experiments on two public datasets as well as an Alibaba real production dataset with over 2 billion samples demonstrate the effectiveness of proposed approaches, which achieve superior performance compared with state-of-the-art methods. DIN now has been successfully deployed in the online display advertising system in Alibaba, serving the main traffic.

References

  1. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate Proceedings of the 3rd International Conference on Learning Representations.Google ScholarGoogle Scholar
  2. Ducharme Réjean Bengio Yoshua et al. 2003. A neural probabilistic language model. Journal of Machine Learning Research (2003), 1137--1155. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems. ACM, 191--198. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Cheng H. et al. 2016a. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Qu Y. et al. 2016b. Product-Based Neural Networks for User Response Prediction Proceedings of the 16th International Conference on Data Mining. IEEE.Google ScholarGoogle Scholar
  6. Zhu H. et al. 2017. Optimized Cost per Click in Taobao Display Advertising Proceedings of the 23rd International Conference on Knowledge Discovery and Data Mining. ACM, 2191--2200. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Tom Fawcett. 2006. An introduction to ROC analysis. Pattern recognition letters Vol. 27, 8 (2006), 861--874. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Kun Gai, Xiaoqiang Zhu, et almbox. 2017. Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction. arXiv preprint arXiv:1704.05194 (2017).Google ScholarGoogle Scholar
  9. Huifeng Guo, Ruiming Tang, et almbox. 2017. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction Proceedings of the 26th International Joint Conference on Artificial Intelligence. 1725--1731. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems Vol. 5, 4 (2015). Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision. 1026--1034. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Ruining He and Julian McAuley. 2016. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In Proceedings of the 25th International Conference on World Wide Web. 507--517. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. 2017. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2261--2269.Google ScholarGoogle Scholar
  14. Diederik Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations.Google ScholarGoogle Scholar
  15. Mu Li, Ziqi Liu, Alexander J Smola, and Yu-Xiang Wang. 2016. DiFacto: Distributed factorization machines. In Proceedings of the 9th ACM International Conference on Web Search and Data Mining. 377--386. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research Vol. 9, Nov (2008), 2579--2605.Google ScholarGoogle Scholar
  17. Julian Mcauley, Christopher Targett, Qinfeng Shi, and Van Den Hengel Anton. 2015. Image-Based Recommendations on Styles and Substitutes Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 43--52. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. H. Brendan Mcmahan, H. Brendan Holt, et almbox. 2014. Ad Click Prediction: a View from the Trenches. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1222--1230. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Steffen Rendle. 2010. Factorization machines. In Proceedings of the 10th International Conference on Data Mining. IEEE, 995--1000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Ying Shan, T Ryan Hoens, Jian Jiao, Haijing Wang, Dong Yu, and JC Mao. 2016. Deep Crossing: Web-scale modeling without manually crafted combinatorial features Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 255--262. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research Vol. 15, 1 (2014), 1929--1958. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Andreas Veit, Balazs Kovacs, et almbox. 2015. Learning Visual Clothing Style With Heterogeneous Dyadic Co-Occurrences Proceedings of the IEEE International Conference on Computer Vision. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Ronald J Williams and David Zipser. 1989. A learning algorithm for continually running fully recurrent neural networks. Neural computation (1989), 270--280. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Ling Yan, Wu-jun Li, Gui-Rong Xue, and Dingyi Han. 2014. Coupled group lasso for web-scale ctr prediction in display advertising Proceedings of the 31th International Conference on Machine Learning. 802--810. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Shuangfei Zhai, Keng-hao Chang, Ruofei Zhang, and Zhongfei Mark Zhang. 2016. Deepintent: Learning attentions for online advertising with recurrent neural networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1295--1304. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Deep Interest Network for Click-Through Rate Prediction

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Other conferences
        KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
        July 2018
        2925 pages
        ISBN:9781450355520
        DOI:10.1145/3219819

        Copyright © 2018 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 19 July 2018

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        KDD '18 Paper Acceptance Rate107of983submissions,11%Overall Acceptance Rate1,133of8,635submissions,13%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader