DOI: 10.1145/3357384.3357895
research-article

BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer

Published: 03 November 2019

ABSTRACT

Modeling users' dynamic preferences from their historical behaviors is challenging and crucial for recommendation systems. Previous methods employ sequential neural networks to encode users' historical interactions from left to right into hidden representations for making recommendations. Despite their effectiveness, we argue that such left-to-right unidirectional models are sub-optimal due to limitations including: (a) unidirectional architectures restrict the power of hidden representations in users' behavior sequences; (b) they often assume a rigidly ordered sequence, which is not always practical. To address these limitations, we propose a sequential recommendation model called BERT4Rec, which employs deep bidirectional self-attention to model user behavior sequences. To avoid information leakage and train the bidirectional model efficiently, we adopt the Cloze objective for sequential recommendation, predicting randomly masked items in the sequence by jointly conditioning on their left and right context. In this way, we learn a bidirectional representation model that makes recommendations by allowing each item in a user's historical behaviors to fuse information from both the left and right sides. Extensive experiments on four benchmark datasets show that our model consistently outperforms various state-of-the-art sequential models.
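The Cloze objective described in the abstract can be illustrated with a minimal sketch of the data-preparation step only: some items in a user's interaction sequence are randomly replaced by a mask token, and the original items at those positions become the prediction targets. The names `MASK`, `cloze_mask`, and `mask_prob`, the reserved id 0, and the default masking rate are assumptions made for illustration; they are not taken from the paper, and the bidirectional model itself is not shown.

```python
import random

MASK = 0  # hypothetical id reserved for the mask token (real item ids start at 1)


def cloze_mask(sequence, mask_prob=0.2, rng=None):
    """Randomly mask items in an interaction sequence.

    Returns (inputs, labels): inputs is the sequence with some items replaced
    by MASK; labels[i] is the original item where position i was masked and
    None elsewhere, so a loss would be computed only on masked positions.
    """
    rng = rng or random.Random()
    inputs, labels = [], []
    for item in sequence:
        if rng.random() < mask_prob:
            inputs.append(MASK)   # hide the item from the model
            labels.append(item)   # the model should recover it from both sides
        else:
            inputs.append(item)
            labels.append(None)
    return inputs, labels


if __name__ == "__main__":
    history = [11, 12, 13, 14, 15, 16]  # made-up item ids
    masked, targets = cloze_mask(history, mask_prob=0.5, rng=random.Random(0))
    print(masked)
    print(targets)
```

Because a masked position can sit anywhere in the sequence, a model trained on these pairs may attend to items on both sides of the gap without seeing the answer, which is the information-leakage concern the abstract raises for bidirectional training.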


    • Published in

      CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
      November 2019
      3373 pages
      ISBN:9781450369763
      DOI:10.1145/3357384

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      CIKM '19 paper acceptance rate: 202 of 1,031 submissions (20%). Overall acceptance rate: 1,861 of 8,427 submissions (22%).
