ABSTRACT
Multimedia content dominates today's Web information. Multimedia user-item interactions are by nature 1/0 binary implicit feedback (e.g., photo likes, video views, song downloads), which can be collected at a larger scale and a much lower cost than explicit feedback (e.g., product ratings). However, most existing collaborative filtering (CF) systems are not well designed for multimedia recommendation, since they ignore the implicitness in users' interactions with multimedia content. We argue that multimedia recommendation involves item- and component-level implicitness, which blurs users' underlying preferences. Item-level implicitness means that users' preferences on items (e.g., photos, videos, songs) are unknown, while component-level implicitness means that, inside each item, users' preferences on different components (e.g., regions in an image, frames of a video) are unknown. For example, a "view" on a video provides no specific information about how much the user likes the video (i.e., item-level) or which parts of the video the user is interested in (i.e., component-level). In this paper, we introduce a novel attention mechanism in CF, dubbed Attentive Collaborative Filtering (ACF), to address the challenging item- and component-level implicit feedback in multimedia recommendation. Specifically, our attention model is a neural network that consists of two attention modules: the component-level attention module, which starts from any content feature extraction network (e.g., a CNN for images/videos) and learns to select informative components of multimedia items, and the item-level attention module, which learns to score the item preferences. ACF can be seamlessly incorporated into classic CF models with implicit feedback, such as BPR and SVD++, and efficiently trained using SGD.
Through extensive experiments on two real-world multimedia Web services, Vine and Pinterest, we show that ACF significantly outperforms state-of-the-art CF methods.
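The two attention modules and their use inside a BPR-style objective can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the one-hidden-layer ReLU scorer, the softmax normalization, the SVD++-style sum of a user vector with an attention-weighted neighborhood of item vectors, and all function and parameter names are hypothetical rather than the authors' implementation.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    # Returns sum_k weights[k] * vectors[k].
    dim = len(vectors[0])
    return [sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(dim)]

def mlp_score(inputs, W, b, w):
    # Assumed scorer: one hidden ReLU layer, then a linear output.
    hidden = [max(0.0, dot(row, inputs) + bias) for row, bias in zip(W, b)]
    return dot(w, hidden)

def component_attention(user, components, params):
    # Component level: weight each component feature (e.g., a region or
    # frame feature) by a user-conditioned score and pool them.
    scores = [mlp_score(user + c, *params) for c in components]
    weights = softmax(scores)
    return weighted_sum(weights, components), weights

def item_attention(user, item_vecs, contents, params):
    # Item level: score each interacted item from the user vector,
    # the item vector, and the item's pooled content vector.
    scores = [mlp_score(user + v + x, *params)
              for v, x in zip(item_vecs, contents)]
    return softmax(scores)

def user_representation(user, item_weights, aux_item_vecs):
    # Assumed SVD++-style form: user vector plus an attention-weighted
    # sum of auxiliary vectors of the items the user interacted with.
    neighborhood = weighted_sum(item_weights, aux_item_vecs)
    return [u + n for u, n in zip(user, neighborhood)]

def bpr_loss(user_repr, pos_item, neg_item):
    # BPR pairwise loss: -log(sigmoid(score_pos - score_neg)).
    margin = dot(user_repr, pos_item) - dot(user_repr, neg_item)
    return math.log1p(math.exp(-margin))
```

In this sketch, `params` is a hypothetical `(W, b, w)` tuple whose input width matches the concatenated vectors. An SGD step would sample one observed (positive) and one unobserved (negative) item per user and follow the gradient of `bpr_loss` with respect to the embeddings and attention parameters.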
Index Terms
Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention