ABSTRACT
Recommender systems typically leverage two types of signals to recommend items to users: user activities, and content matching between user and item profiles. Recommendation models in the literature are accordingly categorized into collaborative filtering models, content-based models, and hybrid models. In practice, when rich profiles about users and items are available and user activities are sparse (cold start), effective content matching signals become much more important to the relevance of the recommendations. The de facto method for measuring the similarity between two pieces of text is to compute the cosine similarity of their two bags of words, with each word weighted by TF (term frequency within the document) x IDF (inverse document frequency of the word within the corpus). In a general sense, TF can represent any local weighting scheme of the word within each document, and IDF can represent any global weighting scheme of the word across the corpus. In this paper, we focus on the latter, i.e., optimizing the global term weights for a particular recommendation domain by leveraging supervised approaches. The intuition is that some frequent words (lower IDF, e.g., "database") can be essential and predictive for relevant recommendations, while some rare words (higher IDF, e.g., the name of a small company) may have less predictive power. Given plenty of observed activities between users and items as training data, we should be able to learn better domain-specific global term weights, which can further improve the relevance of the recommendations.
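To make the baseline concrete, the TF x IDF cosine similarity described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's implementation: the smoothed IDF formula, the toy profiles, the helper names (`idf_weights`, `weighted_vector`, `cosine`), and the hand-set weight of 3.0 for "database" are all assumptions chosen for the example. The hand-set weight merely stands in for a learned global term weight to show how the similarity shifts when a frequent but predictive word is weighted up.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Smoothed inverse document frequency for each term in a tokenized corpus.

    The smoothing (log((1 + N) / (1 + df)) + 1) is one common variant,
    assumed here for illustration.
    """
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def weighted_vector(tokens, global_weights):
    """Bag of words with local weight (raw TF) times a global term weight."""
    tf = Counter(tokens)
    return {term: count * global_weights.get(term, 0.0) for term, count in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy profiles: one user and two jobs.
user = ["database", "engineer", "sql"]
job_a = ["database", "administrator", "sql"]
job_b = ["sales", "manager", "acme"]
corpus = [user, job_a, job_b]

# Baseline: global weights come from IDF.
idf = idf_weights(corpus)
sim_a = cosine(weighted_vector(user, idf), weighted_vector(job_a, idf))
sim_b = cosine(weighted_vector(user, idf), weighted_vector(job_b, idf))

# Stand-in for a learned global weight: raise the weight of the frequent
# word "database" by hand (the value 3.0 is arbitrary).
adjusted = dict(idf)
adjusted["database"] = 3.0
sim_a_adjusted = cosine(
    weighted_vector(user, adjusted), weighted_vector(job_a, adjusted)
)
```

In this toy setup, the user matches `job_a` but shares no terms with `job_b`, and raising the global weight of the shared word "database" increases the similarity between the user and `job_a`. A supervised approach would choose such weights from observed user-item activities rather than by hand.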
We propose a unified method that simultaneously learns the weights of multiple content matching signals and the global term weights for specific recommendation tasks. Our method handles large-scale training data generated by production recommender systems efficiently. Experiments on LinkedIn job recommendation data demonstrate the effectiveness of our approach.