DOI: 10.1145/2872427.2883069
research-article

Learning Global Term Weights for Content-based Recommender Systems

Published: 11 April 2016

ABSTRACT

Recommender systems typically leverage two types of signals to recommend items to users: user activities, and content matching between user and item profiles. Recommendation models in the literature are usually categorized into collaborative filtering models, content-based models, and hybrid models. In practice, when rich profiles of users and items are available and user activities are sparse (cold start), effective content matching signals become much more important to the relevance of the recommendations. The de facto method for measuring similarity between two pieces of text is to compute the cosine similarity of the two bags of words, with each word weighted by TF (term frequency within the document) × IDF (inverse document frequency of the word within the corpus). In a general sense, TF can represent any local weighting scheme of the word within each document, and IDF can represent any global weighting scheme of the word across the corpus. In this paper, we focus on the latter, i.e., optimizing the global term weights for a particular recommendation domain by leveraging supervised approaches. The intuition is that some frequent words (lower IDF, e.g., "database") can be essential and predictive for relevant recommendations, while some rare words (higher IDF, e.g., the name of a small company) may have less predictive power. Given plenty of observed activities between users and items as training data, we should be able to learn better domain-specific global term weights, which can further improve the relevance of recommendations.
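
The TF × IDF cosine baseline described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the toy corpus, and the particular smoothed IDF formula are assumptions chosen for illustration. The global weights are passed in as a plain dictionary, so a learned weighting could be swapped in for IDF without changing the similarity code.

```python
import math
from collections import Counter

def idf_weights(corpus):
    """One common smoothed IDF variant (an assumption, not the paper's formula)."""
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

def weighted_vector(tokens, global_weights):
    """Bag of words: raw term count (local weight) times the global term weight."""
    tf = Counter(tokens)
    return {t: count * global_weights.get(t, 0.0) for t, count in tf.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus: one user profile and two job descriptions.
corpus = [
    "database engineer sql database".split(),
    "database administrator sql backup".split(),
    "marketing manager brand".split(),
]
idf = idf_weights(corpus)
user = weighted_vector(corpus[0], idf)
job_close = weighted_vector(corpus[1], idf)
job_far = weighted_vector(corpus[2], idf)
```

In this toy corpus "database" appears in two of the three documents, so its IDF is lower than that of "marketing", even though it is the term that actually connects the user to the relevant job. That is the situation the abstract points to as motivation for learning global weights from data.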

We propose a unified method that can simultaneously learn the weights of multiple content matching signals as well as global term weights for specific recommendation tasks. Our method efficiently handles large-scale training data generated by production recommender systems. Experiments on LinkedIn job recommendation data demonstrate the effectiveness of our approach.
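
The abstract does not specify the model, so the following is only a simplified stand-in for the idea of learning global term weights from observed user-item activity. It fits a logistic regression with plain stochastic gradient descent over per-term overlap features for a single matching signal; the paper's unified method, which also learns weights across multiple matching signals, is not reproduced here. All names, the training pairs, and the company names "acme" and "initech" are hypothetical.

```python
import math
from collections import Counter

def overlap_features(user_tokens, item_tokens):
    """Per-term match strength: product of the term counts on both sides."""
    u, v = Counter(user_tokens), Counter(item_tokens)
    return {t: u[t] * v[t] for t in u if t in v}

def predict(weights, bias, feats):
    """Probability of a positive interaction under a logistic model."""
    z = bias + sum(weights.get(t, 0.0) * x for t, x in feats.items())
    return 1.0 / (1.0 + math.exp(-z))

def train(pairs, epochs=200, lr=0.1):
    """Plain SGD on the logistic loss; one learned weight per term."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for user_tokens, item_tokens, label in pairs:
            feats = overlap_features(user_tokens, item_tokens)
            err = predict(weights, bias, feats) - label
            bias -= lr * err
            for t, x in feats.items():
                weights[t] = weights.get(t, 0.0) - lr * err * x
    return weights, bias

# Hypothetical labeled pairs: (user profile, job text, 1 = positive activity).
pairs = [
    ("database engineer acme".split(), "database developer".split(), 1),
    ("database analyst".split(), "database admin initech".split(), 1),
    ("sales lead acme".split(), "acme database admin".split(), 0),
    ("designer initech".split(), "initech accountant".split(), 0),
]
weights, bias = train(pairs)
```

On this toy data the frequent term "database" only ever matches in positive pairs and the rare company names only match in negative pairs, so the learned weight for "database" ends up positive and the weights for the company names end up negative, the reverse of the ordering IDF would assign.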


Published in

WWW '16: Proceedings of the 25th International Conference on World Wide Web
April 2016
1482 pages
ISBN: 9781450341431

Copyright © 2016 Copyright is held by the International World Wide Web Conference Committee (IW3C2)

Publisher

International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland



Acceptance Rates

WWW '16 paper acceptance rate: 115 of 727 submissions, 16%. Overall acceptance rate: 1,899 of 8,196 submissions, 23%.
