Learning Global Term Weights for Content-based Recommender Systems

Authors:
Yupeng Gu, Northeastern University, Boston, MA, USA
Bo Zhao, LinkedIn Corporation, Sunnyvale, CA, USA
David Hardtke, LinkedIn Corporation, Sunnyvale, CA, USA
Yizhou Sun, Northeastern University, Boston, MA, USA

WWW '16: Proceedings of the 25th International Conference on World Wide Web, April 2016, Pages 391–400
https://doi.org/10.1145/2872427.2883069

Published: 11 April 2016

ABSTRACT

Recommender systems typically leverage two types of signals to recommend items to users: user activities, and content matching between user and item profiles. Accordingly, recommendation models in the literature are usually categorized into collaborative filtering models, content-based models and hybrid models. In practice, when rich profiles of users and items are available and user activities are sparse (cold start), effective content matching signals become much more important to the relevance of the recommendation. The de facto method for measuring similarity between two pieces of text is to compute the cosine similarity of their two bags of words, where each word is weighted by TF (term frequency within the document) × IDF (inverse document frequency of the word within the corpus). In a general sense, TF can represent any local weighting scheme of the word within each document, and IDF can represent any global weighting scheme of the word across the corpus. In this paper, we focus on the latter, i.e., optimizing the global term weights for a particular recommendation domain by leveraging supervised approaches. The intuition is that some frequent words (lower IDF, e.g., "database") can be essential and predictive for relevant recommendation, while some rare words (higher IDF, e.g., the name of a small company) could have less predictive power. Given plenty of observed activities between users and items as training data, we should be able to learn better domain-specific global term weights, which can further improve the relevance of recommendation.
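
As a rough illustration of the TF × IDF cosine baseline described above, the following Python sketch scores a user profile against two job descriptions. It is not taken from the paper: the function names, the smoothed IDF formula and the toy corpus are all assumptions chosen for the example.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """Global weight per term: a smoothed inverse document frequency (assumed form)."""
    n_docs = len(corpus)
    # Count in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def weighted_vector(doc, global_weight):
    """Sparse bag of words: local weight (raw term frequency) times global weight."""
    tf = Counter(doc)
    return {t: count * global_weight.get(t, 0.0) for t, count in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: one user profile and two job descriptions, already tokenized.
corpus = [
    ["database", "engineer", "sql"],
    ["database", "administrator", "sql", "backup"],
    ["graphic", "designer", "illustrator"],
]
idf = idf_weights(corpus)
user = weighted_vector(corpus[0], idf)
job_a = weighted_vector(corpus[1], idf)
job_b = weighted_vector(corpus[2], idf)

print(cosine(user, job_a))  # shares "database" and "sql" with the user profile
print(cosine(user, job_b))  # shares no terms with the user profile
```

Here `idf` plays the role of the global weighting scheme: replacing that dictionary with learned, domain-specific weights changes the similarity without touching the local (TF) part.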

We propose a unified method that can simultaneously learn the weights of multiple content matching signals as well as global term weights for specific recommendation tasks. Our method is efficient enough to handle large-scale training data generated by production recommender systems, and experiments on LinkedIn job recommendation data demonstrate the effectiveness of our approach.
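
The abstract does not spell out the model, so the sketch below should not be read as the authors' method. It only illustrates the general idea of learning global term weights from observed user-item activity, under several assumptions: a plain logistic regression over per-term overlap features, an unnormalized weighted dot product instead of cosine similarity, stochastic gradient descent, and a hypothetical four-example training set.

```python
import math
from collections import Counter


def overlap_features(user_tokens, item_tokens):
    """Per-term match strength: product of term frequencies on shared terms."""
    tf_u, tf_i = Counter(user_tokens), Counter(item_tokens)
    return {t: tf_u[t] * tf_i[t] for t in tf_u if t in tf_i}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_term_weights(examples, epochs=200, lr=0.1):
    """Fit one weight per term by SGD on the logistic loss (illustrative only)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for user_tokens, item_tokens, label in examples:
            x = overlap_features(user_tokens, item_tokens)
            score = bias + sum(weights.get(t, 0.0) * v for t, v in x.items())
            error = sigmoid(score) - label  # gradient of the loss w.r.t. the score
            bias -= lr * error
            for t, v in x.items():
                weights[t] = weights.get(t, 0.0) - lr * error * v
    return weights, bias


# Hypothetical training data: (user profile tokens, item tokens, 1 = positive activity).
examples = [
    (["database", "acme"], ["database", "sql"], 1),
    (["database", "java"], ["database", "admin"], 1),
    (["acme", "sales"], ["acme", "database"], 0),
    (["acme", "design"], ["acme", "java"], 0),
]
weights, bias = train_term_weights(examples)
print(weights)
```

On this toy data, matches on "database" co-occur with positive labels and matches on "acme" (standing in for a small company name) with negative ones, so the learned weight for "database" ends up above the one for "acme". This mirrors the intuition stated in the abstract, although an IDF computed from corpus statistics alone could rank the two terms the other way around.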

Index Terms

Learning Global Term Weights for Content-based Recommender Systems
1. Human-centered computing
  1. Collaborative and social computing
    1. Collaborative and social computing theory, concepts and paradigms
      1. Collaborative filtering
2. Information systems
  1. Information retrieval
    1. Document representation
    2. Retrieval tasks and goals
      1. Recommender systems
  2. Information systems applications
    1. Data mining
      1. Data cleaning


Published in
WWW '16: Proceedings of the 25th International Conference on World Wide Web
April 2016
1482 pages
ISBN:9781450341431
General Chairs:
Jacqueline Bourdeau, Tele-university (TELUQ), Montreal, QC, Canada
Jim A. Hendler, Rensselaer Polytechnic Institute, Troy, NY, USA
Roger Nkambou, Université du Québec à Montréal, Montreal, QC, Canada
Program Chairs:
Ian Horrocks, University of Oxford, UK
Ben Y. Zhao, University of California at Santa Barbara, CA, USA
Copyright © 2016 held by the International World Wide Web Conference Committee (IW3C2)
Publisher
International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland
Author Tags
feature selection
recommender systems
term weighting
Qualifiers
- research-article
Acceptance Rates
WWW '16 Paper Acceptance Rate: 115 of 727 submissions, 16%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%