ABSTRACT
Ranking is a central problem in information retrieval. While algorithms for learning ranking models have been intensively studied, feature selection has received far less attention, despite its importance. In practice, feature selection methods designed for classification are often applied directly to ranking. We argue that, because of the striking differences between ranking and classification, it is better to develop feature selection methods tailored to ranking. To this end, we propose a new feature selection method in this paper. Specifically, for each feature we rank the training instances by its value, and define the feature's importance as the resulting ranking accuracy, measured by a performance metric or a loss function. We further define the similarity between two features as the correlation between the rankings they induce. Based on these definitions, we formulate feature selection as an optimization problem: find the subset of features with maximum total importance and minimum total similarity. We also show how to solve this optimization problem efficiently. We have tested the effectiveness of our feature selection method on two information retrieval datasets and with two ranking models. Experimental results show that our method outperforms traditional feature selection methods on the ranking task.
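The approach described in the abstract can be illustrated with a small sketch. This is not the paper's implementation; it is a minimal, assumed variant that uses average precision as the importance measure, Kendall's tau between induced rankings as the similarity measure, and a greedy heuristic (rather than the paper's optimization procedure) with a hypothetical trade-off parameter `c`:

```python
from itertools import combinations

def average_precision(feature, relevance):
    """Importance of a feature: rank documents by the feature value
    (descending) and score the induced ranking with average precision."""
    order = sorted(range(len(feature)), key=lambda i: -feature[i])
    hits, total, n_rel = 0, 0.0, sum(relevance)
    for rank, i in enumerate(order, start=1):
        if relevance[i]:
            hits += 1
            total += hits / rank
    return total / n_rel if n_rel else 0.0

def kendall_tau(f1, f2):
    """Similarity of two features: Kendall's tau between the document
    orderings they induce (1 = identical order, -1 = reversed)."""
    pairs = list(combinations(range(len(f1)), 2))
    conc = 0
    for i, j in pairs:
        d = (f1[i] - f1[j]) * (f2[i] - f2[j])
        conc += 1 if d > 0 else (-1 if d < 0 else 0)
    return conc / len(pairs)

def select_features(features, relevance, k, c=0.5):
    """Greedy selection: at each step add the feature maximizing its
    importance minus c times its total similarity (absolute tau) to
    the features already chosen."""
    importance = [average_precision(f, relevance) for f in features]
    selected = []
    while len(selected) < k:
        best = max(
            (i for i in range(len(features)) if i not in selected),
            key=lambda i: importance[i]
                - c * sum(abs(kendall_tau(features[i], features[j]))
                          for j in selected),
        )
        selected.append(best)
    return selected
```

With a larger `c`, a strong but redundant feature loses out to a weaker feature whose induced ranking is uncorrelated with the features already selected, which is the intended effect of the similarity penalty.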