DOI: 10.1145/1277741.1277810
Article

A combined component approach for finding collection-adapted ranking functions based on genetic programming

Published: 23 July 2007

Abstract

In this paper, we propose a new method to discover collection-adapted ranking functions based on Genetic Programming (GP). Our Combined Component Approach (CCA) is based on the combination of several term-weighting components (i.e., term frequency, collection frequency, normalization) extracted from well-known ranking functions. In contrast to related work, the GP terminals in our CCA are not based on simple statistical information of a document collection, but on meaningful, effective, and proven components. Experimental results show that our approach was able to outperform standard TF-IDF, BM25, and another GP-based approach in two different collections. CCA obtained improvements in mean average precision of up to 40.87% for the TREC-8 collection and 24.85% for the WBR99 collection (a large Brazilian Web collection) over the baseline functions. The CCA evolution process was also able to reduce overtraining, which is commonly found in machine learning methods and especially in genetic programming, and to converge faster than the other GP-based approach used for comparison.
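The idea of using whole term-weighting components as GP terminals can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy collection, the three components (`tf_log`, `idf`, `len_norm`), the operator set, and the example individual are all assumptions chosen for illustration. A candidate ranking function is an expression tree over the components, and its fitness on a single query is average precision (mean average precision is the mean of this value over all queries).

```python
import math

# Toy collection (hypothetical data, for illustration only).
DOCS = {
    "d1": "genetic programming evolves ranking functions".split(),
    "d2": "ranking functions weight query terms by frequency".split(),
    "d3": "web collection of brazilian pages".split(),
}
N = len(DOCS)
AVG_LEN = sum(len(tokens) for tokens in DOCS.values()) / N

# Assumed terminals: each one is a complete term-weighting component,
# not a raw collection statistic.
def tf_log(term, doc):
    f = DOCS[doc].count(term)
    return 1.0 + math.log(f) if f > 0 else 0.0

def idf(term, doc):
    df = sum(1 for tokens in DOCS.values() if term in tokens)
    return math.log(N / df) if df > 0 else 0.0

def len_norm(term, doc):
    return len(DOCS[doc]) / AVG_LEN

TERMINALS = {"tf_log": tf_log, "idf": idf, "len_norm": len_norm}
OPERATORS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 0.0,  # protected division
}

def evaluate(tree, term, doc):
    """Evaluate an expression tree: a terminal name or (op, left, right)."""
    if isinstance(tree, str):
        return TERMINALS[tree](term, doc)
    op, left, right = tree
    return OPERATORS[op](evaluate(left, term, doc), evaluate(right, term, doc))

def rank(tree, query):
    """Score each document as the sum of per-term weights, best first."""
    scores = {d: sum(evaluate(tree, t, d) for t in query) for d in DOCS}
    return sorted(DOCS, key=lambda d: scores[d], reverse=True)

def average_precision(ranking, relevant):
    hits, total = 0, 0.0
    for position, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0

# One hypothetical individual: (tf_log * idf) / len_norm
individual = ("/", ("*", "tf_log", "idf"), "len_norm")
ranking = rank(individual, ["ranking", "functions"])
fitness = average_precision(ranking, {"d1", "d2"})
```

In a full GP run, a population of such trees would be recombined and mutated over generations, with fitness computed on training queries and checked on held-out queries to monitor overtraining.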

      Published In

      SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
      July 2007
      946 pages
      ISBN: 9781595935977
      DOI: 10.1145/1277741

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. genetic programming
      2. information retrieval
      3. machine learning
      4. ranking functions
      5. term-weighting

      Qualifiers

      • Article

      Conference

      SIGIR07: The 30th Annual International SIGIR Conference
      July 23 - 27, 2007
      Amsterdam, The Netherlands

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Cited By

      • (2024) Learning to rank through graph-based feature fusion using fuzzy integral operators. Applied Intelligence, 54:22 (11914-11932). DOI: 10.1007/s10489-024-05755-w. Online publication date: 1-Nov-2024
      • (2021) Twitter trends: A ranking algorithm analysis on real time data. Expert Systems with Applications, 164 (113990). DOI: 10.1016/j.eswa.2020.113990. Online publication date: Feb-2021
      • (2020) Learning to Weight for Text Classification. IEEE Transactions on Knowledge and Data Engineering, 32:2 (302-316). DOI: 10.1109/TKDE.2018.2883446. Online publication date: 1-Feb-2020
      • (2020) Effective Lightweight Learning-to-Rank Method Using Unified Term Impacts. IEEE Access, 8 (70420-70437). DOI: 10.1109/ACCESS.2020.2986943. Online publication date: 2020
      • (2020) Information Retrieval and Artificial Intelligence. A Guided Tour of Artificial Intelligence Research (147-180). DOI: 10.1007/978-3-030-06170-8_5. Online publication date: 8-May-2020
      • (2019) An effective and efficient algorithm for ranking web documents via genetic programming. Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing (1065-1072). DOI: 10.1145/3297280.3297385. Online publication date: 8-Apr-2019
      • (2018) DiTeX: Disease-related topic extraction system through internet-based sources. PLOS ONE, 13:8 (e0201933). DOI: 10.1371/journal.pone.0201933. Online publication date: 3-Aug-2018
      • (2018) Worldwide emerging disease-related information extraction system from news data. Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems (331-332). DOI: 10.1145/3274783.3275168. Online publication date: 4-Nov-2018
      • (2018) Data-Fusion Techniques for Open-Set Recognition Problems. IEEE Access, 6 (21242-21265). DOI: 10.1109/ACCESS.2018.2824240. Online publication date: 2018
      • (2018) Characteristics Analysis of Data From News and Social Network Services. IEEE Access, 6 (18061-18073). DOI: 10.1109/ACCESS.2018.2818792. Online publication date: 2018
