skip to main content
10.1145/3342558.3345402acmconferencesArticle/Chapter ViewAbstractPublication PagesdocengConference Proceedingsconference-collections
research-article

Multi-Objective GP Strategies for Topical Search Integrating Wikipedia Concepts

Published: 23 September 2019 Publication History

Abstract

Genetic Programming techniques have demonstrated great potential in dealing with the problem of query generation. This work explores different Multi-Objective Genetic Programming strategies for evolving a collection of topic-based Boolean queries. It compares three approaches to build topical Boolean queries: using terms, incorporating Wikipedia semantics (Wikipedia concepts) and a hybrid approach, using a combination of both terms and concepts. In addition, different fitness functions are combined giving rise to seven multi-objective schemes. In particular, we investigate the use of the proposed strategies in conjunction with novel fitness functions aimed at attaining high diversity based on the information-theoretic notion of entropy and Jaccard similarity. Experiments were completed using 25 topics from a dataset consisting of approximately 350,000 webpages classified into 448 topics. The results reveal that the use of Wikipedia concepts does not result in statistically significant improvements in precision, global recall or diversity when compared to the term-based approaches. However, the use of concepts has a positive effect on query interpretability since the use of terms leads to artificial queries that are hard to interpret by humans. In the meantime, concept-based queries contain a smaller number of operands than the term-based ones, hence resulting in better execution times without a loss in retrieval performance.

Supplementary Material

baggio (baggio.zip)
Supplemental movie, appendix, image and software files for, Multi-Objective GP Strategies for Topical Search Integrating Wikipedia Concepts

References

[1]
Cecilia Baggio, Rocío L. Cecchini, Carlos M. Lorenzetti, and Ana G. Maguitman. 2016. An Entropy-Based Approach for Preserving Diversity in Evolutionary Topical Search. In Simposio Argentino de Inteligencia Artificial (ASAI) - JAIIO 45. 62--69.
[2]
Jay Budzik, Kristian J Hammond, and Larry Birnbaum. 2001. Information Access in Context. Knowledge-Based Systems 14 (2001).
[3]
Rocío L. Cecchini, Carlos M. Lorenzetti, Ana G. Maguitman, and Nélida B. Brignole. 2008. Using genetic algorithms to evolve a population of topical queries. Information Processing and Management 44, 6 (2008).
[4]
Rocío L. Cecchini, Carlos M. Lorenzetti, Ana G. Maguitman, and Nélida B. Brignole. 2010. Multiobjective evolutionary algorithms for context-based search. Journal of the American Society for Information Science and Technology (2010).
[5]
Rocío L. Cecchini, Carlos M. Lorenzetti, Ana G. Maguitman, and Ignacio Ponzoni. 2018. Topic relevance and diversity in information retrieval from large datasets: A multi-objective evolutionary algorithm approach. Applied Soft Computing (2018).
[6]
Sitong Chen, Abidalrahman Moh'd, Seyednaser Nourashrafeddin, and Evangelos Milios. 2018. Active High-Recall Information Retrieval from Domain-Specific Text Corpora based on Query Documents. In Proc. of the ACM Symposium on Document Engineering 2018. ACM.
[7]
Carlos A Coello Coello, Gary B Lamant, and David A Van Veldhuizen. 2007. Evolutionary Algorithms for Solving Multi-Objective Problems (2nd ed.). Springer-Verlag New York, Inc, New York, NY, Boston, MA.
[8]
Oscar Cordón, Enrique Herrera-Viedma, and María Luque. 2002. Evolutionary learning of boolean queries by multiobjective genetic programming. In International Conference on Parallel Problem Solving from Nature. Springer, 710--719.
[9]
O. Cordón, E. Herrera-Viedma, and M. Luque. 2006. Improving the learning of Boolean queries by means of a multiobjective IQBE evolutionary algorithm. Information Processing and Management (2006).
[10]
Charles Darwin. 1859. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. John Murray, UK.
[11]
Agoston E. Eiben and James E. Smith. 2003. Introduction to Evolutionary Computing. Springer.
[12]
Paolo Ferragina and Ugo Scaiella. 2010. TAGME: On-the-fly Annotation of Short Text Fragments (by Wikipedia Entities). In Proc. of the 19th ACM international conference on Information and knowledge management. ACM, 1625--1628.
[13]
Paolo Ferragina and Ugo Scaiella. 2012. Fast and Accurate Annotation of Short Texts with Wikipedia Pages. IEEE Software 29, 1 (2012).
[14]
Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. 2002. Placing search in context: The concept revisited. ACM Transactions on information systems 20, 1 (2002).
[15]
Evgeniy Gabrilovich and Shaul Markovitch. 2007. Computing semantic relatedness using wikipedia-based explicit semantic analysis. IJCAI International Joint Conference on Artificial Intelligence (2007).
[16]
Giannis Varelas, Epimenidis Voutsakis, Paraskevi Raftopoulou, Euripides G.M. Petrakis, and Evangelos E. Milios. 2005. Semantic Similarity Methods in WordNet and their Application to Information Retrieval on the Web. World Wide Web Internet And Web Information Systems (2005).
[17]
Samer Hassan and Rada Mihalcea. 2011. Semantic Relatedness Using Salient Semantic Analysis. Proc. of the 25th AAAI Conf. on Artificial Intelligence (2011).
[18]
John H Holland. 1975. Adaptation in natural and artificial systems. The University of Michigan Press, Ann Arbor, MI, USA.
[19]
Hao Hu, Mingxi Zhang, Zhenying He, Peng Wang, and Wei Wang. 2013. Diversifying Query Suggestions by Using Topics from Wikipedia. In Proc. of the 2013 International Joint Conferences on Web Intelligence and Intelligent Agent Technologies. IEEE Computer Society, 139--146.
[20]
Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou. 2009. Exploiting Wikipedia as external knowledge for document clustering. Proc. of the 15th international conference on Knowledge discovery and data mining (2009).
[21]
Deb Kalyanmoy. 2001. Multi-Objective Optimization Using Evolutionary Algorithms. John Wiley and Sons.
[22]
John R. Koza. 1992. Genetic programming: on the programming of computers by means of natural selection. MIT press.
[23]
David Leake, Ana Maguitman, and Thomas Reichherzer. 2005. Exploiting rich context: An incremental approach to context-based web search. In International and Interdisciplinary Conference on Modeling and Using Context. Springer, 254--267.
[24]
Haifeng Liu, Xiangjie Kong, Xiaomei Bai, Wei Wang, Teshome Megersa Bekele, and Feng Xia. 2015. Context-based collaborative filtering for citation recommendation. IEEE Access 3 (2015).
[25]
Shuang Liu, Fang Liu, Clement Yu, and Weiyi Meng. 2004. An effective approach to document retrieval via utilizing WordNet and recognizing phrases. Proc. of the 27th annual international conference on Research and development in IR (2004).
[26]
Antonio Gabriel López-Herrera, Enrique Herrera-Viedma, and Francisco Herrera. 2009. A study of the use of multi-objective evolutionary algorithms to learn Boolean queries: A comparative study. Journal of the American Society for Information Science and Technology (2009).
[27]
Carlos Lorenzetti, Ana Maguitman, and Cecilia Baggio. 2019. DMOZ 2006 Dataset and its Wikification. Mendeley Data, v1.
[28]
Carlos M Lorenzetti and Ana G Maguitman. 2009. A semi-supervised incremental algorithm to automatically formulate topical queries. Information Sciences 179, 12 (2009).
[29]
Sean Luke and Liviu Panait. 2002. Fighting Bloat with Nonparametric Parsimony Pressure. (2002).
[30]
Jing Luo, Bo Meng, Changqin Quan, and Xinhui Tu. 2015. Exploiting salient semantic analysis for IR. Enterprise Information Systems 10 (2015).
[31]
Pekka Malo, Pyry Siitari, and Ankur Sinha. 2013. Automated query learning with Wikipedia and genetic programming. Artificial Intelligence 194 (2013).
[32]
Rada Mihalcea and Andras Csomai. 2007. Wikify!: linking documents to encyclopedic knowledge. Proc. of the sixteenth ACM conference on Conference on information and knowledge management (2007).
[33]
David Milne and Ian H. Witten. 2007. An Effective, Low-Cost Measure of Semantic Relatedness Obtained from Wikipedia Links. Artificial Intelligence (2007).
[34]
David Milne and Ian H. Witten. 2008. Learning to link with wikipedia. Proc. of the 17th ACM conference on Information and knowledge mining - CIKM '08 (2008).
[35]
Seyednaser Nourashrafeddin, Evangelos Milios, and Drik V. Arnold. 2014. An ensemble approach for text document clustering using Wikipedia concepts. Proc. of the 2014 ACM symposium on Document engineering (2014).
[36]
Xiang Ren, Yujing Wang, Xiao Yu, Jun Yan, Zheng Chen, and Jiawei Han. 2014. Heterogeneous graph-based intent learning with queries, web pages and Wikipedia concepts. Proc. of the 7th ACM international conference on Web search and data mining (2014).
[37]
Bradley J. Rhodes and Thad Starner. 1996. Remembrance Agent: A Continuously Running Automated Information Retrieval System. In Proc. of The First International Conference on The Practical Application Of Intelligent Agents and Multi Agent Technology. 487--493.
[38]
Ehsan Sherkat and Evangelos E. Milios. 2017. Vector embedding of wikipedia concepts and entities. In International conference on applications of natural language to information systems. Springer, 418--428.
[39]
Michael Strube and Simone Paolo Ponzetto. 2006. WikiRelate! Computing Semantic Relatedness Using Wikipedia. February (2006).

Cited By

View all
  • (2023)Multi-objective genetic programming strategies for topic-based search with a focus on diversity and global recallPeerJ Computer Science10.7717/peerj-cs.17109(e1710)Online publication date: 30-Nov-2023

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
DocEng '19: Proceedings of the ACM Symposium on Document Engineering 2019
September 2019
254 pages
ISBN:9781450368872
DOI:10.1145/3342558
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 September 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Query Optimization
  2. Recall Maximization
  3. Similarity Measures
  4. Topical Recommendation
  5. Wikification

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Department of Foreign Affairs, Trade and Development of Canada
  • Natural Sciences and Engineering Research Council of Canada
  • Agencia Nacional de Promoción Científica y Tecnológica

Conference

DocEng '19
Sponsor:
DocEng '19: ACM Symposium on Document Engineering 2019
September 23 - 26, 2019
Berlin, Germany

Acceptance Rates

DocEng '19 Paper Acceptance Rate 30 of 77 submissions, 39%;
Overall Acceptance Rate 194 of 564 submissions, 34%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 07 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Multi-objective genetic programming strategies for topic-based search with a focus on diversity and global recallPeerJ Computer Science10.7717/peerj-cs.17109(e1710)Online publication date: 30-Nov-2023

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media