ABSTRACT
Information retrieval (IR) evaluation scores are generally designed to measure the effectiveness with which relevant documents are identified and retrieved. Many scores have been proposed for this purpose over the years. These have primarily focused on aspects of precision and recall, and while the two are often discussed as being of equal importance, in practice most attention has been given to precision-focused metrics. Even for recall-oriented IR tasks of growing importance, such as patent retrieval, these precision-based scores remain the primary evaluation measures. Our study examines different evaluation measures for a recall-oriented patent retrieval task and demonstrates the limitations of the current scores in comparing different IR systems for this task. We introduce PRES, a novel evaluation metric for this type of application that takes account of both recall and the user's search effort. The behaviour of PRES is demonstrated on 48 runs from the CLEF-IP 2009 patent retrieval track. A full analysis of the performance of PRES shows its suitability for measuring the retrieval effectiveness of systems from a recall-focused perspective, taking into account the user's expected search effort.
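For concreteness, the sketch below shows how PRES can be computed, following the definition given in the full paper: PRES = 1 - ((sum of relevant ranks / n) - (n + 1)/2) / N_max, where n is the number of relevant documents for a topic, N_max is the maximum number of results the user is assumed to check, and any relevant document not retrieved within the top N_max is assumed, as the worst case, to be ranked immediately after N_max. The function and variable names are illustrative, not taken from the paper.

    def pres(retrieved_ranks, n_relevant, n_max):
        """Sketch of PRES: 1 - ((sum(r_i)/n) - (n + 1)/2) / N_max.

        retrieved_ranks: 1-based ranks (all <= n_max) at which relevant
            documents were found in the ranked result list.
        n_relevant: total number of relevant documents for the topic.
        n_max: maximum number of results the user is assumed to check.
        """
        # Worst-case assumption: relevant documents not retrieved within
        # the top n_max results are placed immediately after rank n_max.
        n_missed = n_relevant - len(retrieved_ranks)
        ranks = list(retrieved_ranks) + [n_max + i for i in range(1, n_missed + 1)]
        avg_rank = sum(ranks) / n_relevant
        return 1.0 - (avg_rank - (n_relevant + 1) / 2) / n_max

Under these assumptions, pres([1, 2, 3], 3, 10) = 1.0 (all relevant documents at the top of the ranking), pres([], 3, 10) = 0.0 (none retrieved), and pres([1, 5], 3, 10) ≈ 0.63 (two of three relevant documents found, at moderate search effort).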