tutorial

Towards minimal test collections for evaluation of audio music similarity and retrieval

Authors:
Julián Urbano, University Carlos III of Madrid, Leganes, Spain
Markus Schedl, Johannes Kepler University, Linz, Austria

WWW '12 Companion: Proceedings of the 21st International Conference on World Wide Web, April 2012, Pages 917–924. https://doi.org/10.1145/2187980.2188223

Published: 16 April 2012

ABSTRACT

Reliable evaluation of Information Retrieval systems requires large numbers of relevance judgments. Making these annotations is complex and tedious for many Music Information Retrieval tasks, so such evaluations often demand more effort than is practical. A low-cost alternative is the application of Minimal Test Collection algorithms, which offer quite reliable results while significantly reducing the annotation effort. The idea is to incrementally select which documents to judge so that we can compute estimates of the effectiveness differences between systems with a certain degree of confidence. In this paper we show a first approach towards applying these algorithms to the evaluation of the Audio Music Similarity and Retrieval task, run by the annual MIREX evaluation campaign. An analysis with the MIREX 2011 data shows that the judging effort can be reduced to about 35% while still obtaining results with 95% confidence.
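The incremental idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or their effectiveness measure. It assumes binary relevance, precision at k as the measure, a prior probability of relevance of 0.5 for every unjudged document, and a normal approximation for the confidence estimate. All function and variable names (`confidence_a_beats_b`, `compare_with_minimal_judging`, `top_a`, `top_b`, `truth`) are hypothetical.

```python
import math


def confidence_a_beats_b(top_a, top_b, judged, k, p_rel=0.5):
    """Estimate P(system A has higher precision@k than system B).

    Judged documents contribute their known relevance (0 or 1); unjudged
    ones are treated as Bernoulli(p_rel) variables (an assumption).
    The difference is kept in units of "relevant documents in the top k";
    dividing by k would not change the resulting confidence.
    """
    in_a, in_b = set(top_a[:k]), set(top_b[:k])
    mean, var = 0.0, 0.0
    for doc in in_a | in_b:
        # +1 if only A retrieved it, -1 if only B did, 0 if both did.
        sign = (doc in in_a) - (doc in in_b)
        if doc in judged:
            mean += sign * judged[doc]
        else:
            mean += sign * p_rel
            var += sign ** 2 * p_rel * (1.0 - p_rel)
    if var == 0.0:
        # Nothing left to estimate: the sign of the difference is known.
        return 1.0 if mean > 0 else 0.0 if mean < 0 else 0.5
    # Normal approximation: P(difference > 0) = Phi(mean / sd).
    return 0.5 * (1.0 + math.erf(mean / math.sqrt(2.0 * var)))


def compare_with_minimal_judging(top_a, top_b, oracle, k, target=0.95):
    """Request judgments one at a time until the target confidence is met."""
    judged = {}
    while True:
        conf = confidence_a_beats_b(top_a, top_b, judged, k)
        if max(conf, 1.0 - conf) >= target:
            break
        # Only documents retrieved by exactly one system can change the
        # difference; with precision@k they all carry equal weight.
        in_a, in_b = set(top_a[:k]), set(top_b[:k])
        candidates = sorted((in_a ^ in_b) - set(judged))
        if not candidates:
            break
        judged[candidates[0]] = oracle(candidates[0])
    return conf, judged


# Toy data (invented for illustration): two top-5 lists and a hidden truth.
top_a = ["d1", "d2", "d3", "d4", "d5"]
top_b = ["d1", "d2", "d6", "d7", "d8"]
truth = {"d1": 1, "d2": 0, "d3": 1, "d4": 1, "d5": 1,
         "d6": 0, "d7": 0, "d8": 0}
example_conf, example_judged = compare_with_minimal_judging(
    top_a, top_b, truth.get, k=5)
```

In this toy case the loop stops before the whole pool of retrieved documents has been judged, because documents shared by both systems never need a judgment and the remaining uncertainty shrinks with each judgment. Measures that weight ranks differently would need a different document selection rule than the alphabetical tie-break used here.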

References

B. Carterette. Low-Cost and Robust Evaluation of Information Retrieval Systems. Ph.D. dissertation, Department of Computer Science, University of Massachusetts Amherst, 2008.
B. Carterette. Robust Test Collections for Retrieval Evaluation. In International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 55–62, 2007.
B. Carterette, J. Allan, and R. Sitaraman. Minimal Test Collections for Retrieval Evaluation. In International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 268–275, 2006.
J.S. Downie. The Scientific Evaluation of Music Information Retrieval Systems: Foundations and Future. Computer Music Journal, 28(2): 12–23, 2004.
J.S. Downie, A.F. Ehmann, M. Bay, and M.C. Jones. The Music Information Retrieval Evaluation eXchange: Some Observations and Insights. In Advances in Music Information Retrieval, Z.W. Raś and A.A. Wieczorkowska, eds. Springer, pages 93–115, 2010.
J. Urbano. Information Retrieval Meta-Evaluation: Challenges and Opportunities in the Music Domain. In International Society for Music Information Retrieval Conference, pages 609–614, 2011.
J. Urbano, D. Martín, M. Marrero, and J. Morato. Audio Music Similarity and Retrieval: Evaluation Power and Stability. In International Society for Music Information Retrieval Conference, pages 597–602, 2011.
E.M. Voorhees. Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness. Information Processing and Management, 36(5): 697–716, 2000.
E.M. Voorhees and D.K. Harman. TREC: Experiment and Evaluation in Information Retrieval. MIT Press, 2005.

Index Terms

1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results

Published in
WWW '12 Companion: Proceedings of the 21st International Conference on World Wide Web
April 2012
1250 pages
ISBN: 9781450312301
DOI: 10.1145/2187980
General Chairs: Alain Mille (Université de Lyon, France), Fabien Gandon (INRIA, France), Jacques Misselis (HP, France)
Program Chairs: Michael Rabinovich (Case Western Reserve University, USA), Steffen Staab (University of Koblenz-Landau, Germany)
Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
evaluation
music information retrieval
relevance judgments
test collections
Qualifiers
- tutorial
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%