DOI: 10.1145/3097983.3098045
research-article

Luck is Hard to Beat: The Difficulty of Sports Prediction

Published: 13 August 2017

ABSTRACT

Predicting the outcome of sports events is a hard task. We quantify this difficulty with a coefficient that measures the distance between the observed final results of sports leagues and those of idealized competitions that are perfectly balanced in terms of skill. This coefficient indicates the relative presence of luck and skill. We collected and analyzed all games from 198 sports leagues, comprising 1503 seasons from 84 countries and 4 different sports: basketball, soccer, volleyball and handball. We measured competitiveness by country and by sport. We also identify, for each season, which teams, if removed from their league, would result in a completely random tournament. Surprisingly, not many of them are needed. As another contribution of this paper, we propose a probabilistic graphical model to learn about the teams' skills and to decompose the relative weights of luck and skill in each game. We break down the skill component into factors associated with the teams' characteristics. The model also allows us to estimate at 0.36 the probability that an underdog team wins in the NBA, with home advantage adding 0.09 to this probability. As shown in the first part of the paper, luck is substantially present even in the most competitive championships, which partially explains why sophisticated and complex feature-based models hardly beat simple models in the task of forecasting sports outcomes.
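
The idea of comparing an observed league table with a perfectly balanced competition can be illustrated with a small simulation. The sketch below is not the coefficient defined in the paper, whose exact form the abstract does not give. It assumes a round-robin league with no draws, treats every game in the balanced baseline as a fair coin flip, and uses the variance of the teams' win fractions as the summary statistic. The function names, the `games_per_pair` parameter and the `gap` ratio are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_balanced_variances(n_teams, games_per_pair, n_sims, seed=0):
    """Variance of win fractions in simulated leagues where every game is a coin flip."""
    rng = random.Random(seed)
    games_per_team = (n_teams - 1) * games_per_pair
    variances = []
    for _ in range(n_sims):
        wins = [0] * n_teams
        # Round-robin schedule: every pair of teams meets games_per_pair times.
        for i in range(n_teams):
            for j in range(i + 1, n_teams):
                for _ in range(games_per_pair):
                    winner = i if rng.random() < 0.5 else j
                    wins[winner] += 1
        fractions = [w / games_per_team for w in wins]
        variances.append(statistics.pvariance(fractions))
    return variances


def luck_gap(observed_win_fractions, games_per_pair=2, n_sims=2000, seed=0):
    """Compare the observed spread of win fractions with the pure-luck baseline."""
    observed = statistics.pvariance(observed_win_fractions)
    simulated = simulate_balanced_variances(
        len(observed_win_fractions), games_per_pair, n_sims, seed
    )
    expected_luck = statistics.mean(simulated)
    # Illustrative ratio: share of the observed spread not explained by luck alone.
    gap = (observed - expected_luck) / observed if observed > 0 else 0.0
    # Share of balanced simulations that are at least as spread out as the real table.
    share_at_least = sum(v >= observed for v in simulated) / n_sims
    return {
        "observed": observed,
        "expected_luck": expected_luck,
        "gap": gap,
        "share_at_least": share_at_least,
    }


if __name__ == "__main__":
    # Hypothetical six-team league in which the stronger team always wins.
    table = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
    print(luck_gap(table, games_per_pair=2, n_sims=500, seed=1))
```

Under these assumptions, a `gap` near zero means the final table is about as spread out as one produced by coin flips, while a value near one means skill differences dominate. Real leagues with draws, unbalanced schedules or point systems would need a different baseline.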

Supplemental Material

assuncao_sports_prediction.mp4 (MP4, 347.5 MB)

      • Published in
        KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
        August 2017
        2240 pages
        ISBN:9781450348874
        DOI:10.1145/3097983

        Copyright © 2017 ACM

        Publisher: Association for Computing Machinery, New York, NY, United States

        Acceptance Rates

        KDD '17 Paper Acceptance Rate: 64 of 748 submissions, 9%
        Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
