ABSTRACT
Predicting the outcome of sports events is hard. We quantify this difficulty with a coefficient that measures the distance between the observed final results of sports leagues and those of idealized competitions in which all teams are perfectly balanced in skill. This coefficient indicates the relative presence of luck and skill. We collected and analyzed all games from 198 sports leagues, comprising 1503 seasons from 84 countries across 4 sports: basketball, soccer, volleyball and handball. We measured competitiveness by country and by sport. We also identify, in each season, which teams would have to be removed from their league for the remaining tournament to be completely random. Surprisingly, only a few such teams are needed. As a further contribution, we propose a probabilistic graphical model that learns the teams' skills and decomposes the relative weights of luck and skill in each game. We break the skill component down into factors associated with the teams' characteristics. The model also lets us estimate that the probability of an underdog team winning in the NBA is 0.36, and that home advantage adds 0.09 to this probability. As shown in the first part of the paper, luck is substantially present even in the most competitive championships, which partially explains why sophisticated feature-based models rarely beat simple models at forecasting sports outcomes.
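The abstract does not define the coefficient or the team-removal procedure, so the Python sketch below only illustrates the general idea under our own assumptions, not the authors' definitions. Assumed: each game gives one point to the winner (draws ignored), every league is a complete double round-robin, and a "perfectly balanced" league is simulated by fair coin flips. The hypothetical `skill_coefficient` rescales the observed variance of final points so that the coin-flip baseline sits near 0 and a perfectly ordered league sits at 1; the hypothetical `teams_to_remove` greedily drops the current leader until the remaining league's point variance falls within the 95th percentile of the coin-flip seasons.

```python
import random
import statistics

# Hypothetical representation (an assumption, not the paper's): a game is
# (home, away, home_won). Draws are ignored and a win is worth 1 point.


def schedule(teams):
    """Double round-robin: every ordered (home, away) pair meets once."""
    return [(h, a) for h in teams for a in teams if h != a]


def points(teams, games):
    """Final points (wins) of each team over the given games."""
    pts = {t: 0 for t in teams}
    for home, away, home_won in games:
        pts[home if home_won else away] += 1
    return pts


def random_variances(teams, n_sims=2000, seed=0):
    """Point variance of simulated seasons where every game is a fair coin flip."""
    rng = random.Random(seed)
    fixtures = schedule(teams)
    out = []
    for _ in range(n_sims):
        season = [(h, a, rng.random() < 0.5) for h, a in fixtures]
        out.append(statistics.pvariance(list(points(teams, season).values())))
    return out


def skill_coefficient(teams, games, n_sims=2000, seed=0):
    """Hypothetical coefficient: about 0 for a pure-luck league, 1 for a
    perfectly ordered one. Assumes a complete double round-robin."""
    observed = statistics.pvariance(list(points(teams, games).values()))
    luck = statistics.mean(random_variances(teams, n_sims, seed))
    n = len(teams)
    # Perfect ordering: the k-th best team beats every weaker team twice.
    perfect = statistics.pvariance([2 * (n - 1 - k) for k in range(n)])
    return (observed - luck) / (perfect - luck)


def teams_to_remove(teams, games, n_sims=2000, seed=0, quantile=0.95):
    """Hypothetical greedy rule: drop the current leader until the remaining
    league's point variance lies within the `quantile` of coin-flip seasons."""
    remaining = list(teams)
    removed = []
    while len(remaining) > 2:
        sub = [g for g in games if g[0] in remaining and g[1] in remaining]
        pts = points(remaining, sub)
        observed = statistics.pvariance(list(pts.values()))
        sims = sorted(random_variances(remaining, n_sims, seed))
        if observed <= sims[int(quantile * (len(sims) - 1))]:
            break  # indistinguishable from a random tournament
        leader = max(remaining, key=pts.get)
        remaining.remove(leader)
        removed.append(leader)
    return removed


teams = [f"T{k}" for k in range(6)]
# Perfectly ordered season: the team with the lower index always wins.
ordered = [(h, a, teams.index(h) < teams.index(a)) for h, a in schedule(teams)]
print(round(skill_coefficient(teams, ordered), 3))  # 1.0
print(teams_to_remove(teams, ordered))  # ['T0', 'T1', 'T2']
```

Under this toy rule, a perfectly ordered six-team league needs its top three teams removed before the rest looks like coin flips: with only three teams left, a perfect ordering arises by chance in roughly 9% of random seasons (6 of 64 outcomes). Using another dispersion measure instead of the variance, or searching over subsets of teams instead of always removing the leader, would be equally plausible design choices.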
Index Terms
- Luck is Hard to Beat: The Difficulty of Sports Prediction