Despite the Harvard Business Review's insistence that data scientist is the sexiest job of the 21st century, ask a non-quant about popular references to data analysis and you are quite likely to hear some mention of Moneyball (be that the book or the film). Spoiler alert: "sabermetric" data analysis enabled a baseball team with less money to beat rivals that had a lot more of it.
Very cool, except – in possibly the most inflammatory statement likely to make it onto this blog – in general watching team sport matches at length is pretty pointless.
Evidence? Clauset et al. have contributed to the field with their paper "Safe leads and lead changes in competitive team sports", published recently in Physical Review E.
In it, they use data to model and validate how the lead changes hands between teams in certain sports. For instance, team A might score the first point in a match, but (sport permitting) team B might well then score 2 points and seize the lead. The usual rule, of course, is that whichever team has the lead after a set amount of time is declared the winner.
Although they dabble quite successfully in other sports, the one they model most accurately is basketball. Their rationale for starting there is that basketball has a high scoring rate: NBA statistics show an average of 93.6 baskets per game, worth an average of 2.07 points each.
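To put that scoring rate in perspective, here is a quick back-of-the-envelope calculation. It assumes the 93.6 figure is a per-game total for both teams combined and ignores overtime; the constant names are just illustrative.

```python
# Back-of-the-envelope scoring rate for an NBA game, using the averages quoted above.
# Assumption: 93.6 is the per-game total of scoring events for both teams combined.
GAME_SECONDS = 48 * 60     # 48-minute regulation game, ignoring overtime
BASKETS_PER_GAME = 93.6    # average scoring events per game (figure from the text)
POINTS_PER_BASKET = 2.07   # average points per scoring event (figure from the text)

points_per_game = BASKETS_PER_GAME * POINTS_PER_BASKET
seconds_between_baskets = GAME_SECONDS / BASKETS_PER_GAME

print(f"about {points_per_game:.0f} points per game in total")
print(f"one basket roughly every {seconds_between_baskets:.0f} seconds")
```

That works out to roughly 194 points per game and a basket about every 31 seconds, which is plenty of events for a statistical model to work with.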
Modelling frequent events accurately is almost always easier than modelling infrequent ones, so it's clear why they picked basketball over, say, English football, where FiveThirtyEight reports that the most common score across almost 200,000 English games was a thrilling 1:0, occurring in about 16% of matches. In fact, not far off 10% of games ended with no one scoring and no one winning at all, just to make it sound even more exciting.
Anyway, that aside, how did Clauset's team model lead changes in basketball so accurately that they significantly outperformed previous heuristics? Advanced logistic neural network forest tree linear super-regressions? Nope: they used a random walk.
If you're unfamiliar with random walk models, they are easy to understand, at least at the simplest level.
You can imagine a random walk in physical terms. Consider a situation where you're standing on a platform and can walk either forwards or backwards. Flip a coin: heads, you take a step forwards; tails, you take a step backwards. Repeat until 48 minutes have elapsed, and wherever you end up is your result.
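The coin-flip walk above takes only a few lines to simulate. This is a minimal sketch: the function name `coin_flip_walk` and the pacing of one flip every 30 seconds (96 flips in 48 minutes) are hypothetical choices for illustration, not anything taken from the paper.

```python
import random


def coin_flip_walk(n_flips, seed=None):
    """Return the position after each flip, starting at 0.

    Heads (probability 0.5) moves one step forwards (+1);
    tails moves one step backwards (-1).
    """
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    position = 0
    path = [position]
    for _ in range(n_flips):
        position += 1 if rng.random() < 0.5 else -1
        path.append(position)
    return path


if __name__ == "__main__":
    # Hypothetical pacing: one flip every 30 seconds for 48 minutes = 96 flips.
    path = coin_flip_walk(96, seed=42)
    print("final position:", path[-1])
    print("furthest forwards:", max(path), "| furthest back:", min(path))
```

Read "position" as the score difference between two teams and each flip as a basket, and you have the crudest possible model of a game.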
Sounds fantastically trivial, right? What in the uber-complexities of reality could possibly be modelled by anything derived from such a basic process? Oh, nothing much, just simple things like the stock market and molecular motion, amongst others.
And sports, apparently.
The team concludes:
A model based on random walks provides a remarkably good description for the dynamics of scoring in competitive team sports.
In fact, the same set of laws can describe many aspects of holding the lead in a game.
…we found that the celebrated arcsine law of Eq. (1) closely describes the distribution of times for: (i) one team is leading …,
(ii) the last lead change in a game …
and (iii) when the maximal lead in the game occurs…
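The paper's Eq. (1) isn't reproduced in this post. For reference, the standard textbook form of the arcsine law for a symmetric random walk (assumed here to be what Eq. (1) expresses) gives, for x the fraction of the game elapsed at the event in question, the density f and cumulative distribution F:

```latex
f(x) = \frac{1}{\pi\sqrt{x(1-x)}}, \qquad
F(x) = \Pr(X \le x) = \frac{2}{\pi}\arcsin\sqrt{x}, \qquad 0 < x < 1
```

For example, F(0.1) ≈ 0.205: about a fifth of outcomes fall in the first tenth of the game and, by symmetry, another fifth in the last tenth, compared with a tenth each if the timing were uniform.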
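The model even captures the empirical fact that if something exciting is going to happen (an "extremal value"), it tends to happen near the very start or the very end of the game, because the arcsine distribution piles its probability up at both ends rather than in the middle.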
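You can check this start-or-end effect with a crude simulation: play many coin-flip "games" and record when the score was last level, a rough proxy for the last lead change. This is a sketch under simplifying assumptions (every step is a ±1 swing, 500 steps per game, 2,000 simulated games, all hypothetical choices), not the calibrated model from the paper.

```python
import math
import random
from typing import List


def simulate_walk(n_steps: int, rng: random.Random) -> List[int]:
    """Score difference after each step of a fair +/-1 coin-flip walk, starting level at 0."""
    positions = [0]
    for _ in range(n_steps):
        positions.append(positions[-1] + (1 if rng.random() < 0.5 else -1))
    return positions


def last_tie_fraction(positions: List[int]) -> float:
    """Fraction of the 'game' elapsed when the score was last level (proxy for the last lead change)."""
    n_steps = len(positions) - 1
    last_tie = max(i for i, p in enumerate(positions) if p == 0)  # index 0 is always level
    return last_tie / n_steps


def arcsine_cdf(x: float) -> float:
    """Arcsine-law CDF on [0, 1]: F(x) = (2/pi) * arcsin(sqrt(x))."""
    return 2.0 / math.pi * math.asin(math.sqrt(x))


def last_tie_samples(n_games: int = 2000, n_steps: int = 500, seed: int = 1) -> List[float]:
    """Simulate many hypothetical games and collect when each was last level."""
    rng = random.Random(seed)
    return [last_tie_fraction(simulate_walk(n_steps, rng)) for _ in range(n_games)]


if __name__ == "__main__":
    samples = last_tie_samples()
    n = len(samples)
    early = sum(s < 0.1 for s in samples) / n
    late = sum(s > 0.9 for s in samples) / n
    middle = sum(0.45 <= s <= 0.55 for s in samples) / n
    print(f"last tie in first 10% of game:  {early:.3f} (arcsine: {arcsine_cdf(0.1):.3f})")
    print(f"last tie in last 10% of game:   {late:.3f} (arcsine: {1 - arcsine_cdf(0.9):.3f})")
    print(f"last tie in middle 10% of game: {middle:.3f}")
```

The first and last tenths of the game should each collect far more "last ties" than the middle tenth, in line with the arcsine prediction.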
Lest it be said that I am misrepresenting the model because of my personal views on the merits of watching sport at length, towards the end of the article the authors themselves write:
Cynically, our results suggest that one should watch only the first few and last few minutes of a professional basketball game; the rest of the game is as predictable as watching repeated coin tossings.
And I don’t think they mean that in a positive way!
For the full formulae, validation and so on, see the original paper.
But the middle of an arena crowd watching said sport is probably not the ideal place to whip out a scientific calculator and work out whether and when the lead will change. Fortunately, there is a handy rule of thumb for deciding whether the match is effectively over, as Slate reports.
It can be expressed as a rule of thumb for determining what the lead and remaining time have to be for a team to have a 90 percent chance at maintaining that lead:
L = 0.4602√t
where L is the lead and t is the number of seconds remaining.
As even the most ardent fan is unlikely to think in terms of seconds remaining, the chart below will tell you when it's safe to make your excuses and leave the NBA arena, assuming a 90% confidence level is within your tolerance.
Assuming a standard 48-minute basketball game, find the number of minutes already played on the x axis; if the team currently ahead leads by at least the number of points on the y axis, it has at least a 90% chance of winning. For instance, if you've watched 40 minutes of play and your team is ahead by around 10 points, there's really not much point in watching it play out: go flip some real coins at the bartender whilst there's no queue.
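The numbers behind such a chart follow directly from Slate's formula. Here is a minimal sketch, assuming a 48-minute game with no overtime; the function names are hypothetical. The √t shape is what you'd expect from a random walk, whose typical spread grows with the square root of elapsed time.

```python
import math

GAME_MINUTES = 48           # standard NBA regulation length, no overtime
SLATE_COEFFICIENT = 0.4602  # coefficient from the Slate rule of thumb


def safe_lead(seconds_remaining: float) -> float:
    """Lead (in points) for roughly a 90% chance of holding on: L = 0.4602 * sqrt(t)."""
    if seconds_remaining < 0:
        raise ValueError("seconds_remaining must be non-negative")
    return SLATE_COEFFICIENT * math.sqrt(seconds_remaining)


def safe_lead_after(minutes_elapsed: float) -> float:
    """Same rule, indexed by minutes already played (the chart's x axis)."""
    seconds_remaining = (GAME_MINUTES - minutes_elapsed) * 60
    return safe_lead(seconds_remaining)


if __name__ == "__main__":
    for minute in range(0, GAME_MINUTES + 1, 4):
        print(f"{minute:2d} min played -> safe lead of {safe_lead_after(minute):4.1f} points")
```

At 40 minutes played this gives about 10.1 points, consistent with the example above. Because of the square root, the required lead falls fastest in the final few minutes.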
(Journal reference: Phys. Rev. E 91, 062815 (2015))