Post by eric on Sept 14, 2018 20:49:56 GMT
This raises the question: how predictable should they be?
If we want to predict the outcome of a game, we can just look at which team has a better average margin of victory. But if we want to know how likely the win is, we need to look at how each team's margin varies. For example, if one team scored exactly 110 points every game and the other scored exactly 109, the first team would always win. If the first team instead scored from 101 to 120 and the other from 100 to 119, the first team would still win more games but it wouldn't be 100%.
It turns out that these ranges are well described by the normal distribution. Let's look at the 2001 Globetrotters team to illustrate. They scored 124 per game on average with a standard deviation of 14. If we smooth their actual distribution of points with a seven-point moving average and overlay it on the normal distribution defined by those two parameters, we get:
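For anyone who wants to redraw the normal curve side of that comparison, here's a minimal sketch using only the mean and standard deviation quoted above. Treating "scored exactly N points" as the area between N-0.5 and N+0.5 is my assumption, and `p_exact` is just an illustrative helper name.

```python
from statistics import NormalDist

# 2001 Globetrotters figures quoted in the post: mean 124, standard deviation 14
scoring = NormalDist(mu=124, sigma=14)

def p_exact(points, dist=scoring):
    """Chance of landing on a whole-number score (area from points-0.5 to points+0.5)."""
    return dist.cdf(points + 0.5) - dist.cdf(points - 0.5)

# One standard deviation below the mean, the mean, and one above
for pts in (110, 124, 138):
    print(pts, round(p_exact(pts), 4))
```

The curve is symmetric, so 110 and 138 come out equally likely, and the whole-number chances add up to essentially 1.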
Nice! So to predict any given game we'll start with points per game scored for each team, adjust them based on the other team's defense compared to league average, adjust them again based on which team is at home, and end up with average expected points per game for each. Then we apply our standard deviation, and get distributions that look like this:
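The exact adjustment isn't spelled out here, so the following is only one plausible reading of that recipe: a simple additive version. The function name, the even split of the home edge, and every number in it are assumptions for illustration.

```python
def expected_points(team_ppg, opp_allowed_ppg, league_ppg, at_home, home_edge=3.0):
    """One plausible additive adjustment; every input here is a made-up example."""
    # Opponent's defense relative to league average: a stingy defense lowers the estimate
    adjusted = team_ppg + (opp_allowed_ppg - league_ppg)
    # Split the assumed home edge evenly: half added at home, half taken away on the road
    adjusted += home_edge / 2 if at_home else -home_edge / 2
    return adjusted

# Hypothetical numbers, not taken from any real team
road_team = expected_points(110.0, 104.0, 106.0, at_home=False)
home_team = expected_points(107.0, 108.0, 106.0, at_home=True)
print(road_team, home_team)
```

A multiplicative adjustment (scaling by the opponent's defense as a fraction of league average) would be just as defensible; the point is only that each team ends up with one expected points figure to feed into its curve.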
The team with a curve further to the right scores more points on average, while a team with a higher peak has less variation in how many they score. If we want to know how often the Jazz will win, we just take every point along their curve and see how much of the Bucks distribution is less than it. For example, the Jazz had a .0028 chance of scoring exactly 120 points, and the Bucks had a .6266 chance of scoring less than 120 points, so that point gave the Jazz a .0018 chance of winning. Graphically it looks like this:
When we get all the way through we find the Jazz had a .5992 chance of winning in Milwaukee and the Bucks had a .3988 chance. That adds up to a bit short of 100% because there was a .0020 chance they'd tie. (Obviously basketball games can't end in ties though, and it's very small so we can skip that part.) Now we just need to combine those figures with the same for games played in Utah, and we end up with the Bucks winning the series 25% of the time. If you're a real glutton for punishment, check out the algebra!
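The point-by-point sum for a single game can be sketched roughly like this. The means and standard deviations below are placeholders, not the actual Jazz and Bucks inputs, so it won't reproduce the figures above. `game_odds` and the half-point rounding are my assumptions too.

```python
from statistics import NormalDist

def game_odds(team, opponent, max_points=250):
    """Return (win, tie, loss) chances for `team` by walking every whole-number score."""
    def p_exact(dist, pts):
        return dist.cdf(pts + 0.5) - dist.cdf(pts - 0.5)

    win = tie = 0.0
    for pts in range(max_points + 1):
        p_team = p_exact(team, pts)
        win += p_team * opponent.cdf(pts - 0.5)   # opponent scores fewer than pts
        tie += p_team * p_exact(opponent, pts)    # opponent lands on the same score
    # Whatever is left over is the opponent outscoring the team
    return win, tie, 1.0 - win - tie

# Hypothetical expected points and spreads, NOT the real Jazz/Bucks inputs
jazz = NormalDist(mu=108, sigma=11)
bucks = NormalDist(mu=105, sigma=12)

win, tie, loss = game_odds(jazz, bucks)
print(round(win, 4), round(tie, 4), round(loss, 4))
```

Each pass through the loop is one slice of the kind described above: the chance of the team scoring exactly that many points times the chance the opponent scores fewer.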
So in practice the teams only play enough games to decide the series, but luckily we don't need to know how long the series goes, which is a much more complicated equation. We only care about configurations where the Bucks will win at least four games out of the seven. If they win games 1-5 and lose games 6-7, that's just as good for our purposes as them losing games 1-3 and winning games 4-7. Since all of our results will have the Bucks winning at least four, there's never a case where the Jazz would have won four early in the series and precluded the later games.
P(Bucks win series) =
  1*Wroad^4*Lhome^3 + 12*Wroad^3*Whome^1*Lroad^1*Lhome^2 + 18*Wroad^2*Whome^2*Lroad^2*Lhome^1 + 4*Wroad^1*Whome^3*Lroad^3
+ 3*Wroad^4*Whome^1*Lhome^2 + 12*Wroad^3*Whome^2*Lroad^1*Lhome^1 + 6*Wroad^2*Whome^3*Lroad^2
+ 3*Wroad^4*Whome^2*Lhome^1 + 4*Wroad^3*Whome^3*Lroad^1
+ 1*Wroad^4*Whome^3
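If you'd rather let a computer grind through that formula, here's a minimal sketch. It builds each coefficient as (ways to place the road wins) times (ways to place the home wins) instead of typing them in. The function name and defaults are mine, and the road win chance in the example is a placeholder; only the .3988 home figure comes from the post.

```python
from math import comb

def series_win_prob(w_road, w_home, road_games=4, home_games=3, needed=4):
    """Chance of winning at least `needed` of the seven games, all seven treated as played."""
    l_road, l_home = 1 - w_road, 1 - w_home
    total = 0.0
    for r in range(road_games + 1):          # road wins
        for h in range(home_games + 1):      # home wins
            if r + h < needed:
                continue
            paths = comb(road_games, r) * comb(home_games, h)  # e.g. 12 for r=3, h=1
            total += (paths
                      * w_road**r * l_road**(road_games - r)
                      * w_home**h * l_home**(home_games - h))
    return total

# .3988 is the Bucks' home figure from the post; 0.30 on the road is a made-up placeholder
print(series_win_prob(w_road=0.30, w_home=0.3988))
```

A quick sanity check: with both inputs at 0.5 the result is exactly 0.5, as it should be for a coin-flip series.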
First we look at the cases where the Bucks get *exactly* four wins. They play four on the road and three at home so the possibilities are:
four road wins,
three road and one home,
two road and two home, and
one road and three home.
And they have to play exactly seven games so we fill in the rest with the appropriate losses. These are called macrostates, but we also need to know how many possible paths there are (or microstates) to get to each one so we know how relatively likely each one is. The series goes in order of road, road, home, home, road, home, road.
-Since there are only four road games, there's only one possible path to four road wins - WWLLWLW
-Three road and one home win have a lot more possibilities. There are four ways to set up the road wins (one for each spot we can put the loss) and for each road win set up there are three ways to set up the home win (one for each spot we can put the win)...
LWWLWLW
LWLWWLW
LWLLWWW
WLWLWLW
WLLWWLW
WLLLWWW
WWWLLLW
WWLWLLW
WWLLLWW
WWWLWLL
WWLWWLL
WWLLWWL
...and that makes 12. Note how "losing game 7" in the last three microstates doesn't matter, because the Bucks already have four wins in each case.
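The same list can be generated by brute force, which is a handy way to double check the count. This is a small sketch that tries all 128 win/loss strings against the road, road, home, home, road, home, road order; `microstates` and `SCHEDULE` are just names I picked.

```python
from itertools import product

SCHEDULE = "RRHHRHR"  # road, road, home, home, road, home, road

def microstates(road_wins, home_wins, schedule=SCHEDULE):
    """List every W/L string with the given number of road and home wins."""
    paths = []
    for results in product("WL", repeat=len(schedule)):
        r = sum(1 for site, res in zip(schedule, results) if site == "R" and res == "W")
        h = sum(1 for site, res in zip(schedule, results) if site == "H" and res == "W")
        if (r, h) == (road_wins, home_wins):
            paths.append("".join(results))
    return paths

print(microstates(4, 0))        # the single four-road-wins path
print(len(microstates(3, 1)))   # three road wins plus one home win
```

Swapping in (2, 2) or (1, 3) gives the remaining exactly-four-wins cases.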
And you can continue this with each entry if you like, but for now I'll just point out as a check how if we set road W% equal to home W% the formula collapses into the familiar coefficients of the binomial expansion:
35 = 1+12+18+4
21 = 3+12+6
7 = 3+4
1 = 1
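That check is easy to automate. The sketch below rebuilds each row from "ways to pick the road wins times ways to pick the home wins" and compares the total to the plain seven-game binomial coefficient; `coefficient_parts` is a made-up helper name.

```python
from math import comb

def coefficient_parts(wins, road_games=4, home_games=3):
    """Path counts for each road/home split of `wins`, most road wins first."""
    return [comb(road_games, r) * comb(home_games, wins - r)
            for r in range(road_games, -1, -1)
            if 0 <= wins - r <= home_games]

for wins in range(4, 8):
    parts = coefficient_parts(wins)
    assert sum(parts) == comb(7, wins)   # collapses to the binomial coefficient
    print(comb(7, wins), "=", "+".join(map(str, parts)))
```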
So we've got that going for us.
Whew! That was an emotional roller-coaster. Let's apply this to the past five sim and IRL seasons and see what happens.
The method is quite accurate in both leagues predicting the outcome of regular season games, and gets more accurate the stronger it feels the gap is. We can measure the strength of prediction with z score, which is just the difference from 50% (a toss up) divided by the standard deviation.
Not bad, eh? In both realities the system works. When the system says the result is near a toss up, the result is a toss up. When the system gives a lead pipe lock, it flirts with 95% accuracy. If anything it's very slightly stronger in sim league so, returning to the thesis, why aren't the TMBSL playoffs as predictable as real life? For that we need to consider how often the z scores come up in the respective playoffs:
         real life             sim life
z      y    t    pct        y    t    pct
2-    25   42   .595       34   49   .694
3     15   17   .882       19   22   .864
4     11   12   .917        4    4  1.000
5      4    4  1.000        0    0
Eureka! While sim league is equally or even more reliable for a given strength of prediction, predictions in real life just happen to be much stronger on average. We see so many more 3-6 or 1-4 upsets mainly because our 1-4 isn't as 1-4 as real life's 1-4.
You've been so good! Here's a Kina pic.