July 10, 2013
The Mystery of the Missing .500 Teams, Part Two
A couple of weeks ago, I wrote about the distribution of team wins, and the discovery that the distribution may in fact be bimodal, not normal as one might expect.
One of the predictions that came from this theory was that teams right at .500 would, counterintuitively, tend to regress away from the mean. So one thing we can do is actually check to see if the real world behaves the way we expect it to. I took all teams from 1969 on with even numbers of games and split them into “halves” of even-number games. I use scare-quotes for halves since in order to boost the sample size, I split into increments of two and kept any pair where both “halves” were within 20 games of each other. Then I looked at teams that were exactly .500 in the “before” sample— 716 teams total—and saw what they did afterward:>
Again, we see a pronounced bimodal pattern in the data. What’s interesting is that we don’t even see it coming out to .500 in the aggregate; the average is .497, close enough to .500 that we could chalk it up to a sampling issue, but the median is .489. While 323 teams have records greater than .500, 359 teams have records under .500 (with 34 teams exactly at that mark). Looking at the most common records (after prorating to wins in a 162-game schedule, to control for the uneven number of games in the “after” samples):
Which looks rather similar to the chart in the last article.
So what we have is a weird little case of teams actually fleeing from the mean, rather than regressing toward it. We have a theory as to why this might be, if you’ll recall what we said last time. There’s very little glory in finishing right at .500. It’s hard to make the playoffs at that record, and if you do you’re at a disadvantage compared to the other playoff teams in both seeding and talent. The incentives are lined up for teams to either finish above .500 and contend, or retool and finish below .500.
So I took my split-halves sample and looked at all teams from 1985 on (1985 being the first year I have salary data for all 30 teams), not just teams at .500 in the “before” sample. And I looked at five variables:
And I looked at how well they predict rest-of-season win percentage, using an ordinary least squares regression:
“Constant,” also known as the intercept, is the predicted value if all the input values are zero. P-value is a test of statistical significance; the common rule of thumb (bear in mind that’s all it is, though) is that values above .05 are not significant. All of our values are statistically significant. This could be the result of overfitting, so we can check three model selection criterion—the Bayesian information criterion, the Akaike information criterion, and the Hannan–Quinn information criterion. None of them improves when the games back, TV market size and salary index terms are omitted. That suggests that although the differences between a regression equation omitting them and one including them are small (the adjusted R-squared goes from 0.27 to 0.28, and the standard error goes from .0722 to .0717), it’s not a product of overfitting.
What’s the practical use of all of this? One standard deviation’s change of games back results in a change of .005 in predicted rest-of-season win percentage, one SD change of TV market size means a change of .004, and one SD change of salary index is worth .006. Because TV market size and salary are substantially correlated at .61, the observed results are likely to be somewhat more pronounced than this suggests.
What this seems to tell us is that there is a small but real targeting effect in rest-of-season wins, akin to the notion of a “self-aware coin.” Teams that are closer to the division leaders are going to perform better in the after sample than teams further behind, given the same expected performance otherwise. Moreover, “large-market teams” (broadly speaking) are going to do better than small-market teams, all else being equal. Here, there’s a bit of a mystery as to what the root cause could be—it could be that such teams have more resources to invest in improving the team mid-season, it could be that they have greater financial incentives to do so, it could be that they are protecting a larger previous investment in the club by doubling down. Or it could be some combination of some or all of these causes. (It could even be that a high-payroll .500 team is more likely to be underperforming their true potential, while a low-payroll .500 team is overperforming theirs.)
In terms of what we do around here, this means that our playoff odds report might be underestimating the rest-of-season performance of teams close to the division leader (although it suggests that we’re underestimating the rest-of-season performance of division leaders as well, so it’s possible that the net result is no significant change in playoff odds probability). It also means that our assessments, both now and in the preseason, are slightly underrating large-market clubs and overrating small market clubs. (The Dodgers’ recent acquisition of Ricky Nolasco and his salary from the Marlins in exchange for some magic beans is an illustration of the sort of thing our naïve model is likely not capturing.) It’s something we’ll look at including in our simulations in the future.
In the larger picture, this is a reminder that MLB teams are not simply random number generators, nor are the players on them. They’re run by and composed of real people who respond to incentives, and they can change what they’re doing in response to results. This doesn’t invalidate the use of tools that treat teams and players as random-number generators, mind you—they can and often do produce useful results. But it does suggest that there are other approaches to analyzing baseball that can produce new and surprising conclusions, ones that can deepen our understanding of the game and the people playing it.