Manufactured Runs: The Mystery of the Missing .500 Teams, Part Two

A couple of weeks ago, I wrote about the distribution of team wins, and the discovery that the distribution may in fact be bimodal, not normal as one might expect.

One of the predictions that came from this theory was that teams right at .500 would, counterintuitively, tend to regress away from the mean. So one thing we can do is check whether the real world behaves the way we expect it to. I took all teams from 1969 on with even numbers of games and split them into “halves” with an even number of games in each. I use scare quotes for halves because, in order to boost the sample size, I split in increments of two games and kept any pair where both “halves” were within 20 games of each other in length. Then I looked at teams that were exactly .500 in the “before” sample (716 teams total) and saw what they did afterward:

Again, we see a pronounced bimodal pattern in the data. What’s interesting is that we don’t even see it coming out to .500 in the aggregate; the average is .497, close enough to .500 that we could chalk it up to a sampling issue, but the median is .489. While 323 teams have records greater than .500, 359 teams have records under .500 (with 34 teams exactly at that mark). Looking at the most common records (after prorating to wins in a 162-game schedule, to control for the uneven number of games in the “after” samples):

Win%	Num
0.525	55
0.475	46
0.488	44
0.512	42
0.500	34
0.444	24
0.537	23
0.568	21
0.451	21
0.463	20
0.420	20

Which looks rather similar to the chart in the last article.
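To make the sampling concrete, here is a minimal Python sketch of the split described above and of the proration to a 162-game schedule. It is an illustration under stated assumptions, not the code used for the study: the function names are hypothetical, a season is assumed to be a list of 1s (wins) and 0s (losses), and "within 20 games of each other" is read as a limit on the difference in length between the two pieces.

```python
from fractions import Fraction


def split_samples(results, max_gap=20):
    """Yield (before, after) pieces of one season's results.

    results: list of 1 (win) / 0 (loss), one entry per game, even length.
    Split points fall on even game counts, and the two pieces may differ
    in length by at most max_gap games (an assumed reading of the text).
    """
    n = len(results)
    if n % 2:
        raise ValueError("expected an even number of games")
    for k in range(2, n, 2):
        if abs(k - (n - k)) <= max_gap:
            yield results[:k], results[k:]


def win_pct(games):
    """Exact winning percentage as a fraction, so .500 can be tested exactly."""
    return Fraction(sum(games), len(games))


def prorated_wins(games, schedule=162):
    """Scale a partial record to whole wins over a full schedule."""
    return round(win_pct(games) * schedule)


def after_records_of_500_teams(results):
    """Prorated 'after' wins for each split whose 'before' piece is exactly .500."""
    return [
        prorated_wins(after)
        for before, after in split_samples(results)
        if win_pct(before) == Fraction(1, 2)
    ]


# Made-up season that alternates win, loss, win, loss (not real data).
toy_season = [1, 0] * 81
print(after_records_of_500_teams(toy_season))
```

Under this reading a 162-game season yields ten split points (after games 72 through 90), which is how a single team can appear in the sample more than once. The Win% column in the table is wins over 162, so the 0.525 row corresponds to 85 prorated wins.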

So what we have is a weird little case of teams actually fleeing from the mean, rather than regressing toward it. We have a theory as to why this might be, if you’ll recall what we said last time. There’s very little glory in finishing right at .500. It’s hard to make the playoffs at that record, and if you do you’re at a disadvantage compared to the other playoff teams in both seeding and talent. The incentives are lined up for teams to either finish above .500 and contend, or retool and finish below .500.

So I took my split-halves sample and looked at all teams from 1985 on (1985 being the first year I have salary data for all 30 teams), not just teams at .500 in the “before” sample. And I looked at five variables:

A team’s actual win percentage in the before sample,
A team’s third-order win percentage in the before sample (not the whole season),
How many games back of the division leader a team was as of the last day of the before sample,
A team’s TV market size, as defined by this Nate Silver study, and
A team’s payroll for that season, divided by the average team’s payroll that year (which I termed the “payroll index”).

And I looked at how well they predict rest-of-season win percentage, using an ordinary least squares regression:

Term	Coefficient	Standard err.	p-value
Constant	0.2242	0.0121069	1.08E-74
Win Percent	0.127259	0.0295882	1.72E-05
Third-Order WPCT	0.385595	0.0245993	2.09E-54
Games Back	-0.000741	0.00025036	0.0031
TV Market	9.39E-10	2.78E-10	0.0007
Salary Index	0.0159546	0.00331814	1.56E-06
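Read as an equation, the table says the predicted rest-of-season win percentage is the constant plus each coefficient times its input. The sketch below only restates that arithmetic in Python. The function and key names are hypothetical, and the units of the TV market input are an assumption: the table does not state them, so the example value is purely illustrative.

```python
# Coefficients copied from the regression table above.
COEF = {
    "constant": 0.2242,
    "win_pct": 0.127259,
    "third_order_wpct": 0.385595,
    "games_back": -0.000741,
    "tv_market": 9.39e-10,
    "salary_index": 0.0159546,
}


def predict_rest_of_season(win_pct, third_order_wpct, games_back,
                           tv_market, salary_index):
    """Predicted rest-of-season win percentage from the fitted equation.

    tv_market must be in whatever units the market-size study uses;
    the table does not say, so treat any value passed here as assumed.
    """
    return (
        COEF["constant"]
        + COEF["win_pct"] * win_pct
        + COEF["third_order_wpct"] * third_order_wpct
        + COEF["games_back"] * games_back
        + COEF["tv_market"] * tv_market
        + COEF["salary_index"] * salary_index
    )


# Hypothetical team: .500 actual and third-order record, five games back,
# league-average payroll, and an illustrative market-size value.
example = predict_rest_of_season(
    win_pct=0.500,
    third_order_wpct=0.500,
    games_back=5.0,
    tv_market=5_000_000,  # assumed units, illustration only
    salary_index=1.0,
)
print(round(example, 3))
```

Holding everything else fixed, each additional game back lowers the prediction by 0.000741, which is why the games back coefficient looks so small next to the two win percentage terms.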

“Constant,” also known as the intercept, is the predicted value if all the input values are zero. The p-value is a measure of statistical significance; the common rule of thumb (bear in mind that’s all it is, though) is that values above .05 are not significant. All of our values are statistically significant. This could be the result of overfitting, so we can check three model selection criteria: the Bayesian information criterion, the Akaike information criterion, and the Hannan–Quinn information criterion. None of them improves when the games back, TV market size and salary index terms are omitted. That suggests that although the differences between a regression equation omitting them and one including them are small (the adjusted R-squared goes from 0.27 to 0.28, and the standard error goes from .0722 to .0717), the improvement is not a product of overfitting.
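For readers who want to see the mechanics, the sketch below shows one common way these quantities are computed. It is an assumption-laden illustration rather than the software output behind the table: the p-value uses a normal approximation (reasonable only for a large sample, and the sample size is not given here), and the information criteria use the textbook Gaussian-likelihood forms, which some packages scale or offset differently. The function names are hypothetical.

```python
import math


def two_sided_p(coef, std_err):
    """Two-sided p-value for coef / std_err under a normal approximation."""
    z = abs(coef / std_err)
    return math.erfc(z / math.sqrt(2))


def information_criteria(rss, n, k):
    """AIC, BIC and Hannan-Quinn for a least-squares fit with Gaussian errors.

    rss: residual sum of squares
    n:   number of observations
    k:   number of fitted coefficients (conventions differ on whether
         the error variance is counted as well)
    Lower values are better for all three.
    """
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return {
        "aic": 2 * k - 2 * log_lik,
        "bic": k * math.log(n) - 2 * log_lik,
        "hqc": 2 * k * math.log(math.log(n)) - 2 * log_lik,
    }


# Check against two rows of the table (coefficient, standard error).
print(two_sided_p(-0.000741, 0.00025036))  # games back
print(two_sided_p(9.39e-10, 2.78e-10))     # TV market
```

Each criterion charges a penalty for every extra term, so a larger model only scores better if its fit improves by more than the penalty. That is the sense in which keeping the three extra terms is not flagged as overfitting.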

What’s the practical use of all of this? One standard deviation’s change of games back results in a change of .005 in predicted rest-of-season win percentage, one SD change of TV market size means a change of .004, and one SD change of salary index is worth .006. Because TV market size and salary are substantially correlated at .61, the observed results are likely to be somewhat more pronounced than this suggests.
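Each of those one-SD figures is the size of the coefficient multiplied by that variable's standard deviation. The standard deviations themselves are not listed here, so the sketch below runs the arithmetic backward to get rough implied values. Because the quoted effects are rounded to three decimals, these are ballpark numbers only, and the names in the code are hypothetical.

```python
# (absolute coefficient from the table, quoted one-SD effect on win percentage)
QUOTED = {
    "games_back": (0.000741, 0.005),
    "tv_market": (9.39e-10, 0.004),
    "salary_index": (0.0159546, 0.006),
}

# effect = |coefficient| * SD, so SD is roughly effect / |coefficient|.
implied_sd = {name: effect / coef for name, (coef, effect) in QUOTED.items()}

for name, sd in implied_sd.items():
    print(f"{name}: implied SD of roughly {sd:.3g}")
```

On this back-of-the-envelope reading, one standard deviation of games back is somewhere around six or seven games, and one standard deviation of salary index is a bit under 0.4 of the league-average payroll.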

What this seems to tell us is that there is a small but real targeting effect in rest-of-season wins, akin to the notion of a “self-aware coin.” Teams that are closer to the division leaders are going to perform better in the after sample than teams further behind, given the same expected performance otherwise. Moreover, “large-market teams” (broadly speaking) are going to do better than small-market teams, all else being equal. Here, there’s a bit of a mystery as to what the root cause could be—it could be that such teams have more resources to invest in improving the team mid-season, it could be that they have greater financial incentives to do so, it could be that they are protecting a larger previous investment in the club by doubling down. Or it could be some combination of some or all of these causes. (It could even be that a high-payroll .500 team is more likely to be underperforming their true potential, while a low-payroll .500 team is overperforming theirs.)

In terms of what we do around here, this means that our playoff odds report might be underestimating the rest-of-season performance of teams close to the division leader (although it suggests that we’re underestimating the rest-of-season performance of division leaders as well, so it’s possible that the net result is no significant change in playoff odds). It also means that our assessments, both now and in the preseason, are slightly underrating large-market clubs and overrating small-market clubs. (The Dodgers’ recent acquisition of Ricky Nolasco and his salary from the Marlins in exchange for some magic beans is an illustration of the sort of thing our naïve model is likely not capturing.) It’s something we’ll look at including in our simulations in the future.

In the larger picture, this is a reminder that MLB teams are not simply random number generators, nor are the players on them. They’re run by and composed of real people who respond to incentives, and they can change what they’re doing in response to results. This doesn’t invalidate the use of tools that treat teams and players as random-number generators, mind you—they can and often do produce useful results. But it does suggest that there are other approaches to analyzing baseball that can produce new and surprising conclusions, ones that can deepen our understanding of the game and the people playing it.


WoodyS

7/10

I think that you're coming around to the very logical hypothesis that there are a number of distributions of team win percentage, not just one.

Your description of the specification search implies that you automatically use a third-order variable when you forecast. Is that true, or does it just appear that way?

cwyers

There's some existing research on the best way to predict future team wins from previous results, and that indicates that Pythagorean will outperform real wins, and adjusted Pythag (second/third order) will outperform regular Pythag. I included real wins as well as Pythag because games back is obviously more correlated with real wins than Pythagorean wins, and I wanted to control for that to make sure the games back effect actually had to do with games back, not a correlation with actual wins.

hitmannls

I've always wondered why the preseason odds predicted wins in the high 80's for contenders, rather than the low 90's which is normal. Seems like you're on to something.

gweedoh565

Might there be multicollinearity issues with win percentage, third-order wpct, and games back since they would all be correlated with each other to some degree? It would be interesting to see how the coefficients change if two of those three predictors were excluded.

drmorris

7/11

I'm just a caveman -- confused and frightened by your data science -- but isn't the better next-season performance of teams closer to contention simply indicative of GMs investing in their (perceived) window of opportunity? And conversely, teams farther away blowing it all up?

I haven't actually looked at teams between seasons yet, this is just in-season so far.

Colin Wyers