We live in a statistical ecosystem dominated by WAR, a statistic that, for all its perks, does have some weaknesses. In an attempt to compare all players to a common baseline, WAR assigns each player a value that deliberately strips away the context of his teammates. There’s no secret here. This is celebrated as the great triumph of WAR. Where RBI or runs scored were decent indicators of a hitter’s abilities, they were also dependent on the abilities of his teammates. As an individual measure, WAR makes sense as a way to compare everyone to the same baseline.
Problem: the big trophy at the end of the year goes to the team that won the World Series. Sure, the more individual talent a team has on its bench, the better. But is the way to evaluate a team simply to add up all the WAR?
I think at this point we have enough evidence to say that the answer is “no.” We know, for example, that WAR doesn’t do a great job of valuing relievers, mostly because the actual value a reliever produces has more to do with the role he’s used in, purposely or accidentally, than with his raw stats. The more interesting question is how much of a difference that makes. If the answer is “the whole is greater than the sum of its parts, but the effect size is only half a run,” then it’s not something we should worry about.
I don’t think we’re going to solve this question today. For one, there will always be variance between a team’s talent (in the “add up all the WAR” sense) and its results, thanks to timing and random chance. But what if there are provable ways in which a team can generate value by putting its players together in a certain way?
Warning! Gory Mathematical Details Ahead!
In 2016, there were 21,744 runs scored, MLB-wide. There were 5,610 home runs hit. That means that about three quarters of all runs scored were, by definition, the work of at least two players. A hitter got on base and someone else knocked him in. Baseball might be a game where the atomic unit is the pitcher-batter confrontation, but the way that runs are scored is a matter of interaction.
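(Quick check on that arithmetic: (21,744 − 5,610) / 21,744 ≈ 0.74; only a home run lets a batter score a run entirely on his own.)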
In the past, I’ve done research on the idea of a stars-and-scrubs lineup, by starting out with a lineup of nine league-average batters, and then making one of the hitters better, and another worse by an equal amount. I then made a lineup with two “stars” and two “scrubs” book-ending a lineup with five average hitters. Because teams can load the “stars” up front in the lineup, and get them more plate appearances, they will score more runs with stars and scrubs than they would with a lineup of nine average hitters.
One thing that I wanted to look into more generally was the idea of bunching within a lineup. For example, if a leadoff hitter is good at getting on base, it’s not the overall abilities of the other eight hitters that determines whether his hits will eventually become runs. It’s mostly the quality of the second and third hitters that we should look to. Similarly, the second hitter is going to be most affected by the two or three guys behind him.
It’s one thing to have a bad hitter and hit him ninth. It’s another thing to have three bad hitters and have to hit them 7-8-9. That creates a valley in the lineup where it’s just easier to get three outs. It means that even if the sixth hitter is a good one, he’s not going to get as much help scoring. Having better hitters in the lineup who don’t make outs as often also means more plate appearances for someone else in the lineup. It can beget a virtuous cycle: a fifth hitter who doesn’t make as many outs means more plate appearances for the sixth hitter, and if he’s good, that’s more chance for him to add value and to “create” another plate appearance for someone else.
On the flip side, having a bunch of bad hitters means that if a lineup does have a good hitter, he won’t get as many plate appearances. Offense works in such a way that a bunch of good hitters will actually compound each other. I wanted to look at the effect a little more, so I created (well, brought back out) a lineup simulation model. For the initiated, it’s a Monte Carlo Markov model, which is a nice way of saying that I built a simulator that models baseball through a bunch of dice rolls. That’s not a perfect way to model baseball, but it does allow us to control all of the inputs.
I started with a lineup that was composed of nine league-average (using 2016 stats) hitters. I then slowly replaced each hitter, starting at the bottom of the lineup, with a hitter who had the composite stats of an American League ninth-spot hitter. The actual downgrade, in absolute terms, for each replacement is the same, but how many of them and where they hit in the lineup changes. I had the computer run 50,000 nine-inning simulations for each lineup.
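To make the mechanics concrete, here is a minimal sketch of that kind of dice-roll simulator. The per-plate-appearance event rates below are illustrative stand-ins, not the actual 2016 league-average or ninth-spot numbers, and the baserunner-advancement rules are deliberately crude; the point is the structure (sample each plate appearance, advance the runners, count runs over nine innings, repeat many times), not the specific outputs.

```python
# A bare-bones Monte Carlo lineup simulator. The event probabilities below are
# placeholders, not the 2016 rates used for the article's actual runs.
import random

# Per-plate-appearance probabilities: out, walk, single, double, triple, home run.
AVG_HITTER = {"out": 0.680, "bb": 0.082, "1b": 0.155, "2b": 0.050, "3b": 0.006, "hr": 0.027}
NINTH_HITTER = {"out": 0.720, "bb": 0.070, "1b": 0.145, "2b": 0.042, "3b": 0.005, "hr": 0.018}

def sample_event(probs):
    """Roll the dice for one plate appearance."""
    r, cumulative = random.random(), 0.0
    for event, p in probs.items():
        cumulative += p
        if r < cumulative:
            return event
    return "out"

def simulate_game(lineup, innings=9):
    """Play one game with crude baserunner-advancement rules; return runs scored."""
    runs, batter = 0, 0
    for _ in range(innings):
        outs, bases = 0, [False, False, False]        # first, second, third
        while outs < 3:
            event = sample_event(lineup[batter])
            batter = (batter + 1) % 9
            if event == "out":
                outs += 1
            elif event in ("bb", "1b"):
                if bases[2]:                           # runner on third scores
                    runs += 1
                bases = [True, bases[0], bases[1]]     # everyone else moves up one (a simplification)
            elif event == "2b":
                runs += bases[1] + bases[2]            # runners on second and third score
                bases = [False, True, bases[0]]        # runner on first goes to third
            elif event == "3b":
                runs += sum(bases)
                bases = [False, False, True]
            elif event == "hr":
                runs += 1 + sum(bases)
                bases = [False, False, False]
    return runs

def runs_per_game(lineup, n_games):
    return sum(simulate_game(lineup) for _ in range(n_games)) / n_games

# Replace average hitters with ninth-spot hitters from the bottom of the order up.
for n_scrubs in range(10):
    lineup = [AVG_HITTER] * (9 - n_scrubs) + [NINTH_HITTER] * n_scrubs
    print(n_scrubs, round(runs_per_game(lineup, 10_000), 4))  # the article ran 50,000 games per lineup
```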
And the results are …
| Lineup | Runs Per Game | Delta Per 162 (from line above) |
| --- | --- | --- |
| All Average | 4.2498 | *** |
| One #9 hitter | 4.1209 | 20.88 |
| Two #9 hitters | 4.0061 | 18.60 |
| Three #9 hitters | 3.9242 | 13.27 |
| Four #9 hitters | 3.7979 | 20.46 |
| Five #9 hitters | 3.6652 | 21.50 |
| Six #9 hitters | 3.5286 | 22.13 |
| Seven #9 hitters | 3.4014 | 20.61 |
| Eight #9 hitters | 3.2967 | 16.96 |
| Nine #9 hitters | 3.1899 | 17.30 |
It’s clear that downgrading from an average hitter to a ninth-spot hitter is worth somewhere around 18 or 19 runs, but the amount isn’t consistent. Some of that might be variation in the Monte Carlo model, but the step from one lineup to the next can differ by 2-3 runs, or around 10-15 percent of what we might expect from a linear function.
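For anyone who wants to check the arithmetic, the right-hand column of the table is just the difference in runs per game between adjacent lineups, scaled to 162 games:

```python
# Recompute the "Delta Per 162" column from the simulated runs-per-game figures.
rpg = [4.2498, 4.1209, 4.0061, 3.9242, 3.7979, 3.6652, 3.5286, 3.4014, 3.2967, 3.1899]
deltas = [round((better - worse) * 162, 2) for better, worse in zip(rpg, rpg[1:])]
print(deltas)  # [20.88, 18.6, 13.27, 20.46, 21.5, 22.13, 20.61, 16.96, 17.3]
```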
***
Now let’s consider defense. In theory, there’s a point past which additional defensive ability stops mattering. To take an extreme example, suppose that a fielder could move 100 feet per second. He could probably catch anything on the field, but much of that ground would be territory where the team already has coverage. Similarly, a fly ball that hangs up for an hour could be caught by any of the nine players on the field, after they’ve had time for a meeting and some tea.
Geometrically, we know that there comes a point where putting a rangy left fielder next to a rangy center fielder will have diminishing returns. But with the introduction of the new Statcast catch probability measures, we might be able to put some numbers on that. Statcast identifies how far away an outfielder was when he went for a fly ball, how long the ball hung in the air, and whether he made the catch. So far, we’ve only got aggregate data chopped into bins of five feet of distance each (i.e., 0 feet, 5 feet, 10 feet, etc.), but we can work with that.
Obviously, a ball that is zero feet from the nearest fielder can be caught easily, and so can a ball that hangs in the air for seven seconds. There’s actually a very small number of balls for which the catch probability is genuinely in doubt. For each bin of data, we can calculate how many feet per second, on average, the fielder would have to travel to make the catch. Using a logistic regression, we can also estimate a function for how likely a fielder is to make a catch, based on the speed he would have to travel.
It turns out that if a fielder was moving at a speed of 18.1 feet per second, he would get to 50 percent of balls. However, 77 percent of balls in the data set were recorded as caught, so a fielder who is moving at a rate of 20.1 feet per second is going to get to a league-average number of balls. That’s about 4.4 seconds over 90 feet. (Makes sense, because a 4.3 time down to first base from the batter’s box is considered a 50 grade or “average” runner, but the old saw is that the majority of major leaguers are “below average.”)
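Here is a rough sketch of what that logistic-regression step might look like. The bin summaries below are made-up placeholders, not the actual Statcast catch-probability data, so the fitted crossover speed won’t match the 18.1 feet per second figure; the point is the mechanics of fitting catch probability against the speed a fielder would need, then inverting the fit.

```python
# Fit catch probability as a function of the speed (ft/s) a fielder would need
# to reach the ball, then invert the fit to find the speed at a target probability.
# The bins below are illustrative placeholders, not real Statcast data.
import numpy as np
from sklearn.linear_model import LogisticRegression

bins = [
    # (required speed in ft/s, balls in bin, balls caught) -- placeholder values
    (6.0, 500, 498),
    (10.0, 450, 440),
    (14.0, 400, 370),
    (18.0, 350, 200),
    (22.0, 300, 90),
    (26.0, 250, 20),
    (30.0, 200, 3),
]

# Expand the bin summaries into one row per ball so we can fit a standard
# logistic regression of "caught" on "required speed."
X, y = [], []
for speed, n, caught in bins:
    X += [[speed]] * n
    y += [1] * caught + [0] * (n - caught)
X, y = np.array(X), np.array(y)

model = LogisticRegression().fit(X, y)
b0, b1 = model.intercept_[0], model.coef_[0][0]

def speed_at_probability(p):
    """Required speed at which the fitted catch probability equals p."""
    return (np.log(p / (1 - p)) - b0) / b1

print(round(speed_at_probability(0.50), 1))  # 50 percent crossover speed for the placeholder data
print(round(y.mean(), 2))                    # overall catch rate in the placeholder data
```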
We can see in the data set that the average fly ball is in the air for 4.56 seconds. Assuming that the two outfielders are merely average runners, they could each cover 91.6 feet in that time. Assuming that they start the play 120 feet apart in the outfield, we can draw two circles with a radius of 91.6 feet and centers 120 feet apart. The area that they share is about 6,068 square feet, out of a total of 26,359 square feet for each of their ranges, meaning about 23 percent of each fielder’s range overlaps with the other’s. That means that 23 percent of the “average” balls that one of the outfielders could get to, the other one could as well, again assuming that they are both roughly average runners.
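The overlap figure comes straight from the geometry of two intersecting circles. Here is a quick sketch of that calculation, treating each fielder’s range as a perfect circle (an assumption the next paragraph walks back a bit):

```python
# Two outfielders, each able to cover 91.6 feet in the time an average fly ball
# hangs in the air, standing 120 feet apart. The lens-shaped intersection of two
# equal circles has a standard closed-form area.
import math

def shared_area(r, d):
    """Area of intersection of two circles of radius r whose centers are d apart."""
    if d >= 2 * r:
        return 0.0
    return 2 * r**2 * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r**2 - d**2)

r, d = 91.6, 120.0
one_range = math.pi * r**2       # one fielder's range, roughly 26,360 sq ft
overlap = shared_area(r, d)      # ground both fielders can reach, roughly 6,068 sq ft
print(round(overlap / one_range, 2))  # ~0.23, the 23 percent overlap noted above
```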
Outfield range is not a perfect circle. People are slower going back than they are running forward, and eventually, you run into the wall, but this gives us some idea of how much outfielders really do overlap. Now, that overlapping isn't entirely bad.
Let’s mix and match some outfielders. We’ll take three gents (and clone them when necessary). One can run 80 feet in 4.56 seconds. One runs 90 feet. One runs 100 feet. We assume that they start 120 feet apart.
| Left Fielder Range (ft) | Center Fielder Range (ft) | Total Area Covered (sq ft) | Overlap (of CF into LF's range) |
| --- | --- | --- | --- |
| 80 | 80 | 37,311 | 14.4% |
| 80 | 90 | 41,415 | 20.6% |
| 80 | 100 | 46,003 | 27.4% |
| 90 | 90 | 45,318 | 21.9% |
| 90 | 100 | 49,704 | 28.1% |
| 100 | 100 | 53,886 | 28.4% |
Take a look at the third and fourth rows of that table. We see that a leadfoot left fielder playing next to a jackrabbit in center field has a lot of overlap: the center fielder could get to roughly 27.4 percent of the balls that the left fielder could. If we give them equally average ranges, there’s not as much overlap between the two. They aren’t bumping into each other as much. But they actually cover less ground than the star-and-scrub outfield. We may reflexively think of overlap between fielders as a bad thing. Not necessarily.
At the same time, in the outfield, we are getting some diminishing returns. For example, looking at the first two rows of the table, the center fielder “improves” from a range of 80 feet to a range of 90 feet. A circle with a radius of 90 feet covers 1.27 times as much area as one with a radius of 80 feet. However, the total area covered by the upgraded outfield is only 1.11 times bigger than the area covered by the one with the lesser center fielder.
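For those who want to reproduce the table, here is a sketch using the standard two-circle intersection formula, again treating each fielder’s range as a circle with centers 120 feet apart; the results land within a rounding error or two of the figures above.

```python
# Coverage and overlap for pairs of range circles (80, 90, or 100 feet in radius)
# whose centers are 120 feet apart, as in the table.
import math

def shared_area(r1, r2, d):
    """Area common to two circles of radii r1 and r2 whose centers are d apart."""
    if d >= r1 + r2:
        return 0.0                                  # no overlap at all
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2           # smaller circle fully inside the larger
    a1 = r1**2 * math.acos((d**2 + r1**2 - r2**2) / (2 * d * r1))
    a2 = r2**2 * math.acos((d**2 + r2**2 - r1**2) / (2 * d * r2))
    a3 = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2))
    return a1 + a2 - a3

d = 120.0
for lf, cf in [(80, 80), (80, 90), (80, 100), (90, 90), (90, 100), (100, 100)]:
    both = shared_area(lf, cf, d)
    covered = math.pi * lf**2 + math.pi * cf**2 - both   # total ground covered by the pair
    into_lf = both / (math.pi * lf**2)                   # share of LF's range the CF can also reach
    print(lf, cf, round(covered), f"{into_lf:.1%}")
```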
While offensive talent tends to compound in its results, defensive talent does the opposite. Good defenders will eventually just get in each other’s way and you don’t get the full benefit of all their talent. It’s also something of a measurement issue. What do we do with a ball that two fielders could have caught, but only one of them does (because only one of them needs to)? Right now, it’s as if that ball never happened, which at least doesn’t penalize our left fielder who got called off by the center fielder, but it doesn’t grant him credit which he could rightly claim. If we assumed that left and center fielders called each other off at roughly the same rates, then at least this non-credit would be shared evenly. However, we know that this is not the case. The center fielder usually pulls rank when calling for a ball. It is the order of things. We haven’t really come up with a way to account for that yet.
Easy Credit
Baseball statistics are usually conceptualized in terms of individual credit. One player gets credit for the putout. One for the home run. Even when something goes wrong, errors are assigned to one person. And we can roll all of it together into an uber-metric. That has its place.
But what of the fact that the reason that the batter got the plate appearance, which then resulted in the home run, was due to the fact that another hitter got a hit, rather than made an out? We don’t think of these interaction terms or how the pieces all fit together. There’s reasonable evidence here that those sorts of interaction effects have at least some effect on the game, even if the effect size is less than “just add all the WARs.”
The new book on the Cubs' success details the emphasis Theo Epstein put on character. There are certainly factors that contribute to the success of a team that can't be part of a simple, cold crunching of the numbers. That fact got a little lost for a while as the sabermetricians and the doubters tried to do all they could to discredit each other.
Hopefully, we are entering an era of cooperation where both sides will recognize the value that can be found in analytics as well as that which can be found in the knowable/understandable, but unmeasurable (in the classic sense).
Perhaps someday in a grand and glorious future, WAR and grit can live happily together.
One reason I've never quite been on board with the 'defensive revolution', going all the way back to when it first began in the wake of the 2005 White Sox's championship, is that I've never been convinced you can add defensive runs together like you pretty much can with hitting (with the big caveats noted in this piece). It's a sort of mantra that "a run saved on defense is worth as much as one scored on offense," and while that's true in a literal, retrospective sense, I really don't think it is in the team-planning sense, if that makes sense (no (sorry)), at least with the tools we currently have to attribute defensive runs at the individual level. I.e., if you replace a replacement-level player with someone worth 4 "WAR" with the bat and 0 with the glove, you can expect the team's record to improve by 4-ish wins, but if you replace them with someone worth 2 and 2 on paper, you might only get 3 more wins in reality.
Something based on Statcast and/or a more probabilistic evaluation of defense might hold the answers, but I have a hard time seeing how that could be expressed in such a convenient/blunt "this player is worth 50 total runs" kind of way. It might be more like "Bobby Outfielder is worth 30 runs with the bat and has a 90% catch-rate radius of 75 feet" or somesuch. Which might be for the best, but it sure makes it harder to rank players across positions, which if we're being honest is like half the reason we're interested in WAR in the first place... :)