Team captain and 39-year-old farewell tour participant Derek Jeter is currently the starting shortstop for the New York Yankees. That is the way of things and has been since I was in high school. But the Yankees also have Brendan Ryan on their roster. Ryan is a noted defensive wizard while Jeter is [must…not…make…Jeter fielding joke]. However, Ryan “hit” only .197/.255/.273 last year in 349 plate appearances. Is there a case to be made for Ryan as the starting shortstop based on his defensive prowess? Keep in mind that the Yankees could bury Ryan in the batting order to limit his exposure, move the ever-under-appreciated Brett Gardner up to the two-spot, pinch hit for Ryan late in the game, and enjoy that sweet glove for eight innings a night. Is that enough to overtake De-rek Je-ter?
Let’s go one step further and assume that Jeter will return to his 2011 and 2012 form. In those years, he was worth 1.4 and 3.0 Wins Above Replacement Player, respectively. Ryan, in those same years, was worth 3.5 and 1.9 wins, based mostly on his stellar defense. Thanks to Jeter’s injuries and Ryan’s offensive nosedive, the two checked in at roughly replacement level last season. Even discounting Ryan’s expectations a bit, could we not make the case that while they have two very different skillsets, they are at least in the same ZIP code when it comes to overall value?
Okay, so the Yankees aren’t actually going to bench Captain America in favor of Brendan Ryan, but the Yankees aren’t the only team facing this sort of a decision. This is a classic bat vs. glove positional battle. The Dodgers seem confused about whether to play Alexander Guerrero at second base, despite the fact that he does not appear to own a glove. Their other option is to play some utility type there who has a decent glove, but not much of a bat. Michael Morse and Gregor Blanco have a similar dynamic going in San Francisco.
What’s the cost of carrying a starter who can’t hit? It’s true that a team really can bury him in the nine-hole if they want. But what if a team tried carrying two of these players? Three?
It’s only in the last decade that advanced defensive metrics have been publicly available to give us a full understanding of how much defense matters. With the implementation of WAR(P), it’s become easier to roll both defensive and offensive numbers into one uber-metric. Now we can directly compare players with very different skillsets against a common baseline, although there is a weakness in WAR(P) for which we need to account. The offensive component in WAR(P) is generally based on the idea that each event that a player generates has a certain run value (e.g., a home run is worth roughly 1.4 runs). The idea is that we pretend that all players live on “average teams” and that they always bat in “average situations,” with an average number of runners on base. Home runs are worth more when there are runners on, and some teams employ hitters who are better at getting on base than others. For WAR(P), where the goal is to create a context-neutral common baseline to use for comparison, pretending that everything is average is a feature. But for teams making decisions about their specific circumstances, it’s a bug.
Consider our Jeter vs. Ryan debate. Let’s return to the halcyon days of 2012, before Jeter hurt his ankle (and before Brendan Ryan was a member of the Yankees), when Jeter posted a line of .316/.362/.429. In 2012, Ryan put up a slash line of .194/.277/.278. Had Ryan been a member of the Yankees and their only option at short, he would have been hitting ninth, meaning that other hitters would have been moved up higher in the batting order out of necessity. Not only that, but Ryan’s general aversion to getting on base would have meant that there would have been fewer runners on when the lineup flipped over.
The guys at the top of the lineup are good, and you want them to have runners on base to knock in. A bad hitter at the bottom of a lineup makes the good things that the top-of-the-lineup guys do less valuable by robbing them of men-on-base situations. On top of that, hitters who otherwise would have been hitting eighth would have hit seventh, meaning that they would get more at-bats over the course of a season. In one game, the effects might not shine through. But over the course of 162 games, the little losses at the margins always add up to something. How much does it matter here?
Warning! Gory Mathematical Details Ahead!
Before we get into the math, we need to start with a very important point. All of the analyses that we’re about to conduct are done with real live data (2009-2013), and so they reflect the zeitgeist of how lineups have been constructed in the past few years. Teams generally put together lineups with on-base at the top, power in the middle, and leftovers at the end. (Or alternatively, speed at the top, strikeouts in the middle, and scrappy guys at the bottom.) “What is the best way to construct a lineup ex nihilo?” is a different question, and one that we won’t be answering today.
I looked at the starting lineups for all games played from 2009-2013 and calculated the in-season OBP for each member of that lineup. I used only games played in American League parks because the pitcher’s spot in the NL becomes just a string of pinch-hitters by the sixth inning. I also “ended” each game at the end of the eighth inning. This is because half the time, the home team doesn’t bat in the bottom of the ninth, so in some games, I’d be getting eight innings worth of data and in some, nine. (And in some, 10 or 11 or if we want to get all #weirdbaseball, 16). I found how many runs the team scored that day in those eight innings.
I ran a regression using the OBPs for each of the nine spots in the lineup to see what contribution each one made to the number of runs that each team scored (in the first eight innings). Here are the coefficients from that regression.
| Batting Order Position | Coefficient |
| --- | --- |
| 1 | 3.727 |
| 2 | 3.153 |
| 3 | 2.227 |
| 4 | 2.797 |
| 5 | 3.719 |
| 6 | 3.094 |
| 7 | 4.391 |
| 8 | 3.062 |
| 9 | 2.989 |
| Constant | -5.358 |
First off, what’s up with the seventh spot in the lineup? That one has the heaviest weight in terms of runs scored. So the guy with the best OBP should be placed there? Not exactly. You have to think of the context around these numbers. Teams do not end up with a weak no. 7 hitter at random. If they are putting a guy with a toothpick bat in the seven hole, it probably means that they have two even weaker hitters in the eighth and ninth spots. If a team has a good seven-hole hitter, it probably means that they have six other good hitters in the lineup.
The importance of the seven-hole hitter is only partly about what he actually does. There’s another chunk of that value that’s based on the fact that the quality of the seventh hitter is an indicator of the quality of the other guys in the lineup. But structurally, that seven spot does hold an important function too. If you have three duds all bunched together, it creates a nice little valley for the pitcher to coast through and makes it even more likely that when the good hitters come up (nos. 1, 2, and 3), there won’t be anyone on base for them to play with.
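For readers who want to see the mechanics of a regression like this one, here is a minimal Python sketch. It is an illustration only, not the code behind the numbers above: it fits ordinary least squares (through the normal equations) to synthetic, noise-free lineups generated from the published coefficients, so it recovers those coefficients by construction. The helper names `solve` and `fit_ols`, the 500 fake games, and the .250 to .400 OBP range are all assumptions made for the example.

```python
import random

# Coefficients for lineup spots 1-9 and the constant, copied from the table above.
COEFS = [3.727, 3.153, 2.227, 2.797, 3.719, 3.094, 4.391, 3.062, 2.989]
CONSTANT = -5.358


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares; returns [constant, b1, ..., b9]."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# SYNTHETIC data (hypothetical, NOT the 2009-2013 game logs): each "game" is
# nine random OBPs, and runs are generated exactly from the published model.
random.seed(0)
games = [[random.uniform(0.25, 0.40) for _ in range(9)] for _ in range(500)]
runs = [CONSTANT + sum(c * o for c, o in zip(COEFS, g)) for g in games]

fitted = fit_ols(games, runs)
print([round(v, 3) for v in fitted])
```

With real game logs, the run totals would be noisy and the OBPs in neighboring lineup spots would be correlated, which is exactly why the seventh-spot coefficient needs the careful reading given above.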
Let’s take an example of a realistic decision that a manager might have. He is trying to decide between a good-glove guy with a .290 OBP hitting ninth vs. a good hitter with a .340 OBP who would hit fifth. Let’s assume that if he goes with the glove guy, everyone else moves up a spot in line. First, leaving the .340 OBP guy in (and looking only at the spots that will change).
| Batting Order Position | Coefficient | OBP | Value |
| --- | --- | --- | --- |
| 1 | 3.727 | — | — |
| 2 | 3.153 | — | — |
| 3 | 2.227 | — | — |
| 4 | 2.797 | — | — |
| 5 | 3.719 | .340 | 1.264 |
| 6 | 3.094 | .330 | 1.021 |
| 7 | 4.391 | .320 | 1.405 |
| 8 | 3.062 | .310 | 0.949 |
| 9 | 2.989 | .300 | 0.897 |
| Constant | -5.358 | — | — |
| Total | — | — | 5.536 |
And now for using the glove guy.
| Batting Order Position | Coefficient | OBP | Value |
| --- | --- | --- | --- |
| 1 | 3.727 | — | — |
| 2 | 3.153 | — | — |
| 3 | 2.227 | — | — |
| 4 | 2.797 | — | — |
| 5 | 3.719 | .330 | 1.227 |
| 6 | 3.094 | .320 | 0.990 |
| 7 | 4.391 | .310 | 1.361 |
| 8 | 3.062 | .300 | 0.919 |
| 9 | 2.989 | .290 | 0.867 |
| Constant | -5.358 | — | — |
| Total | — | — | 5.364 |
The difference is 0.172 runs per game (over eight innings). Pro-rating that out to nine innings and 162 games (0.172 × 9/8 × 162), we estimate that the offense will bleed away 31.3 runs as the result of the weaker bat. If the upgrade in defense is that good, it might be worth it. Then again, aside from Andrelton Simmons, the spread between the best regular shortstop in baseball last year (Pedro Florimon, according to DRS) and the worst (Jed Lowrie) was 30 runs. At most other positions, there were cases where you could find pairs of players at the same position separated by 30 runs or more, so they’re out there, but they aren’t very common.
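To check the arithmetic, here is a small Python sketch of the same comparison. The function name `lineup_value` and the `digits` option are assumptions introduced for illustration; the coefficients and OBPs are the ones in the two tables. It computes the gap twice: once rounding each cell to three decimals as the tables do, and once without intermediate rounding.

```python
# Regression coefficients for lineup spots 5-9, copied from the tables above.
COEFS = {5: 3.719, 6: 3.094, 7: 4.391, 8: 3.062, 9: 2.989}

# Eight innings of data per game, scaled to nine innings and a 162-game season.
SEASON_SCALE = (9 / 8) * 162


def lineup_value(obps, digits=None):
    """Sum coefficient * OBP over the listed spots, optionally rounding each cell."""
    cells = [COEFS[spot] * obp for spot, obp in obps.items()]
    if digits is not None:
        cells = [round(c, digits) for c in cells]
    return sum(cells)


# Hypothetical OBPs from the example: the .340 bat hitting fifth...
with_bat = {5: 0.340, 6: 0.330, 7: 0.320, 8: 0.310, 9: 0.300}
# ...versus the .290 glove guy hitting ninth, with everyone else moved up a spot.
with_glove = {5: 0.330, 6: 0.320, 7: 0.310, 8: 0.300, 9: 0.290}

# Rounded to three decimals per cell, as in the tables.
gap_rounded = lineup_value(with_bat, 3) - lineup_value(with_glove, 3)
# Same calculation without intermediate rounding.
gap_exact = lineup_value(with_bat) - lineup_value(with_glove)

print(f"per game (rounded cells): {gap_rounded:.3f}")
print(f"per season (rounded cells): {gap_rounded * SEASON_SCALE:.1f}")
print(f"per season (unrounded): {gap_exact * SEASON_SCALE:.1f}")
```

The rounded-cell version lands on the 31.3 runs quoted above; skipping the intermediate rounding nudges the estimate to about 31.4, a reminder that the last decimal here is not worth arguing over.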
Of course, this is a contrived scenario using fake numbers that are artificially round. A team making this decision would want to plug in actual players. If the good glove guy would be replacing a guy who can’t hit anyway, it’s not as big a deal. Also, there will be those who wonder why I used OBP (because it makes things easier to understand) when instead I should have used a better indicator of offensive value. I tried the same calculations with a linear weights (per PA) approach and got the same basic message.
So yes, you can bury a guy in the nine hole, and it might actually make sense. But the more you mess with the lineup, the more “echo” effects there are. Suppose that for some reason, the .340 guy whom the manager was considering replacing was batting in the ninth spot legitimately (hello 1995 Indians!) and so subbing him out doesn’t disrupt the lineup at all. Replacing him with the .290 OBP guy ends up costing his team only 27.2 runs over the season, a difference of four runs from what we calculated above. Another way to say that would be 13 percent of the initial effect. Context matters.
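The size of that “echo” can be sketched the same way. This is a minimal illustration under the article's own numbers; the variable names are assumptions, and the 31.3-run figure is taken as given from the earlier calculation.

```python
# Coefficient for the ninth lineup spot, from the regression table.
COEF_NINTH = 2.989

# Eight innings of data per game, scaled to nine innings and a 162-game season.
SEASON_SCALE = (9 / 8) * 162

# Season cost when the swap also pushes four other hitters up a spot
# (the figure quoted earlier in the article).
SHIFTED_COST = 31.3

# Straight swap in the nine hole: only one lineup spot changes.
direct_cost = COEF_NINTH * (0.340 - 0.290) * SEASON_SCALE

# The remainder is the "echo" from reshuffling everyone else.
echo = SHIFTED_COST - direct_cost
echo_share = echo / SHIFTED_COST

print(f"direct swap: {direct_cost:.1f} runs")
print(f"echo effect: {echo:.1f} runs ({echo_share:.0%} of the total)")
```

The straight swap costs about 27.2 runs, which leaves roughly four runs (about 13 percent of the total) attributable to the lineup shuffle rather than to the weaker bat itself.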
The other takeaway from this is that while you can bury a good defender in the nine hole, and perhaps another in the eight hole, the “cost” of punting batting order spots grows. You can’t pinch hit for everyone late in the game, and there are batting order inter-dependencies. If a team wants to go max defense and punt offense, the cost of each move isn’t just Player One’s expected offensive value minus Player Two’s. A baseball game is a dynamic system with lots of moving parts. You can’t just swap one thing out for the other and assume that everything else will function the same way.
Let’s go back to Jeter vs. Ryan. If the Yankees were to actually start Ryan at shortstop on a consistent basis, they would move Brett Gardner (and his career .352 OBP—we’ll round to .350) from the seven- to the two-spot (see, ready-made Jeter replacement!), bump Kelly Johnson (stuck around .310 over the past three years) and Brian Roberts (who posted a .312 mark last year in limited duty—we’ll call him a .300 guy) up a couple of notches each, and “bury” Ryan at the bottom. We’ll assume that Jeter can still get on base at a .350 clip, consistent with his performance from 2010-2012 (when he was healthy), and that Ryan is every bit the .260 OBP guy that his 2012-2013 stats suggest he is. First, the lineup with Jeter.
| Batting Order Position | Coefficient | OBP | Value |
| --- | --- | --- | --- |
| 1 | 3.727 | — | — |
| 2 (Jeter) | 3.153 | .350 | 1.104 |
| 3 | 2.227 | — | — |
| 4 | 2.797 | — | — |
| 5 | 3.719 | — | — |
| 6 | 3.094 | — | — |
| 7 (Gardner) | 4.391 | .350 | 1.537 |
| 8 (Johnson) | 3.062 | .310 | 0.949 |
| 9 (Roberts) | 2.989 | .300 | 0.897 |
| Total | — | — | 4.487 |
Now, the lineup with Ryan:
| Batting Order Position | Coefficient | OBP | Value |
| --- | --- | --- | --- |
| 1 | 3.727 | — | — |
| 2 (Gardner) | 3.153 | .350 | 1.104 |
| 3 | 2.227 | — | — |
| 4 | 2.797 | — | — |
| 5 | 3.719 | — | — |
| 6 | 3.094 | — | — |
| 7 (Johnson) | 4.391 | .310 | 1.361 |
| 8 (Roberts) | 3.062 | .300 | 0.919 |
| 9 (Ryan) | 2.989 | .260 | 0.777 |
| Total | — | — | 4.161 |
If you pro-rate that to nine innings and 162 games, the difference between the two lineups is just shy of 60 runs (59.4). Even if you assume that Ryan is a .280 OBP guy, the difference shrinks only to 48 runs. Yes, Derek Jeter is 40, has a bad ankle, and was never a good fielder to start with, but the idea that Brendan Ryan is going to make up 50-60 runs worth of value over Derek Jeter with his glove is hard to swallow.
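Anyone who disagrees with the OBP assumptions can re-run the comparison with their own. The sketch below is one way to do that in Python; the function names `lineup_value` and `season_gap` are hypothetical, and the only inputs are the coefficients and OBPs from the two tables, rounded per cell the way the tables are.

```python
# Regression coefficients for the lineup spots that change hands.
COEFS = {2: 3.153, 7: 4.391, 8: 3.062, 9: 2.989}

# Eight innings of data per game, scaled to nine innings and a 162-game season.
SEASON_SCALE = (9 / 8) * 162


def lineup_value(obps):
    """Sum coefficient * OBP over the listed spots, rounding each cell to 3 places."""
    return sum(round(COEFS[spot] * obp, 3) for spot, obp in obps.items())


def season_gap(ryan_obp, jeter_obp=0.350):
    """Estimated season run gap between the Jeter lineup and the Ryan lineup."""
    # Jeter second, Gardner seventh, Johnson eighth, Roberts ninth.
    jeter_lineup = {2: jeter_obp, 7: 0.350, 8: 0.310, 9: 0.300}
    # Gardner second, Johnson seventh, Roberts eighth, Ryan ninth.
    ryan_lineup = {2: 0.350, 7: 0.310, 8: 0.300, 9: ryan_obp}
    return (lineup_value(jeter_lineup) - lineup_value(ryan_lineup)) * SEASON_SCALE


for obp in (0.260, 0.280):
    print(f"Ryan at {obp:.3f} OBP: {season_gap(obp):.1f} runs")
```

Passing a lower `jeter_obp` is the quick way to test a more pessimistic view of Jeter's return from injury.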
What’s interesting to note here is that the biggest effect of this whole exercise comes not from the lost productivity in the ninth spot from replacing Roberts with Ryan, but from forcing Gardner to the two spot and shortening the lineup that way. In the past, Jeter has been roughly a -20 run defender over the course of a season and Ryan has checked in around +20. When you consider their performances out of context, as WAR(P) does, you could make the case that they might be each other’s equals or perhaps that Ryan is Jeter’s superior. Yet it seems more likely that the Yankees, in their situation, would be better off with Jeter at short, despite his defensive shortcomings. When you take a closer look, you can see that Jeter’s hitting ability within the context of what actually would happen to the Yankees’ lineup is more valuable because of how his hitting allows the team to set up the rest of their lineup.
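Breaking the gap down by lineup spot shows where the runs go. This is a rough Python sketch using the same coefficients and assumed OBPs as the tables; the dictionary names are hypothetical, and no intermediate rounding is applied, so the total can differ from the table-based figure in the last decimal.

```python
# Regression coefficients for the lineup spots that change hands.
COEFS = {2: 3.153, 7: 4.391, 8: 3.062, 9: 2.989}

# Assumed OBP by lineup spot under each scenario (from the two tables).
JETER_LINEUP = {2: 0.350, 7: 0.350, 8: 0.310, 9: 0.300}
RYAN_LINEUP = {2: 0.350, 7: 0.310, 8: 0.300, 9: 0.260}

# Eight innings of data per game, scaled to nine innings and a 162-game season.
SEASON_SCALE = (9 / 8) * 162

# Season runs lost at each spot when Ryan starts instead of Jeter.
loss_by_spot = {
    spot: COEFS[spot] * (JETER_LINEUP[spot] - RYAN_LINEUP[spot]) * SEASON_SCALE
    for spot in COEFS
}

for spot, loss in loss_by_spot.items():
    print(f"spot {spot}: {loss:.1f} runs")

biggest = max(loss_by_spot, key=loss_by_spot.get)
print(f"largest loss comes from spot {biggest}")
```

Under these assumptions the two spot breaks even (Gardner's .350 replaces Jeter's .350), and the largest single loss shows up in the seven hole that Gardner vacates, ahead of the ninth spot where Ryan actually bats.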
The Trouble with Max Defense
A few years ago, the Seattle Mariners ran their much-discussed “max defense” experiment. The general idea was that defense was undervalued and that players with similar WAR(P) values, but whose value was mostly tied up in their defense, would provide the same bang for a cheaper price. (The Mariners have since switched over to the “Defense? What’s that?” strategy.) What they found out the hard way was that when you trade offense for defense, it’s not a one-for-one trade. The team that they built struggled to score runs, even before they could blame their troubles on the woes of Justin Smoak, Jesus Montero, and Dustin Ackley. Yes, there will be cases where trading a bat for a glove makes sense, but it’s not as easy as just lining up predicted WAR(P) totals and seeing which one is bigger. There are things that teams can do to minimize the damage wrought by a poor hitter, but there are consequences, and the more bats you try to hide, the worse the consequences get, and the more you have to gain from the glove to make it worth your while.
My point isn’t that defense doesn’t matter. It most certainly does, and it should be considered in any valuation of a player. My point is that the mantra that a run is a run is a run falls apart a little bit when you take measurement out of the abstract and try to apply it to real-world situations. It’s not that WAR(P) needs to be scrapped. It’s that we need to understand what question it answers (and it’s a valuable and important one) and decide whether that’s the question that we are currently trying to answer. A bad offensive player makes the players around him less good. (There’s evidence to suggest that a good defensive player actually makes his fellow fielders worse as well!)
The case of Derek Jeter and Brendan Ryan shows this off rather nicely. You might make the case that the projections for Jeter returning to his pre-injury form are too optimistic or that Ryan might be a slightly better hitter than I give him credit for (though he’ll never be mistaken for Ted Williams). In that case, you could re-run the numbers and see how things shake out. But the point is that you have to be careful about relying on a single win-value stat.
Somehow, in 2011, Jeter's .040 points of OBP over Ryan, as well as slight increases in HR and SB, netted only .5 wins of BWAR. Baffling, honestly.
Is this one reason why the NL has fallen so badly behind the AL in terms of overall quality? It seems to me that the pitcher's presence at the bottom of the order makes it even more costly to hide a bad hitter in an NL lineup, and therefore, complicates any effort to slot in an all-glove guy for NL teams.
At the same time, of course, all-bat guys (or those who risk becoming same) also fit more easily into an AL lineup, thanks to the flexibility afforded by the DH. And because of DHs, pitchers get fewer breaks in the AL, so starting hurlers have higher utility for AL teams, too. The only things NL teams can afford to value more highly than American League clubs, it seems to me, are guys who do a little bit of everything, especially off the bench. It's a screaming inequality that can't be solved until the senior circuit gets with the times and adds the DH.
Anyway, obviously, thought-provoking, fun work. I love the detail you provide. While I had no trouble understanding or enjoying them, there was something fundamentally unsatisfying about the 'in a vacuum' models that dominated baseball research last decade. I love getting into how differently things can work in specific situations. Thanks for your usual excellence, RC.
This can potentially be used to help make some in-season decisions: when the inevitable injuries befall the elder Yanks and the Jayson Nixes of the world begin getting regular duty, it might be worth putting Ryan out there to play defense every day.
It seems that, were I to follow the logic here, I would be placing my best hitters in the lineup following the order of the magnitude of the co-efficient, in order to maximize the number I get in the Total Value cell - and as you said, that really isn't how this works.
So, I guess my question is, would it make more sense to estimate a coefficient for each spot in the lineup based on the quality of players I am starting each day, rather than the "generic" coefficients presented here?
http://www.businessinsider.com/mlb-supercomputer-2014-3
I imagine it comes down to which defensive measurement you feel the most confident in, but it would have to be frustrating if the math told you the offensive difference between the Jeter and the Ryan lineups was (for example) 10 runs, but the two most widely used defensive measures valued Ryan's defense as +5 and +15 runs versus Jeter's...
Great article. I was hoping Brendan could contribute this year and maybe even give them a little more punch due to getting back to a hitter's ballpark. PECOTA is predicting a TAV closer to his career #'s, but his 2013 marks even in the Bronx were particularly abysmal.
What are your thoughts when comparing a good offensive catcher versus a good framing catcher?