March 17, 2014
The Viability of Burying a Bad Bat
Team captain and 39-year-old farewell tour participant Derek Jeter is currently the starting shortstop for the New York Yankees. That is the way of things and has been since I was in high school. But the Yankees also have Brendan Ryan on their roster. Ryan is a noted defensive wizard while Jeter is [must…not…make…Jeter fielding joke]. However, Ryan “hit” only .197/.255/.273 last year in 349 plate appearances. Is there a case to be made for Ryan as the starting shortstop based on his defensive prowess? Keep in mind that the Yankees could bury Ryan in the batting order to limit his exposure, move the ever-under-appreciated Brett Gardner up to the two-spot, pinch hit for Ryan late in the game, and enjoy that sweet glove for eight innings a night. Is that enough to overtake De-rek Je-ter?
Let’s go one step further and assume that Jeter will return to his 2011 and 2012 form. In those years, he was worth 1.4 and 3.0 Wins Above Replacement Player, respectively. Ryan, in those same years, was worth 3.5 and 1.9 wins, based mostly on his stellar defense. Thanks to Jeter’s injuries and Ryan’s offensive nosedive, the two checked in at roughly replacement level last season. Even discounting Ryan’s expectations a bit, could we not make the case that while they have two very different skillsets, they are at least in the same ZIP code when it comes to overall value?
Okay, so the Yankees aren’t actually going to bench Captain America in favor of Brendan Ryan, but the Yankees aren’t the only team facing this sort of a decision. This is a classic bat vs. glove positional battle. The Dodgers seem confused about whether to play Alexander Guerrero at second base, despite the fact that he does not appear to own a glove. Their other option is to play some utility type there who has a decent glove, but not much of a bat. Michael Morse and Gregor Blanco have a similar dynamic going in San Francisco.
What’s the cost of carrying a starter who can’t hit? It’s true that a team really can bury him in the nine-hole if they want. But what if a team tried carrying two of these players? Three?
It’s only in the last decade that advanced defensive metrics have been publicly available to give us a full understanding of how much defense matters. With the implementation of WAR(P), it’s become easier to roll both defensive and offensive numbers into one uber-metric. Now we can directly compare players with very different skillsets against a common baseline, although there is a weakness in WAR(P) for which we need to account. The offensive component in WAR(P) is generally based on the idea that each event that a player generates has a certain run value (e.g., a home run is worth roughly 1.4 runs). The idea is that we pretend that all players live on “average teams” and that they always bat in “average situations,” with an average number of runners on base. Home runs are worth more when there are runners on, and some teams employ hitters who are better at getting on base than others. For WAR(P), where the goal is to create a context-neutral common baseline to use for comparison, pretending that everything is average is a feature. But for teams making decisions about their specific circumstances, it’s a bug.
Consider our Jeter vs. Ryan debate. Let’s return to the halcyon days of 2012, before Jeter hurt his ankle (and before Brendan Ryan was a member of the Yankees), when Jeter posted a line of .316/.362/.429. That same year, Ryan put up a slash line of .194/.277/.278. Had Ryan been a member of the Yankees and their only option at short, he would have been hitting ninth, meaning that other hitters would have been moved up higher in the batting order out of necessity. Not only that, but Ryan’s general aversion to getting on base would have meant that there would have been fewer runners on when the lineup flipped over.
The guys at the top of the lineup are good, and you want them to have runners on base to knock in. A bad hitter at the bottom of a lineup makes the good things that the top-of-the-lineup guys do less valuable by robbing them of men-on-base situations. On top of that, hitters who otherwise would have been hitting eighth would have hit seventh, meaning that they would get more at-bats over the course of a season. In one game, the effects might not shine through. But over the course of 162 games, the little losses at the margins always add up to something. How much does it matter here?
Warning! Gory Mathematical Details Ahead!
I looked at the starting lineups for all games played from 2009-2013 and calculated the in-season OBP for each member of that lineup. I used only games played in American League parks because the pitcher’s spot in the NL becomes just a string of pinch-hitters by the sixth inning. I also “ended” each game at the end of the eighth inning. This is because half the time, the home team doesn’t bat in the bottom of the ninth, so in some games, I’d be getting eight innings’ worth of data and in some, nine. (And in some, 10 or 11 or, if we want to get all #weirdbaseball, 16.) I then found how many runs the team scored that day in those eight innings.
I ran a regression using the OBPs for each of the nine spots in the lineup to see what contribution each one made to the number of runs that each team scored (in the first eight innings). Your coefficients from that regression.
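For readers who want to see the mechanics, here is a minimal sketch of that kind of regression: runs scored through eight innings regressed on the OBP of each of the nine lineup spots. Everything in it is an assumption made for illustration. The data are randomly generated and noise-free, the "true" weights are arbitrary numbers chosen so the fit can be checked, and the helper names (`solve_linear_system`, `fit_ols`) are hypothetical. It does not reproduce the coefficients from the actual 2009-2013 data.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations; intercept comes first."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# ARBITRARY weights used only to generate fake games; not the article's results.
FAKE_INTERCEPT = -4.0
FAKE_WEIGHTS = [2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6]

rng = random.Random(2014)
# Each fake "game" is nine in-season OBPs, one per lineup spot.
lineups = [[rng.uniform(0.26, 0.40) for _ in range(9)] for _ in range(500)]
runs = [FAKE_INTERCEPT + sum(w * obp for w, obp in zip(FAKE_WEIGHTS, lineup))
        for lineup in lineups]

coefs = fit_ols(lineups, runs)
for spot, coef in enumerate(coefs[1:], start=1):
    print(f"lineup spot {spot}: {coef:.3f} runs per 1.000 of OBP")
```

Because the fake data contain no noise, the fit simply hands back the arbitrary weights it was fed. Real game logs are noisy and the lineup spots are correlated with one another, which is exactly why the fitted weights need the kind of interpretation that follows.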
First off, what’s up with the seventh spot in the lineup? That one has the heaviest weight in terms of runs scored. So the guy with the best OBP should be placed there? Not exactly. You have to think of the context around these numbers. Teams do not end up with a weak no. 7 hitter at random. If they are putting a guy with a toothpick bat in the seven hole, it probably means that they have two even weaker hitters in the eighth and ninth spots. If a team has a good seven-hole hitter, it probably means that they have six other good hitters in the lineup.
The importance of the seven-hole hitter is only partly about what he actually does. There’s another chunk of that value that’s based on the fact that the quality of the seventh hitter is an indicator of the quality of the other guys in the lineup. But structurally, that seven spot does hold an important function too. If you have three duds all bunched together, it creates a nice little valley for the pitcher to coast through and makes it even more likely that when the good hitters come up (nos. 1, 2, and 3), there won’t be anyone on base for them to play with.
Let’s take an example of a realistic decision that a manager might have. He is trying to decide between a good-glove guy with a .290 OBP hitting ninth vs. a good hitter with a .340 OBP who would hit fifth. Let’s assume that if he goes with the glove guy, everyone else moves up a spot in line. First, leaving the .340 OBP guy in (and looking only at the spots that will change).
And now for using the glove guy.
The difference is 0.172 runs per game (over eight innings). Pro-rating that out to nine innings and 162 games, we get 31.3 runs that we estimate that the offense will bleed away as the result of the weaker bat. If the upgrade in defense is that good, it might be worth it. Then again, aside from Andrelton Simmons, the spread between the best regular shortstop in baseball last year (Pedro Florimon, according to DRS) and the worst (Jed Lowrie) was 30 runs. At most other positions, there were cases where you could find pairs of players at the same position separated by 30 runs or more, so they’re out there, but they aren’t very common.
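The arithmetic behind that estimate can be sketched in a few lines, assuming the comparison works the way the text describes: bench the hitter, slide everyone behind him up a spot, put the glove guy ninth, weight each spot's OBP by its regression coefficient, and pro-rate the gap. The function names are hypothetical, the coefficients are flat placeholders rather than the fitted values, and the OBPs for the other hitters are invented. Only the 0.172 gap, the nine innings, and the 162 games come from the text.

```python
def bench_and_shift(lineup, benched_spot, new_ninth_obp):
    """Drop the hitter at benched_spot (1-based), move everyone behind him
    up one spot, and put the replacement in the nine-hole."""
    kept = lineup[:benched_spot - 1] + lineup[benched_spot:]
    return kept + [new_ninth_obp]

def runs_per_game(coefs, lineup):
    """Weighted sum of OBPs; the intercept is omitted because it cancels
    when two lineups are compared."""
    return sum(c * obp for c, obp in zip(coefs, lineup))

def prorate(diff_per_eight_innings, innings=9, games=162):
    """Scale an eight-inning, single-game gap to a full season."""
    return diff_per_eight_innings * innings / 8 * games

# PLACEHOLDER weights for spots 1-9; the article's fitted values are not used here.
PLACEHOLDER_COEFS = [1.0] * 9
# Hypothetical OBPs, apart from the .340 hitter in the five-hole.
with_bat = [0.350, 0.340, 0.360, 0.350, 0.340, 0.330, 0.320, 0.310, 0.300]
with_glove = bench_and_shift(with_bat, benched_spot=5, new_ninth_obp=0.290)

gap = (runs_per_game(PLACEHOLDER_COEFS, with_bat)
       - runs_per_game(PLACEHOLDER_COEFS, with_glove))
print(f"placeholder gap per eight innings: {gap:.3f}")
print(f"article's 0.172 gap over a season: {prorate(0.172):.1f} runs")
```

The last line reproduces the 31.3-run figure: 0.172 × 9/8 × 162 ≈ 31.3. The placeholder gap is there only to show the mechanics, so its value means nothing on its own.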
Of course, this is a contrived scenario using fake numbers that are artificially round. A team making this decision would want to plug in actual players. If the good glove guy would be replacing a guy who can’t hit anyway, it’s not as big a deal. Also, there will be those who wonder why I used OBP (because it makes things easier to understand) when instead I should have used a better indicator of offensive value. I tried the same calculations with a linear weights (per PA) approach and got the same basic message.
So yes, you can bury a guy in the nine hole, and it might actually make sense. But the more you mess with the lineup, the more “echo” effects there are. Suppose that for some reason, the .340 guy whom the manager was considering replacing was batting in the ninth spot legitimately (hello 1995 Indians!) and so subbing him out doesn’t disrupt the lineup at all. Replacing him with the .290 OBP guy ends up costing his team only 27.2 runs over the season, a difference of four runs from what we calculated above. Another way to say that would be 13 percent of the initial effect. Context matters.
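As a quick check on that comparison, the share of the cost that comes purely from reshuffling the lineup can be computed from the two season totals quoted above. The function name `echo_share` is made up for this sketch.

```python
def echo_share(cost_with_reshuffle, cost_without_reshuffle):
    """Fraction of the full cost that comes from moving other hitters around."""
    return (cost_with_reshuffle - cost_without_reshuffle) / cost_with_reshuffle

# Season totals quoted in the text: 31.3 runs when the lineup has to be
# reshuffled, 27.2 runs when the swap happens in place in the ninth spot.
echo_runs = 31.3 - 27.2
share = echo_share(31.3, 27.2)
print(f"{echo_runs:.1f} runs, {share:.0%} of the initial effect")
# prints: 4.1 runs, 13% of the initial effect
```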
The other takeaway is that while you can bury a good defender in the nine hole, and perhaps another in the eight hole, the “cost” of punting batting order spots grows with each one. You can’t pinch hit for everyone late in the game, and there are batting order inter-dependencies. If a team wants to go max defense and punt offense, the cost of each move isn’t just Player One’s expected offensive value minus Player Two’s. A baseball game is a dynamic system with lots of moving parts. You can’t just swap one thing out for the other and assume that everything else will function the same way.
Let’s go back to Jeter vs. Ryan. If the Yankees were to actually start Ryan at shortstop on a consistent basis, they would move Brett Gardner (and his career .352 OBP—we’ll round to .350) from the seven- to the two-spot (see, ready-made Jeter replacement!), bump Kelly Johnson (stuck around .310 over the past three years) and Brian Roberts (who posted a .312 mark last year in limited duty—we’ll call him a .300 guy) up a couple of notches each, and “bury” Ryan at the bottom. We’ll assume that Jeter can still get on base at a .350 clip, consistent with his performance from 2010-2012 (when he was healthy), and that Ryan is every bit the .260 OBP guy that his 2012-2013 stats suggest he is. First, the lineup with Jeter.
Now, the lineup with Ryan:
If you pro-rate that to nine innings and 162 games, the difference between the two lineups is just shy of 60 runs (59.4). Even if you assume that Ryan is a .280 OBP guy, the difference shrinks only to 48 runs. Yes, Derek Jeter is nearly 40, has a bad ankle, and was never a good fielder to start with, but the idea that Brendan Ryan is going to make up 50-60 runs worth of value over Derek Jeter with his glove is hard to swallow.
What’s interesting to note here is that the biggest effect of this whole exercise comes not from the lost productivity in the ninth spot from replacing Roberts with Ryan, but from forcing Gardner to the two spot and shortening the lineup that way. In the past, Jeter has been roughly a -20 run defender over the course of a season and Ryan has checked in around +20. When you consider their performances out of context, as WAR(P) does, you could make the case that they might be each other’s equals or perhaps that Ryan is Jeter’s superior. Yet it seems more likely that the Yankees, in their situation, would be better off with Jeter at short, despite his defensive shortcomings. When you take a closer look, you can see that Jeter’s hitting ability within the context of what actually would happen to the Yankees’ lineup is more valuable because of how his hitting allows the team to set up the rest of their lineup.
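A back-of-the-envelope version of that comparison nets the defensive swing against the lineup cost. It assumes the rough -20 and +20 fielding figures can be treated as season-long run values and subtracted directly from the 59.4- and 48-run offensive gaps quoted earlier. That is a simplification made for this sketch, and the function name is hypothetical.

```python
def net_runs_from_switch(defense_old, defense_new, offense_cost):
    """Runs gained (positive) or lost (negative) by switching to the better glove."""
    defensive_gain = defense_new - defense_old
    return defensive_gain - offense_cost

JETER_DEFENSE = -20  # rough runs per season, from the text
RYAN_DEFENSE = 20    # rough runs per season, from the text

# Offensive cost of the Ryan lineup under the two OBP assumptions in the text.
for ryan_obp, offense_cost in [(0.260, 59.4), (0.280, 48.0)]:
    net = net_runs_from_switch(JETER_DEFENSE, RYAN_DEFENSE, offense_cost)
    print(f"Ryan at {ryan_obp:.3f} OBP: net {net:+.1f} runs")
```

Under those assumptions the 40-run defensive swing does not cover the lineup cost in either case. The switch comes out about 19 runs behind with the .260 version of Ryan and about 8 runs behind with the .280 version.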
The Trouble with Max Defense
My point isn’t that defense doesn’t matter. It most certainly does, and it should be considered in any valuation of a player. My point is that the mantra that a run is a run is a run falls apart a little bit when you take measurement out of the abstract and try to apply it to real-world situations. It’s not that WAR(P) needs to be scrapped. It’s that we need to understand what question it answers (and it’s a valuable and important one) and decide whether that’s the question that we are currently trying to answer. A bad offensive player makes the players around him less good. (There’s evidence to suggest that a good defensive player actually makes his fellow fielders worse as well!)
The case of Derek Jeter and Brendan Ryan shows this off rather nicely. You might make the case that the projections for Jeter returning to his pre-injury form are too optimistic or that Ryan might be a slightly better hitter than I give him credit for (though he’ll never be mistaken for Ted Williams). In that case, you could re-run the numbers and see how things shake out. But the point is that you have to be careful about relying on a single win-value stat.