October 11, 2012
Is Joe Saunders a Double Play Machine?
On last Friday's episode of Effectively Wild, the daily podcast from Baseball Prospectus, our own Ben Lindbergh and Sam Miller (and guest Marc Normandin) discussed Buck Showalter's decision to start Joe Saunders in the AL wild card play-in game against the Texas Rangers. They noted that Saunders—who'll get the call again tonight in Game Four of the Yankees-Orioles ALDS—does not have amazing stuff and allows a lot of runners to reach base, and also that he does not have an exceedingly high groundball rate. Still, he seems to induce more groundballs at opportune times, and as a result, he gets a lot of double plays to bail him out of some major jams. Perhaps Saunders changes his approach with a runner on first and no one out in an intentional bid to get a groundball. It would make complete sense that he would do so.
Sure enough, Saunders pitched 5 2/3 innings and induced three twin-killings on the way to an Orioles win. Ben and Sam (and Marc) are the smartest human beings on the face of the earth.
But maybe this is just another case of selective memory and a fortunate example. Do pitchers really induce more grounders in double-play situations? The issue was preliminarily addressed by James Gentile over at Beyond the Boxscore last month, and he found that there was little evidence to suggest that groundball rates increased when the fields were ripe for a double play. I decided to take a second look.
Warning! Gory Mathematical Details Ahead!
I started with my trusty 2008-2011 Retrosheet database. I coded all plate appearances for whether they ended in a groundball. I also coded all balls in play for whether they were a groundball (this will become important in a moment). Finally, I coded each plate appearance for whether it represented a potential double-play situation: a runner on first and one or zero outs. Only plate appearances in which a pitcher who faced at least 250 batters opposed a batter who had at least 250 PA were welcome. Pitchers batting were kindly excused.
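For readers who want to see the mechanics, the coding step might look something like the Python sketch below. The record layout and every field name in it are hypothetical (this is not Retrosheet's actual schema); only the rules themselves come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlateAppearance:
    # All field names are hypothetical, not Retrosheet's actual schema.
    runner_on_first: bool
    outs: int  # outs before the plate appearance (0, 1, or 2)
    batted_ball_type: Optional[str]  # e.g. "G" for groundball; None if no ball in play


def is_dp_situation(pa):
    """Potential double play: runner on first and fewer than two outs."""
    return pa.runner_on_first and pa.outs < 2


def ended_in_groundball(pa):
    """Per-PA outcome: 1 if the plate appearance ended in a groundball, else 0."""
    return int(pa.batted_ball_type == "G")


def is_ball_in_play(pa):
    """Used to restrict the second analysis to balls in play only."""
    return pa.batted_ball_type is not None
```

Keeping the per-PA flag and the ball-in-play filter separate is what allows the same regression to be run on both denominators.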
This set up perfectly for a binary logistic regression, using a framework I've used elsewhere. I calculated the batter's groundball percentage, the pitcher's groundball percentage, and the league groundball percentage. All percentages were converted into odds, p / (1 - p). Through the formula (batter odds * pitcher odds / league odds), one can determine the "expected" odds of a groundball given this batter/pitcher matchup. I converted the resulting expected odds into log odds, primarily because I was about to shove them into a binary logistic regression, which works on log odds anyway. This serves as our control for batter and pitcher matchup effects. If an event is randomly distributed (i.e., the result is, Strat-O-Matic style, simply a function of the average probabilities of the batter and pitcher), then other variables will have no predictive power.
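The matchup formula above can be written out in a few lines of Python. This is only a sketch; the function names are made up for illustration, and the rates used in the usage note are invented rather than taken from the data.

```python
import math


def odds(p):
    """Convert a probability (0 < p < 1) into odds."""
    return p / (1.0 - p)


def expected_gb_log_odds(batter_rate, pitcher_rate, league_rate):
    """Expected log odds of a groundball for one batter/pitcher matchup.

    Implements (batter odds * pitcher odds / league odds), then takes the log.
    """
    expected_odds = odds(batter_rate) * odds(pitcher_rate) / odds(league_rate)
    return math.log(expected_odds)


def log_odds_to_probability(x):
    """Invert the log-odds transform to get back a probability."""
    return 1.0 / (1.0 + math.exp(-x))
```

As a sanity check, a batter with a 60 percent groundball rate facing a pitcher who sits exactly at a 50 percent league rate comes out with an expected rate of 60 percent, which is what the formula should give when the pitcher contributes nothing beyond league average.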
I entered the binary variable of whether a plate appearance featured a potential double play as my second predictor. The result was that a potential double play did make it more likely that a plate appearance would end in a groundball. However, it had no effect on whether a ball in play would be a grounder. In fact, when I ran the same analyses for fly balls, the same effect showed up. In other words, a given plate appearance is more likely to end with a ball in play, but the chances that the ball is a worm burner rather than a fly ball or line drive are unchanged.
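To give a feel for what the regression is doing, here is a deliberately simplified Python sketch. It is not the model described above: as an assumption made for brevity, it treats the matchup log odds as a fixed offset (coefficient of 1, no intercept) and estimates only the double-play coefficient, using Newton-Raphson on the logistic log-likelihood. A real analysis would use a statistics package and estimate all the coefficients.

```python
import math


def fit_dp_coefficient(offsets, dp_flags, outcomes, iterations=25):
    """Estimate b in: logit P(groundball) = offset + b * dp_flag.

    Simplification (an assumption, not the article's model): the matchup
    log odds enter as a fixed offset with coefficient 1 and no intercept.
    """
    b = 0.0
    for _ in range(iterations):
        gradient = 0.0
        information = 0.0
        for offset, dp, y in zip(offsets, dp_flags, outcomes):
            p = 1.0 / (1.0 + math.exp(-(offset + b * dp)))
            gradient += dp * (y - p)               # score contribution
            information += dp * dp * p * (1.0 - p)  # curvature contribution
        if information == 0.0:
            break  # no double-play rows, or fitted probabilities have saturated
        b += gradient / information                # Newton-Raphson step
    return b
```

If the estimated coefficient is near zero, the double-play flag adds nothing beyond the matchup expectation; a positive value means groundballs are more likely than the matchup alone predicts.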
This means that per PA, it is more likely that a pitcher would give up a groundball in a double-play situation, but it isn't because he suddenly has his sinker working. Pitchers who generally get a lot of grounders will see that effect amplified by the fact that the batter is more likely to get the bat on the ball.
That covers all pitchers in all plate appearances that fit the sample criteria. But—and here is a methodological critique of sabermetrics—this assumes that all pitchers are the same in this ability. What if there were individual differences between pitchers?
I ran a bunch of little individual binary logistic regressions. For each pitcher-year, I ran a regression predicting the effect of a potential double play on groundball tendencies. If you must know, 1,074 little regressions. In this way, I was able to build 1,074 models, each describing how a potential double play affected one pitcher in one season: David Aardsma in 2009, Alfredo Aceves in 2009, and so on. Next, I was able to say, "If Aardsma had faced a league-average batter from 2009, what would the model predict the chances were of his inducing a groundball in this plate appearance with and without a potential double play? What was the difference between those two?"
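The "with and without" comparison can be sketched in Python as follows. The coefficient names (an intercept, a coefficient on the matchup log odds, and a coefficient on the double-play flag) are assumptions about how such a per-pitcher model would be laid out; the article does not report the fitted values, so none are shown here.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def dp_effect_vs_average_batter(pitcher_gb_rate, intercept, matchup_coef, dp_coef):
    """Predicted change in P(groundball) when a double play is in order.

    Against a league-average batter, batter odds / league odds = 1, so the
    expected matchup odds reduce to the pitcher's own groundball odds.
    """
    matchup_log_odds = math.log(pitcher_gb_rate / (1.0 - pitcher_gb_rate))
    base = intercept + matchup_coef * matchup_log_odds
    return logistic(base + dp_coef) - logistic(base)
```

Computing this difference once per pitcher-year yields one number per model, and those numbers are what get compared across seasons.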
I looked at the AR(1) rho intra-class correlation for these differences over the four years in the data set (2008-2011). AR(1) rho is like a year-to-year correlation (and you can read it similarly), except that it can take in more than two years of data at once. The result? .046. Pathetic.
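Estimating an AR(1) rho properly takes a mixed-model package, so the Python sketch below swaps in a cruder stand-in that reads the same way: a plain Pearson correlation between a pitcher's value in one season and his value in the next, pooled over every pair of consecutive seasons. This is not the estimator used above, and the input layout is hypothetical.

```python
import math


def pooled_lag1_correlation(effects_by_pitcher):
    """Pearson correlation between year t and year t+1 values, pooled over
    all consecutive-season pairs.

    effects_by_pitcher maps a pitcher id to {year: estimated DP effect}.
    This is a stand-in for AR(1) rho, not the same estimator.
    """
    xs, ys = [], []
    for by_year in effects_by_pitcher.values():
        for year, value in by_year.items():
            if year + 1 in by_year:
                xs.append(value)
                ys.append(by_year[year + 1])
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two consecutive-season pairs")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 would mean that pitchers who got a big double-play boost one year got it again the next; a value near zero means the boost does not carry over from season to season.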
What It Means
Does Joe Saunders get more groundballs at opportune times? Yes. Sorta. Mostly because, with a double play in order, he's apparently more prone to pitch to contact, which will produce more batted balls of all types. His GB/FB ratio will remain largely unchanged. It’s only after controlling for the general groundball tendencies of the batter and pitcher, and by looking at events per plate appearance rather than per ball in play, that the effect becomes clear.
This particular talent is not specific to individual pitchers (at least as far as we can measure with these methods). It's something of an added bonus that all pitchers get. And an added liability, seeing as they also give up more fly balls, which, rather than turning into two outs, sometimes fly over the wall and turn into two runs.
These findings may also have implications for ERA estimators that use batted-ball data, such as xFIP or SIERA. There’s something of an implicit assumption within the formulae for these metrics that the chances of a groundball are independent of the circumstances. (In fairness, these metrics were created to be quick estimators.) A pitcher who finds himself with a lot of runners on first base—perhaps through walks, perhaps through an unfortunate BABIP—will have more balls in play in those situations (and fewer strikeouts and walks). Future work on ERA estimators will have to take this into account.