October 21, 2013
The Effects of the Shutdown (Inning)
In the month of October, we’ve been hearing a lot of talk about shutdowns. No, not the debt ceiling thingy. The one that really matters: the shutdown inning. It’s playoff time, during which we often confuse something that players routinely do for an amazing feat of bravery and virtue. This postseason, everyone’s all a-twitter with thoughts of “shutdown innings.” For a pitcher, it’s the half-inning after your team scores (according to some people, it has to be scoring that leads to your team tying the game or taking the lead). Your job, in this most sacred of innings, is to keep the other team from scoring. It’s totally okay to give up runs if your team didn’t score last inning, apparently. Suddenly, that other shutdown seems downright logical.
I get the idea that people are going for. It’s a cousin of the “momentum” argument. By scoring in the last half-inning, one team has “grabbed the momentum.” A shutdown inning is the equivalent of yelling “no backsies!” It’s kind of a sneaky way to argue the existence of momentum within a game, without using the word. After all, who could argue against a pitcher allowing zero runs in an inning? The problem is that there’s a hidden assumption that a “shutdown” inning is more than just a scoreless inning. It is, in some way, more difficult than other innings, and if it is accomplished it adds something extra beyond the fact that the other team didn’t score. And if a pitcher does allow a run, he is somehow a moral failure for ceding momentum back to the other team.
I think the #GoryMath signal just went up.
Warning! Gory Mathematical Details Ahead!
I scored the inning as a shutdown if the pitcher recorded all three outs and gave up no runs. (Easy enough.) He got a non-shutdown if a run scored on his watch, even if he didn’t get through the inning. He got an incomplete (missing data) if no runs scored on his watch, but he was removed from the game before recording three outs. I used the same basic criteria to code all innings, whether shutdown situations or not, as scoreless or not.
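The coding rules above fit in a small function. This is a minimal sketch, not the author’s actual code. It assumes each pitcher’s stint in a half-inning is summarized by two hypothetical fields: the outs he recorded and the runs that scored while he was on the mound.

```python
from enum import Enum


class InningOutcome(Enum):
    SCORELESS = "scoreless"          # all three outs recorded, no runs allowed
    NOT_SCORELESS = "not_scoreless"  # a run scored on the pitcher's watch
    INCOMPLETE = "incomplete"        # removed before three outs with no runs: missing data


def code_half_inning(outs_recorded: int, runs_allowed: int) -> InningOutcome:
    """Code one pitcher's half-inning using the rules described in the text.

    `outs_recorded` and `runs_allowed` are hypothetical field names: the outs
    this pitcher recorded and the runs that scored while he was pitching.
    """
    if runs_allowed > 0:
        # A run on his watch counts against him even if he left mid-inning.
        return InningOutcome.NOT_SCORELESS
    if outs_recorded >= 3:
        return InningOutcome.SCORELESS
    # No runs, but he didn't finish the inning: treated as missing data.
    return InningOutcome.INCOMPLETE
```

The same function applies to every half-inning. Whether the inning was a shutdown opportunity is tracked separately.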
For each pitcher, I determined the percentage of all of his innings (within each year) that ended up being scoreless (minimum 50 completed innings). I converted this to a log-odds ratio and used it as a control variable, since some pitchers are clearly better at throwing goose-eggs on the board than others (regardless of the situation). Next, I ran a logistic regression predicting whether or not an inning would be scoreless, based on the control variable I just created and a dummy variable coding for whether the inning was a potential shutdown. This tells us whether it really is harder to hold the other team scoreless in a shutdown inning than in a “regular” inning.
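Here is a rough sketch of that setup: a log-odds control per pitcher-season plus a logistic regression with a shutdown dummy. It is fit by plain gradient ascent so it needs only the standard library. This is an illustration under assumptions, not the author’s code. The function names are hypothetical, and a real analysis of a few hundred thousand innings would use a statistics package.

```python
import math
from typing import List, Optional, Sequence, Tuple


def log_odds(p: float) -> float:
    """Natural log of the odds p / (1 - p); assumes 0 < p < 1."""
    return math.log(p / (1.0 - p))


def pitcher_control(scoreless: int, completed: int, min_innings: int = 50) -> Optional[float]:
    """Log-odds of a pitcher-season's scoreless-inning rate, or None below the minimum."""
    if completed < min_innings:
        return None
    return log_odds(scoreless / completed)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows: Sequence[Tuple[float, int]], y: Sequence[int],
                 lr: float = 0.5, iters: int = 5000) -> List[float]:
    """Fit P(scoreless) = sigmoid(b0 + b1*control + b2*shutdown) by gradient ascent.

    Each row is (control, shutdown_dummy); the intercept is added here.
    Returns [b0, b1, b2].
    """
    n = len(rows)
    beta = [0.0, 0.0, 0.0]
    for _ in range(iters):
        grad = [0.0, 0.0, 0.0]
        for (control, dummy), outcome in zip(rows, y):
            x = (1.0, control, dummy)
            resid = outcome - sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            for j in range(3):
                grad[j] += resid * x[j]
        # Step along the average gradient of the log-likelihood.
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta
```

A positive `b2` means potential shutdown innings ended scoreless more often than the pitcher’s own baseline predicts. That is the direction the text reports.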
The answer was that potential shutdown situations actually produced more scoreless innings than we might otherwise expect. The effect was not huge (a couple tenths of a percentage point in terms of probability), but it was significant. (That happens when you’re playing with a file that has a few hundred thousand innings in it.)
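A log-odds coefficient only becomes “a couple tenths of a percentage point” once it is translated back into probability at some baseline rate. The helper below is a small illustrative sketch (its name is hypothetical) of that translation.

```python
import math


def prob_change(baseline_p: float, coef: float) -> float:
    """Change in probability when a log-odds coefficient is added to a baseline rate."""
    logit = math.log(baseline_p / (1.0 - baseline_p))
    shifted = 1.0 / (1.0 + math.exp(-(logit + coef)))
    return shifted - baseline_p
```

Take a hypothetical coefficient of 0.01 at a hypothetical .70 baseline. It moves the probability by roughly 0.2 percentage points, because the slope of the logistic curve there is p(1 - p) = 0.21.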
Next, I looked at whether the shutdown inning had any effect for the pitching team beyond the fact that the pitcher had just thrown a scoreless inning. I calculated the expected win percentage within the data set (1993-2012) at the beginning of each half-inning (so baserunners and outs are both taken out of the equation, and we need to worry only about the score). I looked at the expected win percentage for the team that had just finished pitching (again, taking the log of the odds ratio) and whether that team eventually won the game. I also coded whether the inning directly prior was a “shutdown” inning or not. Mix all of it together into a binary logistic regression and we find a (sorta) significant effect: after a shutdown inning, the pitching team has a slightly better chance of winning the game.
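A win-expectancy table keyed only on the start-of-half-inning state can be built by simple counting. This is a minimal sketch under assumptions. The state key (inning, half, lead for the team that just pitched) is hypothetical. So is the clipping constant `eps`, which keeps the log-odds finite for states where that team always won or always lost.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical state key: (inning, is_bottom_half, lead of the team that just pitched)
State = Tuple[int, bool, int]


def build_win_expectancy(records: Iterable[Tuple[State, int]]) -> Dict[State, float]:
    """Empirical win rate for each half-inning starting state.

    Each record pairs a state with 1 if the team that had just finished
    pitching went on to win the game, else 0.
    """
    wins: Dict[State, int] = defaultdict(int)
    totals: Dict[State, int] = defaultdict(int)
    for state, won in records:
        wins[state] += won
        totals[state] += 1
    return {s: wins[s] / totals[s] for s in totals}


def win_log_odds(rate: float, eps: float = 1e-3) -> float:
    """Log-odds of a win expectancy, clipped away from 0 and 1 so it stays finite."""
    p = min(max(rate, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))
```

The log-odds from `win_log_odds` would then enter the regression as a control, alongside the shutdown dummy and the actual game result as the outcome.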
The reason that I say “sorta” is that I ran this a few different ways. For example, I isolated the effect on only the next inning after the shutdown, and also checked whether a shutdown earlier in the game continued to have an effect for the rest of the game. I looked to see whether the finding held in a “strict” shutdown inning (the team scored to either tie or take the lead) vs. just any old inning after the pitcher’s team had scored (this is the “liberal definition” of a shutdown). In some cases, the “shutdown” variable was significant. In others, it wasn’t. Even when it was significant, the effect size was small, again coming in at a couple of tenths of a percentage point. Even in cases where the shutdown variable wasn’t significant, the general trend was toward shutdowns being helpful to the cause. But frankly, in a data set this big, if an effect doesn’t hold up as significant every way you bend the data, it’s not really much to worry about.
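The strict and liberal definitions can be sketched as one classification function. Under the liberal definition, every strict opportunity is also a shutdown opportunity. The function only labels the narrower category it belongs to. The inputs are assumptions for illustration: the runs the pitcher’s team scored in the half-inning just completed, and that team’s lead before those runs (negative if trailing).

```python
from typing import Optional


def shutdown_opportunity(runs_last_half: int, lead_before: int) -> Optional[str]:
    """Classify the coming half-inning from the pitching team's point of view.

    runs_last_half: runs the pitcher's team scored in the half-inning just ended.
    lead_before: that team's lead before those runs (negative if trailing).
    Returns "strict", "liberal", or None if it isn't a shutdown opportunity.
    """
    if runs_last_half <= 0:
        return None  # the team didn't score: an ordinary inning
    lead_after = lead_before + runs_last_half
    if lead_before <= 0 and lead_after >= 0:
        return "strict"  # the runs tied the game or gave the team the lead
    return "liberal"  # scored, but still trailing or only extending a lead
```

To run the “liberal” version of the analysis, treat any non-None result as a shutdown opportunity. The “strict” version keeps only the innings labeled `"strict"`.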
Finally, I wanted to see whether individual pitchers had some sort of talent for putting up these shutdown innings. Now, the trick here is that it’s not as easy as simply saying “Adam Wainwright had this many chances and posted a shutdown inning in 80 percent of them”, and then comparing that to his usual rate of scoreless innings. First, we need to establish some idea of how many chances it takes until we have a somewhat reliable estimate of Adam Wainwright’s abilities in posting shutdown innings. To answer that question, I ran a reliability analysis similar to what I’ve done previously. In the past, I’ve addressed the issue of how many plate appearances it takes until we have a good estimate of a player’s abilities on a number of measures. For example, we intuitively know that a batter who goes 2-for-5 one night is not really a .400 hitter. But, if he went 200-for-500, we’d feel a bit more comfortable saying that he was. And we’d be impressed.
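The 2-for-5 versus 200-for-500 intuition can be made concrete with the binomial standard error of an observed rate. This is a swapped-in illustration of sampling noise, not the reliability method used in this article.

```python
import math


def binomial_se(rate: float, n: int) -> float:
    """Standard error of an observed success rate over n independent trials."""
    return math.sqrt(rate * (1.0 - rate) / n)


for hits, at_bats in [(2, 5), (200, 500)]:
    avg = hits / at_bats
    print(f"{hits}-for-{at_bats}: {avg:.3f} +/- {binomial_se(avg, at_bats):.3f}")
```

Both lines show a .400 average. The uncertainty shrinks from about .219 to about .022 as the sample grows a hundredfold. The same logic is why a pitcher needs many shutdown chances before his rate means much.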
I used Kuder-Richardson reliability, which is suitable for binary outcomes (he either posted a shutdown inning when given a chance or he didn’t). I ran into a sample-size issue at first when looking at the stricter definition of “shutdown” (team tied or took the lead as the result of the scoring). In the 20 years under study, the pitcher with the most such opportunities under the strict definition was Andy Pettitte with 176. I was able to sustain a decent sample size up to 100 opportunities (meaning that I could look at split-half samples of 50 opportunities each), and reliability reached only the mid-.20s. (Ideally, we’d want to see it in the .70 range.)
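For reference, the textbook Kuder-Richardson Formula 20 (KR-20) on a pitchers-by-opportunities matrix of 0/1 outcomes looks like the sketch below. The layout is an assumption: row i holds pitcher i’s first k opportunities, and 1 means he held the other team scoreless. The article doesn’t spell out its exact computation, so read this as an illustration of the formula, not a reproduction of the analysis.

```python
from typing import Sequence


def kr20(matrix: Sequence[Sequence[int]]) -> float:
    """Kuder-Richardson Formula 20 for a pitchers x opportunities matrix of 0/1 outcomes.

    Row i holds pitcher i's first k opportunities (1 = held the other team scoreless).
    """
    n = len(matrix)
    k = len(matrix[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in matrix):
        raise ValueError("need at least 2 pitchers, 2 opportunities each, equal-length rows")
    # Sum of the item variances p_j * q_j, one per opportunity column.
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in matrix) / n
        pq_sum += p * (1.0 - p)
    # Population variance of each pitcher's total number of scoreless innings.
    totals = [sum(row) for row in matrix]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1.0 - pq_sum / var_total)
```

Values near 1 mean pitchers separate consistently across opportunities. Values in the .20s, like the strict-definition result, mean the outcomes are mostly noise at that sample size.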
Instead, I looked at innings in which the pitcher’s team had previously scored, whether or not it tied the game or gave them the lead (the “liberal definition”). Using these numbers, I found that at a sampling frame of 500 career opportunities (so, comparing split-half samples of 250 potential shutdown innings), reliability checked in at .67. What this means is that we need roughly 260 or 270 potential shutdown innings to get some idea of a pitcher’s true talent. It’s going to take a pitcher roughly 3-5 years of starting full time to accumulate that many chances under the more lenient definition. One-year percentages aren’t going to be good reflections of a pitcher’s true talent. But we might be able to talk about performance over the course of a career.
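The split-half idea behind the 500-opportunity sampling frame can be sketched as follows. Take each qualifying pitcher’s first `frame` opportunities, split them into two halves, and correlate the two halves’ scoreless rates across pitchers. Splitting by odd- versus even-numbered opportunity is an assumption here; it is one common way to form the halves, and the article doesn’t say which it used.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(outcomes: Dict[str, List[int]], frame: int) -> float:
    """Correlate odd- and even-numbered opportunities across pitchers.

    Only pitchers with at least `frame` opportunities are used, and only their
    first `frame` outcomes (1 = scoreless), so each half holds about frame // 2.
    """
    odd_rates, even_rates = [], []
    for seq in outcomes.values():
        if len(seq) < frame:
            continue  # not enough chances to fill the sampling frame
        window = seq[:frame]
        odd, even = window[0::2], window[1::2]  # 1st, 3rd, ... vs 2nd, 4th, ...
        odd_rates.append(sum(odd) / len(odd))
        even_rates.append(sum(even) / len(even))
    return pearson(odd_rates, even_rates)
```

With `frame=500`, each half holds 250 opportunities, matching the split described above.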
For pitchers who had more than 250 opportunities (liberal definition) to pitch after their team had scored from 1993-2012, the rate at which they put up scoreless innings in these situations correlated at .62 with their success pitching after their own team had been held scoreless. This suggests that the two “abilities” run parallel to each other for the most part.
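The career-level comparison reduces to filtering on opportunities and correlating two rates. In this sketch the per-pitcher count tuple is hypothetical. The comparison rate is taken as the pitcher’s scoreless rate in his other (non-shutdown) innings, which is an assumption about how the second rate was defined.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical per-pitcher career counts:
# (scoreless after team scored, chances after team scored,
#  scoreless in other innings, other innings)
Counts = Tuple[int, int, int, int]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def shutdown_vs_other(careers: Dict[str, Counts], min_opps: int = 250) -> Tuple[int, float]:
    """Return (number of qualifying pitchers, correlation between the two rates)."""
    shutdown_rates, other_rates = [], []
    for sd_scoreless, sd_opps, other_scoreless, other_opps in careers.values():
        if sd_opps <= min_opps:  # the text requires more than 250 opportunities
            continue
        shutdown_rates.append(sd_scoreless / sd_opps)
        other_rates.append(other_scoreless / other_opps)
    return len(shutdown_rates), pearson(shutdown_rates, other_rates)
```

A correlation near 1 would mean the shutdown rate is essentially the same skill as general run prevention. The reported .62 points most of the way in that direction.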
Jack Morris for the Hall of Fame
A small meditation on why it might actually be easier than expected to throw a shutdown inning: Some time ago, I ran a study on the tendencies of managers to attempt a stolen base after they’d had one of their runners caught stealing. I figured that they would be “once bitten, twice shy.” It turns out that managers actually become more aggressive afterward. I have to wonder if a team’s hitters, once the lead has been snatched away from them, become similarly over-aggressive in trying to regain it. Behavioral economics has shown that people react particularly negatively to having something taken away from them, and maybe in trying to get back what they lost, players actually become less effective. That’s little more than idle speculation at this point, but it’s worth asking. We also know that throwing a shutdown inning has a very small effect on the chances of winning the game. It is tiny, and its statistical significance isn’t all that reliable. Whatever we’re dealing with here, it isn’t a huge effect. Bully for you for throwing a scoreless inning, but let’s hold off on the Congressional Medal of Honor.
We also now know that simply looking at shutdown inning percentage over one year, while it might describe what happened, isn’t a reliable enough description of a pitcher’s true talent level. We need a few years before that becomes apparent (much as with clutch hitting, or even BABIP), and even when we get to a big enough sampling frame, shutdown percentage tracks pretty closely with scoreless inning percentage in general. So, while it’s not a good idea to put any faith in seasonal numbers, we can accurately talk about (for guys who have been around a while) a pitcher’s career prowess, albeit using a watered-down definition of shutdown inning. And it generally goes along with his overall talent level.
It’s possible that there are pitchers who have shown that in shutdown innings, they really do raise their game. Because we need career-level numbers, I looked at all events from 1973 to 2012. In those years, the pitcher who had the biggest differential between his results during a proper shutdown opportunity (the strict definition, where his team either tied or took the lead directly before) versus all other situations is Pat Hentgen (82 percent scoreless innings in shutdown situations, but only 70 percent otherwise), followed by John Burkett (81-70), Terry Mulholland (78-70), and…Jack Morris (80-72). (Take that, Hall of Fame voters! Jack Morris really did step up his game in a shutdown inning!)
The worst? Dave Stewart (59-73), Pete Harnisch (61-71), Matt Moore (60-69), and Esteban Loaiza (63-73). With that said, no one in that time period crossed over the threshold of pitching 260 potential shutdown innings, using the strict definition. (Roger Clemens came closest with 239). If we’re going to be strict about our definition of shutdown, we need to treat even those career shutdown percentages with a little bit of suspicion.
If we use the more lenient definition of shutdown (only requiring that the pitcher’s team scored directly before), your leader is Tim Lincecum (83-70), and Jack Morris drops to an unfortunate split of +2 percentage points (72-70).
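The leaderboards above rank pitchers by the gap between two scoreless rates. The sketch below computes that gap in percentage points from raw counts and flags pitchers who clear the roughly 260-chance threshold discussed earlier. The count tuple and function names are hypothetical. The published figures are rounded percentages, so ties and near-ties in the lists above may order differently when recomputed from exact counts.

```python
from typing import Dict, List, Tuple

# Hypothetical per-pitcher counts:
# (scoreless in shutdown chances, shutdown chances, scoreless otherwise, other chances)
Counts = Tuple[int, int, int, int]


def split_points(c: Counts) -> float:
    """Shutdown scoreless rate minus other-situation scoreless rate, in percentage points."""
    sd_scoreless, sd_opps, other_scoreless, other_opps = c
    return 100.0 * (sd_scoreless / sd_opps - other_scoreless / other_opps)


def rank_splits(careers: Dict[str, Counts], reliable_at: int = 260) -> List[Tuple[str, float, bool]]:
    """Sort pitchers from biggest to smallest split; flag those with enough shutdown chances."""
    rows = [(name, split_points(c), c[1] >= reliable_at) for name, c in careers.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)
```

The flag makes the caveat from the strict-definition list explicit: a large split on fewer than about 260 chances deserves a little suspicion.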
So, is the shutdown inning a big deal? The numbers present a (pretty flimsy) case that a shutdown inning is more than just any old scoreless inning, but even at that, not all that much more. There is something to be said for the fact that, because of the way we’re defining things, the game is either tied or, if the team took the lead, it’s more likely to be a small lead. So, a shutdown inning is a scoreless inning thrown in a somewhat higher-leverage situation. I think that does have to count for something.
Shutdown innings certainly make for something good to talk about during a broadcast, but let’s keep them in perspective, shall we?
Special thanks to Ben Lindbergh for suggesting the topic based on a conversation that he heard between Buck Martinez and Jack Morris on a Blue Jays broadcast in May. He also wanted to know what Morris’ numbers were.