January 29, 2014
Throw the Flag
Challenges and the Replay Review System
About that instant replay system that MLB put in place—we found a little problem with it. It started with us asking a pretty easy question. What is the best strategy for a manager to use in deciding when to throw “the flag” to challenge a call? We were sitting around talking about it, and the answer that we came up with is actually kinda scary: Managers should just throw that flag for any close play, the first time that they see one. When we say any close play, we mean just about anything that they have a smidgen of belief could be overturned by consulting a replay. And they shouldn’t fear throwing it even in the first inning, or throwing it to contest something that would give them only a trivial advantage.
If managers are truly doing it right (in the mathematical sense of the word), there will be a lot of replay challenges on plays where the audience will say “Yeah, it was close… but c’mon, it wasn’t that close.” Even if they’re not doing it right, there will still be plenty of those. This is entirely different from the putative goal of the system, which was, as Tony La Russa says, to go after “the dramatic miss, not all misses.”
This may seem counterintuitive, but managers should be losing challenges. A lot of them. So many of them, in fact, that the best managers in terms of maximizing “run production” gained from challenges will almost certainly be the worst managers in terms of challenges won percentage. It’s kind of like that old adage, “You miss 100 percent of the shots you never take.” Well, since there’s essentially no cost for missing, any time managers see a challenge opportunity, they should take it.
Warning! Gory Mathematical Details Ahead!
Then we tried to determine the most trivial application of using the challenge. The best that we came up with was a 2-0 count, when the pitch comes near to the batter and may have hit him. It’s the difference between ball three and essentially skipping right to ball four. The hitting team’s manager would prefer first base to a 3-0 count, but the difference isn’t that big. (We estimated it at some small fraction of a run.) From a strictly mathematical point of view, it’s important that we figure some of these things out. The reason is that while we don’t know exactly what’s going to happen over the first six innings of the game, we can make some reasonable assumptions about what’s going to happen and use the basic laws of probability combined with expected value to figure out the costs and benefits of different strategies.
The problem with the challenge system is that… well, there just aren’t a lot of close plays in MLB. Sure, everyone remembers the ones that are, but mostly in the same way that people pay attention to airplane crashes when the overwhelming majority of flights have no problems at all. The truth is that on most grounders, the batter is out by a couple steps. On most fly balls, there’s no question as to whether the fielder made the catch. Most home runs go several rows back over the fence. You might be familiar with this study from 2010 in which ESPN researchers found that there were roughly 1.35 calls during the average game that were close enough to justify using replay.
Now, according to the new rules, manager-initiated challenges will essentially end after the sixth inning, so we’re probably talking about an average of 0.9 plays that are close enough to need replay within the first six innings (⅔ of 1.35). And if we were managing a team, we’d challenge only the calls that went against us. Assuming that half would go our way and half would not, we’d expect just 0.45 plays over those six innings that are both replay-eligible and worth our challenging.
The ESPN study suggested that nearly two-thirds of the time, the umpires actually got the call right on the field. In 20 percent of cases, they got it wrong, and in 14 percent, the evidence was too close to call (presumably, the call on the field would stand). Of the 0.45 plays that we might challenge, only 20 percent would be overturned. Suddenly, we’re down to an expected value of 0.09 plays (20 percent of 0.45) that would be close enough to need replay, would benefit our team from being overruled, and that would theoretically be overturned by a challenge. Again, we don’t know exactly what will happen. The umpires might have an awful day at the office, completely messing up five different calls that all have major implications on the game, but you have to set your strategy based on a realistic expectation of what might happen, rather than fear of what could possibly happen.
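For readers who want to check the arithmetic, the funnel above can be sketched in a few lines of Python. The rates are the ones quoted in the text; the function and variable names are our own illustrative choices, and the even split between calls for and against a team is the same assumption made above.

```python
# Back-of-the-envelope funnel using the figures quoted in the article.
# All names here are illustrative, not part of any official model.

CLOSE_CALLS_PER_GAME = 1.35   # ESPN 2010 study: close calls in an average game
CHALLENGE_WINDOW = 6 / 9      # managers can challenge in the first six of nine innings
SHARE_AGAINST_US = 0.5        # assumption: half of the close calls hurt our team
SHARE_OVERTURNED = 0.20       # ESPN study: umpires were wrong on 20 percent


def expected_overturnable_calls(
    close_calls=CLOSE_CALLS_PER_GAME,
    window=CHALLENGE_WINDOW,
    against_us=SHARE_AGAINST_US,
    overturned=SHARE_OVERTURNED,
):
    """Expected calls per game that are close, in the window, against us, and wrong."""
    return close_calls * window * against_us * overturned


in_window = CLOSE_CALLS_PER_GAME * CHALLENGE_WINDOW
print(round(in_window, 2))                      # 0.9  close calls in innings 1-6
print(round(in_window * SHARE_AGAINST_US, 2))   # 0.45 of those went against us
print(round(expected_overturnable_calls(), 2))  # 0.09 would also be overturned
```

Each step only multiplies by a fraction, so the estimate can only shrink as conditions are added.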
We don’t know what sort of reviewable play might present itself during the course of a game, but as we established above, the most that a play could be worth would be four runs (give or take). That means that even if we somehow knew that every disputed play during the game would be one of these four-run humdingers, these types of challengeable plays happen so rarely that the greatest expected value that we can hope for is 0.36 runs (0.09 plays x 4 runs). That’s the expected value in the top of the first inning, right after the PA guy yells the names of the various players so that everyone can fill out their scorecard.
As the game goes on, the chances for a replay-inducing call go down because there are fewer plays left to be made. (We’ll talk about that more in a minute.) The rules do say that if a manager gets a challenge right, he gets another but would be allowed no more than two. But that would require two challengeable plays. Even if we just lazily double 0.36 runs, we still haven’t even reached three quarters of a run, and we’re asking for a confluence of events rarely seen outside of science fiction or an important Cubs game to happen twice within the first six innings of the same game. In expected value terms, the cost of challenging the first play of the game probably isn’t even half a run—at the absolute maximum.
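The ceiling described above is just a product of two numbers, sketched here in Python with the article's figures. Doubling for the second challenge is the same deliberately lazy shortcut used in the text, and the names are illustrative.

```python
# Upper bound on the value of keeping a challenge in the top of the first inning.
EXPECTED_OVERTURNABLE_CALLS = 0.09  # expected overturnable calls against us, per the text
MAX_RUNS_PER_PLAY = 4.0             # the article's most valuable reviewable play


def holding_value(expected_calls, runs_per_play):
    """Expected runs gained by saving the challenge for a later play."""
    return expected_calls * runs_per_play


ceiling = holding_value(EXPECTED_OVERTURNABLE_CALLS, MAX_RUNS_PER_PLAY)
lazy_double = 2 * ceiling  # pretend a second challenge is worth as much as the first

print(f"{ceiling:.2f} runs")      # 0.36 runs
print(f"{lazy_double:.2f} runs")  # 0.72 runs, still under three quarters of a run
```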
And let’s talk about what plays are actually likely to be reviewed. A much more likely play would actually be a bang-bang play at first, or a trap/no trap call that might be the difference between a single or an out. Changing a single into an out is worth something around .70 runs (give or take). Even a safe/out call at home plate, where the stakes are a scored run and no extra out or an erased baserunner and an extra out, is worth around a run and a half. (Let’s assume a single with a runner on second and no out, and a play at the plate. The run expectancy matrix says that, in 2013, one out, runner on first is worth .49 runs, and a scored run with a runner on first and no outs is worth 1.82 runs, a net value of 1.33 runs.) Let’s set the value of the types of plays most likely to be reviewed at an even one run. That’s probably generous. That brings the expected value of being able to challenge a call later in the game down to a measly .09 runs.
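The play-at-the-plate example is a subtraction between two run expectancy figures, followed by the same multiplication as before. A minimal sketch, using only the 2013 values quoted above (the constant names are ours):

```python
# Run expectancy figures for 2013 exactly as quoted in the article.
RUNNER_OUT_AT_HOME = 0.49   # one out, runner on first
RUNNER_SAFE_AT_HOME = 1.82  # run scored, runner on first, no outs

EXPECTED_OVERTURNABLE_CALLS = 0.09  # expected overturnable calls against us, per the text
TYPICAL_PLAY_VALUE = 1.0            # the article's generous round number, in runs

swing_at_home = RUNNER_SAFE_AT_HOME - RUNNER_OUT_AT_HOME
realistic_holding_value = EXPECTED_OVERTURNABLE_CALLS * TYPICAL_PLAY_VALUE

print(f"{swing_at_home:.2f} runs")            # 1.33 runs swing on the safe/out call
print(f"{realistic_holding_value:.2f} runs")  # 0.09 runs for saving the challenge
```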
So let’s say a manager is thinking of challenging a call early in the game. He needs to ask himself two questions: How big is the possible reward if the call is overturned, and how likely is it that the call will be overturned? If it’s a bang-bang play at first (again, worth about .70 runs), and he believes that by challenging the call, he’s got a 50/50 shot of having the call overturned, then he inherently believes that making the challenge is worth .35 runs, and he should probably make it.
In the first inning, the expected value of holding on to that challenge is around .09 runs. We estimate that changing a ball to a strike is worth 0.17 runs, so a manager who believes he has even a 50/50 shot of getting a call as meaningless as that one changed in his favor expects about .085 runs from the challenge. That is essentially break-even against holding the challenge in the first inning, and it comes out ahead once the value of holding starts to shrink later in the game. (Note: ball/strike calls are not actually eligible for review; they’re just a convenient baseball play worth very few runs.) So, even for something as small as the 2-0 count, “Did he get hit by the pitch or not?” situation, if the manager is fairly sure that the batter did get hit by the pitch, he should immediately walk out and ask for a replay.
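The manager's two questions reduce to one comparison: the chance of an overturn times the runs at stake, against the value of holding the challenge. Here is a rough Python sketch of that rule. The run values come from the text, but the linear decline in the holding value over six innings is our own simplifying assumption (the article says only that it goes down), and all function names are hypothetical.

```python
HOLD_VALUE_FIRST_INNING = 0.09  # runs, from the article
CHALLENGE_INNINGS = 6           # manager challenges end after the sixth


def hold_value(innings_completed, start_value=HOLD_VALUE_FIRST_INNING):
    """Value of saving the challenge.

    ASSUMPTION: it shrinks linearly to zero over the six challenge innings.
    """
    remaining = max(CHALLENGE_INNINGS - innings_completed, 0)
    return start_value * remaining / CHALLENGE_INNINGS


def challenge_value(p_overturn, runs_if_overturned):
    """Expected runs gained by challenging right now."""
    return p_overturn * runs_if_overturned


def should_challenge(p_overturn, runs_if_overturned, innings_completed=0):
    """True when challenging now beats holding the challenge for later."""
    return challenge_value(p_overturn, runs_if_overturned) > hold_value(innings_completed)


def break_even_probability(runs_if_overturned, innings_completed=0):
    """Smallest overturn chance at which the challenge pays for itself."""
    return hold_value(innings_completed) / runs_if_overturned


print(should_challenge(0.5, 0.70))                       # bang-bang play at first
print(should_challenge(0.5, 0.17))                       # tiny play, top of the first
print(should_challenge(0.5, 0.17, innings_completed=1))  # same play, an inning later
print(f"{break_even_probability(0.70):.0%}")             # confidence needed at first base
```

On these rounded figures, the 0.17-run play at 50/50 comes up a hair short in the first inning (0.085 against 0.09) and clears the bar an inning later under the linear assumption, while the play at first needs only about a one-in-eight chance of being overturned.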
The Bottom Line
This mathematical problem is exacerbated by the fact that unlike in the NFL, where making a bad challenge costs the team a very valuable timeout (which can be used to stop the clock, prevent a costly penalty, or better prepare the team for some later situation), a bad challenge in MLB costs essentially nothing. With no enduring damage to the team’s chances caused by making a bad challenge, the most costly scenario is one in which the challenge is never used.
We expect that at first, most managers will be scared to use their challenges at all. They might see a situation in the first inning that might call for a second look, but demur, figuring that another call might come along later in the game. Remember when we said that teams should plan strategies based on reasonable assumptions about the likelihoods of different outcomes rather than fear of what might happen? Most teams actually operate on the latter. After all, what will happen if a manager uses a challenge on a seemingly trivial call in the second inning, and a safe/out call of importance does come up in the fifth? He’ll be roasted for it in the media.
But realize that the new system fundamentally incentivizes challenges that will come off looking silly and petty. Eventually, someone will make one of these challenges and be mathematically justified in doing so. However, he’ll be seen as breaking some unwritten rule of decorum on the field, and his cleanup hitter will get a fastball in the ribs as a result. Plus, the umpires may grow to resent having to be constantly called out for things that 90 percent of the stadium already knows won’t be overturned.
To reiterate, even if the umpires will hate this ridiculously over-challenging manager, and even if the players will be frustrated with this over-challenging manager, and even if the media will roast this over-challenging manager, and even if you’ll detest this over-challenging manager for making a mockery of the challenge system, he’ll be mathematically doing the right thing in terms of helping his team win games. And it’s a bit strange to create a rule in which the optimal strategy will be universally detested. We should create rules that make people respect and admire the incredibly tough job that managers have, not ones that beg them to make fools of themselves for the sake of winning.
So if we may, here are a couple of small changes to the policy that would help MLB sidestep these issues. First, the idea of challenges, whether managers are throwing a little flag or not, is neat and works just fine for the NFL, and it’s a fun little strategic wrinkle to play with in the game of baseball. And yes, we look forward to more calls being made correctly. But the challenge system is incentivizing behavior that will make the game much less aesthetically pleasing. Instead, let the keys to the replay system rest with the umpires for the whole game, rather than just from the seventh inning onward. Umpires may use replay when they feel a call might need to be reviewed and use the same basic infrastructure. If you are worried that it will lead to managers coming out of the dugout to “suggest” a replay, then simply make suggesting a replay an offense similar to arguing balls and strikes, punishable by ejection. That gives you all the benefits of replay with none of the unfortunate inadvertent consequences.
If we’re going to keep the current challenge system, in which the power to challenge rests with the manager, we need some penalty for being wrong to dramatically change the threshold of what merits a challenge. If there were some penalty that potentially cost the team runs, then managers would want to be reasonably sure that they were going to win before they made a challenge. We suggest an out, either added to the current inning or added to the next one. Even this penalty would strongly favor managers challenging plays that they were convinced had been called wrong, correcting obvious mistakes, which was the intended point of the challenge system in the first place.
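To see how a penalty would move the threshold, here is a hedged Python sketch. It extends the simple comparison used earlier (chance of overturn times runs at stake, against the value of holding) by subtracting the penalty when the challenge fails. The article does not put a run value on the penalty out, so `HYPOTHETICAL_OUT_COST` is a placeholder we invented purely for illustration, and the function name is ours.

```python
HOLD_VALUE = 0.09             # runs, the article's first-inning holding value
HYPOTHETICAL_OUT_COST = 0.25  # PLACEHOLDER run cost of a penalty out, not from the article


def break_even_probability(reward, penalty=0.0, hold=HOLD_VALUE):
    """Smallest chance of winning at which challenging beats holding.

    Solves p * reward - (1 - p) * penalty = hold for p, capped at 1.
    """
    return min((hold + penalty) / (reward + penalty), 1.0)


for label, reward in [("play at first", 0.70), ("trivial call", 0.17)]:
    now = break_even_probability(reward)
    with_penalty = break_even_probability(reward, HYPOTHETICAL_OUT_COST)
    print(f"{label}: {now:.0%} today, {with_penalty:.0%} with a penalty out")
```

Whatever the true cost of the out, any positive penalty raises the confidence a manager needs before challenging, and it raises it most on the plays worth the least.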
Special thanks to Gabe Kapler, who had a simple question for Dan in preparation for something he was writing, and to GChat for enabling the conversation that led to this article.
Dan Brooks is an author of Baseball Prospectus.