Analysis Begins with Questions
One of the great things about baseball analysis is that you can take a relatively simple question—like “who is the best relief pitcher in baseball?”—and break it down into dozens of sub-questions. Put another way, we have to answer questions about methodology before we can answer the substantive question at hand.
An ideal quantitative methodology for determining the best relief pitcher in baseball would rely on information known to be predictive of future success, including peripheral stats like strikeouts and walks. But it’s also true that, because even peripheral stats can bounce around a fair bit over a limited number of innings, we want to know something about outcomes. Outcomes give us a measure of performance that doesn’t entirely overlap with peripherals. By using win/run expectancy to value relievers, we are not simply duplicating the information provided by peripheral stats while adding noise to the mixture. On the contrary, we are valuing relievers on a different axis: the crucible of high leverage.
But there’s another, and I believe overlooked, reason to focus on outcomes. To see why, we have to start with a very particular sort of answer to the question posed at the outset of this column (“who is the best relief pitcher in baseball?”). So let’s borrow some methodology from an article I wrote back in March to look at some of the best relievers not yet slapped with the closer tag. We’ll take WXRL and SIERA for all pitchers who have pitched solely in relief. For each pitcher, we’ll calculate how many standard deviations they are away from the mean in each category. Then we’ll add them together. For example, a pitcher who was one standard deviation better than the mean in both SIERA and WXRL would get a score of two.
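The scoring method just described can be sketched in a few lines of Python. This is a minimal illustration, not the original calculation: the function names, the three sample pitchers, and the choice of population standard deviation are all assumptions. SIERA is sign-flipped because a lower SIERA is better, so that "one standard deviation better than the mean" counts as +1 in both categories.

```python
from statistics import mean, pstdev


def z_scores(values, higher_is_better=True):
    """Standard deviations from the mean, signed so that better is positive."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD; the article does not say which
    sign = 1.0 if higher_is_better else -1.0
    return [sign * (v - mu) / sigma for v in values]


def combined_scores(relievers):
    """relievers: list of (name, siera, wxrl). Returns (name, score), best first."""
    siera_z = z_scores([r[1] for r in relievers], higher_is_better=False)
    wxrl_z = z_scores([r[2] for r in relievers], higher_is_better=True)
    scored = [(r[0], s + w) for r, s, w in zip(relievers, siera_z, wxrl_z)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical pitchers and values, for illustration only (not real stats).
sample = [
    ("Pitcher A", 2.50, 3.0),
    ("Pitcher B", 3.50, 1.0),
    ("Pitcher C", 4.50, -1.0),
]
ranking = combined_scores(sample)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

Because the two z-scores are simply added, the method weights SIERA and WXRL equally; a different weighting would change the ordering.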
So if you do that and figure out the top 10 scores for the current season, you get a list that looks like the following:
Name | |
 | 2.47 | 4.78
 | 2.85 | 4.71
 | 1.96 | 3.46
 | 3.02 | 4.65
 | 2.38 | 3.61
 | 2.18 | 3.25
 | 1.29 | 2.13
 | 3.07 | 3.85
 | 2.67 | 3.00
 | 2.13 | 2.31
This is a decent list. The kind of list you can live with for a while, thinking to yourself that there are some well-established guys (Bell, Marmol, Soria) as well as some targets of man-crushes (Thornton, Benoit, Adams). But then you notice the glaring omission. See it yet?
One Size Fits Almost All
The problem with this list, and with any list like it created in the last 10 years really, is that you could just throw it out. Who needs a list like this when the answer to the question “who is the best relief pitcher in baseball?” is so simple? You could just give the same answer—Mariano Rivera—every year and be right in a very real sense every year. It’s not that Rivera fares especially poorly by this metric. He actually comes in 13th out of nearly 200 qualified pitchers. It’s that by not putting him closer to the top, the list fails to pass the smell test—it appears wrong on its face. Instead of a definitive ranking, we get a neat-o list with little meaningful difference between the individual rankings. That’s disappointing.
Part of the reason Rivera doesn’t rank higher is that his SIERA is not especially fantastic this year. At 3.01, Rivera clocks in at 31st among relievers with at least 20 innings. By more traditional metrics, of course, Rivera is having a superlative year: 1.06 ERA, 36/7 K/BB in 42 1/3 IP. That isn’t enough for SIERA, however. His relatively low K rate hurts him, and he gets no credit for the exceptional job he’s done at preventing home runs (just the one this year, thank you very much). He’s also having a career year in terms of preventing hits: just 23 all season. So what gives?
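To put that season line on the per-nine-innings scale used in the rest of this piece, here is a small sketch that converts the counting stats quoted above (23 hits, one home run, 36 strikeouts, seven walks, 42 1/3 innings) into rates. The helper names are hypothetical; only the inputs come from the text.

```python
from fractions import Fraction


def parse_innings(text):
    """Convert an innings string such as '42 1/3' into an exact fraction."""
    total = Fraction(0)
    for part in text.split():
        total += Fraction(part)
    return total


def per_nine(count, innings):
    """Scale a counting stat to a nine-inning rate."""
    return float(9 * count / innings)


innings = parse_innings("42 1/3")
h9 = per_nine(23, innings)   # hits per nine innings
hr9 = per_nine(1, innings)   # home runs per nine innings
k_per_bb = 36 / 7            # strikeout-to-walk ratio

print(f"H/9 {h9:.2f}, HR/9 {hr9:.2f}, K/BB {k_per_bb:.2f}")
# prints "H/9 4.89, HR/9 0.21, K/BB 5.14"
```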
Let’s start with the hit rates. Rivera has been consistently excellent at preventing hits. His career BABIP is just .263, and he has allowed fewer than seven hits per nine innings. To illustrate the point, let’s compare Rivera’s seasonal hit rates to the distribution of hit rates for all relievers from 2007 through this year.
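For readers unfamiliar with density charts, the comparison can be sketched as a Gaussian kernel density estimate, where each input is one season's H/9. This is an assumption-laden illustration: the bandwidth, the grid, and every data value below are made up for the example, and the settings behind the actual chart are not stated.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical seasonal H/9 values, for illustration only (not real stats).
league_h9 = [7.1, 8.4, 9.0, 9.3, 8.8, 10.2, 7.9, 9.6]
pitcher_h9 = [6.2, 7.0, 6.6, 7.4, 5.9]

league_density = gaussian_kde(league_h9, bandwidth=0.6)
pitcher_density = gaussian_kde(pitcher_h9, bandwidth=0.6)

# Evaluate both curves on a shared grid from 4.0 to 12.0 hits per nine.
grid = [4.0 + 0.1 * i for i in range(81)]
league_curve = [league_density(x) for x in grid]
pitcher_curve = [pitcher_density(x) for x in grid]

league_peak = grid[league_curve.index(max(league_curve))]
pitcher_peak = grid[pitcher_curve.index(max(pitcher_curve))]
print(f"league peak near {league_peak:.1f}, pitcher peak near {pitcher_peak:.1f}")
```

Each season contributes one small bell curve, and the plotted line is their average. The vertical axis is therefore a density (the area under each curve is one) rather than a count of games or seasons, which is why a single pitcher's handful of seasons can still produce a smooth curve.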

Now, I suppose it’s possible that if we could clone Rivera and run his career over again, we’d end up with a spread closer to the mean for all relievers. But I didn’t single out Rivera’s hit rates because they were low; I singled them out because all of his other pitching attributes have been so dominant for so long. That means our selection bias sensors can calm down for just a second, and we can marvel at what is a pretty darn impressive ability to limit hits.
Of course, any pitcher who strikes out enough batters will have a similar sort of distribution, since hits per nine innings is influenced by the total amount of contact made by opposing batters. To illustrate this point, let’s compare Rivera’s chart above to a similar chart for another highly successful reliever: Billy Wagner.

Wagner’s career .265 BABIP is just barely behind Rivera’s mark, and his average H/9 of 6.0 is actually lower than Rivera’s, as reflected in the chart above. What we can learn from these pitchers is that the shorthand that pitchers don’t control outcomes on balls in play breaks down at the extreme represented by elite relief pitchers. These pitchers—who basically only pitch with maximum effort in higher than average leverage situations—can in fact show the ability to limit hits more than starting pitchers. This is true even above and beyond the ~.005 split in BABIP between relievers and starters. That means estimators that model reliever performance on the assumption that hit rates revert heavily back to the mean may overlook this particular skill. But it also means Rivera isn’t unique—in other words, that can’t be what keeps him from the top of the chart above, since plenty of relievers exhibit the same ability.
Now we should ask whether there is any other skill that Rivera has that we might be overlooking. Sure, he has posted above average ground ball rates his entire career, but that is definitely reflected in his SIERA (which includes ground ball rates as well as various ground ball interaction terms). What about home runs? Rivera has allowed just 61 home runs in his career—one that has spanned 1,132 innings. Since 1996, he hasn’t allowed more than seven home runs in a single season, and from 1996-2009, he has allowed three home runs or fewer in eight different seasons.
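As a quick worked conversion, using only the career totals quoted above, those 61 home runs over 1,132 innings come out to roughly one home run every 18 to 19 innings:

```latex
\mathrm{HR}/9 = 9 \times \frac{\mathrm{HR}}{\mathrm{IP}} = 9 \times \frac{61}{1132} \approx 0.485
```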
In fact, if we look at the distribution of Rivera’s seasonal HR/9 rates versus all relievers from 2007 through this season, we notice just how exceptional Rivera has been at limiting home runs.

Now if we compare that with a similar chart for Wagner, we’ll see a big difference.

Now, some of that is due to the differences in ground ball rates Rivera and Wagner have put up over the course of their careers. But whatever your favorite ERA predictor/post-dictor/third-world dictator may be, most assume in one way or another that limiting fly balls—not home runs—is the relevant skill. And while that appears to be true for someone like Wagner, who has posted rates of home runs per fly ball that hover around league average, it is decidedly not the case with a rate environment extremophile like Rivera.
Question of the Day
It appears as though our best estimators do a good job for 99 percent of all players, but break down at extremes. A pitcher like Rivera, who is extreme in almost every way possible, simply doesn’t rate properly if you use the same metrics used to measure other guys. The alternative view is that Rivera really isn’t quite as good as he’s made out to be. Is anyone willing to make that argument?
Having watched most of Rivera's career, there's no way that the stats we use can reflect his abilities. Even most of the hits he gives up aren't usual. Maybe you should have SIERA take into account all of the infield hits Rivera gives up on weak contact or the dozens of bloop hits on broken bat swings. They seem like line drives in the box score, but they won't score runs at the same rates as usual hits.
http://baseballanalysts.com/archives/2009/10/mariano_rivera.php
With most other elite pitchers, hitters still have a couple ways of being successful - 1) by guessing what they'll throw and "sitting" on a pitch, or 2) by having the pitcher make a mistake and lay one over the center of the plate.
Mariano virtually eliminates 1) with the cutter, and he also makes ridiculously few mistakes.
(I don't know if there is a stat for pitching mistakes, but maybe there should be - pitches "centered up" in say, the middle 3 inches of the plate, about waist high. I saw a graphic of Mariano's pitches, and there is an uncanny vacuum there, like a black hole.)
His is an approach that couldn't possibly work as a starter, and I don't think it's an approach that could work for other relievers, like, say, Carlos Marmol, who wipes hitters out with his slider on 2-strike counts, but often uses his FB to get ahead. If Marmol had to rely only on his slider, as Rivera does his cutter, we'd have a different Marmol.
Regardless of SIERA, there's no diminishing what Rivera has accomplished. He has over 100 more WAR in relief than Lee Smith, who is second in all-time WAR accumulated in relief. Is Rivera not quite as good as he's made out to be? Nope. He's better.
I don't mean to come across as combative - you know these things better than I do (and I'm assuming it was some sort of data availability issue) - but absent an explanation it seems a bit sloppy.
However, I do want to make some points about your particular criticisms. I will assume that you are wondering why I didn't create density charts for HR/FB and BABIP, since I did reference each in my article. The data are certainly available to make analogous charts for those stats. I chose not to do so for two related but distinct reasons.
The first is that those stats have been fetishized as measures of pitcher "luck" in a way that I generally, and this column in particular, hope to undermine. That is to say, I take the point of this story to be about how, despite our best efforts to correct for certain trends in populations, we often cannot accurately postdict performance for all players in a season (and that this is particularly true for players at the extremes). If I were to use HR/FB and BABIP, we'd essentially be doing the sort of thing I am disclaiming.
The second reason is that I think "chart fatigue" is real, so I try to limit the number of data dumps in my articles. That means editing with an eye toward visualizations that tell the story I am relating. HR/9 and H/9 do that well, because each measure ties up several skills. Some of those skills lots of pitchers have, some of them only a few pitchers have, and one or two of them are possessed by crazy outliers like Rivera. To drive this point home, I wanted to choose metrics for my visualization that captured all of those skills, and HR/9 and H/9 both do that.
Thanks for taking the time to read and comment.
It seemed too good to be true, so I had to look it up. He's only had 1 such season since 1996, that being this one. If you're looking at the same statistics table I am (from his B-R page) then you get his IBB column. The numbers are always in the low single digits, so one can easily mistake them for HR allowed if you lose track for a second and don't look at the column headings. And indeed, he has five separate seasons of either zero or one intentional walks since 1996.
I can only assume that Extreme Sports Punk Number One would strongly approve of this article.
I would argue the opposite. He's spent his career in the AL East (with Jeter playing behind him) in this current era of mega-offense and currently resides in the homer-friendliest ballpark in MLB. Oh, and he's pitched about two full seasons' worth of playoff games (often against some of the game's best lineups) where his stats DWARF his regular season stats.
btw, his one HR allowed this year is the shortest one Jason Kubel's hit (341 ft).
I can think of three offhand who I'm pretty sure could have posted comparable career numbers - specifically Goose Gossage, Rollie Fingers & Bruce Sutter. They (like all the closers back then) routinely pitched two or even three innings for a save, and seldom were called into a game unless the opposition was threatening to score. Imagine them each being called in to get three outs with no one on base.
My point is that if you say that Rivera is on a par with guys like that, I'll go along with you. But I often hear Rivera held up as the greatest relief pitcher ever, standing head and shoulders above guys like those I just named, and I'm not so sure about that.
Thanks.
And your other axis, "HR/9" and "H/9": is this showing a distribution of individual games, or seasons, or what? I can understand that when one plots ALL other pitchers one would get a smooth-looking distribution even if the HR/9 or H/9 values were seasons, because the number of player-seasons would be quite large. But when one is considering only one pitcher and for only a portion of his career, I can only assume that the distribution would be decidedly unsmooth unless the plot was of individual games.
So you can see I am still confused about what this is showing, other than say, Mo gave up less HR/9 than the rest of the relievers he was being compared to, but I already kinda knew that.
I looked up kernel density plot (on Wikipedia) since I had never heard of that before, and the explanation there didn't really shed light on what I was looking at in your article.
Would it be accurate (or helpful) to think that the "Frequency" axis was counting games that fit the variable on the horizontal axis, like a histogram?
Thanks very much.