Two years ago, I wrote the first DRA essay, focusing on the challenge of modeling descriptive versus predictive player performance. At the time, my prognosis for threading that needle was rather grim:
What is it, exactly, that you want to know? For example:
(1) Do you care primarily about a pitcher’s past performance?
(2) Are you more worried about how many runs the pitcher will allow going forward?
(3) Or do you want to know how truly talented the pitcher is, divorced from his results this year or next?
The reader’s likely response is: “I’d like one metric that excels at all three!” Sadly, when it comes to composite pitcher metrics, this might not be possible.
The article reviewed a variety of metrics, from plain RA9 to Fielding Independent Pitching (FIP) to Skill-Interactive ERA (SIERA), to show that all of them made sacrifices that committed them to one direction or the other.
DRA itself has tried to ride both sides of this fence. In its first year (2015), we elected to focus on descriptive performance, designing DRA to be the best descriptive metric of past performance short of RA9 itself.
Last year, we began to question the value of prioritizing descriptive performance, and switched to focusing on future performance instead. Again, though, this was presented in terms of a choice: decide which direction you care about, and resign yourself to it.
As always, we prefer to measure success objectively. To do that, we use a Spearman correlation[1] (from 0 to 1), weighted by innings pitched, for 2010 to the present, to compare metrics. When you compare FIP to last season’s DRA formula, you get the following:
| Metric | Descriptive | Reliability | Predictive |
| --- | --- | --- | --- |
| FIP[2] | 0.68 | 0.38 | 0.31 |
| DRA 2016 | 0.53 | 0.50 | 0.33 |
On this chart, Descriptive is the correlation between the metric and the player’s runs allowed per nine innings (RA9) that same year. Reliability is the consistency with which the metric rates the same player in one year and then the next. Finally, Predictive measures the extent to which the metric corresponds to next year’s RA9.
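For readers who want to reproduce these comparisons, the figures in this article were generated with the wCorr R package cited in footnote [1]. The sketch below is our own minimal Python approximation of the same idea, computing an innings-weighted Spearman correlation as a weighted Pearson correlation of ranks; the function name and the toy numbers are ours, not the actual BP code or data.

```python
import numpy as np
from scipy.stats import rankdata

def weighted_spearman(x, y, weights):
    """Rank both series, then take a weighted Pearson correlation of the
    ranks: a simple approximation of an innings-weighted Spearman."""
    rx, ry = rankdata(x), rankdata(y)
    w = np.asarray(weights, dtype=float)
    mx, my = np.average(rx, weights=w), np.average(ry, weights=w)
    cov = np.average((rx - mx) * (ry - my), weights=w)
    sx = np.sqrt(np.average((rx - mx) ** 2, weights=w))
    sy = np.sqrt(np.average((ry - my) ** 2, weights=w))
    return cov / (sx * sy)

# Hypothetical data: each pitcher's metric this year vs. RA9 next year,
# weighted by innings pitched (all values invented for illustration).
metric   = np.array([2.9, 3.4, 4.1, 4.8, 5.2])
ra9_next = np.array([3.1, 3.8, 3.9, 4.6, 5.5])
ip       = np.array([210, 180, 95, 160, 60])
print(round(weighted_spearman(metric, ra9_next, ip), 2))
```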
DRA 2016 went all-in on reliability, viewing a consistent description of a player’s skills as the primary virtue of a component-based metric. In other words, we placed a priority on the same player getting assigned the same DRA for his skills one year as the next. This left FIP as the pitching estimator with the best handle on descriptive performance, but given a choice between two emphases, we thought descriptive performance was the inferior one. Furthermore, focusing on reliability gave us the ability to solve the challenge of DIPS, and better assess a pitcher’s true skill with respect to Batting Average on Balls in Play (BABIP).
But what if you no longer had to make this compromise? What if you could, truly, do a bit of everything: have a metric that accurately describes what a pitcher did while also reliably forecasting the skills he will bring with him into the future? If you didn’t have to choose between them, wouldn’t you want your measure of pitcher value to deliver both?
Of course you would. Thus, we are pleased to say that with the 2017 update to DRA, you can almost have it all. Again using seasons 2010 to the present, here are the weighted Spearman correlations for our metrics, this time including the updates to DRA:
| Metric | Descriptive | Reliability (next year of metric) | Predictive (next year's RA9) |
| --- | --- | --- | --- |
| FIP | 0.68 | 0.38 | 0.31 |
| DRA 2016 | 0.53 | 0.50 | 0.33 |
| DRA 2017 | 0.68 | 0.51 | 0.34 |
Going forward, DRA has basically the same (actually slightly better) reliability and predictive qualities as before. But we’ve now managed to make DRA estimates every bit as descriptive as FIP, while preserving the other features that made DRA uniquely valuable. It has taken two years, but we’ve managed to solve a problem that we had written off as unsolvable.
How did we do this? Primarily by incorporating pitch classifications from PitchInfo into many of the DRA models. We no longer grade pitchers solely on the fact of an event happening, controlling only for externalities like platoon and stadium. Now, our models actively incorporate intrinsic pitcher information about the actual pitches that were thrown. Called strike probability, recently unveiled in connection with our pitch tunnels work, is now an explicit input in most models. Many models now also consider the type of pitch thrown (sinker? changeup? knuckleball?), the velocity of the pitch, the horizontal and vertical angles on the pitch, and the amount of vertical drop as the pitch approaches the plate.
Most of these characteristics are also classified (in a manner of speaking) by MLB’s PITCHf/x system, although we (not surprisingly) prefer the adjustments and re-classifications made by PitchInfo. Not all events benefit from these types of inputs, but for those that do (like home runs and other balls in play) the amount of additional information is enormously useful, and substantially responsible for the no-cost improvement in descriptive power shown above.
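The component models themselves are not reproduced in this article, but to make the idea concrete, here is a hypothetical sketch of the kind of pitch-level feature row described above, pairing intrinsic pitch traits with the external context (platoon, stadium) that earlier versions of DRA relied on alone. Every field name below is our own invention for illustration, not BP's actual schema.

```python
from dataclasses import dataclass

@dataclass
class PitchFeatures:
    """Hypothetical intrinsic pitch traits of the kind now fed to DRA's event models."""
    pitch_type: str            # PitchInfo classification (e.g., sinker, changeup, knuckleball)
    velocity: float            # release speed, mph
    horizontal_angle: float    # horizontal approach angle, degrees
    vertical_angle: float      # vertical approach angle, degrees
    vertical_drop: float       # drop as the pitch approaches the plate, inches
    called_strike_prob: float  # probability the pitch is called a strike if taken

def to_model_row(pitch: PitchFeatures, batter_hand: str, park_id: str) -> dict:
    """Combine intrinsic pitch information with the platoon/stadium
    externalities that the models already controlled for."""
    return {
        "pitch_type": pitch.pitch_type,
        "velocity": pitch.velocity,
        "h_angle": pitch.horizontal_angle,
        "v_angle": pitch.vertical_angle,
        "v_drop": pitch.vertical_drop,
        "cs_prob": pitch.called_strike_prob,
        "platoon": batter_hand,
        "park": park_id,
    }

# Invented example row for a sinker to a left-handed hitter in a given park.
print(to_model_row(PitchFeatures("sinker", 94.2, -1.8, -5.6, 22.0, 0.42), "L", "MIL"))
```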
This year’s rollout reflects other tweaks as well. We’ve incorporated MLB Gameday’s fielding coordinates on balls in play to improve accuracy. We’ve also parallelized the 23 models inside DRA so that they can be run over the course of an hour, rather than five hours—meaning you can see updated values by breakfast each day instead of mid-afternoon. Finally, after discussion with Neil Weinberg, we’ve tweaked the formula for DRA-minus to make it more straightforward. By using a similar method to that of ERA-minus and FIP-minus, we think DRA-minus, which allows you to compare players across seasons, will be easier to understand and use.
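The exact DRA-minus formula is not spelled out above, but ERA-minus and FIP-minus both index a pitcher against the league average, with 100 as average and lower being better. Purely as a sketch of that style of scaling (using an innings-weighted league average; the function, variable names, and numbers are ours, and the real DRA-minus presumably includes further adjustments):

```python
import numpy as np

def minus_scale(values, innings):
    """Index a rate stat against the innings-weighted league average:
    100 = league average, lower = better (the ERA-/FIP- convention)."""
    values = np.asarray(values, dtype=float)
    league_average = np.average(values, weights=innings)
    return 100 * values / league_average

# Hypothetical: three pitchers' DRAs and innings in the same season.
dra = [2.50, 4.20, 5.10]
ip = [200, 170, 90]
print(np.round(minus_scale(dra, ip)))  # approximately [ 69. 115. 140.]
```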
The Effects of the Changes
What effect does this have on the numbers themselves? Let’s start with DRA, and with the pitchers who now look better, compared to where they were last year:
| Fullname | fld_team | DRA_2016 | DRA_2017 | RA9 | IP | Diff |
| --- | --- | --- | --- | --- | --- | --- |
|  | MIL | 7.12 | 5.64 | 5.42 | 179.33 | -1.48 |
|  | NYN | 3.39 | 2.67 | 2.99 | 183.67 | -0.71 |
|  | BAL | 4.75 | 4.06 | 3.82 | 172.00 | -0.69 |
|  | CHN | 5.44 | 4.78 | 4.16 | 166.67 | -0.66 |
|  | SFN | 4.38 | 3.73 | 3.90 | 203.33 | -0.66 |
|  | CHN | 3.93 | 3.38 | 3.28 | 197.33 | -0.55 |
None of these are earth-shattering, but these pitchers benefit notably when DRA focuses more on their stuff than on their outcomes. Jimmy Nelson’s 2016 performance has been upgraded from abysmal to merely rather bad. Noah Syndergaard has gotten even more frightening. Chris Tillman is upgraded to average, and Jeff Samardzija becomes above average. Jake Arrieta, who was DRA’s whipping boy at the start of last year, jumps back up into the realm of quite good, although like other Cubs pitchers his results are still a bit flattered by the quality of the defense behind him.
In turn, let’s look at pitchers who took a hit:
| Fullname | fld_team | DRA_2016 | DRA_2017 | RA9 | IP | Diff |
| --- | --- | --- | --- | --- | --- | --- |
|  | NYA | 3.16 | 4.08 | 4.16 | 179.67 | 0.91 |
|  | CLE | 3.53 | 4.40 | 5.02 | 174.00 | 0.87 |
|  | TEX | 2.84 | 3.48 | 3.72 | 200.67 | 0.65 |
|  | LAN | 2.66 | 3.26 | 3.69 | 175.67 | 0.59 |
|  | NYA | 2.95 | 3.49 | 5.02 | 175.67 | 0.54 |
|  | WAS | 3.50 | 3.97 | 4.97 | 177.33 | 0.48 |
|  | HOU | 3.89 | 4.36 | 4.48 | 184.67 | 0.47 |
For these pitchers, the quality of their stuff does not fully support the flattering grades DRA previously gave them. CC Sabathia, who had a large gap between his DRA and RA9 last year, has now been downgraded close to his actually-charged runs. Josh Tomlin takes a major hit as well, although he still checks in as much better than the runs charged to him. Particularly satisfying is the decline for Michael Pineda, whose outlier status last year provided sport for certain MLB Network hosts during sabermetric TV appearances. That said, DRA remains of the opinion that Pineda’s stuff is much better than his results. The Yankees’ coaching staff agrees, and we’ll just have to see if he can prove us all right, finally.
A refreshed version of DRA means that we have also refreshed the DRA Runs table, a companion table that quickly summarizes the statistics we think will be of most interest to you. In addition to a pitcher’s team, DRA, and innings pitched, we are also providing (1) his runs above average on “not in play” (NIP) events (walks, strikeouts, HBP), (2) his runs above average on “hit” events (singles through home runs), and (3) his runs above average on “out” events. In sequence, these tell you the general areas where a pitcher is either succeeding or getting roughed up, as compared to an average pitcher facing the same opponents in the same stadiums.
The best pitchers tend to do particularly well in NIP runs; others specialize in limiting hard contact, which shows up in hit runs; and still others specialize in minimizing BABIP, which shows up in out runs. These appear under the headings NIP_Runs, HIT_Runs, and OUT_Runs, respectively. In all of these categories, negative numbers favor the pitcher (good) and positive numbers count against him (bad).
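As a quick illustration of how to read those columns, here is a toy example that flags each pitcher's biggest strength and weakness from hypothetical NIP_Runs, HIT_Runs, and OUT_Runs values (all numbers invented, not drawn from the actual DRA Runs table):

```python
# Toy DRA Runs rows: negative = runs saved relative to average (good),
# positive = runs allowed above average (bad). All values are invented.
pitchers = [
    {"name": "Pitcher A", "NIP_Runs": -12.0, "HIT_Runs": -3.5, "OUT_Runs": 1.0},
    {"name": "Pitcher B", "NIP_Runs": 2.0, "HIT_Runs": -8.0, "OUT_Runs": -4.5},
]

for p in pitchers:
    areas = {"NIP": p["NIP_Runs"], "HIT": p["HIT_Runs"], "OUT": p["OUT_Runs"]}
    strength = min(areas, key=areas.get)   # most negative category
    weakness = max(areas, key=areas.get)   # most positive category
    print(f"{p['name']}: strongest in {strength} runs, weakest in {weakness} runs")
```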
Lastly, let’s take a quick look at the effect of these updates on DRA-minus. Since its purpose is to allow comparison across seasons, we’ll give you a short list of the updated “best seasons” since 1951, which is DRA’s current earliest season. In light of one of the names on this list, we’ll just provide this without further comment:
| fullname | fld_team | year | DRA_minus | DRA_final | IP |
| --- | --- | --- | --- | --- | --- |
|  | ATL | 1995 | 32 | 1.48 | 209.7 |
|  | BOS | 2000 | 34 | 1.69 | 217 |
| Pedro Martinez | BOS | 1999 | 35 | 1.71 | 213.3 |
|  | SEA | 1995 | 37 | 1.70 | 214.3 |
| Randy Johnson | ARI | 2004 | 38 | 1.74 | 245.7 |
|  | ARI | 2002 | 40 | 1.78 | 259.3 |
| Randy Johnson | ARI | 2001 | 42 | 1.94 | 249.7 |
|  | BAL | 2007 | 43 | 1.95 | 182 |
| Greg Maddux | ATL | 1994 | 43 | 2.03 | 202 |
| Pedro Martinez | MON | 1997 | 44 | 1.95 | 241.3 |
|  | MIL | 2004 | 45 | 2.08 | 237 |
| Randy Johnson | SEA | 1994 | 46 | 2.15 | 172 |
| Randy Johnson | ARI | 1999 | 46 | 2.28 | 271.7 |
|  | MIA | 2016 | 46 | 2.00 | 182.3 |
| Greg Maddux | ATL | 1996 | 47 | 2.26 | 245 |
Why should you care?
DRA’s reliability from year to year demonstrates that it is built on a solid foundation. It achieves state-of-the-art results despite including certain baseball events (such as balls in play and home runs) that other estimators either refuse to consider or take only at face value. Balls in play do not simply cancel each other out; rather, a pitcher’s ability to control them is directly related to his success, and a quality assessment of pitcher skill should take them into account.
Some have expressed concerns about DRA’s methodological complexity. In some respects, those criticisms are fair. However, I would offer a few points in response. First, there are many baseball statistics with poorly-understood calculations (e.g., “earned” runs) which fans of all experience levels rely upon anyway. Much of our perception about “complicated” stats is based on our strong bias toward what we already know and therefore prefer. Second, the correlation data we give you provides independent verification of DRA’s accuracy and can be replicated by anyone who downloads the exact same data from our site. This allows you to have confidence in DRA’s methods without having to reverse engineer them for yourself.
Finally, I strongly believe that the last generation of sabermetric analysis, to its credit, managed to wring just about everything there was to be found out of plain algebra and basic linear regression. If we want further accuracy, that is going to require more complexity. You may decide that complexity is ultimately not for you, but for those who want more understanding and better analysis, increased complexity is inevitable.
The Path Forward
At this point, we don’t anticipate further changes to DRA this season. DRA does not presently incorporate exit velocity, although it’s not clear that would help anything, as there are still a lot of batted balls escaping detection. Furthermore, DRA now equals or exceeds the performance of other component pitcher metrics in the public domain, which limits our appetite for further tinkering. DRA of course remains the rate foundation for pitcher wins above replacement player (PWARP) here at Baseball Prospectus.
Nonetheless, if you think you have a good suggestion for how we can make it better, we are always all ears. Likewise, if you have any questions about these or any other changes, we hope you’ll let us know in the comments below, on Twitter, or by any of the other means we are reachable. We appreciate your continued interest and especially your financial support of our research.
Special thanks to the BP Stats team for their review and feedback.
[1] Ahmad Emad & Paul Bailey (2016). wCorr: Weighted Correlations. R package version 1.8.0. https://CRAN.R-project.org/package=wCorr.
[2] Again, the reason we use FIP in all of these comparisons is not to pick on FIP, but because if your proposed metric does not beat FIP in any of these three categories, you are probably just wasting people’s time.
I will cut a new spreadsheet today, with the impact of the DRA changes included.
This is a big change all around, since all of the past data is affected (DRA and the related stats from which WARP is computed), and projections are based on a model that uses much of that past data.
cFIP remains the most reliable estimator from year to year (.6); it takes a hit on descriptive power (.51) and is still competitive on next year's RA9 (.34). These numbers are using the same seasons and method described in the article.
Thanks for letting me clarify this.
"cFIP maintains an advantage over DRA in year-to-year reliability, smoking all competitors, including xFIP and SIERA in that regard. This arguably means that cFIP remains useful, at least for that purpose, but it is noteworthy that this increased reliability does not come with added ability to predict future pitcher runs allowed."
So if cFIP does not do a better job of predicting future pitcher runs allowed, why is it still the better tool for predictive purposes?
Maybe I just don't understand what cFIP's "increased reliability" means in this context. If cFIP is a better stat than DRA for the purpose of reliability, what is reliability?
Hope that question made sense.
I suspect that the best pitchers, by virtue of throwing fewer innings, are also throwing better and harder overall, but that doesn't need to be true to make the point.
Just eyeballing the DRA and pWARP leaderboards, it would appear that, in general, many of the top pitcher seasons are now slightly less valuable. That is, in the last iteration, RJ and Schilling's 2001 seasons were worth 11 or 12 WARP, and now they're like 10.5 and 9.2 or so.
Is this delta redistributed to the position players? That is, in an earlier version, RJ/Schilling's DRA was lower, and their catchers' value was higher, as some portion of their success was attributed to the good framing of Damian Miller. In the next iteration, their WARP increased significantly, and I assume (but don't know) that Miller's WARP would drop as a result. This version lands somewhere in between, but for a guy like Arrieta, does his increased WARP come at the expense of the Cubs defenders/catcher? Have any WARPs for Mets position players changed as a result of Thor's WARP increasing? Or are they pretty independent, such that FRAA needn't line up with the defensive numbers implied in DRA?
Sorry I did not notice this earlier. Fortunately, our pitch data all goes through Harry Pavlidis first, so it is fairly seamless, once the corrections get made.
Awesome article. I am a little confused after reading the comments though, and came up with a simple question that could help everyone out.
If I want to look at one statistic at the all-star break, that will be the best predictor of how that pitcher will perform over the second half of the season, what statistic should I use? DRA, cFIP or SIERA? Thank you.
An answer here would be huge for the kid. I'd really appreciate it.