[T]he expected runs produced from each plate appearance starting with a strike decreases by .029 runs and increases by .040 for every ball thrown on a first pitch. In other words, having as many of those 0-0 'striballs' called strikes can greatly impact the outcome of the game.
—Jerry Weinstein, Catching Coach, Colorado Rockies
The Complete Handbook of Coaching Catchers
Coaches Choice, 2014
Introduction
The mechanics of framing pitches are simple enough to explain: Quiet, mechanically sound catchers with a knack for good receiving help their pitchers by getting favorable calls from the home plate umpire. This effect has been known ever since umpires started calling balls and strikes. Although it wasn't always called framing, it has long been a source of speculation and commentary about prominent catchers.
Since the beginning of the PITCHf/x era, researchers have calculated framing in several different ways. We are presenting a new method that we will call the "Regressed Probabilistic Model" of framing (RPM for short). In brief, RPM works by calculating the combined probability (and associated run value) that each pitch will be called a strike; summing those probabilities (and run values) across opportunities; attributing those values to a player (catcher or pitcher); and regressing "career" values to the mean.
We will freely admit: If you haven't seen the results of previous framing studies, it can be tough to wrap your mind around the size of the impact of a good or bad framing catcher. These effect sizes are not out of line with what has been reported in the past, but they're still obscenely large. Everyone agrees that Mike Trout was either a deserving MVP or a deserving runner-up in each of the past two seasons, which the stats say were worth close to 10 wins apiece. Our data suggest that over the past five years, the teams that have employed good framers like Jonathan Lucroy, Brian McCann, and Jose Molina have received essentially "free" MVP-caliber seasons from framing alone. (Each of those catchers has been worth about two extra wins per season over that span.) This is a staggering amount of value. Add in the fact that these wins are almost assuredly not properly priced into the free agent market, and the difference between having a good framing catcher and a bad framing catcher can make or break a cost-conscious team.
Method
Calculating Probability for Each Pitch
Rather than identifying a single strike zone and giving binary credit for each pitch relative to that strike zone's borders (i.e., strike or no strike), our model gives partial credit for each pitch based on that pitch's likelihood of being called a ball or a strike. To determine that, we created a probability map of likely calls.
To create this map from the raw data, we used a generalized additive model (using the mgcv package in R), which creates a smoothed "surface." Although there are other alternatives for creating smoothed surfaces (Dave Allen popularized the LOESS method), Brian Mills, Carson Sievert, and others have recently adopted the GAM alternative, which has the benefit of empirically identifying the correct smoothing parameter rather than setting it by hand (as in LOESS). The package also has a special function (bam) specifically for running large models. And crucially, it supports multi-core processing, without which the processing would have taken so long that we'd be writing this article next year.
To reflect what is best known about the way the size and position of the strike zone shifts from count to count and batter to batter, we ran individual models for each set of batter and pitcher handedness as well as "pitch group" (see Table 1). The smoothing parameters of each model were allowed to vary by count, so that while the general shape of the strike zone derived for each variable combination did not change, the width and height of it did (reflecting, for example, a larger strike zone on 3-0 counts than on 1-2 or 0-2 counts). We also accounted for the changing size of the strike zone from season to season (although these yearly changes are much smaller than the other changes we measured).
Table 1: Pitch Groups
Group | Members |
Fastball | Four- and two-seam fastballs/sinkers |
Curveball | Standard, spike ("knuckle"), and "slow/eephus" |
Slider | Sliders and cutters |
Offspeed | Changeup, splitter, screwball |
Knuckleball | Knuckleball |
We also corrected the data in several ways before running these models. First, all pitch classifications were hand-labeled by Pitch Info to eliminate variability in pitch labels. (This is the same improved dataset that powers the BrooksBaseball.net player card pages.) To account for batter height differences, we normalized the height of each pitch by the batter's height using what is now the standard formula (first published by Mike Fast). We also used the correction scheme that Mike published at BP for correcting the X and Y location of each pitch based on the likely distribution of pitch locations that each pitcher would use against left-handed hitters and right-handed hitters (the one difference here is that we used the LOESS smoothing algorithm rather than a moving average, and tuned it to correct more aggressively for outliers).
After we created these probability maps, we assigned strike probabilities for each pitch at each half-inch location.
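To make the idea of a half-inch probability map concrete, here is a minimal Python sketch. It is not the model described above: it swaps the GAM for a simple Gaussian kernel smoother so that it runs on the standard library alone, and it ignores handedness, pitch group, count, and season. The function names, the 0.25-foot bandwidth, the grid limits, and the four toy pitches are all assumptions made for illustration.

```python
import math

CELLS_PER_FOOT = 24  # half-inch cells: 12 inches * 2 cells per inch


def to_cell(x_ft, z_ft):
    """Snap a location in feet to integer half-inch cell indices."""
    return (round(x_ft * CELLS_PER_FOOT), round(z_ft * CELLS_PER_FOOT))


def smoothed_strike_rate(x_ft, z_ft, pitches, bandwidth_ft):
    """Kernel-weighted share of called strikes near (x_ft, z_ft)."""
    weighted_strikes = 0.0
    total_weight = 0.0
    for px, pz, called_strike in pitches:
        dist_sq = (px - x_ft) ** 2 + (pz - z_ft) ** 2
        weight = math.exp(-dist_sq / (2.0 * bandwidth_ft ** 2))
        weighted_strikes += weight * called_strike
        total_weight += weight
    return weighted_strikes / total_weight if total_weight > 0.0 else 0.0


def build_probability_map(pitches, x_limits=(-2.0, 2.0), z_limits=(0.0, 5.0),
                          bandwidth_ft=0.25):
    """Return {(ix, iz): strike probability} for every half-inch cell."""
    x_lo, z_lo = to_cell(x_limits[0], z_limits[0])
    x_hi, z_hi = to_cell(x_limits[1], z_limits[1])
    grid = {}
    for ix in range(x_lo, x_hi + 1):
        for iz in range(z_lo, z_hi + 1):
            grid[(ix, iz)] = smoothed_strike_rate(
                ix / CELLS_PER_FOOT, iz / CELLS_PER_FOOT, pitches, bandwidth_ft
            )
    return grid


# Toy taken pitches: (horizontal ft, height ft, 1 if called strike else 0).
toy_pitches = [(0.0, 2.5, 1), (0.0, 2.4, 1), (1.5, 2.5, 0), (1.6, 2.5, 0)]
strike_map = build_probability_map(toy_pitches)

middle = strike_map[to_cell(0.02, 2.51)]  # snaps to the cell at (0.0, 2.5) ft
outside = strike_map[to_cell(1.5, 2.5)]
```

Keying the map by integer cell indices rather than floating-point feet avoids lookup misses from rounding. A real version would fit one surface per combination of handedness and pitch group, as the text describes.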
Run values
Rather than simply give a single credit for each pitch (~.14 runs) as has been done in many previous models, we looked at the count in which each pitch was framed and gave credit equal to the difference in runs between framing or not framing that pitch. For example, a frame in an 0-2 count was counted as more valuable than a frame in an 0-0 count, because a frame in an 0-2 count can result in a large change in run expectancy while a frame in an 0-0 count does not have quite the same impact.
To be clear, both the positive and negative frames were calculated with that increased difference in an 0-2 count (or in any other count, depending on the run value for that count). We should note that this decision may be somewhat controversial because it's possible that counts will be unequally distributed based on the catcher's team pitching talent. However, see the following section on pitcher adjustments and note that we provide uncalculated calls above average, which should allow the interested reader to create unbiased estimates if they wish.
The run value for a framed pitch is the run value differential for that count (see Table 2) multiplied by the residual of the probability. In other words, if an 0-0 pitch is called a strike in a spot where it's normally called a strike just 80 percent of the time, the catcher will get 20 percent of the available value (.08) for a total of .016 runs credited (which will later be adjusted based on the pitcher and umpire impact). Failing to get a strike on the same pitch would result in a .064 run deduction.
Table 2: Framing Run Value Matrix
Ball | Strike | Maximum Framing Run Value Available |
0 | 0 | .080 |
0 | 1 | .092 |
0 | 2 | .199 |
1 | 0 | .112 |
1 | 1 | .117 |
1 | 2 | .241 |
2 | 0 | .156 |
2 | 1 | .098 |
2 | 2 | .339 |
3 | 0 | .173 |
3 | 1 | .251 |
3 | 2 | .590 |
As you can see, a framed pitch on a 3-2 take is worth a lot. How is that number derived? Think of it this way: a strikeout costs the batting team 0.28 expected runs, while a walk earns them 0.31 expected runs. The difference is .59 runs. These 3-2 takes don't happen often (0-0 pitches contribute the most, which is not surprising), and when they do, the catcher just gets credit proportionally as described above. RPM isn't doling out half-runs in a single shot with any kind of regularity.
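The per-pitch credit can be sketched in a few lines of Python. The values are copied from Table 2. The function and variable names are our own for illustration, and the later pitcher and umpire adjustments are left out.

```python
# Maximum framing run value by (balls, strikes), copied from Table 2.
MAX_FRAMING_RUN_VALUE = {
    (0, 0): 0.080, (0, 1): 0.092, (0, 2): 0.199,
    (1, 0): 0.112, (1, 1): 0.117, (1, 2): 0.241,
    (2, 0): 0.156, (2, 1): 0.098, (2, 2): 0.339,
    (3, 0): 0.173, (3, 1): 0.251, (3, 2): 0.590,
}


def framing_credit(balls, strikes, strike_probability, called_strike):
    """Run credit for one taken pitch: count value times probability residual.

    The residual is (actual call) - (expected strike probability), so a
    called strike earns a positive credit and a called ball a negative one.
    """
    actual = 1.0 if called_strike else 0.0
    residual = actual - strike_probability
    return MAX_FRAMING_RUN_VALUE[(balls, strikes)] * residual


# The 0-0 pitch from the text, called a strike 80 percent of the time.
gained = framing_credit(0, 0, 0.80, called_strike=True)   # 0.20 * .080
lost = framing_credit(0, 0, 0.80, called_strike=False)    # -0.80 * .080

# A coin-flip pitch on 3-2 earns half of the .59 runs available.
full_count = framing_credit(3, 2, 0.50, called_strike=True)
```

Under this formula the 0-0 example works out to a gain of .016 runs for the called strike and a loss of .064 runs for the called ball. The two outcomes always differ by exactly the count's maximum value.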
Player Attribution
Because catching necessarily involves pitching, and because pitching talent is not equally distributed across the league, it can be difficult to correctly assign credit for each catcher's contribution to a framing total. For example, if Mariano Rivera, Brian Wilson, or Derek Lowe is your batterymate, you are likely to get more favorable calls than if your batterymate is Andrew Miller, Brandon League, or Micah Owings.
We empirically determined each pitcher's value—to isolate it from each catcher's value—by performing a WOWY ("With or Without You") analysis. We note that we also compared these values to a linear regression model that included pitcher and catcher as separate factors; the high correlation between these measures suggested a good degree of ability to correctly assign credit (or blame) to individual players. The WOWY adjustments provide a viable and modular means of assessing the impact of pitchers on framing.
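For readers unfamiliar with WOWY, here is a rough Python sketch of the general idea, not our exact procedure. For each pitcher, it compares the extra-strike rate he gets with each catcher against the rate that same catcher gets with every other pitcher. The record layout, the function name, and the choice to weight each pairing by the smaller of the two sample sizes are all assumptions.

```python
from collections import defaultdict


def wowy_pitcher_adjustments(records):
    """Estimate each pitcher's extra strikes per opportunity via WOWY.

    records: iterable of (pitcher, catcher, opportunities, extra_strikes),
    where extra_strikes is called strikes above the model's expectation.
    """
    pair_totals = defaultdict(lambda: [0.0, 0.0])     # (pitcher, catcher)
    catcher_totals = defaultdict(lambda: [0.0, 0.0])  # catcher, all pitchers
    for pitcher, catcher, opps, extra in records:
        pair_totals[(pitcher, catcher)][0] += opps
        pair_totals[(pitcher, catcher)][1] += extra
        catcher_totals[catcher][0] += opps
        catcher_totals[catcher][1] += extra

    weighted_diff = defaultdict(float)
    weight_sum = defaultdict(float)
    for (pitcher, catcher), (opps_with, extra_with) in pair_totals.items():
        opps_all, extra_all = catcher_totals[catcher]
        opps_without = opps_all - opps_with
        if opps_with <= 0 or opps_without <= 0:
            continue  # no "without you" sample for this pairing
        rate_with = extra_with / opps_with
        rate_without = (extra_all - extra_with) / opps_without
        weight = min(opps_with, opps_without)  # assumed weighting scheme
        weighted_diff[pitcher] += weight * (rate_with - rate_without)
        weight_sum[pitcher] += weight

    return {p: weighted_diff[p] / weight_sum[p] for p in weight_sum}


# Toy data: pitcher A beats pitcher B by .10 per opportunity with both catchers.
toy_records = [
    ("A", "X", 100, 10.0), ("B", "X", 100, 0.0),
    ("A", "Y", 100, 5.0), ("B", "Y", 100, -5.0),
]
adjustments = wowy_pitcher_adjustments(toy_records)
```

In the toy data, catcher Y is worse than catcher X overall, yet both pitchers receive the same adjustment from either pairing. Separating the pitcher from the catcher in that way is the point of the comparison.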
The adjustments derived from the WOWY analysis reflect two aspects of our approach. First, pitchers who throw a pitch that may not fit the norm for a given pitch group may show some difference in the WOWY results (such as hard cutters in the slider/cutter group). Second, pitchers with better command of a pitch than their peers (or the unqualified respect of the umpire) will seem easier to frame.
The WOWY analysis created adjustments of up to +/- .1 called strikes per opportunity and +/- .01 runs per opportunity. The largest gross beneficiary of easy-to-frame pitchers was Yadier Molina. The perennial Gold Glove winner started the analysis with 127 runs added before giving 60 back to his pitchers. This reflects the command contributions of teammates of the class of Chris Carpenter and Adam Wainwright and is no knock on Molina, who still ranks high overall.
Table 3a: Largest and Smallest Pitcher Impact: Total Framing Runs Added 2008-2013
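[name missing] | 53 |
Derek Lowe | 47 |
[name missing] | 40 |
[name missing] | 39 |
[name missing] | 38 |
[name missing] | -18 |
[name missing] | -18 |
[name missing] | -20 |
[name missing] | -30 |
[name missing] | -40 |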
Table 3b: Largest and Smallest Pitcher Impact: Framing Runs Added per 2000 opportunities 2008-2013 (min. 500)
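[name missing] | 40 |
[name missing] | 21 |
Mariano Rivera | 20 |
[name missing] | 20 |
[name missing] | 18 |
[name missing] | -13 |
[name missing] | -14 |
[name missing] | -14 |
[name missing] | -15 |
Brandon League | -15 |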
The difficulty of framing a particular pitch—the difference between a fastball and a knuckleball—is already accounted for in the probabilistic model. R.A. Dickey's catchers may earn an adjustment above and beyond the credit already given to them for handling a knuckleball if Dickey is harder to catch than his peer group. Dickey actually outperforms the model by a bit, so his catchers get a small deduction.
According to the RPM method, Tom Glavine was a wizard at getting extra strikes, which supports his reputation.
Umpire Adjustments
But wait, there's more. The RPM concept can be applied to another catching skill: prevention of passed balls and wild pitches. For lack of a better term, we'll call this blocking. Sometimes a block is simply catching a pitch. This model uses the spatial location of the pitch (where it did strike or would have struck the ground) to determine the probability of a passed ball or wild pitch. Pitch types are accounted for and the model is further adjusted, also via WOWY, for the impact of the pitcher. Runs are a generic .28 per PB or WP prevented, allocated proportionally. This blocking skill is quite real, but not as spectacular as framing. Instead of the top-to-bottom difference in the league being on the order of 50 runs, as it is with framing, the blocking skill range is closer to 10 or 15 runs. It also takes a heavier dose of regression. You'll find blocking on both the player cards and the sortable stat pages (see the "New Site Features" section below).
We also made systematic but small changes to the data based on the umpire who was calling each game. Because umpires are randomly distributed throughout the data, they tend to have a very small effect on a measure of framing, although they might seem to have a large effect within any individual game. For example, if a particularly generous umpire calls a Jose Molina game on Monday, and then a particularly conservative umpire calls a Jose Molina game on Thursday, although the umpire will have exerted effects within an individual game, Jose Molina's skill will come through in the aggregate.
From our earlier example of a (not) randomly selected Molina, Yadier lost just three runs to the umpire adjustment from his post-pitcher WOWY adjusted tally.
Regression to the Mean
Like other skills, catching involves not only some amount of talent, but also some amount of luck. We've dealt with some of that luck by attempting to correctly attribute runs to catchers (who don't ordinarily get to choose their batterymates), but there are also other sources of luck and inexplicable variability.
To control for this luck, we have regressed career totals to the league average. The amount that we regressed each catcher was based on a measurement of stability for both framing calls and framing runs determined by the intraclass correlation ("ICC") of each measurement. See the "Results" section below for a description of how these correlations were computed and the determination of stability. Because seasonal variability is different from career variability, we also regressed seasonal totals to career totals based on a similar formula.
ICC consistency and agreement both showed that a 50/50 point (where a player's regressed data would consist of 50 percent his own values and 50 percent the mean values) occurred after ~290 framing opportunities for the number of called strikes, and after ~430 opportunities for the associated run values. (A framing opportunity is a pitch that isn't swung at and has a called strike probability >0 in our model.) Even the busiest catchers in our sample were regressed at least .06 percent and .09 percent toward the mean for their called strikes added and run values, respectively.
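One common way to turn a 50/50 point into a regression amount is the shrinkage weight n / (n + k), where k is the number of opportunities at which the player's own data and the mean count equally. The Python sketch below assumes that functional form, which is not spelled out above. The constant and function names, the league mean of zero, and the sample inputs are also assumptions.

```python
CALLED_STRIKE_HALF_POINT = 290  # opportunities at the 50/50 point (calls)
RUN_VALUE_HALF_POINT = 430      # opportunities at the 50/50 point (runs)


def own_data_weight(opportunities, half_point):
    """Share of the regressed estimate that comes from the player's own data."""
    return opportunities / (opportunities + half_point)


def regress_to_mean(observed_rate, opportunities, half_point, league_mean=0.0):
    """Blend an observed per-opportunity rate with the league mean."""
    weight = own_data_weight(opportunities, half_point)
    return weight * observed_rate + (1.0 - weight) * league_mean


# At exactly 290 opportunities, half of the observed rate survives.
small_sample = regress_to_mean(0.02, 290, CALLED_STRIKE_HALF_POINT)

# A hypothetical full season of 7000 opportunities keeps most of its run value.
full_season_weight = own_data_weight(7000, RUN_VALUE_HALF_POINT)
```

Under this form the weight on the player's own data climbs quickly with playing time, so regular catchers are shrunk only slightly while short-stint backups are pulled strongly toward average.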
Results
Framing Runs
The big winners in total framing runs are the good receivers who provide enough offense, combined with durability, to pile up innings behind the plate. But you'll also find the likes of Jose Molina, a catcher who wasn't a perennial no. 1 guy until he met a team that properly valued what his glove could do.
Table 4a: Top 10 RPM Framing Runs Earned 2008-2013
Brian McCann | 127 |
Jose Molina | 116 |
Jonathan Lucroy | 94 |
Russell Martin | 91 |
Ryan Hanigan | 74 |
David Ross | 65 |
Yadier Molina | 62 |
Miguel Montero | 62 |
Chris Stewart | 43 |
Buster Posey | 41 |
On the trailing end you have players who had to contribute in other ways behind the dish and at the plate to accrue any value.
Table 4b: Bottom 10 RPM Framing Runs Earned 2008-2013
Ryan Doumit | -124 |
Gerald Laird | -83 |
Chris Iannetta | -75 |
John Buck | -55 |
Nick Hundley | -55 |
Carlos Santana | -54 |
Rob Johnson | -47 |
Kurt Suzuki | -47 |
Lou Marson | -42 |
A.J. Ellis | -40 |
Slicing runs into a rate stat, we can see some less-used catchers who stand out, even with their numbers regressed more than the everyday guys. Seven thousand opportunities is roughly a full season's workload (some catchers handle over 8000, but those are unusual seasons), so we've set that as our standard for comparison. Jose Molina distinguishes himself further in this view, and Yasmani Grandal also appears among the elite.
Table 5a: Top 10 RPM Framing Runs Earned per 7000 Opportunities 2008-2013
Jose Molina | 35.9 |
David Ross | 32.8 |
Yasmani Grandal | 32.4 |
Jonathan Lucroy | 31.0 |
Chris Stewart | 28.3 |
Gregg Zaun | 23.8 |
Mike Zunino | 23.5 |
Ryan Hanigan | 23.3 |
Carlos Corporan | 22.3 |
Brian McCann | 22.2 |
And then there's the list of the worst-rated receivers on a rate basis, which includes more than a few Mariners.
Table 5b: Bottom 10 RPM Framing Runs Earned per 7000 Opportunities 2008-2013
John Hester | -35.7 |
Ryan Doumit | -35.5 |
Rob Johnson | -26.0 |
Jesus Montero | -25.4 |
Kenji Johjima | -23.8 |
Gerald Laird | -23.5 |
Adam Moore | -22.5 |
Carlos Santana | -21.8 |
Lou Marson | -20.5 |
John Jaso | -20.8 |
This table of yearly total framing runs shows that Lucroy and McCann have combined for five of the six titles. Again, Mr. Grandal makes an appearance.
Table 6a: Yearly Leaders and Trailers, RPM Total Framing Runs
2008 | ||||
Brian McCann | 33 | Ryan Doumit | -51 | |
Jose Molina | 26 | Chris Iannetta | -25 | |
Russell Martin | 19 | Dioner Navarro | -24 | |
2009 | ||||
Brian McCann | 23 | Gerald Laird | -29 | |
Ryan Hanigan | 22 | Kenji Johjima | -20 | |
Jose Molina | 19 | Ryan Doumit | -20 | |
2010 | ||||
Jonathan Lucroy | 28 | John Buck | -22 | |
Chris Snyder | 19 | Jason Kendall | -20 | |
Jose Molina | 19 | Jorge Posada | -19 | |
2011 | ||||
Jonathan Lucroy | 26 | Carlos Santana | -13 | |
Brian McCann | 24 | Matt Treanor | -12 | |
Russell Martin | 16 | Ryan Doumit | -12 | |
2012 | ||||
Brian McCann | 24 | Carlos Santana | -30 | |
Jose Molina | 24 | Ryan Doumit | -21 | |
Buster Posey | 21 | Wilin Rosario | -15 | |
2013 | ||||
Jose Molina | 24 | Ryan Doumit | -16 | |
Jonathan Lucroy | 22 | John Buck | -15 | |
Chris Stewart | 19 | Chris Iannetta | -15 |
Table 6b: Yearly Leaders and Trailers, RPM Framing Runs per 7000 chances
2008 | ||||
Jose Molina | 33.3 | Ryan Doumit | -53.7 | |
David Ross | 30.0 | Nick Hundley | -33.0 | |
Brian McCann | 27.6 | Chris Iannetta | -29.9 | |
2009 | ||||
Jose Molina | 49.4 | Ryan Doumit | -32.1 | |
David Ross | 39.5 | Kenji Johjima | -30.4 | |
Ryan Hanigan | 31.6 | Rob Johnson | -25.0 | |
2010 | ||||
Jonathan Lucroy | 44.2 | Rob Johnson | -37.9 | |
Jose Molina | 40.9 | A.J. Ellis | -30.9 | |
David Ross | 27.8 | Jorge Posada | -27.0 | |
2011 | ||||
Jonathan Lucroy | 38.2 | Ryan Doumit | -40.4 | |
Brian McCann | 34.8 | Matt Treanor | -30.9 | |
Chris Stewart | 34.1 | Carlos Santana | -27.5 | |
2012 | ||||
Chris Stewart | 36.1 | Ryan Doumit | -45.1 | |
Jose Molina | 35.4 | Carlos Santana | -34.5 | |
Yasmani Grandal | 32.9 | Gerald Laird | -33.4 | |
2013 | ||||
Jose Molina | 35.0 | Ryan Doumit | -42.9 | |
Martin Maldonado | 29.5 | John Jaso | -25.6 | |
Hank Conger | 28.3 | John Buck | -18.1 |
In 2011, Jonathan Lucroy contributed a net difference of 39 framing runs to his team compared to Carlos Santana. That's somewhere in the area of four wins if we simply assume that "average" is a sufficient proxy for "replacement level" when it comes to this skill (a true analysis of that question waits for another day). If we include blocking, the difference grows.
With the combined tools of RPM framing and blocking (see sidebar), we begin to get a more complete picture of a catcher's value. Carlos Santana has an elite bat, one that's worth getting into the lineup. But beware the hidden costs: factor Santana's receiving and blocking into the equation, and Lucroy looks like the more valuable player, despite his somewhat weaker bat.
Table 7: Lucroy vs. Santana With and Without RPM (Framing + Blocking)
Year | Lucroy WARP | Santana WARP | Lucroy RPM "Wins" | Santana RPM "Wins" | Lucroy WARP+RPM | Santana WARP+RPM |
2011 | 1.6 | 3.3 | 3.0 | -1.3 | 4.6 | 2.0 |
2012 | 2.3 | 3.6 | 3.1 | -2.9 | 5.4 | 0.7 |
2013 | 2.2 | 4.8 | 3.6 | -0.6 | 5.8 | 4.2 |
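The arithmetic behind Table 7 is simple enough to check by hand, but a short Python sketch makes the bookkeeping explicit. The table values are copied from above. The figure of roughly 10 runs per win is a common rule of thumb assumed here, not something derived in this article, and the names are illustrative.

```python
# (WARP, RPM "wins") by year, copied from Table 7.
TABLE_7 = {
    2011: {"Lucroy": (1.6, 3.0), "Santana": (3.3, -1.3)},
    2012: {"Lucroy": (2.3, 3.1), "Santana": (3.6, -2.9)},
    2013: {"Lucroy": (2.2, 3.6), "Santana": (4.8, -0.6)},
}

ASSUMED_RUNS_PER_WIN = 10.0  # rule-of-thumb conversion, an assumption


def combined_wins(warp, rpm_wins):
    """WARP plus framing-and-blocking wins, rounded as in the table."""
    return round(warp + rpm_wins, 1)


def runs_to_wins(runs, runs_per_win=ASSUMED_RUNS_PER_WIN):
    """Convert a run total to wins with a flat runs-per-win rate."""
    return runs / runs_per_win


totals = {
    year: {name: combined_wins(*values) for name, values in players.items()}
    for year, players in TABLE_7.items()
}

# The 2011 framing gap: Lucroy's 26 runs against Santana's -13.
framing_gap_wins = runs_to_wins(26 - (-13))
```

With the assumed conversion, the 39-run framing gap in 2011 comes to a little under four wins, in line with the estimate in the text.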
Santana's value is not found behind the plate. He is an extreme case where even his loud offensive skills are nearly washed out by his receiving deficiency. Reducing Santana's time behind the dish (see 2013 and his conversion to third base this offseason) brings him back into the value range you're used to seeing next to his name.
It's always good to get some validation of so-called advanced metrics. In this case, we contacted two professional catching coaches (former major leaguer Rob Bowen of Red Alert Baseball and Kevin Wheeler) to get their ratings for close to 30 current catchers. We didn't share our data with the raters, so they weren't influenced by RPM's ratings.
Max Marchi for inspiration and guidance at the inception of this process.
Mike Fast for his previous research on this subject.
Russell Carleton for a review of an early draft and our methodology.
Rob McQuown and Bill Skelton for their assistance with website data integration.
And various unnamed analysts who helped with our "Did Bayes Weigh in on Time Travel" inquiry on regressing seasons toward careers.
But, the numbers are not park-factor adjusted (as home runs or doubles or whatever might be in a context-neutral stat). It's an interesting idea. Of course, some of those guys are also probably pretty bad at framing pitches, so...
Thanks for the replies.
On one hand, you aren't adjusting for disproportionate counts received, which implies to me you are looking to measure past value and not true talent.
On the other hand, you regress to the mean, which feels like you are trying to measure true talent and not value.
It seems an odd mix.
I am not discounting the fact that pitch framing is a valuable skill for catchers to have, but it's based on the subjective error of an umpire.
Catcher framing is a way to assign that skill to each catcher, independent of the umpire. Also, if you read the article, we account for umpires. So...
And then, if you are able to predict it somehow, I just don't see how it can translate into as much value as they claim it does.
BTW, love the trade calculator, I still love using it.
One of the principal points of the offensive component of WARP is making everything context-neutral. You don't get extra credit for hitting a home run when it makes the most difference for your team - that sort of thing is accounted for by WPA instead. It strikes me as strange, then, to use count-dependent run values in this work. To my reading, including these numbers in a WARP calculation adds context-dependence that was strictly avoided previously. If 0.14 runs is the average run value of a changed ball/strike (and I understand that it is), why not just use the probability against that value across all counts?
If you look at our sortables, you can see the .14 based values for comparison. The field is "FR_RUNS_ADDED_BY_CALL" which reminds me I need to update the glossary before Ben notices I haven't.
I can see how it's a tricky call whether to use these for WARP though, as they do seem to not follow the philosophy used otherwise.
In any case, having all these numbers is wonderful, and how they get treated at the last stage will be up to user preference, I suppose.
Love that you guys had the Pitch Info tags and x,y corrections in place - great additions.
Other factors that I've found that had small impacts on the zone size are out state, base state, and league (NL slightly lower). These would be minor compared to basically everything you've controlled for....and only worth thinking about if you're going to let the model run for months in the background to see if they make any difference at this level.
Nice work!
Well, if better catchers are getting more strikes, on average, before that count occurs, then they are more likely to see those high-run-impact counts in the first place.
Since I don't have the data, I don't know how much effect they would have on the final tallies. But definitely something to think about.
As JRoegle says, base state and out state are also important from what I've seen.
Finally, thanks for the shout out, guys. I'm glad you found the mgcv package to be a big benefit. This is awesome stuff.
I'm on the fence about this, too. On the one hand, I generally like to have things context-neutral, but pitchers, catchers, and hitters seem to have much more control over what they do in a particular count than they do in a particular base-out situation or WP situation.
We use context-neutral values for things like "the value of hitting a home run" because, how I understand it at least, hitting a grand slam isn't any harder than hitting a solo home run, it's just less likely because of the fact that the bases need to be loaded. However, framing pitches in certain counts DOES become harder/easier if the size of the zone changes.
Great work, very interesting article--can't wait to see how this develops.
I do find it heartening that several catcher studies have looked at this in a few different ways now, and the player lists for each study seem to have the same names popping up at the top and bottom.
http://www.mlb.com/news/article/mlbam-introduces-new-way-to-analyze-every-play?content_id=68514514&partnerId=as_mlb_20140301_19352324&ymd=20140301
Fingers crossed that Pitch f/x is not eventually scrapped in favor of a tracking system that is not made public. It would be very unfortunate if MLB makes a series of decisions that result in analyses like this being impossible to undertake in the public domain.
Personally, I'm skeptical that the public will get to see much of these data. I know that MLB has a long and storied history of taking things of value and providing them for free as a public service, but something tells me that they will want to monetize this one.
Which brings me to a larger point. Since all runs are accounted for somewhere on the field, if we were to add pitch framing to WARP calcs, we would have to subtract it from someone else, no?
But, yes, you have to take it away from someone else if we add it to the catcher, and the first victim will be the dudes on the bump.
Also, if a pitcher had bad framers working for him, whether or not he was a good one, he'll get some credit back.
I think a good compromise for the "count" thing would be to do it the way you are doing it, just in case there is some "skill" at being more or less of a good framer at the various counts, and then normalize the results to an average count distribution. That is pretty much what I do with UZR, to some extent.
Also have to be careful with the regression thing which translates the observed into "skill." Most of the components of WAR and WARP do NOT do that, so if you are adding this or comparing it to other WARP components, you are adding or comparing apples and oranges.
So, are the other catching components on the player card and stats pages regressed also? Such as the catcher blocking?
Are you planning on updating these on a regular basis as the season goes on?
Again, great work!
Yes, we will be updating in-season. We'll use the 2013 model and hold off on xy corrections until we have enough to establish a reliable version for each. We'll also have to deal with new pitcher/umpire corrections, so the #s will be squishy day-to-day.
And, yes, the block data is the same RPM approach with fewer factors (we ignore count and season).
Thanks for the insight on the WARP integration, much appreciated.
1. How repeatable a skill is catcher framing? Can we (or, say, PECOTA) use it to predict future results?
2. If the same catcher tends to catch the same set of pitchers (and those pitchers tend to be caught by that catcher), we don't really have much ability to attribute the skill of getting the calls to one or the other, right? Are those catchers regressed to the mean? Is the extent of that problem quantified?
3. It's very difficult to get a sense of how good this information really is. Do you feel like you've got this skill nailed 99% for every catcher in baseball (as strong as, say, the ability to quantify offensive contributions), something less than that but still fairly strong (like some of the quantitative measures of ordinary defense), or somewhere less than that?
The WOWY analysis attempts to tease apart the pitcher and catchers. We have a fair amount of confidence in that particular aspect, but it's something that we intend to tune and explore in future versions.
I think we have a lot more to learn about framing, but we're happy with the direction we're moving with this model (warts and all).
Thanks for the thoughtful feedback, keep it coming!
But, yes, we need to re-run that analysis and see how well it holds up. We'll actually go ahead and publish 'projections' when we post the full set of zone contour images.
I absolutely LOVE the use of context-dependent run values.
Well done, Harry and Dan.