The Stats Go Marching In: Catcher Framing Before PITCHf/x

May 16, 2013

Analysis of framing has intensified over the past couple of years: Joe Maddon has talked about it on the radio, and (via Ben Lindbergh) Clubhouse Confidential and MLB Network’s Diamond Demo series have featured discussions of the issue with guests like Jonathan Lucroy. Ben has been running a weekly column on the subject since the start of the season. In the first installment (as well as in this piece for Grantland) he provided some background on the research so far, so you’re invited to have a look at that article before you read the rest of this one.

Framing evaluation is one of those research subjects that has been made possible by PITCHf/x data, which means that we’re now into the sixth full season for which catcher framing can be measured. However, for quite some time, I’ve been thinking about this: if one could get a good approximation of the framing numbers just using Retrosheet pitch sequences, 20 years of catcher framing could be added to the discussion. When Ben jogged my memory recently, I decided it was time to stop thinking about it and start doing some number-crunching.

The Method
Going back to 1988, Retrosheet has data with a fair degree of completeness for pitch sequences, indicating the outcome (ball, called strike, swinging strike, foul, and so on) of every pitch thrown.

For each plate appearance, I counted the number of pitches not featuring a swing by the batter (basically balls and called strikes), with the useful Chadwick Tools saving me a lot of time and work.
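As a rough illustration of this counting step (not the code used for this article), the sketch below tallies taken pitches from a single Retrosheet-style pitch sequence string. It assumes that only `B` (ball) and `C` (called strike) count as pitches without a swing; whether codes such as intentional balls or pitchouts should be included is a judgment call that the text does not settle. The function names are hypothetical.

```python
# Pitch codes treated as "no swing" here (an assumption of this sketch):
# B = ball, C = called strike. All other characters (swinging strikes,
# fouls, balls in play, pickoff markers, and so on) are ignored.
TAKEN_CODES = {"B", "C"}


def count_taken_pitches(pitch_sequence):
    """Return (balls, called_strikes) for one plate appearance."""
    balls = sum(1 for ch in pitch_sequence if ch == "B")
    called_strikes = sum(1 for ch in pitch_sequence if ch == "C")
    return balls, called_strikes


def called_strike_rate(pitch_sequence):
    """Share of called strikes among taken pitches, or None if there were none."""
    balls, called_strikes = count_taken_pitches(pitch_sequence)
    taken = balls + called_strikes
    if taken == 0:
        return None
    return called_strikes / taken


# Called strike, ball, foul, ball, ball in play: one called strike, two balls.
print(count_taken_pitches("CBFBX"))
print(called_strike_rate("CBFBX"))
```

A plate appearance with no taken pitches returns `None` rather than zero, so it can be dropped instead of being read as a plate appearance with no called strikes.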

In the original model I created with PITCHf/x data, in addition to using the location coordinates as measured by the camera system and the pitch type as classified by the MLBAM algorithm, I controlled for the effect of the ball/strike count, the home plate umpire, the pitcher, and the batter—plus, obviously, the catcher.

Since that model requires a lot of computing time, in order to update my numbers once in a while, I switched to a simpler but quicker model in which the pitcher and the batter are not accounted for. In fact, once the location and pitch type are factored in, the batter has very little effect on the call by the umpire (mostly due to his stance and proximity to the plate, I suppose). The effect of the pitcher is also reduced, and I decided that the tradeoff between accuracy and computing time was worth the exclusion. However, with Retrosheet data, we have no information on pitch location and type, so throwing the pitcher and the batter back into the model was necessary.

In short, for every plate appearance I have the percentage of strikes on pitches not swung at as the outcome variable and the four actors involved (pitcher, catcher, umpire, batter) as the predictors. As I have done many other times in my baseball analysis, I have used a Cross-Classified Multilevel Mixed Model, which for saber-oriented people I’ll call WOWY-on-steroids.
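For readers who prefer notation, one way to write a cross-classified random-intercept model of this kind is sketched below. The symbols and the normal, identity-link form are assumptions made for illustration; the text does not spell out the exact specification (for example, whether plate appearances are weighted by their number of taken pitches).

```latex
y_i = \mu + \alpha_{p(i)} + \beta_{c(i)} + \gamma_{u(i)} + \delta_{b(i)} + \varepsilon_i,
\qquad
\alpha_p \sim \mathcal{N}(0, \sigma_P^2),\;
\beta_c \sim \mathcal{N}(0, \sigma_C^2),\;
\gamma_u \sim \mathcal{N}(0, \sigma_U^2),\;
\delta_b \sim \mathcal{N}(0, \sigma_B^2),\;
\varepsilon_i \sim \mathcal{N}(0, \sigma^2)
```

Here y_i is the share of called strikes among taken pitches in plate appearance i, and p(i), c(i), u(i), and b(i) denote the pitcher, catcher, umpire, and batter involved. Under this reading, the estimated catcher effect is what gets converted into a framing rating.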

Note that when using PITCHf/x data, an extra strike is more or less attributable to something framing-related, be it a good reception by the catcher, the pitcher hitting the target, or the umpire being deceived (or, more likely, a combination of the three). However, when no information is available about location, several other factors come into play: among the called strikes are, for example, pitches thrown right down Broadway that may not have been swung at because of the batter’s tendencies (partly accounted for, as the batter is in the model) or because great sequencing has fooled the batter. Thus, this version of framing might include at least some pitch-sequencing effect as well.

Comparing Retrosheet and PITCHf/x numbers
Obviously, the first thing to do before calculating and showing numbers going back to 1988 is to test how the rankings based on Retrosheet-only data compare with the PITCHf/x version for the years that have the more detailed data.

Let’s start by showing a scatterplot featuring framing runs saved (prorated to 5,000 pitches caught*) by catchers in the seasons from 2008 to 2012. The darker dots denote a higher number of pitches caught, signifying more reliable estimates.

* Keep in mind that from here on, when I write “pitches caught” I really mean “pitches caught with no swing attempt by the batter.”

Not a bad start. The chart displays a good agreement between the two different models; the Pearson correlation coefficient, weighted for the number of pitches caught, is a healthy 0.72.
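For reference, a weighted Pearson correlation can be computed as in the minimal sketch below. It assumes each catcher-season carries a single weight (for example, the number of pitches caught); the exact weighting used for the figure above is not stated in the text, and the function name is hypothetical.

```python
import math


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y, with observation weights w."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Weighted means.
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted covariance and variances (the common 1/total factor cancels).
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)
```

Weighting by pitches caught means that a part-time catcher with a noisy rating pulls on the coefficient less than a full-time starter does.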

One important difference between the two methods is the distribution of ratings. The PITCHf/x-based numbers are more dispersed: when one considers catcher-seasons with at least 1,500 pitches caught, the standard deviation is close to 13 runs for the PITCHf/x numbers and about 7.5 for the Retrosheet ones. That means the Retrosheet-based values (I’ll call them “RetroFraming”) will yield more conservative results.

Given the good agreement of RetroFraming with the PITCHf/x-based numbers, we can move on to showing some numbers going back to 1988, keeping in mind that we’re less likely to see extreme values with this metric.

Single-season achievements
The best catcher-framing season of the last quarter century belongs to Brad Ausmus, with 36 runs saved for the 2000 Detroit Tigers.

Here a note is due. In the previous section, I warned that RetroFraming numbers give more conservative results: in fact, there is no trace of a 50-run season. A recent revision of my algorithm has changed Jose Molina’s PITCHf/x framing value for 2012 to 41 runs, but that would still make it higher than Ausmus’ 2000. RetroFraming has Molina’s 2012 at 25 runs saved, which is quite a difference.

I know such discrepancies can be enough for some people to turn away altogether from this article and others on framing, as they often do when two play-by-play-based fielding metrics disagree on an evaluation of any position player. However, what I make of these numbers is this:

There are two metrics that strongly agree: no catcher over the past five years is rated above average by one and below average by the other.
According to either method, a good framing catcher can be expected to bring his team a handful of extra wins in a single season.
The PITCHf/x-based method is more precise and less likely to be pulling in other aspects of a catcher’s defensive performance, so for seasons where both methods are available, I would tend to trust its output over the Retrosheet estimate. If you’re skeptical that the big numbers associated with the PITCHf/x approach could be accurate, Mitchel Lichtman’s testing from last year might lay some of your concerns to rest.
Teams with analytically minded front offices are already making seven-figure decisions based on numbers like these.

Enough talk—here are the 20 best RetroFraming seasons since 1988.

Catcher	Season	Pitches	Run value
Brad Ausmus	2000	10863	36
Russell Martin	2008	10861	30
Jason Varitek	2007	9404	26
Jose Molina	2012	6347	25
Joe Mauer	2010	7778	23
Paul Lo Duca	2003	9057	23
Javier Lopez	2000	9615	23
Brad Ausmus	2005	8576	22
Jose Molina	2008	6665	22
Johnny Estrada	2004	9127	21
Brad Ausmus	2006	9282	20
Jonathan Lucroy	2011	9637	20
Jason Varitek	2002	9202	19
Ramon Hernandez	2002	9298	19
Tony Pena	1991	9954	18
Joe Mauer	2005	7461	18
Mike Scioscia	1989	8353	18
Ramon Hernandez	2001	9756	18
Russell Martin	2010	7138	18
Carlton Fisk	1990	7733	18

At age 42, Carlton Fisk was still capable of a top-20 season. In a subsequent section, I’ll take a look at aging curves for the framing skill.

In case you’re wondering, the worst season belongs to framing whipping-boy Ryan Doumit (2008) by a mile, with Jason Kendall (2000) and Jorge Posada (2005) just a bit better.

Career framers
Ausmus also gets the career laurel as the cumulative king of framing for the past quarter century. In an 18-year career behind the plate, he added roughly one win per season through his ability to earn extra strike calls. Once more, the purported divide between scouting and statistical analysis is revealed to be a false one: long before numbers-based discussions of framing began, teams were willing to give playing time to weak-hitting catchers like Ausmus because of their defensive ability.

Catcher	Pitches	Run value
Brad Ausmus	135045	179
Jose Molina	49116	122
Jason Varitek	107444	111
Joe Mauer	58510	102
Russell Martin	67441	99
Javier Lopez	94920	89
Yadier Molina	76486	87
Tony Pena	68627	83
Mike Piazza	113843	78
Charlie O'Brien	47143	73

Jose Molina is a solid second, despite much more limited playing time. In fact, over the same amount of playing time, we’d estimate Molina to be close to twice as valuable as Ausmus. Below is the Top 10 list for prorated (to 5,000 pitches caught) values, minimum 25,000 pitches.

Catcher	Pitches	Run value / 5,000 pitches
Alberto Castillo	25686	13
Jose Molina	49116	12
Sal Fasano	26642	10
Joe Mauer	58510	9
Johnny Estrada	41128	8
Charlie O'Brien	47143	8
Todd Pratt	33520	7
Ryan Hanigan	26178	7
Russell Martin	67441	7
Mike Scioscia	37835	7

At the bottom of the list, depending on whether you prefer the counting stat or the prorated version, are either Charles Johnson (costing more than a win per year for 12 seasons) or, once more, Ryan Doumit.
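The prorated figures in this section are simple rescalings of the career totals. The sketch below reproduces the Molina and Ausmus comparison from the career table, assuming the prorated value is just run value divided by pitches caught and multiplied by 5,000; the function name is hypothetical.

```python
def runs_per_5000(run_value, pitches, scale=5000):
    """Prorate a cumulative run value to a fixed number of pitches caught."""
    if pitches <= 0:
        raise ValueError("pitches must be positive")
    return run_value / pitches * scale


# Career totals taken from the career table above.
molina = runs_per_5000(122, 49116)
ausmus = runs_per_5000(179, 135045)

print(round(molina, 1))           # 12.4
print(round(ausmus, 1))           # 6.6
print(round(molina / ausmus, 2))  # 1.87
```

The ratio of about 1.87 is where "close to twice as valuable" comes from.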

Year-to-year correlation
So what do we do with 25 seasons of ratings? The first thing I thought of was running a year-to-year correlation. I did the usual matching of every catcher with his previous-year-self and produced the following plot, which shows the year-to-year correlation for runs saved per 5,000 pitches caught. Again, the shading of dots indicates the underlying number of pitches (minimum between the two seasons considered). The weighted Pearson correlation coefficient is 0.52.
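The matching step can be sketched as follows. This is an illustration under stated assumptions, not the code behind the plot: each record is a hypothetical tuple of (catcher, year, pitches caught, runs per 5,000 pitches), only strictly consecutive seasons are paired, and the weight of a pair is the smaller of the two pitch counts.

```python
def pair_consecutive_seasons(seasons):
    """Pair each catcher-season with the same catcher's previous season.

    seasons: iterable of (catcher, year, pitches, runs_per_5000) tuples.
    Returns a list of (previous_rate, current_rate, weight) tuples, where
    weight is the minimum number of pitches across the two seasons.
    """
    by_key = {(catcher, year): (pitches, rate)
              for catcher, year, pitches, rate in seasons}
    pairs = []
    for (catcher, year), (pitches, rate) in sorted(by_key.items()):
        previous = by_key.get((catcher, year - 1))
        if previous is None:
            continue  # no season on record for the prior year
        prev_pitches, prev_rate = previous
        pairs.append((prev_rate, rate, min(prev_pitches, pitches)))
    return pairs


# Made-up records for two hypothetical catchers, for illustration only.
example = [
    ("Catcher A", 2001, 9000, 5.0),
    ("Catcher A", 2002, 8000, 4.0),
    ("Catcher A", 2004, 7000, 3.0),   # 2003 is missing, so 2004 is not paired
    ("Catcher B", 2002, 6000, -2.0),  # single season, never paired
]
print(pair_consecutive_seasons(example))
```

The resulting pairs and weights are what a weighted correlation would then be computed on.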

A look at aging
The second analysis it made sense to perform with 25 available seasons is an exploration of aging. I looked at the subject through a few different statistical lenses, but the results were fairly consistent. Basically, the aging effect is very small, with no more than two runs separating the prime from the career nadir. Below is a chart showing an estimated career curve, featuring a slight improvement until age 25, followed by a gentle decline.

Below are charts for a few interesting careers. In each one of them, the dots indicate the seasonal ratings, the thinner line is a smooth curve through the data points based on the displayed catcher’s data only, and the thicker line makes use of data coming from the other catchers as well (sort of regressing the curve).

Here’s Jose Molina, who just keeps getting better:

Ausmus also improved throughout his career:

Posada, on the other hand, displayed a declining trend:

Finally, Piazza’s numbers were consistent throughout his career:

What’s next?
So far I’ve been reluctant to combine game-calling numbers with PITCHf/x-based framing ratings because they’re derived from different sources, with different levels of granularity. But with the framing approach presented here, I now feel more comfortable subtracting framing from what I termed game-calling, which actually was more of a sum of framing plus calling. Thus, in the future I plan to explore the quantification of game-calling further.

In this article I’ve used pitch-by-pitch data without PITCHf/x information to generate historical leaderboards. However, this kind of data is also available for Minor League Baseball going back a handful of years, so numbers like those shown above can be calculated for lower levels of baseball as well. In that way, good framing catchers might be identified before they reach The Show. And while it might be a long time before we see ubiquitous pitch-tracking technology in the college game, recording pitch outcomes is much more feasible, meaning that teams might even use this information for drafting purposes.

Incidentally, while refining this article, I mentioned its contents to a baseball insider (who obviously will go unnamed here), and he stated, “It's an idea potentially worth millions of dollars.” So, clubs with college pitch-by-pitch data: feel free to knock at my door.

Max Marchi

Mooser

5/16

Amazing Max, this is fantastic. You talk about 'game calling' which is a combination of calling a pitch, calling for a pitch location, and then framing that pitch. Is all this attention on pitch-framing worth little if a good framing catcher is a terrible pitch caller / locator? Shouldn't more attention be on the other two pieces of this puzzle before we start lobbying Jose Molina / Brad Ausmus for the Hall of Fame? I recall in a previous article of yours that there are good framers that ultimately lose that value as a result of poor game calling. Or have we dismissed 'calling' to be less of a skill and more of a manager function? Thanks again.

maxtoki

5/17

I plan on looking more into the 'game calling' issue.
From explorations of the data I have done, the composite value is driven largely by framing, but there are exceptions.
And the exceptions seem to be consistent year to year: for example, I have A.J. Pierzynski nowhere close to the top in framing, but he seems to be one who really improves his pitchers.

So, yes, you'd want everything correctly rated.
I prefer having separate numbers because maybe one skill can be trained more than the other, or someone else might take charge of it (you may be OK with a good framer / bad caller by having all the calls coming from the bench, for example).

pizzacutter

5/16

Max, what's the stability from year to year in framing "ability"?

JDanger

5/16

Absolutely wonderful.

TangoTiger1

5/16

One of the best research pieces of the year. {clap clap clap}

rawagman

5/16

This is extremely fascinating, but I have never been able to come to grips with how values are derived from these extra framed strikes. Can someone kindly lead me to a concise explanation of how we get from extra strikes to extra runs?

marctacoma

5/16

Here's one from ex-BPer Dan Turkenkopf: http://blog.stealingfirst.com/2008/04/02/switching-a-ball-to-a-strike/

rawagman

5/16

Interesting. So if the value of a ball/strike varies per the count at the time, should we not also be assigning value accrued to receivers based on when they gained or lost a call?
To those who would say that it evens out, I would reply that catchers have the ability to call that type of pitch when they want to, thereby putting their framing skills to the test.
Am I wrong?

maxtoki

5/17

When I calculate PITCHf/x-based framing numbers I use count-specific values for extra strikes gained (lost). See here: http://www.hardballtimes.com/main/article/searching-for-the-games-best-pitch/

And you're absolutely right.
A good framer chooses when it's more important to apply his skill, while a bad one isn't selectively worse in some situations.

rawagman

5/17

Thank you

pjbenedict

5/16

I am a huge Twins fan, and I'm surprised at how well Mauer fares. I've always thought he's great, but legendarily excellent across the history of this data?

Wow.

jroegele

5/16

This is fantastic work, I really enjoyed this. Pitch framing is actually one area where I would have expected increased performance as catchers age as opposed to a decline. I would have guessed calling games and hitting would have been higher on the list than pitch framing, especially further back in time, so that catchers would have gradually improved at this skill as they aged. I guess I'm surprised that the peak is around age 25 and not older.

I'm looking forward to your follow up articles on this subject!

LlarryA

5/16

I think there's a fairly big split between good framers and bad in how they age. A top-notch framer (Ausmus, Molina, Lucroy...) is going to refine his craft and get better at it (though there's probably a practical upper bound), even as physical skills may decline. A bad framer most likely doesn't care much, and may even develop and cement bad habits that make him worse over time. This is even more likely given that there hasn't historically been all that much attention paid to this subject. Piazza probably had a pretty decent natural feel for it, but without much incentive or opportunity to work on getting better.

maxtoki

5/17

I also thought the peak would occur later, but Llarry's point is really worth exploring.

sportspopery

5/17

This is awesome. Is there a particular reason that Scioscia had such a great season in 1989 yet fares so poorly in the overall analysis?

NathanAderhold

5/18

This is amazing stuff.

There's still one thing I'm unclear on, though... can you explain a little more about how you are accounting for whether a strike was framed or not without any indication of the location of the pitch?

drawbb

5/18

I was about to ask the same thing.

With respect to Max's research, calling this framing can't possibly be accurate if you are using just play-by-play data. "Catcher pitch selection" or "catcher pitch sequencing" maybe.

maxtoki

5/19

Nathan and drawbb, you're right. I acknowledged this in the final paragraph of "The Method" section.
