Of course. He's a pitcher.
I've gotten a few requests for this one. Almost a year ago, upon returning to Baseball Prospectus, I posted an update to the work that I had previously done on the reliability of hitting statistics. I had originally written one on pitching stats, as well, but never updated it similarly.
Warning! Gory Mathematical Details Ahead!
As with my piece from a year ago here at BP, I'm updating the methodology that I had originally used. I'm using Kuder-Richardson (formula 21) reliability for binary outcomes and Cronbach's alpha for non-binary outcomes. Data set is 2003-2012 (Thanks, Retrosheet!), with a minimum of 2000 batters faced during those years for each pitcher (unless otherwise noted), meaning that I can see reliability up to sample frames of 1000 PA. For stats that refused to stabilize by a sample size of 1000 PA, I used the Spearman-Brown prophecy formula to estimate the stabilization point.
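For readers who want to see the two reliability measures in concrete form, here is a minimal Python sketch using the textbook formulas for KR-21 and Cronbach's alpha. The function names, the data layout, the toy data, and the use of population variance are illustrative assumptions; this is not necessarily the exact computation behind the numbers in this piece.

```python
from statistics import fmean, pvariance


def kr21(totals, k):
    """KR-21 reliability for k binary trials per pitcher.

    totals[i] is pitcher i's count of successes (e.g., strikeouts)
    in a sample of k batters faced.
    """
    mean_total = fmean(totals)
    var_total = pvariance(totals)  # assumption: population variance
    return (k / (k - 1)) * (1 - mean_total * (k - mean_total) / (k * var_total))


def cronbach_alpha(rows):
    """Cronbach's alpha; rows[i][j] is pitcher i's value on item j
    (e.g., total bases allowed in at-bat j)."""
    k = len(rows[0])
    item_vars = [pvariance(col) for col in zip(*rows)]  # variance of each item
    total_var = pvariance([sum(row) for row in rows])   # variance of row totals
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up toy data: 4 pitchers, 4 binary outcomes each (1 = strikeout).
toy = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
print(kr21([sum(row) for row in toy], k=4))
print(cronbach_alpha(toy))
```

When every binary item has the same success rate, as in this toy data, the two formulas agree; KR-21 is the simplification that assumes all items are equally difficult.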
These numbers represent the point at which each stat reaches a reliability of .70 or greater according to the relevant formula.
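For stats that do not reach .70 within the observed sample, the Spearman-Brown prophecy formula can be rearranged to estimate where they would. The sketch below shows that rearrangement; the function names and the example reliability of .50 at 1000 BF are made up for illustration and do not come from the data set.

```python
def spearman_brown(r, factor):
    """Predicted reliability when the sample is lengthened by `factor`."""
    return factor * r / (1 + (factor - 1) * r)


def sample_size_for(r_observed, n_observed, target=0.70):
    """Sample size at which Spearman-Brown predicts `target` reliability,
    given reliability r_observed measured at n_observed."""
    factor = target * (1 - r_observed) / (r_observed * (1 - target))
    return n_observed * factor


# Hypothetical input: a stat with reliability .50 at 1000 batters faced.
n_needed = sample_size_for(0.50, 1000)
print(round(n_needed))
# Sanity check: plugging the lengthening factor back in recovers the target.
print(spearman_brown(0.50, n_needed / 1000))
```

The extrapolation assumes the extra plate appearances behave like the ones already observed, which is why the table flags such values as estimates.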
| Statistic | Definition | Stabilized at | Notes |
| --- | --- | --- | --- |
| Strikeout rate | K / PA | 70 BF | |
| Walk rate | BB / PA | 170 BF | IBB's not included |
| | | 640 BF | |
| Single rate | 1B / PA | 670 BF | |
| XBH rate | (2B + 3B) / PA | 1450 BF | Estimate* |
| HR rate | HR / PA | 1320 BF | Estimate* |
| | | 630 BF | Min 2000 AB's |
| | (H + | 540 BF | |
| | (1B + 2 * 2B + 3 * 3B + 4 * HR) / AB | 550 AB | Min 2000 AB's, Cronbach's alpha used |
| | (2B + 2 * 3B + 3 * HR) / AB | 630 AB | Min 2000 AB's, Cronbach's alpha used |
| GB rate | GB / balls in play | 70 BIP | Min 1000 BIP, Retrosheet classifications used |
| FB rate | (FB + PU) / balls in play | 70 BIP | Min 1000 BIP including HR |
| LD rate | LD / balls in play | 650 BIP | Min 1000 BIP including HR, Estimate* |
| HR per FB | HR / FB | 400 FB | Min 500 FB, Estimate* |
| | Hits / BIP | 2000 BIP | Min 1000 BIP, HR not included, Estimate* |
What Do I Really Know About You?
Well, we quickly have an idea of strikeout rate, ground ball and fly ball tendencies, and (somewhat less quickly), walk rate. Over a season, you can get a pretty good idea of a pitcher's single and
All Numbers Tell a Story, But it Might Not Be The Story You Wanted to Hear
I'm well aware that most of the requests for these analyses came from people who were trying to get a feel for whether their favorite pitcher having a bad year (read: Roy Halladay) was just having a bad couple of games or whether his performance was "real." I want to (again) point out that the way in which I most often see these numbers used is not exactly what they’re meant to show.
When I say that strikeout rate for pitchers stabilizes at 70 batters faced, what I mean is that we can be reasonably sure that his strikeout rate over those 70 batters is a good reflection of his talent level over those 70 (now past) plate appearances. This is different from saying that once a pitcher has gotten to 70 batters, we can assume that he will perform this way for the rest of the season. That's an assumption. It's not a bad one, but it is an assumption. Instead, what it means is that if his underlying skill set has changed in some meaningful way, we'll know in 70 plate appearances.
Also, I'd caution people against treating these numbers too dogmatically. 70 plate appearances is not a magic number. It's the point where a measure of reliability slowly crosses an only-somewhat-arbitrary line in the sand. At 69 PA, the reliability for strikeout rate is just shy of .70, and you need to have just a shade less confidence in any proclamations that you make using those 69 PA.
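To make the "not a magic number" point concrete, here is a rough sketch that anchors strikeout rate at a reliability of .70 at 70 BF and uses the Spearman-Brown formula to project reliability at nearby sample sizes. The function name and the choice of sample sizes are illustrative, and the projection is an assumption-laden approximation, not the measured reliability curve.

```python
def projected_reliability(n, n_ref=70, r_ref=0.70):
    """Spearman-Brown projection of reliability at n batters faced,
    anchored at reliability r_ref observed at n_ref batters faced."""
    factor = n / n_ref
    return factor * r_ref / (1 + (factor - 1) * r_ref)


# Reliability rises smoothly through the .70 line rather than jumping at 70.
for n in (40, 50, 60, 70, 80, 90, 100):
    print(f"{n:>3} BF: {projected_reliability(n):.3f}")
```

Under these assumptions the values climb gradually on both sides of 70 BF, so a sample a few batters short of the threshold carries only slightly less information than one just past it.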
I had a football coach who knew just enough about statistics to be dangerous. He discovered we scored 100% of the time when we attempted a td pass from between the 25 and 30 yard line. A running back made a break into wide open field on 2nd and 5 from the 40 and he made him run out of bounds at the 26 with the threat of death. Next play, the 100% productive td pass from between the 25 and 30 was picked off.
I looked at him like he was stupid, but a friend of mine made a good point. Even with the pick the play still saw a success rate of 66.7%.
Stats rule, in context.
Great work, as always.