June 10, 2005
One of the aspects of baseball that lends itself so easily to statistics is that most of the outcomes are very clean and usually binary. A batter either reaches base or he doesn't; a batted ball is a hit or it isn't; even if it is a hit, there are only four possible degrees of hit. Even more important, the data are--with occasional exceptions--perfectly gathered: an official scorer sits at each game, carefully noting the outcome of each event in the commonly accepted manner. This runs counter to most real-world applications of statistics, where data can be not only incorrect but also enormously complex.
With play-by-play data, however, the events we've taken for granted as simple and binary can suddenly become more complex and, if properly applied, a more accurate reflection of the action on the field. Not entirely unlike J.J. Thomson and others discovering an internal structure to the atom (but without the massive scientific and physical impacts on our understanding of the universe), breaking large, binary blocks of baseball stats into smaller, more descriptive pieces can yield more information. While this method has largely been applied to more advanced defensive metrics such as UZR, it can also be applied to events such as singles, doubles, triples, and, in particular, strikeouts.
On the surface, strikeouts seem to be a very clean statistic, much like walks and home runs, their cousins in the Three True Outcomes tree of knowledge and certainty. On the other hand, each of those three stats can be broken up into smaller pieces: Walks can be intentional, unintentional, or semi-intentional; home runs can be inside-the-park, opposite field, towering, or line drive; and strikeouts can be swinging, looking, bunting, or even dropped. In the box score, they all look the same--a line-drive home run counts the same as a blast of Ruthian proportions. But breaking strikeouts into separate categories may yield additional insight into players' approaches, both on the mound and at the plate, and may help identify players who are under- or over-performing reasonable expectations.
With that in mind, let's take a closer look at every pitcher's friend--the whiff. In 2004, 73.1% of strikeouts were swinging (either complete whiffs or foul tips), 26.3% were looking, and the remaining 0.6% were either missed bunt attempts or foul bunts. This data gives us a handy baseline for seeing who's above and below average when it comes to types of strikeouts. Let's check out the leaders and trailers in swinging-strikeout percentage in 2004 (minimum 50 strikeouts) to get an idea of what kind of pitchers inhabit both ends of the spectrum:
Pitcher         Year  Swinging  Looking  Total  Swing_Perc  SO/PA
--------------  ----  --------  -------  -----  ----------  -----
Dave Burba      2004        94        4     50       95.9%  15.3%
Mike Wood       2004        50        4     54       92.6%  12.5%
Esteban Yan     2004        63        6     69       91.3%  18.2%
Guillermo Mota  2004       150       16     85       90.4%  21.6%
Brad Lidge      2004       140       16    157       89.7%  42.5%
Luis Vizcaino   2004        56        7     63       88.9%  21.1%
Salomon Torres  2004        54        7     62       88.5%  16.3%
Danny Baez      2004        46        6     52       88.5%  17.6%
Jon Lieber      2004        89       12    102       88.1%  13.6%
Brad Radke      2004       125       18    143       87.4%  15.9%
-----
Ismael Valdez   2004        78       56     67       58.2%   8.9%
Carlos Silva    2004        44       32     76       57.9%   8.7%
Darrell May     2004        69       51    120       57.5%  14.4%
Woody Williams  2004        73       55    131       57.0%  16.0%
Scot Shields    2004        62       47    109       56.9%  24.0%
Jeff Weaver     2004        84       67    153       55.6%  16.4%
Esteban Loaiza  2004       126      108    117       53.8%  14.3%
Chad Cordero    2004        44       39     83       53.0%  23.2%
Dave Weathers   2004        93       90     61       50.8%  17.1%
Jaret Wright    2004        70       87    159       44.6%  20.4%

The top group--those who cause the most swings and misses--looks mostly like a pretty hard-throwing, walk-stingy group with a couple of exceptions. The bottom group is a slightly different brand of pitcher--not as many closers and not as many players with a reputation for missing bats. Interestingly for Yankee fans, the top group includes discarded rotation member Jon Lieber; newly acquired Jaret Wright leads the bottom group by a wide margin. These two players frame the next natural question stemming from breaking strikeouts into sub-categories: By looking at one type of strikeout or another, could the Yanks have seen Wright's disappointing (and brief) performance coming?
To check it out, let's first see how consistent something like the percentage of strikeouts swinging (S% for short) is from year to year. Unfortunately, reliable play-by-play data doesn't always include accurate pitch-by-pitch information, so we only have 2003-2005 data to use. Obviously, the 2005 season is far too young to use when determining the consistency of a stat from year to year, so we'll have to settle for two consecutive seasons of data as a first pass. As accurate data from earlier seasons becomes available, we'll be able to place more confidence in these findings, but with the limited data at hand, the r-squared of S% from 2003 to 2004 is .3022. That's not entirely insignificant, falling just below stats like BB/9 and OBP in terms of year-to-year consistency.
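For readers who want to repeat the consistency check on their own data, here is a minimal sketch in Python. It assumes S% is swinging strikeouts divided by swinging plus looking strikeouts, which matches the Swing_Perc column for rows such as Mike Wood (50 of 54, or 92.6%); whether bunt strikeouts belong in the denominator is an assumption. The function names and the sample counts are hypothetical and are not taken from the 2003-2004 play-by-play data.

```python
def swing_pct(swinging, looking):
    """S%: swinging strikeouts as a share of swinging plus looking strikeouts."""
    total = swinging + looking
    if total == 0:
        raise ValueError("pitcher has no swinging or looking strikeouts")
    return swinging / total


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy ** 2 / (sxx * syy)


# Hypothetical (swinging, looking) strikeout counts for the same five
# pitchers in consecutive seasons; these are NOT real 2003-2004 figures.
counts_2003 = [(120, 20), (60, 30), (45, 40), (90, 25), (70, 50)]
counts_2004 = [(110, 30), (70, 35), (50, 35), (80, 30), (60, 60)]

s_2003 = [swing_pct(s, l) for s, l in counts_2003]
s_2004 = [swing_pct(s, l) for s, l in counts_2004]
print(f"year-to-year r-squared of S%: {r_squared(s_2003, s_2004):.4f}")
```

Each pitcher has to appear in both seasons, in the same order, for the pairing to mean anything. With a minimum-strikeout cutoff like the one used in the table, that means keeping only pitchers who cleared the cutoff in both years.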
Given that S% is somewhat consistent from year to year, perhaps it could help us predict an imminent change in K/PA. It's certainly possible that some pitchers keep up their K/PA rate--a critical stat for predicting pitcher success--with a few more favorable umpire calls on third strikes rather than by missing bats. To see if that's the case, we can run a multivariable regression of each player's K/PA in 2004 against his K/PA and S% in 2003.
Unfortunately, the previous year's K/PA dominates S% in the regression analysis, accounting for 59.67% of the variation while S% manages only a meager 2.49%. It's not quite Royals-Yankees or Koror-Ulong, but it's close. Given how thoroughly the previous year's K/PA rate determines the next year's, S% doesn't yield any significant predictive value when looking for an edge in predicting pitcher breakout or decline in terms of K/PA. It's certainly possible that with more years of data available, a more discernible trend could be found, but with regard to predicting K/PA changes the following season, a whiff is a whiff no matter how you get it.
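The regression described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the method actually used for the figures quoted: the article does not say how the variation was split between the two predictors, so the sketch uses one common approach, comparing the r-squared of a K/PA-only model with the r-squared of the full model. The function name and all sample values are hypothetical.

```python
def center(values):
    """Return the mean and the mean-centered copy of a sample."""
    mean = sum(values) / len(values)
    return mean, [v - mean for v in values]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def regress_k_rate(k_prev, s_prev, k_next):
    """Least-squares fit of k_next = b0 + b_k * k_prev + b_s * s_prev."""
    if not (len(k_prev) == len(s_prev) == len(k_next)) or len(k_next) < 3:
        raise ValueError("need three equal-length samples of at least 3 points")
    m1, c1 = center(k_prev)
    m2, c2 = center(s_prev)
    my, cy = center(k_next)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y, syy = dot(c1, cy), dot(c2, cy), dot(cy, cy)
    det = s11 * s22 - s12 ** 2
    if det == 0 or syy == 0:
        raise ValueError("predictors are collinear or the response is constant")
    # Solve the 2x2 normal equations for the centered predictors.
    b_k = (s22 * s1y - s12 * s2y) / det
    b_s = (s11 * s2y - s12 * s1y) / det
    b0 = my - b_k * m1 - b_s * m2
    r2_full = (b_k * s1y + b_s * s2y) / syy   # both predictors
    r2_k_only = s1y ** 2 / (s11 * syy)        # previous K/PA alone
    return {"intercept": b0, "b_k": b_k, "b_s": b_s,
            "r2_full": r2_full, "r2_k_only": r2_k_only}


# Hypothetical per-pitcher rates; NOT the real 2003-2004 data.
k_pa_2003 = [0.15, 0.22, 0.18, 0.09, 0.25, 0.13]
s_pct_2003 = [0.88, 0.57, 0.74, 0.58, 0.90, 0.66]
k_pa_2004 = [0.16, 0.20, 0.17, 0.10, 0.27, 0.14]

fit = regress_k_rate(k_pa_2003, s_pct_2003, k_pa_2004)
gain = fit["r2_full"] - fit["r2_k_only"]
print(f"r-squared from previous K/PA alone: {fit['r2_k_only']:.4f}")
print(f"additional r-squared from adding S%: {gain:.4f}")
```

If the gain from adding S% is small relative to the K/PA-only figure, S% is telling us little that the previous year's strikeout rate doesn't already.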