Were Ichiro Suzuki represented by Scott Boras, the super-agent might be able to make a more convincing case than usual for his client’s singular, once-in-a-generation talent. Actually, Ichiro-types don’t come along even as often as that, especially in PECOTA’s post-World War II player comparison pool; the baseball gods appear to have both made and broken the mold especially for him.
The Mariners’ NPB import is an outlier in more ways than one, which makes him both a fan favorite and a likely future Hall-of-Famer. Of course, every player could accurately be described as unique, whether because of some aspect of his play on the field, his background, or his choice of breakfast cereal. But Ichiro’s uniqueness is impossible to ignore.
As it happens, some of the very qualities that endear Ichiro to baseball fans render him a persona non grata with the developers of forecasting systems, at least in their professional capacities. In addition to being a great quote, Suzuki has famously managed to collect at least 200 hits for 10 consecutive seasons, a feat that distinguishes him from every other player in history. The traits that have enabled him to amass those remarkable hit totals also mark him as the rarest of roses in and of themselves.
By virtue of his speed, tendency to hit the ball on the ground, and, perhaps, some innate ability to hit ’em where they ain’t, Suzuki has managed to sustain a .357 BABIP over more than 7,000 plate appearances in environments where the “average hitter” musters only a near-.300 figure. In not-unrelated news, Suzuki has led the American League in infield hit percentage for five straight seasons.
The statistical quirks that have made Ichiro such good news for Seattle have doubled as bad news for the accuracy of PECOTA’s projections. Since automated projection algorithms aren’t tailored to individuals, players to whom the normal rules don’t apply (or apply only loosely) present a challenge. Let’s take a look at how PECOTA’s past and present projections for the speedy right fielder stack up to reality. The following table displays Suzuki’s actual stats since suiting up on this side of the Pacific (omitting his rookie season):
| Year | PA | AVG | OBP | SLG |
|------|-----|------|------|------|
| 2002 | 728 | .321 | .388 | .425 |
| 2003 | 725 | .312 | .352 | .436 |
| 2004 | 762 | .372 | .414 | .455 |
| 2005 | 739 | .303 | .350 | .436 |
| 2006 | 752 | .322 | .370 | .416 |
| 2007 | 736 | .351 | .396 | .431 |
| 2008 | 749 | .310 | .361 | .386 |
| 2009 | 678 | .352 | .386 | .465 |
| 2010 | 704 | .314 | .359 | .395 |
| Average | 730 | .329 | .375 | .427 |
For comparative purposes, here’s how PECOTA projected Ichiro in each of our annual publications since the system hit the scene. PECOTA was little more than a gleam in Nate Silver’s eye in 2002, so we’ll look at 2003 on:
| Year | AVG | OBP | SLG |
|------|------|------|------|
| 2003 | .306 | .368 | .419 |
| 2004 | .309 | .351 | .423 |
| 2005 | .311 | .355 | .415 |
| 2006 | .308 | .343 | .406 |
| 2007 | .310 | .354 | .398 |
| 2008 | .304 | .346 | .384 |
| 2009 | .292 | .338 | .359 |
| 2010 | .322 | .375 | .426 |
Withholding comment until we’ve presented all the data, let’s take a look at the retroactive forecasts (sans aging adjustments, which we’ll cover later this week) for the same seasons, generated by the latest PECOTA methodology:
| Year | AVG | OBP | SLG |
|------|------|------|------|
| 2002 | .315 | .345 | .424 |
| 2003 | .315 | .356 | .420 |
| 2004 | .312 | .352 | .421 |
| 2005 | .327 | .368 | .428 |
| 2006 | .320 | .358 | .423 |
| 2007 | .319 | .362 | .422 |
| 2008 | .325 | .366 | .422 |
| 2009 | .321 | .363 | .407 |
| 2010 | .319 | .359 | .412 |
| Average | .319 | .359 | .420 |
Actual Ichiro outperforms even the new-and-improved projected Ichiro, but not by much: only 10 points of batting average separate the two. A projection system can’t predict luck, and since some random fluctuation is inevitable, no method can pinpoint batting average infallibly. Ichiro’s true batting-average ability may have remained more or less stable even as his results jumped from as low as .303 to as high as .372, but “new” PECOTA wisely split the difference, never calling for a figure lower than .312 or higher than .327.
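One rough way to put numbers on that comparison is to score both sets of forecasts against Ichiro’s actual batting averages for 2003 through 2010, the seasons all three tables share. The sketch below is illustrative arithmetic only, not part of PECOTA; the function and variable names are invented for the example, and because the retroactive forecasts omit aging adjustments, the comparison is not quite apples to apples.

```python
# Batting averages in points (thousandths), 2003-2010, copied from the tables above.
actual   = [312, 372, 303, 322, 351, 310, 352, 314]
original = [306, 309, 311, 308, 310, 304, 292, 322]  # PECOTA as published each year
retro    = [315, 312, 327, 320, 319, 325, 321, 319]  # retroactive, no aging adjustment


def mean_abs_error(forecast, observed):
    """Average size of the miss, ignoring direction."""
    return sum(abs(o - f) for f, o in zip(forecast, observed)) / len(observed)


def mean_bias(forecast, observed):
    """Average signed miss; positive means the forecast ran low."""
    return sum(o - f for f, o in zip(forecast, observed)) / len(observed)


for label, forecast in (("original", original), ("retroactive", retro)):
    print(label, mean_abs_error(forecast, actual), mean_bias(forecast, actual))
```

Over these eight seasons the retroactive forecasts miss by about 21 points on average, against about 26 for the originals, and the tendency to undershoot falls from roughly 22 points to roughly 10.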
So what’s responsible for the improvements in Ichiro’s forecast? As Nate Silver acknowledged several years ago, PECOTA wasn’t doing a great job of grasping the legitimacy of Ichiro’s high batting averages. Not all high batting averages are created equal, but as Nate lamented about the system’s former failings, “PECOTA thinks that Ichiro is due for a major correction because it thinks he’s like Luis Polonia, and when a hitter like Luis Polonia hits .330 or something, it is almost certainly a fluke, a lucky year by a banjo hitter.”
Nate dubbed Ichiro “unique,” but he’s not the only batter whose high BABIPs manage to confound PECOTA on a regular basis. Matt Swartz has written about these “BABIP Superstars” on multiple occasions. Along with Ichiro, the group he identified includes luminaries like Derek Jeter and Joe Mauer, which hasn’t helped to obscure PECOTA’s deficiencies in the BABIP department.
The problem lies in projecting batting average in the first place. There are any number of component skills that contribute to a player’s ability to hit for average: his ability to hit home runs, his ability to make a lot of contact, his ability to leg out a few additional singles. But PECOTA was lumping all of those skills into one catch-all metric, one that is typically subject to a high amount of noise.
So we’ve broken hitting down into a much larger set of component skills than PECOTA has in the past – utilizing play-by-play data from Retrosheet, we can break out things like infield singles and reaching on errors. We can then break this more detailed batting line down into an even more detailed set of components, and project them all independently before combining them into an overall batting line.
This lets us do a better job of projecting players with unique skill sets – by taking a closer look at the variety of skills that make up their batting line, we can do a much better job of identifying the underlying skill and not regressing it away as “luck.”
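To make the idea concrete, here is a minimal sketch of projecting components separately, assuming each hit type is shrunk toward a league rate by its own amount. It is not PECOTA’s actual code or method: the `Component` class, the league rates, the “stabilizer” weights, and the sample batting line are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    events: int         # observed count of this kind of hit
    league_rate: float  # made-up league-average rate per at-bat
    stabilizer: float   # made-up at-bats of league-average play to blend in


def project_rate(events, at_bats, league_rate, stabilizer):
    """Shrink an observed rate toward the league rate.

    A small stabilizer trusts the player's own record; a large one
    pulls the estimate most of the way back to the league rate.
    """
    return (events + league_rate * stabilizer) / (at_bats + stabilizer)


def project_average(components, at_bats):
    """Project each hit type separately, then add the rates together."""
    return sum(
        project_rate(c.events, at_bats, c.league_rate, c.stabilizer)
        for c in components
    )


# Illustrative inputs only: not real player data or PECOTA parameters.
AT_BATS = 600
components = [
    Component("home runs", 8, 0.030, 200),
    Component("infield singles", 50, 0.020, 200),
    Component("other hits", 142, 0.215, 1200),
]

component_avg = project_average(components, AT_BATS)

# Lumped alternative: treat all 200 hits as a single noisy statistic.
total_hits = sum(c.events for c in components)
league_avg = sum(c.league_rate for c in components)
lumped_avg = project_rate(total_hits, AT_BATS, league_avg, 1200)

print(f"component-based: {component_avg:.3f}")
print(f"lumped:          {lumped_avg:.3f}")
```

With these invented inputs, the component-based projection keeps more of the hypothetical hitter’s average than the lumped one, because his surplus of infield singles is shrunk far less than the rest of his hits.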
This also has implications for pitchers – we’re not stuck using official pitching stats to project pitchers anymore. We can get an exact count of (for instance) doubles and triples allowed, and to the extent that pitchers have a persistent skill in allowing extra base hits on balls in play, we can use that information to project their runs allowed.
(This also reduces the amount of code needed to run PECOTA, because we can share more code between the hitter and pitcher forecasts. That means less possibility for bugs and more shared improvements between the two sets of projections.)
But what about players whose skills aren’t unique but whose situations are? Tomorrow, we look at how we’re making PECOTA smarter about injuries.
Has anyone done something like that? Is there data to allow it?
I haven't seen anything in the currently available batted ball location data sets that suggests we have enough precision and accuracy in the data to tell one way or the other.
http://fastballs.files.wordpress.com/2010/09/suzukic01_hitfx_spray_angle.png
There are two lines on the graph. The blue one is the horizontal angle at which Suzuki hit his batted balls in April 2009, grouped in bins five degrees wide. The red one is the BABIP for left-handed batters on batted balls with a vertical launch angle of less than 8 degrees (basically ground balls and borderline line drives). The idea is to find the positions of the infielders. Where the BABIP is lowest, that's the likely position of the infielder, and where the BABIP is the highest, that's the gap between the fielders.
You can see that Ichiro did a pretty impressive job of hitting the gaps. I included all of Suzuki's 42 batted balls, but the pattern doesn't change much if his 10 air balls are excluded.
Whether that is a repeatable skill or not, and if it is, to what extent, I don't know. However, it is more than three standard deviations from the mean of the binomial distribution, under the assumption that the spray angle is simply random.
I looked at Ryan Howard, too, and excluding his air balls, he hit 13 balls in the gaps and 11 balls near the infielders, not accounting for any abnormal shifting by the infielders beyond what they normally do for LHB.
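For anyone who wants to check that kind of arithmetic, here is a rough sketch of the binomial comparison. It assumes a randomly sprayed ball is equally likely to land in a gap bin or a fielder bin (p = 0.5), which is a simplification rather than anything measured from the graph; the function name and the constant are invented for the example.

```python
import math


def binomial_z(successes, trials, p):
    """Standard deviations between an observed count and its binomial mean."""
    mean = trials * p
    sd = math.sqrt(trials * p * (1 - p))
    return (successes - mean) / sd


# ASSUMPTION: a randomly sprayed ball is as likely to land in a "gap" bin as in
# a "fielder" bin (p = 0.5). The real share depends on how the bins are drawn.
P_GAP = 0.5

# Ryan Howard, air balls excluded: 13 in the gaps, 11 near the infielders.
z_howard = binomial_z(13, 13 + 11, P_GAP)

# Smallest number of gap hits out of 42 batted balls that reaches three
# standard deviations under the same assumption.
needed = next(k for k in range(43) if binomial_z(k, 42, P_GAP) >= 3)

print(f"Howard z-score: {z_howard:.2f}")
print(f"Gap hits needed out of 42 for 3 SD: {needed}")
```

Under that assumption, Howard's 13-of-24 split is well within one standard deviation of an even split, while a 42-ball sample would need 31 or more balls in the gaps to clear three.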