One of baseball’s enduring charms is its ability to defy prediction. Each time we think we’re absolutely sure of something (say, that the 2008 Tigers will score a bajillion runs, or that Juan Pierre will be a disaster filling in for Manny Ramirez), our forecasts are confounded by baseball’s eternally fickle nature. Sophisticated projection tools, such as Nate Silver’s PECOTA, are designed to help take some of the guesswork out of predicting how teams and players will perform during a given season, and often produce surprisingly accurate forecasts on the whole. But even PECOTA is prone to big misses, especially in individual player projections, which help to preserve the game’s air of mystery.
Projection models generally use each new season of data to better aim next year’s forecasts. A good model should improve its accuracy over time, and while a swing and a miss on a given player in a given season is inevitable, the addition of new data should make it less likely to be repeated. So how quick a study is PECOTA? After missing once, can it use this new information to get back in the box and make solid contact, or do certain players continue to perplex the system, year after year?
To find this out, I looked at PECOTA Equivalent Average (EqA) projections for all hitters with 300+ plate appearances during the 2006-2008 seasons, and compared them to the actual offensive production of those hitters. If the projection was at least 10 points lower than the actual EqA, I rated that projection as an “underestimation”; a projection that was at least 10 points higher is considered an “overestimation.” The results are shown in the following charts, starting with the players from whom PECOTA expected the worst:
Players with 300+ PA During Season: PECOTA Underestimations

                                    Sample   Proj. EqA
Year   Sample Description           Size     10 Pts. Low    %
2006   All Players                  260      124            48%
2007   Underestimated in 2006       108       22            20%
2008   Underestimated in 2006-07     17        7            41%
During the 2006 season, there were 260 players with at least 300 plate appearances. Of those, 124 players saw their actual EqA surpass their projection by at least 10 points, a surprisingly high 48 percent. Presumably, PECOTA should be able to absorb this new information and adjust their 2007 projections accordingly, and the numbers seem to bear this out. Of the 108 “overachieving” players from 2006 who again met the 300 PA threshold in 2007, only 22 (20 percent) again exceeded PECOTA’s projection by 10 or more points. Of the 17 players whom PECOTA had twice underestimated and who met the 300 PA threshold in 2008, seven were underestimated yet again. The percentage goes up to 41 percent, but with such a small sample that’s probably just noise. There were 154 players who met the PA threshold in all three seasons; of those, only seven (4.5 percent) were underestimated by PECOTA in all three seasons.
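For readers who want to replay the bookkeeping, here is a minimal Python sketch of the 10-point rule and the percentages quoted above. The function names (classify, share) are made up for illustration and are not part of PECOTA or any BP tool; the counts are the ones given in the text.

```python
def classify(projected_eqa, actual_eqa, threshold_pts=10):
    """Label a projection by how far actual EqA landed from it, in points (.001)."""
    # Work in whole points so a gap of exactly 10 is not lost to float rounding.
    diff_pts = round((actual_eqa - projected_eqa) * 1000)
    if diff_pts >= threshold_pts:
        return "under"  # projection was at least 10 points too low
    if diff_pts <= -threshold_pts:
        return "over"   # projection was at least 10 points too high
    return "hit"


def share(count, sample):
    """Percentage of the sample, rounded to one decimal place."""
    return round(100 * count / sample, 1)


# Counts quoted in the text above.
print(share(124, 260))  # 2006: all players with 300+ PA
print(share(22, 108))   # 2007: players underestimated in 2006
print(share(7, 17))     # 2008: players underestimated in 2006 and 2007
print(share(7, 154))    # players underestimated in all three seasons
```

A projection of .250 against an actual .255 comes back as "hit", since a five-point gap falls inside the 10-point band on either side.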
Now let’s look at PECOTA’s experience with irrational exuberance:
Players with 300+ PA During Season: PECOTA Overestimations

                                    Sample   Proj. EqA
Year   Sample Description           Size     10 Pts. High   %
2006   All Players                  260       65            25%
2007   Overestimated in 2006         39       20            51%
2008   Overestimated in 2006-07      12        4            33%
Here we see that PECOTA, as a stern evaluator, was about half as likely in 2006 to overestimate a player (25 percent) as to underestimate one (48 percent). Not surprisingly, those that PECOTA overestimated (and who thus had a disappointing season) were less likely to meet the 300 PA threshold in the following season, so our sample shrinks at a faster rate. But interestingly, in 2007 PECOTA didn’t seem to learn as much about the underachievers as it did about the overachievers. While PECOTA had only a 1-in-5 chance of repeating its underestimation in 2007, more than half the players it overestimated in 2006 (who met the PA threshold in 2007) were again overestimated in 2007.
The list of players who were twice overestimated is peppered with names like Jim Edmonds, Richie Sexson, Trot Nixon, Craig Biggio, and the Giles brothers: players who had been highly productive but whose numbers suddenly cratered (often due to age or injury). For PECOTA, as with managers and fans, it took a while to see that these players truly had become shadows of their former selves. By the third season, most of these players were either no longer full-time major leaguers, or PECOTA finally stopped squinting and came up with a more realistic projection: only four players (2.6 percent of the 154 who met the PA threshold in all three seasons) were overestimated a third time.
Who were these masked men, the players who managed to turn PECOTA into Pollyanna, continually predicting performance far beyond that which they produced?
                  2006    2006   |  2007    2007   |  2008    2008
                  Actual  PECOTA |  Actual  PECOTA |  Actual  PECOTA
Player            EqA     EqA    |  EqA     EqA    |  EqA     EqA
Bobby Crosby      .231    .276   |  .225    .265   |  .234    .255
Juan Uribe        .234    .253   |  .231    .263   |  .236    .250
Austin Kearns     .282    .292   |  .271    .290   |  .223    .280
Jason Varitek     .248    .284   |  .272    .282   |  .237    .274
If there’s a pattern to discern here, it’s early promise followed very quickly by injury and/or disappointment, or what we might call the Ben Grieve career path. Crosby has long been either injured or lackluster, with his career shape looking ever more like a Pet Rock: instant, inexplicable, short-lived success that quickly becomes a metaphor for fleeting value. Kearns has never exactly been bad (until recently); neither has he become the consistent, multi-talented outfielder most thought he would grow into. PECOTA seems to have focused on what Ooh Ooh Uribe could do (hit 20-plus home runs in his mid-20s) while ignoring what he couldn’t do (get his OBP much above the mid-.200s). Only Varitek stands out in this crowd, and it looks as if PECOTA felt his leadership and moxie would exempt him from the standard catcher aging curve.
Can any member of this rogues’ gallery make PECOTA whiff yet again this year? Varitek’s bounce-back season (.278 EqA) and Uribe’s surprising competence (.263) in San Francisco have them both far exceeding PECOTA’s sudden and deserved pessimism. On the other hand, Crosby (.235 actual/.243 projected) and Kearns (.237 actual/.275 projected) continue to be poster children for unrealized potential and may well achieve the four-peat.
The list of three-time overachievers is a little more complex:
                    2006    2006   |  2007    2007   |  2008    2008
                    Actual  PECOTA |  Actual  PECOTA |  Actual  PECOTA
Player              EqA     EqA    |  EqA     EqA    |  EqA     EqA
Chipper Jones       .331    .303   |  .339    .308   |  .360    .321
Hanley Ramirez      .288    .241   |  .318    .277   |  .320    .298
Matt Holliday       .304    .271   |  .317    .296   |  .316    .295
Dan Uggla           .278    .233   |  .275    .262   |  .296    .273
Ichiro Suzuki       .288    .266   |  .302    .277   |  .283    .271
Aaron Miles         .234    .222   |  .240    .206   |  .265    .221
Mark Grudzielanek   .257    .247   |  .268    .255   |  .265    .245
A PECOTA Similarity Score below 20 indicates a player who is particularly unique and difficult to compare to other players; Chipper Jones (Sim Score: 4) and Ichiro Suzuki (Sim Score: 17) fall into this category. Look at Chipper’s Equivalent Averages: it’s not that PECOTA expected the oft-injured star to become unproductive; it’s just that he’s been virtually superhuman (when healthy) during his late-career drive towards Cooperstown. Ichiro is a unique story, so it’s not surprising PECOTA has never been sure what to make of him. Uggla and Ramirez both achieved such immediate success that it’s taken time for PECOTA to believe what our eyes have already seen, especially for Hanley, whose minor league numbers were no match for his scouting reports. I’m not sure exactly what ancient grudge PECOTA has held against Matt Holliday, but it looks like 2009 might see them meeting halfway (.296 actual/.305 projected EqA). Miles has only managed to be not as awful as you might think, while Grudz has been a useful player far later into his 30s than most would have thought possible, and with his recent history of exceeding expectations, the Twins may have bought themselves a useful insurance policy.
For 2009, PECOTA has finally come around on this group; in fact, all but two players are currently well below their forecast. Hanley’s .324 projection is pretty much spot-on. The only player currently in great danger of yet again being underestimated by 10 or more points: the inscrutable Ichiro (.308 actual/.258 projected EqA), continually mistaken by PECOTA like some latter-day Rodney Dangerfield, someone who just can’t get any respect.
With the percentage chance of missing three times in a row comfortably in the low single digits, it looks as if PECOTA rarely goes into extended slumps when projecting any given player. There may be a few specific types of players (early career busts, players who maintain high productivity late into their 30s, holders of the single-season hit record) that tend to be venerated or demonized longer than they should. But overall, a given PECOTA projection is at least as accurate as your local weather forecast: good enough to know whether you’ll need a coat, but with enough short-term variation to occasionally leave you out in the cold.
Put another way, what we should really be looking for in analyzing PECOTA is whether we can predict what type of player is likely to fall short of his expectations (due to problems with the model) and learn how to adjust the model from there. "Early promise followed very quickly by injury and/or disappointment" doesn't allow us to do that.
Those of you who dinged the comment and did not explain why, phooey on you.
Year Actual/PECOTA
2006 .236/.275
2007 .252/.291
2008 .254/.283
2009 .235/.277
Also, I think it's important to throw out all players with a similarity score below a given threshold. You basically state that we shouldn't be surprised if PECOTA can't predict what Ichiro is going to do, so including these types of players in the analysis is only going to add noise to the analysis.
But I like your paraphrase of it, and I really like this article. I can't wait to see the pitchers' article, because if I remember correctly, PECOTA usually does a very good job with hitters. Pitchers, on the other hand . . .
If the latter, there's a sampling problem here, namely that your sample would be more likely to include players that PECOTA underestimated.
You make a good point -- if a hitter puts up truly awful numbers he'll have a harder time staying in the lineup long enough to get 300 PAs. Given this, it's reasonable to think that if I lowered the threshold to, say, 50 PAs the percentage of total projections in a given year that are "under" by 10 points might go down. But you'd also then be allowing in other SSS outliers which may skew things either up or down -- that's the noise I was trying to keep out of the sample. I'll try to find the time to re-run this with a lower threshold to see the effect, and post the results here.
Min 100 PAs:
Overs: 35%
Unders: 39%
Min 50 PAs:
Overs: 37%
Unders: 38%
So when the sampling bias is removed, the unders and overs become much more similar.
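As a rough illustration of how a playing-time cutoff alone can tilt the counts, here is a small Python simulation. Everything in it is an assumption made up for the sketch, not PECOTA data: projection errors are drawn as unbiased with a 25-point spread, and any hitter who falls 30 or more points short of his projection is assumed to lose his job before reaching the PA threshold.

```python
import random


def miss_rates(errors_pts, threshold_pts=10):
    """Return (under share, over share) for a list of errors in EqA points."""
    n = len(errors_pts)
    unders = sum(e >= threshold_pts for e in errors_pts) / n   # actual beat projection
    overs = sum(e <= -threshold_pts for e in errors_pts) / n   # actual fell short
    return unders, overs


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical unbiased errors (actual minus projected), SD of 25 points.
errors = [rng.gauss(0, 25) for _ in range(20000)]

# Hypothetical playing-time filter: hitters 30+ points short never reach the cutoff.
survivors = [e for e in errors if e > -30]

all_unders, all_overs = miss_rates(errors)
kept_unders, kept_overs = miss_rates(survivors)

print(f"everyone:  unders {all_unders:.1%}, overs {all_overs:.1%}")
print(f"survivors: unders {kept_unders:.1%}, overs {kept_overs:.1%}")
```

In this toy setup the unders and overs start out roughly even, and the filter by itself pushes the surviving sample toward unders. It says nothing about how large the effect is in the real data, only that the direction matches what the lower thresholds show.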
I hope to be able to dig into all this further in a later article -- the comments in this thread have some terrific suggestions.
"There is no denying that many good writers and editors strongly approve of /unique/ in its 'unusual' sense, even though it is indisputably well established in general prose. Perhaps you might try being one who knows enough about its bad reputation to avoid it but who also knows enough about its actual history not to sneer at those who use it."
Thanks for the article, Ken, by the way - may there be many more along this path, leading us to a better understanding of how to predict performance.
Quite an embarrassing systematic error, which requires some explanation.
Any fitted regression model would be expected to produce errors without any bias to the mean - that is - an equal number of players above and below their predicted performance.
Dinging a post just because it is critical is shameful.
The only post I would consider dinging is one that attacks another personally or uses offensive language.
I negative-rated the post in question because, as I said in my comment, the question was answered three hours earlier. If you're going to say something is embarrassing, you should check the thread to make sure the issue hasn't been addressed. I'll admit that I may have cut another poster more slack on this, but sbnirish frequently posts snarky criticisms of BP writers that call into question their objectivity and abilities. Maybe it's unfair of me to take that into account, but I read his/her post with that history in mind and it affected how I interpreted it.
2006 .236/.275
2007 .252/.291
2008 .254/.283
2009 .235/.277
Now this makes me wonder why, over four seasons, PECOTA never projected an EqA below .275 when Young never produced one above .254.
I suspect the next big breakthrough in PECOTA, both on the offense and pitching sides, will come via the use of batted ball type information to get a better handle predicting the results on balls in play.