Without further ado, we present to you the PECOTA cards. Debuting on the cards are the 10-year forecasts and the percentile forecasts.
The 10-year projection process breaks down like so:
- First, a player’s performance is broken down into a series of components. These are more detailed than the official stats breakdown, featuring everything required to build a full batting (or batting against) line, including reach on error, infield single rates, etc.
- Single-season aging curves for all of the components are built by comparing adjacent seasons. There is a selection bias in adjacent seasons, so we use a back-weighted sample of the player’s past performance (using the same process we use in the normal PECOTA forecasts), regressed to the mean, to build the single-season aging curves.
- We use a curve-fitting process to “smooth” the aging curve, to make sure the progression is orderly and to diminish the effect of random variance in the aging curves (especially for very young or very old players, where the pools are much smaller).
- To come up with multi-year forecasts, we “chain” the smoothed single-year age adjustments.
- Each player has a custom aging curve built using the comparables, and that curve is tested for reliability and then regressed to a generic aging curve based on that reliability assessment.
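To make the chaining and regression steps concrete, here is a minimal sketch. It is not PECOTA's actual code: the function names, the additive form of the yearly adjustments, and the linear blend weighted by a reliability score between 0 and 1 are all assumptions made for illustration.

```python
def chain_adjustments(base_value, yearly_deltas):
    """Chain single-year age adjustments into a multi-year projection.

    Assumes (hypothetically) that each adjustment is an additive change
    applied to the previous season's projected rate.
    """
    projections = []
    value = base_value
    for delta in yearly_deltas:
        value += delta
        projections.append(value)
    return projections


def regress_curve(custom_deltas, generic_deltas, reliability):
    """Blend a player-specific aging curve with a generic one.

    `reliability` is an assumed score in [0, 1]: 1 trusts the custom
    curve completely, 0 falls back entirely to the generic curve.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if len(custom_deltas) != len(generic_deltas):
        raise ValueError("curves must cover the same number of seasons")
    return [
        reliability * custom + (1.0 - reliability) * generic
        for custom, generic in zip(custom_deltas, generic_deltas)
    ]


# Made-up numbers: a .260 hitter with a noisy custom curve.
blended = regress_curve([0.010, 0.000, -0.010], [0.004, 0.002, -0.002], 0.5)
forecast = chain_adjustments(0.260, blended)
```

In this toy setup a low reliability score pulls the forecast toward the generic curve, which is the behavior the last step in the list describes.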
Peak ages will vary based upon a player’s comparables and his skill set (as each component gets its own aging curve). But typically for hitters, we see peak ages in the 10-year forecasts at around age 28, a year later than the conventional wisdom. This is somewhat offset by a decline in a player’s defensive value, which doesn’t really peak at all but starts to decline almost immediately upon his debut in the majors. (You can see the effects in this aggregation of various 10-year forecast components by age, weighted by DC playing time, here.)
Pitchers are more interesting—there seems to be an earlier peak, at age 26, for pitchers, in terms of ERA. But for pitchers who manage to survive beyond age 26, there seems to be a much later peak. Pitchers who pitch past ages 27–29 seem to peak around age 30, with some pitchers peaking even later. So pitching seems to be essentially bimodal in aging, where some pitchers peak early and others peak late. This is even true if we restrict our analysis only to pitchers who work primarily as starters their entire career. (Again, a breakdown is available here.)
Playing time in the 10-year forecasts reflects what we expect a player’s playing time to be at his peak production, rather than starting with his expected 2012 playing time; this results in more sensible long-term forecasts for young prospects who aren’t quite ready for MLB yet but are expected to have productive careers once they arrive. Playing-time forecasts for off-peak years are then adjusted from the peak-year playing-time forecast.
Now, projecting the future is more difficult the further out it goes. So how reliable are the 10-year forecasts? Looking at the root mean square error of projected 10-year True Averages for “backcasts” of historic players:
| Year | RMSE (TAv) |
| --- | --- |
| 1 | 0.031 |
| 2 | 0.032 |
| 3 | 0.033 |
| 4 | 0.035 |
| 5 | 0.037 |
| 6 | 0.039 |
| 7 | 0.041 |
| 8 | 0.043 |
| 9 | 0.046 |
| 10 | 0.050 |
This is about what we would expect; a player’s performance 10 years down the road is substantially more difficult to project than his performance one year down the road. But especially through the first several seasons, the reliability of the forecasts is not substantially different.
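For readers who want to run a similar check on their own backcasts, the sketch below groups forecast errors by how many years ahead the projection was made and computes the root mean square error for each group. The record layout `(years_ahead, projected, observed)` and the function name are assumptions for illustration, and the sample values are invented rather than taken from PECOTA.

```python
import math
from collections import defaultdict


def rmse_by_horizon(records):
    """Return {years_ahead: RMSE} for (years_ahead, projected, observed) rows."""
    squared_errors = defaultdict(list)
    for years_ahead, projected, observed in records:
        squared_errors[years_ahead].append((projected - observed) ** 2)
    return {
        horizon: math.sqrt(sum(errors) / len(errors))
        for horizon, errors in sorted(squared_errors.items())
    }


# Invented True Average backcasts, for illustration only.
sample = [
    (1, 0.270, 0.300),
    (1, 0.260, 0.230),
    (2, 0.250, 0.290),
]
result = rmse_by_horizon(sample)
```

Because the errors are squared before averaging, a few large misses at long horizons raise the RMSE more than many small ones would.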
Similarly, for pitchers:
| Year | RMSE |
| --- | --- |
| 1 | 1.18 |
| 2 | 1.21 |
| 3 | 1.23 |
| 4 | 1.26 |
| 5 | 1.29 |
| 6 | 1.30 |
| 7 | 1.35 |
| 8 | 1.41 |
| 9 | 1.49 |
| 10 | 1.56 |
Again, results in the first several seasons are very close, with results becoming harder to project the further out you get. (Bear in mind that these are forecasts for a neutral park and league context and thus will exhibit higher RMSEs than regular PECOTA forecasts.)
The percentiles are based on three primary variables:
- The reliability of a player’s forecast, estimated based on the playing time (weighted) that went into the forecast,
- The projected playing time, which affects the amount of random variance expected from the forecast, and
- The population tendencies (a player is more likely to underperform his forecast than overperform it if he is projected to be above the league average, for instance).
Again, playing time is based upon a player’s expected performance: the better the performance, the more playing time we expect for that player.
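The link between playing time and random variance can be illustrated with the standard error of a binomial rate. This is a textbook approximation, not PECOTA's actual variance model; the function name is hypothetical, and treating each plate appearance as an independent trial is a simplifying assumption.

```python
import math


def rate_standard_error(true_rate, opportunities):
    """Standard error of an observed rate under a simple binomial model.

    Assumes each opportunity (for example, a plate appearance) is an
    independent trial with the same success probability.
    """
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return math.sqrt(true_rate * (1.0 - true_rate) / opportunities)


# Same underlying .330 on-base skill, different playing time.
part_time = rate_standard_error(0.330, 150)
full_time = rate_standard_error(0.330, 600)
```

Quadrupling the opportunities halves the standard error, so under this model a part-time player's percentile band is wider than a regular's even when the underlying skill is the same.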
Keep in mind that the percentiles key off of the primary value component: TAv for batters and ERA for pitchers (although the ERA is a component ERA with less variance than actual ERA, as it does not account for random variance in sequencing around the component lines). Component stats are meant to illustrate the key value stats only. A hitter’s 90th-percentile home run forecast, for instance, is not his maximum home run potential but the most likely home run total to accompany his 90th-percentile TAv.
How well do the percentiles do on historic data? Looking at back-forecasts from 1950 on, we see that 79 percent of observed TAvs fall between the 10th and 90th percentiles, and 60 percent fall between the 20th and 80th percentiles, in line with the 80 percent and 60 percent we should expect.
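A coverage check of this kind is simple to sketch. The code below counts how often an observed value lands inside a forecast's percentile band. The tuple layout `(lower, upper, observed)` and the function name are assumptions for illustration, and the sample values are invented.

```python
def band_coverage(rows):
    """Share of observations falling inside their forecast band.

    Each row is assumed to be (lower_bound, upper_bound, observed).
    """
    if not rows:
        raise ValueError("need at least one row")
    inside = sum(1 for lower, upper, observed in rows if lower <= observed <= upper)
    return inside / len(rows)


# Invented 10th-to-90th percentile TAv bands and observed results.
sample_rows = [
    (0.240, 0.300, 0.275),
    (0.250, 0.310, 0.255),
    (0.230, 0.290, 0.305),  # miss: above the 90th percentile
    (0.260, 0.320, 0.300),
    (0.220, 0.280, 0.241),
]
share = band_coverage(sample_rows)
```

If the percentiles are well calibrated, the 10th-to-90th band should capture roughly 80 percent of outcomes and the 20th-to-80th band roughly 60 percent.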
As a reminder, PECOTA on the cards is restricted to subscribers only. To those of you who already subscribe, thank you for your patronage, and I hope you find the PECOTA cards useful, informative, and (employers of America, forgive me) a veritable time sink. Enjoy.
Additionally, you can look at progressions for different age groups over time, weighted by projected 2012 playing time according to the depth charts. Pitchers are located here. Hitters can be found here.
UPDATE: The 10-year forecast shows seasons two through 10 for a player expected to play all 10 seasons. For players whose forecast falls below the attrition rate, no forecast is displayed.
But I would love to see some type of Excel spreadsheet with all non-MLB players 2015 projections or a mechanized way to grab them off the website.
Zachary Rosscup, James Pugliese, Yao-Lin Wang, Robert Whitenack, Ben Wells, Hayden Simpson, Willengton Cruz, Luis Liria, Austin Reed, Hector Mayora, Sheldon McDonald, Jose Rosario, Starling Peralta, Ryan Hartman, Santo Rodriguez, Yilver Sanchez, Pete Levitt, Dustin Fitzgerald, Daniel Berlind, Colin Richardson, Jin-Young Kim, Joseph Zeller, Su-Min Jung, Cameron Greathouse, Marcus Hatley, Juan Serrano and Larry Suarez.
Does this mean there is no way of seeing those numbers in a spreadsheet? Or is that information simply not accessible any longer?
Thanks
I'm not sure how you got that info so I'm not sure how to figure that out from what is on the pecota card now. Bummer!
With the 30 HR forecast this year, plus the 10 year forecast, PECOTA currently projects A-Rod to hit 770 HR (1st all-time), with 2334 RBI (1st all-time), and 2276 runs scored (2nd all-time).
Jeter's forecast calls for him to finish with 3680 hits (4th all-time) and 2034 runs scored (8th all-time).
They'd be the only pair of teammates to ever retire with over 2000 runs scored each.
It will be interesting to see how close those end up to being right, and what the projections look like at this time next year!
I looked at Heath Bell's 10 year forecast and noticed that the number of saves projected is very high. His 65 for 2013 would set the single season record except that Mariano Rivera is projected for 71. In 2018, when his ERA is projected to be over 5, his WARP below 0, Rivera still projects for 41 saves.