PECOTA percentiles are now available to subscribers.
Those of you new to BP, or to PECOTA, might wonder why we publish percentiles in addition to the weighted-mean projections for players, which we’ve already released. The answer is that forecasting is an inexact science; the future is not exactly what you'd call certain. The percentiles allow us to put a range of outcomes around a single-point forecast, to illustrate how uncertain the forecast is and which outcomes are most likely.
The percentiles, then, represent the spread of outcomes if we were to have a player go through the 2013 season thousands upon thousands of times. Imagine a bell curve, with the 50th percentile at the very peak. Twenty percent of the time, a player's results should fall between his 40th and 60th percentiles. Likewise, 60 percent of the time a player should perform at his 60th percentile or worse, while 40 percent of the time he should play better.
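To make the "play the season thousands of times" idea concrete, here is a minimal simulation sketch. It assumes a normal (bell-curve) distribution of season TAv with a made-up true TAv of .314 and a made-up spread of .024; it is an illustration of what percentiles mean, not how PECOTA generates them.

```python
import random


def simulate_seasons(true_tav, sd, n_trials, seed=2013):
    """Replay one hypothetical season n_trials times; each draw is a season TAv."""
    rng = random.Random(seed)
    return [rng.gauss(true_tav, sd) for _ in range(n_trials)]


def percentile(sorted_vals, pct):
    """Simple rank-based percentile of a sorted list (pct is an int from 0 to 99)."""
    return sorted_vals[pct * len(sorted_vals) // 100]


# Hypothetical inputs: true TAv and spread are illustrative only.
seasons = sorted(simulate_seasons(true_tav=0.314, sd=0.024, n_trials=10_000))
p40, p60 = percentile(seasons, 40), percentile(seasons, 60)

# Share of simulated seasons between the 40th and 60th percentiles,
# and share below the 60th percentile.
share_40_to_60 = sum(p40 <= s < p60 for s in seasons) / len(seasons)
share_below_60 = sum(s < p60 for s in seasons) / len(seasons)
print(share_40_to_60, share_below_60)
```

Because the percentiles are read off the simulated seasons themselves, the two shares land at 20 and 60 percent, which is exactly the statement in the paragraph above.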
As an example, let’s take a look at Giancarlo Stanton’s percentile forecasts.
Our best estimate is that Stanton will be about a five-win player, with a .314 TAv. If he plays to his 90th-percentile projection, though, he could post a .345 TAv and be about as valuable as NL MVP Buster Posey was last season. And if he disappoints to the tune of his 10th-percentile projection—well, he’d still be a pretty useful player. Giancarlo Stanton is really good at baseball.
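For a rough sense of what Stanton's numbers imply, this sketch treats his .314 best estimate as the 50th percentile and assumes a normal curve. Both are assumptions: the actual PECOTA percentiles need not be perfectly symmetrical, and the implied 10th percentile below is a mirror-image estimate, not a published figure.

```python
from statistics import NormalDist

p50_tav = 0.314  # Stanton's best-estimate TAv from the article, treated as the 50th percentile
p90_tav = 0.345  # Stanton's 90th-percentile TAv from the article

z90 = NormalDist().inv_cdf(0.90)          # about 1.28 standard deviations above the median
implied_sd = (p90_tav - p50_tav) / z90    # spread implied if the curve were normal
implied_p10 = p50_tav - z90 * implied_sd  # mirror image of the 90th percentile

print(round(implied_sd, 4), round(implied_p10, 3))
```

Under these assumptions the spread works out to roughly 24 points of TAv, and the mirrored 10th percentile sits about 31 points below the best estimate.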
You can find the percentiles in the “2013 Forecast” section of the player cards (not the box at the top, with the basic projections—scroll down, or select the “PECOTA ONLY” tab, and you’ll see it).
A few more notes might be helpful here. The basic inputs to the percentiles are:
- The reliability of the forecast,
- The level of talent forecasted, and
- Expected playing time.
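As a hypothetical illustration of how those three inputs could widen or narrow the percentile band, here is a toy model. It is not PECOTA's method: it simply adds binomial-style sampling noise, which depends on the talent level and shrinks with expected playing time, to a talent-uncertainty term that shrinks as reliability rises. The function name, the `talent_sd` default, and all inputs are made up.

```python
import math


def toy_tav_spread(talent_tav, expected_pa, reliability, talent_sd=0.030):
    """Hypothetical standard deviation of a season TAv.

    Combines two made-up, independent sources of variance:
      - sampling noise, modeled like a binomial rate: p * (1 - p) / PA
      - uncertainty about true talent, scaled down as reliability approaches 1
    """
    sampling_var = talent_tav * (1 - talent_tav) / expected_pa
    talent_var = (talent_sd ** 2) * (1 - reliability)
    return math.sqrt(sampling_var + talent_var)


full_time = toy_tav_spread(0.300, expected_pa=650, reliability=0.8)
part_time = toy_tav_spread(0.300, expected_pa=250, reliability=0.8)
unproven = toy_tav_spread(0.300, expected_pa=650, reliability=0.3)
print(round(full_time, 4), round(part_time, 4), round(unproven, 4))
```

In this toy version, fewer expected plate appearances or a less reliable forecast both produce a wider spread, which is the qualitative behavior the list describes.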
Percentiles for batters cover offense (not fielding or baserunning, except as a function of playing time and opportunities). The percentiles key off the primary rate stat for each type of player: TAv for hitters and ERA for pitchers. The component stats are meant to illustrate a likely set of stats that could lead to that level of production for that player. In other words, a hitter’s home-run total at a given percentile reflects the home runs that would produce that percentile’s TAv, assuming similar changes in the other stats in his batting line; it is not the chance of hitting that many home runs. There are many different batting lines that can lead to any one TAv.
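One simple way to picture "similar changes in the other stats" is to scale every on-base event in a mean batting line by the same factor until a rate stat hits its percentile target. This sketch uses on-base events per PA as a stand-in, because TAv's actual formula is not reproduced here; the batting line and the target rate are hypothetical.

```python
ON_BASE_EVENTS = ("1B", "2B", "3B", "HR", "BB")


def on_base_rate(line):
    """Stand-in rate stat: on-base events per PA (not the real TAv formula)."""
    return sum(line[k] for k in ON_BASE_EVENTS) / line["PA"]


def line_for_target_rate(mean_line, target_rate):
    """Scale every on-base event by one common factor, holding PA fixed,
    so that the stand-in rate stat equals target_rate."""
    factor = target_rate / on_base_rate(mean_line)
    scaled = dict(mean_line)
    for key in ON_BASE_EVENTS:
        scaled[key] = mean_line[key] * factor
    return scaled


# Hypothetical weighted-mean line (not a real PECOTA projection): rate = 228 / 600 = .380
mean_line = {"PA": 600, "1B": 90, "2B": 30, "3B": 3, "HR": 35, "BB": 70}
high_line = line_for_target_rate(mean_line, target_rate=0.418)
print(round(high_line["HR"], 1), round(on_base_rate(high_line), 3))
```

The home-run count rises only because the whole line was scaled to reach the higher rate, which is why a percentile's HR figure should not be read as the probability of hitting that many homers.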
We’ve tested the percentiles against historical data, and we can report that they behave how you’d expect—80 percent of batters fall between their 10th- and 90th-percentile forecasts for TAv, for instance.
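A calibration check of this kind boils down to counting how often the actual result lands inside the forecast band. The sketch below shows that count on synthetic forecasts whose outcomes are drawn from normal distributions; the data are made up, and a real back-test would plug in historical forecasts and actual TAvs instead.

```python
import random
from statistics import NormalDist


def coverage(actuals, lows, highs):
    """Share of players whose actual stat landed inside [low, high]."""
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actuals, lows, highs))
    return hits / len(actuals)


rng = random.Random(42)
z90 = NormalDist().inv_cdf(0.90)

# Synthetic forecasts: a mean TAv and a spread for each of 5,000 made-up players.
means = [rng.uniform(0.230, 0.320) for _ in range(5000)]
sds = [rng.uniform(0.015, 0.035) for _ in range(5000)]
actuals = [rng.gauss(m, s) for m, s in zip(means, sds)]
p10s = [m - z90 * s for m, s in zip(means, sds)]
p90s = [m + z90 * s for m, s in zip(means, sds)]

print(round(coverage(actuals, p10s, p90s), 3))
```

When the forecast spreads are honest, the share comes out close to 0.80; a share well below that would mean the bands are too narrow.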
In the past, we forecast ERA and RA for pitchers as a linear fit of their expected batting-against stats. There’s a lot of variance in ERA that isn’t captured by batting-against stats, though, particularly performance with men on base. We’ve back-tested against historical data and added some extra variation to ERA and RA to account for this.
We’ve also integrated the percentiles more closely with the depth charts, for players who appear in those—we’re pulling things like lineup slots to help calculate RBIs, for instance. The percentiles take quite a while to run, though, so don’t expect them to stay in sync with the depth charts, which can be updated as often as several times in one day.
This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.
With 10-year forecasts back on the drawing board, is there an ETA or is it still TBA?
Psyched.
Can you show those test results?
I'd like to see you redo the work you did here:
http://www.insidethebook.com/ee/index.php/site/comments/pecota_percentiles_finally/
You provide the evidence, the conclusion follows that evidence, and the reader can feel comfortable that your conclusion is valid.
All I'm asking is for you to provide something that you've provided in the past, like this:
For example, can I download a spreadsheet of all players' 70th percentiles instead of the weighted means?
I'm sure it's not that simple, since the percentiles don't appear to be perfectly symmetrical. Nevertheless, there is surely some number that can show that player A has a more volatile expected WARP than player B. It would be nice to get that, conceptually, without having to look at each player's percentiles.
I'll look and see if something like this is something we can provide.
What I would really love is to see this (or even just a total standard deviation) included next to the PFM $ amounts... What do you think?
Not that I'm bothered by the lack of a symmetric distribution; I do understand the Bayesian reasons you wouldn't have one. Given how they look now, I'd definitely use the s.d.'s if you can generate them.
ERA won't be symmetrical (it can't be). But the square root of ERA should be close to symmetrical. From the few I looked at, however, ERA looks fairly symmetrical.
And I do see slightly more skewness in ERA, though its scale is also much larger, which magnifies very small differences. Either way, if the distributions are barely skewed, I'd still love mean + s.d. as a more compact way to deliver the info in a spreadsheet.
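A compact spreadsheet row could be derived from three published percentiles. This sketch estimates a standard deviation from the 10th-to-90th range (exact only if the distribution is normal) and reports Kelly's skewness coefficient, which is zero for a symmetric distribution and positive when the upper tail is longer. The median stands in for the mean, which is close only when skew is small, and the two players' WARP percentiles are hypothetical numbers.

```python
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.90)


def summarize(p10, p50, p90):
    """Compact summary from three percentiles.

    sd_est: spread implied if the distribution were normal.
    kelly_skew: (p90 + p10 - 2 * p50) / (p90 - p10); 0 when symmetric.
    """
    sd_est = (p90 - p10) / (2 * Z90)
    kelly_skew = (p90 + p10 - 2 * p50) / (p90 - p10)
    return round(p50, 2), round(sd_est, 2), round(kelly_skew, 2)


# Hypothetical WARP percentiles (10th, 50th, 90th) for two players.
players = {"Player A": (1.0, 3.0, 5.0), "Player B": (2.0, 3.0, 4.5)}
for name, (p10, p50, p90) in players.items():
    print(name, *summarize(p10, p50, p90))
```

Both made-up players share a 3.0 median, but Player A's larger estimated s.d. flags him as the more volatile projection, while Player B's positive Kelly skew flags a longer upside tail.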
ERA is proportional to the SQUARE of TAv (or, similarly, the square of OBP). Just think of Bill James’ Runs Created, which is, at its core, OBP x SLG. Which is kinda like saying OBP squared.
So, if you take the square root of ERA, you’ll get something that is proportional to OBP, and so you should get the same kind of roughly symmetrical distribution that OBP has.
Theoretically anyway. It shouldn’t be too hard for someone out there to prove me right (or wrong!).
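Here is a quick simulation check of that reasoning. It assumes, as the comment does, that ERA is proportional to OBP squared, and it models OBP allowed as a binomial-style rate using a normal approximation. The batters-faced count, the true OBP, and the constant 9.0 are made-up values, so this only tests the algebra under those assumptions, not real pitcher data.

```python
import math
import random


def skewness(xs):
    """Sample skewness: mean cubed deviation divided by the SD cubed."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    third = sum((x - mu) ** 3 for x in xs) / n
    return third / var ** 1.5


rng = random.Random(1)
batters_faced = 800  # hypothetical
p_true = 0.320       # hypothetical true OBP allowed

# Normal approximation to a binomial on-base rate.
obp_sd = math.sqrt(p_true * (1 - p_true) / batters_faced)
obp = [rng.gauss(p_true, obp_sd) for _ in range(20000)]
era = [9.0 * x ** 2 for x in obp]  # ERA taken as proportional to OBP squared
root_era = [math.sqrt(e) for e in era]

print(round(skewness(era), 2), round(skewness(root_era), 2))
```

Squaring pushes ERA into a right-skewed shape, while its square root is just a rescaled OBP, so its skewness sits near zero. The small size of ERA's skew at this spread is also consistent with the observation that ERA percentiles look only slightly skewed.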
You are asking for a distribution of this equation, basically:
PA x (TAv - baseline)
That baseline is fixed, TAv follows a binomial-type distribution, and PA is going to be heavily skewed.
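The shape of PA x (TAv - baseline) can be explored by simulation. In this sketch every specific is an assumption: PA follows a Beta(5, 1.5) distribution scaled to a 700-PA ceiling (piled up near full time, with a long tail toward injury), TAv is drawn from a normal approximation to a binomial-type rate, and the .230 baseline is a placeholder.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness: mean cubed deviation divided by the SD cubed."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    third = sum((x - mu) ** 3 for x in xs) / n
    return third / var ** 1.5


rng = random.Random(3)
MAX_PA = 700      # hypothetical ceiling on plate appearances
BASELINE = 0.230  # hypothetical fixed baseline TAv
n = 20000

# PA: most draws near full time, with a long left tail (missed time).
pa = [MAX_PA * rng.betavariate(5.0, 1.5) for _ in range(n)]
# TAv: normal approximation to a binomial-type rate.
tav = [rng.gauss(0.290, 0.025) for _ in range(n)]
value = [p * (t - BASELINE) for p, t in zip(pa, tav)]

print(round(statistics.mean(pa)), round(statistics.median(pa)))
print(round(skewness(pa), 2), round(skewness(value), 2))
```

With a PA distribution shaped like this, mean PA falls below median PA, so a single weighted-mean value can understate a typical healthy season while the percentiles carry the injury risk in their lower tail.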