Let’s talk percentiles.
It’s probably the most famous thing about PECOTA-the fact that we provide a range of forecasts instead of just a single point estimate. Earlier this week, I talked about the accuracy of the weighted mean forecasts. But what about the percentiles?
First, some notes about the percentiles. They are derived based upon the overall unit of production (TAv for hitters, ERA for pitchers), not the underlying components. This is important, because a hitter who hits more home runs than we expect (I hesitate to call it luck-he may have been underestimated, or he may have found a way to improve his talent) isn’t necessarily going to improve his rate of hitting singles by the same amount, or at all.
What this means is that you can’t look at a single stat (say, hits or strikeouts) and think that’s the range of expectations PECOTA has for that skill. The percentiles are supposed to reflect what we know about the distribution of a player’s skill, but they are in essence the average batting line we should expect from that player if he puts up that level of performance in that season. There are a lot of different shapes that performance could take, however, and that means there’s more variance in any single component than is reflected in the percentiles. So the correct test of the percentiles is the overall level of performance, not the underlying components.
The other thing to note is that the observed performance of any individual player is a function of his playing time-the less playing time a player has, the more variance we expect in his overall performance. Things have a tendency to even out over time (although a tendency is not the same thing as a guarantee), and so the spread of observed performance goes down as playing time goes up. If a player is projected for a full season’s worth of playing time, and only ends up playing 50 games or so, the percentiles are going to be too tight. That’s not a bug-it’s impossible to make one set of percentiles that functions across any amount of playing time.
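To make that concrete, here is a minimal sketch (not PECOTA's actual machinery) of how the spread of an observed rate stat shrinks as playing time grows; the .330 "true rate" and the PA counts are made-up assumptions:

```python
# A minimal sketch (not PECOTA's method): the spread of an observed rate stat
# shrinks roughly with the square root of playing time. The .330 "true rate"
# and the PA counts are illustrative assumptions.
import random

def observed_rate_sd(true_rate, pa, seasons=5_000):
    """Standard deviation of the observed rate across many simulated part-seasons."""
    rates = []
    for _ in range(seasons):
        successes = sum(1 for _ in range(pa) if random.random() < true_rate)
        rates.append(successes / pa)
    mean = sum(rates) / seasons
    return (sum((r - mean) ** 2 for r in rates) / seasons) ** 0.5

for pa in (150, 300, 600):
    print(pa, round(observed_rate_sd(0.330, pa), 4))
# The spread at 150 PA is roughly twice the spread at 600 PA, which is why one
# set of full-season percentiles can't also describe a 50-game season.
```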
Let’s start off with the hitters. Looking at only players with at least 300 PA, here’s how the distribution of players looks:
|         | DIFF20 | DIFF40 | DIFF60 | DIFF80 |
| Overall | 23.9%  | 34.9%  | 49.2%  | 63.5%  |
| Up      | 17.6%  | 24.7%  | 30.8%  | 36.7%  |
| Down    | 6.4%   | 10.3%  | 18.4%  | 26.8%  |
Going from left to right-DIFF20 refers to the percentage of players between their 40th and 60th percentiles, through to DIFF80, which represents the percentage of players between their 10th and 90th percentiles. The second row represents those players above the 50th percentile; the third row represents players below the 50th percentile. Adding up plus down gives you the overall percentage.
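For what it's worth, a table like this could be tabulated along the lines of the sketch below, assuming you already know each player's realized percentile within his own PECOTA distribution; the band edges and the handling of exactly-50th-percentile outcomes are my assumptions, not necessarily how the table above was built:

```python
# A rough sketch of how the table could be tabulated, given each player's realized
# percentile (0-100) within his own PECOTA distribution. The band edges and the
# handling of exactly-50th outcomes are my assumptions.
def diff_table(realized_percentiles):
    bands = {"DIFF20": 10, "DIFF40": 20, "DIFF60": 30, "DIFF80": 40}
    n = len(realized_percentiles)
    rows = {}
    for name, half_width in bands.items():
        lo, hi = 50 - half_width, 50 + half_width
        up = sum(1 for p in realized_percentiles if 50 < p <= hi)     # above the 50th
        down = sum(1 for p in realized_percentiles if lo <= p <= 50)  # at or below it
        rows[name] = {"Up": up / n, "Down": down / n, "Overall": (up + down) / n}
    return rows

# Hypothetical input: three players near the middle, one big over-performer.
print(diff_table([48, 55, 62, 95]))
# With calibrated percentiles and a large sample, "Overall" tends toward 20/40/60/80%.
```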
What we should want to see is DIFF20 equal to 20 percent, etc. We don’t quite see it, though. It may be a bit more helpful to look at a histogram:
[Histogram: hitters with at least 300 PA, binned by the decile of their own PECOTA forecast that their actual performance fell into]
The first thing that sticks out should be the fact that most players are in the 50th to 60th percentiles, by a large margin. Why? Fundamentally, players who perform above their expectations are more likely to get playing time than players who perform below their expectations. This isn't something that should surprise us-this is why we have the weighted means forecasts for PECOTA, which explicitly take this fact into account. (This is also probably the explanation for why DIFF20 exceeds 20 percent.)
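A hedged little simulation of that selection effect, with every number invented purely for illustration:

```python
# A hedged simulation of that selection effect; every number here is invented.
import random

random.seed(1)
players = []
for _ in range(10_000):
    pct = random.uniform(0, 100)              # realized percentile vs. the forecast
    pa = 250 + 3 * pct + random.gauss(0, 60)  # assumed: better outcomes earn more PA
    players.append((pct, pa))

kept = [pct for pct, pa in players if pa >= 300]  # mimic the 300 PA cutoff
above = sum(1 for pct in kept if pct > 50) / len(kept)
print(f"share of the >=300 PA sample above the 50th percentile: {above:.2f}")
# Even though the full population is uniform by construction, the filtered sample
# comes out well above 50% -- the same lean the real table shows.
```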
But there’s also more variation in observed performance than what the percentiles expect. Let’s consider the reasons we see variation from what our projections expect. The first point I want to make is that forecasting is not mathamancy; there’s no such thing as a perfect forecast, except in hindsight. PECOTA utilizes a two-stage process:
- As described earlier this week, we generate a baseline forecast based on a player’s past performance, and
- We adjust for our expectation of how a player will age, using baseline “forecasts” for comparable players to create a custom aging curve-what Nate Silver would refer to as the “career path adjustment.”
Both of those estimates are subject to a measure of uncertainty. The third source of variation is simply randomness. We use the observed variation of the performance of the comps to model this variance.
Not all forecasts have the same expected variance, though-it seems as though some players have more variance in their baseline forecasts than their comparables do. This is a relatively simple fix-the uncertainty in a forecast is largely a function of the amount of data you have on a player. (It’s also something of a function of a player’s skill set, among other things.) When we build a player’s baseline forecast, we can compare the uncertainty in the forecast to the uncertainty of the comps’ forecasts and figure out how much additional variance we need to add to the percentiles.
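One way that adjustment could look, sketched under a normality assumption; this is a guess at the mechanics, not PECOTA's actual code:

```python
# One way that adjustment could look, sketched under a normality assumption.
# This is my guess at the mechanics, not PECOTA's actual code.
from statistics import NormalDist

def widened_percentiles(mean, comp_spread_sd, baseline_sd, comp_baseline_sd):
    """Widen the percentile bands by whatever variance the comps can't account for."""
    extra_var = max(0.0, baseline_sd ** 2 - comp_baseline_sd ** 2)
    total_sd = (comp_spread_sd ** 2 + extra_var) ** 0.5
    dist = NormalDist(mu=mean, sigma=total_sd)
    return {p: round(dist.inv_cdf(p / 100), 3) for p in (10, 30, 50, 70, 90)}

# Hypothetical inputs: a .330 TAv forecast, a .025 spread among the comps, and a
# baseline that is noisier (.020) than the comps' baselines were (.012).
print(widened_percentiles(0.330, 0.025, 0.020, 0.012))
# Roughly .292/.330/.368 at the 10th/50th/90th -- wider than the comps' spread alone.
```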
We’ve also been treating the uncertainty of a forecast as symmetrical-apparently there’s more uncertainty on the downside than the upside. This is something we can build into our model as well.
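A split normal with a fatter downside is one simple way to build in that asymmetry; this is an illustration of the idea, not a description of how PECOTA will do it:

```python
# One simple way to let the spread be asymmetric: a split normal with a fatter
# downside. An illustration of the idea, not a description of what PECOTA will do.
from statistics import NormalDist

def split_normal_percentile(mean, sd_down, sd_up, p):
    """p in (0, 100); use the downside sigma below the median, the upside sigma above."""
    z = NormalDist().inv_cdf(p / 100)
    return mean + z * (sd_down if z < 0 else sd_up)

print({p: round(split_normal_percentile(0.330, 0.035, 0.025, p), 3) for p in (10, 50, 90)})
# {10: 0.285, 50: 0.33, 90: 0.362} -- the 10th sits farther from the median than the 90th.
```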
Now let’s take a look at our pitchers, minimum of 70 IP:
|         | DIFF20 | DIFF40 | DIFF60 | DIFF80 |
| Overall | 18.0%  | 29.0%  | 37.3%  | 50.5%  |
| Up      | 13.6%  | 19.4%  | 22.4%  | 29.4%  |
| Down    | 4.4%   | 9.6%   | 15.0%  | 21.1%  |
I should clarify “down” and “up” in this context-up is an ERA below the forecast, down is an ERA above the forecast.
What we see is something similar to the hitters, but much more pronounced. Let’s examine it from a slightly different angle, and look at FIP as a stand-in for ERA:
|         | DIFF20 | DIFF40 | DIFF60 | DIFF80 |
| Overall | 27.9%  | 42.7%  | 53.4%  | 65.4%  |
| Up      | 23.3%  | 29.9%  | 35.5%  | 38.8%  |
| Down    | 4.6%   | 12.7%  | 18.0%  | 26.6%  |
That’s a lot closer to what we saw with the hitters (and of course, everything I said about those applies equally here).
What it comes down to, I suppose, is how you define performance for a pitcher. There are three elements to preventing or allowing runs:
- The pitcher’s ability to affect the batter-pitcher matchup directly (walks, strikeouts, home runs),
- The ability of a pitcher and his defense to prevent hits on balls in play, and
- The sequence these events occur in
I've talked in the past about how those figure into a player's value. Suffice it to say that the range on the PECOTA percentiles is largely focused on the first element (which is where most of the variation in pitcher skill occurs, and thus the area most relevant to forecasting).
So, lemme ask-what do you find most useful about the percentiles? Would you rather they reflect the extent to which we know pitchers have skill in preventing runs? Or would you rather the percentiles reflect the rather considerable noise in measuring a pitcher's performance (really, the performance of a pitcher and his teammates at preventing runs)? Drop me a line in the comments and let me know.
Or you could talk to me about that-or anything else related to PECOTA, or baseball stats in general-in a few hours, when I chat live starting at 1 ET, as the finale of PECOTA week. And again-this is the beginning, not the end, of a long conversation about PECOTA. Thanks for being a part of it.
Additionally, as this exchange with us continues and develops, could you keep us aware of the schedule you are working with? I hope/anticipate your projections (and all PECOTA-related data) will be available much earlier next year.
Thanks again.
You might be thinking of, say, somewhere between 1.0 and 1.5 standard deviations (a scale of SD, not percentile), and in that case, you would be correct.
It almost looks like PECOTA might be picking up process variance ok (that is, the variability caused by the fact that not every player will hit their true expectation every year) but may not be adequately accounting for parameter variance (that is, the variance caused by the fact that your estimate of each player's expectation is likely wrong as well). Obviously, that's not anything I can say for sure, but I've seen percentile graphs like that before, and many times it was from not capturing parameter variance enough.
So, it's two failures: a failure on the true estimate, and a failure on the performance.
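In code, that decomposition is just an addition of variances; the numbers here are purely illustrative:

```python
# In code, that decomposition is just an addition of variances; the numbers here
# are purely illustrative.
process_var = 0.018 ** 2    # season-to-season luck around a known true talent
parameter_var = 0.015 ** 2  # uncertainty about what that true talent actually is

total_sd = (process_var + parameter_var) ** 0.5
print(round(total_sd, 4))   # ~0.0234, noticeably wider than the 0.018 process piece alone
# Percentiles built from process variance only would be too tight, which is one way
# to end up with fat 0-10 and 90-100 bins.
```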
I mean, I can see a rookie starter maybe having a higher variance than Mariano Rivera, but in most cases, the relievers should be much more variable around the mean.
If the projection system was perfect, i.e., had a connection to a wormhole in space that allowed perfect knowledge of what the player was going to do in the following year, every player would be at exactly his 50th percentile and the distribution would look like a delta function. If it overestimated EVERY hitter's performance such that they all performed at their 25th percentile, and underestimated pitcher's performances by the corresponding amount, you'd have two delta functions, one at 25 and the other wherever the pitchers would fall. (Incidentally, it is NOT required that that delta function would be at 75%. Things are more complicated than that.)
Put differently, in the actual distribution, most of the guys whose projections match their performance at roughly the 50% level would be in the top quintile (or quartile or whatever) of prediction accuracy, and by definition, exactly one fifth (for quintile; one fourth for quartile, and so on) of the projections would fall into that bin. The ones whose projections were way, way off -- in EITHER direction -- would be in the bottom quintile. But that isn't the binning that this histogram is showing. This one is only about *player* performance relative to prediction, not *prediction* performance. Clear?
Player 1: mean forecast=.330, 90th percentile=.370, 10th percentile=.290, actual performance=.375
This player would count in the 90-100 bin.
Are we agreed so far?
Player 2: mean forecast=.330, 90th percentile=.370, 10th percentile=.200, true performance=.260
Player 3: mean forecast=.330, 90th percentile=.400 (note that there is no requirement for PECOTA's 90th-percentile projections all to differ from the mean by the same delta-tAv, far from it), 10th percentile=.260, true performance=.265
Player 4: mean forecast=.330, 90th percentile=.400, 10th percentile=.260, true performance=.405
Player 5: mean forecast=.330, 90th percentile=.400, 10th percentile=.260, true performance=.350
Then players 1 and 4 would be binned in the 90-100 percentile bin in terms of how they did relative to PECOTA projections, i.e., they grossly overperformed what PECOTA expected; players 2 and 3 would be similarly in the 0-10 bin; and player 5 would be somewhere around his own 60th-percentile performance -- the exact percentile he achieved would be dependent upon more details of the PECOTA projections, but it would be somewhere above the 50th but well below the 90th.
HOWEVER: In terms of how well **PECOTA** performed, player 5 would be in (in fact, would *be*, exactly) the top quintile (because PECOTA nailed his performance compared to how it did on the others), player 1 would be in the next quintile (because PECOTA missed him by .045, which is worse than player 5 but better than all the others), player 3 would be in the middle quintile (missed him by .065), player 2 the fourth (missed by .070), and player 4 the bottom (missed by .075). THIS HAS NOTHING TO DO WITH HOW THE *PLAYERS* PERFORMED, which is the subject of the histogram that Colin displayed. It has to do with how *PECOTA* performed. Incidentally, in this particular example, it would show that PECOTA had a chronic tendency to underestimate how these guys hit.
Does THAT clear it up?
But this is not the subject of this article or that histogram. That histogram is about tracking the information in your paragraph here:
"Then players 1 and 4 would be binned in the 90-100 percentile bin in terms of how they did relative to PECOTA projections, i.e., they grossly overperformed what PECOTA expected; players 2 and 3 would be similarly in the 0-10 bin; and player 5 would be somewhere around his own 60th-percentile performance -- the exact percentile he achieved would be dependent upon more details of the PECOTA projections, but it would be somewhere above the 50th but well below the 90th."
The counts would be:
n, percentile
2, 90-100
1, 50-60
2, 0-10
That's what the histogram would show from your example.
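For the curious, here is a small sketch of the binning itself, assuming we have a player's full set of published decile cut points (the three-point summaries above leave the exact bin edges ambiguous, which is why the counts are stated rather than computed):

```python
# A small sketch of the binning itself, assuming we have a player's full set of
# published decile cut points. The example card is hypothetical.
import bisect

def decile_bin(decile_cuts, actual):
    """decile_cuts: the values at the 10th..90th percentiles, ascending (9 numbers).
    Returns a bin label such as '50-60' or '90-100'."""
    cleared = bisect.bisect_right(decile_cuts, actual)  # how many cut points actual clears
    return f"{10 * cleared}-{10 * (cleared + 1)}"

# Hypothetical card: a .330 mean with cut points every ten points of TAv.
cuts = [.290, .300, .310, .320, .330, .340, .350, .360, .370]
print(decile_bin(cuts, .375))  # '90-100', like player 1 above
print(decile_bin(cuts, .335))  # '50-60'
```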
So what does that tell us about the actual histogram, and in turn, what that histogram says about PECOTA? The answer is that the histogram shows PECOTA doing very well with most players, i.e., the top quintile of PECOTA performance is populated with the guys whose *player* performance was right around their 52nd percentile or so. The next quintile will be ones whose performances were somewhere between their own 45th and 50th, or 55th and 60th, percentiles -- don't take those numbers too literally, they're a SWAG, but probably about right. And so on, with, as it turns out, the lowest quintile occupied by the guys for whom PECOTA missed a collapse.
And that is what is interesting about the histogram. If PECOTA works right, the results *should* cluster around 50th-percentile predictions, and indeed, they do. The width of the nearly-Gaussian distribution centered on that "Schwerpunkt" -- German has a better description for this than the English "centroid" -- is a measure of how imperfect the PECOTA predictions are. If the predictions were perfect, the Gaussian would be arbitrarily narrow (a delta function). If they were purely random, the Gaussian would be infinitely wide. Laying aside the collapses, the message is that PECOTA works pretty well.
Why is that distribution Gaussian? Well, it isn't necessarily *really* Gaussian, but a Gaussian shape is what you expect if players' under- or overperformance is a matter of luck -- getting screwed or helped on BABIP, etc. It is also consistent with the hypothesis that the set {players for whom the information used to form the predictions is exactly correct} is larger than the set {players for whom the information is completely bogus}, with the obvious gradations of correctness in between those two extremes. In other words, these guys do their homework -- but if they did it even better, the distribution would be narrower, except for the "predictions" given to players who collapse.
No. For it to work right, the percentiles should remain the same, BUT the estimate of the percentile levels should be much narrower.
For example, PECOTA would give this:
Pujols:
10th .290
50th .330
90th .370
(Or whatever).
IDEALLY, the best forecasting system would give something like this:
10th .310
50th .330
90th .350
That is, the estimate of each level is as tight to the 50th as possible.
However, the histogram *must* show 10% of players (of whatever the population it's based on) in each 10 percentile grouping.
***
You seem to be saying that we should keep it like this:
10th .290
50th .330
90th .370
And then be happy that 95% of the data falls between the 10th and 90th points.
Well, from that standpoint, why not set the percentile ranges so wide to ensure that 95% of the data falls between the 45th and 55th points?
***
I think you are conflating the issue of accuracy with the issue of bias. The histograms here speak only to the issue of bias. They say nothing about the accuracy of the mean forecasts; they only say something about the "accuracy" of setting appropriate ranges.
BillJohnson, you do recognize that PECOTA itself comes with a list of percentiles, and that is what we are evaluating, right? I want to make sure there's no confusion on that. So PECOTA is telling us that it thinks something will happen 10% of the time, so we want to check to see if that does happen 10% of the time.
Now, if we were just using the mean PECOTA forecast, then yes that could very well be Gaussian and would be tighter if PECOTA were better. But that is not what we are talking about here. We are not evaluating how well PECOTA projects mean performance, we are evaluating the very percentiles (really deciles) that PECOTA is giving us.
For the projection system to "work right" is one thing; for it to be "useful" is another, quite different thing, and harder to achieve. What you're saying is that an optimally "useful" PECOTA would have the gap between any player's 10th and 90th percentile projections be as small as possible, AND players' performances against the projection system would continue to be clustered right around the 50th-percentile level or a bit higher (for reasons Colin describes). This is entirely fair and to the point.
If PECOTA had perfect knowledge of the forward-looking capabilities of every player -- not what they *will* do in the coming year, but what their skills will *allow* them to do -- then the gaps between 10th and 90th percentile predictions would be much narrower than they are. They still would be non-zero, because there is a random component to how players perform (e.g. fluctuations in BABIP) as well as sensitivity to things the players can't control (e.g. strike zones). For exactly the same reason, there would continue to be players who, even laying aside injury-driven collapse, fail to meet their 10th-percentile, or manage to exceed their 90th-percentile, projections.
What you are asking for, quite reasonably as a paying customer, is a PECOTA where both of these conditions are met: the prediction bands are narrower than they currently are (i.e., the algorithm is well informed), and the actual performances fall in as narrow a distribution around the 50% predictions as luck will allow (i.e., the algorithm "works right"). That's the holy grail of these prediction algorithms -- so perhaps the discussion should turn to how to get there.
This is plotting percentile results, and by the definition of percentiles, there should be 10% of the players in the 0-10 percentile range, 10% of players in the 10-20 percentile range, etc.
(Note to Colin: I definitely think you should update your chart to reflect the 0-10, and 90-100 numbers. I think this makes it far more clear, considering that the area above the 10% line has to equal the area below the 10% line.)
That's for hitters. For pitchers, it's even worse. 50% of the pitchers (min 70 IP) had an ERA outside the 10-90 percentile ranges, whereas we would have expected just 20% total. It's an alarming total. When Felix Hernandez's 90th percentile is 3.20, and he, for two years in a row, achieves an ERA below 2.50, then you know something is dreadfully wrong.
Now, Colin makes a good point that ERA includes sequencing, something we've talked about a lot here in the past few weeks. The equivalent to a hitter's TAv would be a pitcher's peripheral ERA (component ERA, BaseRuns ERA, or what have you). If we do that, we get something for pitchers similar to what we saw for hitters. Therefore, if the test is not going to be against ERA, but peripheral ERA, then the PECOTA percentile page should show the header as peripheral ERA.
Nonetheless, a huge issue.
Thanks Colin. You should be proud for doing the right thing.
I have to disagree with your assertion that in a perfect world you'd have 10% of all players in each 10% group, because we have selected a group of players who reached 300 PA. As Colin already said:
Fundamentally, players who perform above their expectations are more likely to get playing time than players who perform below their expectations.
So there. Also, the ERA distribution was explained by the fact that PECOTA was not accounting for uncertainty in defense/etc. and was just looking at true ability. I personally would prefer the percentiles to account for luck/defense/etc., but you haven't provided proof that the percentiles are not doing what they are supposed to be doing.
That is, what subpopulation of the 1000 batters in 2010 are we looking at?
And once you look at that subpopulation, do we also see 10% of them going below their 10th percentile forecasts?
I would bet that there is NO subpopulation that you can select where the percentiles come anywhere close to 8%-12% in each 10% bucket.
Variance is a PITA to estimate, and PECOTA and BP wouldn't be the first (or last) to have underestimated just how much noise there really is in the data.
But, you are ignoring the fact that players who are proven "starters" will be given much more leeway, even when they are failing to put up replacement-level offense (see: Aaron Hill and most of the Cardinals in 2010). So, we should expect a large cluster near the bottom while these players continue to get playing time.
10, 10, 10, 10, 10, 10, 10, 10, 10, 10
It would be:
4, 5, 6, 7, 8, 10, 12, 14, 16, 18
Or something.
And no way do we see anything like that.
But, the point still stands, if PECOTA is saying: "I expect this player to exceed his 90th percentile 10% of the time", then how are we to evaluate that?
Is it that we are to look at all 1000 batters in MLB, and have no PA minimum? Is it that the claim will only exist if the player is allowed to have 300 PA?
PECOTA is the one making the claim. Therefore, let's see what the conditions are in which we expect the 90th percentile to be exceeded, and let's test it based on that basis.
***
In any case, the #1 problem with the percentiles is that the uncertainty range has to be based almost entirely on the sample size of the player's past performance. And this is not at all what PECOTA has been doing.
Colin himself acknowledges it exactly:
"This is a relatively simple fix—the uncertainty in a forecast is largely a function of the amount of data you have on a player."
Love the attitude. Real comment to follow...
Look...90% of us are casually interested fans of sabermetrics and advanced statistical analysis. We don't have a horse in the race of which-system-is-better, nor do we take sides in the which-site-has-better-analysts debate.
That said, the nitpickiness, contradiction, and barely-veiled undermining from competitors and detractors on these comment boards are really annoying.
Someone said it well yesterday - aren't you all batting for the same team here? Aren't we all trying to raise usage of advanced baseball analysis? This infighting is just dumb. Stop driving the casual fans away with your annoying bickering about who is better than who.
If there are flaws in existing techniques, it is a good thing for the entire baseball analysis community for those flaws to be corrected. We all want to see the best possible analysis whether it comes from BP or somebody else. I think the main problems arise when a lack of transparency (perhaps coupled with some marketing hype) gives something the illusion of being the best it could be when it in fact is not. I applaud Colin for doing the dirty work to make the process much more transparent and therefore much more open to improvement.
Talking about the competition, and talking about the flaws in PECOTA is good for everyone. Colin has addressed a number of critical issues. Tango's clearly interested in furthering the field; on his own blog he often touts others' research (and sometimes critiques others' research.) It's not just a "My game is better," thing.
I have little doubt that the majority of readers aren't supremely interested in the fine details - but the ones who are are worth something to the system.
Tango's point in the comments is very well taken; I have always been curious about the failure to have bigger ranges for players with limited histories (though the issue isn't just with players with limited histories, obviously.)
For the most part, there's nothing offensive in this. But there's enough finger pointing, accusing, and comparisons going on to make it really annoying to a casual reader.
By no means stop suggesting improvements. Just be aware of how you might come across if you choose your words poorly, that's all.
The "minus"es to my original comment were expected. There's a feature I wouldn't mind seeing go away, since it amounts to an opinion popularity contest.
I think you should be more explicit by pointing to actual examples. I will grant you that as a casual reader who might be giving cursory views to comments, it may seem combative. But, once you go deep into it, we're all a happy sabre family.
Well, you should, because how do we get resolution to problems unless we see the problem. And maybe it's not a problem, but a misinterpretation. As it stands, you pointing "something" out means nothing at all, since we (I) have no idea what you are talking about.
Why are so many people falling off the charts? I guess what I'd like to get from the projections is a pretty good idea of a likely upcoming season from a player, and then the probability of a windfall or a disaster. Not sure if the percentiles are the best way to do that given that this chart seems to be showing so many actual performances in the outlier columns. This may also contribute to inaccuracies in the star/scrub graphic for each player, I don't know, but if so it is not a good thing.
That being said, from a 40,000 ft view, it is EXTREMELY difficult to forecast baseball statistics for a single season on a large scale (obviously!). The moral of my story is, while we definitely strive for a successful forecasting model while looking at the next immediate season, I believe PECOTA does a great job of identifying performances over 900-1200 at bats and 300-400 innings.
This may be the funniest thing I read all morning. Thank you.
Those in the first hump (players who vastly underperformed their projections) are those who missed significant playing time due to unanticipated injury or got sent to the minors.
The second hump are those PECOTA nailed really well.
The third hump are those who got unexpected playing time (non-top-prospects who got a lot of ABs or IP- like Thole or Leake; those who overcame past injuries to play a whole season).
Average these together, and PECOTA comes up roses.
Colin, others, what say you?
I highly doubt it. But even if that's the case, then what in the world does "10th percentile" forecast mean? If you want to say that 50 of the 250 (or whatever) players with at least 300 PA had a very down year, then why are you setting the benchmark so high that 20% of the players reached a level that you said only 10% of the players should reach?
That is, vastly underperforming, or sent to the minors, while still reaching 300 PA is not a phenomenon limited to the year 2010.
You are *starting* with the position that only 10% will reach some baseline (hence the 10th percentile). Then you have to ask: "how much below my mean will that be?" And if a group of such players that you thought should have had a .270 TAv in fact reached only .230, then that's where the 10th percentile forecast should have been set, and not the .240 or .250 level that IS being used, such that 20% (instead of 10%) get below that level.
(All numbers for illustration purposes only.)
It goes back to exactly what I am saying: once you decide on the parameters of your subpopulation, then it's at that point that you test for the 10th and 90th percentile.
As it is, we have no way to test, because we are not being told what subpopulation to test against.
With that, you should see players grouped around the 50th percentile, with gentle downward slopes in each direction. I would also expect to see an upward slope, or even a spike, as you near the 0-20% ranges as those players will begin to receive less playing time as a whole (and will not have time to have their numbers even out - getting the short end of the small numbers stick), and maybe, though I'm not certain here yet, a slight uptick at the 90th percentile for those players that exceed their projections (because no one saw them coming) or that the system has a hard time nailing down (Jose Bautista & Carlos Gonzalez from the first group and Ichiro, etc. from the second).
It's not a measure of player percentiles or normalization from the league projections, but of how an individual player is likely to perform. In a perfect projection world every player would be right at the 50th percentile and every other percentile would be empty.
Ok, so if every single player has a 10% chance of being worse than his 10th percentile, why shouldn't we expect 10% of all players to hit that mark?
Right -- but as a POPULATION, wouldn't you expect 10% of players to have exceeded their 90th percentile projection?
Taking your explanation: if each individual player has a 10% chance of being worse than his 10th percentile forecast, and a 90% chance of being better, then when you take the whole set of players, shouldn't 10% of players have done worse and 90% better?
If you are laying out percentile bands, you would expect a flat histogram. That is how percentiles work!
Now, there are some confounding factors in terms of sample size. Even 300 AB is a low enough cut-off that there will be some noise. Plus the selection bias of "bad players lose playing time." But if every projected player were allowed to play a theoretical season of exactly 1,000 PA, the numbers should converge to exactly 10% falling within each 10-percentile band, right?
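That flat-histogram expectation is just the probability integral transform; here is a quick simulated check, using no real data:

```python
# A quick simulated check of the flat-histogram claim; no real data involved.
import random

random.seed(0)
counts = [0] * 10
for _ in range(100_000):
    p = random.random()                 # a calibrated forecast's realized percentile
    counts[min(int(p * 10), 9)] += 1    # is uniform, by definition
print([round(c / 100_000, 3) for c in counts])  # ~0.10 in every 10-percentile band
```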
Ok, say a player has an AWFUL start. Further assume he's young (or old, but not prime). He may well find his (MLB) season over. Down to the minors or riding pine. Had he kept playing, he might have improved enough that his overall stats ended up being closer to his projection, but the team couldn't/wouldn't wait for him to make the adjustment (or his luck to change). So he ends up with 300 ABs instead of, say, 600.
Or let's consider the player who, due to injury, plays a partial season - and plays spectacularly well. I'm thinking of Robbie Cano's 2006 here. He missed a month or so with a hammy injury. He hit .342/.365/.525. Had he gotten more plate appearances, I would have expected him to come down to earth a bit. But he didn't get them - he was on the DL.
Does any of that help explain how a system (any system, whether it's PECOTA, CHONE, etc) might miss?
There is no way you'd get this by looking at the PECOTA cards. If the percentiles don't mean anything for the broken-out components, don't make it look like they do. Among other things, it just makes the system look bad, when, just for one example, a guy like Mauer, who's essentially hitting his 50th percentile TAv projection right on the nose, has a number of home runs this season that the system _appears_ to say is nigh impossible.
Just my 2 cents. Enjoying the series.
It has long been my suspicion that the percentiles did not account for enough variability in performance (Felix Hernandez being just one prominent example), and this data proves that it is, indeed, a problem. I am in full agreement with TT that the fact that 36.5% of hitters are falling outside the 10/90 percentile bars is an alarming result.
"what do you make of the large clustering of outcome in the middle decile? "
I responded:
I just took one guy to see what the shape looks like. This is ARod
90th: 0.323
80th: 0.315
70th: 0.309
60th: 0.298
50th: 0.288
40th: 0.282
30th: 0.280
20th: 0.277
10th: 0.273
Look at the gap between 50th and 70th: 21 points. That's way wider than anywhere else.
So, the reason that PECOTA is capturing so many players in the 50-70 range is because it provides such wide latitude at the 50-70 range.
It won't catch much in the 30-40 range, because, well, look, there's almost no gap there.
I don't know if ARod is an example or an exception.
But, given that I've seen funny stuff, like Felix having a WORSE forecast at the 90th level than the 80th level, I think there is a serious programming bug as well.
The reversal of EqERA between King Felix's 80th and 90th percentile predictions is way down in the noise and not necessarily significant all by itself; the reason why one is the 80th and the other the 90th has to do with the number of innings pitched, which differs considerably between the two, so that VORP/WARP/etc. also differs by more (and in the right direction...) than EqERA implies. It is noteworthy, however, that his real-life performance far exceeds BOTH the 90th-percentile EqERA and the 90th-percentile number of innings pitched. Note also that fellow studs Doc Halladay, Adam Wainwright, Josh Johnson, etc., also exceeded their 90th-percentile EqERA by significant margins (although not always the 90th-percentile IP). So yeah, it sure looks to me like you've found a real bug here when it comes to exceptional performances.
"How much wider should the 70-80 band be, compared to the 60-70 band?" is an interesting and difficult analytical question. Whether or not it should be wider is not.
Incidentally, I've often wondered whether, in modeling pitchers' performance, it would make more sense to use a Poisson distribution rather than a Gaussian.
It's like saying this:
50th: .300
40th: .290
30th: .280
20th: .270
10th: .260
0th: .255
It's not going to happen. This is what it would look like:
50th: .300
40th: .290
30th: .278
20th: .262
10th: .242
0th: .212 (or .000 technically)
All numbers for illustration purposes only.
Incidentally, if you're talking about using a Poisson to model the number of runs allowed (or scored), it's better than a Gaussian but still not right. 0 occurs too frequently (IIRC) relative to a Poisson distribution.
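To put a rough number on that, under a Poisson model with a league-average mean of about 4.5 runs per team per game (an assumed, approximate figure), scoreless games should be quite rare:

```python
# Poisson check on scoreless games, using an assumed league-average scoring rate.
import math

mean_runs = 4.5
p_zero = math.exp(-mean_runs)   # Poisson: P(X = 0) = e^(-lambda)
print(round(p_zero, 4))         # ~0.011, i.e. roughly 1% of team-games
# The commenter's recollection is that shutouts happen noticeably more often than
# that, which is the sense in which runs scored are over-dispersed at zero.
```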
the spread of percentiles is too narrow, meaning any big miss is going to end up outside those 10%/90% goalposts...
Percentile   ERA    EqERA
90th         3.23   3.31
80th         3.22   3.30
70th         3.27   3.35
60th         3.30   3.39
50th         3.54   3.63
40th         3.57   3.66
30th         3.68   3.77
20th         3.75   3.85
10th         3.87   3.97
Three points:
1. PECOTA is already giving us "EqERA", which is the peripheral or component or luck-free ERA we've been talking about.
2. In addition to that PECOTA is giving regular ERA (which should be much wider because it includes more luck from sequencing events, etc).
3. Look at Felix's forecast at the 80th and 90th levels. Obviously wrong. Look how wide it is at the 50-60 level, and then, how tight it is everywhere else. You are naturally going to capture more players in the 50-60 level if you are putting in estimates that are much wider at those levels.
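A sanity check along the lines of point 3 is easy to script; the EqERA values below are the ones quoted in the table above:

```python
# Flag percentile levels where the forecast gets worse as the percentile gets better.
def non_monotonic_levels(card):
    """card: {percentile: EqERA}. Returns percentiles where EqERA rises versus the level below."""
    pcts = sorted(card)
    return [hi for lo, hi in zip(pcts, pcts[1:]) if card[hi] > card[lo]]

felix_eqera = {10: 3.97, 20: 3.85, 30: 3.77, 40: 3.66, 50: 3.63,
               60: 3.39, 70: 3.35, 80: 3.30, 90: 3.31}
print(non_monotonic_levels(felix_eqera))  # [90] -- the 80th/90th reversal
```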
What is your overall opinion of the percentile forecasts? Are they
1) Useful and potentially possible to calculate accurately, once all the bugs are worked out?
or 2) Useful, but probably impossible to calculate
or 3) Not worth doing
Also, this is probably the best series of discussions in a long time. I was on the fence about subscribing next year, but articles like this keep me coming back.
Colin is accepting the position I've held, and MGL reiterated, and, really, what any stats professor would tell you, and that is that the uncertainty of your estimate is based on the size of your observed sample. What has been frustrating for me is that this is so obvious and commonly accepted that I was getting push back on it (not from Colin). Now, Colin is going to be novel about it, and add more to the uncertainty by looking at the kind of player you have (maybe there's more uncertainty in the mean of old players, or fast players, or whatever). That's good, but more important is to get the basics down, which is what he is going to do.
Now, is it necessary to publish the 10th and 30th and 80th percentiles? Why not just say:
Pujols .330 +/-.030 (where that's one standard deviation)
Why does this help? Because you can then do this for Pujols' PA:
Pujols 610 +/- 70
The way the percentiles are currently laid out, it tries to give you both, but it doesn't really. As Colin noted, it "infers" all the component stats based on the TAv stat.
Why not do:
Pujols
(K/PA): .07 +/- .02
(BB/PA): .18 +/- .03
And so on.
Wouldn't that convey far more information, while using up the same amount of real estate?
(Note: not all things are symmetrical. You can get away with that on the rate stats, but not on playing time. On that one, and that one alone, I would LIKE to see the percentile forecasts.)
You can also follow the thread at my site, where MGL made a good point.
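For what it's worth, under a normal assumption the mean-plus-SD form and the decile list carry the same information; here is a hedged sketch of the conversion, using Tango's illustrative .330 +/- .030 numbers:

```python
# Converting a mean and one SD back into decile cut points, under a normal assumption.
from statistics import NormalDist

def deciles_from_mean_sd(mean, sd):
    dist = NormalDist(mu=mean, sigma=sd)
    return {p: round(dist.inv_cdf(p / 100), 3) for p in range(10, 100, 10)}

print(deciles_from_mean_sd(0.330, 0.030))
# 10th ~ .292, 50th = .330, 90th ~ .368. As the note above says, the symmetry
# assumption is the catch for playing time, where an explicit percentile list helps.
```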
Your suggestion of:
Why not do:
Pujols
(K/PA): .07 +/- .02
(BB/PA): .18 +/- .03
is totally valid, but not very digestible as an end product for mass consumption. We need a final stat line for the year!
If you are talking about rookies and guys with limited playing time, sure.... but bench players you don't care about, and all the rookies will have such wide ranges as to be useless as well.
Same thing for relievers... they'll all have similar ranges.
So, I see no practical use for a Fantasy player for the ranges.
What you DO want to have the ranges for is playing time. That's where the value is.
I strongly disagree here too. Wouldn't the range for guys like Adam Dunn or Ichiro be significantly smaller than the range for a guy like Aubrey Huff? As a fantasy player, you want to make decisions based on risk-reward, and there are times when you might prefer mediocre but reliable performance to going for all or nothing.
I'm not a fantasy guy, so no comment on the practicality question, but for one who just loves the game and strives to understand it, the ranges are nice to see -- if they work.
and
"If you want to say that "most" starting regulars should have fairly similar ranges, I'd agree. But not "all" or anything close to it."
In reality, you are right. Insofar as what the data can possibly tell us, our ESTIMATES will have their ranges virtually all similar (beyond whatever their past number of PA would indicate). Only cases like Ben Sheets or other players with injuries will be exceptions.
Otherwise, I would be shocked that the 90th percentile of every player is not something like mean TAv + 1.15 to 1.20 and the 10th percentile is not TAv -1.25 to -1.30. Something along those lines.
If someone is arguing that you are going to have some players at TAv +1.10 and others at TAv +1.40, then I don't think your expectations are going to be reasonable.
(Again, presuming we are looking at similar past PA for the players in question, and injuries notwithstanding.)
You might get a skew based on age, but again, that would apply across the board to everyone at that age.
Anyway, let's see what Colin will discover with the refreshed PECOTA, let him make his claim, and then just test it.
1) How reliable is the baseline? A player with a long, consistent MLB performance history has a more reliable baseline than Matt Wieters in April 2009.
2) How reliable is the similar player pool? A player who has a lot of similar past players to compare would have a more reliable number than Ichiro.
3) How much variance is there in the projections? Based on the similar players available, how much variety in performance has there been? Presumably certain types of players have a smaller range of potential performances than others.
I don't know if it's possible to assign numbers in this fashion, but if so, then it would also be easier to test the accuracy of the system by comparing apples to apples. This may also reveal particular weaknesses/biases of the system.
I would think that publishing the numbers with a ± interval is systemically wrong. You say that not all things are symmetrical... I would imagine that very few really are. The assumption of normality that is also implied when you start using "standard deviation" is also a big leap. Publishing percentile data shows at least a rough outline of what each player's probability distribution looks like, which then allows factors like "breakout" or "collapse" to be included in the model based on the comps.
The number one thing I want from a projection system is some sort of assessment of the player's potential or ability. How good is this guy now, and how good can he be in the future?
Once you have that information, then you can do things like run thousands of simulations of the next season and present a range of outcomes that we may see from that player.
In essence, the first part is rate stats, and the second part is to apply those rate stats to produce actual projections of things the typical fantasy player cares about: counting stats like homers, steals, strikeouts, etc.
My feeling is that PECOTA tries to sort of do both of these things at once which I think is a mistake. Perhaps it's not actually doing both things at once in terms of modeling, but at a minimum the presentation feels that way, and I think it muddies the water and is very confusing.
I would much prefer a very clear and distinct assessment of player ability or potential(without regard to projected playing time). That is information that I can't easily provide or create myself(at least in the comprehensive way that PECOTA can). What I can do on my own is get a handle on playing time from various internet sources. The competitive advantage that we as PECOTA users could enjoy I believe comes from applying our personal knowledge/hunches about personnel situations combined with PECOTA's knowledge of what a player might do if given a chance. But if you only present the data through the prism of how BP projects playing time, it makes it harder for me to get the information I really want from PECOTA.
If the sample Colin chose is representative (300 PA minimum), then the percentiles are busted. Absotively busted. Or, at the very least, they're not going by the right name.
In this sample, PECOTA actually does a pretty good job at capturing the center - 23.9% of the sample fell within the 40-60th percentiles. Not too shabby at all. But, as Colin points out, and Tom and others expound upon, the percentiles above and below this midpoint are under-predicted, until you get to the way-way outliers.
It's pretty straightforward to see that the overall spread is too tight, and would need to be widened to re-capture the true 0-10% and 90-100% ranges and distribute them into the troughs.
As several commenters have done, it's easy and kind of fun to rationalize why a player might perform under his 10th percentile, or over his 90th. But, at the end of the day, only 10% of the overall population should fall into those categories if they are really percentiles at all (and if this sample is representative).
As I said, this is amazing - to see this information analyzed and published for "review." I almost say "peer review," but that would imply something I'm not willing to accept.
Kudos to Colin, Kevin, and the rest of the team. I look forward to the offseason developments with excitement.
Again, kudos. Have a great weekend watching the regular season come to a close! Best, Burr
In the context of this observation from Colin: "... apparently there’s more uncertainty on the downside than the upside. This is something we can build into our model as well;" I wonder if our initial observations were incorrect.
It might be worth revisiting.