
To access the 2012 PECOTA spreadsheet, click here.

Madame Sosostris, famous clairvoyante,
Had a bad cold, nevertheless
Is known to be the wisest woman in Europe,
With a wicked pack of cards. Here, said she,
Is your card, the drowned Phoenician Sailor,
(Those are pearls that were his eyes. Look!)
Here is Belladonna, the Lady of the Rocks,
The lady of situations.
Here is the man with three staves, and here the Wheel,
And here is the one-eyed merchant, and this card,
Which is blank, is something he carries on his back,
Which I am forbidden to see. I do not find
The Hanged Man.  Fear death by water.

—T.S. Eliot, The Waste Land


PECOTA has arrived.

BP’s projection system, at its core, follows the same basic principles as it has before. We begin with our baseline projections, which start with a weighted average of past performance, with decreasing emphasis placed on seasons further removed from the season being projected. Then that performance is regressed to the mean. After that, we use the baseline forecast to find comparable players (while also taking into account things like position and body type) and use those to account for the effects of aging on performance.

Every season we put PECOTA under the knife, looking for things we can improve to make sure we’re coming up with the best forecasts possible. Sometimes what we come up with is a minor tweak. At other times, though, what we unearth is not only more significant, but an interesting baseball insight in its own right, even aside from its inclusion in PECOTA.

This season, we’ve made some rather radical changes to how we handle the weighted averages for the PECOTA baselines—we still deemphasize past seasons, but nowhere near as much as we used to. With such a dramatic and counterintuitive change, we thought it best to give our users an explanation of what was changed and why so that they could correctly use and interpret the PECOTA forecasts.

Last year, I was asked to appear on a Chicago sports talk station to discuss the town’s two teams, in particular how PECOTA saw them faring. I said many things, most of which don’t bear repeating (or for that matter remembering) this far past, but there was one thing I remember saying, and it probably does bear repeating—I expected Adam Dunn to be the best hitter on the White Sox in 2011. Suffice it to say, this statement does not represent my finest hour as a baseball analyst. Consequently, I’ve spent a bit of time thinking about Adam Dunn and whether there was anything in 2010 or earlier that hinted he might be capable of a season like 2011. In other words, is there anything that I know now about forecasting in general that would allow me to predict what happened using only what I could reasonably have known about Adam Dunn before the start of the season? The conclusion I’ve come to is that no, there really wasn’t. What happened to Dunn was, in essence, unforeseeable given what we knew heading into last season. That’s the bane of forecasting—no matter what you do, reality in all its many variations is always going to be able to surprise you.

Now it’s time to predict 2012’s stats, and PECOTA has learned from its mistake. No longer does it declare Dunn the best hitter on the White Sox. It has been humbled, dropping Dunn… all the way to second place, behind Paul Konerko. This is partly due to the fact that the White Sox are not a very good hitting team as currently constituted, having traded away Carlos Quentin during the offseason, but part of it is because PECOTA sees a far greater chance of the Adam Dunn that mashed baseballs for the better part of a decade showing up next year than the putrid Adam Dunn the White Sox saw in his first season on the South Side.

Naturally, some of you are going to look at PECOTA’s forecast for Dunn, think back to his abysmal season, and say, “I’ll take the under, thanks.” But PECOTA knows about his terrible performance just as we do; at its core, PECOTA takes past baseball statistics and applies a set of rules to them to come up with an estimate of what a player’s future statistics will be. If PECOTA is too optimistic about Adam Dunn, the culprit can be found in the rules governing the amount of emphasis to be placed on recent performance.

Of course, in tying myself so explicitly to Dunn, I run the risk that—to be blunt about it—he sucks again. I’m reminded of an article Ron Shandler wrote prior to the 2005 season, where he said:

As an example, let's look at Pujols. After hitting 37, 34, 43, and 46 HRs, his baseline projection called for 42, which represented a normal regression to the mean. However, our flags pointed out consistent upward trends in contact rate, fly ball ratio, batting eye and a second half surge in his power index. Add in his alleged age (25) and a reliability rating of 94, and all signs pointed north for his power trend to continue. Our projection now calls for 50 HRs.

Why 50? I believe it is reasonable to expect Pujols to maintain his second half PX level for a full six months, given the trends in his skills. For some people, it might take a moment to accept 50, but the more you look at it, the more it passes the eyeball test. This is a player with no true comparables in history. All we have is our eyeballs and a general idea of what makes sense. Fifty makes sense to me.

Shandler probably should have left well enough alone; Pujols hit 41 home runs in 2005, and he’s never hit 50 or more home runs in a season. But it all comes down to the same set of questions: How much emphasis should we put on Dunn’s utter collapse, or on a young Pujols’ second-half power index? We don’t just have our eyeballs to rely on—we have decades of past baseball stats we can use to come up with an idea of how to weight baseball stats in relation to one another.

So, let’s build ourselves a forecasting model and see how various changes to the backweighting affect the forecasts, as well as try to determine the correct way to derive the backweights. For the sake of illustration, we’re going to use a much, much simpler model than PECOTA (it will remind many of you of the Marcels done by Tom Tango). To predict future TAv (from here on out, TAv_OBS), we will use three years of past TAv, where TAv_1 means one season prior to TAv_OBS, TAv_2 is two seasons prior, and TAv_3 is three seasons prior. The simplest model we can come up with is:

TAv_OBS = (TAv_1 * PA_1 + TAv_2 * PA_2 + TAv_3 * PA_3)/(PA_1+PA_2+PA_3)

What we have here is a weighted average of a player’s TAv for the past three seasons. But let’s suppose that we want to downweight less recent seasons based on our intuition that more recent seasons are more reflective of a player’s current ability level. We would modify the formula as such:

TAv_OBS = (TAv_1 * PA_1 * a + TAv_2 * PA_2 * b + TAv_3 * PA_3 * c)/(PA_1 * a+PA_2 * b+PA_3 * c)
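As a rough illustration of the two formulas above, here is a minimal Python sketch. The function name `weighted_tav` and the sample player are hypothetical and are not part of PECOTA or the Marcels. The yearly weights a, b, and c are passed in as `year_weights`, most recent season first.

```python
def weighted_tav(tavs, pas, year_weights=(1.0, 1.0, 1.0)):
    """PA-weighted average of past TAv, most recent season first.

    year_weights plays the role of a, b, c in the formula above;
    the default of 1/1/1 reduces to the simplest model.
    """
    numerator = sum(t * p * w for t, p, w in zip(tavs, pas, year_weights))
    denominator = sum(p * w for p, w in zip(pas, year_weights))
    return numerator / denominator


# Hypothetical player: TAv of .300, .280, .250 over 600, 500, 400 PA.
tavs = [0.300, 0.280, 0.250]
pas = [600, 500, 400]

plain = weighted_tav(tavs, pas)                      # no downweighting
marcel_style = weighted_tav(tavs, pas, (1.0, 0.8, 0.6))
print(round(plain, 4), round(marcel_style, 4))
```

Downweighting the older, weaker seasons pulls this sample player's estimate up, which is the behavior the yearly weights are meant to produce.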

So how do we come up with our yearly weights? What we can do (and what many other forecasters have done) is use an ordinary least squares regression to come up with weights for each prior season. The simplest way to do this is to use TAv_1 through TAv_3 to predict TAv_OBS in our regression. If we do so, we get:

TAv_OBS = 0.47 * TAv_1 + 0.32 * TAv_2 + 0.18* TAv_3

According to this model, the most recent season is nearly 1.5 times as predictive as the second-most recent season and over 2.5 times as predictive as the third-most recent season. Recasting the coefficients so that the first season is equal to one, I get 1/.6/.4. (This is similar but not an exact match to the weights used in the Marcels, which work out to 1/.8/.6.)

[I’ve set the intercept to zero, because our weighted average formula lacks an intercept and this makes it a slightly more representative model, although the effect on the relative (rather than absolute) value of the weights is rather modest. If you include an intercept, it will essentially behave as the regression to the mean component of the forecast, which we’ll address separately in a moment.]

The trouble is that this kind of regression doesn’t truly model how the weights will be used in practice. From now on, we’ll call it our unweighted model. With a little bit of algebra, we can redistribute the formula like so:

TAv_OBS = TAv_1 * PA_1 /(PA_1+PA_2+PA_3) + TAv_2 * PA_2 /(PA_1+PA_2+PA_3) + TAv_3 * PA_3 /(PA_1+PA_2+PA_3)

If there were no need for downweighting of past data, this would provide the proper weighted average we need for our forecasting model. For the sake of brevity, we will refer to

TAv_1 * PA_1 /(PA_1+PA_2+PA_3)

as TAv_1_W (for weighted), and so on. If we plug those into our regression model, we get some radically different weights:

TAv_OBS = 1.03 * TAv_1_W + .95 * TAv_2_W + 0.93 * TAv_3_W

These values are on a very different scale, since due to the lack of an intercept the values have to sum to one for the first regression and to three for the second regression, but they’re also very different in a more meaningful sense; recasting the first year to 1 (which is practically already done for us), we get weights of 1/.92/.90.
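To make the weighted regressors concrete, here is a small Python sketch of how TAv_1_W through TAv_3_W could be built for one player. The helper name `pa_share_terms` and the sample numbers are assumptions for illustration only. The coefficients 1.03, .95, and .93 are the ones quoted above.

```python
def pa_share_terms(tavs, pas):
    """Return [TAv_1_W, TAv_2_W, TAv_3_W]: each TAv times its share of total PA."""
    total_pa = sum(pas)
    return [t * p / total_pa for t, p in zip(tavs, pas)]


# Hypothetical player: TAv of .300, .280, .250 over 600, 500, 400 PA.
terms = pa_share_terms([0.300, 0.280, 0.250], [600, 500, 400])

# With coefficients of exactly 1/1/1 the model is just the sum of the terms,
# which is the plain PA-weighted average with no downweighting.
plain_average = sum(terms)

# The fitted coefficients quoted in the text tilt that sum slightly
# toward the most recent season.
fitted = 1.03 * terms[0] + 0.95 * terms[1] + 0.93 * terms[2]
print(round(plain_average, 4), round(fitted, 4))
```

Because the PA shares always sum to one, coefficients near 1/1/1 mean the regression is asking for very little adjustment beyond the playing-time weighting itself.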

In this second method, we get a result that seems contrary to our intuition—the most recent season is only slightly more predictive than older seasons. How can we assure ourselves that the less intuitive model is still more correct? We can look to the regressions themselves for one piece of evidence. The r-squared of the first regression is .27, compared to .38 for the second regression. It’s also more consistent with the way the weights will actually be used in practice.

What’s interesting is that by themselves, the PA weights have no meaningful predictive value—by definition, they have to sum to one for every player, and including them in the regression as separate variables doesn’t do anything to increase the predictive power of the regression. It’s not the distribution of past playing time that’s affecting the model, but rather what that distribution tells us about the TAv values themselves.

Ideally, we’d compare both methods with known good values for what the seasonal weights ought to be and determine the correct method by whichever provides the more accurate results. But we don’t have known good values—if we had, we could’ve used those instead without messing around with any of this in the first place.

While we can’t get known good values for real data, though, we can get known good values for fake data—in other words, a simulation. In this case, a simulation is startlingly simple to do; we assume that a player’s TAv_OBS is his true talent level and that all past seasons are equally predictive if PA are held constant. Then we simply take a player’s PA in each of the three preceding seasons and use a random number to come up with TAv values for each preceding season that reflect a combination of a player’s true talent and random variance. (For those who care about the technical details: we generate a random number between 0 and 1, convert that from a percentage to a z-score, multiply by the expected random variance, assuming TAv is a binomial, and add that to TAv_OBS.)

Running regressions on our simulated data, we get weights of 1/.8/.3 for our unweighted model compared to 1/1/1 for the weighted model. We constructed our simulation to behave as though player talent was absolutely stable from season to season, so we can confirm that the second set of weightings is correct here, which we couldn’t do with the first set of regressions that featured real-world data. The unweighted method, in this case, still downweights past seasons, which shouldn’t be the case.
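The simulation can be sketched in a few lines of Python. This is not the code behind the numbers above; it is a toy version under stated assumptions: talent is drawn from a normal distribution around .260, PA are drawn from made-up uniform ranges in which older seasons tend to have fewer PA, and the binomial variance is approximated with a normal draw (equivalent to turning a uniform random number into a z-score). The names `fit_no_intercept` and `simulate` are hypothetical.

```python
import random


def fit_no_intercept(xs, ys):
    """Least-squares coefficients for y = b1*x1 + ... + bk*xk (no intercept)."""
    k = len(xs[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in xs) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(xs, ys))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def simulate(n_players=20000, seed=2012):
    rng = random.Random(seed)
    raw_x, share_x, target = [], [], []
    for _ in range(n_players):
        talent = rng.gauss(0.260, 0.025)  # assumed talent spread
        # Assumed PA ranges, most recent season first; older seasons
        # have fewer PA on average.
        pa = [rng.uniform(350, 700), rng.uniform(200, 700), rng.uniform(50, 700)]
        # Observed TAv = stable talent + binomial-style noise that shrinks with PA.
        tav = [talent + rng.gauss(0, 1) * (talent * (1 - talent) / p) ** 0.5
               for p in pa]
        total = sum(pa)
        raw_x.append(tav)                                        # unweighted model
        share_x.append([t * p / total for t, p in zip(tav, pa)])  # weighted model
        target.append(talent)  # TAv_OBS is treated as true talent
    return fit_no_intercept(raw_x, target), fit_no_intercept(share_x, target)


unweighted, weighted = simulate()
print("unweighted:", [round(c, 2) for c in unweighted])
print("weighted:  ", [round(c, 2) for c in weighted])
```

Under these assumptions the weighted fit should come back close to 1/1/1, while the unweighted fit gives its largest coefficient to the season with the most playing time, even though every season was generated to be equally informative per PA.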

There are three important practical takeaways from this finding. The first and most obvious one is that projection systems that dramatically emphasize a player’s most recent performance will be biased against players with poor recent results and toward players with good recent results. Players are more likely to bounce back from poor seasons or revert back to type after exceptional seasons than those sorts of models would predict.

It also suggests that three years is not enough data for a forecasting model to use. If you assume the Marcel weights are accurate, then it makes sense that older seasons wouldn’t add much value to your forecasting models. However, if the decline in value of older seasons is much more subtle than that, you can make good use of five or even seven years of data, if not more.

The third, and perhaps most important, takeaway has to do with regression to the mean. We can add a simplistic version of regression to the mean to our forecasting model by adding a TAv_REG of .260 (the league average) with a PA_REG of 1200. (The PA_REG comes from the Marcels; it’s included here mostly for the purposes of illustration. The regression component in PECOTA is a more rigorous model based on random binomial variance; again, the purpose here is only to illustrate the concepts.)

Consider a player with 650 PAs in three straight seasons, or 1950 total PA. Using the Marcel weighting of 1/.8/.6, that comes out to 1560 effective PA; in other words, 20 percent of the player’s PAs during that period are thrown out. That means about 57 percent of a player’s forecast comes from his own performance, and about 43 percent comes from the regression to the mean component. Using weights of 1/.92/.90 yields 1833 effective PA, throwing out only about six percent. Using the same regression component, that’s 60 percent of a player’s forecast coming from his own production and only 40 percent coming from regression to the mean. (And if you follow from the conclusions above and start using more years to forecast a player as well, even less regression to the mean is necessary.)
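The arithmetic in this example can be checked with a short Python sketch. The function names are hypothetical. The TAv_REG of .260 and PA_REG of 1200 are the illustrative values from the text, not PECOTA's actual regression component.

```python
def effective_pa(pas, year_weights):
    """Playing time that survives the yearly downweighting."""
    return sum(p * w for p, w in zip(pas, year_weights))


def own_share(pas, year_weights, pa_reg=1200):
    """Fraction of the forecast that comes from the player's own record."""
    eff = effective_pa(pas, year_weights)
    return eff / (eff + pa_reg)


def regressed_forecast(tavs, pas, year_weights, tav_reg=0.260, pa_reg=1200):
    """Weighted average with PA_REG plate appearances of league-average TAv mixed in."""
    num = sum(t * p * w for t, p, w in zip(tavs, pas, year_weights))
    return (num + tav_reg * pa_reg) / (effective_pa(pas, year_weights) + pa_reg)


pas = [650, 650, 650]
for label, weights in (("1/.8/.6", (1.0, 0.8, 0.6)), ("1/.92/.90", (1.0, 0.92, 0.90))):
    eff = effective_pa(pas, weights)
    print(label, round(eff), f"{own_share(pas, weights):.1%}")
```

For a hypothetical .300 hitter in all three seasons, `regressed_forecast([.300] * 3, pas, (1.0, 0.92, 0.90))` lands a little closer to .300 than the same call with the 1/.8/.6 weights, since less of his record has been discarded before the league average is mixed in.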

Regression to the mean is a valuable concept to keep in mind when forecasting, but increasing statistical power (in other words, the amount of data used to make a forecast) is a far better solution whenever possible. Discarding data (or in this case, downweighting it) in favor of regression to the mean is only advisable when there is conclusive evidence that the data being discarded or downweighted is less predictive.

As a result of its revamped weighting, PECOTA is going to be more bullish on players coming off a bad year and more bearish on players coming off a great year than many other forecasting systems. We’re okay with that. We believe that a full accounting of the historical data supports what we’re doing with PECOTA, and we think a forecasting system with a uniquely accurate outlook is more valuable than one that conforms.
 

UPDATED: Coming soon, we'll have a more in-depth look at how the new PECOTA stacks up, including RMSEs against Marcels. Some quick examples beforehand: the recent poster boy for “New PECOTA” would probably be Francisco Liriano, whose 3.60 PECOTA forecast for 2010 was almost identical to his real-life 3.62 ERA, while Marcels weighted his recent past (2009 was horrific, and many observers wondered if he'd ever return from his injury woes) and forecast a 4.88 ERA. The ERAs cited here were derived using a 3rd-party version of Marcels (don't want anyone thinking we cooked the books), against the “New PECOTA” system applied retroactively. 

Some hits are obviously due to differences between the systems, such as Aaron Harang moving to PETCO (4.01 PECOTA, 3.64 real, 4.74 for Marcel, which doesn’t account for park effects). With other pitchers, it's just a matter of missing the least, such as when Mike Scott unveiled his nuclear splitter for 1986 (4.55 PECOTA, 2.22 actual, 3.79 Marcels). Usually, pitchers don't leap like Mike; their dramatic improvements are quirky statistical samplings which need to be included, but should be weighted little more than earlier seasons. A more recent example is Tim Redding, who posted ERAs of 5.72, 10.57 (in just 30 innings), DNP, 3.64, and 4.95, the last in 2008. PECOTA wasn't impressed with his recent exploits, and projected a 5.30 ERA, compared to 4.51 for Marcels. His actual 2009 ERA was 5.10, and his latest pitching exploits involved a combined 6.24 ERA for two Triple-A teams. --Rob McQuown

Tipman
2/08
Access is forbidden...and when you go to the fantasy page, it gives you the 2011 pecota. I almost had a heart attack when I saw K Escobar expected to win 10 games for the Mets again!
vonstott
2/08
Argh - access was denied for me as well.
drpjlang
2/08
Directory access is forbidden.
PhillyPhreak
2/08
At least it's not only me...
mcquown
2/08
I'm fixing that link now... Premium subscribers can access it through their "manage profile" link on the login bar.
vonstott
2/08
Not there, either (for me) maybe because I'm month-to-month?
PhillyPhreak
2/08
I can't see it on the manage profile link. Oh well, I'll just sit tight. Although this means I won't get any work done today.
apbadogs
2/08
I only see 2011 spreadsheet.
uptick
2/08
what's up with the "directory access is forbidden" when trying to download the spreadsheet?
swivelhips
2/08
Same here Access Forbidden. What's up with that?
PBSteve
2/08
Apologies... We're working on this.
mjglenn
2/08
If you go to "Depth Charts" and click "Raw CSV Data" you can get the 2012 PECOTA Spreadsheet.
mcquown
2/08
Those are only for the players in the PFM. The PECOTA spreadsheet contains a lot more projections.
mcquown
2/08
All access problems *should* be cleared up now, though I'll verify 100% that the people who posted here are not blocked. Feel free to use the Contact form to tell me directly if you're still having trouble.
mcquown
2/08
"manage your profile" is a link on top of the home page inside the blue login bar, just below the BP logo. It takes you to a page where you can view and edit information about your account, including your past few comments, password changes, and any downloads you're entitled to receive.
mcquown
2/08
Egg on my face, I was wrong - one more setting which was blocking a few people. *Now* everyone should be fine.
rmelby
2/08
What happened to the "LG" AL/NL column?
apbadogs
2/08
Jackpot!! So much for any productivity today at work!! WOOHOOO!!!! :)
vonstott
2/08
What grade was Miggy Cabrera in when he weighed 180? LOL
buddons42
2/08
2nd or 3rd, probably.
kcshankd
2/08
...and thank you very much. Small complaint: please never list by first name. Pujols, Albert, or separate cells. My first action is always to sort alphabetically so I can search easily. Now, not so much.
cdluzier
2/08
I agree. please post by their last names
brownsugar
2/08
Agreed. Excel does have a workaround though. Insert a blank column to the right of the names, go to Data->Text to Columns, and check the 'space' button on the second menu screen. This will separate the first and last names into two separate columns.

IIRC, Jorge de la Rosa screws it up though. Best to add about 4 blank columns to the right of the names so there is space for the de, the la, and the Rosa.
kcshankd
2/09
Did that after posting this morning and had the same result.
Several 'de's to sort out, and the whole process is silly. I can't see anyone wanting to sort 1000s of baseball players by first name.
frampton
2/09
Copy the name field to the first empty column on the right side of the spreadsheet, and THEN run text to columns. You can then filter for non-blanks on the column to the right of the last name field, and tidy up 10 or so multiple-word last names.

What I do then, which may be a little obsessive, is, in the blank column to the right, create a lastname/firstname column, using the formula =[lastname field]&", "&[firstname field] on the top row; copy that to all rows below it; then, highlight the entire column, copy it, and paste special using "values" over the same column. That will give you a lastname, firstname column that you can move over the first name lastname column.
nschneider
2/08
The comparables always make me laugh. According to the spreadsheet, the Blue Jays big choice at left field this season is between the next Boog Powell, or the next Barry Bonds.
PBSteve
2/08
Update: Just added a few notes on new PECOTA vs. the Marcels from Mr. Rob McQuown.
thatfnmb
2/08
I appreciate the effort, but have to say, this is not the PECOTA of old. The projections are a bit disappointing, especially pitcher projections. One clear bug is RP's moving to SP. Top 3 SP's in SO9 are Chapman, Sale & Moore. #7 is Bard, #12 Feliz. Those are reasonable if they're pitching 60 IP not 110+

Another head scratcher; Kershaw 30th in WHIP among SP behind guys like Lilly & Peavy. We'll see.
mcquown
2/08
Actually, I was skeptical of some of the new PECOTA wrinkles myself, but it's actually quite similar to the "PECOTA of old" - in fact, Nate's early publications on PECOTA still apply almost in their entirety to the current PECOTA process.

And this past year, we've dramatically improved the ability to run the system against historical data, and it does amazingly well by any measure.

That said, I don't recall Kershaw coming out that badly in the preview PECOTA runs, I'll look into that now, thanks for the feedback.
tcfatone
2/08
Plugged my 5x5 parameters into PFM and Mark Reynolds popped up as the most valuable 3B... don't know what to make of that.
jberkon
2/08
Agreed -- doesn't look like adjustments were made for role switch. The CAIRO projections did two projections for these pitcher types -- reliever and starter. Would be useful to see same in PECOTA, if possible.
backbrush
2/08
What's the timing for the updated player cards?

Also, friendly feedback: team depth charts were much easier to read when they were organized by position rather than by batting order. These are pretty impossible to consume, and I'm not sure what value it adds to know if Adrian Gonzalez will bat 3rd or 4th.
http://www.baseballprospectus.com/fantasy/dc/index.php?tm=BOS
sweptaway3641
2/08
Ryan Drese with the highest BREAKOUT score!

Now I have to draft him in all leagues.
jtanker33
2/08
Not really a problem, but, in the interest of maybe spurring conversation or improvement, I see Corey Dickerson of the Rockies is projected for 14 HRs and a .487 slg (which ranks 21st above Youkilis and Pablo Sandoval) in 250 PA. I believe his big power numbers this season were inflated by his playing in Asheville in the minors. Does PECOTA look at minor league park factors? I found this at BA: http://www.baseballamerica.com/blog/prospects/2011/08/daily-dish-the-curious-case-of-ashevilles-corey-dickerson/
moonlightj
2/08
McCormick Field is a dreamland for any lefty hitter with loft in his swing. 373 to dead center, 320 to right center, and 297 down the RF line, albeit with a 36 ft fence from the gap to the line.
wjmyers
2/08
Kershaw's comparables include Matusz, Clay Buchholz, and Rich Harden. Wow. PECOTA likes him far less than I do. I think comparables are the biggest issue with PECOTA since Nate left. And that was the key driver behind PECOTA, as it was originally conceived, was it not?
Oleoay
2/08
I've noticed the change too though I'll admit that I miss seeing Andre Dawson as a comparable to half the Dominican players.
rmelby
2/08
shankweather
2/08
Gamel is there.
rmelby
2/08
My bad, looking for him at first.
rosborne
2/08
Any projection for Darvish?
dpease
2/08
we have been working on Darvish and the other Japanese imports. We will have him in there soon.
shankweather
2/08
At first glance, it looks like CF is the strongest position, beating 1B easily.
johnorpheus
2/08
Freddy Freeman at -1 WARP for the season sticks out most on a cursory glance through. I guess -17 defense will do that to you. I find it hard to believe he will be that bad in the field.
Oleoay
2/08
Is PECOTA predicting FRAA now?
Oleoay
2/10
Didn't get a response on this. Is PECOTA predicting FRAA and if so, how?
jrmayne
2/08
Quick hits, because I gotta go to work:

1. The obvious problems last year were the Kila/Bowker problem, the minor league comps problem, and the long-term aging curves looking ridiculous.

2. Some efforts have been made. Still, Bryce Harper's number one comp is Wayne Causey. I'd categorize that as a failure. It indicates an inattentiveness to the obvious errors of effectively dropping minor league seasons out of the comps and comparing minor leaguers to major leaguers only (and then, if your answer is Wayne Causey, doing it wrong.)

3. These are updated PECOTA's; I expect that the books are going to have different ones. I was part of the Beta Testing, and I think my comments thereto are part of my public comment log.

4. The claims that this PECOTA is (roughly) better than everything in history have been made the last two years. For any comparisons (and one to ZiPS seems warranted), transparency will be vital.

5. For pity's sake, implying that these are iterative changes to the already-Deadly Accurate PECOTA's is worrying. Authors cited the projections for Kila and Bowker repeatedly and BP never said, "These were clearly wrong."

6. There are some weird projections. Chipper PECOTA: 281/375/450. Chipper ZiPS: 260/348/438. We'll see if this is right. But I doubt it: Weightings for the very young or very old should trend more strongly to more recent seasons. (See: Silver, Nate, Rearranging PECOTA, in the 2006 annual.)

7. Speaking of 2006 conclusions:

A. Level adjustment for minor league players by age. Since comps are not selected from minor leaguers, I doubt this is done. Also, you should not comp more than one level away by age, according to Silver.

B. Starter/Reliever adjustment. Chris Sale will lose velocity and K rate as a starter.

Crap. Went long. Gotta run.

--JRM
markpadden
2/08
Regarding number 6, wouldn't an ideal projection system try to determine customized past season weights based on age/experience to handle the outliers (very old and young)? It appears PECOTA does not. That is, one would want to look at what the optimal weights are for a 22 year old vs. a 40 year old. There is almost no chance they will be the same as a average-aged player (recent perf. will in fact have more value, as will the rate of change), so why force them to be static in the projection system? Yes, I know that an aging curve is applied after raw rate stats are projected. But this assumes that all players age identically, and that there is zero information contained in the rate of change of past stats as to how this particular player is developing/aging. Fine for a mid-career guy, but not so good for players in the middle of a steep incline/decline. Basically, you want to take more seriously very recent changes in performance when players are of an age when very large, real changes are likely to occur.

Speaking of which, was the order of past seasons considered? That is, if a player had TAvs of .250, .280, .310 in his last three seasons, his next season projection would look roughly the same as someone who had gone .320, .280, .250 -- assuming I am reading the explanation correctly. I would think that for any aged player, a clear trend (and you probably need more than 3 years to establish a trend) would merit some level of inclusion in the predictive model.
cwyers
2/08
I did some specific research into incorporating trends into a player's forecast. The results were not statistically significant, and somewhat counterintuitively they actually ran in an opposite direction from what you're suggesting - a player who was on an upward trend over multiple years of performance was actually more likely to *underperform* than over-perform, relative to other players of the same age.

It's both a counterintuitive result and not significant (either statistically or practically) in the testing I've done so far, so it's not currently being incorporated into PECOTA. (With some further corroborative testing, it might be.) But to the larger point, I haven't seen anything to suggest that a player with a .320/.280/.250 three-year run of TAvs should be expected to continue trending upward, above and beyond what his forecast would otherwise suggest.
markpadden
2/08
Did you take a look at very old and young players specifically? I would think that if a 39 year old player's stats are falling off a cliff over last 3 years, it would make sense to use a more aggressive aging curve than the curve used on an average 39 year old.

Also, why not let the season sequence info. contribute to the comparable player-finding? Right now, you generate a single number (baseline forecast) using certain weights, and then try to find historical players with similar stats and other attributes. [Correct me if this is inaccurate]. I would suggest looking for historical players who have experienced season sequences similar to the player in question (say, .250,.290,.230) over the last x years, and see how they performed. x would certainly need to be greater than 3. It may not even be the sequencing that provides extra info.; it may be the y2y volatility or lack thereof. And clearly the criteria for "similar" season sequences (a.k.a., "career paths") would need to be pretty loose to create a sufficient sample and not overfit.

Is this type of approach (using more than 3 past seasons and using loosely-bucketed career paths to find comps) under consideration for future versions of PECOTA?
cwyers
2/08
I did look at very old players specifically - I'd have to dig up my notes to see if I did the same for very young players, but I believe I did.

In terms of looking at trajectory on the whole, and matching comps based on that - I've done some examination of that. Where you run into the most problems is comparing players who are currently mid-career to guys who have finished their entire career.
Oleoay
2/08
Regarding #3, I guess that means the PECOTA in this year's annual will be Potentially Fatal in terms of accuracy.
Oleoay
2/08
I didn't even realize that people in the BP community helped beta test PECOTA...
jberkon
2/08
When will depth charts be viewable? I get this message ("Depth Charts will return in 2012") when I go to http://www.baseballprospectus.com/fantasy/dc/
jessehoffins
2/08
you can look at the individual team DC's for now, but they look like they're a bit of a mess. Or is the plan going forward to list by batting order spot? That would not be useful.

They do seem to be roughly adjusted for reasonable expectations about injuries, so it seems like they are somewhat current, and they match the numbers from the pecota spreadsheet.

But isn't the spreadsheet not supposed to be playing time adjusted?
jberkon
2/08
How do you look at individual team DCs?
joechris96
2/08
We're still putting some finishing touches on the depth charts so for now they've been disabled.
jberkon
2/08
Thanks! Appreciate the quick response.
tyjordan
2/08
there is a serious glitch in your height column
naehring
2/08
Where is VORP for hitters? Many are using this for fantasy purposes and while including WARP is good, it includes BP's estimation of the players defense. In all fantasy formats, defense is either not included (standard roto, point formats), or included in a different way (scoresheet, strat, etc). It is difficult to use this to rank hitters if you have to try to back out fielding from WARP. I posted a similar comment last year and was quickly rewarded with the VORP column in an updated spreadsheet. Here's hoping for similar excellent service this year.
Lucasjj
2/09
I would also like to see this.
cdluzier
2/09
ditto
dconner
2/10
I never saw the VORP addition last year, but it would be very welcome.
jj0501
2/08
How do I get to the new Depth Charts ? Is there a link ?
It looks like a couple people have gotten to them. Thanks.
dpease
2/08
hi, we'll have those up soon. Thanks.
hessshaun
2/08
Chris Sale is coming up for me as pitching 168 innings and striking out 198. Strasburg is projected to throw the same amount of innings with 184.

PhillyPhreak
2/08
I can only imagine that when you were putting these projections together that everyday you see one more card.
markpadden
2/08
The ten-year forecasts look poorly smoothed at best, poorly conceived at worst. E.g., Kershaw's 10-year eqERA goes:

3.65
3.75
3.75
3.56
3.81
3.85
3.76
3.76
3.63
3.52

So he is still getting worse right now (age 23), but when he turns 30, he'll finally turn the corner and start to improve... I think the erratic K rate projections are mostly to blame here.
markpadden
2/08
Correction: I was looking at an old card for Kershaw. (The cards are apparently not updated yet).
derekdeg
2/08
Is the PFM up? I thought I saw it active about an hour ago and now it's not.

Or was I just dumb and used last year's?
joechris96
2/08
No, you're not dumb. It was active but wasn't using the current year data. We've disabled it so there's no confusion while we put some finishing touches on the program.
brownsugar
2/08
Not so fast there Joe. If there is one thing I've learned around here over the years, it is that Jaffe has a fantastic mustache. But if there are two things, it is that correlation does not equal causation. The current functionality of PFM does not provide sufficient evidence to determine whether or not derekdeg is in fact dumb.

(I kid, of course, Derek! Pecota release day always gets me all fired up.)
derekdeg
2/08
All good. It wouldn't be the first or last time I was dumb.

I just saw that there was a PFM up. I had copied that and the PECOTA and the stats were the same. But the player values seemed wayyyyy off.
tbwhite
2/08
Before I raise some questions, I'd like to say that I enjoyed the article and the extra transparency into the process. I think that there is still room for improvement, but I appreciated the article. I'm also happy to hear about the downplaying of regressing to a ML mean; that always bothered me. Essentially, by weighting previous seasons more, it feels like you are replacing regression towards the ML mean with regression towards the specific player's mean.

I'm very curious about the comps. If you sort the batters by age, you'll notice that all 19-year-olds have the same 3 comps: Wayne Causey, Ed Kranepool, Robin Yount. Sometimes the order differs, but it's always the same players. This seems to back up what jrmayne says, that minor league seasons aren't being used as comps, and suggests that all 19-year-olds get compared to those 3 players because those are the only 3 available to use as comps. If that is true, I find it troubling. It wasn't long ago that BP ran Rany's articles about the importance of age in assessing a player's potential. So, BP is on record having published that youth is a critical factor, yet if minor league seasons aren't used in comps, then the pool of potential comps shrinks more and more (meaning that the quality of the comps must decline) just as the importance of the comps to the model should be increasing. All other things equal, a 19-year-old should ultimately improve more than a 20-year-old, who should improve more than a 21-year-old. If my understanding is correct, PECOTA sets its expectations for improvement or growth based on the comps, so the time period where players should see the most growth is also the time period with the worst quality comps.

From my perspective this troubles me because the projections I care most about are the prospects. I get that whether Pujols hits 40 or 50 homers is probably due to chance. I don't particularly care if you say Votto will hit .300, .310 or .290 because that prediction probably says more about the biases of your projection system than it does about the reality of how good Votto actually is. The fact is that no current projection system is going to offer any real insight into an established player like Votto.

But, where a system I think can offer insight is in the younger players, making sense of what the minor league stats mean, assessing how that player might develop and grow over time based on what similar players have done. This is what I thought PECOTA was doing, but it sounds like that is not the case. If PECOTA's projections for all 19 year olds are based primarily on what Ed Kranepool and Wayne Causey did, I don't see how it can offer any insight into how those players will develop.
markpadden
2/08
I think Boras should use PECOTA aging curves in his next negotiations. Old guys get older, but their performance apparently stays about the same. Torii Hunter will apparently be a league-average hitter well into his mid-40s. But it won't even be a market inefficiency for teams to exploit, as the entire league will be filled with 45-year-old .260 TAv hitters by 2018.
pakdawgie
2/08
Looking forward to seeing the PECOTA player cards - for me by far the most interesting thing about PECOTA is to see the whole range of outcomes to gauge things like upside potential. I have trouble knowing exactly what to do with single deterministic projections.
joechris96
2/08
Quick note...I'm putting together a list of all your questions, concerns, and inquiries. We will attempt to address all issues either here in the comments or in a separate post (or possibly a FAQ page).

Please note that we are still putting finishing touches on PFM and depth charts so, for the time being, we've disabled those features. If you ran PFM earlier today, please disregard those results.

Thank you for your patience and your feedback.
jesseatcal
2/08
Any prediction on when the PFM will be up?
doog7642
2/08
I'm glad to see the Breakout scores for hitters are far more substantial than last year's iteration, but I'm still surprised that breakout scores for the top pitchers are more than double those for the top hitters.
doog7642
2/08
Kind of a bummer that neither singles nor total hits are included for hitters. I can discern the numbers from avg. and PAs, but it's kind of a pain.
cwyers
2/08
I'll add those to the next spreadsheet run.
Rangers
2/08
This is my first year messing with PECOTA, so maybe I'm missing something obvious...is there no way to sort by position?
PhillyPhreak
2/08
When you download PECOTA to Excel, just do a column sort on the position tab.
Rangers
2/08
Oh, I see. When I did Randy Brown's Text to Column fix I accidentally deleted that column. Whoops!
Gordon
2/08
Was there any attempt to incorporate injury data into the Pecotas? I think that it would be a difficult thing to do, but in the case of Tommy John procedures it might work: there are good data out there as to who got the operation, when they got it, and there seems to be some evidence of a season-long increase in walk rates as they pitch and recover more.
dpease
2/08
Yes! We are working on this right now. We're not at the point we can say exactly what we'll be able to do with it, but we should be able to update you soon.
Gordon
2/08
Breakout rates for many batters seem really low. 2% for Domonic Brown?

There also seem to be quite a few batters with 250 PA. Was this an attempt to truncate high PA for players not expected to be full time players?
cwyers
2/08
The primary input into the PAs for players not in the depth charts was past MLB playing time. For players with little or no MLB experience whatsoever, we put in a "floor" of 250 PAs for the sake of readability. Again, those players are not in the depth charts, so in reality we're not expecting them to play in MLB much if at all. There's a column in there called DC_FL which will tell you if a player's PA forecast comes from the depth charts or from historic playing time only.
RFKRFK
2/08
I would like to see the Upside By Year data in the spreadsheet.
dpease
2/08
We will have updates on Upside, 10-year projections, etc soon.
Oleoay
2/08
Ya know spring training is about to start when the questions about PECOTA come out...
dpease
2/08
Hi all, we've got a post up on current Fantasy status... we'll keep it updated as events warrant. Please drop by http://www.baseballprospectus.com/article.php?articleid=15999 if you've got a fantasy question or comment.
Oleoay
2/08
"Some quick examples beforehand: the recent poster boy for “New PECOTA” would probably be Francisco Liriano, whose 3.60 PECOTA forecast for 2010 was almost identical to his real-life 3.62 ERA, while Marcels weighted his recent past (2009 was horrific, and many observers wondered if he'd ever return from his injury woes) and forecast a 4.88 ERA. The ERAs cited here were derived using a 3rd-party version of Marcels (don't want anyone thinking we cooked the books), against the “New PECOTA” system applied retroactively."

Liriano's highest comp on his player card is Oliver Perez's 2010 who had a RA of 7.19. Maybe PECOTA is more tuned to be correct for the short term and Marcel is more correct for the long term (when Liriano's 2011 ERA went up to 5.09 and FIP went up to 4.58)?
RonEckstein
2/08
I'm sure it's in front of my nose and I'm missing it, but where can I find a glossary and how the numbers are reached?
In particular: FAIR RA, BREAKOUT, IMPROVE, COLLAPSE, ATTRITION, ML_PCT, DC_FL

Thanks!
KJOKBASEBALL
2/08
If I understand correctly, for hitters a 1/.92/.90 weighting out to about 7 years in the past is used.

Then the article shifts to talking about Liriano and pitchers, but it's not clear that 1/.92/.90 and 7 or so past seasons are also used for pitchers. Are pitchers baselined using the same past weighting scheme as hitters? (And if so I'm questioning if that is correct...)

dgreene007
2/08
Someone ought to say it: The title of this article, "The Weighting is the Hardest Part," cracked me up. Verrry clever!
Hagenwagon
2/08
New to BP...what is TC_FL?
Hagenwagon
2/08
Sorry DC_FL?
dpease
2/08
hi, please see this comment. Thanks!
mdthomp
2/08
New to PECOTA as well. Is it me, or do they seem a little on the conservative side?
mdthomp
2/08
on second thought, after closer examination I think they look pretty good. Awesome work.
dwachtell
2/09
Your "closer examination" of PECOTA took all of three minutes?
Oleoay
2/10
2 minutes and thirty seconds.. the other thirty seconds were spent typing up the comment and hitting the "Submit comment" button.
ScottBehson
2/10
They tend to be conservative
rreading
2/08
Thanks for this!

Newer to BP, any chance people are looking for a keeper league owner? Looking to join a dynasty league. Thanks.
mhmosher
2/08
Check out www.tpfs.com. Our email tripplay16@aol.com

tell him Matt sent you. This company runs 40+ dynasty leagues and they're always looking for new owners.
rreading
2/08
Thanks, heading over there now.
mhmosher
2/08
You're welcome - they're a good company with good leagues and smart owners. I've played in several leagues there since '04.
mariotti
2/08
I really like BP, but I have been disappointed with the PECOTA ratings for the past few years. Some of the projections have been real headscratchers, and this year's ratings are no exception. Does anyone really think Erik Bedard will have the same WHIP as Clayton Kershaw, or that David Wright will have a higher TAv than Jose Bautista? Results like that are so off that it makes me question PECOTA as a whole.

Yet despite equally bad projections each of the last two years, the problems are not publicly acknowledged, and PECOTA is once again advertised as "deadly accurate" despite achieving worse results than Marcel in recent years.

I want BP to be successful, but I have to say that I'm disappointed. I'm not sure why you released the spreadsheet in its present state, but I wish you hadn't.
dpease
2/08
We'll be publishing some test results on PECOTA tomorrow. We hope they help allay your concerns.
tbwhite
2/10
Will the test results be published on Friday ?
tbwhite
2/11
Tomorrow, tomorrow, I love you tomorrow, you're always a day away.
dpease
2/11
Sorry, I shouldn't have promised a date so close to PECOTA release. We're still reviewing the data and deciding on the best way to present it. Thanks, and have a great weekend.
joechris96
2/08
For everyone with PECOTA questions, Colin will be chatting tomorrow at 1:00 PM ET http://www.baseballprospectus.com/chat/chat.php?chatId=897
Theaetetus
2/08
I am amused that PECOTA is trusting enough to believe that Miguel Cabrera is a 3B!
cjthomas
2/09
And he weighs 180 lbs
cwyers
2/09
After review, we're going to be switching our data source on player heights/weights to the ones put out by MLB. That's going to shift the comps slightly, so we'll be doing a new PECOTA run this morning as well.
cjthomas
2/09
Thanks!
KevenC
2/09
Please include the AL/NL column.
lichtman
2/09
Downloaded the spreadsheets. For batters, there is no hits or singles column that I can find (I assume B2 and B3 are doubles and triples - is it too much trouble to use the letters D and T or 2B and 3B?). Am I missing it? I can almost infer them from the BA and PA but there is also no AB column and it is not clear if AB is PA-BB (there are no SH and SF and ROE, etc.).

Also does BB include IBB? What about HBP?

And finally, are all the numbers assuming player plays half his games in his home park? For minor league players, do the numbers assume he plays half his games in his minor league park as well?

Thanks!

MGL
cwyers
2/09
Projections are for the listed team's expected park factor - so a minor league player that's listed with the Rockies has a forecast for half his games in Coors Field, for instance. BB includes IBB but not HBP.
Oleoay
2/09
If there are park factors for walk rates, wouldn't there be park factors for HBP on the theory that if it's harder to throw strikes in certain ballparks, then batters are also more likely to get hit by a pitch?
mlive78
2/10
FYI...until they release a new version with hits and/or singles included, I used SLG and AVG to calculate it on my own. Use the following formulas to calculate singles (B1) and at-bats (AB). (Just insert a new column, use the formula, and paste it all the way down the column.):

B1 = ((SLG*(B2+B3+HR))-(((B2*2)+(B3*3)+(HR*4))*AVG))/(AVG-SLG)

After that, you should be able to get AB pretty easily:

AB = (B1+B2+B3+HR)/AVG

You can double-check your calculations for B1 and AB by using them to calculate SLG, (B1+(B2*2)+(B3*3)+(HR*4))/AB. Your result should match the given value on the spreadsheet.

Hope that helps!
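
For anyone working outside a spreadsheet, the same two formulas can be sketched in Python. The function and argument names are made up for illustration; the only inputs are the AVG, SLG, doubles (B2), triples (B3), and HR values from the projection sheet.

```python
import math

def singles_and_at_bats(avg, slg, doubles, triples, homers):
    """Back out singles (B1) and at-bats (AB) from AVG, SLG, and extra-base hits."""
    if avg <= 0 or slg <= avg:
        # With no extra-base hits SLG equals AVG and B1 cannot be recovered.
        raise ValueError("need AVG > 0 and SLG > AVG")
    extra_base_hits = doubles + triples + homers
    extra_base_total = 2 * doubles + 3 * triples + 4 * homers
    # Same expression as the B1 formula above.
    singles = (slg * extra_base_hits - extra_base_total * avg) / (avg - slg)
    at_bats = (singles + extra_base_hits) / avg
    return singles, at_bats

# Made-up stat line: 100 singles, 30 doubles, 5 triples, 15 HR in 500 AB.
b1, ab = singles_and_at_bats(avg=150 / 500, slg=235 / 500,
                             doubles=30, triples=5, homers=15)
assert math.isclose(b1, 100) and math.isclose(ab, 500)
```

Because the spreadsheet rounds AVG and SLG, results from real rows will usually land a little off from whole numbers, so expect to round them.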
PhillyPhreak
2/09
Maybe someone has said this above but I'd really like SP and RP designations too.
chrisgoddu
2/09
Can you include SS/SIM column in the spreadsheet. Many of us are starting scoresheet drafts and that would be helpful.
tomterp
2/09
I'm thinking that giving Matt Stairs 250 PA's for the Nats this year was just a way of finding out if any Nats followers are paying attention. I would be stunned if an Indy team gave him that kind of playing time, and I am 100% sure he gets zero PA's with the Nats this year.
joepeta
2/09
Ignore any player with an "F" (like Stairs) DC_FL. It's a true/false field for Major League roster presence. Anyone with an "F" is defaulted to 250 ABs (with minor exceptions I can't explain like Posada and Ibanez.) If you do a complete projection on Washington, ignore the guys with "F"s and you will see a reasonable team projection.
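
If you'd rather do that filtering in a script than by hand, here is a minimal sketch. Only the DC_FL column and its "F" value come from the spreadsheet as described in this thread; the NAME, TEAM, and PA headers, the "T" value, and the sample rows are assumptions made up for illustration.

```python
import csv
import io

def depth_chart_rows(csv_text):
    """Keep only rows whose DC_FL flag is not "F" (i.e. players on the depth charts)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["DC_FL"].strip().upper() != "F"]

# Hypothetical rows; the real spreadsheet has many more columns.
sample = """NAME,TEAM,PA,DC_FL
Player A,WAS,610,T
Player B,WAS,250,F
Player C,WAS,480,T
"""

kept = depth_chart_rows(sample)
team_pa = sum(int(row["PA"]) for row in kept)  # PA from depth-chart players only
```

To run it on the downloaded file, read the CSV into a string first and adjust the column names to match the actual headers.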
tomterp
2/10
Ah, thanks. I thought I must have been missing something.
TangoTiger1
2/09
For the record:

Marcel adds 1200 PA of regression only if you use the 5/4/3 scheme. If you use the 1/.8/.6 scheme, then you would add 1200/5 = 240 PA of regression.

In the example in the article, a player with 650 PA each year for 3 years will regress 11% toward the mean, not 40%.

Marcel shows exactly how much each player was regressed, if you download the Marcel files. There's a column called "r", that shows the percentage of his performance stats that were used for the forecast. When you see r=.89, that means that regression was 11% toward the mean.
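
The equivalence of the two weighting schemes can be checked numerically. This is only a sketch: it assumes r is weighted PA divided by weighted PA plus the added regression PA, and the function name and sample PA totals are made up for illustration rather than taken from the Marcel files.

```python
import math

def reliability(pa_by_season, weights, regression_pa):
    """Share of the forecast drawn from the player's own stats (the "r" column).

    Assumes r = weighted PA / (weighted PA + regression PA).
    """
    weighted_pa = sum(w * pa for w, pa in zip(weights, pa_by_season))
    return weighted_pa / (weighted_pa + regression_pa)

# Hypothetical PA totals, most recent season first.
pa = [600, 550, 500]

r_543 = reliability(pa, weights=(5, 4, 3), regression_pa=1200)
r_scaled = reliability(pa, weights=(1, 0.8, 0.6), regression_pa=1200 / 5)

# Dividing both the weights and the regression amount by 5 leaves r unchanged.
assert math.isclose(r_543, r_scaled)
```

The amount of regression toward the mean is then 1 - r under either scheme.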
fairacres
2/09
It strikes me that PECOTA has a very conservative bias. Every year, as I peruse the projections for players, particularly established players, I mentally think "would I take the over or under on that stat line" and while I have not recorded those thoughts systematically, my sense is that each season, I feel like I would take a lot more "overs" than "unders."

To the "Breakout, Improve, Collapse, Attrition" data -- I took the spreadsheet, and narrowed to the top 350 players or so, i.e., the ones that are likely to get the majority of the at bats in 2012. I then sorted by "Improve", and found that 70 had scores of 51 or higher, 7 had scores of 50, and the other 270 or so players had scores of 49 or less.

I am sure there is a reasonable explanation for that outcome, but it does not feel intuitively correct to me, unless the opposite was true for pitchers (i.e., the ascendance of pitching over hitting over the past several seasons is expected to continue, so pitchers' performances are expected to "improve" as hitters' production continues to deteriorate).

Can you guys shed some light?

joepeta
2/09
Fairacres is onto a topic here that I hope Colin addresses either here or on the chat. If you run a simulation for each team based on the component players, PECOTA is calling for total scoring in 2012 to be 21,353 runs. That's a not unreasonable increase from the 2011 total of 20,808.

However, the total 2012 WAR for all batters drops 36.1 YoY, implying a drop-off of nearly 350 runs. (I'm comparing PECOTA WAR with the WARs on the BP Statistics Sorts for Team Value.) On the pitching side, the 2012 WARs rise a whopping 67.8. (Obviously implying a much, much lower run environment in 2012 than 2011.) Something seems off here. Has there been a change to replacement level, especially at the pitching level? If so, why don't the prior years' WARs change? I've always suspected prior-year pitching WARs are too low based on some change made between the 2011 Annual and after the season started. (For instance, look at Halladay's huge drop from book to PECOTA card.)

Finally, the shift in WARs is most pronounced by league. AL starting pitchers, per PECOTA, are expected to increase WAR by 57.1 (of the total 67.8 I refer to above) YoY. That just doesn't seem plausible and Yu Darvish isn't even in the database yet.

Any thoughts from BP?
molokai
2/09
Would just like to say any spreadsheet would be more useful if you had Berkman, Lance or a column for last name then first name. Sure it only takes a few more minutes for us to text to column but then we have to weed out the Wily Mo Pena's.
lichtman
2/09
"...BB includes IBB but not HBP."

It does not appear as if it does. For most players, you can't tell from looking at the numbers, since most players have few IBB. But look at Fielder and Pujols. Fielder is projected with 92 BB+IBB in 687 PA. He typically gets IBB'd 25 times per season. That leaves 67 NIBB. That can't be right. Same thing with Pujols.

Also, you did not answer my question about singles and AB.
joepeta
2/09
Colin,

More on PECOTA's WAR calculation: Is it the same as BP's calculation?

Here's why I ask: in 2011, per BP, only 23 pitchers had a WAR of 3 or better. On the surface that seems way too low, and I've thought that since changes were made to your calculations in 2010/2011. PECOTA has 42 pitchers with a projected WAR of 3 or better for 2012. Projection systems are always more conservative vs. actual outliers so again, it wouldn't seem possible the WAR calculation is consistent.

Can you shed some light on this?
saint09
2/10
Not sure if this is the forum for a PFM issue, but I continue to see Kevin Youkilis listed as a 1B and not a 3B... either Gonzo was traded, my system is acting up... or PFM is off?

Thanks,

Chris
dpease
2/10
hi, we're looking at this. Thanks.
Tom9418
2/10
All I see in the hitter spreadsheet is 2011 stats. What am I doing wrong???
kathywoolner
2/10
Hi, Tommy. Please download the PECOTA Weighted Means spreadsheet again from the Fantasy page. We have corrected the year on the hitters sheet. Sorry for the confusion. Thanks for your support.
Oleoay
2/10
Thanks to our new feature...

BP Customer Service
3 comments | 0 total rating | 0.00 average rating

I didn't even realize there was a generic BP Customer Service login though I guess it makes sense.

fieldofdreams
2/10
So Jonny Venters has pitched 171 innings in MLB with 189 Ks and a 1.89 ERA, yet PECOTA thinks he'll strike out only 62 in 74 IP this year with a 4.07 ERA. What am I missing here?
mlive78
2/12
Any chance we can get SH, SF, and GIDP included for hitters in the Pecota Spreadsheet? Those categories are already incorporated into your PFM tool, so I would think they should be relatively easy things to add. I would also like to see IBB, if possible. On the pitching side, GIDP projections would be helpful there, also, as well as breaking out the Hits against category into 1B, 2B, 3B, and HR (obviously, HR is already done).
jbergey11
2/13
Regarding Dunn: he was awful the entire year and showed no improvement from month to month at all. I wonder if he didn't forget how to hit. Apparently he has slimmed down, so we shall see.
jbergey11
2/13
It would be nice if this was sortable. And came with the positions separated.
dpease
2/13
the csv is sortable in any standard spreadsheet. You can use the 'POS' column to get position. Is that what you are looking for?
jbergey11
2/14
Got it. Thank you.
WilliamWright
2/13
I noticed you haven't corrected Fausto Carmona's numbers yet. First off, he's now Roberto Hernandez Heredia, Age 31, and based on his legal issues probably won't approach 169.33333 IP
mlive78
2/13
Should I be concerned that PECOTA is projecting the Houston Astros to lead the NL Central in scoring by a LARGE margin?
spaulo
3/14
Why no Retrosheet or Lahman database IDs in PECOTA this year?