The PECOTA cards are finally here!
We've got 5948 cards this year, from all the usual major-league players on down to Yorman Rodriguez, who can't yet buy a drink.
The depth charts, player search, and newly published articles and blog posts will link to the new cards. You won't be able to see everything on most of them unless you are a subscriber, but anyone can see the New York Yankees cards. Here's Alex Rodriguez, and here's CC Sabathia.
We're still chasing a few little things on these: pitchers late in the alphabet don't have historical data, for example. We don't expect much change to the core of the cards anytime soon.
Thanks again to our beta testers for looking them over, and to everyone for your patience. I'm sorry they took as long as they did, but the upside of the big changes we had to make is that we're all queued up to do all sorts of neat things with these throughout the 2010 season, and I'm going to go out on a limb (which I probably shouldn't be doing at 1:30am) and guarantee that 2011 will be our earliest year ever.
Btw, awesome work Dave. Thanks for the extra time you put in to get these out before my draft :)
Looks like one item not caught in testing: There's a totalling problem with at least one pitcher. Check Gio Gonzalez' projection; I think 102 K is possible in 101 innings, but not so possible in 66 IP. Projecting Gio to K 14 per 9 seems like there's some adding issue.
--JRM
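For anyone who wants to check the arithmetic behind that complaint, here is a minimal sketch. The function name k_per_nine is made up for illustration, and the inputs are just the figures quoted in the comment, with innings treated as whole innings.

```python
def k_per_nine(strikeouts: int, innings: float) -> float:
    """Strikeouts per nine innings pitched."""
    return 9.0 * strikeouts / innings


# Figures quoted in the comment above for Gio Gonzalez.
print(round(k_per_nine(102, 66), 1))   # 13.9, the rate implied by 66 IP
print(round(k_per_nine(102, 101), 1))  # 9.1, the rate implied by 101 IP
```

So 102 K only looks out of line when paired with the 66 IP figure, which is consistent with a totalling mismatch rather than a wild strikeout forecast.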
BTW, you can also shift-click on the headers to sort on multiple columns. Again, I'm not sure that that's useful for PECOTA, but it still might be fun to mess around with.
We definitely need to figure out how to cram some arrows in the table header to make that sorting functionality more evident in the future though. Thanks for the comment!
2010: .353
2011: .349
2012: .343
On the 10 year performance plot:
2010: < .350
2011: around .340
2012: .330
I noticed this error on every hitter I looked at.
One thing I haven't figured out yet, though I haven't put a lot of time into it, is how to get to the DT cards from the PECOTA cards. Are those still around?
I miss that stat, too.
Can you comment a bit on Strasburg's PECOTA card? How confident are you in PECOTA's ability to project such a once-in-a-generation pitching prospect?
"Lincecum’s 90% forecast divided by his 10% forecast: 71%
Carpenter: 81%
Arroyo: 81%
Strasburg: 75%
And there ladies and gentlemen is why I say the PECOTA percentile forecasts are irrelevant.
1. How in the world is the range of Strasburg’s forecast the same as established stars? It’s impossible. We know less about him, and so the uncertainty of his forecast must be wider.
2. He has only a 10% chance to post a better than 3.94 ERA?
This all goes back to the way Nate does his comparables. What Nate does is that he looks ONLY at rate stats. And so, he would take Strasburg’s MLE rate stats, looks at comparable pitchers based on rate stats, and then looks at those pitcher’s observed ERA, and use that as the range.
This is 100% wrong."
Care to comment?
http://www.insidethebook.com/ee/index.php/site/comments/the_tom_seaver_rule/#comments
I'm sure they would appreciate the enlightenment.
Sorry to point out something that doesn't look right. I'm really excited about the other 99% of it.
Sorry for the snark, but the 10 year projections still look like rubbish to me. Can someone (from BP) explain the relationship between Avg and EqAvg in the 10 Year Projections? I'm looking at Adam Jones and I see for 2010 a projected Avg of .293 and an EqAvg of .282. Ok, fine. But in 2011, his Avg is .279 and his EqAvg is .280? What is going to change in MLB between now and next year that would cause a 14 point drop in batting average to only move an EqAvg 2 points? Felix Pie is even worse. His batting average falls 14 points from 2010 to 2011, yet his EqAvg rises 3 points?!?!
What's the deal with aging? Adam Jones shows a steep decline in his TAv at age 33. Nolan Reimold makes it to 35 until he suffers a similar drop. Really? Two players, one establishes himself in MLB at age 22, the other at age 26, the younger one is by all accounts a better athlete, plays a tougher defensive position, yet he'll be the one who declines first?
Also, no one from BP has addressed the Phillippe Aumont projection. How exactly does a guy serve up a .300 BABIP for 5 years with a > 50% GB rate, then drop his GB rate ~30% and drop his BABIP to .200 for the next 5 years? That is the most remarkable projection for a player I've ever seen. When the 10 year projections are run, is a new list of comps generated for each projected year, or do you use the 2010 comps only and see what they did over the next 10 years? It seems like the latter is what is being done, and I would argue that is wrong, or at least not optimal.
My understanding is that the comps never change, just that some of them are no longer active by a certain time down the road. So PECOTA says that Aumont has X% chance to be one of the drop-outs and Y% chance to make it with the weighted means only looking at those who are still around.
Make more sense?
So, if Aumont is in the majors in 2014, he'll be a mediocre ground ball pitcher. If he's in the majors in 2015, he'll suddenly be a fly ball pitcher with an uncanny ability to prevent hits on balls in play? That's simply not realistic; I will bet my house that won't happen. Is there any historical precedent for a pitcher who was a ground ball pitcher for 5 years suddenly becoming a fly ball pitcher? Is there any pitcher in ML history who has posted a .200 BABIP for 5 straight seasons?
I think the problem is ignoring the players who drop out. The fact that PECOTA only thinks Aumont has a 20% chance of being in the majors should be part of his 2015 forecast. Ignoring the 80% of the time he won't be in the majors turns the 10 Year Forecast into something very different. I'm not sure what to call it, but it is not a 10 Year Forecast. A 10 Year Forecast for Aumont would show that by 2015 he'd be posting below replacement level stats, and the discontinuity in his GB and BABIP rates wouldn't be there.
There is not actually a dichotomy in 'he will either improve remarkably or wash out of baseball within 5 years'. The average major league career isn't all that long, and it's pretty typical for marginal players to need to make a jump in order to attain long-term stability.
What you're really complaining about isn't that prediction, it's that you don't think the way that information is presented is intuitive. There's not really a problem with the projection.
Players do not have careers that follow their weighted mean projections as an even line every year. That isn't how PECOTA is designed to work at all and completely defeats the purpose of having 8+ percentile projections for each season. The weighted mean is a useful tool for evaluating potential production for the upcoming season and for evaluating the potential value of a player under contract.
Expecting to come back in 10 years and look at Aumont and see numbers each season that reflect the weighted mean is not even remotely realistic and not at all how the system is designed to work.
It's a question of probability: it is not likely that Aumont improves that much, but it's possible and if he does so he will be a pretty decent pitcher for a while. If he doesn't, he'll enjoy life as a crappy 5th starter for a few seasons and get released when he hits arbitration and costs way more than he's actually worth as a replacement-level player.
How doesn't that make sense again?
Maybe the answer is to show percentiles by year, maybe not all 9 that are shown for the current season, but rather P90, P75, P50, P25 and P10. Show those for 5 years, rather than just the weighted mean for 10 years. I'd rather have 5 years of really useful info, than 10 years of largely useless info.
I don't expect to come back in 10 years and see that the weighted means were dead on for Aumont. But I would prefer that enough info be made available such that the "story" makes sense. What you're saying is that the discontinuities I'm complaining about are an artifact of weighted means, fine. But then display enough info so that the discontinuities are easily understood.
I still have an issue with the projections. If what you are saying is true, that Aumont is a GB pitcher in most scenarios but some small % of the time he develops another pitch and becomes a different, better pitcher, it seems like there should be evidence of this in his 2010 projection. In other words, his 90th percentile would show some evidence of his ability to change from a ground ball pitcher and to limit his BABIP. But look at Aumont's 2010 projection and there is no evidence of that: his GB% only drops about 3% in his 90th percentile projection and his BABIP maybe 10 points. Again, this would be an argument for more detail in the 10 year forecast, as perhaps if you could start to see the new Aumont who misses bats begin to creep into the picture then 2015+ wouldn't look so strange.
Aumont is by his nature a weird player for PECOTA to predict because he's going to be a late developer: cold weather players get less experience at younger ages than warm-weather players do, and tend to develop a little later. I'm not sure if that's a variable PECOTA accounts for (in fact, I don't think it is), but it's probably significant in the case of Phillippe Aumont.
I would suggest that the longer Aumont career should probably be looked at along the lines of the way that a player like Jamie Moyer developed from a fringe, couple year player into someone who managed to stick around for a very long time with their best potential performance not being much better than league average. Equally, I doubt such players actually develop via spike improvement to things like GB rate and K/9, but in fact likely develop in smaller improvements over time.
In fact, most pitchers develop in this fashion, where the only really big improvements that tend to happen in a pitcher's development involve pitchers with fantastic stuff and control issues who develop better control. Unlike hitters who may spike by 30-50 points in OBP/SLG in one season and retain half of that going forward, most pitchers improve in smaller increments over time and health.
Most of the unpredictability in pitcher performances tends to be based around playing time and injury and their relations to effectiveness, not the pitcher's raw abilities improving. A hitter like Andre Ethier can suddenly develop significantly more power or better plate discipline and be expected to retain a portion of that improvement. On the other hand, no one expects either Clayton Kershaw or Chad Billingsley to make sudden leaps forward in the single area that stands between both of them and greatness (BB/9). Things like K/9 and H/9 rates rarely feature significant spike improvement unless defense or home park environments improve significantly for the pitcher.
While the conclusion in question isn't intuitive, it's also a question of you not utilizing all the data available in the player card. You should be looking at the attrition rate and stars and scrubs chart in determining things regarding a player's likelihood of being in the league at all. Weighted Means, SuperVORP and the like exist to give you information regarding that player's potential performance if they're in the league at all.
A knowledgeable user of PECOTA should know both those things, through use of both the glossary and the archives of PECOTA articles, in order to use it properly.
I have looked at Aumont again and I do wish to revise my reading of it: Honestly, what PECOTA is predicting is that unless he makes a significant improvement, Aumont will perform at a sub-MLB level (probably not even in MLB, or bumping back and forth as a 5th starter/back of the bullpen guy) until he hits 25-26 and is no longer considered a prospect, when he'll probably be given up on if he hasn't made that significant improvement.
I would not be surprised to see a lot of other prospects with similar projections: it takes Aumont's 75th percentile for him to be good enough to actually keep playing professional baseball for some reason besides potential, and potential doesn't keep people employed all that long. That 5.00 ERA prediction at his 50th percentile looks a lot like that time spent bouncing back and forth with someone hoping he develops something new.
I don't have an answer why PECOTA thinks that that particular kind of improvement is the likely one, though it does involve missing bats so I suppose it makes sense (the drastic drop in BABIP).
I do agree that there would probably be a significant benefit for BP to write out a composite explanation of how each feature of PECOTA is intended to work and what it's meant to tell you, possibly updated once a year and posted at the top of the PECOTA player cards.
This is probably also why certain statistics that were used by a smaller group of us (Like the BETA statistic) have been removed, as they probably confused more people than they helped by a factor of 5 or 10.
Let's just do a simple thought experiment. Let's say PECOTA is taking the 100 players most similar to Aumont to project his 2010 season. For 2011, let's say 10% of his comps are gone from baseball. Either the sample size his projection is based on shrinks, or we replace the 10 dropped comparable players with the next 10 most comparable. Repeat several times. You're now projecting 2015 with either just half of the original sample size, or with a full sample that is less comparable to Aumont (in a worst case scenario you could have replaced comps 1 thru 50 with 101 thru 150).
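The shrinking-sample half of that thought experiment is easy to sketch. This is only an illustration: the function name is made up, and the constant 10% yearly dropout rate is the comment's own round number, not anything taken from PECOTA.

```python
def surviving_comps(initial: int, dropout_rate: float, years: int) -> list[int]:
    """Original comps still active after each year, assuming a constant
    yearly dropout rate (a simplifying assumption)."""
    return [round(initial * (1 - dropout_rate) ** year) for year in range(years + 1)]


counts = surviving_comps(100, 0.10, 5)
for season, remaining in zip(range(2010, 2016), counts):
    # "replaced" is how many next-best comps would be needed to keep 100 in the pool
    print(season, remaining, "original comps,", 100 - remaining, "replaced")
```

Under those assumptions about 59 of the original 100 comps are left for 2015, so the pool is either roughly 40% smaller or roughly 40% made up of players who were not in the original top 100.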
Two things are happening every year: the quality of the pool of comparables goes up (in general only good or better players will play in both 2010 and 2015), and their similarity to the subject player (Aumont) goes down. In a perfect storm I can envision where the pool of players that survives is very different looking from the actual subject player, and so a guy who is a GB pitcher is paired with comps who are fly ball guys, because the GB guys (who were most similar) dropped out.
As for the BABIP, a large part of whether a mediocre player plays a lot in a given season is probably luck. So, of course, the comps who play a lot several seasons from now are probably also biased towards being lucky (low BABIP). So, suddenly PECOTA starts projecting someone to be lucky (like Aumont).
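A toy simulation can show how that kind of selection would produce a low BABIP with no real skill behind it. Everything here is an assumption made for illustration: every pitcher has the same true .300 BABIP, each faces 300 balls in play, and the half with the best observed results are the ones who "keep playing". None of it is PECOTA's actual method.

```python
import random


def simulate_survivor_babip(n_pitchers=1000, true_babip=0.300,
                            balls_in_play=300, keep_fraction=0.5, seed=0):
    """Return (mean observed BABIP of everyone, mean observed BABIP of survivors).

    All pitchers share the same true BABIP, so any gap between the two
    numbers comes purely from selecting on luck.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n_pitchers):
        hits = sum(rng.random() < true_babip for _ in range(balls_in_play))
        observed.append(hits / balls_in_play)
    observed.sort()  # lowest (luckiest) observed BABIP first
    survivors = observed[: int(n_pitchers * keep_fraction)]
    return sum(observed) / len(observed), sum(survivors) / len(survivors)


everyone, survivors = simulate_survivor_babip()
print(f"all pitchers: {everyone:.3f}  survivors only: {survivors:.3f}")
```

The survivors' average comes out below the overall average even though nobody in the simulation has any ability to suppress hits, which is the bias the comment is describing.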
This is why in my opinion the correct methodology would be to take the 2010 projections and develop new comps to be used in projecting 2011. Then use those 2011 projections in creating comps for projecting 2012, and on and on.
What PECOTA is missing is that sure maybe 10% of Aumont's top comps for 2010 don't actually play in 2010. But that doesn't mean the actual quality of players in MLB has improved. It just means that a crappy 10% portion of players has been replaced with a different 10% portion of crappy players, some of whom will be very similar to Aumont. You can't ignore that fact. Replacement players come and go, they don't just go. If you don't factor in the arrival of new crappy players who are comparable to the guys you are trying to project out several years then you're introducing a bias in the results.
We're now doing it the way PECOTA has always done it--once, up front--and we'll definitely be doing some further research on this topic.
Maybe there is a middle ground: just run the comps once, but don't let players drop out. Rather than have the later years of a projection only informed by the players who were still standing, maybe there should be some sort of default value, just below replacement value, that gets plugged in when the career of a player's comp ends. This would prevent the quality "creep" that seems to be occurring over time, and also maintain the quality of the comps used for the later years of the projection.
If comps are replaced as their career ends, it would be possible (perhaps even likely, depending on the turnover rate) that the comps used for the year 10 projection would be almost completely different from the year 1 projection. Not only would they be different, they would be better caliber players and less similar to the subject player.
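Here is a rough sketch of the difference between the two approaches. The comp ERAs and the 6.00 fill-in value are invented numbers, the function name is hypothetical, and a plain unweighted average stands in for whatever similarity weighting PECOTA really applies.

```python
def projected_era(comp_eras, fill_value=None):
    """Average ERA over a list of comps for one future season.

    None marks a comp whose career has ended by that season. With no
    fill_value the ended careers are ignored (survivors only); otherwise
    they are counted at fill_value.
    """
    if fill_value is None:
        values = [era for era in comp_eras if era is not None]
    else:
        values = [fill_value if era is None else era for era in comp_eras]
    if not values:
        raise ValueError("no comps available for this season")
    return sum(values) / len(values)


# Invented example: five comps, three of them out of baseball by this season.
comps_2015 = [3.50, 4.00, None, None, None]
print(projected_era(comps_2015))                   # 3.75, survivors only
print(projected_era(comps_2015, fill_value=6.00))  # 5.1, dropouts counted at 6.00
```

With the dropouts ignored, the later-year line reflects only the success stories; with a below-replacement default plugged in, it stays tied to the full original comp list.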
Player career curves are -not- an even line. Younger players are given more leeway when they perform poorly because it's felt they may improve from the experience. Older players without established track records are not given that sort of leeway.
The Aumont projection shouldn't be seen as an example of 'power creep' or 'performance creep'. It's more of a statement on how likely PECOTA thinks it is he develops into an MLB caliber player: around 30-35% or so. For him to become an MLB regular, PECOTA thinks he will develop along a certain line. It equally thinks it's more likely that he develops along a different line that leads to his being a failed prospect.
While you may disagree with these assessments, I would suggest they are entirely consistent and legitimate assessments of Phillippe Aumont, and not a systemic flaw.
I think your case regarding power creep is actually stronger with established players who are far less likely to see a revolution in their usage patterns or performance, like David Aardsma or a few others that were pointed out during the beta process. For prospects, a boom and bust on entirely separate career tracks is not unrealistic: in fact, it's pretty typical.
The upsides are quite a bit higher than in the past. I had thought this was from extending from 7 to 10 years on the PECOTAs, but now I'm confused about what it really means.