Thanks to the hard work of our statistical and technical teams, our signature BP stats have gotten an overhaul this season, both on the surface and under the hood. As often tends to be the case with ambitious projects, we've had a few hiccups along the way, some of them more noticeable than others, and none of them the kind you can cure by holding your breath and chugging a glass of water (not that we didn't try that, just in case). We've been reexamining old ideas and assumptions, and that's why you've seen some values change or fluctuate. In the process, we've also made a few regrettable missteps. Fortunately, we've managed to resolve the most serious issues, so it's time for an update on where some of our statistical offerings stand:
- Pitcher WARP: As a number of you noticed, Derek Lowe's WARP looked oddly high for a pitcher with his unspectacular peripherals. We've examined our code and uncovered a bug for which Lowe was the poster boy, if not the lone pitcher affected. The bug resulted in an incorrect boost in FAIR_IP for extreme groundball pitchers like Lowe, which in turn produced bumps in VORP and WARP, counting stats that are based on FAIR_IP. As soon as this fix was implemented, Lowe's PWARP dropped from 4.0 to 2.4, which is likely to be regarded as an uncontroversial change by everyone other than Lowe.
- Rest-of-season PECOTA: We've made changes to the weighting of recent results that reduced the impact of 2009-2010 performance on players' rest-of-season projections, making them less susceptible to being swayed by small-sample stats. Those changes are now reflected in the "2011 Projections" table at the top of each player card, as well as in our Playoff Odds.
- PADE: Park-Adjusted Defensive Efficiency is now fully operational. The Rays are doing so well in the PADE department, you'd think a member of their front office might've invented it.
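For readers curious what "reducing the impact" of prior-season stats looks like mechanically, here is a minimal sketch of one common blending approach. The function name, weights, and stabilization constant are all hypothetical illustrations, not PECOTA's actual internals:

```python
def rest_of_season_rate(preseason_rate, season_rate, season_pa,
                        stabilization_pa=300):
    """Blend a preseason projection with current-season results.

    The projection acts like a prior worth `stabilization_pa` plate
    appearances; the more the player has actually batted, the more the
    blend moves toward his observed rate. All numbers here are
    illustrative, not PECOTA's actual weights.
    """
    w = season_pa / (season_pa + stabilization_pa)
    return w * season_rate + (1 - w) * preseason_rate

# A hitter projected for a .350 wOBA who has hit .410 over 100 PA:
blended = rest_of_season_rate(0.350, 0.410, 100)
# 100/(100+300) = 0.25, so 0.25*0.410 + 0.75*0.350 = 0.365
```

The larger the stabilization constant, the less a hot or cold month can move the rest-of-season line, which is the lever the update above is describing.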
If you reduce the 2009/2010 weighting, doesn't that increase, not decrease, the influence of this season's small samples?
That said, a cursory examination reveals that new SIERA resembles old SIERA much more than new Coke resembled old Coke. Looking at 2010, the root mean square error between the SIERA values published on Fangraphs and the SIERA values at BP (weighted by innings pitched) was only .19. Mean absolute error was only .12. These are intensely minor differences, all told. Nor has the standard deviation changed appreciably - I think there might be a .01 difference.
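For anyone who wants to reproduce that comparison, here's a sketch of the innings-weighted error calculations described above. The SIERA values and innings below are made up for illustration; the actual comparison used the 2010 Fangraphs and BP figures:

```python
import math

def weighted_rmse(a, b, weights):
    """Root mean square error between two lists of values, weighted
    (here, by innings pitched)."""
    num = sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))
    return math.sqrt(num / sum(weights))

def weighted_mae(a, b, weights):
    """Mean absolute error, with the same weighting."""
    num = sum(w * abs(x - y) for x, y, w in zip(a, b, weights))
    return num / sum(weights)

# Toy example with invented SIERA pairs and innings as weights:
fg_siera = [3.10, 4.05, 2.80]
bp_siera = [3.25, 3.90, 2.85]
innings = [200, 150, 180]
rmse = weighted_rmse(fg_siera, bp_siera, innings)
mae = weighted_mae(fg_siera, bp_siera, innings)
```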
In order to get these modest changes out of SIERA, Matt seems to have solved a problem that nobody had - that SIERA just wasn't complicated enough. He's added four additional coefficients, which is quite a lot considering that formulas like FIP only have four coefficients total. The sign on one of his interaction effects has changed from negative to positive - I don't think anyone believes the relationship between walks and ground balls has fundamentally altered in baseball since SIERA was initially published. To the extent that SIERA has improved in predictive power (for which, as I said, he has not yet presented his evidence), it seems to have diminished in explanatory power. It is now harder to reason out *why* SIERA says what it says.
Also, shortly I hope to finally publish some of my own research findings on the matter that I've been working on for a few months now. I think that will answer a lot of people's questions about this issue.
http://www.fangraphs.com/blogs/index.php/new-siera-part-two-of-five-unlocking-underrated-pitching-skills
I appreciate the general admission of errors, but there are some specific unacknowledged errors that ought to be addressed.
1. What makewayhomer said.
1a. You started the RoS PECOTA by announcing that "Fangraphs is wrong." Now you've apparently gone toward Fangraphs. My quick calculations indicated Fangraphs' weighting was pretty good; rather more tellingly, so did Tom Tango's. I'm pretty sure an apology is in order; if you're going to wrongly disrespect competitors, you owe them an apology when you're wrong.
This is really bad form not to do that.
2. If you're going to talk about PECOTA problems and not address the Kila/Bowker problem - which was obvious and mentioned at the time by more than one person - that seems unfortunate. Some indication that you're trying to repair the error would be nice.
3. If you're going to talk about PECOTA problems and not talk about the Trout problem, that seems unfortunate. Some indication that you're trying to repair the error would be nice. The continued insistence at the time that the comp list problems for those with recent minor league histories weren't problems was very annoying, and remains so. One of PECOTA's great assets has left the building and there's no indication that it's coming back.
4. There's been an ongoing bug affecting the A's in the Playoff Odds for at least a week. The Rockies are similarly problematic due to a Mark Ellis bug. Someone should check these periodically.
5. The way Nate did the Playoff Odds, if I understood it correctly, was to assume team quality around a baseline; the Monte Carlo sim used a distribution of quality assumptions vs. the remaining schedule. (That is, if we assume the A's are a .642 team the rest of the way, the sim might play them out as .655 or .617, rather than just as .642.) The playoff odds report appears to neither assume a distribution of potential goodness nor does it appear to take schedule into account. I could be wrong about this.
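The simulation design this comment describes can be sketched in a few lines. Everything here - the function name, the talent standard deviation, the game counts - is an illustrative guess, not Nate's or BP's actual parameters, and this toy version ignores strength of schedule:

```python
import random

def sim_remaining(baseline_wpct, games_left, current_wins,
                  n_sims=10_000, talent_sd=0.02):
    """For each simulation, draw a 'true talent' winning percentage
    from a normal distribution around the baseline, then play out the
    remaining games at that level. Returns the list of final win totals."""
    final_wins = []
    for _ in range(n_sims):
        talent = random.gauss(baseline_wpct, talent_sd)
        wins = current_wins + sum(
            random.random() < talent for _ in range(games_left))
        final_wins.append(wins)
    return final_wins

# Chance a .642-baseline team with 40 wins and 100 games left reaches 95 wins:
random.seed(0)
results = sim_remaining(0.642, 100, 40)
odds = sum(w >= 95 for w in results) / len(results)
```

The point of drawing `talent` fresh each run is exactly the ".655 or .617, rather than just .642" behavior the comment describes: uncertainty about how good the team really is widens the distribution of outcomes.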
Short version: Grownups acknowledge error. I get that you want to sell the product, but there are some of us in the unwashed masses who aren't buying diet pills from Steve Garvey, penis extensions from Jimmy Johnson, and Fangraphs is Wrong from BP. Admit specific screwups.
I want to love you guys. And there are lots of good articles. And it makes me happy that Team Tracker's improved in some details (if not in daily reliability.) But you're making it hard.
--JRM
* Rob has located the Ellis bug and is working on a fix - the problem was a mistaken entry in the depth charts that was causing Ellis to appear on multiple teams at once. Longer term, there was a set of safeguards in the depth charts code to catch some of those issues, and once the issue is fixed I need to go back into the code and figure out why the safeguards aren't working any more.
* I don't think RoS PECOTA has moved towards RoS ZiPS in the way you describe - I look at the forecast for someone like Bautista and RoS PECOTA is still more conservative.
I don't take making errors lightly. For all of the errors listed above, I am wholly sorry. End of statement - no qualifiers or excuses.
But I stand by my methodological critiques of rest of season ZiPS, particularly in terms of how prior season playing time is not incorporated in the weighting of current season stats. Tango, who you reference, agreed with that critique. After the season, I will take a look at how each measure performed, and that may go further toward resolving the disagreement. Also, above people have requested a more detailed explanation of the methodology behind the updated forecasts, and while doing so will take more than a comment, I will work on a blog post addressing those concerns as well.
And as a philosophical point - I think the field of sabermetrics is better served if people have their disagreements out in the open. I'm not trying to slander Fangraphs or anyone else. When I offer a critique of anyone else's work, it's because it's something I truly believe in; sometimes I'm wrong, but it's because I'm a human and humans err.
* The playoff odds report is a Monte Carlo sim, and it does take strength of schedule into account.
I agree that open disputes are often healthy (like this one!). And I concur with the methodological point you make; however, I expect that the more 2011-centric ZiPS projections will more often end up closer to actual results than the RoS PECOTA projections from the original date. Maybe I'll turn out to be wrong.
I am certainly pleased that you'll revisit this - I'd love to see an article at the end of the year to see how RoS PECOTA fared against RoS ZiPS from the inception of RoS PECOTA.
I'd very much like to see an article early in the offseason (or earlier) about the other flagged PECOTA issues. I view them as serious.
Thanks again for your response.
--JRM
1) What mix of projection vs. current season does the in-season Postseason Odds report use, and how was this weighting algorithm tested? Why isn't this system transparent, and why not show both the projection-based team strength and the current-season-only team strength, so people can see actual performance vs. expectations?
2) This page, http://www.baseballprospectus.com/odds/ is, to use one of your words, wrong. Look at the expected win pcts. Oakland = .641; Colorado = .627.
3) Why were pitcher win projections massively inflated in pre-season PECOTAs? E.g., all of the Mariners pitchers were projected with a W-L record >= .500, yet the team was projected with 71 wins. Many other teams looked as bad -- e.g., Twins and Mets. Overall, there was a huge problem with PECOTA pitcher W-L projections, an issue which was first raised in early March. To my knowledge, it has never been addressed.
Wouldn't you want to backtest this over, say, 10 years of data instead of just 2011?
If this is going to be a general gripe with the stats, I'd like to add something. I've been disappointed by the disappearance of Clay's minor league translations.
I was shocked to recently learn that he's posting them at his own website and is apparently no longer involved at BP? Who would know, since we've never been told.
To the extent that you want to get into that issue is up to you, but I must say I very much miss having something like those reports available. Are there any plans to provide something similar in the future? If not, why not?
Thanks.
I think you have a bug with the "rest of season" (RoS) forecasts. I remember a while ago, Felix's RoS was 2.30 ERA, which was quite bold, considering that:
a) his mean forecast entering 2011 was around 2.60
b) his season performance to then was worse than 2.60
So, given more information, his RoS should have been somewhat worse than 2.60.
Now, his RoS is even lower at 2.19:
http://www.baseballprospectus.com/card/card.php?id=HERNANDEZ19860408A
That is even bolder since his current performance is basically a match to his career totals, and so, you'd need to have his mean forecast be higher than 2.60.
His RoS is a 3.2 WARP, on RoS 98IP. His WARP in his Cy season was 4.4 (250 IP) and 5.6 in 2009 (239 IP).
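The arithmetic behind this objection can be made explicit by treating the preseason forecast as a prior and mixing in the season to date. The "effective innings" weight on the prior is invented for illustration:

```python
def updated_mean_era(prior_era, prior_ip, season_era, season_ip):
    """Treat the preseason forecast as a prior worth `prior_ip` effective
    innings and mix in the season to date. All weights are illustrative."""
    total = prior_ip + season_ip
    return (prior_era * prior_ip + season_era * season_ip) / total

# Prior worth 400 IP at 2.60, plus 80 IP at 3.10 pitched so far:
ros = updated_mean_era(2.60, 400, 3.10, 80)
# (2.60*400 + 3.10*80) / 480 ≈ 2.68 - above 2.60, as the comment argues
```

However the weights are chosen, adding worse-than-2.60 innings to a 2.60 prior can only pull the estimate above 2.60, never below it - which is why a 2.19 RoS figure looks like a bug rather than a bold forecast.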
***
Also, the percentile forecasts are showing this timestamp:
Last Update: 3/26/2010 14:48 ET
Note the year (2010, not 2011).
In Felix's case, there's also the change in park factor at his home park. Seattle has gone from being a modest pitcher's park from 2006 through 2009 to being essentially the AL's new Petco in 2010. I find myself curious as to what's driving the change in Seattle - whether some of it is changes to the park itself, versus changes to other parks around the league - but I'm fairly convinced that this is a substantial change, not just a data artifact.
As for the timestamps, that's just a typo. We'll get that fixed.
Can you show the top 10 in RoS WARP (and their IP), both entering 2011, and right now?
Currently, everything on the site is using 2010 park factors, but the 2010 park factors incorporate 2011 data so the current season results are having a (small) effect on those park factors.
If you'll indulge me in a little speculation - it's widely surmised in the press that the Mariners are building a team around run prevention, which is to say fielding and pitching. I'm at least slightly curious if they've made any sort of changes to the park that would help such a team more than the average team, and thus intentionally brought about what we're seeing in their park factors.
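For readers wondering what a multi-year park factor with current-season data mixed in might look like mechanically, here's a hedged sketch. The yearly weights and the 30% regression toward neutral are invented for illustration; this is not BP's actual park-factor method:

```python
def blended_park_factor(yearly_factors, yearly_weights, regress=0.3):
    """Blend several seasons of raw park factors (weighted, most recent
    heaviest) and regress the result toward neutral (1.00), so a single
    extreme season moves the published factor only part of the way."""
    raw = sum(f * w for f, w in zip(yearly_factors, yearly_weights)) \
        / sum(yearly_weights)
    return raw * (1 - regress) + 1.0 * regress

# A Safeco-like drift toward pitcher-friendliness over three seasons,
# weighted 1/2/3 from oldest to newest:
pf = blended_park_factor([0.97, 0.96, 0.88], [1, 2, 3])
```

The regression term is why one extreme year (like Seattle's 2010) shows up in the published factor only gradually, and why adding in-season data moves it by a small amount.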
My explanation as to how to do RoS is in post 9. And, I agree in post 12 that you can't keep the weights constant.
Dan chimes in in post 14, and if you look at that post and the link in post 10, it sounds like Dan implemented his weighting scheme (irrespective of actual PA) as a quick effort to get something rolling for this year. I'd expect Dan to improve upon it for next season.
***
In any case, of all the things where we have disagreement in the saber community, weighting of performance by timeline is not one of them.
The weighting of daily performance will all follow something along the lines of:
weight=.9994^daysAgo for hitters
weight=.9990^daysAgo for pitchers
That is, the further back in time, the less weight. You can quibble about whether to use .9992 or something for hitters, etc, or, that you want it to accelerate faster, like:
weight = .9998^(daysAgo^1.2)
but, basically, we're all dancing around that scheme.
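The decay scheme above, expressed as code (the function name is mine; the bases are the ones quoted in the comment):

```python
def recency_weight(days_ago, base=0.9994):
    """Exponential timeline weight: each day back in time multiplies the
    weight by `base` (.9994 for hitters, .9990 for pitchers in the
    scheme described above)."""
    return base ** days_ago

# Performance from exactly one year (365 days) ago:
w_hitter = recency_weight(365)            # ~0.80
w_pitcher = recency_weight(365, 0.9990)   # ~0.69
```

So a hitter's season-old performance still carries roughly 80% of the weight of today's, and a pitcher's roughly 69% - the quibbles are over the exact base and whether the decay should accelerate, not over the basic shape.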
***
I agree with Colin that all discussions should take place out in the open. It makes life easier, and 2000 heads are better than 2.
Both should have been using the same data (projections and schedule), yet there were differences in the numbers.