March 18, 2013
Baseball Prospectus News
Replacement Level and 10-Year Projections
From time to time—if not at all times—organizations must examine their own operations and ask some difficult questions.
The answers often reveal a range of things done right and things done wrong. Healthy organizations can handle those answers in more than one way—there are many routes to success, but even more to failure—but one hallmark of organizational integrity, to borrow from James Collins, is looking in the mirror when assigning blame and out the window when giving praise.
Here at BP we’ve been faced with an opportunity to ask ourselves some questions, and we’ve decided to grapple with the answers, even though in some cases we don't like them. In short, we have work to do in order to live up to our own high expectations. Despite our pride in much of the progress Baseball Prospectus has made, now is not the time to rest on our laurels. And some recent events make that abundantly clear.
The goal of making WARP play well with the component statistics left behind at BP by previous staffers was worthwhile, but the implementation caused problems: we inadvertently raised replacement level for 2011 and 2012. Summing the WARP or VORP values across the league for those two seasons produced totals that were much lower than those in pre-2011 data. By implication, replacement level was much higher: a “replacement level team” would win more games than the data had indicated for previous seasons.
At any point starting in about May of 2011, it should have been clear to anyone looking closely at the stats that something was different, and not just because Colin had re-engineered (read: greatly improved) some of the WARP formulae or because offense was down in 2011.
For the record, we know that these re-engineered formulae work. The chart below shows league-wide WARP totals by year since 2000, along with the winning percentage of a notional “replacement level team” (really, it's just a subtraction of WARP from wins, so there's some noise there for a variety of good reasons, but it's close enough to give a good idea).
Voila! Exactly the results we'd hoped to get.
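The subtraction behind the “replacement level team” figure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BP's actual code: the function name is hypothetical and the WARP total is a made-up placeholder. Only the schedule arithmetic (30 teams playing 162 games each, so 4,860 team-games and 2,430 wins) is real.

```python
def replacement_level_win_pct(league_wins, league_games, league_warp):
    """Winning percentage of a notional replacement-level team.

    Subtracts league-wide WARP from league-wide wins, then divides by
    the number of team-games played. Hypothetical helper for illustration.
    """
    if league_games <= 0:
        raise ValueError("league_games must be positive")
    return (league_wins - league_warp) / league_games


# 30 teams x 162 games = 4860 team-games and 2430 wins in a full season.
# The WARP total below is invented for illustration, not a BP figure.
pct = replacement_level_win_pct(
    league_wins=2430, league_games=4860, league_warp=1000.0
)
print(f"{pct:.3f}")
```

Because WARP is subtracted from a fixed number of wins, a lower league-wide WARP total implies a higher replacement-level winning percentage, which is the symptom seen in the 2011 and 2012 data.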
One of the steps we take to improve the speed of queries (and thus to expand the scope of subjects we are able to research) is to put the seasonal replacement level for each event into our events database. In that process, we allowed some bad data to be introduced in 2011. We didn't catch it. It really was that simple, the data equivalent of a typo. We’ve corrected the data, and Baseball Prospectus WARP values for 2011 and 2012 now reflect the theory behind them.
Two additional things need to be pointed out about the scope of this problem. First, VORP was also affected, though FRAA and BRR were not; this was entirely an “at the plate” and “on the mound” problem. Second, we made slight adjustments to some previous-season WARP values, because some of our calculations rely on a multi-year smoothing of baseline data, including forward-looking data when available.
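One way a discrepancy like this could be caught automatically is to compare each season's league-wide WARP total against the other seasons. The sketch below is an assumption about what such a check might look like, not a description of BP's quality control. The function name, the 10 percent tolerance, and every total in the example are hypothetical.

```python
from statistics import median


def flag_outlier_seasons(league_warp_by_year, tolerance=0.10):
    """Return the years whose league WARP total differs from the median
    of all other years by more than `tolerance` (a fraction).

    Hypothetical sanity check; the threshold is arbitrary.
    """
    flagged = []
    for year, total in sorted(league_warp_by_year.items()):
        others = [v for y, v in league_warp_by_year.items() if y != year]
        if not others:
            continue  # nothing to compare a single season against
        baseline = median(others)
        if baseline and abs(total - baseline) / abs(baseline) > tolerance:
            flagged.append(year)
    return flagged


# Invented totals for illustration only; these are not BP data.
totals = {
    2005: 1000.0, 2006: 1005.0, 2007: 995.0, 2008: 1010.0,
    2009: 990.0, 2010: 1000.0, 2011: 700.0, 2012: 710.0,
}
print(flag_outlier_seasons(totals))
```

The median is used rather than the mean so that a couple of bad seasons do not drag the baseline toward themselves. The check only works when the bad seasons are a minority of the data.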
While we're on the subject of evaluating our data, we've decided, after extensive testing, that the 10-year projections just weren't producing the results we desired. It’s difficult to evaluate long-term projections, and we intend to make that a more standardized, easily repeatable process in the future. But we hold our work to a certain standard, and in this case we didn't feel that standard was being met. Instead of putting out an inferior product, we’ve sent the design team back to the drawing board to get 10-year projections and UPSIDE correctly formulated and out to the public in a timely manner. We will be releasing the PECOTA percentiles soon, and that will conclude our preseason projection releases.
It's not enough to fix these problems. We will be addressing these issues at their root—with a hard look at and overhaul of our internal processes and quality control.
But we also want to regain your trust. So we’re going to open the kimono and make our work transparent. Not only will this create a wealth of knowledge for everyone involved—readers and writers alike—but it will give BP the opportunity to leverage the wisdom of crowds.
We've recently named Harry Pavlidis our Director of Data Analysis. His first responsibility is to lead this effort. It will be a team undertaking, with all hands on deck. We will be sharing our progress and plans as they develop. But right now we're looking in the mirror. Looking hard.
Harry's first task is to conduct a full audit of our systems and stats. In essence, we're making him do his "day job"—assessing our systems and developing a plan to move forward. Harry will be bringing a process-driven approach to the effort, with the ultimate goal of improving our stat offerings. The experience he has in this area ranges from tiny start-ups to large, publicly traded companies. We'll all be working together to find the best-fitting tools and processes to bring BP up to the level of operational excellence we all expect.
Finally, I want to personally apologize for any inconvenience we may have caused our readers. The people we employ at BP are perfectionists. They spend more hours than anyone knows to get things done right and in a timely manner. They love this game and this company with a passion and will gladly fall on their swords if it means building a bigger and better Baseball Prospectus in the future. But if something goes awry at BP, it’s my fault and mine alone. I’m ultimately in charge, and I take full responsibility for any and all of our shortcomings. I’ve made mistakes and deserve any criticism I receive. I may hold Baseball Prospectus to a high standard, but I hold myself to an even higher one. I’m sincerely sorry, and I promise you that I will continue to devote my blood, sweat, and tears to making BP the best it can possibly be.