We have all seen the merits that Major League Baseball teams reaped from sabermetrics in its infancy. Michael Lewis's famous bestseller, Moneyball, brought readers inside to watch how the Athletics used statistical analysis to help repeatedly win the American League West despite a minuscule payroll, at a time when few teams were using sabermetrics. In recent years, however, many organizations have seen that competitive advantage shrink as more teams became aware of sabermetrics and began to target the players that the A's had once acquired so easily.
Although sabermetrics has clearly helped teams like the Red Sox and Rays become dominant in recent years, plenty of sabermetric teams have also been disappointments. Perhaps most famously, the Mariners were highly touted going into this season for using sabermetrics to put together an excellent defensive team, but instead have fallen deep into last place in the AL West. Going into 2010, Dave Cameron of FanGraphs.com ranked the Mariners organization sixth in probability of winning a championship in the coming years, and he has endured much criticism for it. While Cameron has been criticized, I was reminded of something Joe Sheehan wrote in his final Prospectus Today column in December:
As far as the Diamondbacks and Indians go, I’m open to the idea that I’m systematically overrating "good" organizations, as I seem to miss on those teams to the high side with some frequency. I’ve certainly been accused of bias regularly, and I think there’s a case to be made that I have to be more careful about falling in love with a GM, a front office or a particular team’s offseason, and take a skeptical eye with teams that, in my mind, have a certain progressive seal of approval.
That makes a good deal of sense at the qualitative-analysis level, but what if overrating “progressive” teams occurred at the quantitative level? What if even PECOTA was overrating sabermetric teams?
I set out to answer this by surveying current and former Baseball Prospectus staff members and interns in an attempt to pin down exactly which teams lean sabermetric. I had my own guesses, but I left those out because I did not want to bias the results of the tests. I sent the following survey:
“I want to know how much you perceive different major-league teams to use sabermetrics. I'm going to list all 30 teams, and I want you to label them 1-4 for their sabermetric use over the last five years (2006-2010), to the best of your knowledge, where:
- 1 = Does not use sabermetrics in decision making
- 2 = Uses sabermetrics occasionally, but not as a regular part of their decision making process
- 3 = Often uses sabermetrics to run their team
- 4 = Employing sabermetrics is a regular part of decision-making for the team
If a team switched from being very non-saber to very saber-utilizing at some point between 2006 and 2010, just average out the results and split the difference. Please just mark a number 1, 2, 3, or 4 next to each team in this order.”
I got 13 responses, and although everyone seemed to have a slightly different definition of what it meant to be sabermetric, I was able to combine the answers into a usable measure. To remove scale differences from person to person, I standardized each respondent's ratings, setting everyone's standard deviation equal and everyone's average rating to 2.5.
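To make that adjustment concrete, here is a minimal sketch of the standardization in Python, assuming the raw survey results sit in a pandas DataFrame called responses with one row per respondent and one column per team (all names are hypothetical, and using the pooled standard deviation as the common spread is my assumption, not necessarily the original method):

```python
import pandas as pd

# responses: 13 rows (respondents) x 30 columns (teams), raw 1-4 ratings.
# NOTE: the choice of common spread below is an assumption; the article only
# says every respondent's SD was set equal and their mean set to 2.5.
common_sd = responses.std(axis=1).mean()

# z-score each respondent's ratings, then rescale to mean 2.5 and the common SD
z = responses.sub(responses.mean(axis=1), axis=0).div(responses.std(axis=1), axis=0)
adjusted = 2.5 + z * common_sd

# average across respondents to get each team's perceived sabermetric usage
saber_usage = adjusted.mean(axis=0).sort_values(ascending=False)
print(saber_usage.head())
```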
Then I averaged out everyone’s answers to get the following ranking of teams by their perceived sabermetric usage:
| Rank | Team | Saber Usage |
|------|------|-------------|
| 1 | Red Sox | 3.93 |
| 2 | Rays | 3.79 |
| 3 | Athletics | 3.70 |
| 4 | Indians | 3.57 |
| 5 | Mariners | 3.29 |
| 6 | Rangers | 3.20 |
| 7 | Padres | 3.14 |
| 8 | Yankees | 2.98 |
| 9 | Diamondbacks | 2.91 |
| 10 | Pirates | 2.81 |
| 11 | Cardinals | 2.52 |
| 12 | Blue Jays | 2.47 |
| 13 | Brewers | 2.37 |
| 14 | Angels | 2.32 |
| 15 | Nationals | 2.31 |
| 16 | White Sox | 2.25 |
| 17 | Rockies | 2.20 |
| 18 | Dodgers | 2.15 |
| 19 | Tigers | 2.13 |
| 20 | Cubs | 2.06 |
| 21 | Orioles | 2.06 |
| 22 | Mets | 1.98 |
| 23 | Twins | 1.98 |
| 24 | Braves | 1.95 |
| 25 | Phillies | 1.92 |
| 26 | Marlins | 1.91 |
| 27 | Reds | 1.88 |
| 28 | Giants | 1.77 |
| 29 | Astros | 1.72 |
| 30 | Royals | 1.70 |
Then I took each team's average winning percentage from 2006-2010 and ran a quick correlation between sabermetric usage rating and winning percentage. The correlation was a notable, if not exceptional, .10. Removing the Pirates, who picked up sabermetrics when Neal Huntington took over as general manager in 2007 in an effort to pull the organization out of the cellar, bumps that correlation up to .15. This is all despite a negative correlation between payroll and sabermetric usage (r = .18 between payroll rank from 1-30 and adjusted sabermetric usage; a higher rank number means a lower payroll). In fact, the amount by which teams exceed the wins predicted by payroll rank alone has a .27 correlation with sabermetric-usage rating.
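For illustration, those correlations take only a few lines of pandas; this is a sketch rather than the original code, assuming a DataFrame teams indexed by team name with hypothetical columns saber_usage, win_pct, and payroll_rank:

```python
import numpy as np
import pandas as pd

# saber usage vs. average 2006-10 winning percentage
print(teams["saber_usage"].corr(teams["win_pct"]))        # ~ .10 per the article
no_bucs = teams.drop("Pirates")
print(no_bucs["saber_usage"].corr(no_bucs["win_pct"]))    # ~ .15 without the Pirates

# wins above payroll expectation: residual of win_pct regressed on payroll rank
slope, intercept = np.polyfit(teams["payroll_rank"], teams["win_pct"], 1)
expected = intercept + slope * teams["payroll_rank"]
print((teams["win_pct"] - expected).corr(teams["saber_usage"]))  # ~ .27 per the article
```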
However, I also took the difference between each team's average PECOTA-projected wins and its actual wins from 2006-10, a straightforward measure of how much PECOTA overrates a team. This measure correlated with the sabermetric usage rating above at .27. In other words, a measurable share of PECOTA's error over the past five years can be explained by PECOTA systematically overrating teams that use sabermetrics.
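The over-projection measure itself is just the gap between projection and reality; continuing the same hypothetical teams DataFrame, with pecota_wins and actual_wins columns assumed:

```python
# positive values mean PECOTA projected more wins than the team actually delivered
teams["over_projection"] = teams["pecota_wins"] - teams["actual_wins"]
print(teams["over_projection"].corr(teams["saber_usage"]))  # ~ .27 per the article
```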
Consider the following table, which combines the adjusted sabermetric usage rating with the average payroll from 2006-10, the average PECOTA projected wins, and the average actual wins from 2006-10 (assuming each team's final 2010 winning percentage equals its current winning percentage).
| Team | Saber Usage | Average 2006-10 Payroll ($M) | PECOTA Projected Wins | Actual Wins* | Over-projection |
|------|-------------|------------------------------|-----------------------|--------------|-----------------|
| Indians | 3.57 | 68.2 | 87.2 | 77.2 | 10.0 |
| Pirates | 2.81 | 45.8 | 72.4 | 63.5 | 8.9 |
| Diamondbacks | 2.91 | 69.5 | 84.4 | 76.5 | 7.9 |
| Orioles | 2.06 | 82.3 | 73.6 | 66.1 | 7.5 |
| Nationals | 2.31 | 60.3 | 71.8 | 66.2 | 5.6 |
| Athletics | 3.70 | 63.6 | 84.0 | 80.3 | 3.7 |
| Cubs | 2.06 | 127.3 | 87.4 | 84.0 | 3.4 |
| Mets | 1.98 | 130.3 | 88.0 | 85.0 | 3.0 |
| Brewers | 2.37 | 78.6 | 84.0 | 81.1 | 2.9 |
| Rays | 3.79 | 52.5 | 84.0 | 81.5 | 2.5 |
| Braves | 1.95 | 92.4 | 85.0 | 82.9 | 2.1 |
| Royals | 1.70 | 67.3 | 69.8 | 67.9 | 1.9 |
| Mariners | 3.29 | 103.8 | 76.8 | 75.1 | 1.7 |
| Red Sox | 3.93 | 150.3 | 93.0 | 92.7 | 0.3 |
| Dodgers | 2.15 | 117.2 | 86.0 | 86.4 | -0.4 |
| Tigers | 2.13 | 117.1 | 83.6 | 84.7 | -1.1 |
| Astros | 1.72 | 101.4 | 76.6 | 77.9 | -1.3 |
| Yankees | 2.98 | 214.9 | 94.6 | 96.5 | -1.9 |
| Giants | 1.77 | 95.2 | 77.2 | 79.2 | -2.0 |
| Reds | 1.88 | 72.4 | 77.0 | 79.5 | -2.5 |
| Phillies | 1.92 | 117.6 | 87.4 | 90.0 | -2.6 |
| Cardinals | 2.52 | 100.5 | 82.6 | 85.3 | -2.7 |
| Rockies | 2.20 | 72.6 | 78.4 | 83.3 | -4.9 |
| Padres | 3.14 | 58.7 | 77.2 | 82.1 | -4.9 |
| Rangers | 3.20 | 74.6 | 77.2 | 82.4 | -5.2 |
| Marlins | 1.91 | 35.2 | 74.4 | 80.3 | -5.9 |
| White Sox | 2.25 | 104.3 | 77.0 | 83.2 | -6.2 |
| Blue Jays | 2.47 | 82.5 | 76.4 | 83.1 | -6.7 |
| Twins | 1.98 | 74.5 | 81.0 | 88.4 | -7.4 |
| Angels | 2.32 | 114.2 | 82.6 | 91.6 | -9.0 |
*Actual Wins assume each team's final 2010 winning percentage equals its winning percentage as of August 29.
The correlation can be seen pretty clearly in the table. PECOTA routinely overrates the Indians and A's, perhaps the organizations best known for sabermetrics, as well as teams like the Diamondbacks and Pirates, who have certainly used sabermetrics in their decision making. On the other hand, PECOTA keeps selling the Angels and Twins short, franchises routinely criticized for not utilizing sabermetrics.
Certainly PECOTA is not expressing a preference for teams like the Indians. PECOTA has no idea which team Moneyball or Mind Game is about; it simply knows the data it receives. However, teams that use sabermetrics heavily are likely to favor the same players that PECOTA favors, given similar data, and therefore those teams are likely to overrate the same players that PECOTA does.
More concerning for those sabermetric teams, however, is that they miss on some of the same players that PECOTA misses on. In Moneyball, Billy Beane repeatedly threatened to fire all of his scouts, but never actually did. Now one of the more sabermetrically knowledgeable general managers, the Blue Jays' Alex Anthopoulos, has recently expanded his pro scouting staff and doubled his amateur scouting staff, significantly increasing the money put into scouting in an attempt to gain ground in that area, much as the early-2000s A's gained ground by using sabermetrics to acquire neglected players with high OBPs. The Jays have decided to marry sabermetrics with scouting, using both as a foundation to build a winning team in the division where that is hardest to do, the AL East.
This is exactly what teams need to do. It has become quite clear that being sabermetrically savvy does not guarantee a competitive team. Certainly the Rays would not be vying for the best record in baseball on one of the lowest payrolls if they were not also employing a staff of brilliant sabermetricians, but there are plenty of saber-utilizing teams nowhere near playoff contention. Teams need to use both sabermetrics and scouting to cover themselves on multiple fronts. Otherwise, a weak scouting staff will not be able to identify the players that the sabermetricians are overrating.
PECOTA projected the Jays to win only 72 games this year, a number they are close to eclipsing with a month remaining in the season. No one saw Jose Bautista hitting over 40 home runs; PECOTA projected only 18. Whether that was good luck, a strong scouting staff, a good sabermetric staff, or all of the above, finding the players that the rest of the league is missing is essential, and marrying scouting and sabermetrics is the best formula for getting the job done.
That is not to say that sabermetrics is unnecessary. Sabermetric teams are doing better overall than teams that rely less on sabermetrics, but it is clear that the market inefficiencies sabermetrics revealed a decade ago are no longer as large, and teams now need to approach building a roster from multiple angles. Anthopoulos' Blue Jays may prove to be the next generation: teams that use strong scouting and strong sabermetrics to build a winner.
It's always been clear that no one thing - but perhaps especially being sabermetrically savvy - guarantees a competitive team. There are no guarantees in baseball or life. To include that statement in an article attempting to shine a light on the way that statheads "used to" overrate the importance of teams being sabermetrically savvy leaves a bit of a bad taste imo.
Keep it up, Mr. Swartz.
Bravo to Matt and BP, and the entire sabermetric sector, for always questioning its conclusions and often acknowledging its errors.
And let's all keep in mind, particularly when someone posts a respectful dissent, that BP orthodoxy is just as likely as any other to be misguided. A little humility goes a long way.
Industrial engineers use advanced statistical tools in the endless pursuit of 100% yield. And even when all the variables are accounted for, we have the occasional hiccup.
Then there's the notion that making the "ideal" statistical move is an absolute. Certainly there are competing statistical methodologies that produce varying outcomes. Even the most thorough organizations like the Red Sox clearly make moves at odds with the data (see Lugo, Julio).
The data we get from PECOTA is a forecast of player performance on a macro scale (i.e., over the course of a season). Then we further scale that to predict that a given performance, matched with other given performances, will produce a larger outcome (i.e., team wins and losses). If memory serves, the Red Sox under Francona have as often as not underperformed their Pythagorean record. That sometimes gets lost when we consider their overall record in that span. (When your guys win a World Series, you don't worry so much that the data say they should have won 102 games, not just 96.) Maybe there's a wrinkle in there on the scaled level that still needs to be worked out, and/or perhaps there's work to be done on in-game statistical use.
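For readers unfamiliar with it, the Pythagorean record mentioned above comes from a simple formula; a quick sketch (exponent 2 is the classic Bill James version, with values near 1.83 a common refinement):

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float,
                        exponent: float = 2.0) -> float:
    """Bill James' Pythagorean expectation for team winning percentage."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)

# a team scoring 800 runs and allowing 700 projects to roughly a .566 winning pct
print(round(pythagorean_win_pct(800, 700), 3))
```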
That move (and others) may have been at odds with the data that was publicly available, but that doesn't mean it was at odds with all the data.
Isn't this whole article about exactly this type of hubris?
Of course it's harder to pin down a win estimate for each team, but perhaps one could look at predicted place in the standings instead of wins.
One thing I'd be interested in, though: how do the results change if Actual Wins are replaced by third-order wins (or Pythagorean wins or something similar) in the above table? Just curious whether PECOTA is overrating a team's overall ability, or whether the problem lies in PECOTA not being able to distinguish teams that, for one reason or another, are more likely to under-perform their expected win percentage.
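That check is straightforward once the table is in a DataFrame; a hypothetical sketch, substituting Pythagorean wins for actual wins (runs_scored and runs_allowed columns are assumed, and PECOTA's own third-order wins would be the better input if available):

```python
# Pythagorean win pct from run totals, scaled to a 162-game season
pythag_pct = teams["runs_scored"] ** 2 / (
    teams["runs_scored"] ** 2 + teams["runs_allowed"] ** 2
)
teams["over_projection_pythag"] = teams["pecota_wins"] - 162 * pythag_pct

# does the over-projection vs. saber usage correlation survive the substitution?
print(teams["over_projection_pythag"].corr(teams["saber_usage"]))
```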
If you regress Actual Wins on Saber Usage and Average Payroll (plus a constant), the coefficient on Saber Usage is insignificantly different from zero. I.e., there's no evidence in this data (about which more later) that sabermetric capability has an independent impact on winning, apart from payroll. Though it is worth emphasizing that "absence of evidence is NOT evidence of absence" - this is a small data set and significance is difficult to achieve in that context.
If you regress PECOTA Projected Wins on Saber Usage and Average Payroll, Saber Usage has a positive and statistically significant coefficient. This is, basically, your conclusion that saber-savvy teams pick the same players that PECOTA likes.
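For concreteness, a sketch of those two regressions with statsmodels, again on the hypothetical teams DataFrame (column names assumed):

```python
import statsmodels.api as sm

X = sm.add_constant(teams[["saber_usage", "payroll"]])  # constant plus two regressors

# actual wins: the saber_usage coefficient comes out indistinguishable from zero
print(sm.OLS(teams["actual_wins"], X).fit().summary())

# PECOTA projected wins: the saber_usage coefficient is positive and significant
print(sm.OLS(teams["pecota_wins"], X).fit().summary())
```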
One limitation here is the survey of Saber Usage: apart from the limited number of respondents, teams' front offices are black boxes. Many teams have hired sabermetricians, but we have no idea which ones actually listen to them - or, if they do, what weight they attach to the sabermetricians' views.
On the whole, though, this is a very thought-provoking piece.
Wouldn't that be overcome in this case because you are measuring the entire universe (i.e., 30 teams = the universe)?
Perhaps it's because that's the highest variance or perhaps it's in the playing time dynamic or possibly just how PECOTA handles relievers.
Of course PECOTA is just a formula and does not know or care how decisions are made - it can only process the data it is given.
However, nothing is really ever 100% objective. By quantifying a process, you eliminate a lot of subjectivity, but subjectivity is still introduced into the system by the initial decisions of the people who created the PECOTA program. Great job by you in at least identifying and trying to measure this effect.
I think if you ask a bunch of scouts which team most utilizes scouting information you won't get a unanimous response of Kansas City, the way BPers proudly claimed Boston.
I think they tried to give it a fair shake when I see where Pittsburgh is, for example, but I think comments like "Sabermetric teams are doing better overall compared to teams that do not rely as much on sabermetrics" have to be taken with a grain of skepticism given the bias of the response group.
1) Potential bias of authors thinking teams PECOTA likes are more sabery.
2) Authors ignoring the timeframe of the question. Do people really think the Pirates' and Mariners' major-league teams over 2006-2010 are the product of sabery front offices? I doubt it.
3) The numbers aren't independent. If the Angels are 9 wins/year better than PECOTA thinks, and they play the Mariners and A's in 1/9 of their games, those teams are going to lose a win a year even if their own talent was correctly assessed.
4) PECOTA projected standings are largely dependent on subjective guesses at playing time and injuries. I'd be interested in seeing whether individual players on supposedly sabery teams underperformed their PECOTAs.
5) There's a negative correlation between payroll and saberness, and high payroll teams are more likely to add talent during the season (while low payroll teams are more likely to shed talent).
1) Potential bias ONLY affects the rating of how much saber teams help you, and does NOT affect the primary conclusion of the article at all-- in fact, it would have the opposite bias. It seems like my colleagues were very careful to rate teams highly in cases where they were not the cream of the crop. Although there could be some effect, I'd like to hear some team names first to even address that weaker side-conclusion that the saber teams were better.
2) Huntington was hired by the Pirates before the 2007-08 offseason, meaning 3 of the 5 seasons in question were under a sabermetrically leaning GM. My colleagues ranked them 10th. The Mariners were 5th even though Zduriencik was at the helm for only 2 of the years, but they have been so heavily highlighted in the media for being saber-leaning that it's not a totally ridiculous ranking. Overall, the Mariners did not significantly differ from their PECOTA projections at all (1.7 games overall), and that gap comes almost entirely from 2010, when PECOTA did, in fact, overrate them.
3) Opponent difficulty is adjusted for in the PECOTA standings, and obviously real-life standings are affected by it as well. Despite your focus on the AL West, a team's sabermetric leanings actually correlate POSITIVELY with those of its opponents: look at the saber-tastic AL East and contrast it with the saber-lacking NL East. The top six teams in the sabermetric ranking were all in the AL.
4) I'd be interested in this too, and that's a good idea for future research if I want to really get my hands dirty. A starting point could be some of my articles on "The Cost of OPP" and free agent PECOTA bias.
5) This is a pretty minimal effect, I'm guessing. The Indians are the premier example of this, overprojected by 10 games a year, so about 50 games overall. Cliff Lee and CC Sabathia were only worth 6.5 games to the Phillies and Brewers combined. The Yankees add a lot of talent and are actually listed as more sabermetric. Also, there is a slightly positive correlation between being competitive and being saber-leaning, so there is going to be a counter-effect whereby saber teams add talent midseason anyway.
I do appreciate the criticism here, but there are major holes in a lot of these arguments. If you think the authors are wrong about the saber usage of a team, tell me which team it is. A couple teams won't change the conclusion but it is certainly worth hearing.
The Royals have three sabermetricians on staff, which is a lot more than the majority of clubs, certainly enough that they shouldn't be at the bottom of the list. I suppose it's an open question how much the sabermetricians have meaningful input into the organization's decision-making process, but I don't see how they can be ranked below teams like the Orioles, Phillies, Astros, etc.
I should have been clearer that I didn't think these were all significant problems with your study. Rather, they were just the potential problems that popped into my head, some of which would be minor issues even if true.
As for Seattle and Pittsburgh, teams don't instantly turn over once a new GM comes in. Huntington came in before the 2008 season, but most of the 2008 team were simply holdovers from the 2007 team. It takes a long time for a GM to overhaul an entire organization, especially one that's rebuilding and not dealing with a lot of free agents. However saber-friendly Huntington is, his 2008-2010 big league teams were largely built by the previous regime.
So, the question from the excellent Matt Swartz article is: are there characteristics that the outliers on both ends of the distribution share that PECOTA either over- or undervalues? On the underestimating side (i.e., the teams with a negative difference between PECOTA projected wins and actual wins), there are eight teams whose wins PECOTA underestimated by 5 or more (using rounding). What characteristics might the Angels, Twins, Blue Jays, White Sox, Rangers, Padres, and Rockies share? It might be that they are less sabermetrically oriented, but the fact that the Rangers, Padres, and Blue Jays are all in the cohort makes me think it's not that. What might it be? A commenter above suggested good bullpens, and that might be worth looking at. The Twins and Angels, anyway, perhaps share a philosophy oriented more toward BA relative to OBP. Could OBP be overvalued by PECOTA in predicting wins? Another characteristic these teams perhaps share is a pitching staff that issues few walks. Maybe PECOTA doesn't give enough credit to teams that limit walks? Conversely, perhaps PECOTA overvalues strikeouts?
Something that has always struck me about PECOTA and other predictive systems: when the difference between projected and actual wins is a consistent phenomenon (i.e., not just a single year), it might be a proxy for manager skill. Is it possible the eight teams that most overperform their PECOTA projections are being managed by good managers relative to other teams? Conversely, maybe the Indians and Pirates in particular have suffered from poor managers? It's a crude tool, and no one has come up with good metrics for measuring the impact of a manager (probably for good reason), but everyone who follows the game recognizes that the manager matters, at least to some degree, and we know manager skill is variable. Perhaps PECOTA error is a measurement of the manager effect?
- Where does an organization's ability to develop players come into the calculus? Presumably, PECOTA's projections for minor leaguers are less accurate than for known commodities, thereby biasing its projections against teams with lots of young talent.
- What about defense? If strong team defense tends to be a characteristic of traditionalist teams and is absent from or undervalued by PECOTA, this is an obvious place for error.
- What level of error do we see here, and to what degree can it be explained by things which are essentially random (e.g., significant injuries) that just happen to be correlated with sabermetric leanings over this time period? Are we perhaps making a 'post hoc, ergo propter hoc' mistake?
- Or if we believe that keeping players healthy is a skill which an organization can possess, its absence from the model could be reflected here.
- Lastly, what about the value gained (or lost) through synergies (or incongruities) in roster construction? Take a great defensive SS and a ground-ball pitcher. PECOTA will not account for the ways in which they will benefit each other in its projections. Are non-saber teams building more efficient, synergistic rosters?
While PECOTA may be conservative on prospects, sabermetrically inclined teams would have to be less likely to develop high-ceiling prospects in the first place. Otherwise, it would not affect the correlation.
Similarly, a good medical staff would not only need to be negatively correlated with a good sabermetric staff; the difference would also have to not be reflected in historical injury trends, which PECOTA does adjust for.
Synergy would also have to be better understood and better utilized by less sabermetrically inclined teams, which I'm not so sure is true. It's possible though.
Defense is included in PECOTA projections in that pitchers on teams that are more likely to have good defense will have lower ERAs (and RAs). The issue would have to be that sabermetrically inclined teams are more likely to pick up defensive players who only look good by the numbers - entirely possible, and a plausible example of where the effect above might be seen.
As far as the chance of this being random error, it's possible. The correlation is .27, which isn't massive. But considering how many other random factors there are, I would never have expected to get a correlation even that high.
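One way to put a rough number on that chance: a quick significance check for r = .27 with n = 30 teams (a back-of-the-envelope sketch, not part of the original analysis):

```python
from math import sqrt

from scipy import stats

r, n = 0.27, 30
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)   # ~ 1.48
p = 2 * stats.t.sf(abs(t), n - 2)        # ~ 0.15, two-tailed
print(t, p)                              # suggestive, but not conventionally significant
```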
The woeful Indians are the poster boy for ridiculously overrating "good" organizations as Sheehan put it.
That said, I don't think Shapiro's problem is sabermetrics. It's his inability to build a complete baseball team. For a team with a moderate-to-low payroll, spending $10 million on Kerry Wood is just nutty.
This is somewhat tangential, but I would love to see a thorough, objective analysis of Billy Beane's performance over the past five years. Obviously, the end results have not been that good; but would like to see evidence for and against the thesis that he is no longer a top GM.
Do bad teams generally under-perform while good teams over-perform?
Predictions are typically conservative, whether for players or teams. That said, team predictions are compiled from projections of the players who make up those teams at the beginning of the season.
Trades and other roster moves are assumed not to make a large difference, but they do make a difference. Do under-performing teams dump talent while over-performing teams acquire talent?
Just curious.