So we won this weekend. At least I think we won. At least I think they told me we won.
It was announced that the Gold Glove Awards will add a metric component to the traditional voting of major-league managers and coaches, a presumed victory for everyone who prefers the analytical and objective over the judgment of the human eye.
So why no celebration in this virtual household, which stands for just that?
First of all, the release didn’t give much information about the metric itself. Here’s a portion of the release from the Society for American Baseball Research:
As part of the multi-year collaboration beginning with the 2013 season, SABR will develop an expanded statistical resource guide that will accompany the Rawlings Gold Glove Award ballots sent to major league-level managers and coaches each year. In addition, SABR will immediately establish a new Fielding Research Committee tasked to develop a proprietary new defensive analytic called the SABR Defensive Index™, or SDI™. The SDI will serve as an “apples-to-apples” metric to help determine the best defensive players in baseball exclusively for the Rawlings Gold Glove Award and Rawlings Platinum Glove Award selection processes.
…
Beginning in 2013, the managers/coaches vote will constitute a majority of the Rawlings Gold Glove Award winners’ selection tally, with the new SDI comprising the remainder of the overall total. The exact breakdown of the selection criteria will be announced once the SDI is created later this summer.
In other words, they’re working on it.
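The mechanics the release does describe (a majority weight on the coaches’ vote, with the SDI making up the remainder) amount to a simple weighted tally. Here is a minimal sketch of that blending; the 75/25 split, the player names, and every number are hypothetical, since SABR had not announced the actual breakdown:

```python
# Hypothetical blend of a coaches' vote with a defensive index.
# The 75/25 weighting and all figures are invented for illustration;
# the real SDI breakdown had not been announced when this was written.

def blend_scores(vote_share, sdi_score, vote_weight=0.75):
    """Combine a player's share of the coaches' vote (0-1) with a
    normalized SDI score (0-1) using a fixed weighting."""
    return vote_weight * vote_share + (1 - vote_weight) * sdi_score

# A player who dominates the vote but rates poorly by the metric,
# versus one the metric loves but the coaches rank lower.
candidates = {
    "Player A": blend_scores(vote_share=0.60, sdi_score=0.40),
    "Player B": blend_scores(vote_share=0.35, sdi_score=0.95),
}
winner = max(candidates, key=candidates.get)
print(winner)  # prints "Player A"
```

Note that with any majority weight on the vote, a large enough gap in the coaches’ ballots overrides even an extreme metric disagreement, which is exactly the scenario the fights below will be about.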
But this isn’t a critique of SABR’s motives, which are absolutely in the right place—taking a step toward greater likelihood of getting it “right” and spreading knowledge of defensive statistics. Nor is it really even about the uncertainty of what will come out of the SABR conclave.
It’s about the certainty of the ugly process that will ensue.
1. There will be a fight when the metric comes out.
The problem with giving this committee the task of inventing a metric by modifying/splicing existing metrics is that it’s virtually impossible. It’s not like the analytics community hasn’t been trying.
Using data provided by Baseball Info Solutions, Mitchel Lichtman pioneered Ultimate Zone Rating (UZR), which measures a fielder’s ability to make plays both inside and outside an assigned zone. We at Baseball Prospectus use Fielding Runs Above Average (FRAA) as the defensive component of our value statistics; it focuses on total plays made, adjusted for conditions such as the pitcher’s ground-ball rate, the batter’s handedness, the ballpark, and the base-out situation.
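The structural difference between the two approaches can be shown in toy form. Neither function below is the real UZR or FRAA formula (both depend on proprietary data and far richer adjustments); the fielder names and play data are invented. The point is only that two defensible accounting schemes can rank the same fielders in opposite order:

```python
# Toy illustration: two simplified defensive metrics scoring the same
# batted-ball data. These are NOT the real UZR/FRAA formulas, which
# rely on proprietary inputs and many more adjustments.

plays = [
    # (fielder, in_zone, made_play, league_avg_make_rate)
    ("Smith", True,  True,  0.85),
    ("Smith", False, True,  0.10),  # spectacular out-of-zone play
    ("Smith", False, True,  0.10),  # another one
    ("Smith", True,  False, 0.85),  # routine in-zone play missed
    ("Jones", True,  True,  0.60),
    ("Jones", True,  True,  0.70),
    ("Jones", True,  True,  0.80),
]

def zone_style(plays, fielder):
    """Credit/debit every play against the league rate,
    out-of-zone plays included."""
    return sum((1.0 if made else 0.0) - avg
               for f, in_zone, made, avg in plays if f == fielder)

def plays_made_style(plays, fielder):
    """Only count in-zone opportunities: plays made minus expected."""
    return sum((1.0 if made else 0.0) - avg
               for f, in_zone, made, avg in plays
               if f == fielder and in_zone)
```

Here `zone_style` ranks Smith first on the strength of his out-of-zone plays, while `plays_made_style` ranks Jones first because Smith botched a routine chance. Both bookkeeping choices are defensible, and they disagree.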
Both are exhaustively researched and justified, and yet choosing one or the other for Gold Glove voting would lead to totally different results. At the positions where UZR is calculated and available at Fangraphs.com (battery not included), here is the breakdown of the top players from 2012 who played the whole season in the same league.
Position | UZR             | FRAA            | Gold Glove
---------|-----------------|-----------------|-----------
AL 1B    | Mark Teixeira   |                 |
NL 1B    | Adam LaRoche    |                 |
AL 2B    | Howie Kendrick  |                 |
NL 2B    | Darwin Barney   |                 |
AL 3B    |                 |                 |
NL 3B    |                 |                 |
AL SS    | J.J. Hardy      |                 |
NL SS    |                 |                 |
AL LF    | Alex Gordon     | Alex Gordon     |
NL LF    | Alfonso Soriano |                 |
AL CF    |                 |                 |
NL CF    |                 |                 |
AL RF    | Josh Reddick    |                 |
NL RF    | Jason Heyward   | Jason Heyward   |
Of the 14 positions, UZR and FRAA agree on the Gold Glover at exactly three, which is a huge issue for the mainstream public appeal of this vote. (And that’s before even getting into what would happen in the discourse if a part-time player like Dyson or Bourjos were ranked first at a position.)
It will be a very hard sell for the analysts out there to peddle this idea when it creates division within the sabermetric media and surely within SABR’s membership itself.
But that will only be the first step.
2. There will be a fight when the first vote comes out.
There already are gripes, and legitimate ones, when the award comes out. Remember Rafael Palmeiro as a DH in 1999? Derek Jeter winning all five of those awards? Adding the statistical component saves us from having designated hitters win, and that is certainly a good thing.
But instead of uniting the two camps in some awkward arranged marriage, this has the potential to pit the traditionalists against the statistical analysts. Any award where the vote doesn’t match the SDI will become a binary-outcome referendum on both parties.
The coaches got it wrong or the numbers got it wrong. One of them had to get it wrong, and dammit, we need to know who it was.
Instead of a celebration of the winner, it’s an examination of the process, which will become very tiresome very quickly. We don’t need any more “WAR, What Is It Good For” columns. Even if SDI has never been in a song lyric, let’s not take that chance over this. There are plenty of more worthy fights for the importance of analytical thinking.
3. We’ll argue over whether stats should be applied directly to more awards.
This is a tough one, because if statistics are going to be applied to any award voting, the Gold Gloves might be the worst place to start. One-year defensive metrics are notoriously unstable—Alfonso Soriano’s 7.9 FRAA in 2012 came after a -6.4 in 2011 and a -8.0 in 2010. And the disagreement between metrics shown above makes the Gold Glove possibly the worst award to add a statistical component to right now. (Okay, Manager of the Year is worse.)
When the statistical community puts its stamp on this award, it has to be prepared to stand behind it. The stats say Soriano, always thought to be a poor defender, was the most accomplished left fielder in the league last year. The stats say a part-time player was the most accomplished center fielder in the league last year.
Is a statistic that research says can take three years to stabilize really the one we want imprinted on a single-season award? There’s an argument to be made that adding a WARP/WAR component to the Baseball Writers Association of America’s MVP awards—or one of the lesser-known MVP-type honors, or maybe even the Hall of Fame—would be a better step.
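One common way analysts tame that instability is to blend several seasons, weighting recent ones most heavily and then regressing toward zero (league average). This sketch applies that idea to Soriano’s FRAA figures from above; the 5/4/3 weights and the 30% regression fraction are illustrative conventions in the spirit of Marcel-style projections, not anything SABR or Baseball Prospectus actually uses:

```python
# Multi-year regression sketch for a noisy defensive metric.
# The 5/4/3 weights and 0.3 regression fraction are made-up
# conventions for illustration, not a real FRAA methodology.

def blended_estimate(seasonal_values, weights=(5, 4, 3), regress=0.3):
    """Weight recent seasons most heavily, then shrink the result
    toward zero (league average) by a fixed regression fraction."""
    weighted = sum(w * v for w, v in zip(weights, seasonal_values))
    blended = weighted / sum(weights[:len(seasonal_values)])
    return (1 - regress) * blended

# Soriano's FRAA, most recent season first.
soriano = [7.9, -6.4, -8.0]
print(round(blended_estimate(soriano), 1))  # → -0.6
```

Under this toy blend, the fielder a single-season award would crown as the league’s best left fielder projects as roughly league average, which is exactly the tension between stabilization research and a one-year trophy.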
But SABR doesn’t control any of the awards, and the BBWAA has not been looking to cede any control over its awards (disclosure: the author is a member of both organizations). So this isn’t a knock on SABR, which is doing what it can to advance the discussion, just an unfortunate case of who came calling and had some ground to make a deal.
We hate the process of most awards, yet we haven’t come up with a much better one either here or in the case of the BBWAA awards. If coaches have proven to be the worst voters of any electorate—and it’s really neck-and-neck between them and fans—then change the electorate. Have the people in front offices paid to assess value hand out awards for defensive value.
Perfecting the Gold Glove and other awards is a noble pursuit. This is a small step toward that goal that might be missed in the very predictable reactions to every part of the process.
What a complete joke.
While there is some merit to what you all do with the defensive metrics, when a result like the above comes out, you need to take a serious look at what you are doing.
Nobody believes Darin Erstad was a .355 hitter, so while there's some merit in what people are trying to do with offensive metrics, when a result like that comes out someone needs to take a serious look at what they're doing.
Batting average is a very simple stat that measures exactly one thing. What it says about Erstad's merits as a player is up for debate, but it is a factual statement that for one season, 240 of his 676 at-bats ended with a hit.
Advanced fielding metrics incorporate a lot of things and reflect various weightings and interpretations. I can't compute UZR without a spreadsheet -- come to think of it, I can't compute it AT ALL because it includes proprietary data.
Soriano was the leader in those two statistical measures; that is just a fact. The question is how worthwhile a fact that is, which depends on how much faith you have in those respective statistics as a measure of defensive value.
By the way, the voting system in these comments needs fixing. bluesman98's comment was grayed out just because it received a certain number of negative votes. Given that it was only at -4, that seems extreme to me, but worse, since it started the thread, graying it out took this whole thread with it. That is asinine, as this is a fairly interesting discussion. I voted bluesman back up in an attempt to restore the thread. Please fix this.
"One-year defensive metrics are notoriously unstable—Alfonso Soriano’s 7.9 FRAA in 2012 came after a -6.4 in 2011 and a -8.0 in 2010."
...
"The stats say Soriano, always thought to be a poor defender, was the most accomplished left fielder in the league last year. The stats say a part-time player was the most accomplished center fielder in the league last year.
Is a statistic that research says can take three years to stabilize really the one we want imprinted on a single-season award?"
In my opinion, the Gold Gloves should be a collected vote by advance scouts. This is what they do for a living. They would know best.
http://en.wikipedia.org/wiki/Strategic_Defense_Initiative
Something like this -- http://robslink.com/SAS/democd6/col8.png -- but inverted so the ranges are horizontal.
That said, the idea that this year-to-year instability in the metrics is a definitively bad result strikes me as arbitrary and sorta silly (not directed at you, rweiler - hope that's obvious). Why is it that we can't handle volatility among our star defensive players? We see it in Cy Young and MVP ballots, so why do we expect fielding to create different results? I am more than willing to believe that defensive ability is not simply a flat, innate talent. Given that variables like health, park effects, weather, opponents' lineups, etc. all factor into every single play, it would be funny to expect consistent annual results. And this is even before the bias of the observer (e.g. the bias towards the flashy play instead of the smooth play) is recognized as playing a larger role in how we judge defense vs. hitting and pitching. I'd think we should be MORE skeptical of an award which features the same names year in and year out in this instance. But that's not how we're trained to see our star athletes, so it'll be a hard sell all the same.
Why can't we just accept that fielding events took place, and that, whatever else happened, these events may not accurately gauge a player's innate fielding ability? Just something that he did, like driving in a baserunner.
In the ideal world, we'd have a metric that correlated to defensive ability, but in the real world maybe defensive *performance* is simply volatile season to season.
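The talent-versus-performance distinction these last comments draw is easy to demonstrate with a quick simulation: give every fielder a fixed "true" skill, add per-season noise of comparable size, and the year-to-year correlation of the observed metric falls well below 1 even though talent never changed. The talent and noise scales below are arbitrary, chosen only so the noise rivals the talent spread:

```python
# Simulation: fixed "true" defensive talent plus seasonal noise
# produces a volatile observed metric. The scales (5 runs of talent
# spread, 8 runs of noise) are arbitrary illustration values.
import random

random.seed(42)

n_fielders = 500
talent = [random.gauss(0, 5) for _ in range(n_fielders)]  # never changes
year1 = [t + random.gauss(0, 8) for t in talent]          # observed season 1
year2 = [t + random.gauss(0, 8) for t in talent]          # observed season 2

def pearson(xs, ys):
    """Plain Pearson correlation, no external libraries."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = pearson(year1, year2)
print(f"year-to-year correlation of the observed metric: {r:.2f}")
```

With identical talent in both seasons, the expected correlation is only var(talent) / (var(talent) + var(noise)) = 25 / 89, about 0.28, so a low year-to-year correlation does not by itself prove the underlying skill is an illusion.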