August 30, 2005
Prospectus Hit List
The System, Explained
Last week's edition of the Prospectus Hit List generated a good deal of reader mail and discussion in cyberspace, particularly with regard to the shuffling at the top of the List. After a week at #3 on the heels of seven weeks at #2, the White Sox, thanks to a lousy 1-5 week, fell to #6. The Red Sox, who had claimed #2, stayed there, while the #4 A's climbed a rung, the #7 Indians vaulted three rungs, and the Angels rose to #5.
Because the A's had gone 1-5 on the week, being swept by the Orioles at home and then losing two out of three to the Royals--who entered the series on an 18-game losing streak--the results set off a minor fury that included several accusations of bias. Reader R.M., echoing the sentiment of a handful of others, asked how the A's performance "merits the A's getting a green 'up' arrow in today's Hit List? Is this a bit of saber-homerism because the A's own a +5 Bat Of Selective Amnesia?" Having explained all of this in several separate e-mails, I decided it was time to explain the Hit List to everyone.
For my money, the single most important tenet of sabermetrics is that there's a predictable relationship between a team's winning percentage and the number of runs it scores and allows. Bill James first codified this in his original Pythagorean formula:
win% = (RS^2)/(RS^2 + RA^2)
RS and RA are runs scored and runs allowed, G is games. Studies by BP's Clay Davenport have shown that not only is the Pythagorean formula a good predictor of a team's winning percentage after the fact (how many games should team X have won), it's a better predictor of future winning percentage than the team's actual winning percentage.
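For readers who like to see the arithmetic, here is a minimal Python sketch of James' original formula. The function name and the 700/600 run totals are hypothetical, chosen only for illustration.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float) -> float:
    """Bill James' original Pythagorean estimate, with a fixed exponent of 2."""
    rs_squared = runs_scored ** 2
    ra_squared = runs_allowed ** 2
    return rs_squared / (rs_squared + ra_squared)


# Hypothetical team: 700 runs scored, 600 runs allowed.
estimate = pythagorean_win_pct(700, 600)
print(f"{estimate:.3f}")  # prints 0.576
```

Note that only the ratio of runs scored to runs allowed matters here: doubling both totals leaves the estimate unchanged.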
The Hit List builds on this in creating our version of the power rankings. It's not subjective; the rankings are computed by equally weighting actual, first-, second- and third-order winning percentages for the season to date as calculated in BP's Adjusted Standings (a Davenport invention). Actual winning percentage is obvious enough: the percentage of games a team wins. The other three are calculated using the Pythagenpat method, a close relative of Bill James' original Pythagorean formula, where
win% = (RS^X)/(RS^X + RA^X)
First-order winning percentage is computed using actual runs scored and allowed. Second-order winning percentage uses equivalent runs scored and allowed, based on run elements (hits, walks, total bases, etc.) and the scoring environment (park and league adjustments). Third-order winning percentage adjusts for the quality of the opponent's hitting and pitching via EqA allowed and opponents' EqA.
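A rough Python sketch of the Pythagenpat form follows. The exponent X is not spelled out above; commonly published versions of the method tie it to the run environment as X = ((RS + RA)/G)^0.285 or thereabouts, and both that expression and the 0.285 constant are treated here as assumptions rather than as the exact Adjusted Standings calculation. The function names and sample totals are hypothetical.

```python
def pythagenpat_exponent(runs_scored, runs_allowed, games, power=0.285):
    """Assumed exponent: total runs per game raised to a small power.

    The 0.285 default is an assumption taken from commonly cited
    versions of Pythagenpat, not a figure stated in this article.
    """
    return ((runs_scored + runs_allowed) / games) ** power


def pythagenpat_win_pct(runs_scored, runs_allowed, exponent):
    """win% = RS^X / (RS^X + RA^X) for a supplied exponent X."""
    rs_x = runs_scored ** exponent
    ra_x = runs_allowed ** exponent
    return rs_x / (rs_x + ra_x)


# Hypothetical team: 700 runs scored, 600 allowed, over 150 games.
x = pythagenpat_exponent(700, 600, 150)
print(f"exponent: {x:.3f}")
print(f"win%: {pythagenpat_win_pct(700, 600, x):.3f}")
```

The same `pythagenpat_win_pct` function serves for first-, second- and third-order percentages; only the run totals fed into it change. With the exponent fixed at 2, it reduces to James' original formula.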
By using the four percentages, we're correcting for teams that over- or underperform relative to how many runs they've scored and allowed, how many runs they should have scored/allowed given the number and type of hits, walks and other events, their ballpark environment and their quality of competition. There's nothing written in stone about this formula, but neither--contrary to some of the accusations I received from card-carrying Angels fans--is there any hidden agenda. It's simply a way of looking at the question, "How good is each team?" and using a few related but slightly different objective measures to answer that question. In coming up with a means to rank the teams, I wanted to find a way that gave weight to various categories of performance without overcompensating for any of them. That includes actual winning percentage, but goes beyond to try to get a truer picture of what's going on than simple wins, losses and run totals will tell you.
It's important to remember that the Hit List rankings are based on cumulative year-to-date stats, not just on what happened last week. One lousy week wasn't enough to narrow the gap between the A's and the Angels. The Angels had a better raw run differential (Runs Scored minus Runs Allowed), but in fact it's the ratio between the two that's more important, and once the adjustments are taken into account, the A's came out ahead last week by a narrow margin. Let's look at the three Pythagenpat-based calculations and the run ratios that underlie each of them (these stats are through the close of play on Sunday, August 21, when the list was finalized):
        RS    RA     r1     PWP1%
A's     585   507    1.154  .565
Angels  589   493    1.195  .580

        EQR   EQRA   r2     PWP2%
A's     570   486    1.173  .572
Angels  575   520    1.106  .546

        AEQR  AEQRA  r3     PWP3%
A's     574   487    1.179  .574
Angels  575   520    1.125  .554

Using actual runs, the Angels have the superior ratio and predicted winning percentage. But once the run-element and park adjustments are made, the A's are on top. The Angels regain a bit of ground in the quality of competition adjustments, but the A's still come out ahead there. Averaging these three projections with the actual winning percentages (.553 for the A's, .573 for the Angels) gave the Green and Gold an edge of less than three points on the Halos, .5660 to .5633.
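The averaging step is easy to check by hand or with a few lines of Python. The sketch below uses only the percentages quoted above; the dictionary layout and the `hit_list_factor` name are my own shorthand, not anything from the Adjusted Standings report.

```python
from statistics import mean

# Winning percentages quoted above, through Sunday, August 21.
teams = {
    "A's": {"actual": 0.553, "first": 0.565, "second": 0.572, "third": 0.574},
    "Angels": {"actual": 0.573, "first": 0.580, "second": 0.546, "third": 0.554},
}


def hit_list_factor(percentages):
    """Equal-weight average of actual, first-, second- and third-order win%."""
    return mean(percentages.values())


factors = {team: hit_list_factor(pcts) for team, pcts in teams.items()}

# Rank teams from highest factor to lowest.
for team in sorted(factors, key=factors.get, reverse=True):
    print(f"{team}: {factors[team]:.4f}")

margin = factors["A's"] - factors["Angels"]
print(f"margin: {margin:.4f}")
```

The A's trail in two of the four components (actual and first-order) and lead in the other two, which is why the final margin is so thin.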
Here's how the adjustments work. According to our glossary, Equivalent Average is
"...a measure of total offensive value per out, with corrections for league offensive level, home park, and team pitching. EqA considers batting as well as baserunning, but not the value of a position player's defense... The scale is deliberately set to approximate that of batting average. League average EqA is always equal to .260. EqA is derived from Raw EqA, which is (H + TB + 1.5*(BB + HBP + SB) + SH + SF) divided by (AB + BB + HBP + SH + SF + CS + SB). REQA is then normalized to account for league difficulty and scale to create EqA."
EqA is used to calculate Equivalent Runs by the formula EqR = 5 * OUT * EqA^2.5.
The calculations of EqRA utilize the EqA-allowed stat, which is exactly what it sounds like, the cumulative performance of hitters against a given team's pitchers. As for the quality of competition adjustment, suppose your team has an EqA allowed of .265. That's five points worse than the league average (always defined as .260). Meanwhile, your opponents have a combined average EqA of .255, another five points worse than league average (though in the other direction, below .260 instead of above). Your opponent-quality-adjusted EqA is now .270, meaning you're 10 points worse than average. That number gets plugged into the EqR formula to adjust the runs to the third-order totals.
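To make the chain of calculations concrete, here is a hedged Python sketch of the pieces described above: Raw EqA from the glossary definition, the EqR formula, and the opponent-quality adjustment from the .265/.255 example. The additive form of the adjustment is inferred from that single worked example and may not match the full Adjusted Standings method. The normalization from Raw EqA to EqA is not specified here, so the sketch leaves it out. All function names and the 4,000-out figure are hypothetical.

```python
LEAGUE_AVERAGE_EQA = 0.260  # league average EqA is always defined as .260


def raw_eqa(h, tb, bb, hbp, sb, sh, sf, ab, cs):
    """Raw EqA as quoted from the glossary (before normalization)."""
    numerator = h + tb + 1.5 * (bb + hbp + sb) + sh + sf
    denominator = ab + bb + hbp + sh + sf + cs + sb
    return numerator / denominator


def equivalent_runs(outs, eqa):
    """EqR = 5 * OUT * EqA^2.5."""
    return 5 * outs * eqa ** 2.5


def opponent_adjusted_eqa_allowed(eqa_allowed, opponents_eqa,
                                  league_avg=LEAGUE_AVERAGE_EQA):
    """Assumed additive adjustment: weaker-hitting opponents (EqA below
    league average) push a team's EqA allowed upward by the same amount."""
    return eqa_allowed + (league_avg - opponents_eqa)


# The worked example from the text: .265 allowed against .255 opponents.
adjusted = opponent_adjusted_eqa_allowed(0.265, 0.255)
print(f"adjusted EqA allowed: {adjusted:.3f}")  # prints 0.270

# Hypothetical 4,000 outs, before and after the adjustment.
print(f"runs at .265: {equivalent_runs(4000, 0.265):.0f}")
print(f"runs at .270: {equivalent_runs(4000, adjusted):.0f}")
```

Because EqA is raised to the 2.5 power, a five-point shift in EqA allowed moves the run total by more than a proportional amount, which is how a modest strength-of-schedule difference can show up in third-order winning percentage.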
I'm equal parts bemused and puzzled by the fact that readers think there's some bias built into BP's brand of analysis to favor the A's over the Angels. While it's true that the two teams' offenses are built along different models, and that the A's follow one that's more in line with the usual preferences of performance analysts, if you read what we've written over the years, you'll see that we have no shortage of respect for the Angels as a team, particularly in the way they run their bullpen and their farm system. We don't sit around trying to find measures that say, "Hey, we need to find a way to make the A's come out on top, or at least better than the Angels." As Bill James defined it, sabermetrics is the search for objective truth about baseball, and we hold to that standard. If we kept putting our thumbs on the scales every time the A's--or the Red Sox, or another saber-friendly team--came up, our analysis wouldn't have much value.
Many people got hung up on the green trend arrow beside the A's in the List. Frankly, the arrows are the least informative element in the whole chart, and in the future, I may handle the trend category differently by basing it on the week-to-week change in the Hit List Factor (the unpublished average of the four categories which underlies the rankings) rather than on a change in rank. In any event, it's water under the bridge, at least so far as these two teams are concerned. The A's went 5-1 on the week, outscoring opponents 47-14, while the Angels went 2-4 and were outscored 29-22. That not only kept the A's ahead of the Angels on this week's List, but it also carried them into first place in the AL West. Which is where it really matters, in the end.