It happens every week: a reader sees his favorite team trailing one of its division rivals in the Hit List rankings despite leading in the actual division race, and fires off a snarky e-mail or comment questioning the validity of the list, occasionally while making anatomical references, and usually citing last year’s division race or post-season results. Yes, Phillies fans, I can assure you that we’ve counted the rings. Well into my fifth season of writing the Hit List, I’m far more amused by such occurrences than I am offended, but the weekly give-and-take serves as a reminder of the occasional need to explain the list’s workings in greater detail. As such, I annually set aside a column called the Hit List Remix to walk readers through the process.
First, a quick refresher course on the Hit List’s basics. It’s BP’s version of the power rankings, created by me back in 2005, and based upon an objective formula which averages a team’s actual, first-, second-, and third-order winning percentages via the Adjusted Standings. To go into a bit more detail:
- First-order winning percentage is computed (via Pythagenpat, Pythagoras’ slightly more sophisticated sibling) using actual runs scored and allowed.
- Second-order winning percentage uses equivalent runs scored and allowed, based on run elements (hits, walks, total bases, stolen bases, etc.) and the scoring environment (park and league adjustments).
- Third-order winning percentage adjusts for the quality of the opponent’s hitting and pitching via opposing hitter EqA (OppHEqA) and opposing pitcher EqA (OppPEqA), both of which Clay Davenport recently added to the Adjusted Standings report for those of you curious enough to take an interest in such things.
With the exception of an injection of pre-season PECOTA projections during the season’s first month, those numbers are all that go into the rankings, which are averaged into what I’ve called the Hit List Factor (HLF). There are no subjective choices to be made, no additional tweaking to favor the A’s or to hurt the Phillies or fit into any of the other 28 conspiracy theories our readers might think of offering. No recent hot or cold streaks or head-to-head records are accounted for, either, despite the frustration of readers wondering why their team hasn’t vaulted to the top thanks to a 5-2 week against their division rivals. It’s all about runs, actual and projected, because run scoring and run prevention give us the best indication of a team’s strength going forward. Using all four percentages is a way of correcting for teams that over- or underperform relative to the various areas examined.
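For readers who want to see the first-order step in concrete terms, here is a minimal sketch of a Pythagenpat calculation. The exponent constant of 0.287 is an assumption (values between roughly 0.28 and 0.29 are commonly cited, and the exact figure used in the Adjusted Standings isn’t given here), and the function name is purely illustrative.

```python
def pythagenpat_pct(runs_scored, runs_allowed, games, power=0.287):
    """Estimate winning percentage from runs scored and allowed.

    The exponent floats with the scoring environment:
        x = ((RS + RA) / G) ** power
        W% = RS**x / (RS**x + RA**x)
    `power` is an assumed constant, not a value taken from the article.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    total_runs = runs_scored + runs_allowed
    if total_runs == 0:
        return 0.5  # no runs either way: treat the team as exactly average
    x = (total_runs / games) ** power
    rs = runs_scored ** x
    ra = runs_allowed ** x
    return rs / (rs + ra)


# Hypothetical run totals, for illustration only.
print(round(pythagenpat_pct(600, 560, 120), 3))
```

Feeding equivalent runs instead of actual runs into the same function would give a second-order style estimate, under the same assumptions.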
With that in mind, let’s take a look at the American League Central race, which has drawn comment because, despite maintaining at least a share of first place since May 10, the Tigers have consistently trailed either the White Sox or the Twins on the Hit List, and sometimes both of them. In last Friday’s edition of the Hit List, which I’ll use for all of the examples in this article, the White Sox ranked 12th, the Tigers 16th, and the Twins 17th despite the Tigers holding a 2½-game lead on the Sox and a four-game lead on the Twins at the time. Here’s the breakdown of the various winning percentages that went into that week’s Hit List Factor:
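| Rk | Team | W0 | W1 | W2 | W3 | HLF |
|----|-----------|------|------|------|------|------|
| 12 | White Sox | .512 | .514 | .524 | .515 | .516 |
| 16 | Tigers | .533 | .513 | .491 | .486 | .506 |
| 17 | Twins | .479 | .501 | .506 | .509 | .499 |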
Of the three teams, the Tigers had the best winning percentage (W0), but the White Sox had the best run differential (+16 to the Tigers’ +15 and the Twins’ +1) and thus a very slight edge in first-order percentage (W1). Those two figures were almost perfectly in sync for the Sox, but the Tigers were 2.4 wins ahead of their expectation, the Twins 2.6 wins behind theirs. In terms of run elements, the gap grew even wider, with the Sox compiling enough hits, walks and other goodies to project as outscoring their opponents by 28 runs and the Twins doing so by seven runs, but the Tigers projecting to be outscored by 11 runs.
In terms of third-order adjustments, all three teams had faced below-average slates of opposing hitters and pitchers. Recall that .260 is defined as the league average:
| Team | OppHEqA | OppPEqA |
|-----------|---------|---------|
| White Sox | .2576 | .2596 |
| Tigers | .2584 | .2594 |
| Twins | .2591 | .2584 |
The Sox had faced the easiest hitters of the three, the Twins the easiest pitchers, and when all that was factored in, the Sox maintained a 29-point third-order lead on the Tigers and wound up with a Hit List Factor right in line with their winning percentage. The Tigers, on the other hand, were 47 points ahead of their third-order winning percentage, a difference of 5.7 wins. That overperformance is why they’re atop the AL Central, and it’s been partially credited here via the inclusion of W0. But it’s not necessarily something to bank on going forward, given that the other metrics all suggest the Tigers are well ahead of expectation.
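The arithmetic behind the AL Central example can be sketched in a few lines. The four percentages are the ones quoted above; the games-played figure of 120 is an assumption (it isn’t stated here, but 47 points over roughly 120 games works out close to the 5.7 wins cited), and the function names are illustrative.

```python
# (W0, W1, W2, W3) for each team, as quoted in the breakdown above.
AL_CENTRAL = {
    "White Sox": (0.512, 0.514, 0.524, 0.515),
    "Tigers": (0.533, 0.513, 0.491, 0.486),
    "Twins": (0.479, 0.501, 0.506, 0.509),
}


def hit_list_factor(w0, w1, w2, w3):
    """Plain average of actual, first-, second-, and third-order percentages."""
    return (w0 + w1 + w2 + w3) / 4


def points_to_wins(pct_a, pct_b, games):
    """Convert a winning-percentage gap into wins over `games` games."""
    return (pct_a - pct_b) * games


# Rank the three teams by HLF, best first.
ranked = sorted(AL_CENTRAL, key=lambda t: hit_list_factor(*AL_CENTRAL[t]), reverse=True)
for team in ranked:
    print(f"{team:10s} {hit_list_factor(*AL_CENTRAL[team]):.3f}")

# Tigers' actual record versus their third-order mark, over an ASSUMED 120 games.
w0, _, _, w3 = AL_CENTRAL["Tigers"]
print(f"Tigers W0 - W3: {points_to_wins(w0, w3, games=120):+.1f} wins")
```

Because W0 is one of the four inputs, a team that is beating its run-based projections gets partial credit for it, which is why the Tigers sit only 10 points behind the White Sox in HLF despite a 29-point third-order gap.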
Turning to another race which you may have heard about:
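| Rk | Team | W0 | W1 | W2 | W3 | HLF |
|----|---------|------|------|------|------|------|
| 2 | Yankees | .628 | .584 | .607 | .608 | .607 |
| 3 | Rays | .542 | .563 | .587 | .589 | .570 |
| 4 | Red Sox | .575 | .577 | .552 | .561 | .566 |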
Prior to this past weekend’s series in Fenway, the Yankees led the Red Sox by 6½ games and the Rays by 10½ despite the fact that the first-order spread encompassing the three teams was only about 2½ games. The Yankees had been unusually efficient in converting their runs scored and allowed into wins, and the Rays had not, a result that likely owed something to the fact that the Yanks led the league in WXRL at the time while the Rays were seventh. The Red Sox, whose bullpen ranked second, had been on target in converting their runs into wins but trailed both teams on the Hit List because their second- and third-order winning percentages were lower than those of either of the other teams. In all, though, their performance had been the closest of the three to their various projected winning percentages.
As of last week, those were the only divisions where the Hit List rankings deviated from the standings as far as the contenders were concerned, although that hasn’t always been the case; such anomalies are more common early in the season, but they tend to sort themselves out along the way, even if the pace at which they do can seem glacial.
Speaking of divisions but turning from the micro to the macro, here’s a look at how the six of them stack up:
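| Division | 2009 Avg Rk | 2009 WPct | 2009 HLF | 2008 Avg Rk | 2008 WPct | 2008 HLF | HLF +/- |
|------------|------|------|------|------|------|------|-------|
| AL East | 9.8 | .522 | .534 | 7.6 | .538 | .549 | -.015 |
| NL West | 13.4 | .511 | .516 | 20.0 | .463 | .474 | .042 |
| AL West | 14.0 | .533 | .513 | 18.8 | .487 | .475 | .038 |
| NL East | 15.2 | .492 | .500 | 15.4 | .490 | .495 | .005 |
| AL Central | 18.8 | .470 | .480 | 16.0 | .501 | .505 | -.025 |
| NL Central | 20.5 | .482 | .467 | 15.8 | .515 | .498 | -.031 |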
Last year saw a historically strong AL East, one which ranked fourth in winning percentage within the Wild Card Era, as well as third in Hit List Factor. This year’s AL East is strong enough to rank fifth in the latter category, thanks to that trio of top five teams, though it misses the top 10 in the former. That’s mainly a product of the Blue Jays’ falloff from being the best fourth-place team ever; they’ve declined from being a .556 HLF team in 2008 to a .506 one this year, though they spent much of the first half living up to last year’s performance before the cracks in their foundation started to show.
Meanwhile, the AL West has the highest winning percentage this year, good enough for fifth in the Wild Card Era, this after posting the 12th-lowest winning percentage in the era last year, one point ahead of the NL West. Both Wests have made drastic improvements relative to the rest of the pack since 2008. While the Angels aren’t runaway favorites in the AL West, they’re still a very strong club, and the Rangers have improved enough to become Wild Card threats. In the NL West, the Dodgers have ranked atop the Hit List for most of the year, and while they’ve come back to the pack a bit in the division race, that’s in part due to the Rockies and Giants playing some strong baseball as they jockey for the wild-card lead. The Rockies ranked 22nd during the week they canned Clint Hurdle as manager, but they’ve methodically climbed the rankings to the point where they were sixth last week.
The NL Central, on the other hand, is bad enough to rank as the seventh-worst of the era in HLF; no fewer than four of the division’s teams (the Brewers, Astros, Pirates, and Reds) are strewn among the bottom 10 spots on the most recent list. That’s in a division that spent most of the first half with the top four teams separated by just three games and appeared to have an interesting race for the postseason on tap. At least those Central teams can take comfort in the fact that they’re about four wins ahead of their third-order projections apiece, the widest average discrepancy of any division and enough to push their raw winning percentage past that of the AL Central.
On a league level, the split between the AL and NL isn’t as wide as it was last year:
| Year | AL HLF | NL HLF | Diff |
|------|--------|--------|------|
| 2009 | .509 | .493 | .016 |
| 2008 | .512 | .490 | .022 |
| 2007 | .506 | .495 | .012 |
| 2006 | .513 | .488 | .025 |
| 2005 | .509 | .492 | .017 |
The shrinking gap owes something to the fact that the AL’s interleague advantage this year was only 137-114, down from 149-103 last year, though in both years the AL has beaten its first-order Pythagenpat projection by a couple of wins. Given the persistence of that split, it might make sense to include a league-based adjustment, particularly since that isn’t built in elsewhere. That’s something I’m toying with back in the Hit List lab, and while I haven’t decided whether to implement it on a weekly basis, it’s something I’m considering, and certainly something that readers have suggested. Here’s what last week’s rankings would look like if I applied a nine-point bonus to the AL teams and a seven-point penalty to the NL ones (the numbers aren’t exactly equal because the NL has 16 teams to the AL’s 14, so they are actually .0086 and .0075):
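| Rk | Team | AHLF |
|----|--------------|------|
| 1 | Yankees | .615 |
| 2 | Dodgers | .603 |
| 3 | Rays | .579 |
| 4 | Red Sox | .575 |
| 5 | Angels | .562 |
| 6 | Rangers | .560 |
| 7 | Phillies | .556 |
| 8 | Rockies | .552 |
| 9 | Cardinals | .537 |
| 10 | Braves | .536 |
| 11 | White Sox | .525 |
| 12 | Giants | .523 |
| 13 | Blue Jays | .515 |
| 14 | Tigers | .514 |
| 15 | Twins | .507 |
| 16 | Marlins | .507 |
| 17 | Cubs | .505 |
| 18 | Mariners | .498 |
| 19 | Indians | .484 |
| 20 | Athletics | .466 |
| 21 | Diamondbacks | .465 |
| 22 | Brewers | .459 |
| 23 | Mets | .456 |
| 24 | Astros | .441 |
| 25 | Orioles | .428 |
| 26 | Royals | .413 |
| 27 | Nationals | .407 |
| 28 | Pirates | .407 |
| 29 | Reds | .403 |
| 30 | Padres | .402 |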
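As a rough illustration of how such a league adjustment could be applied, the sketch below uses the .0086 bonus and .0075 penalty quoted above. The `split_gap` helper is an assumption about how figures of that size might be derived (dividing the league gap so that the adjustments sum to zero across all 30 teams); it is not a description of the actual spreadsheet, and all names are illustrative.

```python
AL_TEAMS, NL_TEAMS = 14, 16
AL_BONUS, NL_PENALTY = 0.0086, 0.0075  # values quoted in the article


def adjusted_hlf(hlf, league):
    """Apply the league bonus or penalty to a team's Hit List Factor."""
    if league == "AL":
        return hlf + AL_BONUS
    if league == "NL":
        return hlf - NL_PENALTY
    raise ValueError(f"unknown league: {league!r}")


def split_gap(gap, al_teams=AL_TEAMS, nl_teams=NL_TEAMS):
    """ASSUMED derivation: split a league gap so the total adjustment nets to zero.

    The smaller league gets the larger per-team share.
    Returns (al_bonus, nl_penalty).
    """
    total = al_teams + nl_teams
    return gap * nl_teams / total, gap * al_teams / total


# Unadjusted HLFs from the breakdowns earlier in the article.
for team, hlf in [("Rays", 0.570), ("Red Sox", 0.566), ("White Sox", 0.516)]:
    print(f"{team:10s} {adjusted_hlf(hlf, 'AL'):.3f}")

# The quoted figures nearly cancel out: 14 * .0086 versus 16 * .0075.
print(round(AL_TEAMS * AL_BONUS, 4), round(NL_TEAMS * NL_PENALTY, 4))
```

Splitting this year’s .016 gap that way gives roughly .0085 and .0075, in the same neighborhood as the quoted figures; the small difference presumably comes from rounding in the published gap.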
Via what I’ll provisionally call AHLF for Adjusted Hit List Factor (greaaaat, another acronym), the effect isn’t overwhelming. The top spot changes hands between the Dodgers and the Yankees, but five of the top 10 teams are still from the weaker senior circuit, though they move down an average of one rung apiece. Many of the rankings in the middle of the list are unchanged; at the bottom of the list, the Royals benefit by vaulting from 30th to 26th. In all, the magnitude of the adjustment may be a bit conservative, but conceptually, such an adjustment is probably an appropriate step to take. Consider it a topic for further exploration, and an appropriate spot to end this tour of the Hit List sausage factory. I’ll be back on Friday with the full serving of links.
Why ignored 0th order wins? BP likes to call teams that outperform their 1st, 2nd, or 3rd order wins 'lucky', but I haven't seen a recent article showing that the correlation from year to year on Wins Above Expectation is near 0. Unless that's the case, I think wins are a necessary part of a power ranking measurement.
Additionally, Power Rankings ought to be informing us of how teams are playing right now, so additional weight ought to be given to the most recent 7, 14, or 30 days. It would be interesting to see what the correlation is between the current power ranking formula and the performance in the next 2 weeks or so. I'm guessing that correlation could be improved by weighting the more recent games more heavily.
On a different subject, if you are going to 'adjust' for league difficulty, please make sure you do it right. I don't have the extensive database to prove it, but I'm 100% certain that if the AL and NL were perfectly evenly matched, the AL's expected winning percentage in interleague play would be significantly higher than .500, because the advantage of having a DH on your roster and using him while the NL has to use the first guy off its bench in AL parks is way more than the advantage the NL has in its parks, where its pitchers will hit a little better than the AL pitchers due to more practice. If the AL should win 56% of their home games and the NL should win 52% of theirs, the AL would have an expected record of 131-121.
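The 131-121 figure in that comment can be reproduced with a short calculation. This sketch assumes the 252 interleague games are split evenly between AL and NL parks, which the comment implies but does not state; the function name is illustrative.

```python
def expected_al_record(total_games, al_home_pct, nl_home_pct):
    """Expected AL wins and losses in interleague play.

    Assumes half the games are played in AL parks and half in NL parks.
    """
    games_per_side = total_games / 2
    al_wins = games_per_side * al_home_pct + games_per_side * (1 - nl_home_pct)
    return al_wins, total_games - al_wins


wins, losses = expected_al_record(252, al_home_pct=0.56, nl_home_pct=0.52)
print(f"{wins:.2f}-{losses:.2f}")  # about 131-121 after rounding
```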
As for the idea of weighting recent performance more heavily, there isn't really any evidence to suggest that two weeks will tell us much about a team's level of success going forward; if anything, it's likely to indicate future regression to the mean.
Using last year's Hit List, I broke the season up into four-week chunks and tested the correlation between each team's "monthly" w0, w1, w2, w3, and HLF and their following month's actual record. I used these four-week splits because that's what I could easily create from the master Hit List spreadsheet (I only save the adjusted standings for the days I use to compile the list). I don't have end-of-month splits available, and waiting for Clay Davenport to dig them up would take some time.
The correlations for "monthly" ____ winning percentage to next "month's" actual winning percentage:
actual: .21
first-order: .24
second-order: .18
third-order: .17
HLF: .22
I then tested the correlation between the various year-to-date winning percentages from those increments and the next month's actual winning percentage.
actual: .304
first-order: .289
second-order: .298
third-order: .296
HLF: .312
This is a pretty slapdash study, but it does support the none-too-controversial idea that a larger sample size such as a year-to-date performance is more useful in predicting W-L performance going forward than a recent increment is. And at the very least, using HLF for that purpose is no worse than using any of the individual winning percentages, and possibly better.
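For anyone who wants to try a similar test on their own data, here is a minimal sketch of the mechanics: pair each four-week chunk with the one that follows it, then compute a Pearson correlation across all the pairs. The helper names are illustrative and no real Hit List data is included.

```python
from math import sqrt


def lagged_pairs(chunks):
    """Pair each period's value with the next period's value for one team."""
    return list(zip(chunks[:-1], chunks[1:]))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def next_period_correlation(predictor_by_team, actual_by_team):
    """Correlate each period's predictor with the NEXT period's actual W%.

    Both arguments map team name -> list of per-period values, in order.
    """
    xs, ys = [], []
    for team, predictor in predictor_by_team.items():
        actual = actual_by_team[team]
        xs.extend(predictor[:-1])   # this period's metric
        ys.extend(actual[1:])       # following period's actual W%
    return pearson_r(xs, ys)
```

Passing year-to-date values rather than single-period values as the predictor would correspond to the second set of correlations above.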
Thanks.
C Saltalamacchia -> Ivan Rodriguez
1B Davis -> Blalock
2B Kinsler -> Kinsler
3B Young -> Young
SS Andrus -> Andrus
LF Byrd -> Byrd
CF Hamilton -> Hamilton
RF Cruz -> Cruz
DH Blalock -> Jones
SP Millwood -> Millwood
SP Padilla -> Feldman
SP McCarthy -> Holland
SP Benson -> Hunter
SP Harrison -> Nippert
Surprisingly, the only changes are at catcher, first base, and DH, and tonight's lineup might have Davis at first and Blalock DH'ing, leaving the only change at catcher. But Kinsler has missed some time, and Cruz has been hurt and Hamilton has been out too. And of course the rotation is entirely different. Benson lasted like 2 starts, and Harrison and McCarthy both lasted 11 (Hunter has had 11 starts as well...) and Padilla 18.
Trying to rank teams is always kind of a moving target, but the Hit list isn't a bad way to do it, as long as you understand what it is.
That's W0, W1, W2, and W3. I suspect you were interpreting *actual* as a stray adjective, rather than a category.
Anyway, you're assuming your conclusion by calling wins the "biggest indicator of performance". Wins are the biggest indicator of wins, and the perfect stat for figuring out where you are in the standings. But if you want to predict future wins, you're better off looking at performance -- i.e. what players have done in individual plate appearances or batters faced. Better still, adjust those for park and league and quality of opposition.
And honestly, until we see that the Hit List factor is actually correlated with future performance, they really aren't much more than a different form of standings, are they? They tell us about the past, not about the future.
Maybe building in a "remaining strength of schedule" factor would matter more than recent performance.
it is a reasonable direction of inquiry given the factor of luck, and the expectation that luck does not continue.
RawEqA =(H+TB+1.5*(BB+HBP+SB)+SH+SF-IBB/2)/(AB+BB+HBP+SH+SF+CS+SB)
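The formula as written in that comment translates directly into a small function. This is only a transcription of the commenter's version, not a confirmed statement of the official definition, and the stat line in the usage example is hypothetical.

```python
def raw_eqa(h, tb, bb, hbp, sb, sh, sf, ibb, ab, cs):
    """Raw EqA exactly as given in the comment above.

    (H + TB + 1.5*(BB + HBP + SB) + SH + SF - IBB/2)
        / (AB + BB + HBP + SH + SF + CS + SB)
    """
    numerator = h + tb + 1.5 * (bb + hbp + sb) + sh + sf - ibb / 2
    denominator = ab + bb + hbp + sh + sf + cs + sb
    if denominator == 0:
        raise ValueError("no plate appearances or running events to evaluate")
    return numerator / denominator


# Hypothetical season line, for illustration only.
print(round(raw_eqa(h=150, tb=250, bb=60, hbp=5, sb=10, sh=2, sf=5, ibb=4, ab=550, cs=3), 3))
```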
I would think that the methodology to develop a model which best fits historical data would be completely different than that used to develop a predictive model, so I wouldn't feel compelled to alter the Power Rankings to simulate the Power Rankings used by bettors of college football.