In mid-August I was reading an article on a friend’s blog, in which he was reflecting on his preseason MLB award predictions and lamenting his pick of Chris B. Young for NL Rookie of the Year. Since I had hyped Young to him prior to the season, I felt it necessary to reply and defend my prognostication. Young was hitting just .235 at the time, but had 25 or so home runs, and had already built a reputation as a premier baserunner. In defense of my pick, I hastily replied, “He’s had bad luck. If you gave him even a league-average batting average on balls in play, he’d be hitting .270 and have All-Star overall numbers.”
Something about my statement, however, did not seem intuitively correct. I knew that multiplying a player’s balls-in-play total by a league-average BABIP and adding in his home runs was, at best, a brute-force estimate of what he “should” be hitting, only slightly less variable than batting average itself. Still, I thought that with Young’s combination of power and speed, he should be getting at least average results on balls in play.
Thanks to Marc Normandin’s player profiles, most BP readers will be familiar with the concept that a player’s batting average is highly variable from year to year, compared with other important rate stats like isolated patience and isolated power.

The correlations reflected above support the idea that hitting for average is subject to great variance due to luck, whereas walk rate and hitting for power are less prone to variance that the player’s skill set cannot explain.
Readers will also be familiar with the fact that certain types of batted balls are more likely than others to fall for hits. To determine a player’s expected batting average, then, we could simply weight the rate of each batted-ball type by its probability of falling for a hit, and then factor in the player’s strikeout rate.
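A minimal sketch of that naive calculation follows. It assumes we already know what share of a player’s balls in play were ground balls, line drives, and fly balls, plus a hit probability for each type; the type labels and any numbers passed in are hypothetical placeholders, not figures from this study. For simplicity it treats every non-strikeout at-bat as a ball in play, with home runs folded into the fly-ball hit probability.

```python
def naive_expected_avg(batted_ball_shares, hit_prob, k_per_ab):
    """Naive expected AVG from a batted-ball mix and a strikeout rate.

    batted_ball_shares: fraction of balls in play of each type (must sum to 1)
    hit_prob: hypothetical probability that each type falls for a hit
    k_per_ab: strikeouts per at-bat; every other at-bat is treated as a ball
              in play (a simplifying assumption, home runs included)
    """
    total_share = sum(batted_ball_shares.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("batted-ball shares must sum to 1")
    # Hits per ball in play: each type's share weighted by its hit probability.
    hits_per_bip = sum(
        share * hit_prob[kind] for kind, share in batted_ball_shares.items()
    )
    # Only at-bats that do not end in a strikeout can become hits.
    return (1.0 - k_per_ab) * hits_per_bip


# Usage with made-up inputs (not league data):
estimate = naive_expected_avg(
    {"gb": 0.45, "ld": 0.20, "fb": 0.35},
    {"gb": 0.24, "ld": 0.72, "fb": 0.20},
    k_per_ab=0.20,
)
```

The weakness this exposes is the one described next: two players with identical batted-ball mixes get identical estimates, no matter how hard they hit the ball or how fast they run.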
Unfortunately, this procedure is not as practical as it might seem. There are other factors besides batted-ball type that determine the rate at which balls in play are converted into hits. To investigate whether my defense of Young’s “bad luck” was justified, I created a model that would account for batted-ball types as well as three other demonstrable skills that affect a player’s average: frequency of contact, strength of contact, and how he can use his legs to create (or quash) potential hits on ground balls.
To measure the footspeed component, I used Bill James’s original “speed score” formula, the average of five different calculations that measure speed: stolen-base percentage, stolen-base attempts, triples, runs per time on base, and GIDPs. For strength of contact, I used the player’s year-to-date rate of extra-base hits per at-bat. I weighted doubles, triples, and home runs equally, treating each as roughly equal evidence of power. Finally, I included the player’s strikeouts per at-bat to measure how often he puts the ball into play.
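The two rate inputs are straightforward to compute, and the speed score is an average of its components. The sketch below assumes the five speed components have already been calculated and scaled onto a 0-to-10 range; it does not reproduce James’s individual component formulas, and the `SeasonLine` container, the clamping step, and all argument names are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class SeasonLine:
    """Hypothetical container for the counting stats the rates need."""
    ab: int
    doubles: int
    triples: int
    hr: int
    so: int


def xbh_per_ab(line: SeasonLine) -> float:
    # Strength of contact: doubles, triples, and homers weighted equally.
    return (line.doubles + line.triples + line.hr) / line.ab


def k_per_ab(line: SeasonLine) -> float:
    # Frequency of contact, expressed as strikeouts per at-bat.
    return line.so / line.ab


def speed_score(components):
    """Average of five precomputed speed components.

    Assumes each component is already on a 0-10 scale; values outside
    that range are clamped (an illustrative assumption).
    """
    if len(components) != 5:
        raise ValueError("expected five speed components")
    clamped = [min(max(c, 0.0), 10.0) for c in components]
    return sum(clamped) / len(clamped)
```

Keeping the speed components as precomputed inputs makes it easy to swap in whichever version of the speed-score formula one prefers.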
Taking every player-season since 2004 with at least 300 plate appearances (1,097 in all), I used multiple regression to measure the relative effect of each factor on batting average. The resulting model predicts a player’s “component” batting average (cAVG) from the rates above and his speed score. It has a correlation coefficient of .75 and an R-squared value of .55.
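For readers who want to try something similar, here is a minimal ordinary-least-squares sketch in plain Python, solving the normal equations with Gaussian elimination. It is not the model described above: the feature columns (for example line-drive rate, XBH/AB, K/AB, and speed score) are assumptions, and the fitted coefficients will depend entirely on the data you feed it.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, targets):
    """Least-squares fit with an intercept; returns [b0, b1, ..., bk].

    rows: one feature list per player-season (hypothetical feature set)
    targets: the matching batting averages
    """
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(p)]
    return solve(xtx, xty)


def predict(coefs, features):
    # A cAVG-style prediction: intercept plus weighted features.
    return coefs[0] + sum(c * f for c, f in zip(coefs[1:], features))


def r_squared(actual, predicted):
    """Share of the variance in actual values explained by the predictions."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

With an intercept in the model, the in-sample R-squared from `r_squared` equals the square of the correlation between fitted and actual values.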
Below are the 10 largest discrepancies in each direction between players’ cAVG and their 2007 batting averages, according to my model:
Underperformers
Name                 AVG    cAVG   Diff. (cAVG - AVG)
Frank Catalanotto    .260   .314   .054
Lyle Overbay         .240   .293   .053
Richie Sexson        .205   .251   .046
Bobby Crosby         .226   .269   .043
Marcus Giles         .229   .270   .041
Julio Lugo           .237   .274   .037
Adam Kennedy         .219   .256   .037
Paul Lo Duca         .273   .310   .037
Kevin Mench          .267   .302   .035
Jason Kendall        .242   .276   .034

Overperformers
Name                 AVG    cAVG   Diff. (cAVG - AVG)
Matt Kemp            .342   .271   -.071
Willy Taveras        .320   .253   -.067
Matt Diaz            .338   .275   -.063
Ichiro Suzuki        .351   .292   -.059
Edgar Renteria       .332   .283   -.049
Norris Hopper        .329   .286   -.043
Magglio Ordonez      .363   .320   -.043
Mike Lowell          .324   .285   -.039
Moises Alou          .341   .302   -.039
Cliff Floyd          .284   .245   -.039
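Producing lists like these is a matter of sorting players by cAVG minus AVG. The sketch below uses four rows from the tables as sample input; the function name and the `n` cutoff parameter are hypothetical.

```python
def rank_discrepancies(players, n=10):
    """Split (name, AVG, cAVG) rows into under- and overperformers.

    Positive cAVG - AVG means the model expected more hits than the
    player got; negative means he outhit the model.
    """
    diffs = [(name, round(cavg - avg, 3)) for name, avg, cavg in players]
    under = sorted((d for d in diffs if d[1] > 0), key=lambda d: d[1], reverse=True)[:n]
    over = sorted((d for d in diffs if d[1] < 0), key=lambda d: d[1])[:n]
    return under, over


# Sample rows taken from the tables above.
sample = [
    ("Frank Catalanotto", 0.260, 0.314),
    ("Lyle Overbay", 0.240, 0.293),
    ("Matt Kemp", 0.342, 0.271),
    ("Willy Taveras", 0.320, 0.253),
]
under, over = rank_discrepancies(sample)
```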
I should note that the overperformers list contains both Ichiro Suzuki and Willy Taveras, two of the fastest players in the game. It makes sense that their extreme speed would inflate their batting averages beyond what this model predicts: because cAVG is fit to years’ worth of data from hundreds of players, it captures the typical effect of speed and understates the extremes. Still, the equation provides an intriguing look at how these players might perform if their speed were closer to the mean.
A similar list of over- and underperformers calculated using the standard BABIP method, ((AB - HR - K) × league BABIP + HR) / AB, shares only nine of the 20 names, a sizable difference between my model and that method. What, then, is the biggest source of difference between the predictions of the two equations?
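The BABIP method fits in a single line. This sketch follows the formula exactly as written above, so, like that formula, it leaves sacrifice flies out of the balls-in-play count; the function name and the example inputs are hypothetical.

```python
def babip_expected_avg(ab, hr, so, league_babip):
    """Expected AVG = ((AB - HR - K) * league BABIP + HR) / AB."""
    balls_in_play = ab - hr - so
    # Balls in play fall at the league rate; home runs count as sure hits.
    return (balls_in_play * league_babip + hr) / ab


# A made-up hitter: 500 AB, 20 HR, 100 K, league BABIP of .300.
estimate = babip_expected_avg(500, 20, 100, 0.300)
```

Because strikeouts and home runs are the only player-specific inputs, this method cannot distinguish a hitter who lines the ball hard from one who pops it up, which is exactly the gap the cAVG model tries to fill.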
First, as a basis of comparison, here are the largest discrepancies between the cAVG model and the straight BABIP calculation:
Name                 AVG    cAVG   BIP AVG   |BIP AVG - cAVG|
Chone Figgins        .330   .305   .254      .051
Curtis Granderson    .302   .310   .261      .049
Carl Crawford        .315   .307   .260      .047
Juan Uribe           .234   .220   .266      .046
Michael Young        .315   .309   .264      .045
David Ross           .203   .208   .253      .045
Ryan Church          .272   .297   .258      .039
Jason Giambi         .236   .227   .264      .037
Barry Bonds          .276   .277   .314      .037
Bobby Abreu          .283   .302   .265      .037
Yunel Escobar        .326   .310   .274      .036
Derek Jeter          .322   .305   .270      .035
Hunter Pence         .322   .302   .267      .035
Chase Utley          .332   .317   .283      .034
Nook Logan           .265   .258   .224      .034
Matt Holliday        .340   .317   .284      .033
Jack Cust            .256   .257   .224      .033
Travis Buck          .288   .284   .251      .033
Jeremy Hermida       .296   .292   .260      .032
Dmitri Young         .320   .308   .276      .032
Reviewing the data, it is apparent that the cAVG model rewards players who demonstrate above-average skills (line-drive rate and extra-base power) that correlate most strongly with batting average, and that the combination of these skills often serves to counteract a high strikeout rate. Though the notional average player who strikes out 137 times in 584 at-bats could be expected to hit only around .260, Curtis Granderson’s notable speed, power, and line-drive rate indicate that his .302 average this year was in line with his actual hitting profile, not simply a fluke of luck. Considering that he is entering the prime of his career, at least part of this gain likely reflects an improvement in skill.
Even more interesting is the case of Jack Cust. The Oakland DH struck out in 40 percent of his at-bats in 2007, but his cAVG (.257, against an actual .256) suggests that luck has not inflated his average, mostly because of his prodigious line-drive and extra-base-hit rates. The reverse is true for a player like Cliff Floyd, who has an acceptable strikeout rate and power but lacks a batted-ball profile that supports his actual batting average.
For one final comparison of the methods, I looked at how well the BABIP method predicted actual batting average, and came up with a .22 R-squared value, compared with the aforementioned .55 for cAVG.
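One way to run this kind of comparison is to score each method by the squared correlation between its predictions and actual batting average. The article does not say exactly how its R-squared values were computed, so treating R-squared as squared Pearson correlation is an assumption here, and the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def compare_methods(actual, cavg_preds, bip_preds):
    # Squared correlation of each method's predictions with actual AVG.
    return {
        "cAVG": pearson_r(cavg_preds, actual) ** 2,
        "BIP": pearson_r(bip_preds, actual) ** 2,
    }
```

Squared correlation rewards a method that ranks hitters correctly even if its predictions are shifted up or down by a constant, which suits a comparison of two estimators built on different baselines.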
Where, you ask, is Chris Young? He sits 66th out of 281 players in absolute difference between the models, with a .237 AVG, a .248 cAVG, and a .269 BIP average. Though his high speed score and extra-base-hit power are big pluses, his poor line-drive percentage and high popup rate suggest that chance depressed his average only slightly, nowhere near the injustice of luck I originally made it out to be.
The idea that power and speed help a player hit for average is by no means revolutionary. However, by breaking hitting for average into its components and quantifying how much each one matters, we can more easily estimate what a player’s average should have been given how he actually hit the ball. Luck will inevitably factor into everything that takes place on a baseball field, and batting average is no exception. That said, the more luck we can eliminate from an evaluation of a player’s performance, the closer we come to accurately understanding and valuing his true skill set. This model brings us a small step closer to that ideal evaluation.
Jason Paré is a contributor to Baseball Prospectus.