
Recently, Major League Baseball notified vendors that, beginning next season, it would shift its in-game ball tracking from the current, radar-based Trackman system to an optical-based system by Hawkeye. The impending transition makes this a good time to assess how accurately Statcast seems to be measuring balls in play, and how it has changed over time.
This series will focus on batted ball data, and particularly on two measurements made publicly available through Baseball Savant: launch speed (colloquially described as “exit velocity”) and launch angle. These are not the only batted ball measurements that Statcast captures, but these two have received the most press and arguably are the most important.
There are several drivers of batted ball launch speed and launch angle: the two most obvious are the batter (who is primarily responsible for both measurements) and the pitcher (who has some effect, but less). However, an important third input can be the stadium where the ball is struck. Ideally, there would be no such effect: one of the goals of Statcast is to overcome ballpark bias by measuring batted ball contact at its origin, before the environs can distort the outcome. And yet, this ideal is difficult to achieve: different stadiums may have different Statcast installations in different orientations and in varying states of operation. Furthermore, stadiums may have inherent eccentricities that confound player efforts to give their typical performance, regardless of how accurately Statcast detects the contact.
The good news is that, at least with batted ball data, the system appears to have made steady improvement in cutting through stadium effects. But this also suggests that the data from previous seasons was progressively less accurate, confounding the measurements that were taken. Even now, similar batters seem to be generating different value ranges at different parks. Some, but not necessarily all, of these differences may rest with the detection system itself.
Getting a radar system to work in 30 different ballparks certainly isn’t easy, and every detection system is going to have quirks. Thus, the point of this study is not to criticize the efforts that have been made, but rather to quantify the differences between stadiums that have persisted and to discuss their probable impact.
This first article will focus on launch speed a/k/a exit velocity. Launch angle will be dealt with separately.
***
Method
Using the baseballr package, we downloaded the available play-by-play data for the 2015-2018 seasons from the Baseball Savant website, plus 2019 data through the first week of June.
Two filters were applied. The first limited the data to balls in play, meaning events with descriptions “hit_into_play,” “hit_into_play_no_out,” and “hit_into_play_score.”
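As a rough sketch of that download-and-filter step, assuming baseballr's scrape_statcast_savant function, weekly query windows (Savant caps the size of a single query), and Baseball Savant's column names; the dates shown are purely illustrative:

```r
# Pull one season of Savant data in weekly chunks, then keep only balls in play.
# Dates, and the choice of weekly windows, are illustrative.
library(baseballr)
library(dplyr)
library(purrr)

starts <- seq(as.Date("2018-03-29"), as.Date("2018-10-01"), by = "7 days")

bip_2018 <- map_dfr(starts, function(d) {
  scrape_statcast_savant(start_date = d, end_date = d + 6, player_type = "batter")
}) %>%
  filter(description %in% c("hit_into_play",
                            "hit_into_play_no_out",
                            "hit_into_play_score"))
```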
The second filter removed plays which might not contain actual radar measurements. As part of a declared “no nulls” policy, MLB imputes (i.e., estimates) values for launch speed and launch angle when insufficient data is recorded. However, despite numerous requests to designate which plays have been so imputed, MLB has not done so. While these imputations project the mirage of an error-free operation, they give public analysts the worst of both worlds. If we knew which values were imputed, we could employ different imputation methods on the missed values, or at least fall back on the actual values that were recorded. Instead, we are stuck with MLB’s imputations and have no way of knowing which values are real measurements. Thus, certain rows inevitably get tossed even if they are actual measurements because we cannot tell the difference. Until MLB identifies the imputed observations, we need to figure out how to deal with the values being provided.
Ben Dilday implemented one approach for detecting MLB imputations; Andrew Perpetua, in more detail, suggested treating all launch speed / angle pairs that appeared more than 75 times between 2015 and 2017 as presumptively imputed. In general, certain launch speed / angle combinations show up so much more frequently than their neighbors that it is obvious many do not belong. But other combinations are less obvious; the challenge is where to draw the line, given that it is probably impossible to guess them all.
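A minimal sketch of that frequency-based flag, assuming a pooled 2015-2017 data frame of balls in play (here called bip_2015_2017) with Savant's launch_speed and launch_angle columns:

```r
# Count every observed (launch_speed, launch_angle) pair across 2015-2017 and
# flag pairs appearing more than 75 times as presumptively imputed.
library(dplyr)

pair_counts <- bip_2015_2017 %>%
  count(launch_speed, launch_angle, name = "n_pair")

bip_flagged <- bip_2015_2017 %>%
  left_join(pair_counts, by = c("launch_speed", "launch_angle")) %>%
  mutate(presumed_imputed = n_pair > 75)
```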
For this series, we’ve created our own detection method. Our guiding principle is that, rather than try to perfectly detect imputation, we instead determine when our detections are accurate enough to analyze the issue we care about: stadium bias. In other words, if we exclude values that are overwhelmingly likely to be imputed and not too many real values, our model results should converge (and they do).
Thus, we designed a systematic outlier detection procedure that, through successive applications of Rosner’s test, applies a progressively more skeptical filter until further paring of outlier candidates fails to budge our model estimates. The identified probable outlier pairs are removed and are not re-imputed. If there is interest, we may discuss the theory and code for this screening method in a future article.
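For the curious, a single pass of that screen might be sketched with EnvStats::rosnerTest applied to the pair frequencies from above; the value of k is illustrative, and the refit-and-compare stopping rule is omitted here:

```r
# Test the frequency counts of (speed, angle) pairs for unusually common
# combinations; in practice k would be widened on successive passes until
# removing the flagged pairs no longer moves the model estimates.
library(EnvStats)
library(dplyr)

rt <- rosnerTest(pair_counts$n_pair, k = 50)   # k = max outliers to consider

outlier_counts <- rt$all.stats %>%
  filter(Outlier) %>%
  pull(Value)

suspect_pairs <- pair_counts %>%
  filter(n_pair %in% outlier_counts)           # pairs to drop before modeling
```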
The bias modeling itself was straightforward. The same three predictors were modeled as random effects for both launch speed and launch angle: batter, pitcher, and stadium. Errors from the launch speed model were assumed to be normally distributed, as this assumption fits the actual radar measurements well. Separate models were fit to each season.
Specified priors are largely swamped by the high volume of batted ball data, but we achieved good results using a half-Cauchy prior with 25 degrees of freedom for the random effect precisions on the launch speed model.
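A brms sketch along the lines described above (not our production code) looks like this; the stadium variable, the screened data frame, and the exact prior translation are illustrative assumptions, since brms places group-level priors on standard deviations rather than precisions and we read the half-Cauchy as a Student-t with one degree of freedom:

```r
# Per-season bias model: batter, pitcher, and stadium as crossed random
# intercepts with Gaussian errors on launch speed.
library(brms)

fit_2018 <- brm(
  launch_speed ~ 1 + (1 | batter) + (1 | pitcher) + (1 | stadium),
  data   = bip_2018_clean,   # 2018 balls in play after the outlier screen (assumed)
  family = gaussian(),
  prior  = set_prior("student_t(1, 0, 25)", class = "sd"),   # half-Cauchy reading
  chains = 4, cores = 4, iter = 2000
)
```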
Stadium Effects, Generally Speaking
We begin by discussing Statcast stadium differences generally. We have well over 100,000 batted balls for each completed season, and as noted above, we have controlled for the quality of the batters and pitchers in each matchup, putting the stadiums on theoretically equal footing. Between the rigor of that process and the incredibly low standard deviations, we are confident in these estimates, and conclude that the variance allotted to stadiums by our models is strong evidence of stadium “house” effects.
Table 1: Launch Speed Differences by Stadium
Season | Average Spread (mph) | Spread Uncertainty (+/- mph) |
2015 | 0.53 | 0.01 |
2016 | 0.50 | 0.02 |
2017 | 0.41 | ~0 |
2018 | 0.29 | ~0 |
2019 | 0.38 | 0.01 |
Table 1 shows that, from 2015-2016, what I call the “Average Spread” (technically, the standard deviation) between stadiums, after controlling for the pitchers and batters who hit there, was at least 0.5 mph in launch speed. By 2017, that had declined to just over 0.4 mph, and in 2018 it went down even further. The uncertainties around the average spreads are tight, so, at least within our models and the observations we selected, there is virtually no room for disagreement.
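For reference, the “Average Spread” and its uncertainty can be read directly off a fitted model of the kind sketched above; the parameter name follows brms's naming convention:

```r
# Posterior mean of the stadium random-effect SD ("Average Spread") and its
# posterior SD (the "+ / -" column).
library(brms)
library(posterior)

draws       <- as_draws_df(fit_2018)
spread_mean <- mean(draws$sd_stadium__Intercept)
spread_sd   <- sd(draws$sd_stadium__Intercept)
```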
Somewhat interesting are the results to date from 2019, which show greater stadium variation so far than was the case in 2018. This could be an artifact of early-season hitting or of the smaller sample size. On the other hand, the installation of the new Hawkeye system may make it challenging to maintain the existing, Trackman-based system. We express no judgment other than to observe that the spread between stadiums is worth monitoring during this time of transition.
Hot and Cold Stadiums
Although the average spread between parks may not seem that large, remember that an average summarizes a collection of stadiums with different values. Which stadiums are driving most of this variance? That depends on the season.
Table 2: Largest Stadium Effects, 2015 Season
Stadium | Launch Speed Effect (mph) | Launch Speed SD (mph) |
ARI | 1.19 | 0.27 |
BAL | 0.86 | 0.27 |
DET | 0.69 | 0.27 |
KC | 0.64 | 0.27 |
STL | -0.48 | 0.27 |
SD | -0.51 | 0.27 |
CIN | -0.98 | 0.27 |
HOU | -1.01 | 0.27 |
NYM | -1.10 | 0.27 |
The ones that interest us most are those that dramatically exceed the SD, or uncertainty estimate. We are reasonably confident that something is “off” with the reported launch speeds at these stadiums, relative to the average such measurements for these players. For 2015, Arizona and Baltimore ran extremely hot, while Cincinnati, Houston, and the New York Mets appear to be reporting surprisingly low values.
Table 3: Largest Stadium Effects, 2016 Season
Stadium | Launch Speed Effect (mph) | Launch Speed SD (mph) |
ARI | 1.35 | 0.26 |
KC | 0.69 | 0.26 |
ATL | 0.65 | 0.26 |
DET | 0.64 | 0.26 |
MIA | -0.74 | 0.26 |
SD | -0.80 | 0.26 |
CIN | -0.95 | 0.26 |
NYM | -0.97 | 0.26 |
HOU | -1.09 | 0.27 |
For 2016, we have a lot of familiar faces. Despite running a completely new model for each year on separate data, the classifications prove sticky. The presence of Kansas City (generally considered a pitcher-friendly park) up top and Cincinnati (a notoriously hitter-friendly park) at the bottom, both well outside the margin of error, confirms that our model is not simply parroting expectations for high- and low-offense parks.
Table 4: Largest Stadium Effects, 2017 Season
Stadium | Launch Speed Effect (mph) | Launch Speed SD (mph) |
ARI | 1.00 | 0.26 |
DET | 0.67 | 0.25 |
MIN | 0.51 | 0.25 |
BAL | 0.49 | 0.25 |
SD | -0.42 | 0.26 |
PIT | -0.54 | 0.25 |
CIN | -0.58 | 0.25 |
MIA | -0.61 | 0.25 |
NYM | -0.74 | 0.25 |
For 2017, we have more familiar faces at the top and bottom, but you can see that the gap between stadiums is narrower overall than it was before, and, consistent with that, the SD values around these estimates have tightened as well.
Table 5: Largest Stadium Effects, 2018 Season
Stadium | Launch Speed Effect (mph) | Launch Speed SD (mph) |
CHC | 0.44 | 0.21 |
LAA | 0.41 | 0.21 |
KC | 0.40 | 0.20 |
WSH | 0.34 | 0.21 |
STL | -0.30 | 0.20 |
BAL | -0.32 | 0.20 |
MIA | -0.34 | 0.20 |
NYM | -0.47 | 0.21 |
By 2018, we start to see some reshuffling. Arizona has vanished from the “running hot” list up top, with Wrigley Field and Angel Stadium now leading the way. On the opposite end, Miami and Citi Field keep doing their speed-suppressing thing. The SD values have also dropped noticeably.
Table 6: Largest Stadium Effects, 2019 Season
Stadium | Launch Speed Effect (mph) | Launch Speed SD (mph) |
BOS | 0.49 | 0.30 |
MIN | 0.40 | 0.30 |
DET | 0.37 | 0.29 |
SEA | 0.29 | 0.28 |
HOU | -0.38 | 0.30 |
TB | -0.41 | 0.29 |
OAK | -0.47 | 0.29 |
MIA | -0.60 | 0.30 |
I wouldn’t take 2019 results too seriously, as we are only two months in, and there is plenty of season left to go. Nonetheless, there are familiar entries at both ends of the spectrum.
Discussion
What to make of all of this? We see a few takeaways.
First, the Statcast system shows signs of having become more accurate over time at detecting and measuring the launch speed of balls in play. Each year, our detection algorithm found that the number of balls in play (BIP) being imputed had probably declined. Furthermore, each year the spread of the stadiums’ “house” effects has shrunk. This lends increasing confidence to the measurements being taken, although it also means that recent measurements are probably more reliable than older ones, an important qualification if older measurements are still being used (say, for player projections or to determine other parameters).
Second, the “house” effects of stadiums can be substantial. Effects of more than 0.5 mph in either direction have been common, and we estimate that the differences between two stadiums for the same players can be as much as 2 mph all in. By one estimate, each mph of launch speed is worth an additional five feet of distance, so a 2 mph gap translates to roughly 10 feet of carry. To be sure, full-time players play in a variety of stadiums over the course of a season, which will cancel out some of this variation, but before getting too excited about smaller-sample players, you may wish to check where they have been running up their “good” or “bad” numbers.
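For example, a quick tally of where a smaller-sample player's balls in play were actually measured might look like this (names follow the earlier sketches; the batter id is a placeholder):

```r
# Where were this batter's balls in play hit, and therefore measured?
library(dplyr)

some_batter_id <- 123456   # placeholder MLBAM batter id

bip_2018_clean %>%
  filter(batter == some_batter_id) %>%
  count(stadium, sort = TRUE)
```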
Third, good modeling practice would suggest accounting for these stadium differences when modeling launch speed as a function of player performance. This is not as easy as it sounds given that the relationships between launch speed and launch angle feature complicated interactions and non-linearities. Such relationships are thus often modeled with boosted or bagged tree models of some kind, but these techniques are known to struggle with large parameter groups — like, say, a group of 30 stadiums. Additional modeling steps or pre-treatment thus may be required to effectively account for stadium effects.
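One possible pre-treatment, sketched with the model objects from the earlier examples, is to subtract each park's estimated “house” effect from the observed launch speeds before the tree model ever sees them:

```r
# Posterior means of each stadium's intercept from the brms sketch above,
# joined back onto the batted ball data to create an adjusted launch speed.
library(brms)
library(dplyr)
library(tibble)

stadium_fx <- ranef(fit_2018)$stadium[, , "Intercept"] %>%
  as.data.frame() %>%
  rownames_to_column("stadium") %>%
  select(stadium, stadium_effect = Estimate)

bip_adjusted <- bip_2018_clean %>%
  left_join(stadium_fx, by = "stadium") %>%
  mutate(launch_speed_adj = launch_speed - stadium_effect)
```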
Fourth, a word on causation. While detection error could explain much of these stadium differences, other factors could be at play as well. For example, Arizona was a “hot” outlier from 2015-2017, but abruptly became more normal beginning in 2018. Perhaps not coincidentally, the Diamondbacks implemented their humidor to pre-condition game balls for the 2018 season. Because balls with more moisture come off the bat with lower launch speeds, the humidor could be a factor in Arizona’s newly-moderate launch speed profile, although equipment effects cannot be ruled out either. Analogously, ball storage conditions at other parks may similarly affect their launch speed measurements, although probably to a lesser extent.
Others have suggested to us that certain stadiums may simply suppress batter effectiveness overall, through features such as a difficult batter’s eye. Although stadiums are often thought of as affecting results only after the ball is struck (e.g., because the park is spacious or the air is thin), it is possible that some of these house effects directly suppress batter success from the get-go. If so, the effects we are finding may have even deeper causes than we suspected, and could remain at least in part regardless of how accurately they were being measured.
Next time, we’ll look at launch angles and see whether any analogous “house effects” appear for these measurements also.