February 26, 2013
The Stats Go Marching In
Who's Ahead of Whom?
The late Earl Weaver put a box in the first chapter of his classic book dedicated to answers he used to give every spring to “all those sportswriters with nothing much to write about” who every year “asked the same questions.” He alternated between the above depending on whether a loss occurred because his Orioles could not score any runs or allowed too many of them.
“Who’s to say which group develops faster?”
Well, let me try.
Month or temperature?
In that article I did not try to model time of the year and temperature simultaneously, but I’ll try here. As a first step, I ran a very simple regression model, featuring the month and the temperature as the only predictors and the runs scored as the variable to be predicted. Temperature was taken from Retrosheet files, and I did not go back further than 1991, as before that year a very high number of missing values would have made the analysis troublesome.
The results I obtained indicate a rise of 0.04 runs scored per game (both teams combined) for every one-degree increase in temperature, and a seasonal trend as depicted by the chart below.
It seems that, after you remove the effect of the temperature, July and August are actually run-suppressing months. With the goal of this article in mind, it would seem that pitchers start slow and that it takes a couple of months before they catch up to the hitters.
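For readers who want to try the same kind of model, here is a minimal sketch of a regression with month and temperature as the only predictors. It is not the code behind the chart: the function name `fit_month_temperature`, the use of NumPy least squares, and the simulated games are all assumptions made for illustration, and the month effects fed into the simulation are arbitrary values, not the estimates from this article.

```python
import numpy as np

def fit_month_temperature(runs, month, temperature):
    """Ordinary least squares of runs on temperature plus month indicators.

    The earliest month in the data is the baseline (effect fixed at 0).
    """
    runs = np.asarray(runs, dtype=float)
    month = np.asarray(month)
    temperature = np.asarray(temperature, dtype=float)
    months = np.unique(month)
    # Design matrix: intercept, temperature, one indicator per non-baseline month.
    columns = [np.ones_like(runs), temperature]
    columns += [(month == m).astype(float) for m in months[1:]]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, runs, rcond=None)
    month_effects = {int(months[0]): 0.0}
    month_effects.update({int(m): float(b) for m, b in zip(months[1:], beta[2:])})
    return {
        "intercept": float(beta[0]),
        "per_degree": float(beta[1]),
        "month_effects": month_effects,
    }

# Simulated games (hypothetical numbers, NOT Retrosheet data).
rng = np.random.default_rng(0)
n = 50_000
month = rng.integers(4, 10, size=n)                    # April (4) through September (9)
typical_temp = np.array([55, 65, 75, 82, 81, 72])      # assumed monthly averages
temperature = typical_temp[month - 4] + rng.normal(0, 8, size=n)
assumed_effect = np.array([0.0, -0.1, -0.2, -0.4, -0.4, -0.3])  # arbitrary
runs = 6.0 + 0.04 * temperature + assumed_effect[month - 4] + rng.normal(0, 3, size=n)

fit = fit_month_temperature(runs, month, temperature)
print("runs per degree:", round(fit["per_degree"], 3))
print("month effects vs. April:", fit["month_effects"])
```

Because temperature and month move together, the month effects here are only meaningful relative to the baseline month and after the temperature term has absorbed its share, which is the sense in which the text speaks of "removing" the effect of temperature.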
A (nearly) experimental setting
However, the whole purpose of multivariate models is to measure the net effect of one variable after considering the other variables in the model. In an experimental setting one could keep the temperature constant for the whole year and look at the run-scoring trend. But that’s only feasible in physics labs, not in ballparks.
We actually have a few nearly experimental settings, where the temperature has been held more or less constant through the season (and the years). That’s in the domed stadiums without adjustable roofs, which are now represented exclusively by Tropicana Field.
Here is the seasonal run-scoring trend for teams playing either at home or on the road at domed stadiums, excluding the few games played at Tokyo Dome. For this analysis, I examined scoring as a function of games played rather than month. The set of charts below didn’t undergo any statistical trick, except counting.
Okay, I lied. There is actually some smoothing, and there are error bars.
The throughout-the-year drop in run scoring is not ubiquitous (see the Astros), but run prevention has started out slow for the Twins, the Expos, the Rays and all their opponents, and you can make a case for the Astros’ and the Mariners’ opponents as well.
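The counting, smoothing and error bars mentioned above could be reproduced along these lines. The article does not say which smoother was used, so the centered moving average below is an assumption, as are the function name `scoring_by_game_number` and the tiny made-up input.

```python
import numpy as np

def scoring_by_game_number(game_number, runs, window=11):
    """Mean runs per game number, its standard error, and a smoothed trend.

    Assumes at least two observations for every game number, so that a
    sample standard deviation exists.
    """
    game_number = np.asarray(game_number)
    runs = np.asarray(runs, dtype=float)
    games = np.unique(game_number)
    means = np.array([runs[game_number == g].mean() for g in games])
    counts = np.array([(game_number == g).sum() for g in games])
    sds = np.array([runs[game_number == g].std(ddof=1) for g in games])
    stderr = sds / np.sqrt(counts)  # half-height of a one-standard-error bar
    # Centered moving average; windows are simply truncated at the edges.
    half = window // 2
    smoothed = np.array(
        [means[max(0, i - half): i + half + 1].mean() for i in range(len(games))]
    )
    return games, means, stderr, smoothed

# Made-up input: 20 game numbers, three seasons each, scoring 3, 4 and 5 runs.
game_number = np.repeat(np.arange(1, 21), 3)
runs = np.tile([3.0, 4.0, 5.0], 20)
games, means, stderr, smoothed = scoring_by_game_number(game_number, runs, window=5)
```

A wider `window` gives a calmer line at the cost of blurring short-lived swings, so it is worth checking that any early-season drop survives more than one choice of window.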
Back to complex
So I went back to the statistical tricks, namely multilevel regression. I looked at runs scored by the home team, accounting for the park it played in (so that the run-boosting effect of the rarefied air in Denver is removed, for example) and again examined temperature and games into the season.
This time I used 10-degree groups for the temperature, in order to detect possible departures from a linear relation with run scoring. However, looking at the chart below, it seems that the linear approximation would not be far off the target.
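To make the 10-degree grouping concrete, here is a rough sketch. Note the swap: it is not a multilevel regression. It subtracts each park's average scoring as a crude stand-in for the park term and then averages what is left within each temperature group. The function name `temperature_bin_effects` and the four sample games are hypothetical.

```python
import numpy as np

def temperature_bin_effects(runs, park, temperature, width=10):
    """Average park-adjusted runs within temperature groups of `width` degrees.

    Simplification: park effects are removed by subtracting park means,
    not estimated jointly with temperature as a multilevel model would do.
    """
    runs = np.asarray(runs, dtype=float)
    park = np.asarray(park)
    temperature = np.asarray(temperature, dtype=float)
    adjusted = runs.copy()
    for p in np.unique(park):
        mask = park == p
        adjusted[mask] -= runs[mask].mean()
    # Lower edge of each game's group, e.g. 73 degrees -> 70 when width is 10.
    bins = (np.floor(temperature / width) * width).astype(int)
    return {int(b): float(adjusted[bins == b].mean()) for b in np.unique(bins)}

# Four made-up games in two hypothetical parks.
effects = temperature_bin_effects(
    runs=[4, 6, 9, 11],
    park=["A", "A", "B", "B"],
    temperature=[55, 75, 58, 72],
)
print(effects)
```

If the group averages climb by a roughly constant amount from one group to the next, a straight line is a reasonable summary. A caveat on this shortcut: parks with unusually hot or cold weather have part of the temperature effect absorbed into their own mean, which is one reason to prefer a model that estimates both terms together.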
And here’s the other chart, showing the effect of the number of games into the season once the effects of temperature and the home park are removed.
Starting from game number 60 or so, the line is not as well behaved as the one in the previous chart, and I admittedly did not spend much time thinking about what those ups and downs might mean (if anything other than noise).
However, a downward trend is apparent for the first couple of months.
Interpretation and caveats
Is this enough to say that pitchers are behind hitters early in the year?
Well, pitchers account for just part (albeit the biggest part) of run prevention. Could it be that sloppy fielding is actually the culprit? I did a similar analysis on Defensive Efficiency (DER), and it yielded something like an inverse path (i.e., higher efficiency at the start of the season). However, the difference between the best and worst defensive moments is lower than one point of DER, so I would feel more at ease stating that there is no seasonality in fielding.
Let’s assume for a moment that pitchers being behind hitters is fully responsible for the trends outlined in this article. How should teams react to this information, if at all? Should they change their spring training routines in order to ensure that their pitchers are really ready by Opening Day? Or could it be the case that pitchers can last only so long in a season, and if they were at 100 percent in April they would falter down the stretch?
This matter can be further explored in several directions. Looking at individual pitchers’ trends can give us some information, especially if some of them show different patterns year in and year out: we might spot throwers who are at their peak early on in the season and see whether they last until the end of the season. In addition, a look at past decades could be interesting: Retrosheet has sparse temperature data, if any, the further one goes back in time, but other online sources could fill in the blanks.