Caught Looking examines articles from the academic literature relevant to baseball and statistical analysis. This review looks at the inaugural article from the Journal of Sports Analytics, written by Jim Albert on the topic of run expectancy in differing environments.
The Journal of Sports Analytics launched in 2015 with an unusual editorial board. Rather than a host of scholars with prestigious university affiliations, the peer-reviewed journal boasts editors and advisors from a couple dozen collegiate and professional sports teams. Early indications are that the academic standards for the journal will be plenty high. The journal’s very first article, Beyond Run Expectancy, by Jim Albert, presents a way to think about run expectancy tables using advanced contemporary statistical techniques.
The run expectancy matrix, first developed by George Lindsey in 1963, has become a staple of sabermetric analysis. The matrix tells us how many runs a team can expect to score from the current base-out state through the end of the inning. In Lindsey’s original specification, these expectancies were simple averages of what happened in each situation throughout the league over thousands of innings. In other words, as Albert notes, these run expectancies were created based on what might be expected from a league-average team.
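For readers who like to see the nuts and bolts, Lindsey-style bookkeeping amounts to nothing more than grouped averages. Here is a minimal sketch using a handful of made-up play-by-play records rather than real data (the base-out states and run totals are invented for illustration):

```python
# A minimal sketch of Lindsey-style run expectancy: average the runs scored
# from each base-out state to the end of the inning. The records below are a
# hypothetical stand-in for real play-by-play data.
from collections import defaultdict

# Each record: (base state, outs, runs scored from this point to the end of the inning)
plays = [
    ("___", 0, 0), ("___", 0, 1), ("1__", 1, 0), ("1__", 1, 2), ("_2_", 2, 1),
]

totals, counts = defaultdict(float), defaultdict(int)
for bases, outs, future_runs in plays:
    totals[(bases, outs)] += future_runs
    counts[(bases, outs)] += 1

run_expectancy = {state: totals[state] / counts[state] for state in totals}
print(run_expectancy)  # {('___', 0): 0.5, ('1__', 1): 1.0, ('_2_', 2): 1.0}
```

Run over a full season of league-wide play-by-play data, those averages are the familiar 24-cell matrix.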
Lindsey’s run expectancy tables, simple as they may be, provide the backbone of many sabermetric models (including openWAR, reviewed previously in this column) and can be an extremely useful tool for making decisions about when to steal a base or sacrifice bunt. Albert’s paper brings these tables up to the current statistical frontier. He combines a Bayesian approach with multinomial, ordinal logistic regression techniques to produce a more accurate run expectancy matrix that can incorporate differences in team quality and other situational influences. His model also provides estimates of the probability of scoring any particular number of runs in the inning.
The value of a Bayesian approach comes about because, contra Lindsey, Albert wants to know how the expected runs table changes for different teams in different run-scoring environments. For any particular team, we can observe its past results and draw inferences about its true ability to score runs from a given base-out state, but those observed results include some degree of random chance. We might improve our estimates of run expectancy by using information from the rest of the teams in the league. In Bayesian lingo, the league-wide results provide an assumed prior distribution over the underlying run-scoring process.
In plain language, if teams across the league score an average of 0.50 runs in an inning after a certain base-out state, but our team has scored 0.60 runs following the same base-out state, our team might be either lucky or good. Albert’s approach uses the mean and standard deviation of both the team’s results and the league’s results to assign different weights to the population average and the individual team average. Essentially, if the variance for the league is narrow and the variance for the team is wide, the league-wide mean receives relatively more weight: we think the team was lucky. Conversely, if the team’s results cluster tightly around their mean while the league-wide variance is high, we think the team is good.
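To make that weighting concrete, here is a minimal sketch of the shrinkage intuition (a simple precision-weighted blend, not Albert’s actual model); the standard deviations are invented for illustration:

```python
# A precision-weighted blend of a team's observed run average and the league
# average for one base-out state. This is the shrinkage intuition only, not
# Albert's model; all numbers are hypothetical.

def shrink_toward_league(team_mean, team_sd, league_mean, league_sd):
    """Weight each mean by its precision (1 / variance): the noisier estimate
    gets less weight, so a high-variance team average is pulled toward the
    league average."""
    w_team = 1.0 / team_sd ** 2
    w_league = 1.0 / league_sd ** 2
    return (w_team * team_mean + w_league * league_mean) / (w_team + w_league)

# The league scores 0.50 runs from this state, our team 0.60, but the team's
# estimate is much noisier than the league-wide one.
print(shrink_toward_league(0.60, 0.10, 0.50, 0.02))  # ~0.504, pulled toward the league
```

The tighter the team’s own distribution relative to the league’s, the more the blended estimate stays near 0.60; the noisier it is, the harder it gets pulled back toward 0.50.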
Next, Albert takes on the nonlinear, discrete nature of run-scoring in innings. Obviously, teams can’t score 1.2 runs in an inning. This doesn’t represent a meaningful problem for seasonal totals, but within an inning, it’s useful to be able to estimate both the mean runs and the probability of scoring, say, exactly two runs in the inning. For this, Albert uses ordinal logistic regression to create team-specific estimates of the probability that a team will move from zero runs to one run in the inning, from one run to two, and so on. Interested readers should consult the paper, which provides a very readable guide to the method and the interpretation of results. Based on the data presented in the paper, the logistic approach unlocks information that isn’t contained in the mean expected runs alone: Albert shows that different teams get to their averages in different ways. He doesn’t speculate about the causes of these differences, but it is easy to imagine a high-on-base, low-slugging team and a low-on-base, high-slugging team producing the same run expectancy from, say, a bases-loaded, one-out situation, with the former more likely to score exactly one run and the latter more likely to put up a crooked number.
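For readers who want to see the shape of that machinery, here is a hedged sketch of how a cumulative-logit (ordinal) model turns a single team effect into a full run distribution for one base-out state. The cutpoints and team effects below are invented for illustration; they are not estimates from Albert’s paper:

```python
# Turn an ordinal (cumulative-logit) model into per-run-count probabilities
# and an expected-runs figure for one base-out state. Cutpoints and team
# effects are made up; this is a sketch of the technique, not Albert's fit.
import numpy as np

def run_distribution(cutpoints, team_effect):
    """P(runs <= k) = logistic(cutpoint_k - team_effect). Differencing the
    cumulative probabilities gives P(runs = k); the final category collects
    'max or more' runs."""
    cum = 1.0 / (1.0 + np.exp(-(np.asarray(cutpoints) - team_effect)))
    cum = np.append(cum, 1.0)            # P(runs <= top category) = 1
    return np.diff(cum, prepend=0.0)     # per-count probabilities

cutpoints = [1.0, 2.2, 3.3, 4.2]         # hypothetical cutpoints for 0, 1, 2, 3, 4+ runs
for label, effect in [("weaker offense", -0.2), ("stronger offense", 0.4)]:
    probs = run_distribution(cutpoints, effect)
    # treat the open-ended 4+ category as exactly 4 runs for this sketch
    expected = float(np.dot(np.arange(len(probs)), probs))
    print(label, np.round(probs, 3), "expected runs:", round(expected, 2))
```

The same fitted object answers both questions at once: the whole distribution of runs and its mean. Two teams could share the same expected runs while splitting that probability mass very differently, which is exactly the kind of difference Albert documents.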
But he’s not done yet. In the next section of the paper, Albert illustrates how his multilevel model can be adapted to consider other factors that might affect the run-scoring environment, such as home field advantage, the quality of the pitcher on the mound, or the team’s ability to produce in the clutch. It would have been nice to see Albert expound on his results regarding clutch hitting (he doesn’t find much evidence for it), but the main thrust of this section is really just to demonstrate the intuition for how the multilevel ordinal regression model can work with any list of covariates the investigator can dream up.
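The mechanics of adding those covariates are simple: they just extend the linear predictor that the cutpoints are compared against. A hedged sketch, with invented coefficients that are not Albert’s estimates:

```python
# Situational covariates enter the same ordinal model by extending the linear
# predictor. Coefficients and inputs here are invented for illustration.
def linear_predictor(team_effect, home, pitcher_quality,
                     beta_home=0.05, beta_pitcher=-0.30):
    # pitcher_quality > 0 means a better-than-average pitcher on the mound,
    # which pushes the predictor (and therefore expected runs) down
    return team_effect + beta_home * home + beta_pitcher * pitcher_quality

print(linear_predictor(0.4, home=1, pitcher_quality=1.5))   # facing an ace at home
print(linear_predictor(0.4, home=0, pitcher_quality=-1.0))  # a replacement arm on the road
```

The resulting predictor would feed into the same cumulative-logit machinery sketched above, producing a run distribution tailored to the situation.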
The full version of Albert’s model gives the manager or researcher the ability to examine not just the expected runs from a given situation, but also the probability of scoring a particular number of runs, along with the flexibility to adjust the expected runs table for the game situation. The model does this by combining information about the run-scoring process contained in both league-level and team-level data.
One potential limitation of using this model in real time, however, comes from the fact that early in a season we have limited information about team-level characteristics and much more information about league-wide run expectancy. It’s not likely that the improvements developed by this method will be of much use in the middle of April, or even at the beginning of June. Drawing on data from previous seasons is also a more legitimate exercise for league-wide estimates than for individual teams, which may have very different players and lineups from one year to the next.
However, we often have lots of information about the players on our teams, and very sophisticated projection systems for those players. One modest suggestion would be to replace the team-level results with simulated results based on individual player projections. This could be done early in the season, when there is not much real data available at the team level, and updated during the season to account for injuries, call-ups, or trades. I can see complications: it’s not obvious how to weight the information from simulations relative to the league-wide prior, and the sources of error are different for real data (randomness and timing) than for simulations (modeling error). Still, this approach has the potential to include player-level information that isn’t really contained in the team-level data.
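As a rough illustration of that suggestion, here is a deliberately simplified Monte Carlo sketch that generates pseudo team-level innings from player-level inputs. The per-plate-appearance probabilities and the baserunning rules are crude assumptions, not a real projection system:

```python
# Simulate innings from player-level outcome probabilities to stand in for
# early-season team data. Everything here is simplified: three outcome types,
# station-to-station baserunning, no steals, errors, or double plays.
import random

# Hypothetical lineup: per-plate-appearance probabilities of an out, reaching
# base (walk/single), or an extra-base hit; repeated to fill nine spots.
LINEUP = [
    {"out": 0.66, "on_base": 0.26, "extra": 0.08},
    {"out": 0.62, "on_base": 0.27, "extra": 0.11},
    {"out": 0.64, "on_base": 0.22, "extra": 0.14},
] * 3

def simulate_inning(lineup, leadoff_spot=0):
    """Crude inning model: 'on_base' advances every runner one base,
    'extra' advances every runner two bases."""
    outs, runs, bases = 0, 0, [0, 0, 0]   # bases[0] = first, bases[2] = third
    spot = leadoff_spot
    while outs < 3:
        p = lineup[spot % len(lineup)]
        roll = random.random()
        if roll < p["out"]:
            outs += 1
        else:
            advance = 1 if roll < p["out"] + p["on_base"] else 2
            for _ in range(advance):
                runs += bases.pop()       # a runner on third scores
                bases.insert(0, 0)        # everyone else moves up a base
            bases[advance - 1] = 1        # batter takes first or second
        spot += 1
    return runs

sims = [simulate_inning(LINEUP) for _ in range(10000)]
print("simulated runs per inning:", sum(sims) / len(sims))
```

Innings generated this way could substitute for, or be blended with, observed team-level results until the real sample catches up.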
Albert suggests using his approach to evaluate individual batter effects on run scoring, in other words, to improve our measures of a player’s impact by accounting for his team context. His method is potentially well-suited for this, and the application doesn’t suffer from the sample-size problems described above, at least for regular players, though it is a retrospective one. Perhaps the next place to go with this model, however, is prospective. What does the run expectancy matrix look like with Anthony Rizzo, Kris Bryant and Kyle Schwarber coming to bat? (Hint: Pretty damn good, if you’re a Cubs fan.) A path forward for researchers interested in extending Albert’s work might focus on combining projections for teams rather than actual data, which comes from an ever-changing cast of players.
For now, Albert has indeed taken us beyond simple run expectancy and out to the statistical frontier. The clarity of the writing on such a complex topic is also much appreciated, as the paper works through a number of steps toward building the final model, one that represents a very useful tool for evaluating player performance and analyzing in-game situations. It’s also a tool with a remarkable amount of flexibility for further investigation and refinement, so hopefully it will be used far and wide.
Michael Wenz is Visiting Professor at Politechnika Czestochowa in Poland. Comments and suggestions for future articles are appreciated.