Caught Looking examines articles from the academic literature relevant to baseball and statistical analysis. This review looks at the inaugural article from the Journal of Sports Analytics, written by Jim Albert on the topic of run expectancy in differing environments.

The Journal of Sports Analytics launched in 2015 with an unusual editorial board. Rather than a host of scholars with prestigious university affiliations, the peer-reviewed journal boasts editors and advisors from a couple dozen collegiate and professional sports teams. Early indications are that the academic standards for the journal will be plenty high. The journal’s very first article, Beyond Run Expectancy, by Jim Albert, presents a way to think about run expectancy tables using advanced contemporary statistical techniques.

The run expectancy matrix, developed first by George Lindsey in 1963, has become a staple of sabermetric analysis. The matrix includes information on how many runs we can expect to see in an inning, given the current base-out state. In Lindsey’s original specification, these expectancies were simple averages of what happened in each situation throughout the league over thousands of innings. In other words, as Albert notes, these run expectancies were created based on what might be expected from a league-average team.
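Lindsey's construction is simple enough to sketch in a few lines: group innings by base-out state and average the runs scored from that point to the end of the inning. The play-by-play records below are invented toy data, not real Retrosheet events, and the state encoding is just one plausible choice.

```python
# Sketch: a Lindsey-style run expectancy table as a simple average.
# Each event is (base_out_state, runs scored from that state to inning's end).
from collections import defaultdict

def run_expectancy(events):
    """events: iterable of (state, runs). Returns {state: mean runs}."""
    totals = defaultdict(lambda: [0.0, 0])      # state -> [run total, count]
    for state, runs in events:
        totals[state][0] += runs
        totals[state][1] += 1
    return {s: r / n for s, (r, n) in totals.items()}

# Toy sample: state is ("runners", outs), e.g. ("1__", 0) = man on first, no outs.
sample = [
    (("___", 0), 0), (("___", 0), 1), (("___", 0), 0), (("___", 0), 1),
    (("1__", 0), 2), (("1__", 0), 0),
]
re = run_expectancy(sample)
print(re[("___", 0)])   # 0.5 runs expected from bases empty, no outs
```

With real data, each of the 24 base-out states would get an entry, estimated over thousands of innings league-wide.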

Lindsey’s run expectancy tables, simple as they may be, provide the backbone of many sabermetric models (including openWAR, reviewed previously in this column) and can be an extremely useful tool for making decisions about when to steal a base or sacrifice bunt. Albert’s paper brings these tables up to the current statistical frontier. He combines a Bayesian approach with multinomial, ordinal logistic regression techniques to produce a more accurate run expectancy matrix that can incorporate differences in team quality and other situational influences. His model also provides estimates of scoring a particular number of runs in the inning.

The value of a Bayesian approach comes about because, contra Lindsey, Albert wants to know how the expected runs table changes for different teams in different run-scoring environments. For different teams, we can observe their past results and draw inferences about their true ability to score runs from a given base-out state, but our observed results include some degree of random chance. We might improve our estimates of run expectancy by using information from the rest of the teams in the league. In Bayesian lingo, the league-wide results provide us with an assumed prior distribution of the underlying process of scoring runs.

In plain language, if we see that on average, teams score 0.50 runs in an inning after a certain base-out state, but our team has scored 0.60 runs following the same base-out state, our team might be either lucky or good. Albert's approach uses the mean and standard deviation of both the team's results and the league's results to assign different weights to the population average and the individual team average. Essentially, if the variance for the league is narrow and the variance for the team is wide, the league-wide mean receives relatively higher weight. We think the team was lucky. Conversely, if the team has relatively narrow variance around their mean, but leaguewide variance is high, we think the team is good.
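The weighting logic described above can be illustrated with the standard normal-normal shrinkage estimator, in which each source of information is weighted by its precision (one over its variance). This is a simplification of Albert's actual multilevel model, and all of the numbers below are hypothetical.

```python
# Sketch of variance-based weighting: shrink a team's observed run
# expectancy toward the league mean, weighting each by its precision.
# Not Albert's exact model; the inputs are made-up numbers.

def shrunk_estimate(team_mean, team_var, league_mean, league_var):
    w_team = 1.0 / team_var          # precision of the team's own results
    w_league = 1.0 / league_var      # precision of the league-wide prior
    return (w_team * team_mean + w_league * league_mean) / (w_team + w_league)

# Wide team variance, narrow league variance -> pulled toward the league:
print(shrunk_estimate(0.60, 0.20, 0.50, 0.05))   # 0.52 -- "probably lucky"
# Narrow team variance, wide league variance -> stays near the team mean:
print(shrunk_estimate(0.60, 0.05, 0.50, 0.20))   # 0.58 -- "probably good"
```

The same 0.60 observation gets treated very differently depending on how noisy it is relative to the league-wide distribution, which is exactly the lucky-or-good distinction.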

Next, Albert takes on the nonlinearity and discrete nature of run-scoring in innings. Obviously, teams can’t score 1.2 runs in an inning. This doesn’t represent a meaningful problem for seasonal totals, but within an inning, it’s useful to be able to estimate both the mean runs and the probability of scoring, say, two runs in the inning. For this, Albert uses ordinal logistic regression to create team-specific estimates of the probability that a team will move from zero to one run in the inning, from one to two runs, and so on. Interested readers should consult the paper, which provides a very readable guide to the method and interpretation of results. Based on the data presented in the paper, the logistic approach unlocks information not contained in simply looking at mean expected runs: different teams get to their averages in different ways. Albert doesn’t speculate about the causes of these differences, but consider a bases-loaded, one-out situation. A high-on-base, low-slugging team and a low-on-base, high-slugging team might produce the same run expectancy from that state, yet the former would have a higher probability of scoring exactly one run and the latter a higher probability of putting up a crooked number.
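To give a feel for the mechanics, here is how a proportional-odds (ordinal logistic) model converts a single team effect into a full distribution over run totals. The cutpoints and team effects below are invented for illustration, not estimates from Albert's paper.

```python
# Sketch: an ordinal logit turns a team effect into P(0 runs), P(1), P(2), P(3+).
# Cutpoints and effects are hypothetical, not fitted values from the paper.
import math

def run_probabilities(team_effect, cutpoints):
    """P(runs >= k) = logistic(team_effect - c_k); difference to get P(runs = k)."""
    logistic = lambda x: 1.0 / (1.0 + math.exp(-x))
    tail = [logistic(team_effect - c) for c in cutpoints]   # P(runs >= 1, 2, ...)
    probs = [1.0 - tail[0]]                                 # P(runs = 0)
    probs += [tail[i] - tail[i + 1] for i in range(len(tail) - 1)]
    probs.append(tail[-1])                                  # P(runs >= max)
    return probs

cutpoints = [0.5, 1.8, 3.0]        # hypothetical thresholds for 1, 2, 3+ runs
print(run_probabilities(0.0, cutpoints))   # a league-average team
print(run_probabilities(0.4, cutpoints))   # a better team: mass shifts upward
```

The key point is that a single model produces the whole distribution, so two teams with the same mean can still show different probabilities of a shutout inning versus a crooked number.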

But he’s not done yet. In the next section of the paper, Albert illustrates how his multilevel model can be adapted to consider other factors that might affect the run-scoring environment, such as home field advantage, the quality of the pitcher on the mound, or the team’s ability to produce in the clutch. It would have been nice to see Albert expound on his results regarding clutch hitting (he doesn’t find much evidence for it), but the main thrust of this section is really just to demonstrate the intuition for how the multilevel ordinal regression model can work with any list of covariates the investigator can dream up.

The full version of Albert’s model gives the manager or researcher the ability to examine not just the expected runs from a given situation, but also the probability of scoring a certain number of runs, and it also includes the flexibility to adjust the expected runs table to consider the game situation. The model does this by combining information about the process of run-scoring contained in both league-level and team-level data.

One potential limitation of using this model in real time, however, comes from the fact that, for example, early in a season, we have limited information about team-level characteristics and much more information about league-wide run expectancy tables. It’s not likely that the improvements developed by this method will be of much use in the middle of April, or even the beginning of June. Drawing on data from previous seasons is also a more legitimate exercise for league-wide estimates than for individual teams, which may have very different players and lineups.

However, we often have lots of information about the players on our teams, and very sophisticated projection systems for those players. One modest suggestion would be to replace the team-level results with simulated results based on individual player projections. This could be done early in the season, when there is not much real data available at the team level, and updated during the season to account for things such as injuries, call-ups or trades. I can see complications—it’s not obvious how to weight the information from simulations relative to the league-wide assumed prior, and the sources of error are different with real data (randomness and timing) and simulations (modeling error). However, this approach does have the potential to include player-level information that is not really contained in the team-level information.
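A minimal version of that simulation idea might look like the sketch below: simulate innings from per-batter projected outcome probabilities and use the resulting run distribution in place of sparse early-season team data. The projections are made-up numbers, and the base-running model is deliberately crude (every hit advances all runners two bases).

```python
# Sketch: simulate an inning's run total from hypothetical player projections.
# Each lineup slot is (P(out), P(walk), P(hit)); base-running is very crude.
import random

def simulate_inning(lineup, rng):
    outs, runs, bases = 0, 0, [False, False, False]   # bases: 1st, 2nd, 3rd
    batter = 0
    while outs < 3:
        p_out, p_walk, p_hit = lineup[batter % len(lineup)]
        r = rng.random()
        if r < p_out:
            outs += 1
        elif r < p_out + p_walk:                      # walk: advance forced runners only
            if all(bases):
                runs += 1
            elif bases[0] and bases[1]:
                bases[2] = True
            elif bases[0]:
                bases[1] = True
            bases[0] = True
        else:                                         # hit: all runners move up two bases
            runs += bases[1] + bases[2]               # runners on 2nd and 3rd score
            bases = [False, True, bases[0]]           # batter to 2nd, 1st-base runner to 3rd
        batter += 1
    return runs

rng = random.Random(42)
lineup = [(0.68, 0.09, 0.23)] * 9                     # hypothetical identical projections
sims = [simulate_inning(lineup, rng) for _ in range(20000)]
print(sum(sims) / len(sims))                          # simulated mean runs per inning
```

A real implementation would use genuine projection-system inputs, proper base-running transitions, and lineup order effects, and would still face the weighting question raised above.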

Albert suggests using his approach to evaluate individual batter effects on run scoring, in other words to improve our measures of a player’s impact by accounting for his team context. His method is potentially well-suited for this and this application doesn’t have the problems of sample size described above, at least for regular players. It is a retrospective application. Perhaps the next place to go with this model, however, is prospective. What does the run expectancy matrix look like with Anthony Rizzo, Kris Bryant and Kyle Schwarber coming to bat? (Hint: Pretty damn good, if you’re a Cubs fan.) A path forward for researchers interested in extending Albert’s work might focus on combining projections for teams rather than actual data which comes from an ever-changing cast of players.

For now, Albert has indeed taken us beyond simple run expectancy and to the statistical frontier. The clarity of writing on such a complex topic is also much appreciated, as the paper goes through a number of steps toward building the final model, one that represents a very useful tool toward evaluating player performance and analyzing in-game situations. It’s also a tool that has a remarkable amount of flexibility for further investigation and refinement, so hopefully it will be used far and wide.

Michael Wenz is Visiting Professor at Politechnika Czestochowa in Poland. Comments and suggestions for future articles are appreciated.

Thank you for highlighting this piece! I was not aware of the journal and now have some reading to do. If only they offered affordable access to individuals.
You can get this article for free. It's unlocked at the link in the article. Not sure how long it will stay that way!
The paper is also on that page itself, if you don't mind reading on the screen.
I just spent an unreasonable part of my life investigating Beyond Run Expectancy. Trying to project much of anything beyond whether 1st and 2nd with no outs is better than 2nd and 3rd with one out, and a few other situations, like whether a steal attempt increases Run Expectancy, stretches credibility. Indeed, I would like to know, when runners are on 1st and 2nd with no outs, just how awful the batter has to be to make the bunt a good play. I would rather have deGrom or Matz swing away in these situations, but Collins bunts regularly even with position players. Beyond that I think it is intuitive, ya think, that the Run Expectancy of 1st and 2nd, nobody out, is a lot better with Rizzo, Bryant and Schwarber coming up against Kyle Kendrick than with Lagares, Tejada and Colon facing Kershaw. The Cubs will swing away and the Mets might try to eke out a run. How should the Mets play it? Assuming Lagares and Tejada fan, a reasonable assumption, should the Mets pinch hit for Colon? What is the score, what inning is it, how many pitches has Colon thrown, and on and on? Over the course of time the Cubs will score more runs, but if the Mets score one run have they exceeded their probability in the aforementioned situation, or if the Cubs score two have they failed to reach their expectations? In every situation there are so many variables that I wonder what the value of any quantitative analysis is when each situation is truly unique, and only somewhat similar. I think I am smart enough to know that the object is to maximize the possible result in the varying situations. Do the sabermetricians think they can find a way to glean .01 more runs from these situations with some advanced study?
Colon could line a double, or let us settle for a single, and Kendrick could fan the three Cubs (I think Colon has a better chance of getting a hit off Kershaw than of that happening), but while probability argues for a certain outcome, nothing can predict the actual results, and it seems impossible to influence that outcome beyond what probability already does.
Any statistical estimate has some uncertainty surrounding it, arising from variables that influence the outcome but weren't controlled for. That doesn't render an attempt to get closer to the truth meaningless, and it doesn't even render crude estimates useless. They just have to be used with this in mind. "The bunt is a bad play in situation X" has never meant that there is no possible circumstance in which the opposite is true, and I don't think that sensible sabermetricians think that it does mean that.