One of the emerging storylines of the postseason so far has been inconsistency in the strike zone. That’s not unique to this postseason, of course; every year sees its share of poor calls, and the effect of those calls is magnified when so much is on the line. Whereas a missed strike may be objectionable in the regular season, it can (at worst) alter the outcome of one game out of 162. Missed calls in the postseason, on the other hand, can end seasons.
As a result, every bad call an umpire makes is scrutinized to a much greater degree. When an umpire’s zone is off—poorly defined, or merely inconsistent—whole legions of fans can flood the internet with vitriol. Generally, an umpire who’s doing a bad job of calling balls and strikes won’t favor the fortunes of one team or the other. But it is frustrating, as a fan, to see a beleaguered slugger’s bat taken out of the game on a borderline call, as happened to Matt Kemp recently.
Kemp’s strikeout (among other questionable decisions in recent umpiring) got me wondering whether an inconsistently called zone affects certain types of players more than others. If this notion is true, even though an umpire’s errors may be random in that they occur to each team equally, one team might be affected by those errors to a greater degree. The most obvious place to start with this question is whether an inconsistent zone favors the hitter or the pitcher more.
Before I transition to the numbers, I want to form some hypotheses as a guide. My initial inclination is that the pitcher ought to be favored more by an inconsistent zone, if only because the pitcher exercises more control in the matchup. The pitcher determines the speed, break, location, and type of each pitch, whereas the hitter only gets to swing or not swing. The pitcher can avoid parts of the zone in which the umpire appears to be hazy, or target them, if advantageous.
To illustrate the scenario more concretely, imagine that the pitcher is up 0-2 in the count against a fairly good hitter. At this point, the pitcher gets up to four chances to strike the hitter out (possibly more, with fouls). If he knows that the umpire seems to be calling the top of the strike zone somewhat inconsistently, he can aim his pitches to that area (or above it), in the hopes that the umpire will incorrectly determine that a ball is a strike. On the other side of the matchup, the hitter can’t do much to counteract that approach. If he swings, he risks putting a bad pitch into play, more than likely resulting in a popup. If he doesn’t swing, he’s relying on the umpire, which might be a bad idea if the ump is doing poorly.
On top of this, there’s an informational asymmetry between batter and pitcher. The pitcher, especially the starting pitcher, sees every pitch thrown and each call that’s made. The batter, on the other hand, observes firsthand only the 10 or 20 pitches he receives in a game, as well as whatever he’s able to glean from watching his teammates at the plate. If there are inconsistent parts of the zone, the pitcher will be better able to observe them than each individual hitter, because he gets as much experience with the zone in a game as all of the hitters put together.
These twin advantages the pitcher has (both in control and in information) suggest to me that the pitcher should be favored over the batter when an umpire’s zone is off. But we can do better than speculate: we can test. First, I need to figure out when the umpires are making errors. Then I can see whether batter performance suffers (relative to expectation) in those games in which umpires are doing an especially poor job.
To investigate this question, I first had to build a model that determines whether a pitch should be a ball or a strike. According to the rulebook, there are only four factors which influence that decision: the path of the ball (in three dimensions, so the vertical, horizontal, and depth coordinates of the pitch) as well as the height of the batter. In practice, we know that the strike zone varies considerably in response to lots of other factors as well. We know that it shrinks and expands as the count becomes unbalanced, and that catchers influence the size of the zone via framing skill.
Because I wanted to determine whether the zone seemed inconsistent according to the judgment of the players, I decided it was best to incorporate some of these outside-the-rulebook factors as well. After thousands of innings of organized ball, it seems likely that hitters are aware of the way in which the zone changes according to the count, and we have direct evidence that players are aware of pitch framing. So I incorporated these other variables into the model.
In order to have a flexible model that makes accurate predictions about what the umpires will call, I chose to use a machine learning approach (specifically, a random forest model). The idea behind this kind of approach is that the form of the model is not fixed in advance, as it would be in a linear model. Instead, I feed the algorithm a training set (comprising 30,000 pitches), from which the algorithm learns what characteristics make a pitch a ball or a strike. In this way, the algorithm reflects the way a hitter would learn the strike zone, from experience.
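To make the idea concrete, here is a toy random forest written from scratch with Python's standard library. Everything in it is an assumption for illustration: the synthetic pitches, the three features (horizontal location `px`, vertical location `pz`, and the strike count), the rectangular zone that narrows with each strike, and all of the function names. It is not the model used for this article, which also accounts for catcher framing, and a real analysis would more likely rely on an existing library such as scikit-learn's `RandomForestClassifier`.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 = ball, 1 = strike)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def majority(labels):
    """Most common label; ties go to strike."""
    return 1 if 2 * sum(labels) >= len(labels) else 0

def build_tree(rows, labels, rng, depth=0, max_depth=6, min_size=5):
    """Grow one tree; each split looks at a random subset of the features."""
    if depth >= max_depth or len(rows) < min_size or len(set(labels)) == 1:
        return majority(labels)
    best = None
    for f in rng.sample(range(len(rows[0])), 2):      # 2 of the 3 features
        candidates = rng.sample([r[f] for r in rows], min(10, len(rows)))
        for t in candidates:                          # a few random thresholds
            left = [i for i, r in enumerate(rows) if r[f] < t]
            right = [i for i, r in enumerate(rows) if r[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority(labels)
    _, f, t, left, right = best

    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          rng, depth + 1, max_depth, min_size)

    return (f, t, grow(left), grow(right))

def predict_tree(tree, row):
    """Walk down the tree until a leaf (a bare 0 or 1) is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] < t else right
    return tree

def train_forest(rows, labels, n_trees=15, seed=0):
    """Fit each tree to a bootstrap resample of the training pitches."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """Majority vote across the trees."""
    return majority([predict_tree(tree, row) for tree in forest])

def synthetic_pitch(rng):
    """Hypothetical pitch: location in feet, strike count, and the call."""
    px = rng.uniform(-2.0, 2.0)
    pz = rng.uniform(0.5, 4.5)
    strikes = rng.randrange(3)
    half_width = 0.9 - 0.1 * strikes                  # assumed count effect
    called_strike = int(abs(px) < half_width and 1.5 < pz < 3.5)
    return (px, pz, strikes), called_strike

rng = random.Random(42)
train = [synthetic_pitch(rng) for _ in range(600)]
test = [synthetic_pitch(rng) for _ in range(300)]
forest = train_forest([row for row, _ in train], [call for _, call in train])
holdout_accuracy = sum(predict_forest(forest, row) == call
                       for row, call in test) / len(test)
print(f"held-out agreement with the synthetic umpire: {holdout_accuracy:.1%}")
```

The two ingredients that make this a forest rather than a single decision tree are the bootstrap resample behind each tree and the random subset of features considered at each split. The final vote averages away much of the noise in any one tree.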
After the model is built, I have the algorithm predict whether each pitch in the remaining data should be a ball or strike, depending on what variables it has decided are important, to what degree, and the interactions between them. Then I can contrast what the model decides is a ball or a strike with what was actually called. If there is disagreement, it suggests that the umpire was calling a pitch in a way that it is not usually called—perhaps that the umpire made a mistake*.
Reassuringly, the model predicts that the umpires get it right the vast majority of the time. For about 91 percent of the ball-strike calls, the model and the umpire agree. Note here that this is significantly higher than you get from using a fixed zone, reflecting the fact that the model is taking into account catcher framing and the expansion/contraction process the zone goes through with the count. When the model disagrees with the umpire, it is overwhelmingly in regard to edge cases (catcher perspective):
These edge cases are not all umpire errors, because after all, PITCHf/x is not perfect either—the system has a margin of error as well, and so the umpire and the model are deciding their ball/strike calls based on slightly different data. But probably some of these calls are errors, if only because the edge of the zone is going to be the most difficult portion to call. What’s more, there’s a smattering of pitches in the middle of the zone which should be called strikes (but were apparently called balls), and then outside the zone there’s some of the reverse case.
Assuming you buy into the idea that the model is good at telling a ball from a strike, we can use it to determine how inaccurate umpires have been on a per-game basis. We would expect those per-game accuracies to be centered on their average accuracy (~91%) but with some variation on either side, depending on whether the umpire was doing well or poorly on that particular day.
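As a rough sketch of that per-game bookkeeping, the snippet below groups pitches by game and computes the share of calls on which the model and the umpire agree. The record layout, the game identifiers, and the handful of calls are all hypothetical.

```python
from collections import defaultdict

def per_game_accuracy(pitches):
    """pitches: iterable of (game_id, predicted_call, actual_call) tuples."""
    agree = defaultdict(int)
    total = defaultdict(int)
    for game_id, predicted, actual in pitches:
        total[game_id] += 1
        agree[game_id] += int(predicted == actual)
    return {game_id: agree[game_id] / total[game_id] for game_id in total}

# Hypothetical calls: "S" = called strike, "B" = ball.
pitches = [
    ("game_1", "S", "S"), ("game_1", "B", "B"),
    ("game_1", "B", "S"), ("game_1", "S", "S"),
    ("game_2", "B", "B"), ("game_2", "S", "S"),
]
accuracy = per_game_accuracy(pitches)
print(accuracy)
```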
This graph is a histogram of per-game umpire accuracy (x-axis) against how often each level of accuracy occurred (y-axis) across 2000 games in the 2014 regular season. Overall, the umpires do reasonably well, but on bad days, their accuracies can drop to somewhat frightening levels. At the lowest extreme, an umpire can start making incorrect calls as frequently as one in every six or so pitches (as opposed to their normal performance of about one in every 11 pitches). The red line in the above graph illustrates the histogram you would expect if the umpires had a constant probability of error on every pitch for every game. That the actual histogram is a little more dispersed than that red line suggests that umpires sometimes do have bad days, when their calls are systematically off.
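The constant-error baseline can be sketched with a binomial model: if every call is right with the same probability, per-game accuracy should vary by only a predictable amount. The 150 called pitches per game used below is an assumed round number, not a figure from the data, and the function names are hypothetical.

```python
import math
import statistics

def binomial_sd(p, n):
    """SD of per-game accuracy if each of n calls is right with probability p."""
    return math.sqrt(p * (1 - p) / n)

def dispersion_ratio(accuracies, p, n):
    """Observed spread over binomial spread; above 1 hints at genuine bad days."""
    return statistics.pstdev(accuracies) / binomial_sd(p, n)

P_CORRECT = 0.91   # overall agreement rate reported above
N_CALLS = 150      # assumed called pitches per game (hypothetical)

expected_sd = binomial_sd(P_CORRECT, N_CALLS)
bad_day_accuracy = 1 - 1 / 6          # one missed call in every six pitches
sds_below_average = (P_CORRECT - bad_day_accuracy) / expected_sd

print(f"binomial SD of per-game accuracy: {expected_sd:.4f}")
print(f"a one-in-six game sits {sds_below_average:.1f} SDs below average")
```

Under these assumptions, a game with one miss in every six pitches falls more than three standard deviations below the average, which is hard to square with pure chance if it happens more than a handful of times in 2000 games.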
To the question which motivated this article: How do the hitters fare when the umpires are having one of those bad days (say the 5 percent of games with the lowest accuracies)? The most obvious way a hitter’s performance could suffer is from additional strikeouts, so that’s what I examined. I used the odds ratio method to control for the particular hitters and pitchers involved, that is, to derive an expected strikeout rate for those players. Then, I looked at how often the hitters actually struck out in those games in which the umpires did the worst.
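For readers unfamiliar with the odds ratio method, here is a minimal sketch of its usual form: convert each rate to odds, multiply the batter's and pitcher's odds, divide by the league's odds, and convert back to a rate. The exact variant used for this article is not spelled out, so treat the formula as an assumption. The rates below are made up.

```python
def to_odds(rate):
    """Convert a probability to odds, e.g. 0.20 -> 0.25."""
    return rate / (1 - rate)

def expected_rate(batter_rate, pitcher_rate, league_rate):
    """Expected matchup rate under the odds ratio method."""
    odds = to_odds(batter_rate) * to_odds(pitcher_rate) / to_odds(league_rate)
    return odds / (1 + odds)

# Hypothetical strikeout rates: a strikeout-prone hitter facing a
# strikeout-heavy pitcher in a league that strikes out 20% of the time.
matchup_k_rate = expected_rate(0.25, 0.25, 0.20)
print(f"expected strikeout rate: {matchup_k_rate:.1%}")
```

A useful sanity check is that a league-average hitter facing a league-average pitcher comes out at exactly the league rate.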
The results show that, as expected, hitters fare worse when the zone is inconsistently defined. The magnitude of the difference isn’t massive, but it’s still considerable: The hitter strikeout rate jumps about 4.4 percent relative to our expectation. In plain English, when the umpires are having a bad day, the batter is 4 to 5 percent more likely to strike out than we would otherwise expect.
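Note that the 4.4 percent is a relative change, not a jump of 4.4 percentage points. A quick illustration, using a made-up expected strikeout rate of 20 percent:

```python
def apply_relative_change(expected, relative_change):
    """Scale an expected rate by a relative change (0.044 = 4.4 percent)."""
    return expected * (1 + relative_change)

expected_k = 0.20                     # hypothetical expected strikeout rate
observed_k = apply_relative_change(expected_k, 0.044)
point_gap = (observed_k - expected_k) * 100

print(f"expected {expected_k:.1%}, observed {observed_k:.2%}")
print(f"difference: {point_gap:.2f} percentage points")
```

In other words, a hitter expected to strike out in one of every five plate appearances would see that rate rise by a little under one percentage point.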
It’s not just strikeout rates that increase, though. Just about every positive offensive statistic goes down marginally in the most inconsistent games, and every negative statistic goes up. Even the rate at which balls in play are converted into outs goes up (by ~2 percent), suggesting that hitters might be making slightly weaker contact. That could stem from the inconsistency of the zone as well: Hitters feel pressured to swing when they aren’t sure what the result of a pitch will be. They may be reasoning correctly that a ball hit into play, however weakly, is marginally better than a called strikeout.
These results show that even when the umpire is making random errors, it doesn’t affect all players equally. Because of the asymmetries in the matchup, pitchers prosper when the zone is poorly defined. The good news from this study is that the umpires do get their calls right the majority of the time. Furthermore, other work has shown that the umpires are getting more accurate, so the problem itself is slowly disappearing. But I imagine that does nothing to diminish the sting of seeing your favorite hitter unjustly rung up on a strike that was not a strike late in a playoff game.
*Or that the PITCHf/x data is in error. Or, there’s some extenuating factor the model is unaware of, like a usually good framer doing a poor job on some particular pitch.