Image credit: © Stan Szeto-USA TODAY Sports

Ball-and-strike calls are among the most fraught and noticeable ways umpires affect the game. In theory, making them seems like a matter of simple, objective truth: did the ball pass through the TV broadcast’s floating rectangle, or not? In reality, it’s a lot more complicated, and a thousand tiny factors play into whether the pitcher gets the favorable call, from the count to home-field advantage to the hitter’s stance immediately before the pitch arrives. These factors complicate the ongoing case for robotic replacement of this vital umpire function.

But regardless of how you feel about robot umps, one very important factor that we should all be able to agree shouldn’t be part of the ball/strike call is the race of the pitcher or the hitter. And yet, a new study suggests that umpires are granting thousands more strike calls to white pitchers, and thousands fewer to non-white, specifically Black and Hispanic, players. With the largest and most powerful sample of pitches to date, the study by Claremont McKenna student Hank Snowdon shows significant racial biases in how umpires call pitches.

Previous studies of bias in baseball have returned mixed results, especially with regard to umpires’ called-pitch decisions. Some studies have found significant effects; others have found that those effects depend strongly on exactly how the models are specified, meaning the findings may not be robust. It’s worth noting that in other domains of baseball (such as whether organizations promote BIPOC players through the minors equally, or when umpires decide to eject players) there is evidence of significant racial bias.

Focusing on ball and strike calls offers an interesting way to measure bias for two reasons: first, there are hundreds of thousands of these calls per year; second, thanks to Statcast and PITCHf/x, we know an astonishing amount about whether a given pitch should have been called a ball or a strike in the first place. That makes quantifying the errors much easier.
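As an illustration of how tracking data turns each call into a right-or-wrong label, here is a minimal sketch. The field names `plate_x`, `plate_z`, `sz_top`, and `sz_bot` mirror columns in Baseball Savant’s Statcast exports, but treat them as assumptions here; the rectangular zone, the ball-radius allowance, and the `classify` helper are simplifications for illustration, not the zone definition Snowdon used.

```python
from dataclasses import dataclass

# Assumed geometry in feet: the plate is 17 inches wide and a ball is roughly
# 2.9 inches in diameter. A pitch counts as a strike if any part of the ball
# overlaps this simplified rectangular zone.
PLATE_HALF_WIDTH_FT = 17 / 2 / 12
BALL_RADIUS_FT = 2.9 / 2 / 12


@dataclass
class CalledPitch:
    plate_x: float  # horizontal location at the plate, 0 = center (ft)
    plate_z: float  # height above the ground at the plate (ft)
    sz_top: float   # top of this batter's zone (ft)
    sz_bot: float   # bottom of this batter's zone (ft)
    call: str       # "called_strike" or "ball"


def in_zone(p: CalledPitch) -> bool:
    """True if the ball overlaps the simplified strike zone."""
    horizontal = abs(p.plate_x) <= PLATE_HALF_WIDTH_FT + BALL_RADIUS_FT
    vertical = p.sz_bot - BALL_RADIUS_FT <= p.plate_z <= p.sz_top + BALL_RADIUS_FT
    return horizontal and vertical


def classify(p: CalledPitch) -> str:
    """Label a called pitch as correct or as one of the two error types."""
    called_strike = p.call == "called_strike"
    if called_strike == in_zone(p):
        return "correct"
    return "ball_called_strike" if called_strike else "strike_called_ball"


pitches = [
    CalledPitch(0.0, 2.5, 3.4, 1.6, "called_strike"),  # middle-middle
    CalledPitch(1.2, 2.5, 3.4, 1.6, "called_strike"),  # well off the plate
    CalledPitch(0.3, 2.0, 3.4, 1.6, "ball"),           # in the zone
]
print([classify(p) for p in pitches])
# ['correct', 'ball_called_strike', 'strike_called_ball']
```

Splitting errors into the two types matters because each type helps a different player: a ball called a strike helps the pitcher, and a strike called a ball helps the hitter.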

For his paper, Snowdon uses all of the data from the pitch-tracking era: millions of pitches from 2008 to 2020. Previous studies relied on less precise and less plentiful data. That is a bit like hunting for something microscopic with a magnifying glass: you might not see it even if it were there. Snowdon brings a high-powered microscope to the problem, and he immediately finds evidence of biased calls.
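To make the microscope analogy concrete, here is a minimal sketch of how the smallest reliably detectable difference in error rates shrinks as the sample grows. It uses a standard normal-approximation power calculation for comparing two proportions; the 12 percent baseline miss rate, the equal group sizes, the 5 percent significance level, and the 80 percent power are all assumptions for illustration, not figures from Snowdon’s paper.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_diff(n_per_group: int, base_rate: float,
                        alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest difference in error rates that a two-sided two-proportion
    z-test detects with the given power (normal approximation, equal groups)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    se = sqrt(2 * base_rate * (1 - base_rate) / n_per_group)
    return (z_alpha + z_beta) * se


# Hypothetical baseline: 12% of called pitches are missed.
for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} pitches per group -> {min_detectable_diff(n, 0.12):.2%}")
```

Under these assumptions, a 0.3-percentage-point gap falls below the detection threshold at 100,000 pitches per group but well above it at a million, which is why a dataset of millions of pitches can see what smaller studies could not.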

He breaks those calls down in several ways, including whether they were balls-called-as-strikes or strikes-called-as-balls, and based on whether the pitcher or the batter shares a racial category with the ump. (Roughly 90 percent of umpires were white in the studied time period, a severe lack of diversity relative to the league’s player base.) But no matter how he slices it, he finds that umpires tend to make more advantageous calls when they share the same race as the person who would be advantaged.
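One way to picture this breakdown is a tally of how often an umpire errs in a player’s favor, split by whether that player shares the umpire’s racial category. The sketch below is a hypothetical illustration, not Snowdon’s code: the record layout and the toy records are invented, and a real analysis would also have to account for pitch location, the count, and the other factors mentioned above.

```python
from collections import defaultdict

# Hypothetical record layout:
# (umpire_race, pitcher_race, batter_race, should_be_strike, called_strike)
Record = tuple[str, str, str, bool, bool]


def favorable_error_rates(records: list[Record]) -> dict[str, float]:
    """Share of pitches on which the umpire erred in a player's favor, split
    by whether that player shares the umpire's racial category."""
    opportunities: dict[str, int] = defaultdict(int)
    errors: dict[str, int] = defaultdict(int)
    for ump, pitcher, batter, should_be_strike, called_strike in records:
        if should_be_strike:
            # True strike: calling it a ball helps the batter.
            key = "batter_" + ("match" if batter == ump else "no_match")
            errors[key] += int(not called_strike)
        else:
            # True ball: calling it a strike helps the pitcher.
            key = "pitcher_" + ("match" if pitcher == ump else "no_match")
            errors[key] += int(called_strike)
        opportunities[key] += 1
    return {key: errors[key] / opportunities[key] for key in opportunities}


# Invented toy records, for illustration only.
toy_records: list[Record] = [
    ("white", "white", "hispanic", False, True),   # ball called strike, pitcher matches
    ("white", "white", "hispanic", False, False),  # ball called correctly
    ("white", "hispanic", "white", False, False),
    ("white", "hispanic", "white", False, False),
    ("white", "hispanic", "white", True, False),   # strike called ball, batter matches
    ("white", "white", "hispanic", True, True),
]
print(favorable_error_rates(toy_records))
```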

These effects are small, but large enough to be noticeable. According to the study, mistaken calls are about 0.3 percentage points more likely because of race effects. Snowdon estimates that umpires called about 18,000 pitches differently over the 13-year study period because of racial bias, or roughly 1,400 changed calls per year. Any individual player might receive only a handful of these in a season, but for Black players already struggling against discrimination in other areas, any additional barrier is a significant problem.

One of the most contentious and difficult aspects of any study of racial discrimination (in baseball and elsewhere) is that in reality, racial identities are much more complex than can be indicated in a single, one-word description (in statistical terms, a categorical variable: “white” vs. “Black” vs. “Latino” vs. “Asian”). The truth of racial identity is that people can be treated very differently depending on the circumstances and the biases of the people around them, and many people have multiple, overlapping and intersecting backgrounds. This issue is especially pronounced in baseball, where many players are Afro-Latino, some hailing (or with ancestors hailing) from the Caribbean.

Snowdon’s study can’t resolve this problem, but it ultimately comes down to finding a powerful difference in the treatment of white vs. non-white players, regardless of whether they are Black, Latino, or both. Since the main variable of interest is whether the umpire is of the same race as the pitcher or hitter, a player’s exact demographic background matters less for these statistics, as long as it differs from the umpire’s. Snowdon assembled his demographic information from a combination of Wikipedia pages, country-of-origin information, manual inspection of photos, and other sources; that process is imperfect, but even slightly inaccurate demographic data is often enough to detect bias. Indeed, for this discrimination to be a false finding would require a very high level of racial misclassification, which seems unlikely.
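Why would misclassification have to be severe to manufacture this result? A minimal sketch, assuming that mislabeling is random and independent of how a player’s pitches were called: each player’s group label is flipped with probability `p`, and the function computes the expected observed gap. Under that assumption, mislabeling dilutes a real gap toward zero rather than creating one. The error rates and group sizes below are hypothetical, and the sketch says nothing about systematic misclassification that correlates with the calls themselves.

```python
def observed_gap(rate_a: float, rate_b: float, n_a: float, n_b: float,
                 p: float) -> float:
    """Expected observed difference in error rates (group A minus group B)
    when each player's label is flipped at random with probability p."""
    # Observed "A" = true A kept plus true B mislabeled as A, and vice versa.
    obs_a_n = (1 - p) * n_a + p * n_b
    obs_b_n = (1 - p) * n_b + p * n_a
    obs_a = ((1 - p) * n_a * rate_a + p * n_b * rate_b) / obs_a_n
    obs_b = ((1 - p) * n_b * rate_b + p * n_a * rate_a) / obs_b_n
    return obs_a - obs_b


# Hypothetical true gap of 0.3 percentage points between equal-sized groups.
for p in (0.0, 0.1, 0.2, 0.3, 0.5):
    print(f"mislabel rate {p:.0%}: observed gap {observed_gap(0.123, 0.120, 1, 1, p):+.4f}")
```

With equal group sizes the observed gap works out to (1 - 2p) times the true gap, so it only vanishes when labels are no better than a coin flip.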

A downside of the approach that focuses on whether the ump and player have the same race is that it makes an implicit assumption that Hispanic umps will be as biased against white players as white umps are against Latinos. (The study itself uses the term “Hispanic,” which is a linguistic category, while Latino is an ethnic one.) For various reasons—the inherent, structural racism that forms the historical background of our society prime among them—that’s unlikely to be the case. And in fact, in further analysis, Snowdon finds that Hispanic umps display a bias against non-white players, not whites, as the initial approach assumes. This finding echoes a large body of research in policing showing that hiring more BIPOC officers does not always defuse racial disparities. There is pressure to play the part, and in umpiring, that may mean making slightly biased calls. 
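A small hypothetical table shows how a single same-race comparison can hide this asymmetry. The sketch below pools favorable-error rates across umpire and player groups; every rate and pitch count is invented for illustration, with white umpires handling most pitches (as in the studied period) and the player side simplified to two groups.

```python
# Hypothetical favorable-error rates and pitch counts, for illustration only.
# Keys are (umpire_group, player_group).
RATES = {
    ("white", "white"): 0.110, ("white", "hispanic"): 0.100,
    ("hispanic", "white"): 0.105, ("hispanic", "hispanic"): 0.100,
}
COUNTS = {
    ("white", "white"): 600_000, ("white", "hispanic"): 300_000,
    ("hispanic", "white"): 60_000, ("hispanic", "hispanic"): 30_000,
}


def pooled_rate(cells: list[tuple[str, str]]) -> float:
    """Pitch-weighted favorable-error rate across the given cells."""
    total = sum(COUNTS[c] for c in cells)
    return sum(RATES[c] * COUNTS[c] for c in cells) / total


same = [c for c in RATES if c[0] == c[1]]
different = [c for c in RATES if c[0] != c[1]]
same_race_gap = pooled_rate(same) - pooled_rate(different)

# In-group gap for each umpire group taken on its own.
white_ump_gap = RATES[("white", "white")] - RATES[("white", "hispanic")]
hispanic_ump_gap = RATES[("hispanic", "hispanic")] - RATES[("hispanic", "white")]

print(f"pooled same-race gap:        {same_race_gap:+.4f}")
print(f"white umps, in-group gap:    {white_ump_gap:+.4f}")
print(f"Hispanic umps, in-group gap: {hispanic_ump_gap:+.4f}")
```

In this toy table the pooled same-race gap is positive even though the Hispanic umpires favor white players, which is the kind of pattern that only shows up when each umpire group is examined separately.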

One piece missing from the study is the catcher, who plays a big role in how a pitch is called and whose race or ethnicity may be relevant as well. Unfortunately, due to biases in how players of different races get channeled into certain positions, there is a severe dearth of Black and Asian catchers (but not Latino catchers). That makes studying racial biases at the position significantly harder, albeit not impossible.

This thesis isn’t the final word on the existence or impact of discrimination in the league. However, as data accumulates with ever more finely grained information, we are building a more and more powerful microscope to isolate and quantify racial bias. By comparison, previous studies were less able to detect discrimination, working with lower-quality data over shorter timespans.

The study prompts a number of follow-up questions, but it also contributes to the growing case for robot umpires. With the advent of more advanced tracking technology, some of the technical issues that plagued earlier iterations of Statcast are diminishing. Meanwhile, evidence continues to pile up, from this study and elsewhere, that umpires’ calls respond to variables that have nothing to do with the pitch itself. Even if these errors are rare, it is worth investing in a system that does not react to a player’s race, status, age, or prestige within the game, making for a game that is more welcoming to talented players wherever they come from and whoever they are.


This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.

dennis paulik
Let the batters make their own calls of balls and strikes. Abolish all umpires human or robot. Then there would be no bias.
Jon Crate
alex chidester
"Abolish all umpires"

Do we have to use the word abolish? Couldn't we use a phrase that's more appealing to centrists, like "sensible reform" or "bias training" or "work together with umpires to create a more just environment in baseball"?
Ron Hodge
There is discrimination against non-whites in baseball, but I find it hard to believe that these split second ball & strike calls are premeditated racism.
Craig Goldstein
I don't think the takeaway is necessarily that it's "premeditated" but just that it exists, likely (though not always) through implicit biases. Regardless of the intention, the biases are present.
Robert Arthur
Racism is not defined by intent. It's defined by impact. These umps may be well-meaning, usually-nice people--but if they're systematically calling balls and strikes in a biased way (as the study shows), that's a big problem.
Jim Maher
Having a computer determine balls and strikes makes sense to me, though it makes the value of the catcher quite a bit less. I do wonder how calls vary if the pitcher and hitter are the same race-- it would be hard to consider bias to be the issue unless the distance to the pitcher means that the home plate ump is more likely to favor him over the batter.
You could take a random sample of hitters and find some sort of called-strike bias if you really go looking hard enough in that group, I'm sure. Maybe in this case it's "racism", maybe it's not. Thankfully, robo umps seem to be on the way, so all the calls should be right.
Evan Carter
OK, so 0.3% means about 1 missed call every 333 pitches.
That's about 1 missed call every 20 games for a batter, about 7 or 8 per full season.
It's about 2 every 7 games for a starting pitcher, or about 9-10 per full season.
It's about 1 every 17.5 relief appearances for a reliever, or about 3-4 per season for a reliever who pitches about 51-72 games in a season.

Estimate the impact of one pitch call going the wrong way to be a gain or loss of about .300 OPS per pitch changed (It's about .100 when the batter is ahead on the count, and about .400 when even or behind). Net effect of gaining/losing .300 OPS per missed call is about .004 per player-season.
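The per-player arithmetic in this comment can be reproduced in a few lines. The per-game pitch counts and the 600 plate appearances per season are assumptions chosen to be typical, not numbers from the study or the comment, and, like the comment, the sketch applies the 0.3 percent rate to every pitch rather than only to called pitches.

```python
MISS_RATE = 0.003  # the 0.3% figure, applied to every pitch as in the comment

# Assumed workloads (hypothetical, not from the study).
BATTER_PITCHES_PER_GAME = 16.5   # about 4.2 plate appearances x 3.9 pitches
STARTER_PITCHES_PER_START = 95
RELIEVER_PITCHES_PER_OUTING = 19
PLATE_APPEARANCES_PER_SEASON = 600
OPS_SWING_PER_CALL = 0.300       # the comment's estimate per changed call


def games_per_missed_call(pitches_per_game: float) -> float:
    """Average number of games (or outings) between race-related missed calls."""
    return 1 / (MISS_RATE * pitches_per_game)


batter_calls = 162 / games_per_missed_call(BATTER_PITCHES_PER_GAME)
starter_calls = 32 / games_per_missed_call(STARTER_PITCHES_PER_START)
reliever_calls = [g / games_per_missed_call(RELIEVER_PITCHES_PER_OUTING) for g in (51, 72)]
ops_per_season = batter_calls * OPS_SWING_PER_CALL / PLATE_APPEARANCES_PER_SEASON

print(f"batter: one call every {games_per_missed_call(BATTER_PITCHES_PER_GAME):.1f} games, "
      f"{batter_calls:.1f} per season")
print(f"starter: {starter_calls:.1f} per 32 starts")
print(f"reliever: {reliever_calls[0]:.1f} to {reliever_calls[1]:.1f} per season")
print(f"net OPS effect per batter-season: {ops_per_season:.4f}")
```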

Is it wrong? Yes. Should something be done to change it? Why not, if possible? It's awfully hard to think of what can be done other than robot umps. Anything else might be disproportionate to the size of the problem. Is it significant? Depends on what you think significant is. Really, compared to other disparate impacts throughout society, this is small potatoes. I thought it might be considerably worse than it actually is. Ignore it? Fortunately, we don't have to.

Robot umps would take care of this problem, I would think. And as an added benefit, it would eliminate a pretty large number of ejections, at least half of which have to be over ball-strike calls, and probably much more than half. And that would significantly cut whatever disparity exists in that area, just by brute force reduction in total ejections.

Of course, that might eliminate some catchers who are around mainly for their strike-zone warping receiving abilities. If those catchers are disproportionately Latino, then that might initiate a different problem.
The sample size is so ridiculously large that any difference would have been statistically "significant". The key is whether that difference is "meaningful" (i.e., effect size). At a 0.3% difference, I don't see how that could be very meaningful, especially considering that his sample spanned a 12-year period that included 3 million pitches, and he's talking about a difference of about 6,000. And every batter only sees 5 pitches on average for each at-bat.

Unless I'm missing something, his conclusions seem like an enormous stretch to me.

With a 0.3% difference, this looks more like a textbook example of a model having such strong statistical power that even an extremely tiny difference is "statistically significant", which is why measures of effect size are so important. Just because a statistical test gives a p-value < .05, that doesn't mean the difference is practically meaningful.
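The distinction between significance and effect size can be sketched with a two-sided two-proportion z-test alongside Cohen's h, a standard effect-size measure for proportions (0.2 is conventionally labeled "small"). The two error rates below are hypothetical values 0.3 percentage points apart, and the group sizes are illustrative, not Snowdon's actual counts.

```python
from math import asin, erfc, sqrt


def two_proportion_z(p1: float, p2: float, n1: int, n2: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test with a pooled standard error.
    Returns (z, p_value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, erfc(abs(z) / sqrt(2))  # P(|Z| > |z|) under the null


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


# Hypothetical error rates 0.3 percentage points apart.
p_a, p_b = 0.123, 0.120
for n in (10_000, 1_500_000):
    z, p_value = two_proportion_z(p_a, p_b, n, n)
    print(f"n={n:,} per group: z={z:.2f}, p={p_value:.2g}, h={cohens_h(p_a, p_b):.4f}")
```

The effect size is identical in both rows; only the sample size changes, and with it whether the same tiny gap clears the p < .05 bar.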
Chad Ronnander
Having read the article (but not Snowdon's research), and thinking through the comments others have made ... Yeah, I think this is pretty small potatoes. Given the bigger picture -- seeing what goes on in other arenas of life, and how many issues truly need our attention -- we might just as well celebrate that Snowdon found that there is actually very little discrimination in the calling of balls and strikes compared to some other, more important, parts of life. Indeed, as others have pointed out, if the effect Snowdon measures is spread evenly among the entire population of players, then it is pretty clear that racial bias in the umpiring of balls and strikes is NOT affecting individual players, games, baseball as a whole, or the world in any meaningful way. The end result is what the great majority of players -- including players of color -- would consider to be fair, for the most part. As others above have pointed out -- just because something is statistically significant does not mean that it is morally significant or impactful. A difference can be both statistically significant (i.e., due to sample size, we are fairly certain it exists) and of little significance (i.e., while we are certain something exists, we can see in the light of other evidence that it has little or no impact.) When it comes to creating a fair playing field for people of every background, we have a lot of work to do -- but it's pretty clear that one of the very last things we need to worry about is the calling of balls and strikes. Snowdon's data shows that we're doing a pretty darned good job there compared to other arenas, I would say.