August 1, 2007
The Big Picture
Analyzing the Umpires
The last column in this series wondered whether an NBA-like referee scandal could happen with Major League umpires. The structure of the game makes that difficult, but I'd like to back that up with research. Now, with data in hand, I'd like to explore whether there are umpires who are kind to either favorites or underdogs. With help from Retrosheet, home-plate umpires from 2000 through 2006 will be scored on how well the winning percentage of the favorites in their games fits the expectation. This covers the period since the mass resignation of umpires in 1999.
The following formula calculates the probability of the favorite winning a game, where FWPct is the favorite's winning percentage and UWPct is the underdog's:
prob = FWPct*(1 - UWPct) / (FWPct*(1 - UWPct) + UWPct*(1 - FWPct))
Note that in this formula, teams with 1.000 winning percentages always win, teams with .000 winning percentages always lose, and teams with the same winning percentage are expected to play .500 ball against each other.
For this study, the two winning percentages are simply the team's winning percentage for the season. The favorite in a game is the team with the higher winning percentage, or the home team if the percentages are the same. For example, any time Cleveland played Detroit in 2005, the Indians were the favorite with a .574 winning percentage and Detroit was the underdog with a .438 winning percentage. That meant the probability of the Indians defeating the Tigers in a game was .634 (11.4 expected wins). They were 12-6 vs. Detroit that season, a .667 winning percentage.
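The formula and the Cleveland-Detroit example can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the handling of a zero denominator (both teams at 1.000 or both at .000) is an assumption, since the formula itself is undefined there.

```python
def favorite_win_prob(fav_pct, dog_pct):
    """Probability that the favorite beats the underdog, per the formula above."""
    fav_term = fav_pct * (1.0 - dog_pct)
    dog_term = dog_pct * (1.0 - fav_pct)
    total = fav_term + dog_term
    if total == 0.0:
        # Assumption: the formula is undefined here, so treat the game as a
        # coin flip, matching the equal-percentage case.
        return 0.5
    return fav_term / total


def pick_favorite(home_pct, away_pct):
    """Return 'home' or 'away'; ties go to the home team."""
    return "home" if home_pct >= away_pct else "away"


# Cleveland (.574) vs. Detroit (.438) in 2005, 18 meetings
p = favorite_win_prob(0.574, 0.438)
expected_wins = 18 * p
print(round(p, 3), round(expected_wins, 1))  # 0.634 11.4
```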
The probability of the favorite winning is calculated for each game, and those probabilities are summed over all games to get the expected number of wins. Over the entire dataset of 16997 games, favorites were expected to win 9993.6 games. They actually won 9950, which is well within the 95% confidence interval. The overall probability of a favorite winning an individual game was .588. The cumulative probability of winning no more than 9950 games is .25, so it is within the 50% confidence interval. The formula overestimates the number of wins, but not significantly.
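Here is a rough sketch of that calculation in Python. It assumes the cumulative probability is a plain binomial using the average per-game probability (expected wins divided by games) as the single success rate; the article does not spell out that detail, so treat it as one reasonable reading. The function names are illustrative.

```python
import math


def expected_wins(fav_probs):
    """Sum the per-game probabilities to get the expected number of favorite wins."""
    return sum(fav_probs)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)  # far-tail terms underflow harmlessly to 0.0
    return min(total, 1.0)


# Totals quoted above: 16997 games, 9993.6 expected wins, 9950 actual wins
n_games, exp_wins, actual_wins = 16997, 9993.6, 9950
avg_prob = exp_wins / n_games
print(round(avg_prob, 3))  # 0.588
# Should land near the .25 quoted above
print(binom_cdf(actual_wins, n_games, avg_prob))
```

Summing log-space terms avoids the overflow that the raw factorials would cause with nearly 17,000 games.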
The same calculation is done for each home-plate umpire. The probability of the favorite winning is summed over all games for a particular umpire to become the expected number of wins. That's compared to the actual wins with that person behind the plate, and the cumulative binomial probability is calculated for winning no more than that number of games. The following table contains all umpires who appeared behind the plate for at least 100 games in the seven years covered by the study:
Notice that the probabilities aren't very normally distributed. The lower half looks okay, but the upper half looks like a normal within a normal. Umpires who are kind to favorites really push their winning percentage up.
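The per-umpire scoring could be sketched along these lines. The record layout (umpire, favorite's win probability, whether the favorite won) is a hypothetical stand-in for whatever is parsed out of the Retrosheet game logs, the sample rows are invented purely to exercise the code, and the binomial again assumes the umpire's average per-game probability as the success rate.

```python
import math
from collections import defaultdict


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    return min(1.0, sum(
        math.exp(math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * log_p + (n - i) * log_q)
        for i in range(k + 1)))


def umpire_summary(games, min_games=100):
    """Return (umpire, games, expected wins, actual wins, cumulative prob) rows.

    `games` is an iterable of (umpire, fav_prob, fav_won) tuples (hypothetical
    layout). Rows are sorted so the most underdog-friendly umpires come first.
    """
    totals = defaultdict(lambda: [0, 0.0, 0])  # games, expected wins, actual wins
    for umpire, fav_prob, fav_won in games:
        row = totals[umpire]
        row[0] += 1
        row[1] += fav_prob
        row[2] += 1 if fav_won else 0
    rows = []
    for umpire, (n, expected, actual) in totals.items():
        if n < min_games:
            continue  # skip umpires below the games threshold
        rows.append((umpire, n, expected, actual, binom_cdf(actual, n, expected / n)))
    return sorted(rows, key=lambda r: r[4])


# Invented toy data, not real results
sample = [("Ump A", 0.6, True), ("Ump A", 0.6, False), ("Ump B", 0.55, True)]
for row in umpire_summary(sample, min_games=2):
    print(row)
```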
The interest of this piece, however, lies at the other end, where favorites losing can make more money for gamblers. There are quite a few umpires with a p-value under .05, with Marty Foster as the most underdog-friendly umpire in the group, so let's look at him more closely. If gamblers are going to get an umpire to affect a game, they'll want one with a big payoff, one where the odds are long. Looking only at Foster's games where the probability of the favorite winning is .65 or higher, we find an innocuous result. Foster was the home-plate umpire for 37 such games, and in them, the expectation for wins was 25.7. The actual number of wins was 26. In other words, the variation happens at lower probabilities, where there's more of a chance for luck to take hold.
The opposite of Foster is Paul Schrieber, who is right on overall, as we expect the favorite to win 128.6 of his games, and they actually won 128. But in games with a high probability of the favorite winning, those teams win just 16 of 33 with an expectation of 22.6 wins. Looking at the individual games, 12 of the 16 underdog wins were by the home team, as were 10 of 16 favorite wins. It almost seems like Schrieber is biased toward the home team.
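The heavy-favorite check used for Foster and Schrieber amounts to filtering one umpire's games at a probability cutoff and comparing expected wins to actual wins. A minimal sketch, assuming each game is a hypothetical (fav_prob, fav_won) pair and using invented sample values:

```python
def heavy_favorite_record(games, threshold=0.65):
    """Return (games, expected wins, actual wins) for games at or above the cutoff.

    `games` is an iterable of (fav_prob, fav_won) pairs for a single umpire;
    that layout is an assumption for illustration.
    """
    heavy = [(p, won) for p, won in games if p >= threshold]
    expected = sum(p for p, _ in heavy)
    actual = sum(1 for _, won in heavy if won)
    return len(heavy), expected, actual


# Invented toy data, not real results
sample = [(0.70, True), (0.66, False), (0.55, True)]
count, expected, actual = heavy_favorite_record(sample)
print(count, round(expected, 2), actual)  # 2 1.36 1
```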
And that's the nice thing about this kind of analysis, as a number of biases can be studied. The Yankees' biggest underperformance (three wins instead of seven) comes with CB Bucknor behind the plate. At the other end of the spectrum, the Yankees played five games better than expected with Wally Bell calling balls and strikes (17 wins vs. 12). Joe Brinkman hurt home team favorites the most, costing them eight games, while Chris Guccione helped them the most, adding 15 games. Actually, the home-field advantage is pretty clear in the data. The home team was rated as the favorite in 8621 games. They won 5370 of those, despite the expectation being for 5058 wins. That's 312 more wins for home team favorites, a .623 winning percentage versus a .587 expected winning percentage. It's tough to beat a good team at home.
The good news is the lack of evidence that umpires are intentionally affecting the outcome of games, as the few outliers are probably due to small sample sizes. But this is a nice simple methodology for studying the question. It can be extended to see if umpires have biases against certain teams, or even certain starting pitchers. On the question of gambling, however, I feel a lot better that the probability of an NBA-like referee scandal remains low in baseball.