
“I never questioned the integrity of an umpire. Their eyesight, yes.”
–Leo Durocher

“Whenever you have a tight situation and there’s a close pitch, the umpire gets a squawk no matter how he calls it.”
–Red Barber

Umpires have a difficult job. Not only is their work publicly visible (and viewable over and over again through the miracle of instant replay), but when the job is done well, no accolades come their way. Players at least can enjoy the adulation of the fans when they excel, despite knowing too well that cheers can turn to boos in a heartbeat. With the availability of PITCHf/x data, as reported through MLB.com’s Gameday application, we now have another tool with which to judge the men in blue. So, on the heels of some tabulation here at BP, today we’ll look at what we can learn about the accuracy and psychology of home plate umpires.

Accuracy

Before we delve into the data, there are a couple of caveats. In calculating whether a particular pitch crossed the strike zone, we’re using a model of the zone that provides a one-inch buffer for both called balls and called strikes, as shown in the following diagram.

The width of the regulation strike zone is determined by the width of the plate, so it is a constant. The height of the zone depends on the batter, and is input by the PITCHf/x operator for each plate appearance using the rule book definition. We’re using this one-inch buffer because the PITCHf/x system is reported to track the pitch to within an inch of its actual location. We therefore count as a strike any pitch where any portion of the ball (assuming a diameter of 2.9 inches) touches the zone in green, and count as a ball any pitch that does not touch the zone in blue. This gives the full benefit of the doubt to the umpire. In addition, keep in mind that the system determines the location of the pitch in two dimensions at the front of home plate.

Since the strike zone is three-dimensional, it is possible that some pitches catch part of the zone and are indeed strikes, but that our calculations call them balls. Finally, there has been some recent discussion suggesting that the reported location data is inaccurate for some subset of the pitches. I’m working on this, as are several others, to determine how large the problem is.
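As a rough illustration of the buffered zone described above, the following Python sketch checks whether a single call agrees with it. It assumes pitch locations and zone heights are given in inches, measured from the center of the plate and from the ground; the names `px`, `pz`, `sz_bottom`, and `sz_top` are hypothetical. It also reads the buffer as an enlarged zone for called strikes and a shrunken zone for called balls, which is one interpretation of the model and not necessarily the exact calculation behind the tables.

```python
import math

PLATE_HALF_WIDTH = 17.0 / 2   # a regulation plate is 17 inches wide
BALL_RADIUS = 2.9 / 2         # ball diameter assumed in the article
BUFFER = 1.0                  # one-inch allowance for tracking error


def ball_touches(px, pz, half_width, bottom, top):
    """Return True if any part of the ball overlaps the given rectangle."""
    dx = max(abs(px) - half_width, 0.0)    # horizontal gap between ball center and zone
    dz = max(bottom - pz, pz - top, 0.0)   # vertical gap between ball center and zone
    return math.hypot(dx, dz) <= BALL_RADIUS


def call_agrees(called_strike, px, pz, sz_bottom, sz_top):
    """Judge one call, giving the umpire the benefit of the doubt."""
    if called_strike:
        # Assumption: called strikes are checked against the zone grown by the buffer.
        return ball_touches(px, pz, PLATE_HALF_WIDTH + BUFFER,
                            sz_bottom - BUFFER, sz_top + BUFFER)
    # Assumption: called balls are checked against the zone shrunk by the buffer.
    return not ball_touches(px, pz, PLATE_HALF_WIDTH - BUFFER,
                            sz_bottom + BUFFER, sz_top - BUFFER)
```

Under these assumptions, a pitch two inches off the edge of the plate counts as agreement whether the umpire calls it a ball or a strike, which is the benefit of the doubt the buffer is meant to provide.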

To calculate how well umpires are doing, we’ll use the same basic methodology that we used when examining whether or not certain pitchers and hitters get the benefit of the doubt. By counting the number of strikes called by the umpire and how many of those we think were regulation strikes, we can create a called strike agreement percentage (CSAgree%). We can then do the same for called balls (CBAgree%), and calculate an overall agreement percentage (Agree%).

Finally, we create a derived metric that measures the extent to which the umpire has favored pitchers. We take the extra strikes awarded to the pitcher, subtract the number of actual strikes that were not called as such (PAdv), and divide the result by the total number of pitches (PAdv%). A positive value for either of these two new metrics indicates an advantage for the pitcher, and its size tracks the magnitude of that advantage.
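The metrics defined above reduce to a few lines of arithmetic. The Python sketch below is one way to compute them from four counts; the function and argument names are hypothetical, and "agreed" means the call matched the buffered regulation zone.

```python
def agreement_metrics(called_strikes, strikes_agreed, called_balls, balls_agreed):
    """Compute the agreement and pitcher-advantage metrics for one umpire."""
    pitches = called_strikes + called_balls
    extra_strikes = called_strikes - strikes_agreed   # called strikes outside the zone
    missed_strikes = called_balls - balls_agreed      # called balls inside the zone
    padv = extra_strikes - missed_strikes             # net pitches in the pitcher's favor
    return {
        "CSAgree%": strikes_agreed / called_strikes,
        "CBAgree%": balls_agreed / called_balls,
        "Agree%": (strikes_agreed + balls_agreed) / pitches,
        "PAdv": padv,
        "PAdv%": padv / pitches,
    }


# Counts of 142 and 344 are inferred from Eric Cooper's published rates,
# not taken from the raw data.
cooper = agreement_metrics(162, 142, 357, 344)
```

With those inferred counts, the result rounds to a CSAgree% of .877, a CBAgree% of .964, an Agree% of .936, a PAdv of 7, and a PAdv% of .013, matching Cooper's line in the table.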

Here are all umpires who have called 500 or more balls and strikes behind the plate, ordered by Agree%.

```
Umpire               Pitches      CS CSAgree%      CB CBAgree% Agree%    PAdv  PAdv%
Eric Cooper              519     162     .877     357     .964   .936       7   .013
Tim McClelland           922     275     .844     647     .975   .936      27   .029
Marvin Hudson            856     270     .885     586     .956   .933       5   .006
Gerry Davis             1068     331     .882     737     .953   .931       4   .004
Jim Reynolds            1024     306     .827     718     .968   .926      30   .029
Ed Montague              870     266     .850     604     .952   .921      11   .013
Tony Randazzo            590     172     .837     418     .955   .920       9   .015
Angel Hernandez         1652     510     .833    1142     .957   .919      36   .022
Tim Tschida              930     287     .826     643     .960   .918      24   .026
Randy Marsh             1017     316     .861     701     .943   .917       4   .004
Joe West                 797     262     .817     535     .964   .916      29   .036
Wally Bell               545     191     .812     354     .972   .916      26   .048
James Hoye               767     223     .821     544     .954   .915      15   .020
Paul Emmel               884     303     .838     581     .955   .915      23   .026
Hunter Wendelstedt       830     264     .852     566     .943   .914       7   .008
Derryl Cousins           932     302     .864     630     .938   .914       2   .002
Chris Guccione           892     288     .823     604     .954   .911      23   .026
Jim Joyce               1172     358     .799     814     .961   .911      40   .034
Jeff Nelson              606     234     .799     372     .981   .911      40   .066
Sam Holbrook            1175     355     .848     820     .938   .911       3   .003
Charlie Reliford         537     187     .834     350     .951   .911      14   .026
Alfonso Marquez          983     297     .818     686     .950   .910      20   .020
Dale Scott              1020     353     .836     667     .949   .910      24   .024
C.B. Bucknor             567     186     .855     381     .934   .908       2   .004
Bob Davidson            1111     360     .786     751     .967   .908      52   .047
Jeff Kellogg             746     234     .855     512     .932   .908      -1  -.001
Mark Carlson            1377     465     .800     912     .959   .906      56   .041
Rick Reed                950     318     .814     632     .951   .905      28   .029
Brian O'Nora             779     251     .785     528     .960   .904      33   .042
Greg Gibson              563     157     .834     406     .929   .902      -3  -.005
Tim Timmons             1156     367     .807     789     .947   .902      29   .025
Bill Miller              982     313     .815     669     .943   .902      20   .020
Brian Knight             967     319     .777     648     .963   .902      47   .049
Dan Iassogna             587     197     .807     390     .946   .899      17   .029
Kerwin Danley           1049     318     .821     731     .932   .898       7   .007
Gary Cederstrom          852     272     .794     580     .947   .898      25   .029
Marty Foster             754     261     .782     493     .959   .898      37   .049
Mike Winters             940     297     .825     643     .932   .898       8   .009
Larry Young             1557     474     .793    1083     .943   .897      36   .023
Paul Schrieber           603     169     .852     434     .915   .897     -12  -.020
Brian Gorman            1090     385     .795     705     .952   .896      45   .041
Mike Everitt             523     163     .761     360     .956   .895      23   .044
Ted Barrett             1191     410     .785     781     .950   .893      49   .041
Bill Welke               522     182     .835     340     .924   .893       4   .008
Rob Drake                673     217     .797     456     .936   .892      15   .022
Mark Wegner              532     182     .813     350     .926   .887       8   .015
Ed Rapuano               999     326     .801     673     .929   .887      17   .017
Tom Hallion              789     266     .767     523     .945   .885      33   .042
Ed Hickox                573     177     .763     396     .934   .881      16   .028
Ron Kulpa               1018     330     .782     688     .929   .881      23   .023
Doug Eddings             603     227     .758     376     .955   .881      38   .063
Larry Vanover            823     258     .771     565     .929   .880      19   .023
Chuck Meriwether        1056     328     .805     728     .911   .878      -1  -.001
Dana DeMuth              768     251     .821     517     .905   .878      -4  -.005
Paul Nauert              547     172     .738     375     .936   .874      21   .038
Brian Runge              811     263     .787     548     .907   .868       5   .006
```

Overall, this group’s agreement was 81.5 percent on called strikes, 94.6 percent on called balls, and 90.4 percent on all calls. From this list we see that Eric Cooper and Tim McClelland have enjoyed the most agreement with the regulation strike zone at 93.6 percent, while Brian Runge finds himself at the bottom at 86.8 percent by “missing” 107 of the 811 pitches that he’s called. In terms of called strikes, Paul Nauert had the lowest agreement at 73.8 percent, and Marvin Hudson the highest at 88.5 percent. Dana DeMuth had the lowest called ball agreement at 90.5 percent, and Jeff Nelson the highest at 98.1 percent. Finally, Jeff Nelson can be said to have benefited the man on the mound the most by swinging 6.6 percent of the pitches he’s called the pitcher’s way; in other words, pitchers have gotten the advantage on 40 more pitches than hitters out of the 606 pitches he’s called. On the other hand, Paul Schrieber has given the advantage to the hitter on 12 of 603 pitches, or 2 percent.

When looking at lists like these, it’s important to ask whether what we’re seeing reflects a real difference in how these umpires approach their jobs, or whether the differences can be fully explained by random variation. After all, if the distribution here is random, then we can’t ascribe real differences to the umpires.

One way of doing this would be to see if our list comports with anecdotal evidence and the judgments of broadcasters, players, coaches, and managers. Is Jeff Nelson really a pitcher’s umpire, and is Paul Schrieber actually more favorable to hitters? Absent a survey of those on the field, one thing we can do is to analyze the variation in the data to determine if there is more variation than we would expect, using the idea that the observed variance equals the variance due to randomness plus the variance due to underlying skill.

When we do this for Agree%, for example, we find that there are indeed real differences in how pitches are called, and that they account for 60 percent of the variance in the data. Given the underlying skill difference between umpires, we would then expect 68 percent of umpires to actually fall between .892 and .916, and 95 percent of them to fall between .881 and .928. Still, at less than 5 percent, that range is small even at the extremes, and means that the best umpires may miss (in terms of the regulation strike zone) around six fewer pitches than the average umpire, and the worst about six more. When we follow the same procedure for PAdv%, we find that 90 percent of the variance reflects skill differences. Once again, however, the range is small: 95 percent of the umpires should actually fall between -.009 and .055, accounting for a difference in magnitude of 15 or so a game.
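The variance split described above can be sketched in Python as follows. This is a minimal version under stated assumptions, not necessarily the exact procedure used for the figures in this article: it treats each call as an independent trial, so the variance expected from chance alone is binomial, and all function names are hypothetical.

```python
from statistics import mean, pvariance


def skill_share(rates, pitches):
    """Split the spread in umpire rates into random and skill components."""
    total = sum(pitches)
    league_rate = sum(r * n for r, n in zip(rates, pitches)) / total
    observed_var = pvariance(rates)
    # Assumption: chance alone gives an umpire with n pitches a binomial
    # variance of p * (1 - p) / n around the league rate p.
    random_var = mean(league_rate * (1 - league_rate) / n for n in pitches)
    skill_var = max(observed_var - random_var, 0.0)   # observed = random + skill
    share = skill_var / observed_var if observed_var > 0 else 0.0
    return league_rate, skill_var, share


def expected_range(center, skill_var, n_sd):
    """Range that should hold the true rates within n_sd standard deviations."""
    sd = skill_var ** 0.5
    return center - n_sd * sd, center + n_sd * sd
```

The 68 percent range quoted above implies a skill standard deviation of about .012 around the .904 average, so `expected_range(0.904, 0.012 ** 2, 1)` returns roughly .892 to .916.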

Psychology

Several years ago I took my older daughter on a fossil hunting trip in the Cretaceous Badlands of western Kansas, in what was once the Western Interior Seaway. After just a few minutes of struggling to see the bits of fossilized bone, shell, and teeth that our guide could see so well, the concept of “search image” had become crystal clear for both my daughter and me. The basic idea is that we see what we’re trained to see. Our minds interpret the data coming from our eyes using predefined patterns that have been influenced and built up from experience. So what was, to our guide, clearly a shark tooth of the species Cretoxyrhina literally right in front of our noses was for us simply another piece of jagged rock. I’m happy to report that we eventually caught on and made a contribution or two as the day wore on.

I was reflecting on this experience as I examined the called ball and strike accuracy of umpires when broken down by count. To understand why it came to mind, examine the following table, keeping in mind that the mean CSAgree% is 81.4 percent and the mean CBAgree% is 94.6 percent:

```
Count    Pitches      CS  CSAgree%     CB CBAgree% Agree%
1-0         6653    2690     .803    3963    .945   .887
2-0         2349    1078     .828    1271    .943   .891
3-0         1155     710     .883     445    .948   .908
1-1         5066    1199     .772    3867    .950   .908
1-2         3871     400     .670    3471    .962   .932
2-1         2359     616     .756    1743    .952   .901
2-2         2637     336     .732    2301    .969   .939
3-1         1048     388     .799     660    .952   .895
3-2         1168     184     .788     984    .966   .938
0-0        20415    8960     .837   11455    .928   .888
0-1         6971    1450     .790    5521    .944   .912
0-2         3078     233     .695    2845    .968   .947
```

Now, take a look at the bolded numbers. They differ in a statistically significant way from the overall mean at the 95 percent confidence level. Notice how far they deviate from the means: at 3-0, over 88 percent of called strikes are actually strikes, while at 1-2 and 0-2 the percentages drop to 67 percent and 69.5 percent, respectively. In other words, at 3-0 (and 2-0 to a lesser extent), umpires are more likely to see the pitch as a ball, and with two strikes (2-2 included), they’re more likely to see the pitch as a strike.
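One common way to check whether a count's rate differs from the overall mean at the 95 percent level is a one-proportion z-test. The Python sketch below assumes that test, which may not be the one used for this table, and the function names are hypothetical.

```python
import math


def z_score(rate, n, baseline):
    """One-proportion z statistic for a rate observed over n calls."""
    standard_error = math.sqrt(baseline * (1 - baseline) / n)
    return (rate - baseline) / standard_error


def differs_at_95(rate, n, baseline):
    """True if the rate is outside the two-sided 95 percent band."""
    return abs(z_score(rate, n, baseline)) > 1.96


# Called-strike agreement by count, against the 81.4 percent mean.
three_oh = z_score(0.883, 710, 0.814)
one_two = z_score(0.670, 400, 0.814)
three_one = z_score(0.799, 388, 0.814)
```

Under this test the 3-0 figure sits about 4.7 standard errors above the mean and the 1-2 figure about 7.4 below it, while 3-1 is less than one standard error away and would not register as a significant difference.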

Note that with two strikes this is exactly the opposite of what the pitcher intends. Experience tells us that pitchers in those counts typically try to get hitters to chase, which should result in more thrown balls. One possible explanation is that umpires, even in the short span of several pitches, have their search image modified, and as a result tend to model their calls on the prevailing trend.

What this indicates is that while umpires may, in the words of George Will, be “natural republicans – dead to human feelings,” they are prone to at least some of the same biases and perceptions as the rest of us.
