June 1, 2011
The Real Strike Zone, Part 2
Three months ago, I investigated the nature of the major-league strike zone, focusing on its inside and outside boundaries. I concluded that the location of a pitch relative to the catcher’s target had a significant impact on the umpire’s likelihood of calling a strike. This article will examine the top and bottom boundaries of the strike zone.
There are two common approaches to graphically presenting strike zone data. One is to use fixed heights for the top and bottom boundaries of the zone for all batters, regardless of the height or stance of the batter. The most commonly used fixed heights are 1.5 feet for the bottom of the zone and 3.5 feet for the top of the zone. The popular PITCHf/x data site, JoeLefkowitz.com, uses this definition, as does BrooksBaseball.net for many of its graphs.
Another approach is to normalize the heights based upon the strike zone for each batter. The PITCHf/x data from MLB Gameday includes a value for the height of the top and bottom of the zone for each pitch. Another popular PITCHf/x data site, TexasLeaguers.com, along with some of the umpire graphs at BrooksBaseball.net, presents strike zone locations with heights for each pitch normalized to the top and bottom of the zone as reported in the PITCHf/x data.
PITCHf/x Measured Zone Height
The top of the zone (sz_top) and bottom of the zone (sz_bot) in the PITCHf/x data come from measurements made from video by the Sportvision PITCHf/x operator. Just before each pitch, as the batter takes his stance, the operator marks lines on the center field camera video corresponding to the height of the hollow of the batter’s back knee and to the batter’s belt. The line at the batter’s knee is reported in the data as sz_bot, and the system adds four inches to the height of the batter’s belt and reports that value as sz_top.
However, these values collected by the PITCHf/x system can vary quite widely from day to day for a given batter. For example, look at the sz_top and sz_bot values recorded by the PITCHf/x system for Brian McCann from 2007 through May 2011.
The top of McCann’s strike zone varies by as much as a foot and the bottom of the strike zone by half a foot! Batters may make changes in their stances that result in small tweaks to the height of their strike zones, but surely nothing approaching that magnitude.
Another notable observation is that from 2007-2009, some specific sz_top and sz_bot values were repeated with great frequency. For McCann, these values were 3.26 feet (for 40 percent of pitches) and 3.44 feet (seven percent) at the top and 1.59 feet (50 percent) and 1.70 feet (seven percent) at the bottom. No other specific value was repeated more than two percent of the time. Since the beginning of the 2010 season, these repeated values have been greatly reduced, though not eliminated.
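A simple frequency count can flag values like these. The function below is a hypothetical helper, not part of any PITCHf/x tool. It takes one batter's recorded sz_top (or sz_bot) values in feet and returns any value whose share of the sample meets a chosen threshold. The 5 percent default and the rounding to three decimal places are assumptions. In McCann's case, any threshold above 2 percent and no higher than 7 percent would separate the repeated values from the rest.

```python
from collections import Counter


def find_repeated_values(values, min_share=0.05, ndigits=3):
    """Return {value: share} for values that make up at least min_share of the sample.

    values: one batter's sz_top (or sz_bot) readings in feet.
    min_share and ndigits are illustrative assumptions, not PITCHf/x settings.
    """
    if not values:
        return {}
    counts = Counter(round(v, ndigits) for v in values)
    n = len(values)
    # most_common() lists the most frequent values first
    return {v: c / n for v, c in counts.most_common() if c / n >= min_share}


# Hypothetical sample: one value recorded for 40 percent of pitches
sample = [3.26] * 4 + [3.30, 3.35, 3.41, 3.44, 3.50, 3.55]
print(find_repeated_values(sample, min_share=0.15))  # {3.26: 0.4}
```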
Every batter’s data from 2007-2009 contains these repeated values, and they seem to be “chosen” at random. For some batters they may be middle or high values from their overall distributions, or, as in McCann’s case, they may be low values. Many of these values seem at least somewhat reasonable, but what of Placido Polanco?
Polanco’s official height is 5’10”. I stand 5’9” on a good day, so I should make a fairly good stand-in for Polanco. Polanco does not have an abnormal crouch or otherwise particularly unusual batting stance. When I take my stance, the hollow of my knee is 1.6 feet above the ground, and my belt is at 3.0 feet, putting my strike zone top at 3.3 feet, according to the PITCHf/x definition (belt plus four inches). Nearly all of Polanco’s recorded values are below that, and his average values of 1.4 feet and 3.1 feet are 2-2.5 inches lower than my strike zone. My stance may not be a perfect analog for Polanco’s, but Polanco’s PITCHf/x values are so low as to be unreasonable.
How are we to determine the correct height of a batter’s strike zone?
From the official baseball rules:
The STRIKE ZONE is that area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the hollow beneath the kneecap. The Strike Zone shall be determined from the batter’s stance as the batter is prepared to swing at a pitched ball.
Returning to my own batting stance as an example, my belt is at 3.0 feet, and the top of my shoulders is at 4.5 feet, for a midpoint at 3.75 feet. My rulebook top of zone is five inches higher than the PITCHf/x definition of belt plus four inches.
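The arithmetic behind the two definitions fits in a few lines. This sketch treats the belt as the top of the uniform pants, as the comparison above does, and plugs in the stance measurements above. The function names are hypothetical.

```python
INCH_FT = 1 / 12  # feet per inch


def pitchfx_zone_top(belt_ft):
    """PITCHf/x convention: belt height plus four inches."""
    return belt_ft + 4 * INCH_FT


def rulebook_zone_top(belt_ft, shoulders_ft):
    """Rulebook: midpoint between the top of the shoulders and the top of the pants."""
    return (belt_ft + shoulders_ft) / 2


belt, shoulders = 3.0, 4.5
gap_in = (rulebook_zone_top(belt, shoulders) - pitchfx_zone_top(belt)) * 12
print(f"{pitchfx_zone_top(belt):.2f} ft vs {rulebook_zone_top(belt, shoulders):.2f} ft, "
      f"gap {gap_in:.1f} in")  # 3.33 ft vs 3.75 ft, gap 5.0 in
```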
The average sz_bot value recorded by PITCHf/x for all batters in 2010-2011 was 1.59 feet, and the average sz_top was 3.44 feet. The PITCHf/x approach to the bottom of the strike zone is an attempt to measure the rulebook line. The PITCHf/x approach to the top of the strike zone is an attempt to report a zone closer to that which is actually called by the umpires.
Mixing two different goals that way does not make much sense. It would be more logical either to attempt to follow the rulebook definition and report sz_bot at the hollow of the knee around 1.65 feet for the typical six-foot-tall batter and sz_top at the shoulder-belt midpoint around 3.85 feet, or attempt to measure the zone that the umpires actually call at both the top and bottom (and the sides, for that matter).
What Do the Umpires Call?
What strike zone do the umpires actually call? If we look at pitches across the middle 12 inches of the plate, we see the following percentage of strikes called by height of the pitch at the front end of home plate.
The average height of the bottom of the zone, marked at the point where half of the taken pitches were called strikes, was 1.74 feet for right-handed batters and 1.75 feet for left-handed batters. The average height of the top of the zone was 3.42 feet for right-handed batters and 3.40 feet for left-handed batters.
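One way to locate these 50 percent points is sketched below. It is an illustration and not necessarily the exact procedure behind the figures above. Several details are assumptions. Taken pitches arrive as (px, pz, called_strike) tuples, with px and pz in feet at the front of the plate as in the PITCHf/x data. The middle 12 inches of the plate means |px| <= 0.5 feet. The 0.1-foot bin width is arbitrary. The crossing points come from linear interpolation between adjacent bins.

```python
from collections import defaultdict


def strike_rate_by_height(pitches, bin_ft=0.1):
    """Bin taken pitches over the middle 12 inches of the plate by height.

    pitches: iterable of (px, pz, called_strike) with px, pz in feet.
    Returns (bin_heights, strike_rates), sorted by height.
    """
    totals, strikes = defaultdict(int), defaultdict(int)
    for px, pz, called_strike in pitches:
        if abs(px) > 0.5:  # keep only the middle 12 inches of the plate
            continue
        b = round(pz / bin_ft)
        totals[b] += 1
        strikes[b] += int(called_strike)
    keys = sorted(totals)
    return [k * bin_ft for k in keys], [strikes[k] / totals[k] for k in keys]


def zone_edges(heights, rates, threshold=0.5):
    """Return (bottom, top): first and last heights where the rate crosses threshold."""
    crossings = []
    for h0, r0, h1, r1 in zip(heights, rates, heights[1:], rates[1:]):
        if r0 < threshold <= r1 or r1 < threshold <= r0:
            # linear interpolation between neighbouring bins
            crossings.append(h0 + (threshold - r0) * (h1 - h0) / (r1 - r0))
    if not crossings:
        raise ValueError("called-strike rate never crosses the threshold")
    return crossings[0], crossings[-1]
```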
Different Zones for Different Batters
How does the height of the average umpire zone vary from batter to batter or pitcher to pitcher?
We have already observed that there is a large amount of variation from day to day in the PITCHf/x sz_top and sz_bot values, making their use for judging strike zone heights (or umpires) problematic. One potential solution to this problem is to average the sz_top and sz_bot measurements over a larger sample, an approach proposed by John Walsh in 2007. The fly in the ointment for this approach is the set of repeated values, which make up a large part of the sample and skew the average away from the mean of the unbiased portion of the measurements.
Filtering out the repeated values from the 2007-2009 data is not a trivial task because their nature varies from batter to batter. Perhaps some useful information remains, however, even with the repeated values polluting the sample. Thus, we will consider the 2007-2009 average values of sz_top and sz_bot and the 2010-2011 average values separately, since the 2010-2011 data suffers much less from the repeated values.
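Computing these multi-year averages separately for the two periods is a simple grouping exercise. In this sketch the input format, (batter_id, year, sz_top, sz_bot) tuples with heights in feet, is a hypothetical layout and not the Gameday schema. The repeated values are left in the sample, as described above.

```python
from collections import defaultdict


def era_of(year):
    """Split the sample into the two periods compared in the text."""
    return "2007-2009" if year <= 2009 else "2010-2011"


def average_zone_by_era(pitches):
    """pitches: iterable of (batter_id, year, sz_top, sz_bot), heights in feet.

    Returns {(batter_id, era): (mean_sz_top, mean_sz_bot, pitch_count)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for batter_id, year, sz_top, sz_bot in pitches:
        s = sums[(batter_id, era_of(year))]
        s[0] += sz_top
        s[1] += sz_bot
        s[2] += 1
    return {key: (top / n, bot / n, n) for key, (top, bot, n) in sums.items()}
```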
Another potential way to determine the likely top and bottom boundaries of each batter’s strike zone is to use each batter’s height and assume a standard batting stance. Though this approach could be adapted to estimate the rulebook boundaries, here we will apply it to estimate the boundaries corresponding to the zone that the umpires actually call.
Using the 50-percent method outlined above and considering pitches taken across the middle 12 inches of the plate, we can calculate the height of the typical umpire strike zone boundaries for each batter in the league. Considering the two boundaries independently, I found that the best match to the average umpire zone puts the bottom of the zone at 0.9 feet plus 14 percent of the batter’s height and the top of the zone at 0.9 feet plus 42 percent of the batter’s height.
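Expressed as code, the two independent fits look like this. The coefficients come from the text. The helper name is hypothetical, and batter height is taken in feet, which the 0.9-foot constant implies.

```python
def height_based_zone(height_ft):
    """Independent best-fit boundaries from batter height (all values in feet)."""
    bottom = 0.9 + 0.14 * height_ft
    top = 0.9 + 0.42 * height_ft
    return bottom, top


bottom, top = height_based_zone(6.0)
print(f"bottom {bottom:.2f} ft, top {top:.2f} ft")  # bottom 1.74 ft, top 3.42 ft
```

For a six-foot batter this reproduces the league-average boundaries of about 1.74 and 3.42 feet reported earlier.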
However, a batter’s top and bottom boundaries are related. It turns out that tall batters do not experience a much bigger strike zone than short batters. A batter who is four inches taller than average will see his strike zone grow only by 0.3 inches, from 21.7 inches to 22.0 inches tall. If we use the batter’s height to set the height of the midpoint of the zone but apply the same fixed zone size to all batters, the match to the umpire zone improves slightly.
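The fixed-size variant can be sketched the same way. The text does not give the fitted midpoint coefficients, so this version assumes the midpoint is the average of the two independent fits, 0.9 plus 0.28 times height in feet. It leaves the zone size to the caller. Both choices are assumptions for illustration, and the function name is hypothetical.

```python
def fixed_size_zone(height_ft, zone_size_ft):
    """Centre a zone of fixed size on a height-based midpoint (feet).

    Assumption: midpoint = average of the independent fits, 0.9 + 0.28 * height.
    """
    midpoint = 0.9 + 0.28 * height_ft
    half = zone_size_ft / 2
    return midpoint - half, midpoint + half


# A taller batter gets a higher zone, but its size does not change
low, high = fixed_size_zone(6.0, zone_size_ft=1.68)
tall_low, tall_high = fixed_size_zone(6.0 + 4 / 12, zone_size_ft=1.68)
```

This simplified version keeps the zone size identical for every batter. The model described above still lets it grow slightly with height.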
If we consider the average umpire zone as the gold standard here, the average difference between the per-batter umpire zone boundaries and the zone boundaries determined by the batter height is 0.5 inches at the bottom and 0.5 inches at the top. Similarly, we can compare the average PITCHf/x sz_top and sz_bot values to the average umpire zone. After adjusting the sz_bot values upward by 0.15 feet to match the umpire zone, the average difference for the 2007-2009 PITCHf/x values is 0.8 inches at the bottom and 1.2 inches at the top. The average difference for the 2010-2011 PITCHf/x values is slightly better, 0.7 inches at the bottom and 1.1 inches at the top, but still not nearly as good as the height-based estimate.
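The comparison can be sketched as a mean absolute difference in inches. Reading "average difference" as the mean absolute difference is an assumption. The option to derive the shift from the difference of the means, instead of passing a fixed 0.15 feet, is another. The function name is hypothetical.

```python
def mean_abs_diff_inches(estimates_ft, umpire_ft, offset_ft=0.0):
    """Mean absolute difference, in inches, after shifting every estimate by offset_ft.

    Pass offset_ft=None to shift by the difference of the means (aligns the averages).
    """
    if len(estimates_ft) != len(umpire_ft) or not estimates_ft:
        raise ValueError("need two equal-length, non-empty sequences")
    if offset_ft is None:
        offset_ft = sum(umpire_ft) / len(umpire_ft) - sum(estimates_ft) / len(estimates_ft)
    total = sum(abs(e + offset_ft - u) for e, u in zip(estimates_ft, umpire_ft))
    return total * 12 / len(estimates_ft)
```

For example, `mean_abs_diff_inches(avg_sz_bot, umpire_bottoms, offset_ft=0.15)` would apply the upward adjustment described above, where both lists are hypothetical per-batter values in feet.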
The zone boundaries derived from batter height on its own correspond better to the zone actually called by the umpires than do the zone boundaries from the PITCHf/x sz_top and sz_bot averaged over a multi-year sample. Can we combine the two to get an even better estimate? No. Neither the PITCHf/x values from 2007-2009 nor the PITCHf/x values from 2010-2011 add any noticeable improvement to our height-based zone boundary estimates for batters.
A third solution, however, shows promise. Building on the concepts from the first part of my strike zone investigation, I wanted to see if the catcher target affected the umpire’s call at the top and bottom of the zone. There is evidence that it does, which we will cover in a moment, but in the process, I discovered something that in retrospect seems obvious. The pitcher can see where the opposing batter’s strike zone is, and on the whole, he attempts to throw strikes into this zone. Pitchers turn out to be the best instrument of all for detecting strike zone heights for individual batters.
The average height of all the pitches that a batter sees turns out to be a better indicator of his strike zone top and bottom height than the height of the batter or the average PITCHf/x sz_top and sz_bot values. In fact, a multivariate regression indicates that once the batter’s average pitch height and the height of the batter are known, the PITCHf/x values add almost no useful information. Using batter height and average pitch height, the average difference between that estimate and the actual per-batter umpire zone boundaries is 0.4 inches at the bottom and 0.5 inches at the top.
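A regression of this kind can be sketched with ordinary least squares. The text does not report the fitted coefficients or the exact model form. This sketch assumes a linear model with an intercept, boundary = a + b × batter height + c × average pitch height, solved through the normal equations in plain Python. The names are hypothetical. The same fit would be run once with per-batter bottom boundaries as targets and once with top boundaries.

```python
def fit_ols(rows, targets):
    """Least-squares fit of targets ~ intercept + features, via the normal equations.

    rows: sequence of feature tuples, e.g. (batter_height_ft, avg_pitch_height_ft).
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0, *row] for row in rows]
    k = len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y]
    a = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(coefs, row):
    """Evaluate the fitted linear model for one batter's feature tuple."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))
```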
To see how that accuracy compares to the quantity we are attempting to measure, let’s look at the distribution of the average actual called strike zone boundaries for major-league batters.
Most batters’ strike zone boundaries are within an inch of 41 inches (3.42 feet) high at the top and within an inch of 21 inches (1.75 feet) high at the bottom.
The zone boundaries shown in the graph are for the height of the middle of the baseball crossing the front of home plate. The rulebook indicates that any part of the baseball touching the strike zone should be ruled a strike. If you find it helpful, you can adjust the top boundary down and the bottom boundary up, each by the radius of the baseball (1.45 inches or 0.12 feet), in accordance with the rules.
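The adjustment is simple arithmetic. This sketch assumes boundaries measured for the center of the ball, as in the graph, and the 1.45-inch radius quoted above. The function name is hypothetical.

```python
BALL_RADIUS_FT = 1.45 / 12  # about 0.12 ft


def center_to_edge_zone(bottom_ft, top_ft, radius_ft=BALL_RADIUS_FT):
    """Convert ball-center boundaries into the zone edges implied by the rule
    that any part of the ball touching the zone is a strike."""
    return bottom_ft + radius_ft, top_ft - radius_ft


bottom, top = center_to_edge_zone(1.75, 3.42)
print(f"{bottom:.2f} to {top:.2f} ft")  # 1.87 to 3.30 ft
```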
Do Pitchers Have Their Own Zones?
We already saw that pitchers in the aggregate are very good at telling us how high each batter’s strike zone is, but how do individual pitchers affect the height of the strike zone that umpires call? The average height of the pitches thrown by each pitcher has a fairly good correlation to the height of the bottom boundary of the strike zone, and to a lesser extent the top of the zone.
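The text does not quantify "fairly good," but one common measure is a Pearson correlation between each pitcher's average pitch height and the height of the zone boundary called for him. A plain-Python sketch follows. The per-pitcher input lists are hypothetical, and in practice one would probably restrict the sample to pitchers with enough taken pitches near each boundary.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)
```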
The average height of batters faced by pitchers stabilizes to the league average very quickly and is not a significant factor in the difference between pitchers’ strike zone heights.
Presumably, the correlation between average pitch height and strike zone height for pitchers is due to umpires calling the strike zone relative to the catcher target, at least to some extent. We discussed this phenomenon more fully in the previous article. If the catcher target is playing a significant role in the height of the strike zone, it makes sense that it would have a greater effect at the bottom of the zone than at the top. Catchers tend to hold their gloves low, even for fastballs that the pitcher aims high in the zone. Thus, most pitches that miss high would also be far from the catcher target and therefore unlikely to benefit from a favorable umpire call, even if the pitcher tended to pitch higher in the zone. On the other hand, pitchers who tend to pitch low could benefit from catcher framing to gain a low strike call from the umpire. This is an area for further investigation.
Tim Wakefield’s high strike zone is an interesting anomaly here. The catchers for both Wakefield and R.A. Dickey, the two knuckleballers in the data set, don’t set a typical target for each pitch. Rather, they rest their glove on their knee and stab at the pitch when it is thrown. I don’t know if this affects the size of the zone or if umpires just can’t tell where the knuckleball went and thus give knuckleballers, Wakefield in particular, a large zone.
There are several other anomalies and extremes in the pitcher data that would make for interesting study but are beyond the scope of this article.
We have observed, unsurprisingly, that batter height affects the height of the strike zone that umpires call. We also found that the PITCHf/x-operator measured zone boundary heights are unreliable on a pitch-by-pitch basis and unhelpful even when aggregated over a larger sample for determining the boundaries of a particular batter’s strike zone. The batter’s height is a more reliable indicator of the top and bottom zone boundaries. An even more reliable indicator is the average height of pitches thrown to that batter, since pitchers can see the batter’s strike zone and aim their pitches toward it.
We also observed that pitchers have some effect on the height of the strike zone, irrespective of the batter. Pitchers who throw lower in the zone get lower strike calls. One might assume that this is because their catchers set a lower target and the umpires are influenced by the location of the pitch relative to the target. In the previous article on the strike zone, we discussed the impact of catcher framing skills on the strike calls at the left and right boundaries. It appears catchers may also be able to influence calls, at least at the bottom boundary.
We found that the popular approaches of using either the PITCHf/x-supplied sz_top and sz_bot values or fixed values of 1.5-3.5 feet for plotting strike zone boundaries are flawed. A fixed strike zone top at 3.4 feet and bottom at 1.75 feet would make for strike zone boundaries that are more consistent with the zone that umpires actually call. For additional accuracy, one can use the batter height and the average pitch height seen by a batter to calculate strike zone heights unique to each batter.
If one desires to measure strike calls against the rulebook zone, it is unclear what a good approach would be. The PITCHf/x sz_top does not even attempt to measure the top of the zone as defined by the rulebook. Batter height and average pitch height seen by a batter could probably be used to estimate the actual rulebook zone boundaries, but there is no reference to check the accuracy of these estimates, short of obtaining access to the field with a measuring tape during the game in order to stand next to each batter as he prepares to swing.
I understand the desire to measure umpire strike calls against a known and official reference, and in the horizontal dimension one can find justification for that approach. In the vertical dimension, it seems unworkable unless one is willing to deal with a great deal of uncertainty and bias in the measurements. To my mind, this calls into question the logic in measuring umpires against the rulebook zone with the data currently available, since it hardly makes sense to do it in one dimension but not the other. The utility and accuracy of a Zone Evaluation system that is used to grade major-league umpires based upon the unreliable PITCHf/x sz_top and sz_bot measurements is also called into question.
Short of having a more reliable method for measuring the actual stance of batters from video, it seems to make the most sense to set the top and bottom boundaries based upon the height of the batter and the average height of the pitches that the batter sees, as scaled to the average umpire zone. That is probably not a workable solution for the robotic home-plate umpires that many fans desire. Such a system would be slow to capture changes in batter stances and could be subject to manipulation. However, given the current data, it seems to be the best approach for analysis of strike zone data, and it is much more accurate than using the boundaries supplied in the PITCHf/x data.