Seventeen inches across, from the “hollow of the knee” to the midpoint between the belt line and the shoulder. Despite the potential ambiguity introduced in setting the lower boundary at something called a “hollow,” the rulebook strike zone is a fairly straightforward thing. Fans of the game know, though, that the defined zone doesn’t always have that much to do with the zone as called. Lefty hitters have a different effective zone than righties; it’s easier to get a called strike on 3-0 than on 0-2; the effective zone has been expanding downward for years now. Every season sees calls for #RobotUmpsNow, and there are countless Twitter bots posting information about whether a particular strike/ball call helps or hurts a team of interest. Thinking about the strike zone in preparation for this article has led me to sympathize a lot with BP E-I-C Sam Miller’s proposed definition of a strike: “A ball, delivered by the pitcher while standing wholly within the lines of his position and facing the batsman, that, so delivered, is determined by the umpire to be a fair pitch.” Still, I don’t think the game is all that close to adopting Sam’s idea, so we’re left to try to evaluate the strike zone as best as we can.

The current model for called strikes here at BP produces a stat called Called Strikes Above Average. It’s built from a mixed model that accounts for, among other things, the pitcher, umpire, batter, and current score. CSAA, in turn, builds upon called strike probability, estimated from a proprietary PitchInfo model that reflects PI’s pitch classifications and adjusted PITCHf/x data. With that in mind, I set out to find an answer to a very specific question: All else being equal (pitch type, amount of movement, etc.), does a pitcher’s release point have an effect on whether a pitch is called a strike? In researching the topic, I found that yes, it does (though minimally); I also found that pitch type continues to play a role in the “above average” part of the equation (so to speak), even after being accounted for in the initial calculation.

Fortunately for me, much of the truly hard work behind this research had already been done long before I came up with this idea. Jonathan Judge, Harry Pavlidis, and Dan Brooks developed the CSAA model; I merely had to ask them for the code to run it myself. The use of their code took care of most of the conditions I’d put on my initial question; I used the change in called-strike likelihood assigned to the pitcher as my outcome variable (rather than simply the binary outcome of “was this pitch called a strike or not”), and in doing so accounted for the “all else being equal” part of the question I’d asked. I refactored their code to be able to isolate pitch types within pitchers, but left it otherwise unmodified. To lower the total time required for analysis every time I decided to look at a new parameter, I restricted my data set to 2012-present, separated by season. Each full season had about 300,000 pitches, and there had been about half that in 2016 up to the point I did this.

I began with a simple linear model with only one predictor—z0, which is the PITCHf/x parameter indicating release point height. For all years, the model shows a statistically significant but small effect. In general, the z0 coefficients are between 1 percent and 10 percent of the size of what they’re trying to predict, the r-squared value for the model is minuscule (on the order of 10^-4), and the p-value is “<2e-16.” In plain language, this means that the effect z0 has is small (the coefficient), and it explains very little of the variance between pitches (the r-squared), but what little it does seem to explain is highly unlikely to be a mere coincidence (the p-value).

| Year | z0 coeff. | p-value | r-squared |
|------|-----------|---------|-----------|
| 2012 | 0.00151 | <2e-16 | 0.0018 |
| 2013 | 0.00213 | <2e-16 | 0.0040 |
| 2014 | -0.00053 | <2e-16 | 0.0002 |
| 2015 | -0.00130 | <2e-16 | 0.0014 |
| 2016 | -0.00090 | <2e-16 | 0.0008 |
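To make the setup concrete, here’s a minimal sketch of a one-predictor least-squares fit like the one described above, written in Python with toy numbers standing in for the real pitch-level data (the actual model was fit on roughly 300,000 pitches per season):

```python
# Minimal sketch of the one-predictor linear model: regress an outcome
# (a stand-in for the per-pitch change in called-strike likelihood)
# on z0 alone. All numbers below are made up for illustration.

def ols_1d(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept, r2)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    return slope, intercept, r2

# Toy data: z0 (release height, feet) and a noisy, weakly related outcome.
z0 = [5.2, 5.5, 5.8, 6.0, 6.1, 6.3]
y = [0.010, 0.012, 0.011, 0.014, 0.012, 0.015]
slope, intercept, r2 = ols_1d(z0, y)
```

On real data, the interesting part is exactly the combination in the table above: a nonzero slope paired with an r-squared close to zero.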

At this point, it feels important to mention a strong caveat that comes with using p-values to measure significance. A p-value is often loosely described as the likelihood that the null hypothesis is true (in this case, that z0 does NOT have an effect on what’s being measured); more precisely, it’s the probability of getting a data set like this one if the null hypothesis is true. It’s easier to get small p-values with large data sets than with small ones, which is relevant here because the data set is 300,000 pitches per year. This doesn’t mean a low p-value is guaranteed, by any means; that could be shown easily enough even within this data set. It does mean, though, that I should be particularly careful in assigning meaning to the results, and that supporting evidence is highly important. The small effect size and weak correlation in the data mean that we should all be a little more skeptical that the effect is real than we would be based on the p-value alone.
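To see that caveat in action, here’s a small Python illustration (toy numbers, and the normal approximation to the t distribution, which is fine at these sample sizes) of how sample size alone drives the p-value for a fixed, tiny correlation:

```python
import math

# Illustration of the large-n caveat: the same tiny correlation is
# "highly significant" with 300,000 observations and unremarkable with 100.
# The numbers are illustrative, not drawn from the real pitch data.

def p_value_for_r(r, n):
    """Two-sided p-value for a sample correlation r with n observations,
    using the normal approximation to the t distribution."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    # two-sided tail probability of the standard normal
    return math.erfc(abs(t) / math.sqrt(2))

print(p_value_for_r(0.02, 300_000))  # tiny r, huge n: p far below 0.05
print(p_value_for_r(0.02, 100))      # same r, small n: nowhere near significant
```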

I examined a number of other variables, mostly drawn from PITCHf/x data; many, though not all, showed results similar to z0’s—a very small but real (by p-value, anyway) effect. Horizontal release point, pitch speed, vertical movement, and horizontal movement could all be described this way. However, as terms are added to the model equation, interactions between predictors also start to become important, and I don’t have the statistical chops to fully interpret what those results mean. It’s possible that these results are an artifact of sorts of the way the PITCHf/x system works. PITCHf/x data includes the two-dimensional point at which the ball crosses the plane of the front of the plate. The strike zone rule, however, covers the entire volume over the plate between the top and bottom boundaries. It’s possible to throw a pitch that misses the front of the plate but does in fact enter that space; Dr. Alan Nathan found that roughly 5 percent of pitches meet this criterion. It’s possible that pitches thrown from a lower initial point are disproportionately less likely to pass through this 3D volume. There are trajectory calculations that might be able to answer this, but their complexity puts them beyond the scope of this article. Certainly, a low z0 leads to a lower pitch height at the front of the plate—looking exclusively at fastballs, I found that for every one-inch drop in release point, the ball crossed the plate about a tenth of an inch lower.
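As a sketch of what such a trajectory check could look like, here is a toy Python example built on the nine-parameter constant-acceleration fit that PITCHf/x reports (x0, y0, z0, vx, vy, vz, ax, ay, az). The zone limits and the sample pitch are invented for illustration, and the ball’s radius is ignored:

```python
import math

# The rulebook zone is a volume over the whole plate, but PITCHf/x reports
# only the crossing point at the front plane. Given the nine-parameter
# constant-acceleration fit, we can ask whether the ball enters the zone
# *anywhere* over the plate, not just at the front edge.

PLATE_FRONT_Y = 1.417          # ft from the back point of the plate
PLATE_BACK_Y = 0.0
HALF_WIDTH = 17.0 / 2 / 12     # plate half-width in feet (ball radius ignored)
ZONE_BOTTOM, ZONE_TOP = 1.5, 3.5  # illustrative zone limits, feet

def position(t, x0, y0, z0, vx, vy, vz, ax, ay, az):
    """Ball position at time t under constant acceleration."""
    return (x0 + vx * t + 0.5 * ax * t * t,
            y0 + vy * t + 0.5 * ay * t * t,
            z0 + vz * t + 0.5 * az * t * t)

def in_zone(x, z):
    return abs(x) <= HALF_WIDTH and ZONE_BOTTOM <= z <= ZONE_TOP

def passes_through_zone(params, dt=0.0005):
    """Step along the trajectory; True if the ball is ever in the zone
    while over any part of the plate."""
    t = 0.0
    while t < 1.0:
        x, y, z = position(t, *params)
        if PLATE_BACK_Y <= y <= PLATE_FRONT_Y and in_zone(x, z):
            return True
        t += dt
    return False

# A made-up pitch that is just high at the front plane but drops into the
# zone over the back of the plate (the roughly-5-percent case noted above):
pitch = (0.0, 50.0, 6.697, 0.0, -130.0, -5.0, 0.0, 25.0, -16.0)

# Time the ball crosses the front plane: solve y(t) = PLATE_FRONT_Y.
a, b, c = 0.5 * pitch[7], pitch[4], pitch[1] - PLATE_FRONT_Y
t_front = (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)
x_f, _, z_f = position(t_front, *pitch)
print(in_zone(x_f, z_f))           # misses at the front plane...
print(passes_through_zone(pitch))  # ...but enters the zone over the plate
```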

At the recommendation of Dan Brooks, I also looked into whether z0 has an effect on strike zone size and shape. Using the same method of determining strike zone size that I've written about before, I found that strike zone shape, but not size, is influenced by average z0. Working from a data set of taken fastballs from 2012-present, limited to pitchers who had thrown at least 2,500 such pitches in that span, I found no relationship between average z0 and total strike zone size. I did, however, find a statistically significant relationship (p=0.009) between a lower average z0 and a larger strike zone below 24 inches (measured where the ball crosses the front of the plate). Similarly, the proportion of the effective strike zone that falls below the 24-inch line grows as the release point drops (p=0.001).

There was a second category of result from this work that deserves further exploration, and it’s what I’ll discuss for the remainder of this piece. As I mentioned, in doing the research for this piece I separated the data by pitch type (and therefore also by PitchInfo’s “pitch group,” a higher-level classification system). This allowed me to see season-scale trends in CSAA and whether any particular pitch types were consistently better than others.

In keeping with the theme of this article, I found a small but real effect. I found that pitchers consistently add strike likelihood when they throw fastballs and sliders, and subtract strike likelihood when they throw change-ups and curves. This is true both on a rate basis and by actual totals. As I said, the effect is once again small—on any given pitch, the change in probability is a small fraction of a percent—but is consistent across the years in my data set.

Total Pitcher Effect on Called Strikes

| Pitch Group | 2012 | 2013 | 2014 | 2015 | 2016 |
|-------------|------|------|------|------|------|
| Change-up | -8.9 | -12.5 | -13.8 | -8.6 | -32.8 |
| Curve | -19.9 | -41.0 | -23.0 | -32.3 | 5.0 |
| Fastball | 8.2 | 21.7 | 34.6 | 9.1 | 34.0 |
| Slider | 34.5 | 26.3 | 17.1 | 33.6 | -1.0 |

Pitcher Effect on Called Strikes per 10k Pitches

| Pitch Group | 2012 | 2013 | 2014 | 2015 | 2016 |
|-------------|------|------|------|------|------|
| Change-up | -3.3 | -4.6 | -5.1 | -3.2 | -22.7 |
| Curve | -5.8 | -12.8 | -7.4 | -11.5 | 3.0 |
| Fastball | 0.4 | 1.1 | 1.9 | 0.5 | 3.4 |
| Slider | 6.1 | 4.5 | 3.0 | 5.9 | -0.3 |
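For readers curious how totals and rates like these relate, here is a toy Python reconstruction of the aggregation step. The per-pitch deltas and the grouping scheme are made up; the real per-pitch values come out of the proprietary CSAA model, not anything shown here:

```python
from collections import defaultdict

# Toy version of the aggregation behind the two tables: each taken pitch
# carries a (modeled) change in called-strike probability attributable to
# the pitcher. Summing by pitch group gives totals; scaling by pitch count
# gives a per-10,000-pitch rate. All deltas below are invented.
pitches = [
    ("Fastball", 0.0004), ("Fastball", -0.0001), ("Slider", 0.0009),
    ("Curve", -0.0012), ("Change-up", -0.0005), ("Fastball", 0.0003),
]

totals = defaultdict(float)
counts = defaultdict(int)
for group, delta in pitches:
    totals[group] += delta
    counts[group] += 1

rates = {g: totals[g] * 10_000 / counts[g] for g in totals}
```

Note how a group can have a small per-pitch delta but a large total simply because the pitch is thrown so often, which is why the fastball rows look so different between the two tables.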

The point of all of the above isn’t to critique the work of Jonathan, Harry, and Dan; nor is it to argue for robot umpires (and it’s only a little bit to argue in favor of Sam’s zone definition). The tininess of the effect sizes throughout indicates to me that even if the effects I found are real (which is still debatable), the gain in accuracy is unlikely to outweigh the added computing cost of new parameters in the existing CSAA model. I guess the point is just to share the observation that both calling and predicting the call of balls and strikes is really, really hard, and that way, way more factors go into it than the average fan may realize.