October 26, 2011
Can We Predict Hot and Cold Zones for Hitters?
A few weeks ago, during the division series, Brandon McCarthy remarked on Twitter that it would be more interesting for TBS to show a diagram of the batter hot and cold zones for every batter than to show the PitchTrax strike zone and pitch location graphic. He argued that knowledge of the hot and cold zones would give viewers additional insight into the battle between the pitcher and the batter.
The pitcher-batter confrontation lies at the heart of baseball; learning about it is one of my favorite pursuits. Thus, I was intrigued by McCarthy’s comments. As far as I know, no one has published a study about the reliability of batter hot and cold zone data. If batter performance in particular areas of the strike zone is very repeatable, that knowledge could be highly valuable, both to teams and to fans. On the other hand, perhaps such data is no more useful than knowing that a batter is 2-for-10 in the postseason or 3-for-7 against a given pitcher in his career, in which case it might be interesting for entertainment purposes but practically useless for decision-making in a game.
Batter hot and cold zone information has been provided for at least a decade by scouting services like Inside Edge, and teams routinely make this information available to their players. Starting in the 2010 season, MLB Advanced Media’s online Gameday application added hot and cold zone information developed from PITCHf/x data.
These hot and cold zone reports typically divide the strike zone into nine zones using a 3x3 grid and report the player’s past batting average in each zone over some time period, along with coloration to assist in recognition of the hot and cold areas for the batter. Other analysts and data sources have used other grid sizes to divide the zone, but the 3x3 grid is by far the most popular and recognizable presentation of this data.
Other analysts have recently used heat maps, often based upon PITCHf/x data, for similar purposes without being constrained to a 3x3 grid depiction of the data. TruMedia’s heat maps are a common example.
In addition, hot/cold zones may be based upon slugging average, runs above average, or some other metric besides batting average.
In order to evaluate the usefulness of the hot and cold zone data, I took the results of every plate appearance for which we have detailed pitch location data from PITCHf/x during the period 2007-2011 and assigned those results to the location of the final pitch of each plate appearance. I grouped the pitches by zones for each batter and calculated the average run values for each zone using linear weights. I split the data for each batter into two halves, randomly assigning games from 2007-2011 into each half for comparison.
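The split-half procedure described above can be sketched as follows. This is a minimal illustration, not the code used for the study: the function name, the tuple layout, and the `seed` parameter are assumptions, and the run value of each plate appearance is taken as an input rather than computed from any particular set of linear weights.

```python
import random
from collections import defaultdict


def split_half_zone_averages(plate_appearances, seed=0):
    """Randomly assign games to two halves and average run values by zone.

    plate_appearances: list of (game_id, zone, run_value) tuples for one
    batter (a hypothetical layout). Returns two dicts, one per half,
    mapping zone -> mean run value.
    """
    rng = random.Random(seed)
    # Split by game, not by plate appearance, as the article describes.
    game_ids = sorted({game_id for game_id, _, _ in plate_appearances})
    rng.shuffle(game_ids)
    first_half_games = set(game_ids[: len(game_ids) // 2])

    totals = [defaultdict(float), defaultdict(float)]
    counts = [defaultdict(int), defaultdict(int)]
    for game_id, zone, run_value in plate_appearances:
        half = 0 if game_id in first_half_games else 1
        totals[half][zone] += run_value
        counts[half][zone] += 1

    return tuple(
        {zone: totals[h][zone] / counts[h][zone] for zone in counts[h]}
        for h in (0, 1)
    )
```

Splitting by game keeps all of a batter's plate appearances from one game in the same half, so the two halves share no games.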
First, I examined the traditional division of the strike zone into nine zones. I divided all the pitches within the strike zone that ended a plate appearance into nine fixed bins. I separated the pitches vertically at 1.74, 2.30, 2.86, and 3.42 feet. (I ignored the height of the batter, but when I controlled for it later, it had little effect on the split-half correlations.) I separated the pitches horizontally by dividing the plate in thirds at +/-0.83 and +/-0.28 feet.
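The nine-bin assignment can be sketched in a few lines using the boundaries given above. The names `px` and `pz` (horizontal and vertical pitch location in feet) and the handling of pitches that fall exactly on a boundary are assumptions made for illustration.

```python
from bisect import bisect_right

# Zone boundaries in feet, taken from the text above.
VERTICAL_EDGES = [1.74, 2.30, 2.86, 3.42]
HORIZONTAL_EDGES = [-0.83, -0.28, 0.28, 0.83]


def zone_3x3(px, pz):
    """Return (row, col) for a pitch, or None if it is outside the zone.

    row 0 is the lowest third; col 0 is the third with the most negative px.
    Bins are treated as half-open [low, high), which is an assumption.
    """
    if not (HORIZONTAL_EDGES[0] <= px < HORIZONTAL_EDGES[-1]):
        return None
    if not (VERTICAL_EDGES[0] <= pz < VERTICAL_EDGES[-1]):
        return None
    col = bisect_right(HORIZONTAL_EDGES, px) - 1
    row = bisect_right(VERTICAL_EDGES, pz) - 1
    return row, col
```

For example, `zone_3x3(0.0, 2.5)` returns `(1, 1)`, the middle-middle bin, and `zone_3x3(1.0, 2.5)` returns `None` because the pitch is off the plate.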
I ran a regression for all the right-handed batters with at least 630 plate appearances in 2007-2011 that ended on a pitch in the strike zone. I used performance in a given zone in one half of the sample along with performance in the other eight zones in the same half of the sample to predict performance in that zone in the other half of the sample. The resulting regression equation is as follows, where performance is measured in runs above average:
Zone Performance in Split Half 2 = (0.17 * Zone Performance in Split Half 1) + (0.43 * Performance in Other Eight Zones in Split Half 1) + (0.40 * League Average Performance).
The correlation coefficient was r=0.30, and both input variables were highly significant (p < .0001).
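To make the size of the regression toward the mean concrete, here is a small worked example of the nine-zone equation. The function name and the sample inputs are hypothetical; only the three coefficients come from the equation above.

```python
def predict_zone_3x3(zone_half1, other_zones_half1, league_average):
    """Apply the nine-zone regression equation from the article.

    All three inputs are in the same units (runs above average).
    """
    return (
        0.17 * zone_half1
        + 0.43 * other_zones_half1
        + 0.40 * league_average
    )


# Hypothetical batter: +0.10 in one zone, +0.02 in his other eight zones,
# with league average set to 0.0.
prediction = predict_zone_3x3(0.10, 0.02, 0.0)
print(round(prediction, 4))  # 0.0256
```

In this made-up case, a zone that looked like +0.10 in one half is predicted at roughly a quarter of that in the other half, and most of the prediction comes from how good the hitter is everywhere else.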
The biggest problem with this data is that, even when considering five seasons, most batters had fewer than 100 plate appearances in each half of the sample in each of the nine zones. Our best predictive results occur when we weight the observed zone performance at only 17 percent and base the remainder of the prediction on the batter’s performance in the other eight zones and the league average performance.
There is some true signal there amidst the noise, but if we expect pitchers to form their pitching strategies based upon this data, that signal is very weak. Let’s look at a couple of typical examples.
The first group of zone data suggests that you want to pitch Michael Young on the outer third of the plate or over the middle of the plate if you keep the ball down. If you need to come inside, you might be able to sneak one up and in without too much damage.
If you executed that strategy against Young in the other half of the sample, you would do pretty well in all the zones on the outside third, but you would get killed whenever you tried to sneak one up and in, and you would not do too well with pitches down over the middle of the plate, either.
Yuniesky Betancourt is a weaker hitter, and the zone data from the first half of our sample suggests that the only places in the zone where he is a real threat are up and in and down over the middle of the plate. Moreover, he is very vulnerable both down and away and over the heart of the plate.
A pitcher who pitched to those cold zones in the other half of the sample would find Betancourt a decent hitter and might miss his weakest spots up and away or down over the middle of the plate.
Young and Betancourt are typical examples. At the extremes, Freddy Sanchez is an example of the best zone correlation between sample halves, and Mike Napoli is an example of the worst zone correlation between sample halves.
The division of the strike zone into nine boxes does not seem to serve us very well. The hitters do have tendencies toward hot and cold areas, but dividing into nine pieces makes the data very noisy and unreliable, and it becomes difficult to pick the true tendencies out of the vagaries of the noise. Moreover, imagine what would happen to the sample sizes if we split the data further by pitch type or if we used only a single season of data.
Perhaps using fewer zones in order to increase the sample size would produce more statistically meaningful and practically useful results. I experimented with a few different possibilities but ultimately settled on using four zones, including the area outside the strike zone. I extended the sample beyond the boundaries of the strike zone to include the susceptibility of a batter to chasing bad pitches, with the added benefit of increasing the total sample of PA-ending pitches by over 50 percent.
I divided all the pitches that ended a plate appearance into four bins, separated at the vertical and horizontal midpoints of the pitch location distributions. I separated the pitches vertically at 2.4 feet. For right-handed batters, I separated the pitches horizontally at 0.07 feet, just slightly outside from the middle of home plate. For left-handed batters, I separated the pitches horizontally at -0.28 feet, a few inches outside from the middle of home plate.
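The four-bin assignment can be sketched as below. The function name, the `bats` argument, and the treatment of pitches exactly on a dividing line are assumptions. The inside/away direction is inferred from the description above, where a positive horizontal value is outside to a right-handed batter and a negative value is outside to a left-handed batter.

```python
VERTICAL_SPLIT = 2.4  # feet, from the text above
HORIZONTAL_SPLIT = {"R": 0.07, "L": -0.28}  # feet, by batter handedness


def quadrant(px, pz, bats):
    """Return (height, side) for a PA-ending pitch, e.g. ("up", "in").

    px, pz: horizontal and vertical pitch location in feet (assumed names).
    bats: "R" or "L". Pitches outside the strike zone are included.
    """
    if bats not in HORIZONTAL_SPLIT:
        raise ValueError("bats must be 'R' or 'L'")
    height = "up" if pz >= VERTICAL_SPLIT else "down"
    split = HORIZONTAL_SPLIT[bats]
    if bats == "R":
        side = "away" if px >= split else "in"
    else:
        side = "away" if px <= split else "in"
    return height, side
```

For example, `quadrant(-0.5, 3.0, "R")` returns `("up", "in")`, while the same pitch to a left-handed batter returns `("up", "away")`.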
With larger sample sizes, the split-half correlation improved somewhat, as expected. However, even with only four zones, much noise remained in the results. Here is the regression equation for right-handed batters:
Zone Performance in Split Half 2 = (0.32 * Zone Performance in Split Half 1) + (0.32 * Performance in Other Three Zones in Split Half 1) + (0.36 * League Average Performance).
The correlation coefficient was r=0.46, and both input variables were highly significant (p < .0001).
With the larger zones providing between 200 and 300 plate appearances in each half of the sample, both the split-half correlation and the statistical significance of the results improved.
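For readers who want to check split-half agreement on their own data, a plain Pearson correlation between paired zone values from the two halves is one simple measure. This is a sketch under that assumption; the r values quoted in this article come from regressions with two inputs, so this function does not necessarily reproduce the same statistic.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences.

    xs: zone performance in one half, one entry per batter-zone pair.
    ys: performance for the same batter-zone pairs in the other half.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

A value near 1 would mean that zones that looked hot in one half also looked hot in the other; a value near 0 would mean the first half tells us almost nothing about the second.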
Do the results have better baseball meaning? Let’s revisit a couple of our earlier examples.
We see that Young has a consistent hot zone up and in and that his weakest zone is down and away.
Betancourt is weak on the outside part of the plate and a little closer to capable on the inside half, particularly up and in.
The 3x3 grid contained about the same information as the 2x2 grid, but the 3x3 grid gave us a false sense of greater granularity than is present in the data, at least at the sample sizes we typically have available. Given the limits of the ability of most pitchers to locate within a small zone, the 2x2 grid is likely more representative of actual pitching strategy anyhow.
Heat maps of batter hot and cold zones should be regarded with a similar sense of skepticism, depending on the sample size involved. Such heat maps are drawn from the same underlying data and should have similar statistical correlation between sample halves. If a heat map has insufficient spatial smoothing of the data, it could be an even less reliable predictor of future performance than a 3x3 grid.
Levels of coloration, whether for the gridded bins or for heat maps, are another important facet of how accurately hot and cold zone information is communicated to the viewer. I chose to use six levels of coloration in the examples in this article, with the traditional red for hot and blue for cold. I observed, however, that the switch from light blue (for slightly below average) to light red (for slightly above average) seemed to have more visual impact than the change in performance warranted. I did not experiment further to find an optimal color palette, but I would warn both creators and viewers of hot/cold zone graphs that proper interpretation is heavily affected by the choice of palettes.
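The palette effect is easy to see in a sketch of a six-level color scale. The thresholds and the names of the four outer levels are hypothetical and are not the ones used for the graphics in this article; only the count of six levels and the light blue/light red pair around average come from the text.

```python
from bisect import bisect_right

# Six levels from cold to hot. Outer level names are hypothetical.
LEVELS = [
    "dark blue", "medium blue", "light blue",
    "light red", "medium red", "dark red",
]
# Hypothetical cut points in runs above average; 0.0 separates blue from red.
THRESHOLDS = [-0.04, -0.02, 0.0, 0.02, 0.04]


def color_level(runs_above_average):
    """Map a zone's performance to one of six color levels."""
    return LEVELS[bisect_right(THRESHOLDS, runs_above_average)]


print(color_level(-0.001))  # light blue
print(color_level(0.001))   # light red
```

Two zones that differ by a trivial amount land on opposite sides of the blue/red divide, which is the visual jump described above.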
Let’s close by looking at the True Average batting leaderboards for the four quadrants of the hitting area for batters with at least 1000 plate appearances in 2007-2011.
Nelson Cruz, who gained attention for hitting six home runs against the Detroit Tigers in the AL Championship Series, just missed the top ten here, with a .356 TAv in plate appearances that ended on a pitch up and in. All six of his ALCS home runs were hit off pitches up and in.
Hot and cold zone information for batters does have some predictive value for future performance, but all such data continually flirts with the problem of small sample sizes, more so as the hitting area is divided into smaller grids or more granular heat maps. Aggregating into bigger zones improves the predictability. A 2x2 grid works fairly well with multi-year samples, but for smaller samples of data, even that level of aggregation may not be sufficient to render the hot and cold zone information useful.