The challenges of hitting a baseball are many and difficult. Depending on the speed of the pitch, a batter may have something like half a second to 1) locate the ball as it leaves the pitcher’s hand, 2) predict its movement based on the kind of pitch it is (fastball, slider, curve, etc.), 3) decide whether to swing, and potentially 4) adjust mid-swing to the path of the ball or check his swing. All of which is to say, hitting a baseball in MLB may actually be the hardest thing in the galaxy (I’ve never done it, myself).

Arguably the most demanding part of this battle is purely mental (as Hank Aaron noted). Because of how little time there is for a hitter to perform all of the above-mentioned tasks, it is helpful to have some notion ahead of time of what, where, and how the pitcher is going to throw. Conversely, the more uncertainty and confusion a pitcher can create in the hitter, the more chance he has of catching him off guard.

I’ve written about this topic before in the context of pitch type. In that study, I found that pitchers who threw more pitch types and mixed them more evenly were better able to get strikeouts. A lurking caveat in that initial analysis was that it ignored location, and location is important. I aim to fix that blind spot, at least partially, in this article. As before, I’ll quantify uncertainty using a measurement called entropy, and see how the entropy of location affects each pitcher’s outcomes. The greater the entropy of location, the harder it is for a batter to predict where the next pitch will be.

Location, Location, Location
Location is unlike pitch type in a couple of important respects. First, whereas pitchers throw a small, discrete set of pitch types, location does not divide itself so cleanly. To apply entropy to location, I need to define a grid in and around the strike zone and then ask how evenly pitches are distributed across that grid. From the hitter’s perspective, the more evenly pitches fall across the grid, the harder it is to predict where the next pitch will be.
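As a rough illustration of the grid idea, here is a minimal Python sketch that bins pitch locations into cells and computes the Shannon entropy of the resulting distribution. The function name `location_entropy`, the grid bounds (in feet), and the 8-by-8 resolution are assumptions chosen for illustration, not necessarily the settings behind the numbers in this article.

```python
import math
from collections import Counter


def location_entropy(pitches, x_range=(-2.0, 2.0), z_range=(0.5, 4.5), bins=8):
    """Shannon entropy (in bits) of pitch locations over a bins x bins grid.

    pitches: iterable of (x, z) plate-crossing coordinates in feet.
    The grid extent and resolution are illustrative assumptions.
    """
    x_lo, x_hi = x_range
    z_lo, z_hi = z_range
    counts = Counter()
    for x, z in pitches:
        # Skip pitches that land off the grid entirely.
        if not (x_lo <= x < x_hi and z_lo <= z < z_hi):
            continue
        col = int((x - x_lo) / (x_hi - x_lo) * bins)
        row = int((z - z_lo) / (z_hi - z_lo) * bins)
        counts[(row, col)] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over occupied cells of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```

A pitcher who hits the same cell every time scores zero, and the maximum for an 8-by-8 grid (64 cells used equally) would be 6 bits.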

Second, it’s not clear whether entropy in location should necessarily be a good thing. On the one hand, varying location gives the batter more to think about and could potentially cause confusion. On the other hand, there’s an inherent penalty to locating one’s pitches too unpredictably: walks. Simply pitching far outside of the strike zone for the sake of befuddling the batter is a losing strategy. To put it another way, unlike an eephus, a pitch four feet out of the zone isn’t fooling anybody.

With these caveats noted, let’s compute the entropy of location and see how it varies between pitchers. I’ll use the same dataset as before, limiting myself to starting pitchers. Interestingly, whereas before I noted a correlation between pitch-type entropy and velocity, no such relationship exists between location entropy and velocity.

The league leaders in location entropy are an odd mix of pitchers good, bad, and mediocre: Clayton Kershaw (good) is there, but also Barry Zito (bad), and Felix Doubront (mediocre). At the opposite end of the spectrum, the group of pitchers with the least location entropy includes guys like Hisashi Iwakuma and Bartolo Colon, but also Aaron Harang. Both extremes are a little confusing, and if there’s a pattern, I can’t figure it out.

Location entropy is reasonably well correlated with higher walk rates (R² = .16), consistent with the idea that it’s a product of wildness on the part of the pitcher. Consider this graph of location entropy vs. BB/9.

There’s a clear trend toward more unpredictable pitchers issuing free passes at a higher rate. More interestingly, location entropy is also associated with a higher strikeout rate.

[Table: Predicted Factor; Significance (p-value)]

The end result of these countervailing trends on FIP is… nothing. They effectively seem to cancel out, so that overall location entropy has no significant effect on FIP.
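For readers who want to run this kind of comparison themselves, the R² values quoted in this article can be read as squared Pearson correlations between two per-pitcher columns, such as location entropy and BB/9. The sketch below is a plain-Python version under that assumption. The function name `r_squared` is hypothetical, and it does not compute the p-values reported in the table.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when a sequence is constant")
    return (sxy * sxy) / (sxx * syy)
```

Here `xs` would hold one entropy value per pitcher and `ys` the matching rate stat. An R² of .16 means entropy accounts for about 16 percent of the variance in that stat.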

Splitting the Zones
This first analysis was rather blunt, however. Another approach might be to examine location entropy divided into segments—specifically the strike zone and the region outside it. By mixing up location within the zone, a pitcher might achieve all of the benefits of confusing the batter with none of the walk-provoking disadvantages. In the case of outside-zone entropy, the pitcher would be best served by consistently nibbling at the edges of the zone, keeping entropy low so as to not give up obvious balls.

To define the strike zone, I computed the strike rate (strikes per pitch) for each cell in the grid and labeled a cell as inside the zone if the probability of a strike was greater than .9, or outside the zone if it was less than .1. (As a side note, as much as PITCHf/x has made the umpires’ occasional mistakes stand out, it is staggering in some respects just how consistent umpires are: About 90 percent of the bins are either almost always strikes or almost always balls.) I then computed the entropy for each set of bins independently. For each pitcher, I limited myself to at-bats against righties, so as to avoid the additional complexity of the lefty strike (I will return to handedness-specific splits in the future).
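The zone-splitting step described above can be sketched as follows. This is an illustration under stated assumptions, not the exact code behind the analysis: `strike_counts` and `pitch_counts` are hypothetical league-wide tallies keyed by grid cell, the function names are made up, and cells with a strike rate between the two thresholds are simply set aside as ambiguous.

```python
import math


def classify_cells(strike_counts, pitch_counts, hi=0.9, lo=0.1):
    """Label each grid cell 'strike', 'ball', or 'ambiguous' by its strike rate."""
    labels = {}
    for cell, n in pitch_counts.items():
        if n == 0:
            continue  # no pitches seen here, so no rate to compute
        rate = strike_counts.get(cell, 0) / n
        if rate > hi:
            labels[cell] = "strike"
        elif rate < lo:
            labels[cell] = "ball"
        else:
            labels[cell] = "ambiguous"
    return labels


def region_entropy(pitcher_counts, labels, region):
    """Shannon entropy (bits) of one pitcher's pitches within one region."""
    counts = [n for cell, n in pitcher_counts.items()
              if labels.get(cell) == region and n > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    # Probabilities are renormalized within the region before taking entropy.
    return sum((n / total) * math.log2(total / n) for n in counts)
```

Calling `region_entropy` once with `"strike"` and once with `"ball"` gives the in-zone and outside-zone values for a pitcher. Because each region is renormalized separately, the two numbers do not depend on how often the pitcher throws in the zone.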

Here’s a plot of within-zone entropy vs. outside-zone entropy.

They are less correlated than I would have guessed (R² = .36). It appears that some pitchers are able to locate unpredictably outside the zone while still hitting their spots inside it. I predicted above that the best combination would be high in-zone entropy and low outside-zone entropy, but as is usually the case, the situation is more complex than that.

[Table: Predicted Factor (strikeouts, walks, FIP) by In-zone Entropy, Outside-zone Entropy, and Fastball Velocity]

In fact, as we run through our three diagnostic outcomes (strikeouts, walks, and FIP), an odd and counterintuitive pattern emerges (stars denote correlations with significant p-values). Outside-zone entropy is associated with more walks, but it’s also associated with more strikeouts. In-zone entropy isn’t associated with more strikeouts, but it is strongly associated with fewer walks. Neither category of entropy has a significant effect on FIP. And fastball velocity, taunting me with its consistency, is associated with many more strikeouts, slightly more walks, and a much better FIP.

The Complexity of Location
While location is not as simple to analyze as pitch type, a more nuanced approach can nevertheless yield insight. Aggregating overall, I find that location entropy is strongly associated with more walks. However, at the same time, it appears to be correlated with more strikeouts. These trends, of opposite value to a pitcher, appear to cancel out such that location entropy is neither a good nor a bad thing overall.

Examining location more finely by splitting it into within- and outside-zone entropy, however, revealed counterintuitive patterns. Location entropy outside the zone was associated with both more walks and more strikeouts, while inside the zone, location entropy was associated with a dramatic decrease in walks. A look at the list of pitchers with high in-zone entropy supports the idea that it is measuring something like control: Guys like Cliff Lee, Madison Bumgarner, and Cole Hamels show up in the list of the five highest in-zone entropies, and down the list a bit, Clayton Kershaw makes an appearance.

Uncertainty in location outside the zone correlated with more walks and more strikeouts. The walks are understandable, because outside-zone entropy can be read as a measure of wildness. How outside-zone entropy might lead to strikeouts is a little trickier. A clue might be that outside-zone entropy is significantly associated with more swinging strikes, suggesting that these pitchers are somehow better able to get hitters to chase pitches.

Any entropy-based analysis of location grapples with the question of how a pitcher should optimally distribute their pitches. Is it best to spread them all about the zone, knowing that to do so requires the hitter to think more carefully about whether and where to swing? Or alternatively, is it best to pick a handful of spots, say, each of the four corners, and pitch predictably but competently to those locations? There have been successes with both strategies, and many others besides.

There’s no question that location is a trickier beast than pitch type, most obviously because location data has so much more dimension and richness than pitch type data. Location doesn’t come in eight flavors; it comes in thousands of points, barely distinguishable to the normal human’s eye but important to an MLB hitter. That hitters take not only location but also pitch type into account (all within the context of the count, the pitcher’s tendencies, the base-out situation, and more) speaks to the difficulty of the task that has been called the hardest thing in sports (and maybe the galaxy).