
The challenges of hitting a baseball are many and difficult. Depending on the speed of the pitch, a batter may have something like half a second to 1) locate the ball as it leaves the pitcher’s hand, 2) predict its movement based on the kind of pitch it is (fastball, slider, curve, etc.), 3) decide whether to swing, and potentially 4) adjust mid-swing to the path of the ball or check his swing. All of which is to say, hitting a baseball in MLB may actually be the hardest thing in the galaxy (I’ve never done it, myself).

Arguably the most demanding part of this battle is purely mental (as Hank Aaron noted). Because of how little time there is for a hitter to perform all of the above-mentioned tasks, it is helpful to have some notion ahead of time of what, where, and how the pitcher is going to throw. Conversely, the more uncertainty and confusion a pitcher can create in the hitter, the more chance he has of catching him off guard.

I’ve written about this topic before in the context of pitch type. In that study, I found that pitchers who threw more pitch types and mixed them more evenly were better able to get strikeouts. A lurking caveat in that initial analysis was that it ignored location, and location is important. I aim to fix that blind spot, at least partially, in this article. As before, I’ll quantify uncertainty using a measurement called entropy, and see how the entropy of location affects each pitcher’s outcomes. The greater the entropy of location, the harder it is for a batter to predict where the next pitch will be.

Location, Location, Location
Location is unlike pitch type in a couple of important respects. First, whereas pitchers throw a small, finite number of pitch types, location does not divide itself so cleanly. To apply entropy to location, I need to define a grid in and around the strike zone and then ask how evenly pitches are distributed across the cells of that grid. From the hitter’s perspective, the more evenly pitches fall, the harder it is to predict where the next pitch will be.
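The grid-and-entropy idea can be sketched in a few lines of Python. This is a minimal illustration, not the calculation behind the numbers in this article: the function name `location_entropy`, the grid bounds, and the half-foot cell size are all assumptions, since the article does not state the grid it used. Pitches are taken to be (x, z) plate-crossing coordinates in feet.

```python
import math
from collections import Counter

def location_entropy(pitches, x_range=(-2.0, 2.0), z_range=(0.0, 5.0), cell=0.5):
    """Shannon entropy, in bits, of pitch locations binned on a grid.

    pitches: iterable of (x, z) plate-crossing coordinates in feet.
    The grid bounds and cell size are illustrative assumptions.
    """
    counts = Counter()
    for x, z in pitches:
        inside_x = x_range[0] <= x < x_range[1]
        inside_z = z_range[0] <= z < z_range[1]
        if not (inside_x and inside_z):
            continue  # skip pitches that miss the grid entirely
        col = int((x - x_range[0]) // cell)
        row = int((z - z_range[0]) // cell)
        counts[(row, col)] += 1

    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over occupied cells of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Four pitches in four different cells: maximally unpredictable among those cells.
spread = [(-0.75, 2.25), (0.25, 2.25), (-0.75, 3.25), (0.25, 3.25)]
# Four pitches in the same cell: perfectly predictable.
stacked = [(0.1, 2.6), (0.2, 2.7), (0.3, 2.8), (0.4, 2.9)]

print(location_entropy(spread))   # 2.0 bits
print(location_entropy(stacked))  # 0.0 bits
```

The value depends on the grid: with N cells the entropy can be at most log2(N) bits, so entropies are only comparable between pitchers when they are measured on the same grid.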

Secondly, it’s not clear whether entropy in location should necessarily be a good thing. On the one hand, varying location gives the batter more to think about and could potentially cause confusion. On the other hand, there’s an inherent penalty to locating one’s pitches too unpredictably: walks. Simply pitching far outside of the strike zone for the sake of befuddling the batter is a losing strategy. To put it another way, unlike an eephus, a pitch four feet out of the zone isn’t fooling anybody.

With these caveats noted, let’s compute the entropy of location and see how it varies between pitchers. I’ll use the same dataset as before, limiting myself to starting pitchers. Interestingly, whereas before I noted a correlation between pitch-type entropy and velocity, no such relationship exists between location entropy and velocity.

The league leaders in location entropy are an odd mix of pitchers good, bad, and mediocre: Clayton Kershaw (good) is there, but also Barry Zito (bad), and Felix Doubront (mediocre). At the opposite end of the spectrum, the group of pitchers with the least location entropy includes guys like Hisashi Iwakuma and Bartolo Colon, but also Aaron Harang. Both extremes are a little confusing, and if there’s a pattern, I can’t figure it out.

Location entropy is positively correlated with walk rate (R² = .16), consistent with the idea that it’s partly a product of wildness on the part of the pitcher. Consider this graph of location entropy vs. BB/9.

There’s a clear trend toward more unpredictable pitchers producing a higher rate of free passes. But, perhaps interestingly, location entropy is also associated with a higher strikeout rate.

Predicted Factor | Coefficient | Significance (p-value)
BB/9             | 3.37/bit    | 0.000002
K/9              | 3.66/bit    | 0.006
FIP              | 0.1/bit     | 0.86

The end result of these countervailing trends on FIP is… nothing. They effectively seem to cancel out, so that overall location entropy has no significant effect on FIP.
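For readers who want to see what a "per bit" coefficient means, here is a hedged sketch of a one-variable least-squares fit. The article does not spell out the exact model used, so treating each row of the table as a simple linear regression of the outcome on location entropy is an assumption. The helper name `fit_line` is hypothetical, and the numbers in the usage example are made-up toy values, not real pitcher data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared). With x in bits of entropy and
    y in BB/9, the slope is read as "walks per nine innings per bit".
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Toy values for illustration only (NOT real data).
toy_entropy_bits = [4.0, 4.5, 5.0]
toy_bb_per_9 = [2.0, 3.0, 4.0]

slope, intercept, r_squared = fit_line(toy_entropy_bits, toy_bb_per_9)
print(slope, intercept, r_squared)
```

In this framing, an R² of .16 means entropy accounts for about 16 percent of the variance in walk rate across pitchers, which leaves plenty of room for a statistically significant slope alongside a lot of scatter.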

Splitting the Zones
This first analysis was rather blunt, however. Another approach might be to examine location entropy divided into segments—specifically the strike zone and the region outside it. By mixing up location within the zone, a pitcher might achieve all of the benefits of confusing the batter with none of the walk-provoking disadvantages. In the case of outside-zone entropy, the pitcher would be best served by consistently nibbling at the edges of the zone, keeping entropy low so as to not give up obvious balls.

To define the strike zone, I computed the rate of strikes per pitch in each cell of the grid and classified a cell as in-zone if the probability of a strike was greater than .9 and as outside-zone if it was less than .1. (As a side note, as much as PITCHf/x has made the umpire’s occasional mistakes stand out, it is staggering in some respects just how consistent they are: about 90 percent of the bins are either almost always strikes or almost always balls.) I then computed the entropy for each set of bins independently. For each pitcher, I limited myself to at-bats against righties, so as to avoid the additional complexity of the lefty strike (I will return to handedness-specific splits in the future).
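A rough sketch of that split, under stated assumptions: each grid cell carries a strike rate, cells above .9 count as in-zone, cells below .1 count as outside-zone, and the cells in between are left out of both groups (my reading of the description above, not something the article states). The names `entropy_bits` and `split_zone_entropy` and the toy cell labels are hypothetical.

```python
import math

def entropy_bits(counts):
    """Shannon entropy, in bits, of a list of per-cell pitch counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((n / total) * math.log2(total / n) for n in counts if n > 0)

def split_zone_entropy(pitch_counts, strike_rate, hi=0.9, lo=0.1):
    """Return (in_zone_entropy, outside_zone_entropy) for one pitcher.

    pitch_counts: {cell: number of pitches this pitcher threw to that cell}
    strike_rate:  {cell: strikes per pitch in that cell}
    Cells with a strike rate between lo and hi, or with no strike rate
    at all, are dropped from both groups (an assumption).
    """
    in_zone, out_zone = [], []
    for cell, n in pitch_counts.items():
        rate = strike_rate.get(cell)
        if rate is None:
            continue
        if rate > hi:
            in_zone.append(n)
        elif rate < lo:
            out_zone.append(n)
    return entropy_bits(in_zone), entropy_bits(out_zone)

# Toy example: two in-zone cells used equally, one borderline cell, one ball cell.
toy_counts = {"a": 10, "b": 10, "c": 5, "d": 7}
toy_rates = {"a": 0.95, "b": 0.99, "c": 0.50, "d": 0.02}

in_h, out_h = split_zone_entropy(toy_counts, toy_rates)
print(in_h, out_h)  # 1.0 0.0
```

Because each group's counts are renormalized on their own, the two entropies describe how a pitcher spreads his pitches within each region, not how often he throws to it.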

Here’s a plot of within-zone entropy vs. outside-zone entropy.

They are less correlated than I would have guessed (R² = .36). It appears that there is some ability of pitchers to locate pitches unpredictably outside the zone but hit their spots inside the zone as well. Intuitively, I predicted above that the best combination would be high in-zone entropy and low outside-zone entropy, but as is usually the case, the situation is more complex than that.

Predicted Factor | In-zone Entropy | Outside-zone Entropy | Fastball Velocity
BB/9             | -6.57/bit*      | 1.67/bit*            | .07/mph*
K/9              | -.52/bit        | 2.77/bit*            | .215/mph*
FIP              | -1.42/bit       | -0.291/bit           | -0.1/mph*

In fact, as we run through our three diagnostic statistics (walks, strikeouts, and FIP), an odd and counterintuitive pattern emerges (stars denote coefficients with significant p-values). Outside-zone entropy is associated with more walks, but it’s also associated with more strikeouts. In-zone entropy isn’t associated with more strikeouts, but it is powerfully associated with fewer walks. Neither category of entropy does much for FIP, at least not significantly so. And fastball velocity, taunting me with its consistency, is associated with many more strikeouts, slightly more walks, and much better FIP.

The Complexity of Location
While location is not as simple to analyze as pitch type, a more nuanced approach can nevertheless yield insight. Aggregating overall, I find that location entropy is strongly associated with more walks. However, at the same time, it appears to be correlated with more strikeouts. These trends, of opposite value to a pitcher, appear to cancel out such that location entropy is neither a good nor a bad thing overall.

Looking at location more finely by examining within- and outside-zone entropy, however, showed counterintuitive patterns. Location entropy outside the zone was associated with both more walks and more strikeouts, while inside the zone, location entropy was associated with dramatically fewer walks. The list of pitchers with high in-zone entropy confirms that it is measuring something like control: Guys like Cliff Lee, Madison Bumgarner, and Cole Hamels show up in the list of the five highest in-zone entropies, and down the list a bit, Clayton Kershaw makes an appearance.

The outside-zone results are harder to explain. More walks are understandable, because I can imagine outside-zone entropy as measuring wildness. How outside-zone entropy might lead to strikeouts is a little trickier. A clue might be that outside-zone entropy is significantly associated with more swinging strikes, suggesting that these pitchers are somehow better able to get hitters to chase pitches.

Any entropy-based analysis of location grapples with the question of how a pitcher should optimally distribute their pitches. Is it best to spread them all about the zone, knowing that to do so requires the hitter to think more carefully about whether and where to swing? Or alternatively, is it best to pick a handful of spots, say, each of the four corners, and pitch predictably but competently to those locations? There have been successes with both strategies, and many others besides.

There’s no question that location is a trickier beast than pitch type, perhaps most demonstrably because there is so much more dimension and richness to location data than pitch type data. Location doesn’t come in eight flavors; it comes in thousands of points, barely distinguishable to the normal human’s eye but important to an MLB hitter. That hitters take not only location but also pitch type into account—all within the context of the count, the pitcher’s tendencies, the base-out situation, and more—speaks to the task that has been called the hardest thing in sports (and maybe the galaxy).

bob4k14
4/01
What I wonder is how much time and thought an elite pitcher puts into developing a philosophy of what he is doing. In other words, does a major league pitcher think about the results he is getting; analyze himself; ask others what they are seeing; etc.? Or more specifically, what are the constraints on him attempting to do things differently without changing his arsenal, arm slot, pacing, and the like? I was an amateur pitcher and we didn't know any of the opposition players and there were no scouting reports. Every encounter was new and your approach was to play to your strengths, note the stance and swing, how far away or how close to the plate he stood, see what he does with different offerings, and more. You proceeded on what worked, and if things weren't working it was kitchen-sink time. But I really never developed a philosophy of what I was doing other than to be aggressive, don't throw it down the middle, and have three or four pitch types I could throw for strikes when needed. I had three favorite scenarios: get ahead of a left-handed hitter (I was a lefty), and throw a sidearm fastball in. Get ahead of a righty with power and throw either a changeup down and away or a slider down and in. They'd swing and miss or top it weakly to the right side on the changeup or beat it into the ground to the left side on the slider. The slider became my out pitch the last several years of my little baseball life on the mound.
bhalpern
4/01
For inside the zone, entropy by location seems like it would be a good measure of effectiveness. But I don't think that by itself is a good measure for outside the zone. I'd also want to consider the entropy of distance from the zone. If that entropy is high the pitcher is probably to some extent ineffectively wild, but low entropy may be more indicative of intentionally deceptive/effectively wild. Including that 3rd measure I would hypothesize a possible relationship ranked this way:
A: In Strike Zone location entropy: H/L
B: Out of Strike Zone location entropy: H/L
C: Out of Strike Zone distance entropy: H/L
1: AH/BH/CL
2-3: AH/BH/CH and AH/BL/CL
4-5: AL/BH/CH and AL/BL/CL
6: AL/BL/CH
nada012
4/01
Interesting idea, I'll give it a try.
LlarryA
4/01
This is good this far, but I think we need to stop and look at what it means in real-pitcher terms, particularly in in-zone entropy. The most extreme combinations would be:
High entropy, high success: pitchers who move the ball around, but in or around the zone, so batters feel they need to swing because they're not going to get a walk. Command and Control.
High entropy, low success: can get the ball in the zone, but not necessarily where they want it, so maybe get hit hard.
Low entropy, low success: the batter knows there's only a limited number of places the ball will end up, can pick and choose and hit hard.
Low entropy, high success: pitcher can hit his spots, probably with velocity/movement such that the batter either thinks it's going to be a ball, or just can't get the bat on it.
At this point, I think the next step is to overlay the locations of the pitches that are put into play. Even if we don't/can't account for the results of contact, we need to see if a guy is getting hit in the same locations he gets strikes or in different locations. As bob4k414 describes, if a pitcher can get weak contact reasonably dependably in one place, he's more likely to keep going there, even if he never gets a strikeout from it.
nada012
4/01
Good suggestions. I like the classifications you suggested with regards to entropy and success.
bob4k14
4/01
Good point Llarry Amrose. You can depend on it, even if you're not missing bats. I could tell the third baseman "Here it comes." One problem I had was once I learned the slider I mostly lost the curve. I got a lot of swings and misses on the curve but I felt the slider was a superior pitch, even without missing as many bats. I don't remember right-handed batters ever taking the slider for strikes. They seemed guaranteed to swing, and to foul it off or bounce to the left side. The only ones I got hurt on were to left-handers when I hung one because I wasn't used to seeing many guys over there and I lost my release point. Sorry if this is off-point. I'm trying to think this one through. There's a lot of food for thought here.
nada012
4/01
"Sorry if this is off-point." Not at all, I'm glad it provokes thought. Indeed, that was sort of the point.