July 20, 2011
A Zone of Their Own
The pitcher begins each confrontation with a batter with the initiative. He alone controls when the baseball is thrown, how it moves, and where it is located. Thus, the batter is by nature placed in a reactive position. However, the batter, too, has a measure of control over how the plate appearance proceeds. He stands at the plate with a club, and it is within his discretion to swing his weapon or not.
In the early days of baseball, the pitcher had the responsibility to deliver a pitch to the batter’s liking. That practice quickly fell by the wayside as the pitchers began bending the rules to deliver pitches that were harder to hit. By 1887, the batter could no longer request a high or low pitch, and by 1889, baseball settled on four balls for a walk and three strikes for a strikeout. By 1903, foul balls counted as strikes prior to a two-strike count, thus establishing the current boundaries for the battle between pitcher and batter.
Detailed pitch trajectory data gives us new insight into the battle for advantage waged between pitcher and batter over the territory of the strike zone. Dan Fox took the first look at this topic in 2007, establishing plate discipline metrics based upon detailed pitch location data. With the benefit of four more years of data and a better understanding of the strike zone, we can learn more about how various batters approach their confrontation with pitchers relative to the constraints imposed by the strike zone.
I will use the following definition of the strike zone, which I believe provides the best combination of simplicity and accuracy. The zone is a rectangle at the front of home plate whose borders are set at the average locations where umpires call at least 50 percent strikes to batters of that handedness and, for the top and bottom boundaries, of similar height. The exact zone boundaries are as follows, where “px” is the horizontal location of the pitch as it crosses the front of home plate, in feet, and “pz” is the height of the pitch at the front of home plate, also in feet:
RHB zone: -1.03 < px < 1.00 and (0.92 + batter_height*0.136) < pz < (2.60 + batter_height*0.136)
LHB zone: -1.20 < px < 0.81 and (0.35 + batter_height*0.229) < pz < (2.00 + batter_height*0.229)
Though the location of the pitch relative to the catcher target and other factors affect the actual strike zone called by the umpires and though our measurement of the actual pitch location has some uncertainty, the simplified rectangular zone described above is in agreement with 90 percent of umpire calls. Thus, it should suffice as a working definition to evaluate batter behavior. From this definition, we divide all pitches, except intentional balls and pitchouts, into two buckets: in-zone and out-of-zone. We ignore the umpire’s actual call on any pitch taken by the batter and instead assign it to a bucket based solely upon its measured location (and the batter’s handedness and height).
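The zone test above can be sketched in a few lines of Python. This is only an illustration of the stated boundaries: the function names and the "R"/"L" handedness codes are hypothetical, and batter_height is assumed to be in feet (the unit is not stated above, but feet yields plausible zone heights).

```python
def in_zone(px, pz, batter_height, bats):
    """Return True if a pitch is inside the simplified rectangular zone.

    px, pz        -- pitch location at the front of home plate, in feet
    batter_height -- ASSUMED to be in feet (unit not stated in the text)
    bats          -- "R" or "L" (hypothetical encoding of handedness)
    """
    if bats == "R":
        left, right = -1.03, 1.00
        bottom = 0.92 + batter_height * 0.136
        top = 2.60 + batter_height * 0.136
    elif bats == "L":
        left, right = -1.20, 0.81
        bottom = 0.35 + batter_height * 0.229
        top = 2.00 + batter_height * 0.229
    else:
        raise ValueError("bats must be 'R' or 'L'")
    # Strict inequalities, matching the boundaries as written.
    return left < px < right and bottom < pz < top


def zone_bucket(px, pz, batter_height, bats):
    """Assign a pitch to a bucket from its measured location alone,
    ignoring the umpire's actual call."""
    return "in-zone" if in_zone(px, pz, batter_height, bats) else "out-of-zone"
```

For a six-foot right-handed batter, this puts the bottom of the zone at about 1.74 feet and the top at about 3.42 feet. Intentional balls and pitchouts would need to be filtered out before bucketing.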
We also divide the batter’s actions into groups. The batter can take or swing. If he swings, he can miss or make contact. If he makes contact, the ball can go fair or foul. Furthermore, many batters change their approach to defend the strike zone when the count reaches two strikes.
Which batters see the most and fewest pitches in the strike zone, and why? Let’s look first at the league level. The first season for which we have nearly complete pitch location data is 2008. Here is the percentage of pitches in the strike zone, as defined above, by season for right-handed and left-handed batters. Numbers for the 2011 season are through the All-Star break.
Do right-handed batters really see 1.7 percent more pitches in the strike zone than left-handed batters, or is that an indication of a problem with the strike zones I defined for righties and lefties? There is no simple way to reach a conclusive answer to that question. However, one can check the strike zone definition against the actions and decisions of the batters and umpires.
Right-handed batters take about 1.6 percent fewer pitches that are called balls than do left-handed batters. The umpires’ calls and the batters’ swings are not perfect indicators of whether a pitch was actually within the strike zone, but they point in the same direction as, and are nearly the same size as, the difference we found between righties and lefties from the strike zone definitions themselves. Based on this evidence, I am comfortable continuing to use the strike zone definitions I established. The difference in the fraction of in-zone pitches seen by righties and lefties appears to be a real effect, whatever its cause.
Though there are some small fluctuations from season to season, the percentage of pitches in the strike zone has been fairly stable around 50 percent over the last four seasons. Over the period 2008-2011, which batter saw the lowest percentage of pitches in the zone?
Here are the leaders for batters with at least 1000 plate appearances.
And here are the batters with the highest percentage of pitches in zone.
The group of batters with the fewest pitches in zone certainly comprises a more formidable set of hitters than those with more pitches in zone. The latter group hit an average of only 11 home runs in 1527 plate appearances, or about four per full season. The former group hit an average of 89 home runs in 1945 plate appearances, or about 27 per full season.
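The per-season figures can be checked with a short calculation. The sketch below assumes a "full season" of 600 plate appearances; that figure is not given above, but it reproduces the stated values of about four and about 27.

```python
FULL_SEASON_PA = 600  # ASSUMPTION: "full season" is not defined in the text


def hr_per_full_season(home_runs, plate_appearances, season_pa=FULL_SEASON_PA):
    """Scale a home run total to a season of season_pa plate appearances."""
    return home_runs / plate_appearances * season_pa


# Group averages quoted in the text.
most_in_zone = hr_per_full_season(11, 1527)    # batters seeing the most strikes
fewest_in_zone = hr_per_full_season(89, 1945)  # batters seeing the fewest strikes

print(f"{most_in_zone:.1f} vs. {fewest_in_zone:.1f} home runs per {FULL_SEASON_PA} PA")
```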
It is not surprising that batter skill, particularly in the power department, would drive pitchers to pitch around a batter, but is that all that determines how many pitches a batter sees in the zone? Doumit is not a particularly powerful hitter, and while Sandoval and Beltran are good hitters, they are not elite power hitters. Guerrero and Soriano may have laid claim to that title at one point, but both are well past their prime years. If this is purely a measure of hitting prowess, where are Albert Pujols, Miguel Cabrera, Adrian Gonzalez, and Mark Teixeira?
The presence of Sandoval, Guerrero, and Soriano atop the list suggests an obvious answer. Not only do pitchers locate outside the zone against feared hitters, but they also pitch outside the zone to free swingers who are willing to chase pitches there.
Who swung at the greatest and lowest percentage of pitches out of the zone from 2008-2011?
The positions of most of the names on that list probably do not come as a surprise, but again, the best hitters in the league do not heavily populate either extreme.
A third factor that affects the in-zone percentage of pitches seen by a batter is his strike zone decision-making approach early in the count—related to out-of-zone swings, but not quite the same thing. We can define a “correct” strike zone decision for the purpose of this measurement as any time the batter swings at a pitch in the zone or lays off a pitch out of the zone. Batters who do this more often with zero strikes or one strike in the count see fewer pitches in the zone.
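As a rough sketch, the "correct" decision rate defined above could be computed like this. The tuple layout (in_zone, swung, strikes) is a hypothetical record format chosen for illustration, with strikes meaning the strike count before the pitch.

```python
def is_correct_decision(in_zone, swung):
    """A decision is 'correct' if the batter swings at a pitch in the zone
    or lays off a pitch out of the zone."""
    return in_zone == swung


def correct_decision_rate(pitches):
    """Fraction of correct decisions on pitches thrown with 0 or 1 strikes.

    pitches -- iterable of (in_zone, swung, strikes) tuples, a hypothetical
               layout; two-strike pitches are excluded.
    Returns None if the batter saw no pitches before two strikes.
    """
    early = [(z, s) for z, s, strikes in pitches if strikes < 2]
    if not early:
        return None
    return sum(is_correct_decision(z, s) for z, s in early) / len(early)
```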
I am not quite sure why this is. Part of it is surely that if batters are likely to swing at pitches in the zone early in the count, pitchers may want to avoid serving up a pitch over the plate to them. However, the percentage of pitches taken in the zone is not a very powerful predictor of in-zone percentage on its own. Only when combined with laying off pitches out of the zone does it do the best job of identifying batters who are likely to see more pitches in the zone.
Here are the leaders and trailers in “correct” strike zone decision-making (prior to two strikes). See if you can make more sense of the effect than I can.
To this point, we have looked at statistics covering the whole period 2008-2011. However, two batters in particular have seen their percentages of in-zone pitches drop noticeably over this timeframe. Miguel Cabrera saw 48.3% of pitches in the zone in 2008-2009 and only 44.0% in the zone in 2010-2011. The biggest drop belonged to Jose Bautista. Bautista saw 52.2% of pitches in the zone in 2008-2009 and only 47.5% in the zone in 2010-2011. Bautista has seen fewer pitches in the zone coincident with his impressive power surge.
We have examined the factors that encourage pitchers to avoid throwing strikes to a batter, but why do batters choose to swing at pitches out of the zone? In general, swinging at a pitch out of the zone is a recipe for poor results. First of all, batters make contact only 45% of the time when they swing at a pitch out of the zone, as compared to 60% contact on swings at pitches in the zone. (I include foul balls as contact here for two-strike counts but not with fewer than two strikes.) Even when batters do make contact out of the zone, their results are poorer. Successful contact with an in-zone pitch results in a .500 slugging average. Successful contact with an out-of-zone pitch results in only a .370 slugging average.
That does not mean, though, that every swing at a pitch out of the zone is a bad idea. Maybe some batters are particularly good at bad-ball hitting. Vladimir Guerrero is well known as a batter who could homer on a ball off his shoe top, though he is not quite the offensive force he was five years ago. When I surveyed people on Twitter for the best bad-ball hitters in the game today, Pablo Sandoval’s name featured prominently in the responses. We know already that Sandoval and Guerrero lead with the most swings at out-of-zone pitches, but do free-swinging ways lead to successful out-of-zone contact?
Most free-swinging batters produce below-average results with their out-of-zone contact. Sandoval is the rare exception. Only a few other batters—Josh Hamilton, Carlos Gonzalez, and Ichiro Suzuki—are able to produce above-average results with out-of-zone contact while swinging at more than 35 percent of out-of-zone pitches.
In fact, nearly 40 percent of Sandoval’s extra-base hits come on pitches outside the zone.
Sandoval is truly the king of the bad-ball hitters.
Nearly every batter in the major leagues gets better results on contact with in-zone pitches than he does with out-of-zone pitches, though.
There are many possible metrics based upon zone statistics. For example, zone-based measures are good at predicting batters’ future walk and strikeout rates. Zone-based measures perform about as well as strikeout rate itself at predicting a batter’s future strikeout rate, and they perform better than walk rate itself (or even walk rate and strikeout rate together) at predicting a batter’s future walk rate. Further research is needed to determine how these measures could best be used either as diagnostic tools to identify changes in a batter’s approach and skills or as part of a projection system.
After accumulating this data, I was curious to see if I could measure the quality of a batter’s approach at the plate. Certainly, the Fish/Eye metric developed by Dan Fox is a good starting point for detailed pitch location data. Also, Russell Carleton developed a unique batting approach metric using signal detection theory to evaluate batters. He implemented his method without having access to pitch location data, but it could be adapted to use pitch locations. However, I took a different tack.
First, I divided all pitches (other than intentional balls and pitchouts) into buckets for each batter as described earlier: left-handed/right-handed, in-zone/out-of-zone, swing/take, and contact/miss. I also divided the pitches into two buckets based on the ball-strike count. I grouped zero- and one-strike counts together, separate from two-strike counts. For one thing, many batters become more defensive with two strikes, shortening their stroke and expanding their swing zone to avoid a strikeout. For another, foul balls do not count as strikes with two strikes already in the count.
Foul balls with two strikes have a slightly positive impact for the batter on the average results for a plate appearance, whereas foul balls with zero or one strike have a negative impact very similar to that of a called strike or a whiff. Therefore, I included foul balls in the “miss” bucket for 0/1 strikes and in the “contact” bucket for two strikes.
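The bucketing rules described so far, including the count-dependent treatment of foul balls, might be expressed as follows. This is a sketch under stated assumptions: the field names and bucket labels are hypothetical, and strikes means the strike count before the pitch.

```python
def action_bucket(swung, contact, foul, strikes):
    """Classify the batter's action as 'take', 'miss', or 'contact'.

    Foul balls go in 'miss' with 0/1 strikes and in 'contact' with two
    strikes, following the rule described in the text.
    """
    if not swung:
        return "take"
    if not contact:
        return "miss"
    if foul:
        return "contact" if strikes >= 2 else "miss"
    return "contact"


def pitch_bucket(bats, in_zone, swung, contact, foul, strikes):
    """Return a (handedness, zone, count, action) label for one pitch.

    All argument names are hypothetical; intentional balls and pitchouts
    are assumed to have been removed already.
    """
    zone = "in-zone" if in_zone else "out-of-zone"
    count = "two-strike" if strikes >= 2 else "0/1-strike"
    return (bats, zone, count, action_bucket(swung, contact, foul, strikes))
```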
These combinations produce 24 separate buckets, 12 for each batter handedness. I determined the average run value of each bucket by computing the linear weights runs above average for the results of all plate appearances from 2008-2011 with a pitch in that bucket. Those values are as follows:
We can evaluate a specific batter by applying these run values and weighting by how often the batter’s action falls in a given bucket. This measures how well the batter makes swing decisions and how well he implements those decisions by making contact with the pitch.
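One way to read this weighting is as a frequency-weighted average of the bucket run values, which gives a per-pitch figure. The sketch below takes that interpretation as an assumption. The run values in the example are placeholders invented for illustration, not the values computed for this article.

```python
from collections import Counter


def approach_runs_per_pitch(bucket_labels, run_values):
    """Average run value per pitch for one batter.

    bucket_labels -- one bucket label per pitch seen by the batter
    run_values    -- mapping from bucket label to league-average run value
    """
    counts = Counter(bucket_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("batter has no pitches")
    # Weight each bucket's run value by how often the batter lands in it.
    return sum(run_values[bucket] * n for bucket, n in counts.items()) / total


# PLACEHOLDER run values and bucket names, NOT the article's numbers.
example_values = {"in-zone take": -0.05, "out-of-zone take": 0.04}
example_pitches = ["in-zone take", "out-of-zone take", "out-of-zone take"]
example_score = approach_runs_per_pitch(example_pitches, example_values)
```

Multiplying the per-pitch figure by the number of pitches seen would give a cumulative total instead of a rate.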
This measure does not account very well for how hard the batter hits the ball or whether he pulls the ball in the air or slaps it up the middle on the ground. It does include whatever effect hitting with two strikes or hitting a pitch outside the zone has on the quality of contact, but it turns out to be fairly insensitive to power skill. Home run rates differ by only about a factor of two from the best to the worst contact bucket and by much less than that in the non-contact buckets.
However, we can see that batters with poor swing decision-making skills need good power skills (or superior fielding) in order to stay in the big leagues. We can compare batters’ power, as measured by the fraction of contact that they turn into home runs, to their skill at making and implementing good swing decisions. Here are the right-handed batters:
Pujols has an unusually disciplined approach at the plate for a hitter with such power, and Mark Reynolds survives at the plate because he can hit the ball a long way.
Here are the left-handed batters and switch hitters:
Of course, a valuation based upon league-average outcomes may not apply perfectly to every batter, but it does give a good sense of a batter’s approach at the plate.
There is much more that can be done with zone statistics than space allows here. The leaders and trailers in each bucket are interesting. Even more fascinating is to examine each individual batter to build a picture of his goals at the plate and his strategies for maximizing his strengths and keeping the pitchers from exploiting his weaknesses. How does he change his strategy with two strikes? Is he, like Yuniesky Betancourt, aggressive early in the count to avoid a two-strike hole where his lack of pitch recognition skills could prove deadly? Or does he wait patiently for his pitch, like Cust, because he is unable to do much with pitches out of the zone? Or, like Pujols, is he good at everything?
With reliable pitch location data and a good strike zone definition, zone-based statistics are a mother lode of insight into batter capabilities and strategies, waiting to be extracted and refined.