November 22, 2011
How Does Quality of Contact Relate to BABIP?
In the first part of this study, I used detailed batted ball speed information from HITf/x to examine the degree of skill that batters and pitchers had in quality of contact made or allowed. Here, I will look deeper into the question of why some batted balls fall for hits and others do not.
The Defense Independent Pitching Statistics (DIPS) developed by Voros McCracken and subsequent research by others had led much of the sabermetric community to conclude that pitchers have little control over the quality of batted ball contact that they allow beyond the ability to influence the vertical launch angle of the ball. In other words, pitchers primarily control their batted ball results by getting ground balls or fly balls, but not by controlling how hard the ball is hit.
However, by using detailed HITf/x data provided by Sportvision and MLBAM from the 2008 season, I found that a major-league pitcher does not only control whether he gets ground balls or fly balls; he also has a significant degree of control over how hard the ball is hit. I used the batted ball speed in the plane of the playing field as the measure for quality of contact. I found that the best prediction for the speed of any given batted ball was influenced about 1.7 times as much by the batter’s typical batted ball speed as by the pitcher’s typical allowed batted ball speed, largely because there is wider variety in average batted ball speed among major-league batters than among major-league pitchers.
How does this finding square with the previous studies on DIPS and pitcher control over batted ball results? There are two things that primarily influence whether a batted ball will become a hit. One is how hard the ball is hit, and the other is the direction in which it is hit, both vertically and horizontally. Batted ball classifications do a decent job of capturing information about the vertical launch angle of the ball. The success of batted-ball-based pitching metrics is due to the importance of vertical launch angle in determining whether batted balls result in hits. However, batted ball classifications do not capture very much information about the speed of the batted ball.
In the previous article, I reported on the split-half correlation in average horizontal speed off the bat (hSOB) for batters and pitchers in the 2008 season. I included home run balls in that analysis. If home runs are removed, the split-half correlations do not change much. The correlation coefficient is r=0.77 for batters, with an average of 196 balls in play in each half of the sample, and r=0.50 for pitchers, with an average of 244 balls in play in each half of the sample.
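For readers who want to try a split-half correlation on their own data, here is a minimal sketch. It assumes each player's batted balls are split randomly into two halves, and that the data arrive as a mapping from player to a list of hSOB values; the names `pearson_r`, `split_half_r`, and `speeds_by_player` are hypothetical, and the data at the bottom are synthetic, not HITf/x.

```python
import random
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def split_half_r(speeds_by_player, seed=0):
    """Correlate each player's average speed in one random half of his
    batted balls with his average in the other half.
    Every player needs at least two batted balls."""
    rng = random.Random(seed)
    first_half, second_half = [], []
    for speeds in speeds_by_player.values():
        shuffled = list(speeds)
        rng.shuffle(shuffled)          # assumption: the split is random
        mid = len(shuffled) // 2
        first_half.append(mean(shuffled[:mid]))
        second_half.append(mean(shuffled[mid:]))
    return pearson_r(first_half, second_half)


# Synthetic illustration only: 30 made-up players whose true average
# speeds differ, each with 400 noisy batted balls.
rng = random.Random(1)
fake_speeds = {
    f"player_{i}": [rng.gauss(60 + 0.5 * i, 15) for _ in range(400)]
    for i in range(30)
}
r = split_half_r(fake_speeds)
print(f"split-half r = {r:.2f}")
```

A high value of r means that players who hit (or allowed) harder contact in one half tended to do so in the other half as well, which is the sense in which the article treats it as evidence of skill.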
How well does average hSOB correlate with a player’s batting average on balls in the field of play (BABIP)? It performs about equally well for batters and pitchers. Among players with at least 300 balls in play in 2008, the correlation coefficient is r=0.41 for batters and r=0.45 for pitchers.
How hard the ball is hit is clearly an important determining factor for BABIP, both for batters and pitchers, but it is not the only factor. What else could affect BABIP? Vertical launch angle is certainly important, and we will discuss it further in a moment. Let’s explore some other possible factors first.
In 2002, McCracken observed that a pitcher’s strikeout rate could be used to help predict his future BABIP, and J.C. Bradbury confirmed that finding in 2005. Bradbury offered a plausible explanation: “The fear of strikeouts possibly induces hitters to take weaker protective swings to stay alive, and thus yields softer hits that are more likely to result in outs.”
In fact, we do find that pitchers who strike out more batters also yield softer batted balls. A pitcher’s strikeout rate and his average allowed hSOB on balls in play had a correlation coefficient of r=0.44 for the 135 pitchers who allowed at least 300 balls in play in 2008.
It is unclear from a single season of data whether pitcher strikeout rate is predictive of BABIP allowed other than through its correlation with the speed of the batted ball. A linear regression between 2008 strikeout rate and BABIP allowed did not find a significant relationship, either with or without hSOB included.
One other potential factor in a pitcher’s BABIP allowed is the quality of the fielders behind him. Logically, it makes sense that his team’s fielding would show up in his BABIP allowed. Statistically, it is harder to show, particularly in recent years. The relationship between a pitcher’s BABIP allowed and the BABIP allowed by the rest of the pitchers on the same team in a given season is fairly weak, with a correlation coefficient of r=0.18 for pitcher-seasons with at least 300 balls in play from 1993-2011.
Moreover, some of the credit for team BABIP goes to the other pitchers on the team rather than to the fielders. From the 2008 HITf/x data, we find that team BABIP correlates fairly well with the average team hSOB allowed.
The harder the ball was hit, on average, off of the team’s pitchers, the fewer balls the fielders turned into outs. (The BABIP-allowed numbers are based upon the batted balls in the HITf/x data set. There are some balls missing from the data, which means that the BABIP allowed will not exactly match the full-season totals.)
Despite having a better BABIP allowed, the Chicago White Sox may have actually had a worse fielding performance in 2008 than the Texas Rangers. Though the White Sox turned many more balls into outs, the balls they had to field were hit two miles per hour slower on average.
Splitting the credit for BABIP between pitchers and fielders is not simple, and it is a topic for further research. However, approaches that consider all balls in play to be of equal fielding difficulty assign a large chunk of the pitcher’s performance to the fielders.
We have seen that hit ball speed has a significant effect on the batter’s chance of getting a hit. Another significant factor is the vertical launch angle (VLA) of the batted ball.
BABIP peaks at a VLA of about 12 degrees, with about 80 percent of the balls struck at that angle falling safely in the field of play. Balls launched at zero degrees (flat initial trajectory) or at 25 degrees (low fly ball) fall in only half as often, though as launch angles approach 30 degrees, nearly 20 percent of batted balls leave the field as home runs.
In the first part of this study, I showed a graph of batting average as a function of batted ball speed in the plane of the playing field. It is also interesting to look at the batted ball speed in the plane inclined by 12 degrees above the horizontal, the launch plane for which the ball is most likely to become a hit.
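The text does not spell out how the speed in an inclined plane is computed. One plausible reading, and it is only an assumption, is that it is the component of the ball's initial velocity along the plane, which is the speed off the bat times the cosine of the difference between the vertical launch angle and the plane's angle. The function name `speed_in_plane` and its parameters are hypothetical.

```python
from math import cos, radians


def speed_in_plane(speed_mph, vla_deg, plane_deg=12.0):
    """Component of the batted ball speed in a plane tilted plane_deg
    above horizontal (assumed formula: speed * cos(VLA - plane angle)).
    With plane_deg=0 this reduces to the horizontal speed."""
    return speed_mph * cos(radians(vla_deg - plane_deg))


# A ball launched exactly in the 12-degree plane keeps its full speed.
print(speed_in_plane(100.0, 12.0))
# A steep popup loses most of its speed in that plane.
print(round(speed_in_plane(80.0, 80.0), 1))
```

Under this assumption, popups and balls pounded into the ground both register as slow in the 12-degree plane, even when they leave the bat quickly.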
Let’s examine four regions of the preceding graph in more detail, each corresponding to roughly a quarter of all batted balls.
First, there is the region with an initial speed off the bat of less than 60 mph in the 12-degree plane. The overall batting average in this region is a paltry .095. Few of these balls make it out of the infield. The chances of a hit rise to about 20 percent if the batter hits the ball hard enough to get it away from the catcher but not so hard that it gets quickly to an infielder. This region also includes a large number of popups that almost always turn into outs.
Let’s look at the batted ball fielding locations from MLBAM Gameday data for the balls hit at less than 60 mph in the 12-degree plane during the 2008 season.
Almost all of these balls stay on the infield, or in foul territory, or barely make it through to the edge of the outfield. Hits are most likely to occur when the third baseman fields a shallow dribbler. None of these balls is well-hit, so we would consider that the pitcher did his job for batted balls in this region. (Note that these are fielding locations, so a few balls may roll after landing until an outfielder picks them up. Also, the data may have a handful of errors that I have not scrubbed.)
Next, here are results for the balls that were hit between 60 and 80 mph in the plane 12 degrees above horizontal.
Balls hit harder than 60 mph in the 12-degree plane start to make it out of the infield in appreciable numbers, mostly in the air but also to a lesser extent on the ground. On the ground, most of these balls are turned into outs, but about 10 percent of them sneak between the infielders and another 6 percent result in infield hits.
In the air, most of the balls make it over the infielders, but balls hit with a launch angle higher than about 30 degrees are easy catches for the outfielders. Roughly a quarter of the air balls in this group fall in for hits. A few balls scoot down the foul lines for extra bases, but mostly these balls are not hit hard enough to find the gaps between the outfielders.
Within the 60-to-80 mph range, hitting the ball harder does not increase batting average much, because the extra ground balls that make it between infielders are offset by extra air balls carrying to the outfielders. Pitchers who give up batted balls in this group have mostly done a good job and have about an 80 percent chance of getting an out. However, their chances of success are very sensitive to the vertical launch angle: there is a window of VLA between 10 and 26 degrees where a ball has a 70 percent chance of falling between the infielders and outfielders for a hit.
Here are the locations of batted balls hit between 80 and 90 mph in the 12-degree plane.
Once the batted ball speed in the 12-degree plane exceeds 80 mph, over two thirds of the balls make it out of the infield, and batting average rises as the ball is hit harder. Ground balls start to find their way between infielders with greater frequency, with 28 percent making it to the outfield and another five percent resulting in infield hits. Air balls start to find the gaps between outfielders and down the lines, and some sneak over the fences in the corners. Most air balls, however, are still not hit hard enough to land over the heads of the outfielders.
Finally, here are the locations of batted balls hit harder than 90 mph in the 12-degree plane.
Nearly half of ground balls hit harder than 90 mph result in hits. Infielders are still able to scoop up the hard grounders hit close to them, but many make it through to the outfield. The majority of the balls hit this hard, though, are in the air. Many of these air balls find the gaps or sail over the outfielders’ heads, and 18 percent of them are home runs. Of those air balls that stay in the park, 64 percent result in hits.
Here is a summary of the results in the four regions of hSOB in the 12-degree plane.
Of the 40,364 hits tracked by HITf/x in the 2008 season, 52 percent of them were air balls that were hit harder than 80 mph in the 12-degree plane (including the 11 percent of hits that were home runs). Another 23 percent of the hits were groundball smashes, also hit harder than 80 mph in the 12-degree plane. But the remaining 25 percent of the hits were of the relatively cheap (or lucky) variety: 15 percent were bloops over the infield, six percent were seeing-eye grounders that were hit softer than 80 mph but managed to find a hole, and four percent were bleeders—ground balls that weren’t hit hard enough to make it to an infielder for an out.
How does the BABIP picture look if we focus only on the 75 percent of hits that were well-struck? Though batters hit .158 on any ball with a speed less than 80 mph in the 12-degree plane, they hit .497 on balls hit harder than 80 mph, .464 if we exclude home runs. If we assume that pitchers did their job any time a ball was hit less than 80 mph, and that a hit on such a ball was a matter of luck or poor fielding, we can credit the pitcher with .842 of an out for every such batted ball. If we combine that with the actual results on balls in play that were hit harder than 80 mph, we can calculate an adjusted BABIP for pitchers.
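As a rough sketch of that calculation: every ball in play under 80 mph is charged to the pitcher at the league rate of .158 hits (that is, .842 of an out), while balls hit harder than 80 mph count at their actual results. The function and argument names are hypothetical, and the counts in the example are made up.

```python
SOFT_HIT_RATE = 0.158  # league batting average on balls under 80 mph (from the text)


def adjusted_babip(soft_balls, hard_balls, hard_hits):
    """Adjusted BABIP allowed.

    soft_balls: balls in play under 80 mph in the 12-degree plane
    hard_balls: balls in play over 80 mph (home runs excluded)
    hard_hits:  actual hits allowed on those hard balls in play
    """
    total = soft_balls + hard_balls
    if total == 0:
        raise ValueError("no balls in play")
    expected_soft_hits = SOFT_HIT_RATE * soft_balls
    return (expected_soft_hits + hard_hits) / total


# Made-up pitcher: 300 soft balls, 200 hard balls in play, 90 hits on the hard ones.
print(round(adjusted_babip(300, 200, 90), 4))  # 0.2748
```

The pitcher's actual results on soft contact drop out entirely, so two pitchers with the same mix of soft and hard contact and the same results on hard contact get the same adjusted BABIP.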
Such an adjusted BABIP metric has a better split-half correlation for pitchers with at least 300 balls in play in 2008, with a correlation coefficient of r=0.26, compared to r=0.13 for normal BABIP. In fact, if we further adjust the BABIP by also crediting the pitcher with .536 of an out for every ball in play hit harder than 80 mph, we can improve the split-half correlation coefficient to r=0.34. These pitchers had an average of 244 balls in each half of the sample, which means that we would regress their BABIP toward league average by adding in 464 balls in play at league-average BABIP. If we do that, here are the pitchers in 2008 with the best and worst regressed adjusted BABIP allowed.
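The regression step can be sketched as follows. A common rule of thumb, assumed here rather than taken from the text, is that a split-half correlation of r on halves of n balls implies adding n(1 - r)/r league-average balls. With the rounded r=0.34 that gives about 474, so the 464 figure presumably comes from an unrounded r. The function names are hypothetical.

```python
def regression_constant(n_per_half, split_half_r):
    """Number of league-average balls to add, using the rule of thumb
    n * (1 - r) / r (an assumption, not confirmed by the article)."""
    return n_per_half * (1.0 - split_half_r) / split_half_r


def regress_to_mean(observed_babip, balls_in_play, league_babip, added_balls=464):
    """Blend a pitcher's observed (adjusted) BABIP with the league average
    by adding added_balls balls in play at the league rate."""
    weighted = observed_babip * balls_in_play + league_babip * added_balls
    return weighted / (balls_in_play + added_balls)


print(round(regression_constant(244, 0.34), 1))
# Made-up pitcher: .250 adjusted BABIP on 464 balls, league at .300.
# With equal weights the regressed value lands halfway between the two.
print(round(regress_to_mean(0.250, 464, 0.300), 3))
```

The fewer balls in play a pitcher has, the more his regressed figure is pulled toward the league average.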
Once again, these numbers are based upon balls in play from 2008 for which we have HITf/x data. Ideally, we would use HITf/x data from multiple years to make this calculation more accurate, and Mariano Rivera might look even more outstanding than he does here.
Of course, one could develop an expected BABIP metric based upon something more sophisticated than a two-fold division by batted ball speed in the 12-degree plane. One could also make a similar metric for batters.
What about line drives? Do pitchers control the number of line drives they allow? As we discussed in the first article of this study, Bradbury and Gassko found from standard batted ball data that pitchers do not control their line drive rates. One season of HITf/x data is probably a better indicator than the standard batted ball data, but it is still not a big enough sample to come to a firm conclusion on the question. The number of line drives allowed by any given pitcher in one season is small and becomes smaller when split in half for correlation. However, the HITf/x data does offer some instructive insights on the nature of line drives.
First, we can divide line drives roughly into two categories. One category is balls hit hard in the air on a low trajectory; this makes up about three quarters of line drives. The other category is balls blooped over the heads of the infielders. If these balls were hit harder, they would turn into outs because the outfielders would catch them. This category makes up about one quarter of line drives.
The split-half correlation for pitcher line-drive rates, as best as it can be measured with this sample of data, is higher for the hard-hit category of line drives (r=0.19) than it is for the bloop-hit category of line drives (r=0.06). There is also some suggestion that a pitcher who gives up more of one type of line drive allows fewer of the other type. These conclusions beg for confirmation from a multi-year sample, however.
Another interesting way to look at the pitcher line drive rate question is to look at the batted ball characteristics of pitchers based upon how many line drives they allow. For example, we can divide the pitchers from 2008 into three groups based upon their regressed rate of hard-hit line drives allowed. Then, we can compare the average distribution of vertical launch angles for the batted balls allowed by these groups. (I defined hard-hit line drives as balls with a VLA between seven and 20 degrees and a speed of greater than 80 mph in the 12-degree plane.)
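The hard-hit line drive definition in the parenthetical translates directly into a small classifier. This is a sketch only: the function names are hypothetical, the VLA bounds are assumed to be inclusive, and the rate is taken over all of a pitcher's batted balls.

```python
def is_hard_hit_line_drive(vla_deg, speed_12_mph):
    """VLA between 7 and 20 degrees (bounds assumed inclusive) and
    speed greater than 80 mph in the 12-degree plane."""
    return 7.0 <= vla_deg <= 20.0 and speed_12_mph > 80.0


def hard_hit_ld_rate(batted_balls):
    """Share of batted balls that are hard-hit line drives.
    batted_balls: list of (vla_deg, speed_12_mph) pairs."""
    if not batted_balls:
        raise ValueError("no batted balls")
    hits = sum(1 for vla, spd in batted_balls if is_hard_hit_line_drive(vla, spd))
    return hits / len(batted_balls)


# Made-up batted balls: only the first one qualifies.
sample = [(10.0, 95.0), (10.0, 70.0), (30.0, 100.0), (-5.0, 90.0)]
print(hard_hit_ld_rate(sample))  # 0.25
```

The raw rate would still need to be regressed before pitchers are sorted into groups, as the text describes.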
The group that allowed the most hard-hit line drives, unsurprisingly, allowed the most batted balls with a vertical launch angle around +15 degrees, which is near the center of typical line drive launch angles. Since we defined the three groups based on line-drive rates, the difference in that VLA range follows by definition. However, what is somewhat more interesting is that the pitchers who allowed more hard-hit line drives allowed fewer groundballs.
A different way to present the same information is to look at the difference in the number of batted balls in each VLA bin between the group of pitchers with a high line drive rate and the group with a low line drive rate.
The pitchers who gave up the most hard-hit line drives allowed about two percent more of their batted balls in the VLA range of 10-15 degrees, about two percent more in the 15-20 degree bin, and almost one percent more in the 5-10 degree bin, as compared to the pitchers who gave up the fewest hard-hit line drives. In other words, the group of pitchers who allowed the most line drives had a line drive rate about five percent higher than the group who allowed the fewest line drives.
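A comparison like this can be sketched by binning each group's launch angles into five-degree bins, converting the counts to shares of all batted balls, and subtracting. The function names are hypothetical and the angles in the example are invented.

```python
from collections import Counter
from math import floor


def vla_bin_shares(vlas, width=5):
    """Share of batted balls in each VLA bin, keyed by the bin's lower edge."""
    if not vlas:
        raise ValueError("no batted balls")
    counts = Counter(floor(v / width) * width for v in vlas)
    return {lower: n / len(vlas) for lower, n in counts.items()}


def share_difference(high_group_vlas, low_group_vlas, width=5):
    """Per-bin share for the high line-drive group minus the low group."""
    high = vla_bin_shares(high_group_vlas, width)
    low = vla_bin_shares(low_group_vlas, width)
    bins = sorted(set(high) | set(low))
    return {b: high.get(b, 0.0) - low.get(b, 0.0) for b in bins}


# Invented angles: the high group has more balls in the 10-15 degree bin
# and fewer in the -15 to -10 degree bin.
diff = share_difference([12, 13, -10, 30], [12, -10, -12, 30])
print(diff)
```

Because each group's shares sum to one, the differences across all bins sum to zero: any surplus of line drives has to be offset by a deficit somewhere else in the distribution.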
The pitchers who allowed the most line drives also allowed fewer ground balls. Between one-half and one percent fewer of their total batted balls fell in each of the VLA bins between -30 and -5 degrees, as compared to the pitchers who allowed the fewest line drives.
I frankly am unsure what to make of this observation. Do “line-drive” pitchers really allow fewer groundballs? Does this mean that groundball pitchers allow fewer line drives?
To attempt to shed more light on the question, I split the batted balls randomly in half for all pitchers with at least 300 balls in play and grouped the pitchers by their rate of hard-hit line drives allowed in one half of the sample. Then, I looked at the VLA distribution for the pitcher groups in the other half of the sample.
The in-sample curve looks very similar to the previous graph, as expected. It is based upon the same data and a similar group of pitchers, but from half the sample instead of the full year. The out-of-sample curve shows how the pitcher groups performed relative to each other in the other half of the sample where I did not control for line-drive rate. The pitchers who allowed many hard-hit line drives in one half of the sample still allowed more line drives in the other half of the sample, though only about a fifth as many. They also allowed fewer groundballs. This seems to be a persistent feature.
What happens if we select our pitcher groups based upon rate of ground balls in the VLA range of -30 to 0 degrees (labeled GB* in the following graph), roughly the range where the line-drive pitchers show a dip in batted balls allowed?
These pitchers are preferentially groundballers, and they allow fewer line drives and fly balls in the half of the sample I used to select them. In the other half of the sample, they retain most of their groundball tendencies and practically all of their tendency to avoid fly balls. However, they do allow more line drives on low trajectories (with VLA between five and 15 degrees).
These results are in keeping with Brian Cartwright’s finding, based upon HITf/x data collected from April-June 2009 and recently published in The Hardball Times Baseball Annual 2012:
Due to the lower vertical angle of all balls off the pitches from groundball pitchers, balls hit in the air are more likely to be hits. This is the result of two things: air balls off groundball pitchers are more likely to be line drives, and balls categorized as “outfield flies” are more likely to be hits. All because of the lower vertical angle.
There is an alternative way to think about these batted balls that does not focus on the likely outcomes, which are based upon the dimensions of a baseball field and the typical positions of fielders. Instead, we can think about what constitutes square and solid contact with the baseball, independent of what this means about where the ball may land. Though it is more productive for a batter to hit slightly under the center of the ball, resulting in a line drive over the infielders, perhaps a batter hitting the ball squarely or just slightly over the center of the ball is also indicative of a pitch that was easy to contact solidly.
There is some evidence that batters use a slight upswing in order to have a better chance of contacting pitches that are descending due to gravity as they cross the plate. Dr. Robert Adair theorized in The Physics of Baseball that the optimum swing angle for making contact was about eight to 10 degrees above horizontal. Matt Lentzner and I found that the optimum batter performance occurred on pitches that were descending at about a seven-degree angle.
Perhaps the most solid contact by the batter is thus centered on a vertical launch angle of seven degrees? If so, we might get a better idea of whether a pitcher’s offerings are easy to square up if we include a larger region of solid contact centered on a VLA of seven degrees.
In order to investigate this hypothesis, I grouped the pitchers based upon their rate of solid contact allowed, defined as batted balls with a VLA between -10 and +25 degrees and an hSOB of at least 75 mph, in one half of the sample.
The solid contact rate does have some persistence from one split half of the sample to the other, similar to the persistence for line drive rate as typically defined.
We can tentatively conclude that pitchers do have some persistent line-drive skill, or solid-contact skill. That skill appears to be related to the overall “shape” of their batted ball distribution.
The reasons that balls fall for hits are complex. We saw in the previous article that pitchers have significant control over how hard batted balls are hit. We saw here that the speed of the batted ball interacts with the vertical launch angle to determine whether the ball is likely to fall for a hit.
About a quarter of hits are of the blooper and bleeder variety. The other three quarters of base hits are hit harder, smashed between fielders or blasted over the heads of outfielders. Pitchers do not seem to have much control over whether the bloopers and bleeders turn into hits. However, they do have control over how many bloopers and hard smashes they allow. Further research is needed to determine why some pitchers allow more hard-hit balls than others do.
HITf/x data has revealed limitations of the current DIPS metrics based upon batted ball categories. HITf/x-based metrics offer a great deal of promise for improvement over DIPS-based metrics in pitcher performance evaluation. However, it would be interesting to know if current public batted ball data, such as that from Gameday, could be adapted to capture additional BABIP effects, based on what we have learned from HITf/x.
Further research is also needed to better understand and quantify what pitchers and batters are doing to create weakly-hit or solidly-hit batted balls. There are many potential avenues of investigation. Certainly, factors such as pitch location, type, and speed could be important. Batter stances, swing planes, and swing speeds could also be important factors, though they are more difficult to measure. Ultimately, a model of the ball-bat collision utilizing data about the incoming pitch trajectory and outgoing batted-ball trajectory could be the most useful explanatory tool.
Thanks to Sportvision and MLBAM for providing the HITf/x data for the study. Thanks to Colin Wyers for his input and feedback.