September 15, 2009
Ahead in the Count
The BABIP Superstars
It is well known that pitchers have little control over Batting Average on Balls In Play (BABIP), and that an easy way to tell whether a pitcher is due to improve or regress is to check whether his BABIP differs significantly from the league average of about .300. If he has surrendered hits on balls in play at a rate significantly above .300, he is probably going to see that come down, and if his BABIP is significantly below .300, it is probably going to climb back toward the league average. It is also well known that hitters have more control over their BABIP, but not that much; a starting position player will hit about 500 balls in play per season, meaning that the standard deviation of his yearly BABIP is probably about .020. Much of the observed difference in hitter BABIP is therefore fleeting, as about one-third of players' BABIP marks miss their true skill level by .020 or more. Even so, there are many hitters who consistently put up high BABIP rates.
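The .020 figure can be checked with a short calculation. What follows is a minimal sketch, assuming each ball in play is an independent trial with the same hit probability (a binomial model, which is a simplification) and using a normal approximation for the one-third claim; the function names are illustrative only.

```python
import math


def babip_standard_deviation(true_babip: float, balls_in_play: int) -> float:
    """Standard deviation of observed BABIP under a binomial model."""
    return math.sqrt(true_babip * (1.0 - true_babip) / balls_in_play)


def share_outside_one_sd() -> float:
    """Share of a normal distribution lying more than one SD from its mean."""
    return 1.0 - math.erf(1.0 / math.sqrt(2.0))


sd = babip_standard_deviation(0.300, 500)
print(f"SD of a one-season BABIP: {sd:.4f}")
print(f"Share of seasons off by more than one SD: {share_outside_one_sd():.3f}")
```

Under these assumptions the standard deviation comes out just above .020, and a little under one-third of seasons land more than one standard deviation from the hitter's true level.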
Many people believe that BABIP success derives from line-drive proficiency, but line-drive rate has only a .17 year-to-year correlation. Even though the difference between a line drive and a fly ball is often subjective, independent records often show very similar numbers, so scoring subjectivity is unlikely to be the reason this correlation is so low. GB/FB ratio, however, has a .77 year-to-year correlation; this is important, because the BABIP on ground balls is much higher than that on fly balls and popups (.236 this year, versus .137 for fly balls if you include popups, and just .175 even for outfield fly balls alone). BABIP on line drives is higher still (.724), but hitters do not consistently put up high line-drive BABIPs.
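One way to read these correlations is as the share of a hitter's deviation from average that should be expected to carry over to the next season. The sketch below assumes a simple linear relationship with similar spread in both years; it is not the projection model described in this article, and the function name and the two-standard-deviation example are hypothetical.

```python
def expected_repeat(z_score: float, year_to_year_r: float) -> float:
    """Expected next-year deviation, in standard deviations from average."""
    return year_to_year_r * z_score


# Year-to-year correlations quoted in the paragraph above.
YEAR_TO_YEAR_R = {"line-drive rate": 0.17, "GB/FB ratio": 0.77}

for stat, r in YEAR_TO_YEAR_R.items():
    carried = expected_repeat(2.0, r)
    print(f"{stat}: +2.00 SD this year, about {carried:+.2f} SD expected next year")
```

On this reading, a hitter two standard deviations above average in line-drive rate keeps only about a third of one standard deviation of that edge, while the same standing in GB/FB ratio mostly persists.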
Looking at statistics such as these, I developed a model for predicting BABIP that uses batted-ball numbers and a few other relevant statistics to do a better job of projecting it. This model is meant to augment a strong projection system that can rely on other sources of information. PECOTA has success projecting players because it is able to find comparable players using all player seasons since World War II. This enables PECOTA to find players with similar statistics and characteristics, and to project them more accurately than other systems. What PECOTA naturally misses while gaining this advantage is batted-ball rates; to gain its forecasting edge, PECOTA must compare players with those from many years ago, before there were accurate records of ground balls, fly balls, line drives, and pop flies, and certainly before there were accurate records of BABIP on each of these batted-ball types. In fact, by using this information, my model has a higher correlation with current BABIP than even PECOTA does, as I discussed earlier this season, citing examples of its success. Of course, including this information would likely force PECOTA to surrender its ability to project walks, strikeouts, and home runs, as well as remove its advantage in using historical data, but for this one statistic, my model has projected 2009 BABIPs more accurately thus far, even without using simple adjustments like park effects and aging.
In this article, I will discuss the ten players who were at the top of my projections for BABIP this year, and I will explain why the model expected them to do so well. In fact, nine of these ten players have above-average BABIPs this season (the exception being Chipper Jones), and eight of these ten players have BABIPs of .337 or higher. Below, we will delve into how to discover these BABIP Superstars.
Derek Jeter: The new all-time Yankee hits leader is not a major home-run hitter, with only 223 in his career, and he is not a superb contact hitter (he strikes out in 17 percent of his at-bats, just below the league average of about 20 percent). To make up for that, he records hits at an extraordinary rate when he does make contact. My system saw him continuing this trend in 2009, projecting him at .362, and his current BABIP is .361. Jeter's ground-ball rates ranged from 59 to 61 percent in 2006-2008, so it was not hard to see him getting a lot more hits on balls in play than other players, since the average ground-ball rate is about 45 percent. Further, the average major leaguer hits popups about 7.8 percent of the time, but Jeter averaged a little under two percent over the last three years, and is sitting at only 0.8 percent this year. Popups to the infield are almost always outs, and avoiding them will naturally raise a player's BABIP.
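To see how much a batted-ball mix alone can matter, overall BABIP can be treated as a weighted average of the BABIP on each batted-ball type. The sketch below is a rough illustration and not the projection model itself. It uses the 2009 league per-type figures quoted earlier, and it makes three assumptions that do not come from this article: a 19 percent line-drive share for everyone, popups counted as automatic outs, and batted-ball shares treated as shares of balls in play (ignoring home runs).

```python
# Per-type BABIP: 2009 league figures quoted earlier in the article,
# except the popup value, which is an assumption (popups as automatic outs).
LEAGUE_BABIP_BY_TYPE = {
    "ground_ball": 0.236,
    "line_drive": 0.724,
    "outfield_fly": 0.175,
    "popup": 0.0,
}

# Assumption: the article does not give a league line-drive share.
ASSUMED_LINE_DRIVE_SHARE = 0.19


def blended_babip(ground_ball_share: float, popup_share: float,
                  line_drive_share: float = ASSUMED_LINE_DRIVE_SHARE) -> float:
    """Weight league per-type BABIP by a hitter's batted-ball mix."""
    shares = {
        "ground_ball": ground_ball_share,
        "line_drive": line_drive_share,
        "popup": popup_share,
    }
    # Whatever is left over is treated as outfield fly balls.
    shares["outfield_fly"] = 1.0 - sum(shares.values())
    return sum(shares[kind] * LEAGUE_BABIP_BY_TYPE[kind] for kind in shares)


league_mix = blended_babip(ground_ball_share=0.45, popup_share=0.078)
jeter_like_mix = blended_babip(ground_ball_share=0.60, popup_share=0.02)
print(f"League-average mix: {league_mix:.3f}")
print(f"Jeter-like mix:     {jeter_like_mix:.3f}")
```

Even with league-average results on every batted-ball type, a mix of roughly 60 percent ground balls and two percent popups adds close to twenty points of BABIP under these assumptions.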
Jeter also managed to put up BABIPs on ground balls ranging from .276 to .287 over the previous three years; this is above the league average of about .240. Much of this comes from his rate of infield hits per ground ball, which averaged about 8.8 percent (compared to the league average of about six percent), but his outfield hits per ground ball averaged about 20 percent, which is above the league average of 18 percent as well. The rate of infield hits per ground ball is more consistent than outfield hits per ground ball, primarily due to its correlation with speed, but both are persistent statistics. As it happens, Jeter has not done particularly well on ground balls this season, sitting at about a .226 BABIP on them, but he has weathered this through a very high success rate on outfield fly balls. Outfield fly-ball BABIP averages about .175 this year, but Jeter is at .229, even higher than the .208 he recorded over the last three years. Outfield fly-ball BABIP has a decently high year-to-year correlation (.22), though not as high as ground-ball BABIP (.32). Line-drive BABIP has a low year-to-year correlation (.12), but it does correlate with home-run rate. Jeter has been above average this year on line drives (.811, above the league average of .729) despite not being a home-run hitter; home-run hitters are more typically the ones with high line-drive BABIPs. His line-drive BABIP is likely to regress somewhat, but his temporarily lower-than-normal ground-ball BABIP indicates that he probably will continue to put up high BABIP marks.
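Because every ground-ball hit either stays in the infield or reaches the outfield, the two per-ground-ball rates should add up to BABIP on ground balls. A minimal sketch of that bookkeeping, using the rounded rates from the paragraph above (the function name is illustrative only):

```python
def ground_ball_babip(infield_hits_per_gb: float, outfield_hits_per_gb: float) -> float:
    """BABIP on ground balls as the sum of its two components."""
    return infield_hits_per_gb + outfield_hits_per_gb


league = ground_ball_babip(0.06, 0.18)
jeter_2006_2008 = ground_ball_babip(0.088, 0.20)
print(f"League average:   {league:.3f}")
print(f"Jeter, 2006-2008: {jeter_2006_2008:.3f}")
```

The league components reproduce the ground-ball average of about .240, and Jeter's rounded three-year rates land at roughly the top of his reported .276 to .287 range.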
Joe Mauer: My model saw Mauer, another BABIP Superstar, at around .345, and he has topped that by clocking in at .370. Mauer averaged about a 52 percent ground-ball rate over 2006-2008 and has been around 50 percent this year. He does particularly well at avoiding popups, hitting them on only 1.5 percent of his batted balls. This cuts down on his automatic outs, allowing him to reach base more frequently. Mauer hits his ground balls very hard, so even though he has only a 4.2 percent rate of infield hits per ground ball this season, he has a 26.6 percent clip of outfield hits per ground ball, good for a .308 BABIP on ground balls. This is not all that likely to continue, but Mauer has averaged nearly 22 percent outfield hits per ground ball, notably above the league average of 18 percent. Part of Mauer's success on ground balls can be predicted from his solid contact rate; he has made contact with the ball on 87 percent of his swings this year, above the league average of 80.5 percent. Hitters who make better contact are less likely to hit weak ground balls, as there is a positive correlation between contact rate and BABIP on ground balls (and BABIP in general). My suspicion is that players who often miss the ball completely also often nub it off the bottom of the bat, causing them to hit weak ground balls.
Even though Mauer's BABIP on ground balls is likely to regress, his BABIP on outfield flies should improve. This year, he is only around the league average on outfield fly-ball BABIP (.184), but has been markedly above average at around .221 over the last three years. His home runs per at-bat have gone up from 1.7 percent last year to 5.8 percent this time around, and this new-found power this season might explain some of this difference, as fewer deep flies count towards BABIP if they leave the park. As some of these home runs start to land on the wrong side of the fence (from his perspective), they will probably land for extra-base hits more frequently and drive his outfield fly-ball BABIP back up towards his career norms.
Ichiro Suzuki: Ichiro is widely renowned as an unconventional hitter. Projection systems frequently fall short on him, as his .357 career BABIP continues to vex them. My system had him at only .338, far below his current .387. Some of this is likely luck, as his line-drive BABIP is .769 this year, about a hundred points higher than his line-drive BABIP for 2006-2008. However, Ichiro's success on balls in play was certainly foreseeable thanks to his incredibly high ground-ball rate (58.6 percent) and his incredibly high rate of infield hits per ground ball: 16.7 percent this year (about 13 percent over the previous three seasons), crushing the league average of 6.1 percent. Ichiro also rarely pops out, but only because he rarely hits the ball in the air; his rate of popups per fly ball is average.
Michael Young: This season, I projected Young to hit .341 on balls in play, and he was hitting .351 before going down with a hamstring injury. As I mentioned in my article in May, he's a rare exception to the rule that a player's line-drive rate is not persistent; Young has a 24.2 percent line-drive rate this year, after putting up marks of 21.6, 27.8, and 22.5 percent in the previous three years. Brian Cartwright has pointed out that some of this may be due to the league-leading line-drive park factor of the Rangers' home park (and perhaps to friendly scoring decisions, a possibility further supported by his slightly below-average line-drive BABIP). However, no one else on the Rangers has consistently put up line-drive rates as high as Young's, and even if his numbers are inflated, his swing simply appears to lend itself to high line-drive rates. His strong swings have also led to high rates of outfield ground-ball hits, as he has a rate of 21.1 percent outfield hits per ground ball, consistent with his career norms. Young also pops out incredibly rarely, doing so on only 2.7 percent of the balls he has hit this year, which is about one-third of the league average. This combination of his unique line-drive swing and his low pop-up rate has led to Young's BABIP success.
Chipper Jones: Chipper is the one player from the top ten of my BABIP projections who has a below-average BABIP this season. After he put up a .383 BABIP last year and I projected him at .346 this year, Chipper has managed to reach safely at only a .292 clip on balls in play. The reason that he has so consistently put up high BABIP marks is that he hits the ball extremely hard. He has a below-average rate of reaching safely on infield hits (3.7 percent), but he hit his ground balls to the outfield quite frequently up until this year; Chipper had 22 percent of his ground balls reach the outfield in 2006-2008, but in 2009, he has seen just 15 percent of his grounders reach the outfield. His rate of fly-ball hits has gone down as well. His BABIP on outfield flies has fallen from an average of .243 over 2006-2008 to .139 this year, well below the league average of .175. The plummeting rates of both reaching base safely on fly balls and shooting grounders through to the outfield indicate that Chipper is simply hitting the ball much more softly than in previous years. Even as Chipper has maintained his impressively low pop-up rate (4.2 percent), he has not hit the ball as hard and has seen his BABIP suffer. As with everything else he does, his health is going to be the key determinant as to whether he can recover his BABIP Superstardom.
Matt Holliday: Holliday's .337 BABIP this season nearly matches my .341 projection for him going into the year. Although Holliday left Colorado and its BABIP-friendly dimensions, he has still managed to put up hits on balls in play at an impressive pace. Much of Holliday's success derives from a very high BABIP on ground balls, and this year is no exception, as he currently has a ground-ball BABIP of .316. This is due partly to his high infield hit rate (10.6 percent) and partly to his high outfield hit rate (21 percent). The high rate of infield hits per ground ball is somewhat new to Holliday, who went from 6.3 to 7.9 to 10.1 percent across 2006-2008 before besting himself yet again this year, but he has always hit ground balls through to the outfield, consistently posting rates between 21 and 23.2 percent over the last several seasons. Holliday's strong power skills also allow his line drives to go farther, as he has a .767 BABIP on line drives this year.
Other Top Ten Players in BABIP Projections: The six players listed above were the players for whom my model, looking at 2006-2008 numbers, produced the highest BABIP projections. For players who did not have 300 PA in each of these seasons, I had to use alternative projections based on smaller samples. These were inherently more conservative, but they still suggested that four other players would have very high BABIPs, and all four of them have.
First, there's Matt Kemp. Despite his power tendencies and relatively high strikeout rate, Kemp is also a BABIP Superstar; his speed sets him apart from other hitters with that profile, as reflected in his 12.8 percent rate of infield hits per ground ball this season. He also rarely pops out when he makes contact, doing so only 4.6 percent of the time this season, and only 2.0 percent of the time in 2008. He hits the ball very hard as well, allowing many line drives and outfield flies to fall in. This year, he has a BABIP of .359, even higher than my .346 projection.
I projected Yunel Escobar to have a .338 BABIP, and he has managed a .321. One reason that Escobar has had a high BABIP in the past is his high ground-ball rate, which was 56.6 percent in 2007 and 58.1 percent in 2008. This has fallen to only 50.5 percent in 2009, explaining some of the lower-than-expected BABIP. He has, however, continued to avoid popups, hitting them on just 3.7 percent of his balls in play this year, mirroring last year's numbers. Despite his speed, he does not get many hits on ground balls or even many infield hits, and has been about average in both categories. Instead, his batted-ball rates are the primary determinant of Escobar's BABIP skills. Although he has done quite well on fly balls this year (hitting .302 on outfield flies), that mark is unlikely to stay so high, so keeping his BABIP from falling will probably require bringing that ground-ball rate back up.
Even though Fred Lewis and Denard Span had only one season with over 300 PA apiece, my system was still able to see their high BABIP clips in 2009 coming. Both were projected at .341, and they have 2009 BABIP marks of .359 and .355, respectively. Their high BABIP rates were foreseeable primarily due to their high ground-ball rates and low pop-up rates. Lewis has also succeeded at getting outfield ground-ball hits, while Span has succeeded at getting outfield fly-ball hits.
Where Do We Go From Here?
Looking at these ten BABIP Superstars, we can see several consistent trends emerging, even though there is no trait that all of these players have in common. One obvious source of high BABIP is the combination of a high ground-ball rate and a low pop-up rate. As ground balls lead to hits in play more frequently than fly balls, and line-drive rate is not terribly persistent, a high ground-ball rate can help a player improve his BABIP. Avoiding popups helps as well. Speed is obviously a trait common to several of these players; it helps both because it increases the odds of ground balls becoming hits without reaching the outfield, and because an infield playing in to guard against this has less reaction time to keep balls from getting through to the outfield anyway. Line-drive BABIP is not a common source of high BABIP for the BABIP Superstars, as line-drive BABIP is mostly correlated with a skill for hitting for power, and few of these players are true power hitters. However, many of them have at least decent power, which keeps outfielders from cheating in.
This analysis relied on blunt, discrete classifications of batted-ball types. The difference between a fly ball and a line drive is somewhat subjective, and there is even some murkiness between an outfield fly ball and an infield popup. With less precise estimates of what we are trying to measure, a regression formula like the one that produced my BABIP projections will err on the side of caution, projecting players too close to average. As HITf/x will allow researchers to be more precise in their estimates of what happens after the ball is struck, it is important to keep in mind what we can learn from the data on these discrete batted-ball types. It appears that some hitters are able to consistently record high rates of ground balls reaching the outfield, and HITf/x will hopefully explain what kind of ground balls these are. Harry Pavlidis has shown us that the vertical angle at which a ground ball is struck has a large effect on the slugging average on contact for that ground ball, and perhaps this explains why better contact hitters succeed more on ground-ball hit rates: they are probably less likely to hit the ball at an extreme angle, instead pounding it into the ground. HITf/x will likely be even more helpful in determining why some players succeed on outfield fly balls and line drives. Knowing that fly-ball BABIP has higher persistence than line-drive BABIP could be a clue as to who is more likely to continue putting up high BABIP rates as more data becomes available. Although projections using these more precise, continuous descriptions of batted-ball types are likely to be quite accurate, that data is quite messy; without knowing how to analyze this data in general categories, we may miss important information.
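The point about imprecise classifications pulling projections toward average is the classical attenuation effect in regression. As a hedged sketch, assuming a single predictor measured with independent random noise (the projection model in this article uses several inputs, and every number below is hypothetical), the slope a regression can be expected to recover shrinks by the share of the measured variance that is real signal:

```python
def attenuated_slope(true_slope: float, signal_variance: float,
                     noise_variance: float) -> float:
    """Expected least-squares slope when the predictor carries independent noise."""
    reliability = signal_variance / (signal_variance + noise_variance)
    return true_slope * reliability


# Hypothetical values: a true slope of 1.0 and increasingly noisy measurement.
for noise in (0.0, 0.5, 1.0):
    slope = attenuated_slope(true_slope=1.0, signal_variance=1.0, noise_variance=noise)
    print(f"noise variance {noise:.1f}: expected slope {slope:.2f}")
```

The noisier the batted-ball labels, the smaller the weight a regression places on them, which is why sharper measurements should let projections move further from the league average.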