It is well known that pitchers have little control over Batting Average on Balls In Play (BABIP), and that an easy way to know a pitcher is due to improve or regress is to look at his BABIP and see whether it is significantly different than the league average of about .300. If he has surrendered hits on balls in play at a rate significantly above .300, he is probably going to see that come down, and if his BABIP is significantly below .300, he is probably going to return from whatever depths. It is also well known that hitters have more control over their BABIP, but not that much; a starting position player will hit about 500 balls in play per season, meaning that the standard deviation of his yearly BABIP is probably about .020, meaning that much of observed differences in hitter BABIP are fleeting, as one-third of players’ BABIP marks belie their true skill level by .020 points or more. Even still, there are many hitters who consistently put up high BABIP rates.
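As a rough check on the .020 figure, each ball in play can be treated as an independent trial with a true hit probability of about .300. The sketch below assumes that binomial model and a normal approximation for the tails; both are simplifying assumptions, and the function names are hypothetical.

```python
import math

def babip_sd(true_babip: float, balls_in_play: int) -> float:
    """Binomial standard deviation of an observed BABIP."""
    return math.sqrt(true_babip * (1.0 - true_babip) / balls_in_play)

def share_beyond(z: float) -> float:
    """Two-sided normal tail: share of outcomes more than z SDs from the mean."""
    return math.erfc(z / math.sqrt(2.0))

# Assumed inputs from the text: true BABIP near .300, about 500 balls in play
sd = babip_sd(0.300, 500)
print(f"SD over 500 balls in play: {sd:.4f}")
print(f"Share off by one SD or more: {share_beyond(1.0):.3f}")
```

Under these assumptions the standard deviation comes out near .020, and a little under one-third of hitters land at least that far from their true level in a given season.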

Many people believe that BABIP success derives from line-drive proficiency, but line-drive rate only has a .17 year-to-year correlation. Even though the difference between a line drive and a fly ball is often subjective, independent records often show very similar numbers, so that’s unlikely to be the reason why this correlation is so low. However, GB/FB ratio has a .77 year-to-year correlation; this is important, because the BABIP on ground balls is much higher than that of fly balls and popups (.236 this year, versus .137 for fly balls if you include popups, and even just .175 for outfield fly balls). BABIP on line drives is much higher (.724), but hitters do not consistently put up high line-drive BABIPs.
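To see why the batted-ball mix matters, the league BABIP figures quoted above can be weighted by a hitter's share of each batted-ball type. This is only an illustrative sketch, not the projection model itself: the two mixes are hypothetical, the 19 percent line-drive share is an assumption, and the shares are taken to be fractions of balls in play.

```python
import math

# League BABIP by batted-ball type, as quoted in the text
LEAGUE_BABIP = {
    "ground_ball": 0.236,
    "fly_ball_incl_popups": 0.137,
    "line_drive": 0.724,
}

def expected_babip(mix: dict) -> float:
    """Weight league BABIP by a hitter's batted-ball shares (must sum to 1)."""
    if not math.isclose(sum(mix.values()), 1.0, abs_tol=1e-9):
        raise ValueError("batted-ball shares must sum to 1")
    return sum(share * LEAGUE_BABIP[kind] for kind, share in mix.items())

# Hypothetical mixes with the same assumed line-drive share
average_mix = {"ground_ball": 0.45, "fly_ball_incl_popups": 0.36, "line_drive": 0.19}
ground_heavy_mix = {"ground_ball": 0.60, "fly_ball_incl_popups": 0.21, "line_drive": 0.19}

print(f"average mix:      {expected_babip(average_mix):.3f}")
print(f"ground-ball mix:  {expected_babip(ground_heavy_mix):.3f}")
```

Holding the line-drive share fixed, trading fly balls for ground balls raises the expected figure even if the hitter is only league average on each type.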

Looking at statistics such as these, I developed a model of predicting BABIP using batted-ball numbers and a few other relevant statistics to do a better job. This model augments a strong projection system that can rely on other sources of information. PECOTA has success projecting players because it is able to find other comparable players using all player seasons since World War II. This enables PECOTA to find players with similar statistics and characteristics, and project them more accurately than other systems. What PECOTA naturally misses while gaining this advantage is batted-ball rates; to gain its forecasting edge, PECOTA must compare players from many years ago before there were accurate records of ground balls, fly balls, line drives, and pop flies, and certainly before there were accurate records of BABIP on each of these batted-ball types. In fact, by using this information, my model has a higher correlation with current BABIP than even PECOTA does, as I discussed earlier this season, citing examples of its success. Of course, including this would likely force PECOTA to surrender its ability to project walks, strikeouts, and home runs, as well as removing its advantage in using historical data, but for this one statistic, my model was able to more accurately project 2009 BABIPs thus far, even without using simple adjustments like park effects and aging.

In this article, I will discuss the ten players who were at the top of my projections for BABIP this year, and I will explain why the model expected them to do so well. In fact, nine of these ten players have above-average BABIPs this season (the exception being Chipper Jones), and eight of these ten players have BABIPs of .337 or higher. Below, we will delve into how to discover these BABIP Superstars.

Derek Jeter:
The new all-time Yankee hits leader is not a major home-run hitter, with only 223 in his career, and he is not a superb contact hitter (he strikes out in 17 percent of his at-bats, just below the league average of about 20 percent). To make up for that, he records hits at an extraordinary rate when he does make contact. My system saw him continuing this trend in 2009, projecting him at .362, and his current BABIP is .361. Jeter’s ground-ball rates ranged between 59 and 61 percent in 2006-2008, so it was not hard to see him getting a lot more hits on balls in play than other players, since the average ground-ball rate is about 45 percent. Further, the average major leaguer hits popups about 7.8 percent of the time, but Jeter averaged a little under two percent over the last three years, and is sitting at only 0.8 percent this year. Popups to the infield are almost always outs, and avoiding them will naturally raise a player’s BABIP.

Jeter also managed to put up BABIPs on ground balls ranging from .276 to .287 over the previous three years; this is above the league average of about .240. Much of this comes from his infield hits per ground ball averaging about 8.8 percent (compared to the league average of about six percent), but his outfield hits per ground ball averaged about 20 percent, which is above the league average of 18 percent as well. The rate of infield hits per ground ball is more consistent than outfield hits per ground ball, primarily due to its correlation with speed, but both are persistent statistics. As it happens, Jeter has not done particularly well on ground balls this season, sitting at about .226 BABIP on them, but he has weathered this through a very high success rate on outfield fly balls. Outfield fly-ball BABIP averages about .175 this year, but Jeter is at .229, even higher than the .208 number he has recorded over the last three years. Outfield fly-ball BABIP has a decently high year-to-year correlation (.22), though not as high as that of ground-ball BABIP (.32). Line-drive BABIP has a low year-to-year correlation (.12), but it does correlate with home-run rate. Jeter has been above average this year with BABIP on line drives (.811, above the league average of .729) despite not being a home-run hitter; home-run hitters are more typically the high line-drive BABIP hitters. This is likely to regress somewhat, but his temporarily lower-than-normal ground-ball BABIP indicates that he probably will continue to put up high BABIP marks.

Joe Mauer:
My model saw Mauer, another BABIP Superstar, at around .345, and he has topped that by clocking in at .370. Mauer averaged about a 52 percent ground-ball rate over 2006-2008 and has been around 50 percent this year. He also does particularly well at avoiding popups, hitting them on only 1.5 percent of his batted balls. This cuts down on his automatic outs, allowing him to reach more frequently. Mauer hits his ground balls very hard, so even though he has only had a 4.2 percent rate of infield hits per ground ball this season, he has a 26.6 percent clip of outfield hits per ground ball, good for a .308 BABIP on ground balls. This is not all that likely to continue, but Mauer has averaged nearly 22 percent outfield hits per ground ball, notably above the league average of 18 percent. Part of Mauer’s success on ground balls can be predicted from his solid contact rate; he has made contact with the ball on 87 percent of his swings this year, above the league average of 80.5 percent. Hitters who make better contact are less likely to hit weak ground balls, as there is a positive correlation between contact rate and BABIP on ground balls (and BABIP in general). My suspicion is that players who miss the ball completely also frequently nub it off the bottom of the bat, causing them to hit weak ground balls.
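The ground-ball figures quoted here fit together arithmetically: a ground-ball hit either stays in the infield or reaches the outfield, so ground-ball BABIP is roughly the sum of the two rates. A minimal sketch using the rates from the text (the function name is hypothetical):

```python
def ground_ball_babip(infield_hit_rate: float, outfield_hit_rate: float) -> float:
    """Ground-ball BABIP as infield hits per GB plus outfield hits per GB."""
    return infield_hit_rate + outfield_hit_rate

# Mauer in 2009: 4.2 percent infield hits, 26.6 percent outfield hits per ground ball
mauer_2009 = ground_ball_babip(0.042, 0.266)

# League average: about six percent infield hits, 18 percent outfield hits
league = ground_ball_babip(0.06, 0.18)

print(f"Mauer 2009 ground-ball BABIP: {mauer_2009:.3f}")
print(f"League ground-ball BABIP:     {league:.3f}")
```

Splitting the total this way shows where a hitter's edge comes from: Mauer is below the league on infield hits and far above it on grounders that get through.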

Even though Mauer’s BABIP on ground balls is likely to regress, his BABIP on outfield flies should improve. This year, he is only around the league average on outfield fly-ball BABIP (.184), but has been markedly above average at around .221 over the last three years. His home runs per at-bat have gone up from 1.7 percent last year to 5.8 percent this time around, and this new-found power this season might explain some of this difference, as fewer deep flies count towards BABIP if they leave the park. As some of these home runs start to land on the wrong side of the fence (from his perspective), they will probably land for extra-base hits more frequently and drive his outfield fly-ball BABIP back up towards his career norms.
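The link between home runs and BABIP follows from the standard definition, BABIP = (H - HR) / (AB - K - HR + SF), which drops home runs from both the numerator and the denominator. The sketch below uses an entirely hypothetical stat line (not Mauer's actual numbers) to show what happens when a few deep flies stay in the park and fall for hits:

```python
def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play; home runs count as neither hits nor chances."""
    return (hits - home_runs) / (at_bats - strikeouts - home_runs + sac_flies)

# Hypothetical season with 25 home runs
with_power = babip(hits=160, home_runs=25, at_bats=500, strikeouts=60, sac_flies=5)

# Same hypothetical season, but five of those home runs land in play for hits
fewer_homers = babip(hits=160, home_runs=20, at_bats=500, strikeouts=60, sac_flies=5)

print(f"25 home runs: {with_power:.3f}")
print(f"20 home runs: {fewer_homers:.3f}")
```

The effect only runs in this direction if the balls fall in; if those same flies were caught instead, the hit total would drop along with the home runs and BABIP would go down.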

Ichiro Suzuki:
Ichiro is widely renowned as an unconventional hitter. Projection systems frequently fall short on him, as his .357 career BABIP continues to vex them. My system only had him at .338, far below his current .387. Some of this is likely luck, as his line-drive BABIP is .769 this year, about a hundred points higher than his line-drive BABIP for 2006-2008. However, Ichiro’s success on balls in play was certainly foreseeable thanks to his incredibly high ground-ball rate (58.6 percent) and his incredibly high rate of infield hits per ground ball: 16.7 percent this year (about 13 percent over the previous three seasons), crushing the league average of 6.1 percent. Ichiro also avoids popping out much, but only because he avoids hitting the ball in the air a lot, as his rate of popups per fly ball is average.

Michael Young:
This season, I projected Young to hit .341 on balls in play, and he was hitting .351 before going down with a hamstring injury. As I mentioned in my article in May, he’s a rare exception to the rule that a player’s line-drive rate is not persistent; Young has a 24.2 percent line-drive rate this year, after putting up marks of 21.6, 27.8, and 22.5 percent in the previous three years. Brian Cartwright has pointed out that some of this may be due to the league-leading line-drive park factor of the Rangers’ home park (and perhaps to friendly scoring decisions, as further evidenced by his slightly below-average line-drive BABIP). However, no one on the Rangers has consistently put up line-drive rates as high as Young, and even if his numbers are inflated, his swing simply appears to lend itself to high line-drive rates. His strong swings have also led to high rates of outfield ground-ball hits, as he has a rate of 21.1 percent outfield hits per ground ball, consistent with his career norms. Young also pops out incredibly rarely, doing so on only 2.7 percent of the balls he has contacted this year, which is about one-third of the league average. This combination of his unique line-drive swing and his low pop-up rate has led to Young’s BABIP success.

Chipper Jones:
Chipper is the one player from the top ten of my BABIP projections who has a below-average BABIP this season. After putting up a .383 BABIP last year and my projecting him at .346 this year, Chipper has only managed to reach safely at a .292 clip on balls in play this year. The reason that he has so consistently put up high BABIP marks is that he hits the ball extremely hard. He has a below-average rate of reaching safely on infield hits (3.7 percent), but he has hit his ground balls to the outfield quite frequently up until this year; Chipper had 22 percent of his ground balls reach the outfield in 2006-2008, but in 2009, he has seen just 15 percent of his grounders reach the outfield. His rate of fly-ball hits has gone down as well. His BABIP on outfield flies has fallen from an average of .243 over 2006-2008 down to .139 this year, well below the league average of .175. Both the plummeting rates of reaching base safely on fly balls and of shooting his grounders through to the outfield indicate that Chipper is simply hitting the ball much more softly than in previous years. Even as Chipper has maintained his impressively low pop-up rate (4.2 percent), he has not hit the ball as hard and has seen his BABIP suffer. As with everything else he does, his health is going to be the key determinant as to whether he can recover his BABIP Superstardom.

Matt Holliday:
Holliday’s .337 BABIP this season nearly matches my .341 projection for him going into this year. Although Holliday left Colorado and its BABIP-friendly dimensions, he has still managed to put up hits on balls in play at an impressive pace. Much of Holliday’s success derives from very high BABIP on ground balls, and this year is no exception, as he currently has a ground-ball BABIP of .316. This is partly due to his high infield hit rate (10.6 percent) and also his high outfield hit rate (21 percent). The high rate of infield hits per ground ball is somewhat new to Holliday, who went from 6.3 to 7.9 to 10.1 percent across 2006-2008 before besting himself yet again this year, but he has always hit ground balls through to the outfield, consistently between 21 and 23.2 percent over the last several seasons. Holliday’s strong power skills also allow his line drives to go farther, as he has a .767 BABIP on line drives this year.

Other Top Ten Players in BABIP Projections:
The six players listed above were the players for whom my model looked at 2006-2008 numbers and produced the highest BABIP projection. For players who did not have 300 PA in each of these seasons, I had to use alternative projections using smaller data. These were inherently more conservative, but they did suggest four other players would have very high BABIPs anyway, and all four of them have.

First, there’s Matt Kemp. Despite Kemp’s power tendencies and relatively high strikeout rate, Kemp is also a BABIP Superstar, but he’s unlike other players similar to him in those regards because of his speed; a reflection of that is that Kemp has a 12.8 percent rate of infield hits per ground ball this season. He also rarely pops out when he doesn’t miss, doing so only 4.6 percent of the time this season, and only 2.0 percent of the time in 2008. He also hits the ball very hard, allowing many line drives and outfield flies to fall in as well. This year, he has a BABIP of .359, even higher than my .346 projection.

I projected Yunel Escobar to have a .338 BABIP and he has managed a .321. One reason that Escobar has had a high BABIP in the past is his high ground-ball rate, which was 56.6 percent in 2007 and 58.1 percent in 2008. This has fallen to only 50.5 percent in 2009, explaining some of the lower-than-expected BABIP. He has however continued to avoid popups, doing so on just 3.7 percent of his balls in play this year, mirroring last year’s numbers. Despite his speed, he does not get many hits on ground balls or even many infield hits, and has been about average in both categories. Instead, his batted-ball rates are the primary determinant of Escobar’s BABIP skills. Although he has done quite well on fly balls this year (hitting .302 on outfield flies), this is unlikely to stay that high, so keeping his BABIP from falling will probably require bringing that ground-ball rate back up.

Even though Fred Lewis and Denard Span only had one season with over 300 PA apiece, my system still was able to see their high BABIP clips in 2009 coming. Both were projected at .341, and have 2009 BABIP marks of .359 and .355. Their high BABIP rates were foreseeable primarily due to their high ground-ball rates and low pop-up rates. Lewis has also succeeded at getting outfield ground-ball hits, while Span has succeeded at getting outfield fly-ball hits.

Where Do We Go From Here?

Looking at these ten BABIP Superstars, we can see several consistent trends emerging, even though there is no trait that all of these players have in common. One obvious source of high BABIP is the combination of a high ground-ball rate and a low pop-up rate. As ground balls lead to hits in play more frequently than fly balls, and line-drive rate is not terribly persistent, a high ground-ball rate can help a player improve his BABIP. Avoiding popups helps as well. Speed is obviously a common trait to several of these players, both because it increases the odds of ground balls becoming hits without reaching the outfield, and because infielders who play in to guard against this have less reaction time to stop balls from getting through to the outfield anyway. Line-drive BABIP is not a common source of high BABIP for the BABIP Superstars, as line-drive BABIP is mostly correlated with a skill for hitting for power, and few of these players are true power hitters. However, many of them have at least decent power, which keeps outfielders from cheating in.

This analysis relied on blunt, discrete classifications of batted-ball types. The difference between a fly ball and a line drive is somewhat subjective, and there is even some murkiness between an outfield fly ball and an infield popup. With less precise estimates of what we are trying to measure, a regression formula like the one that produced my BABIP projections will err on the side of caution, projecting players too close to average. As HITf/x will allow researchers to be more precise in their estimates of what happens after the ball is struck, it is important to keep in mind what we can learn from the data on these discrete batted-ball types. It appears that some hitters are able to consistently record high rates of ground balls reaching the outfield, and HITf/x will hopefully explain what kind of ground balls these are. Harry Pavlidis has shown us that the vertical angle at which a ground ball is struck has a large effect on the slugging average on contact for that ground ball, and perhaps this explains why better contact hitters succeed more on ground-ball hit rates: they probably are less likely to hit the ball at an extreme, instead pounding it into the ground. HITf/x will likely be even more helpful in determining why some players succeed on outfield fly balls and line drives. Knowing that fly-ball BABIP has higher persistence than line-drive BABIP could be a clue as to who is more likely to continue putting up high BABIP rates as more data becomes available. Although projections using these more precise, continuous descriptions of batted-ball types are likely to be quite accurate, that data is quite messy; without knowing how to analyze this data in general categories, we may miss important information.
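The point about imprecise inputs pulling projections toward average can be illustrated with a small simulation. This is a generic sketch of measurement-error attenuation on synthetic data, not a reconstruction of the BABIP model: the skill, outcome, and noise scales are all arbitrary assumptions chosen for illustration.

```python
import random

def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic "true skill" and an outcome that depends on it one-for-one
true_skill = [rng.gauss(0.0, 1.0) for _ in range(5000)]
outcome = [s + rng.gauss(0.0, 0.5) for s in true_skill]

# A blunt measurement of the same skill, with noise as large as the skill itself
noisy_measure = [s + rng.gauss(0.0, 1.0) for s in true_skill]

slope_clean = ols_slope(true_skill, outcome)
slope_noisy = ols_slope(noisy_measure, outcome)

print(f"slope on true skill:        {slope_clean:.2f}")
print(f"slope on noisy measurement: {slope_noisy:.2f}")
```

With noise equal in variance to the underlying skill, the fitted slope shrinks to roughly half its clean value, so a player who measures far from average is projected only about half as far from average as his true skill warrants.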

Matt - Very interesting stuff. But it seems like a lot of this can be attributed to guys hitting the ball "hard" when they hit it on the ground (or hitting more line drives), which will hopefully be borne out more by HitFX. Seeing as hitting the ball hard is a prime objective of hitting well, how much is this list bound to be, year in and year out, just a catalogue of the "best" hitters by other metrics, too? These 10 aren't replacement level guys, but mostly all-stars - not just "BABIP All-Stars". Did you predict - successfully or otherwise - high BABIPs for more average (or even mediocre) players? If not, does that suggest a lessened usefulness of BABIP?
Dan, thanks for your comment. You're exactly right that BABIP Superstars are bound to be regular Superstars. 70% of PA end in a ball in play, so it will carry a lot of weight. It's also why discerning the origins of BABIP by batted ball type is so important.

The best hitters do succeed in other ways though, and it's not simply a matter of just hitting the ball hard. Note the absence of Albert Pujols on the list. My model projected his BABIP at .313 and he's at .295. Homerun hitters hit the ball hard and also pop out more frequently. They also tend to pull the ball a lot (e.g. Howard, Dunn) which leads to lower BABIPs on groundballs specifically. A lot of the top guys in BABIP are groundball hitters too. Luis Castillo I had at .324 and he's at .347. Speed guys, too.

Even if HITf/x removes the noise from line drive rate a little bit by having a mathematical description, line drive rate still won't have enough persistence to explain BABIP.

The key usefulness in this approach is to discover which hitters are likely to maintain their BABIP and which aren't. Geovany Soto was an example I cited where he was seeing his line drives caught too infrequently to sustain his level of production. Francoeur hit too many popups to produce a decent BABIP. A key piece of information embedded in this article is that although most young players with high BABIPs will see their numbers fall over time, guys like Kemp, Lewis, and Span are particularly likely to keep their level of production because of the way that their numbers break down. You would regress a high BABIP like theirs a lot further towards the mean if they didn't have some of the skills that they do.
One thing about your article.... You did not mention WHERE the line drives were hit. One thing in particular about Michael Young is that he hits a very high number of line drives to right field (for a right handed hitter). Some of these are actually caught by the first baseman, second baseman, and right fielder, but many of them also turn into hits. He very rarely pulls the ball and when he does, he almost always hits either a ground ball or a fly ball, very rarely is it a line drive.

From what I saw of Mauer, he sprays his line drives everywhere and has great plate coverage and bat control.

Ichiro tends to hit the ball either on a line to left field, or on the ground somewhere (often for a hit, due to his speed out of the box). He doesn't tend to hit many fly balls at all.

No comment on Jeter or the others, as their at bats didn't really stick out for me.
Very good point. I've seen the ability to spread the ball around the field cited as a key factor of BABIP. It's tough to get data on that, so when I wrote the original article on another website before I joined BP, I didn't have access to it. In attempting to do analysis on smaller samples, I found that this opposite field skill was particularly useful in predicting BABIP when you only had the previous year's data available to you.

For guys like Ichiro who have been around a few years, their ability to go the other way adds less information. The reason is that his BABIP on each batted ball type would already be higher than others if he goes the other way well, and that would drive up the projection anyway.

Excellent point, though, and something I didn't have a chance to address in the article. Thanks.
Good to see a projection system that actually gets Ichiro right. Perhaps integrating this into PECOTA in some way (finding historically similar BABIP, perhaps) would improve it as well.
Matt, it would be great to see a spreadsheet for predicted 2010 BABIPs once the season is over. Is that something you could do? Thank you.
This is interesting as it relates to batters. As for pitchers, we know that pitchers only have a small amount of control over whether or not a ball in play is converted to a hit, but I believe they do have more control over the batted ball type (sinkers tend to produce more ground balls, etc.). I'm a little confused as to what a pitcher should strive for as it relates to batted ball type. I seem to recall an article where Brian Bannister mentioned that for 2008 he focused striking more batters out, but in doing so he gave up more line drives and thus had worse results. He changed his approach to try to give up a greater percentage of ground balls, and he has had some success at that.

I would have thought ground balls would be what a pitcher would want to get when he's not striking them out. Line drives are converted to hits a lot and fly balls turn into home runs a lot. That leaves grounders, but if I'm reading this right then hitters should strive for groundballs as well. What am I missing here? Could it be that grounders might yield base hits more than flyballs but produce worse results overall (since there will be fewer extrabase hits)?
I would guess the reason you get at near the end is exactly right. Specifically, you will never see a ground ball home run, so that's one big reason that flyballs can be damaging to pitchers.
Agreed. If you include homeruns, batting average on flyballs is similar to groundballs and slugging average is way higher.
Really great stuff Matt. There are way too many people that have clung to LD numbers because it's supposedly better than nothing, without any real quantitative basis. This is probably as good as we'll get pre-hitfx.
What did your model project Kelly Johnson's 2009 BABIP to be? And based on his numbers this year, what is his 2010 projection? He's had an abysmal season, but also a ridiculously low .245 BABIP mark...
It was set to be .320. Clearly missed the mark there! Looking through his numbers, it looks like his swing is probably off this season maybe? His line drive rate is way down and he's popping up more. He doesn't strike out more than last year and he seems to be walking at about the same pace too. I look at his numbers and I wouldn't expect him to stay this bad based on them, but perhaps scouting could probably say more about what's up with him.
Isn't there something to be said for Mauer and Span's success on ground balls being correlated to playing half their games on the turf at the Metrodome? Was playing surface factored in to your projections? The new FieldTurf is more natural than grass playing surfaces but it's still not the same (I've played at the Rogers Centre). Good read Matt!
Absolutely park effects would have a huge effect. Brian Cartwright has done some research on these kinds of things and told me that even groundball outfield hit rates have large swings in park effects. Adjusting for team would certainly help. This model was a rough model including some basics. Adjusting for aging factors, specifically with respect to some speed related factors, would be helpful as well. Very good point.
This is a great article. However, on one point I'm confused: is the line drive year-to-year correlation .17 (2nd paragraph) or .12 (6th paragraph)? Or am I misreading?
Line Drive/BIP has a year-to-year correlation of .17
Line Drive BABIP has a year-to-year correlation of .12
So there's slightly more consistency in ability to produce line drives than in ability to hit 'em (line drives) where they ain't? But not much persistent ability to do either? That makes sense.