In Part One of this series, I updated a model for projecting BABIP, building on my previous work from last year. I showed that BABIP can be predicted successfully from batted-ball rates and the BABIP on each individual batted-ball type.

Hitters who square up the ball better will hit more line drives and make more contact in general, so they will have higher BABIPs. Hitters who swing on more of a downward plane will also have higher BABIPs, since ground balls become hits more often than outfield fly balls do, and those hitters will hit fewer pop-ups as well. Similarly, hitters who hit the ball harder in general will have higher BABIPs, and so will faster players, particularly if they hit a lot of ground balls. While most hitters fall close to an average BABIP of around .300, a few are well above average and, for a number of reasons, can be expected to post high BABIPs consistently. Last September, I called this group "The BABIP Superstars," and many of these hitters are the same as last year’s group. Here are the top 10 Expected BABIPs for 2010:

1. Derek Jeter

Expected BABIP: .351

PECOTA BABIP: .331

2009 BABIP: .368

Jeter swings on a downward plane that generates a lot of ground balls while still producing a decent number of line drives. His ground-ball rate has ranged between 57 and 59 percent in recent years (league average is 45 percent), and his line-drive rate has ranged between 19 and 22 percent (league average is 19 percent). Jeter also rarely pops the ball up, doing so 1.7 percent of the time in 2007, 2.5 percent in 2008, and just 0.7 percent in 2009 (league average is 7.8 percent). He is pretty good at hitting ground balls through the infield, though that skill has regressed somewhat in recent years. He still reaches on infield ground balls at a good rate, between 13 and 17 percent (league average is 11 percent). The league-average rate of hits on outfield fly balls is about .170, while Jeter has ranged anywhere from average to as high as .280 in recent years.

2. Matt Kemp

Expected BABIP: .346

PECOTA BABIP: .341

2009 BABIP: .345

Matt Kemp is one of the few hitters who can put up a high BABIP without making great contact; he struck out in 23 percent and 21 percent of his plate appearances the last two years (league average is 17 percent). When he does hit the ball, though, Kemp hits it very hard. He produced solid line-drive rates of 20 percent and 22 percent the last two years, which obviously helps his BABIP. He is also very good at reaching on infield ground balls, doing so at 16-percent and 22-percent clips the last two seasons (league average is 11 percent). He rarely pops up (just 1.8 percent and 4.2 percent in 2008-09, below the league average of 7.8 percent), which minimizes the automatic outs. He also does a good job of getting hits on outfield fly balls: his outfield fly-ball BABIP was a solid .216 last year (league average is .170). Finally, Kemp has home-run power, which indicates that he hits the ball hard enough for it to fall in as well.

3. Ichiro Suzuki

Expected BABIP: .346

PECOTA BABIP: .346

2009 BABIP: .384

Like Jeter, Ichiro has a solid downward swing that produces a lot of ground balls in the hole and minimizes pop-ups. His ground-ball rate has ranged between 56 and 58 percent in recent years, while his pop-up rate has stayed between 4.5 and 6.5 percent. Ichiro also reaches on infield ground balls at roughly twice the league average, doing so 22 percent, 22 percent, and 24 percent of the time the last three years, against a league average of 11 percent. He also makes very good contact, which indicates how well he squares the ball up.

4. Howie Kendrick

Expected BABIP: .340

PECOTA BABIP: .332

2009 BABIP: .338

Kendrick was not even on last year’s BABIP Superstars list, but he’s all the way up to fourth this year. Like Jeter and Ichiro, Kendrick has a downward plane to his swing that generates a lot of ground balls (54 percent, 55 percent, and 55 percent in 2007-09) and very few easy pop-ups (3.2, 2.1, and 1.0 percent). He also does a good job of getting hits on outfield flies (.311, .278, and .186 in 2007-09). When he keeps the ball in the infield, he is usually above average at reaching first as well, doing so at 18-percent, 10-percent, and 17-percent rates in recent years (league average is 11 percent).

5. Michael Young

Expected BABIP: .339

PECOTA BABIP: .335

2009 BABIP: .351

Young is simply a line-drive machine. He has not hit line drives on fewer than 21.5 percent of his balls in play in any of the last seven seasons, remarkable consistency for a statistic with a year-to-year correlation of just .37, and his line-drive rates were 28 percent and 23 percent the previous two years. His strong swing also leads to very few pop-ups: just 1.3 percent, 2.8 percent, and 2.9 percent of the time the last three years. He is also good at hitting hard ground balls, getting them through the hole 19-20 percent of the time in each of the last four years, above the league average of 17 percent.

6. Joe Mauer

Expected BABIP: .339

PECOTA BABIP: .334

2009 BABIP: .373

Mauer is yet another player with a downward plane to his swing, with ground-ball rates of 51 percent, 55 percent, and 51 percent in the last three years. He also popped up at incredibly low rates of 1.9, 3.2, and 1.5 percent over that span. His shockingly high .373 BABIP in 2009 was largely due to 26 percent of his ground balls reaching the outfield; in earlier years that rate was more modestly above average, ranging between 18 and 21 percent over the previous five seasons. Mauer is good at getting fly balls to fall in the outfield too, with outfield fly-ball BABIPs of .198, .220, and .196 the last three years, above the league average of .170. He also makes solid contact, missing on about 10 percent of his swings, roughly half the rate at which other hitters whiff. Overall, Mauer just hits, but you knew that.

7. Fred Lewis

Expected BABIP: .338

PECOTA BABIP: .330

2009 BABIP: .348

Certainly not a *regular* superstar by any means, Lewis is at least a BABIP Superstar if he can actually hit the ball in play often enough. Unfortunately, Lewis has struck out in 24 percent and 25 percent of his PA the last two years, which is the reason (coupled with a home-run rate below 2 percent) that he likely will not start for the Giants this year. However, when Lewis *does* hit the ball, he hits a lot of ground balls (55 percent and 51 percent the last two years), few popups (2.6 percent and 4.3 percent), and reaches on his infield ground balls at a high rate too (14 percent and 15 percent).

8. Shin-Soo Choo

Expected BABIP: .336

PECOTA BABIP: .339

2009 BABIP: .370

Much of Choo’s .370 BABIP in 2009 was luck, as it is for anyone who posts a .370 BABIP, but he still possesses a good amount of BABIP skill, as evidenced by his .367 BABIP in 2008. He avoids pop-ups (5.3 percent and 3.4 percent the last two years), while getting hits to fall in the outfield at a good rate (27 percent and 24 percent vs. a league average of 17 percent). He also gets his ground balls through the hole at a good rate (26 percent and 21 percent vs. a league average of 17 percent). Choo definitely has a swing that drives the ball, but he is still unlikely to sustain his extremely high BABIP from last year.

9. David Wright

Expected BABIP: .335

PECOTA BABIP: .345

2009 BABIP: .394

David Wright’s power outage in 2009 was masked by his ridiculously high BABIP. Like every other hitter in the league, Wright is not good enough to reach on 39 percent of his balls in play every year, but he does have real skill at getting hits to fall in. The main reason is his solid line-drive rate, which has been 25 percent, 25 percent, and 26 percent the last three years. He also reaches on a decent share of his infield ground balls, at a 15-percent rate the last couple of years. And he has historically hit for power, which is a good proxy for the ability to hit the ball hard.

10. Joey Votto

Expected BABIP: .335

PECOTA BABIP: .319

2009 BABIP: .372

Votto hits a lot of line drives (24 percent and 25 percent the last two years) and few popups (3.1 percent, 3.3 percent), which certainly explains most of his BABIP skill. His power (4.6 percent and 5.3 percent HR/AB in the last two years) also indicates he is hitting the ball very hard, as does his outfield fly-ball BABIP, which went from .196 in 2008 up to .281 in 2009.

**BABIP Trouble in Cleats**

The BABIP Superstars are easier to spot, because hitters with good BABIP skills tend to keep their jobs. Still, some hitters keep their jobs without getting many hits on balls in play. Since most plate appearances end with a ball in play, hitters who are truly awful at BABIP rarely stick in the majors, so these are certainly not the worst hitters in the world at BABIP, just the worst among projected major-leaguers for next year. Starting from the bottom, here are the BABIP troublemakers.

1. Edwin Encarnacion

Expected BABIP: .262

PECOTA BABIP: .276

2009 BABIP: .245

The flip side of the BABIP Superstars’ downward plane is the BABIP Troublemakers’ upward plane. Encarnacion has posted ground-ball rates of just 38 percent, 34 percent, and 36 percent the previous three years, while popping up on 15 percent, 20 percent, and 10 percent of his balls in play. He reached on fewer infield ground balls in 2009 (just 5 percent, after 13 percent in 2007 and 15 percent in 2008), and he also got fewer of his ground balls through to the outfield (trending from 23 percent in 2007 to 16 percent in 2008 and down to just 7 percent in 2009).

2. Rod Barajas

Expected BABIP: .262

PECOTA BABIP: .268

2009 BABIP: .229

Barajas has the upward plane problem too, with just 33 percent, 37 percent, and 30 percent for his ground-ball rates the last three years, and high pop-up rates of 16 percent, 12 percent, and 15 percent to boot. Barajas also struggles to hit line drives, doing so just 13 percent, 16 percent, and 15 percent of the time the last three years. Barajas is also very slow, which makes it hard for him to beat out infield hits, though he was league average last season at reaching on infield ground balls (11 percent) after being around 7 percent the previous two years.

3. Geoff Blum

Expected BABIP: .263

PECOTA BABIP: .281

2009 BABIP: .266

Blum is pretty much a guy with average home-run and strikeout rates who is also pretty slow and has an upward plane to his swing. He has also had some trouble hitting line drives. His ground-ball rates have been 38 percent, 37 percent, and 37 percent the last three years, while his pop-up rates have been 10 percent, 13 percent, and 14 percent, and his line-drive rates have been 19 percent, 14 percent, and 18 percent.

4. Chris Young

Expected BABIP: .264

PECOTA BABIP: .285

2009 BABIP: .268

PECOTA projects a rebound from Young in 2010 after he struggled last year. However, his swing would need to flatten back out for that to happen. His ground-ball rate fell from 38 percent in 2007 and 39 percent in 2008 to 28 percent in 2009, and his pop-up rate correspondingly jumped from 12 percent in 2007 and 11 percent in 2008 to a massively disappointing 22 percent in 2009. He does not make great contact, but his speed partly redeems him: he has reached on 16 percent, 14 percent, and 16 percent of his infield ground balls.

5. Pat Burrell

Expected BABIP: .267

PECOTA BABIP: .274

2009 BABIP: .271

Burrell has always had an uppercut swing, but along the way he has lost some of the power that kept his BABIP up. His ground-ball rate has ranged from 31 to 34 percent the last few years, while his pop-up rate went from 12 percent in 2007 and 2008 to 9 percent in 2009. He is among the slowest players in baseball, so he won’t beat out many infield hits, and as his power has faded, so has his outfield fly-ball BABIP. Over the last six years it has trended downward, from .164, .181, and .188 to .124, .142, and then just .103 last year. Without a power resurgence, Burrell is going to have a lot of trouble, since his BABIP is low and his strikeout rate is pretty high. Much of his career value has come from home runs and walks, but with fewer home runs, pitchers won’t want to walk him either.
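To put a rough number on Burrell’s decline, here is a minimal sketch that fits an ordinary least-squares line to the six outfield fly-ball BABIPs listed above. The season index 0 through 5 is an assumption standing in for the actual years, and a straight line is only a summary: the middle seasons bounce around it.

```python
# Pat Burrell's outfield fly-ball BABIP, oldest season first (values from the text above).
of_fb_babip = [0.164, 0.181, 0.188, 0.124, 0.142, 0.103]

def ols_slope(ys):
    """Least-squares slope of ys against a season index 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    x_mean = (n - 1) / 2          # mean of 0..n-1
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

slope = ols_slope(of_fb_babip)
print(f"{slope:+.4f} per season")  # prints -0.0139 per season
```

A slope of about -.014 per season means the fitted line drops by roughly .07 from the first season to the last.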

**Conclusion**

Looking through the BABIP Superstars and the BABIP Troublemakers, it’s pretty clear that the important factors in sustaining a high BABIP are a downward plane to the swing combined with the ability to square up the ball, plus some power and some speed. Part Three of this series will discuss the hitters for whom PECOTA and E-BABIP differ the most.

#### Thank you for reading

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.

In yesterday's comment section, I recall you mentioning that you'd make the full spreadsheet of e-BABIP available, but I don't see a link above. Will this still be made available?

Thanks!

The reason it predicts same-year BABIP pretty well is that BABIP on line drives is about .730 and on everything else it's about .190. The gap between the best and worst hitters at line-drive BABIP is only about .680-.780, and the gap between the best and worst at non-LD BABIP is probably only .140-.280. So if you weight those two groups, you can always get close IF YOU KNOW THE % OF LINE DRIVES. The real problem is that line-drive rate has only a .37 year-to-year correlation, making it relatively less reliable than knowing a player's speed, power, contact, and GB, OFFB, and PU rates.
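As a minimal sketch of the two-group weighting described in that comment, assuming the rough league figures it quotes (.730 on line drives, .190 on everything else) rather than any fitted E-BABIP coefficients, same-year BABIP follows from line-drive rate alone:

```python
# Rough league-wide BABIP on line drives vs. all other balls in play (figures quoted in the comment).
LD_BABIP = 0.730
NON_LD_BABIP = 0.190

def two_group_babip(ld_rate, ld_babip=LD_BABIP, other_babip=NON_LD_BABIP):
    """Weighted average of line-drive and non-line-drive BABIP for a given LD rate."""
    if not 0.0 <= ld_rate <= 1.0:
        raise ValueError("ld_rate must be a fraction between 0 and 1")
    return ld_rate * ld_babip + (1.0 - ld_rate) * other_babip

# League-average line-drive rate of 19 percent.
print(round(two_group_babip(0.19), 3))  # prints 0.293
```

Under these averages, each extra percentage point of line-drive rate adds about .005 of BABIP (.730 minus .190, times .01), which is why a shaky LD% forecast carries straight through to a shaky BABIP forecast.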

Your data is much more predictive and I agree with everything you said; the real breakthrough you've made is making the prediction of LD% and, consequently, BABIP more reliable, and that's something I think we all find quite useful.

Mauer and Votto especially not only have good BABIPs to all fields, but also have good power to the opposite field, a trait not shared by very many big league players.

I would guess that this is just another indicator of superior bat control that also helps them maintain solid BABIP rates.

I've been trying to replicate what they have using Retrosheet's info (because it's free!), but not finding much success so far.

They have stats for each batter for balls hit "As L to R", meaning how many balls the batter hit as a lefty to right field and what happened on those balls. So Joe Mauer went to left field on 160 plate appearances, to center field on 152, and to right field on 153. Very even. But more importantly, he had his best stats when he went to left field (his opposite field), when the vast majority of hitters suck going the opposite way and are far better pulling the ball.

It's easily the best part of their site, but, as you mentioned, a lot of the best use of this data is in aggregation and they don't offer that. You need to have the actual Play-by-Play and reconstruct what they have to get the aggregate info and "type every number in for each individual player." And their play-by-play data comes from BIS, I think. Which costs $$$

A high line drive rate is better than

a high ground ball rate, which is better than

a high fly ball rate, which is better than

a high pop-up rate.

However, as SIERA shows, "run prevention improves as groundball rate increases".

For groundballs to be beneficial to both the hitter and the pitcher feels like a paradox. Is there an easy way out of that?

My article last week on SIERA and BABIP actually showed that pitchers who allow more fly balls have lower BABIPs, which is partly why the "cost" of a fly ball isn't as high for a metric like SIERA as it is for other metrics that are based on run estimation of individual outcomes. Since pitchers who allow more fly balls also allow more home runs, a fly ball is a bad thing for run prevention in both metrics, but it's not as bad because SIERA implicitly assumes that their BABIPs also go down a little.

I guess the real difference is home runs. BABIP is part of hitter skill, but certainly not all.

Look at it this way: assume that by knowing BABIP we also know the OBP for each batted-ball type (no one walks on a batted ball).

What would the SLG be for each batted ball type? That progression would go fly > line drive > grounder > popup. That's what makes line drive hitting clearly the best overall, and ground balls and fly balls a lot closer to equivalent until you factor in the double play consideration.

Getting on base is an important part of hitting, but it isn't the -only- part of hitting. Extra base hits are a pretty big deal, too, and ground balls only do that if you rip them down the 1st or 3rd baselines into the corner.

It's a good question. I would guess it's more useful at higher levels than lower levels, and you would probably need to come up with some sort of Minor League Equivalencies for each metric to convert it. Also, since ballparks are so different across the minors, you couldn't get away without park effects like the ones I've attempted to build in. Once you figured out the MLEs and PFs, you could probably do well to convert minor-league infield hit rate to the majors, considering that major-league third basemen are better, and to convert outfield fly-ball BABIP to reflect better-fielding outfielders. The real question is how much line-drive rate would carry over. I would guess that although major-league pitchers don't differ much in their LD% abilities, there is probably a lot more difference between the LD% of MLB pitchers and pitchers in high-A ball. If you converted all these things, you really might be able to do something like that, assuming you were careful.

At a more qualitative level, you could probably do a lot. What is the groundball/flyball ratio of this guy in AAA? Does it indicate a downward plane to his swing? Does pop-up rate shed any light? How often is he reaching on infield hits? How much power does he have? What's his contact rate? How well does he get ground balls to the outfield and fly balls to fall in the outfield compared to other hitters with similar skills? Is this indicative of a good ability to spread the ball around?

Once you know all this, you can probably pin down where a hitter is. A speedy ground ball hitter who spreads the ball around, makes good contact, and rarely pops up might be a .330 BABIP kind of guy. A power hitter who rarely hits ground balls but spreads the ball around well might be a .285 guy. A power hitter who pulls everything but hits everything very hard and rarely pops up might be a .315 guy. Things like that are very informative.

1. Braun - He has a .338 career BABIP and it was .353 in '09. His batted ball data just doesn't appear to support this. He pops up a lot, has hit a below average amount of line drives, and though he is very athletic, he doesn't possess elite speed. Most projections seem to peg him for a ~.334 BABIP.

2. Bruce - Everyone knows he had a low BABIP last year (.221), but I'm wondering what it "should" have been given neutral luck. I'm guessing around .275-.280.

These articles have been great. I tried that THT xBABIP calculator a while back, but I thought it was spitting out some weird results. I get the sense that yours will be much more reliable.

What is the advantage of taking fly balls out of the analysis when looking at hitters? Are HR/flyball rates that much more constant for hitters than your other four component rates of batting average?

At the end of the day, what's the point in saying, "OK, good, I have his BABIP, now I just need to add in his HR/(not K not BIP) rate to calculate a batting average"?

There's also another reason, though. Say you have 100 balls in play: 40 ground balls, 20 line drives, 30 outfield fly balls, and 10 pop-ups. If none of those are home runs, then line drives make up only 20% of your balls in play. However, if you're Prince Fielder and 10 of every 30 outfield fly balls leave the yard, then 20 of every 90 balls in play are line drives, or 22% instead of 20%, which would drive BABIP up by about .010. So there are two effects there.
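The arithmetic in that example can be checked with a short sketch. The function name and the assumption that every home run comes from an outfield fly ball are illustrative choices, matching the Prince Fielder scenario rather than any official definition.

```python
def ld_share_of_bip(gb, ld, of_fb, pu, hr=0):
    """Line-drive share of balls in play once home runs are removed.

    Assumes (for illustration) that every home run comes from an outfield fly ball.
    """
    if hr > of_fb:
        raise ValueError("home runs cannot exceed outfield fly balls")
    in_park = gb + ld + (of_fb - hr) + pu   # home runs are not balls in play
    return ld / in_park

no_hr = ld_share_of_bip(40, 20, 30, 10)
fielder = ld_share_of_bip(40, 20, 30, 10, hr=10)
print(f"{no_hr:.1%} -> {fielder:.1%}")  # prints 20.0% -> 22.2%
```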

Step 2. Proceed

Does just analyzing balls in play make it more likely that you will correctly project a player's batting average? That's the goal (unless you want to say the goal is projecting their batting average as a portion of their OBP), isn't it?

Look at your top 10 guys - you've got 7 or 8 guys who historically have hit for high averages - and two guys with comparatively small sample sizes (Votto and Lewis). If all this analysis does is concludes that Joe Mauer, Michael Young and Ichiro are likely to hit for high averages - everybody already knew that.

The point of projection is to say that Michael Young and Joe Mauer will not only have "highish batting averages" but to pinpoint how high. To the extent that you can increase the accuracy of how high you think it will be, projecting BABIP has value. Do you believe projection is a useful concept at all?

Does breaking down BABIP into "component parts" make the prediction of batting average more accurate?

You haven't shown anywhere that it does, have you? As far as I can tell you've made an academic exercise basically trying to improve on other models' prediction of BABIP, which presumably causes their estimation of batting average to be off.

But what you haven't shown, and now it's becoming apparent you haven't even thought about it, is whether using this model actually improves on the prediction of batting average over a simple weighted mean. If I take Marcel and add in park factors, and you take PECOTA and revise the BABIPs with your new model, who comes out better at predicting batting average?

"They also were closer to actual BABIP than PECOTA was 57 percent of the time, and closer to actual BABIP than CHONE was 60 percent of the time. That might not seem like much, but those fractions are significant at the 95-percent and 99-percent level, respectively. In other words, there is less than a five-percent chance that there would have been that large of a difference between my BABIP model and PECOTA if they were equally as good, and less than a one-percent chance that the model would have beaten CHONE as badly as it did just by chance."
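The quoted claim rests on a head-to-head count: for each player, which projection landed closer to actual BABIP. One standard way to test such a count is an exact one-sided sign (binomial) test, sketched below under stated assumptions. The quote does not say which test was used or how many players were compared, ties are ignored, and the sample sizes in the loop are hypothetical, so the printed p-values only illustrate how significance at a fixed 57 percent win rate depends on sample size; they do not reproduce the published result.

```python
from math import comb

def sign_test_p(wins, n):
    """One-sided exact binomial p-value: P(X >= wins) if each comparison were a coin flip."""
    if not 0 <= wins <= n:
        raise ValueError("wins must be between 0 and n")
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

# Hypothetical numbers of players compared; the quote reports win rates, not counts.
for n in (100, 300, 600):
    wins = round(0.57 * n)
    print(n, wins, round(sign_test_p(wins, n), 4))
```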

Implicit in this is that if you add in the same number of HR and SO that PECOTA predicts, but change the number of hits and outs on balls in play to E-BABIP, you will have a superior prediction of AVG.
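That implicit step can be made explicit. A minimal sketch, assuming the standard BABIP definition (H - HR)/(AB - SO - HR + SF) and a made-up stat line: hold a projection's at-bats, strikeouts, home runs, and sacrifice flies fixed, swap in a different BABIP, and the batting average follows directly.

```python
def avg_from_babip(babip, ab, so, hr, sf=0):
    """Batting average implied by a BABIP, given fixed AB, SO, HR, and SF totals.

    Inverts the standard definition BABIP = (H - HR) / (AB - SO - HR + SF).
    """
    balls_in_play = ab - so - hr + sf
    hits = hr + babip * balls_in_play
    return hits / ab

# Hypothetical stat line: only BABIP changes between the two rows.
for babip in (0.300, 0.330):
    print(f"BABIP {babip:.3f} -> AVG {avg_from_babip(babip, ab=600, so=100, hr=20, sf=5):.3f}")
```

With 485 balls in play out of 600 at-bats in this hypothetical line, each .030 of BABIP moves batting average by about .024, so any gain in BABIP accuracy carries over to AVG at roughly that ratio.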

If you're going to make flat out accusations and hurl insults, you should at least do a brief scan of the articles to make sure I didn't address this. I also talked about it in last year's "BABIP Superstars" article and last year's "You Can Beat PECOTA Without a Computer Model" articles.

Who was the person who said "How can you honestly read these two articles and actually type your Step 1?"? That was immature and rude, particularly when I have had to repeat myself two times and couldn't get you to answer the question.

What I am saying is that looking at hits/(flyball in the ballpark) is nonsensical. Why not just look at hits/flyball? Why are you focusing only on BABIP?

Me: "disregard BABIP"

You: "Look here's my quote, and it says BABIP three times and I compare it to two systems which predict BABIP".

Me: "Yeah, but why are you using BABIP?"

You: "I regressed BABIP, it's better than CHONE!"

Me: "Yeah, but why are you using BABIP?"

You: "Quit insulting me"

Seriously, have a colleague explain it to you. I can only ask a person a question three times before I lose my patience. I am sure you're very good at mathematics; it's the ability to take a step back and apply reasoning that you're lacking.

The point of discussing hitter BABIP was mentioned in the BABIP Roundtable we did a couple months ago, and it's well worth reviewing if you're still skeptical. The key is that you want to regress the statistics that represent more luck and less skill more than you want to regress the statistics that represent less luck and more skill. Home run rates have less luck than line drive BABIP. But infield hit rates have less luck than outfield fly ball BABIP. By breaking each skill down, you take the deviations from average that represent skill and regress those to the mean less than you regress the deviations from average that represent luck.
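A minimal sketch of component-by-component regression, assuming simple linear shrinkage toward the league mean in which each component keeps a fixed fraction of its deviation from average. The league means are the ones quoted in the article, and the .37 weight for line-drive rate is the year-to-year correlation mentioned in this discussion. The other two weights are hypothetical placeholders, chosen only so that infield hit rate (more skill) is regressed less than outfield fly-ball BABIP (more luck); they are not the E-BABIP model's actual coefficients.

```python
# League averages quoted in the article, as fractions.
LEAGUE = {"ld_rate": 0.19, "if_hit_rate": 0.11, "of_fb_babip": 0.170}

# Fraction of a hitter's deviation from average that is kept (the rest is regressed away).
# Only the .37 for line-drive rate comes from the discussion; the other two are hypothetical.
SKILL_WEIGHT = {"ld_rate": 0.37, "if_hit_rate": 0.5, "of_fb_babip": 0.3}

def regress_components(observed):
    """Shrink each observed component toward its league mean by that component's own weight."""
    return {
        name: LEAGUE[name] + SKILL_WEIGHT[name] * (value - LEAGUE[name])
        for name, value in observed.items()
    }

# Michael Young's 28 percent line-drive season, regressed with the .37 weight.
print(round(regress_components({"ld_rate": 0.28})["ld_rate"], 4))  # prints 0.2233
```

The design choice is the point of the comment above: a single shrinkage factor on overall BABIP would treat Young’s line drives and a lucky run of bloop hits the same way, while per-component weights keep more of the deviations that reflect skill.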

As cantankerous as this thread of comments got, I like seeing them because I get to learn more about the underlying logic, concepts and assumptions/assertions about BABIP that someone like me who is not well versed in sabermetrics is unfamiliar with.

In either event, I like Matt's breakdown and thanks to Neal for stirring up a drill-down.

By enabling a more accurate prediction of BABIP, one has the ability to improve the modeling systems. This second step has not yet been accomplished.