Ahead in the Count: Predicting BABIP, Part 2

March 24, 2010

In Part One of this series, I updated a model for projecting BABIP, continuing on my previous work from last year. I showed that BABIP can be predicted successfully by looking at batted-ball rates and BABIP on those individual batted-ball types.
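
As a rough illustration of that idea, overall BABIP can be written as a weighted average of the BABIP on each batted-ball type, with each type's share of balls in play as the weight. The Python sketch below uses league-average figures quoted later in this article (45 percent ground balls, 19 percent line drives, 7.8 percent pop-ups, a .170 outfield fly-ball BABIP) and the roughly .730 line-drive BABIP mentioned in the discussion. The ground-ball and pop-up BABIP values are placeholder assumptions, and the function name is hypothetical; this is not the actual projection model.

```python
# Sketch only: BABIP as a share-weighted average of per-type BABIP.
# Values marked ASSUMED are placeholders, not figures from the article.

def composite_babip(shares, babip_by_type):
    """Weight each batted-ball type's BABIP by its share of balls in play."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("batted-ball shares must sum to 1")
    return sum(shares[kind] * babip_by_type[kind] for kind in shares)

# League-average batted-ball mix quoted in the article.
league_shares = {"GB": 0.450, "LD": 0.190, "PU": 0.078}
# Outfield fly balls are whatever is left over.
league_shares["OFFB"] = 1.0 - sum(league_shares.values())

league_babip = {
    "GB": 0.240,    # ASSUMED placeholder
    "LD": 0.730,    # rough figure from the discussion
    "PU": 0.000,    # ASSUMED: pop-ups treated as automatic outs
    "OFFB": 0.170,  # league average quoted in the article
}

league_estimate = composite_babip(league_shares, league_babip)
print(f"Estimated league BABIP: {league_estimate:.3f}")
```

With these inputs the estimate lands a little under .300. Swapping in a hitter's own shares and per-type rates shows how a ground-ball-heavy, low-pop-up profile pushes the total up.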

Hitters who are able to square up on the ball better will hit more line drives and make more contact in general, so they will have higher BABIPs. Hitters who swing on more of a downward plane will have higher BABIPs, since ground balls are hits more often than outfield fly balls, and those hitters will also hit fewer pop-ups. Similarly, hitters who hit the ball harder in general will have higher BABIPs, and faster players will have higher BABIPs, too, particularly if they hit a lot of ground balls. While many hitters still fall very close to an average range of BABIP, around .300, there are a few hitters who are particularly above average and can be expected to post high BABIPs on a consistent basis for a number of reasons. Last September, I called this group "The BABIP Superstars" and many of these hitters are the same as last year’s group. Next, I will show the top 10 Expected BABIPs for 2010:

1. Derek Jeter
Expected BABIP: .351
PECOTA BABIP: .331
2009 BABIP: .368

Jeter swings on a downward plane that generates a lot of ground balls, while also getting a decent number of line drives. His ground-ball rate has ranged between 57 and 59 percent in recent years (the league average is 45 percent), and his line-drive rate has ranged between 19 and 22 percent (league average is 19 percent). Jeter also rarely pops the ball up, doing so 1.7 percent of the time in 2007, 2.5 percent in 2008, and just 0.7 percent in 2009 (league average is 7.8 percent). Jeter is pretty good at hitting ground balls through the infield, but that skill has regressed somewhat in recent years. He still reaches on infield ground balls at a good rate though, ranging between 13 and 17 percent (league average is 11 percent). The average rate of hits on outfield fly balls is about .170, while Jeter has ranged anywhere from average to as high as .280 in recent years.

2. Matt Kemp
Expected BABIP: .346
PECOTA BABIP: .341
2009 BABIP: .345

Matt Kemp is one of the few hitters who can put up a high BABIP while not making great contact, striking out in 23 percent and 21 percent of his plate appearances the last two years (league average is 17 percent). However, when he hits the ball, Kemp hits it very hard. He produced solid line-drive rates of 20 percent and 22 percent the last two years, which obviously helps his BABIP. He is very good at reaching on infield ground balls, doing so at 16-percent and 22-percent clips the last two seasons (league average is 11 percent). He rarely pops up (just 1.8 percent and 4.2 percent in 2008-09, below the league average of 7.8 percent), which minimizes the automatic outs. He also does a good job of getting fly balls to fall in the outfield, with a solid .216 outfield fly-ball BABIP last year (league average is .170). Kemp obviously has some home run power as well, which is indicative of his ability to hit the ball hard enough that it falls in.

3. Ichiro Suzuki
Expected BABIP: .346
PECOTA BABIP: .346
2009 BABIP: .384

Like Jeter, Ichiro has a solid downward swing that produces a lot of ground balls in the hole and minimizes the pop-ups. His ground-ball rate has ranged between 56 percent and 58 percent in recent years, while his pop-up rate has remained between 4.5 percent and 6.5 percent. Ichiro also reaches on infield ground balls at about twice the league-average rate, doing so 22 percent, 22 percent, and 24 percent of the time the last three years (league average is 11 percent). He also makes very good contact, which indicates how well he squares the ball up.

4. Howie Kendrick
Expected BABIP: .340
PECOTA BABIP: .332
2009 BABIP: .338

Kendrick was not even on last year’s BABIP Superstars list, but he’s all the way up to fourth this year. Like Jeter and Ichiro, Kendrick also has the downward plane to his swing that generates a lot of ground balls (54 percent, 55 percent, and 55 percent in 2007-09), and very few easy popups (3.2, 2.1, 1.0 percent). He also does a good job of reaching on outfield flies (.311, .278, .186 in 2007-09). When he does keep the ball in the infield, he’s decidedly above average at reaching first in this situation as well, doing so at 18-percent, 10-percent and 17-percent rates in recent years.

5. Michael Young
Expected BABIP: .339
PECOTA BABIP: .335
2009 BABIP: .351

Young is simply a line-drive machine. He has not had fewer than 21.5 percent of his at-bats wind up as line drives in any of the last seven seasons, a shockingly consistent record for a statistic that has just a .37 year-to-year correlation; he has had line-drive rates of 28 percent and 23 percent the previous two years. His strong swing also leads to very few pop-ups: he popped up just 1.3 percent, 2.8 percent, and 2.9 percent of the time the last three years. He is also good at hitting hard ground balls, as his have reached the hole between 19 and 20 percent of the time in each of the last four years, above the league average of 17 percent.

6. Joe Mauer
Expected BABIP: .339
PECOTA BABIP: .334
2009 BABIP: .373

Mauer is yet another player with a downward plane to his swing, with ground-ball rates of 51 percent, 55 percent, and 51 percent in the last three years. He also popped up at incredibly low rates of 1.9, 3.2, and 1.5 percent the last three years. His shockingly high .373 BABIP in 2009 was largely due to 26 percent of his ground balls reaching the outfield, but his recent years have been more modestly above average, ranging between 18 and 21 percent over the previous five seasons. Mauer is good at getting fly balls to fall in the outfield too, with outfield fly-ball BABIPs of .198, .220, and .196 the last three years, above the league average of .170. He also makes solid contact, missing with about 10 percent of his swings, which is about half as often as other hitters whiff. Overall, Mauer just hits, but you knew that.

7. Fred Lewis
Expected BABIP: .338
PECOTA BABIP: .330
2009 BABIP: .348

Certainly not a regular superstar by any means, Lewis is at least a BABIP Superstar if he can actually hit the ball in play often enough. Unfortunately, Lewis has struck out in 24 percent and 25 percent of his PA the last two years, which is the reason (coupled with a home-run rate below 2 percent) that he likely will not start for the Giants this year. However, when Lewis does hit the ball, he hits a lot of ground balls (55 percent and 51 percent the last two years), few popups (2.6 percent and 4.3 percent), and reaches on his infield ground balls at a high rate too (14 percent and 15 percent).

8. Shin-Soo Choo
Expected BABIP: .336
PECOTA BABIP: .339
2009 BABIP: .370

Much of Choo’s .370 BABIP in 2009 was luck, just like anyone who has a .370 BABIP, but he still possesses a good amount of BABIP skill, as evidenced by his .367 BABIP in 2008. He avoids popups (5.3 percent and 3.4 percent the last two years), while getting hits to fall in the outfield at a good rate as well (27 percent and 24 percent vs. league average of 17 percent). He also hits his ground balls through the hole at a good rate (26 percent and 21 percent vs. league average of 17 percent). Choo definitely has a swing that will drive the ball, but he still is unlikely to keep up his extremely high BABIP from last year.

9. David Wright
Expected BABIP: .335
PECOTA BABIP: .345
2009 BABIP: .394

David Wright’s power outage in 2009 was masked by his ridiculously high BABIP. Wright, like every other hitter in the league, is not good enough to reach on 39 percent of his balls in play every year, but he does have a solid skill at getting the hits to fall in. The main reason is his solid line-drive rate which has been 25 percent, 25 percent, and 26 percent the last three years. He also reaches on a decent number of his infield ground balls too, thanks to his 15-percent rate the last couple of years. He also has power historically, and that certainly proxies for being able to hit the ball hard.

10. Joey Votto
Expected BABIP: .335
PECOTA BABIP: .319
2009 BABIP: .372

Votto hits a lot of line drives (24 percent and 25 percent the last two years) and few popups (3.1 percent, 3.3 percent), which certainly explains most of his BABIP skill. His power (4.6 percent and 5.3 percent HR/AB in the last two years) also indicates he is hitting the ball very hard, as does his outfield fly-ball BABIP, which went from .196 in 2008 up to .281 in 2009.

BABIP Trouble in Cleats

Although it’s easier to detect the BABIP Superstars, because they generally tend to keep their jobs, there are some hitters who keep their jobs without getting many hits on balls in play. Since most plate appearances result in a ball in play, these are certainly not the worst people in the world at getting hits on balls in play, but just the worst among projected major-leaguers for next year. Starting from the bottom, here are the BABIP troublemakers.

1. Edwin Encarnacion
Expected BABIP: .262
PECOTA BABIP: .276
2009 BABIP: .245

The sister to the BABIP Superstars’ downward plane is the BABIP Troublemakers’ upward plane. Encarnacion has generated ground-ball rates of just 38 percent, 34 percent, and 36 percent the previous three years, while generating pop-ups on 15 percent, 20 percent, and 10 percent of balls in play. He reached on fewer infield ground balls in 2009 (just 5 percent after 13 percent in 2007 and 15 percent in 2008), while also getting fewer of his ground balls to the outfield (trending from 23 percent in 2007 to 16 percent in 2008, down to just 7 percent in 2009).

2. Rod Barajas
Expected BABIP: .262
PECOTA BABIP: .268
2009 BABIP: .229

Barajas has the upward plane problem too, with just 33 percent, 37 percent, and 30 percent for his ground-ball rates the last three years, and high pop-up rates of 16 percent, 12 percent, and 15 percent to boot. Barajas also struggles to hit line drives, doing so just 13 percent, 16 percent, and 15 percent of the time the last three years. Barajas is also very slow, which makes it hard for him to beat out infield hits, though he was league average last season at reaching on infield ground balls (11 percent) after being around 7 percent the previous two years.

3. Geoff Blum
Expected BABIP: .263
PECOTA BABIP: .281
2009 BABIP: .266

Blum is pretty much a guy with average home-run and strikeout rates, while also being someone who is pretty slow and has an upward plane to his swing. He also has had some trouble hitting line drives as well. His ground-ball rates have been 38 percent, 37 percent, and 37 percent the last three years, while his pop-up rates have been 10 percent, 13 percent, and 14 percent, and his line-drive rates have been 19 percent, 14 percent, and 18 percent.

4. Chris Young
Expected BABIP: .264
PECOTA BABIP: .285
2009 BABIP: .268

PECOTA projects a rebound from Young in 2010 after he struggled last year. However, his swing would certainly need to flatten back out for this to happen. His ground-ball rate fell from 38 percent in 2007 and 39 percent in 2008 down to 28 percent in 2009. His pop-up rate correspondingly jumped from 12 percent in 2007 and 11 percent in 2008 up to a massively disappointing 22 percent in 2009. He does not make great contact, but is redeemed by his speed, as he has reached on 16 percent, 14 percent, and 16 percent of his infield ground balls.

5. Pat Burrell
Expected BABIP: .267
PECOTA BABIP: .274
2009 BABIP: .271

Burrell has always had an uppercut swing, but has lost some of the power that kept his BABIP high in the process. His ground-ball rate has ranged from 31-34 percent the last few years, while his pop-up rate has gone from 12 percent in 2007 and 2008 to 9 percent in 2009. He is among the slowest players in baseball, so he won’t beat out many infield hits, and with his power falling, his outfield fly-ball BABIP has fallen too. Over the last six years, it has fallen gradually: .164, .181, .188, .124, .142, and then to .103 last year. Without a power resurgence, Burrell is going to have a lot of trouble, since his BABIP is low and his strikeout rate is pretty high. Much of his career value has come from home runs and walks, but with fewer home runs, pitchers won’t want to walk him either.

Conclusion

Looking through the BABIP Superstars and the BABIP Troublemakers, it’s pretty clear that the important factors in keeping up your BABIP are having a downward plane to your swing while still being able to square it up, and also some power and some speed. Part Three of this series will discuss the hitters on which PECOTA and E-BABIP differ the most.

Thank you for reading

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.

Matt Swartz

jivas21

3/24

Great stuff as always Matt.

In yesterday's comment section, I recall you mentioning that you'd make the full spreadsheet of e-BABIP available, but I don't see a link above. Will this still be made available?

Thanks!

swartzm

3/24

Thanks! That will be at the end of tomorrow's article. Tomorrow's article is going to talk about E-BABIP vs. PECOTA-BABIP. Then I'll link to the E-BABIP spreadsheet at the end.

jivas21

3/24

Awesome - thanks. I'm looking forward to the article.

Junts1

3/24

Hey Matt, I am sort of fascinated at how reliable Marc Normandin's hacky short-cut of years past for BABIP, that it tends to equal LD%+12%, remains so reliably true even at more complex levels of BABIP understanding. Only the really heavy ground-ball players diverge far from that particular model, and it remains an incredibly useful shortcut, even for players like Kemp.

swartzm

3/24

I'm not sure if Marc still uses that model, and what for, but it's been debunked as a predictor of next year's BABIP. The correlation between BABIP in year-1 and BABIP in year-2 is higher than the correlation between (LD% + .120) in year-1 and BABIP in year-2.

The reason it predicts same-year BABIP pretty well is that BABIP on line drives is about .730 and on everything else it's about .190. The difference between the best and worst at line-drive BABIP is only about .680-.780 in skill level, and the difference between the best and worst at non-LD BABIP is probably only .140-.280. So if you weight those two groups, you can always get close IF YOU KNOW THE % OF LINE DRIVES. The real problem is that line-drive rate has only a .37 year-to-year correlation, making it relatively less accurate than knowing a player's speed, power, contact, and GB, OFFB, and PU rates.
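
The weighting described here can be sketched in a few lines of Python. The .730 and .190 figures are the rough averages quoted above; the function names are hypothetical, and this is an illustration of the arithmetic rather than a fitted model.

```python
# Sketch: same-year BABIP from line-drive rate alone, two ways.
LD_BABIP = 0.730      # approximate BABIP on line drives (quoted above)
NON_LD_BABIP = 0.190  # approximate BABIP on everything else (quoted above)

def weighted_babip(ld_rate):
    """Blend the two group averages by the share of line drives."""
    return ld_rate * LD_BABIP + (1.0 - ld_rate) * NON_LD_BABIP

def shortcut_babip(ld_rate):
    """The LD% + .120 rule of thumb discussed in this thread."""
    return ld_rate + 0.120

for ld_rate in (0.15, 0.19, 0.23, 0.27):
    print(
        f"LD rate {ld_rate:.2f}: "
        f"weighted {weighted_babip(ld_rate):.3f}, "
        f"shortcut {shortcut_babip(ld_rate):.3f}"
    )
```

Both estimates need the line-drive rate as an input, which is the weak link when the goal is next year's BABIP rather than this year's.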

Junts1

3/24

It was used as a dirty indicator of current BABIP and whether someone was over or under-performing around the dawn of the player profile.

Your data is much more predictive and I agree with everything you said; the real breakthrough you've made is making predicting LD% and, consequently, BABIP a more reliable thing, and that's something I think we all find quite useful.

sunpar

3/24

I can't speak for all of your top 10, but at least Jeter, Suzuki, Mauer, and Votto are very good at using all fields.

Mauer and Votto especially not only have good BABIPs to all fields, but also have good power to the opposite field, a trait not shared by very many big league players.

I would guess that this is just another indicator of superior bat control that also helps them maintain solid BABIP rates.

swartzm

3/24

It definitely is. THT's xBABIP uses this statistic, which they call "spray". Fortunately, using historical BABIP rates seems to be picking up this effect for me, but if I could get my hands on that data, I would like to see if high historical BABIPs on grounders and fly balls are more likely to be maintained by hitters who spread the ball to all fields.

sunpar

3/25

I know fangraphs has splits info for all batters and one of their splits is outcomes on balls in play hit to Right, Center, and Left. I believe they use baseball info solutions, and I'm guessing you have access to similar info.

I've been trying to replicate what they have using Retrosheet's info (because it's free!), but not finding much success so far.

swartzm

3/25

I think you mean Baseball-Reference, don't you? I can't find it on Fangraphs. I can't get any normal output that I could automatically merge, and I haven't been able to get reliable massive datasets in a form that wouldn't involve typing every number in for each individual player. In truth, I stink at programming, so I'm reliant on excellent help from Eric Seidman and others for these things. I'm not sure if people here are comfortable replicating the B-R L/C/R numbers reliably without muddy data.

sunpar

3/25

The Fangraphs info is found in individual player pages, by clicking on the "splits" tab. Hopefully, this link works and sends you directly there: http://www.fangraphs.com/statsplits.aspx?playerid=1857&position=C&season=

They have stats for each batter for balls hit "As L to R", meaning how many balls the batter hit as a Lefty to Right Field and what happened on those balls. So Joe Mauer went to Left Field on 160 plate appearances, went to Center Field on 152 PAs, and went to Right Field on 153. Very even. But more importantly, he had his best stats when he went to Left Field (his opposite field), when the vast majority of hitters suck going opposite field and are far better pulling the ball.

It's easily the best part of their site, but, as you mentioned, a lot of the best use of this data is in aggregation and they don't offer that. You need to have the actual Play-by-Play and reconstruct what they have to get the aggregate info and "type every number in for each individual player." And their play-by-play data comes from BIS, I think. Which costs $$$

TheRealNeal

3/25

Chris Young grounded out to 2nd a total of 1 time at home last year and matched that with 1 single to right field according to MLB.com. It seems like the way the players allow themselves to be defended is at least as important as quality of contact.

sensij

3/24

If I understand correctly, for batters:
A high line drive rate is better than
a high ground ball rate, which is better than
a high fly ball rate, which is better than
a high pop-up rate.

However, as SIERA shows, "run prevention improves as groundball rate increases".

For groundballs to be beneficial to both the hitter and the pitcher feels like a paradox. Is there an easy way out of that?

swartzm

3/24

Thanks for the question. There are several reasons for this. The primary reason is that ground ball hitters are good at BABIP, but not at EqA or any other measure of overall offense. Take Juan Pierre. He's a below average hitter with above average BABIP. The reason is largely related to ground ball rates. Another large factor is that pitchers do not control their rate of home runs per outfield fly ball. So pitchers who allow more fly balls naturally allow more home runs. However, not all hitters that have high fly ball rates are going to have high home run rates, because there are huge differences in the rate of home runs per fly ball for hitters.

My article last week on SIERA and BABIP actually showed that pitchers who allow more fly balls have lower BABIPs, which is partly why the "cost" of a fly ball isn't as high for a metric like SIERA as it is for other metrics that are based on run estimation of individual outcomes. Since pitchers who allow more fly balls also allow more home runs, a fly ball is a bad thing for run prevention in both metrics, but it's not as bad because SIERA implicitly assumes that their BABIPs also go down a little.

I guess the real difference is home runs. BABIP is part of hitter skill, but certainly not all.

mnsportsguy1

3/25

Way to go Matt, keep up the great work.

Junts1

3/24

Not all hits are created equal. Ground balls may be more likely to resolve in a player reaching base than fly balls, but they are also a lot more likely to result in double plays and singles, whereas fly balls produce single outs and extra-base hits.

Look at it this way: Assuming that by knowing BABIP we consequently know OBP for each batted ball type (no one walks on a batted ball).

What would the SLG be for each batted ball type? That progression would go fly > line drive > grounder > popup. That's what makes line drive hitting clearly the best overall, and ground balls and fly balls a lot closer to equivalent until you factor in the double play consideration.

Getting on base is an important part of hitting, but it isn't the -only- part of hitting. Extra base hits are a pretty big deal, too, and ground balls only do that if you rip them down the 1st or 3rd baselines into the corner.

kringent

3/24

This is really impressive stuff, Matt. Do you think your research is applicable to minor league performance/prospect watching?

swartzm

3/24

Thanks for the compliment :-)

It's a good question. I would guess it's more useful at higher levels than lower levels, and you would probably need to come out with some sort of Minor League Equivalencies for each metric to convert it. Also, since ballparks are so different across the minors, you couldn't get away without doing park effects as I've attempted to do. Once you figured out the MLEs and PFs, you could probably do well to convert infield hit rate in the minors to the majors considering the 3B are better, and convert outfield fly ball BABIP to reflect better fielding OF. The real question is how much line drive rate would carry over. I would guess that although major league pitchers don't have much difference in their LD% abilities, there is probably a lot more difference between the LD% of MLB pitchers and pitchers in high-A ball. If you converted all these things, you really might be able to do something like that assuming you were careful.

At a more qualitative level, you could probably do a lot. What is the groundball/flyball ratio of this guy in AAA-- does it indicate a downward plane to his swing? Does pop-up rate offer any clues? How often is he reaching on infield hits? How much power does he have? What's his contact rate? How well does he get ground balls to the outfield and fly balls to fall in the outfield compared to other hitters with similar skills? Is this indicative of a good ability to spread the ball around?

Once you know all this, you can probably pin down where a hitter is. A speedy ground ball hitter who spreads the ball around, makes good contact, and rarely pops up might be a .330 BABIP kind of guy. A power hitter who rarely hits ground balls but spreads the ball around well might be a .285 guy. A power hitter who pulls everything but hits everything very hard and rarely pops up might be a .315 guy. Things like that are very informative.

Oleoay

3/25

How much do park factors affect ground ball/line drive/pop up rates? Short of altitude effects, shouldn't a pop up be a pop up or a groundball be a groundball anywhere?

willjosh09

3/24

Two players I'm looking forward to checking out when the spreadsheet is available:

1. Braun - He has a .338 career BABIP and it was .353 in '09. His batted ball data just doesn't appear to support this. He pops up a lot, has hit a below average amount of line drives, and though he is very athletic, he doesn't possess elite speed. Most projections seem to peg him for a ~.334 BABIP.

2. Bruce - Everyone knows he had a low BABIP last year (.221), but I'm wondering what it "should" have been given neutral luck. I'm guessing around .275-.280.

These articles have been great. I tried that THT xBABIP calculator awhile back but I thought it was spitting out some weird results. I get the sense that yours will be much more reliable.

TheRealNeal

3/24

The articles have been interesting, but when you start to break balls-in-play down into their component rates, I think you're starting to lose the thread.

What is the advantage of taking fly balls out of the analysis when looking at hitters? Are HR/Flyball rates that much more constant for hitters than your other four component rates of batting average?

At the end of the day, what's the point in saying "OK, good I have his BABIP, now I just need to add in his HR/(not K not BIP) rate to calculate a batting average"?

swartzm

3/24

I'm not taking them out of the analysis. I regress on GB/BIP, PU/BIP, LD/BIP. That implicitly assumes OFFB/BIP is included. You can't include four things that add up to 100% in a regression because of multicollinearity. Instead, you include the three other things. The reason ground balls and line drives have positive coefficients is that the more of them you get, holding popups constant, the fewer fly balls are hit. And ground balls and line drives have higher BABIP than outfield fly balls. Similarly, popups are negative because, holding line drives and ground balls constant, more popups and fewer outfield fly balls means a lower BABIP.
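
A small Python sketch can make the point about the omitted category concrete. It assumes an idealized world in which every hitter has the same BABIP on each batted-ball type; the hitter profiles are hypothetical, the ground-ball and pop-up rates are placeholder assumptions, and none of this reproduces the actual regression.

```python
# Sketch: why dropping OFFB leaves "relative to OFFB" coefficients.
# GB and PU values are ASSUMED placeholders; LD and OFFB are the rough
# figures mentioned elsewhere on this page.
TYPE_BABIP = {"GB": 0.240, "LD": 0.730, "OFFB": 0.170, "PU": 0.000}

def babip_from_four(gb, ld, offb, pu):
    """Share-weighted BABIP using all four batted-ball types."""
    return (gb * TYPE_BABIP["GB"] + ld * TYPE_BABIP["LD"]
            + offb * TYPE_BABIP["OFFB"] + pu * TYPE_BABIP["PU"])

# Because the four shares sum to 1, OFFB = 1 - GB - LD - PU, so the same
# quantity can be written with an intercept and only three shares.
INTERCEPT = TYPE_BABIP["OFFB"]
COEF = {k: TYPE_BABIP[k] - TYPE_BABIP["OFFB"] for k in ("GB", "LD", "PU")}

def babip_from_three(gb, ld, pu):
    """Intercept plus three shares; OFFB is the omitted baseline."""
    return INTERCEPT + gb * COEF["GB"] + ld * COEF["LD"] + pu * COEF["PU"]

# Hypothetical hitters: (GB, LD, OFFB, PU) shares of balls in play.
hitters = [
    (0.45, 0.19, 0.28, 0.08),
    (0.57, 0.21, 0.20, 0.02),
    (0.35, 0.15, 0.35, 0.15),
]

for gb, ld, offb, pu in hitters:
    four = babip_from_four(gb, ld, offb, pu)
    three = babip_from_three(gb, ld, pu)
    print(f"four-share {four:.3f}  three-share {three:.3f}")

print(COEF)  # GB and LD positive, PU negative
```

In this toy setup the three-share form gives the same answer as the four-share form, and each coefficient is simply that type's BABIP minus the outfield fly-ball BABIP, which is why ground balls and line drives come out positive and pop-ups negative.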

TheRealNeal

3/24

Totally missed the point there. Why are you not including fly balls that lead to HRs in the analysis? What's the implicit value of BABIP? The previous value was thought to be in saying "it's random, don't look at it too closely"; by showing that BABIP isn't random, you're defeating the whole purpose of it. Does looking at just your three rates there give you a better shot at predicting batting average than looking at all at-bats regardless of outcome? That's the point after all; BABIP doesn't mean anything in a winning-ballgames sense, it's just an abstraction.

swartzm

3/25

All fly balls are included, but the use of HR/FB is mostly a proxy variable for power. It's not implicitly subtracted at all. The more frequently hitters hit 400 foot fly balls, the more often they hit 350 foot fly balls so the more ground outfielders have to cover, and the fewer balls they catch.

There's also another reason, though. Say you have 100 balls in play and that there are 40 ground balls, 20 line drives, 30 outfield fly balls, and 10 pop ups. If none of those are home runs, then line drives only make up 20% of your balls in play. However, if you're Prince Fielder and 10 of every 30 outfield fly balls leave the yard, then 20 of every 90 balls in play are line drives-- 22% instead of 20%, which would drive BABIP up by about .010. So there's two effects there.
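
That second effect is simple enough to check in Python. The sketch below assumes every home run is counted among the outfield fly balls, and it borrows the rough .730 and .190 BABIP figures quoted earlier in this thread to size the change; the function name is hypothetical.

```python
# Sketch: removing home runs from balls in play raises the line-drive share.

def line_drive_share(gb, ld, offb, pu, home_runs=0):
    """Line drives as a share of balls in play after removing home runs.

    Assumes all home runs come out of the outfield fly-ball count.
    """
    balls_in_play = gb + ld + (offb - home_runs) + pu
    return ld / balls_in_play

no_power = line_drive_share(40, 20, 30, 10)                 # 20 of 100
big_power = line_drive_share(40, 20, 30, 10, home_runs=10)  # 20 of 90

# Rough size of the BABIP change, using the approximate line-drive (.730)
# and non-line-drive (.190) BABIP figures quoted earlier in the thread.
babip_bump = (big_power - no_power) * (0.730 - 0.190)

print(f"LD share without HR: {no_power:.3f}")
print(f"LD share with 10 HR: {big_power:.3f}")
print(f"Approximate BABIP bump: {babip_bump:.3f}")
```

The bump comes out at roughly a hundredth of a point of BABIP, in line with the "about .010" figure above.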

TheRealNeal

3/25

Step 1. Dismiss 'balls in play' as a concept which has already served its purpose.

Step 2. Proceed

Does just analyzing balls in play make it more likely that you will correctly project a player's batting average? That's the goal (unless you want to say the goal is projecting their batting average as a portion of their OBP), isn't it?

Look at your top 10 guys - you've got 7 or 8 guys who historically have hit for high averages - and two guys with comparatively small sample sizes (Votto and Lewis). If all this analysis does is conclude that Joe Mauer, Michael Young and Ichiro are likely to hit for high averages - everybody already knew that.

swartzm

3/25

How can you honestly read these two articles and actually type your Step 1? Do you realize that balls in play being hits more often will necessarily imply higher batting averages? Of course players who have high BABIPs will have higher AVG and higher OBP. Studying BABIP as a subset of performance clearly has value, as I've explained in both of these articles, because it is more complicated to project.

The point of projection is to say that Michael Young and Joe Mauer will not only have "highish batting averages" but to pinpoint how high. To the extent that you can increase the accuracy of how high you think it will be, projecting BABIP has value. Do you believe projection is a useful concept at all?

TheRealNeal

3/25

I'll make the question as simple as possible.

Does breaking down BABIP into "component parts" make the prediction of batting average more accurate?

You haven't shown anywhere that it does, have you? As far as I can tell you've made an academic exercise basically trying to improve on other models' prediction of BABIP, which presumably causes their estimation of batting average to be off.

But what you haven't shown, and now it's becoming apparent you haven't even thought about it - is whether using this model actually improves on the prediction of batting average over a simple weighted mean. If I take Marcel and add in park factors, and you take PECOTA and revise the BABIPs with your new model - who comes out better at predicting batting average?

swartzm

3/25

From yesterday's article:

"They also were closer to actual BABIP than PECOTA was 57 percent of the time, and closer to actual BABIP than CHONE was 60 percent of the time. That might not seem like much, but those fractions are significant at the 95-percent and 99-percent level, respectively. In other words, there is less than a five-percent chance that there would have been that large of a difference between my BABIP model and PECOTA if they were equally as good, and less than a one-percent chance that the model would have beaten CHONE as badly as it did just by chance."

Implicit in this is that if you add in the same number of HR and SO that PECOTA predicts, but change the number of hits and outs on balls in play to E-BABIP, you will have a superior prediction of AVG.
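
To make that substitution concrete, here is a short Python sketch. It relies on the standard definition BABIP = (H - HR) / (AB - SO - HR + SF), solved for hits; the projected line and the two BABIP values are hypothetical numbers, not output from PECOTA or E-BABIP.

```python
# Sketch: rebuild batting average from a projected line plus a BABIP.

def batting_average(ab, so, hr, sf, babip):
    """AVG implied by a BABIP, holding AB, SO, HR and SF fixed.

    Uses BABIP = (H - HR) / (AB - SO - HR + SF), solved for H.
    """
    balls_in_play = ab - so - hr + sf
    hits = hr + babip * balls_in_play
    return hits / ab

# Hypothetical projected line: same strikeouts and home runs in both cases.
projected_line = {"ab": 550, "so": 100, "hr": 20, "sf": 5}

avg_with_first_babip = batting_average(**projected_line, babip=0.320)
avg_with_second_babip = batting_average(**projected_line, babip=0.340)

print(f"AVG at .320 BABIP: {avg_with_first_babip:.3f}")
print(f"AVG at .340 BABIP: {avg_with_second_babip:.3f}")
```

For this hypothetical line, a 20-point change in BABIP moves batting average by about 16 points, since only the balls in play are affected.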

If you're going to make flat out accusations and hurl insults, you should at least do a brief scan of the articles to make sure I didn't address this. I also talked about it in last year's "BABIP Superstars" article and last year's "You Can Beat PECOTA Without a Computer Model" articles.

TheRealNeal

3/25

What insult did I hurl? You're the one here who shows a wilful lack of reading comprehension and repeats answers to questions no one is asking.

Who was the person who said "How can you honestly read these two articles and actually type your Step 1?"? That was immature and rude, particularly when I have had to repeat myself two times and couldn't get you to answer the question.

What I am saying is that looking at hits/(flyball in the ballpark) is nonsensical. Why not just look at hits/flyball? Why are you focusing only on BABIP?

Me: "disregard BABIP"
You: "Look here's my quote, and it says BABIP three times and I compare it to two systems which predict BABIP".
Me: "Yeah, but why are you using BABIP?"
You: "I regressed BABIP, it's better than CHONE!"
Me: "Yeah, but why are you using BABIP?"
You: "Quit insulting me"

Seriously, have a colleague explain it to you. I can only ask a person a question three times before I lose my patience. I am sure you're very good at mathematics - it's the ability to take a step back and apply reasoning that you're lacking.

swartzm

3/25

As I explained in the first article, the reason to regress BABIP separately is that K% and HR% are very reliable, and so you want to pin down the effects of the unreliable portion of batting average. The reason you won't do as well doing (outfield hits)/(outfield fly ball) as you will doing (outfield non-HR hits)/(outfield non-HR fly ball) is that hitters with different home run skills are going to have vastly different skills with respect to the former ratio but similar skills with respect to the latter ratio. Applying the same regression to all hitters on the first ratio will cause a huge problem.

The point of discussing hitter BABIP was mentioned in the BABIP Roundtable we did a couple months ago, and it's well worth reviewing if you're still skeptical. The key is that you want to regress the statistics that represent more luck and less skill more than you want to regress the statistics that represent less luck and more skill. Home run rates have less luck than line drive BABIP. But infield hit rates have less luck than outfield fly ball BABIP. By breaking each skill down, you take the deviations from average that represent skill and regress those to the mean less than you regress the deviations from average that represent luck.
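
One simple way to picture "regress the luckier statistics more" is linear shrinkage toward the league mean. The Python sketch below is only that picture, not the model itself. It treats the .37 year-to-year correlation quoted for line-drive rate as a stand-in for reliability, which is an assumption, and the .70 reliability for infield hit rate and both observed rates are hypothetical.

```python
# Sketch: shrink observed rates toward the league mean by reliability.

def regress_to_mean(observed, league_mean, reliability):
    """Keep `reliability` of the deviation from league average.

    reliability = 1 keeps the observation; 0 returns the league mean.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return league_mean + reliability * (observed - league_mean)

# Line-drive rate: league average of 19 percent is from the article;
# the observed .250 is hypothetical, and .37 is used as a reliability proxy.
ld_estimate = regress_to_mean(observed=0.250, league_mean=0.190, reliability=0.37)

# Infield hit rate: league average of 11 percent is from the article;
# the observed .220 and the .70 reliability are hypothetical.
infield_estimate = regress_to_mean(observed=0.220, league_mean=0.110, reliability=0.70)

print(f"Regressed line-drive rate: {ld_estimate:.3f}")
print(f"Regressed infield hit rate: {infield_estimate:.3f}")
```

The less reliable skill gives back most of its deviation from average, while the more reliable one keeps most of it.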

swartzm

3/25

Also, the insult I was referring to, which I'm repeating here, is the implication that I have not thought about the point of this exercise, which you called "academic." You again repeated that I was not applying logic. I think it should be abundantly clear that the whole strength of this exercise was in applying the logic. If you're not seeing it, that's a shame.

Oleoay

3/25

*pops up another bag of popcorn as he enjoys the flick*

As cantankerous as this thread of comments got, I like seeing them because I get to learn more about the underlying logic, concepts and assumptions/assertions about BABIP that someone like me, who is not well versed in sabermetrics, is unfamiliar with.

In either event, I like Matt's breakdown and thanks to Neal for stirring up a drill-down.

BurrRutledge

3/25

"Why are you focusing on BABIP?" ... because predictive models have previously made assumptions that it will regress to the mean, and they have not done a good job of estimating what it really should be. It's one of the reasons why PECOTA has underestimated Ichiro for years.

By enabling a more accurate prediction of BABIP, one has the ability to improve the modeling systems. This second step has not yet been accomplished.
