The following is an edited transcript of an in-house discussion among the Baseball Prospectus team about BABIP.

Ken Funck:
I’ve seen a few different Expected BABIP calculators based on batted-ball data. Is there a particular one we should use?

Matt Swartz:
I have an expected BABIP calculator that projects future BABIP, and I’m definitely going to be working with it in some future articles. Batted-ball rates and speed tell only a fraction of the story: line-drive rates for hitters are persistent, but not very much so. Knowing a hitter’s BABIP skill is more about power and infield pop-up rate than speed, unless you’re an extreme ground-ball hitter. If that’s the case, it’s more of a factor than the speed itself. If we want to have one that’s supposed to actually represent some sort of luck-neutral BABIP rather than projecting it forward, I would definitely be able to do that.

Colin Wyers:
I really don’t know if I think that BABIP is something to consider for hitters. I think, for instance, PECOTA looks at H/(AB-SO). Clay can correct me if I’m wrong. I like this for several reasons:

  1. You can call this stat BACON if you like, for Batting Average on CONtact. And, well, because I like it.

  2. For hitters, there’s less of a reason to split purely defense-independent versus defense-dependent skills.

The other interesting thing to note is that for pitchers, in the course of a season, they usually pitch in front of roughly the same defensive unit all the time, so there’s a lot more opportunity for one egregiously good or bad defense to sway the numbers. Hitters (at least ones that play enough to matter) will never see the same defense as often as a pitcher will.

That’s just me, though. I understand the appeal of the symmetry of using the same measure for each.

There’s definitely a lot of value in looking at hitter BABIP. The real reason is that, unlike HR/AB, BABIP for hitters has low persistence. It has more than pitcher BABIP has, but there are very specific luck factors that play into it.

For those interested, the math of this can be seen from the formula for the variance of a binomial proportion: p(1-p)/n, where p is the probability and n is the number of observations (at-bats or balls in play). The gap between a hitter’s true BABIP skill level and his yearly BABIP result is going to be much larger than the gap between his realized HR/AB result and his true HR/AB skill level, because BABIP is closer to .5 than HR/AB is (making p(1-p) higher). Not only that, but the variance among hitter HR/AB skill levels is larger than the variance among BABIP skill levels, so the end result is that a far larger portion of the hitter-to-hitter variance in BABIP is made up of luck than is the case for HR/AB.
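As a rough illustration of the binomial point, the sketch below compares the random spread implied by p(1-p)/n for a BABIP-like rate and an HR/AB-like rate. The rates (.300 and .040) and the sample sizes (450 balls in play, 550 at-bats) are assumed round numbers chosen for illustration, not figures from this discussion.

```python
import math

def binomial_sd(p, n):
    """Standard deviation of an observed rate: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Assumed, illustrative inputs (not taken from the roundtable)
babip_sd = binomial_sd(0.300, 450)  # BABIP-like rate over 450 balls in play
hr_sd = binomial_sd(0.040, 550)     # HR/AB-like rate over 550 at-bats

print(f"BABIP random SD: {babip_sd:.4f}")
print(f"HR/AB random SD: {hr_sd:.4f}")
```

Under these assumptions the random spread for the BABIP-like rate comes out at more than twice that of the HR/AB-like rate, which is the direction the argument describes.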

After knowing what the math tells us, then, the thing to do is really separate out BABIP skill from BABIP luck by looking at hitters in the aggregate and trying to determine each hitter’s true BABIP skill. This is no small issue, considering three-quarters of all PA end in a ball in play. We know that the hitter did something bad on a K and good on a HR, but there are plenty of bloop doubles and line-drive outs in the gap that make looking at BABIP tough without looking at what skills the hitter has.

As far as the baseball aspects of what is going on, and how to separate skill from luck using BABIP, there are a few things to look for. BABIP on line drives is a particular area where luck can really sway a hitter. Those with power do tend to have a better BABIP on line drives, but the persistence is very low beyond that. Predicting future BABIP on line drives is better done by simply looking at power rather than looking at a historical line-drive BABIP at all. BABIP on ground balls is made up of infield hits and outfield hits. The former has more persistence, but there is a clear tendency for some hitters to hit their ground balls to the outfield more often. Guys like Derek Jeter, Ichiro, and Joe Mauer are all consistently underestimated by most projection systems. As I understand it, there isn’t a direct acknowledgement of BABIP on ground balls in those systems. Those three just hit the ball hard and to all fields on the ground. The general reason to look at batter BABIP is that its persistence is so much lower, and actually exploring its components by batted-ball type gets you a lot of information as to which mean to regress it to.

The benefit of looking at something like BACON is that you are basically answering the question of how often the hitter hit them where they ain’t, but it doesn’t necessarily separate out what’s persistent from what’s not. The HR/(AB-SO) is going to be a lot more reliable than the (SI+DO+TR)/(AB-SO) component of it. There’s definitely value in hitter BABIP in separating out luck, but it’s even valuable in determining what’s a skill and what’s not.

Colin Wyers:
If we want to handle it binomially (and I agree that’s a very good way to do it) then I don’t think you want HR/AB. If you regress HR/AB and SO/AB (or what have you) separately, then you get really wacky results sometimes. Let me see if I can remember how to do the independent binomial components for a standard batting line off the top of my head:

$XB = (2B+3B)/(H-HR)
$T = 3B/(2B+3B)

That should preserve the binomial property for every component and keep everything “independent” of each other for the purposes of regressing. (And yes, $H is functionally equal to BABIP, depending on how you want to do it. If you want to add in things like HBP, SH, SF, or ROE, you can do it, but let’s leave that off to the side for now.)
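One way to read the chain is as a sequence of nested yes/no outcomes, where each rate is measured only over the plate appearances left after the earlier outcomes are removed. The sketch below is a hedged illustration: only $XB and $T are spelled out in the text, so the upstream steps ($BB, $SO, $HR, $H) and their ordering are an assumed reconstruction, and the batting line is made up. HBP, SH, SF, and ROE are left out, as suggested above.

```python
def binomial_chain(pa, bb, so, hr, h, doubles, triples):
    """Split a batting line into nested binomial rates.

    Each denominator is what remains after removing the outcomes
    handled by earlier steps. The first four steps and their order
    are assumptions for illustration; only $XB and $T come from the text.
    """
    non_bb = pa - bb          # plate appearances that were not walks
    contact = non_bb - so     # ... and not strikeouts
    in_play = contact - hr    # ... and not home runs
    non_hr_hits = h - hr      # singles, doubles, and triples
    xbh = doubles + triples   # non-HR extra-base hits
    return {
        "$BB": bb / pa,
        "$SO": so / non_bb,
        "$HR": hr / contact,
        "$H": non_hr_hits / in_play,
        "$XB": xbh / non_hr_hits,
        "$T": triples / xbh,
    }

# Made-up batting line, for illustration only
line = binomial_chain(pa=650, bb=60, so=110, hr=20, h=165, doubles=32, triples=4)
for name, rate in line.items():
    print(f"{name}: {rate:.3f}")
```

Because each step conditions on the previous ones not having happened, every component is a simple success-or-failure rate with its own n, which is what allows p(1-p)/n to be applied to each one separately.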

What you want to do at that point is use p(1-p)/n to figure the random variation and then figure the observed variation, then take the difference to get the “true” variation. (Actually it’s a little more involved than that, since the longer formula to figure the true variation includes true variation as one of its terms, so you have to iterate it a bit.)
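A minimal sketch of that subtraction, assuming a made-up sample of player rates and sample sizes, might look like the following. It is the simple one-pass version: it plugs each player's observed rate into p(1-p)/n and does not perform the iteration mentioned above.

```python
from statistics import mean, pvariance

def true_variance(rates, trials):
    """Observed variance across players minus the average binomial noise.

    One-pass approximation: uses each observed rate as p, with no iteration.
    """
    observed = pvariance(rates)
    random_var = mean([p * (1 - p) / n for p, n in zip(rates, trials)])
    return max(observed - random_var, 0.0)

# Made-up BABIP-like rates and balls in play for six hypothetical hitters
rates = [0.280, 0.310, 0.335, 0.295, 0.260, 0.345]
trials = [450, 480, 420, 500, 430, 460]

tv = true_variance(rates, trials)
share = tv / pvariance(rates)  # portion of observed spread attributed to skill
print(f"true variance: {tv:.6f}, skill share of observed variance: {share:.2f}")
```

With only six players the estimate is very noisy; the point is the order of operations, not the numbers.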

I’m uncomfortable with any analysis involving batted-ball data, especially line drives. I don’t know if you saw the study I did on park effects for line drives with press box heights, but it’s a troubling issue to me.

Clay Davenport:
That is more or less the approach I use, other than inverting the order of a couple of things (I break them out as SO, BB, H, HR, TP, DB).

I strongly second Colin’s discomfort [with line-drive data]. The wakeup call for me was when I ran regressions of various park effects against each other. I found the PF for line drives and the PF for fly balls correlated at -.86. Any stat that relies on fly balls but not line drives (or vice versa) is going to run into trouble with teams at the extremes. I know the Cardinals were one (I can’t remember off the top of my head which direction it was), but it made more than a 30 percent difference to the question of “How many fly balls did Joel Pineiro allow?”

I definitely think park-adjusting these things can probably remove some of the bias, but I still think using noisy stats is not useless. No one has park factors that perfectly estimate the effect of something so we’re left making some (pardon the pun) ballpark adjustments regardless of which data we use.

I think the correlation between line-drive park factor and fly-ball park factor being so negative basically means that if you want to answer the question, “How much of the difference between parks in LD and FB park factors is due to scorer bias?” then the answer is “almost all of it.” Colin’s THT article about press box heights answers the question “Where is that scorer bias coming from?” with “The angle he has.” But I think the answer to the question of “How badly does this bias hurt us in approximating people’s skill levels?” is basically “not very much.”

Also, if we’re really, really going to post all this discussion online, I’d like to give a shout-out to my lovely wife.

Colin Wyers:
It doesn’t surprise me that Saint Louis is one; I had some talks with people after that article came out and realized that I had measured the height of the wrong press box (I used the TV one, since I got a lot of my data from scouting reports TBS published for their own use in broadcasts, while the Gameday stringer sits in a different press box altogether). They have one of the highest press boxes in the game. Off the top of my head, Pittsburgh is probably another park that’s an outlier, and Houston is probably one the other way.

As for the components, yeah, I don’t know that any one order is better than any other, and frankly it seems like more work than it would ever be worth to check.

I really liked your article on the press box heights. That’s really good stuff and important to remember. Ideally, if someone could do some park-adjustments to this type of stuff, that might at least correct some of this. But keep in mind that subjective scoring can only take you off track so far.

When I’m trying to predict ground-ball BABIP, I use (ROE+IFH) as a term rather than just IFH itself, but it’s valid to use AVG/OBP/SLG even if the scorer’s ROE vs. IFH calls are questionable and possibly biased. Clay said the park factor on LD vs. FB once got to be at most 20 percent or something like that for maybe St. Louis or somewhere, but it was less than that in other places. I think there’s still a lot of value in looking at it, even if it’s imperfectly estimated, just like looking at HR rates is valuable even if they are better looked at after adjusting for park.

I think the true variation thing is pretty easy to compute. I used the sum of variances = variance of sums formula, and basically determined that batter BABIP skill level among major leaguers has a standard deviation of about .020, while realized BABIP has a standard deviation of about .030. I forget what it was for HR, but what you’re suggesting is basically what I did.
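The arithmetic behind those two figures can be checked directly. This sketch assumes skill and luck are independent, so their variances add; the two standard deviations are the ones quoted above, and everything derived from them is just a consequence of that assumption.

```python
import math

skill_sd = 0.020     # spread in BABIP skill, as quoted above
observed_sd = 0.030  # spread in realized BABIP, as quoted above

# Variance of the sum = sum of the variances (assuming independence)
luck_var = observed_sd ** 2 - skill_sd ** 2
luck_sd = math.sqrt(luck_var)
skill_share = skill_sd ** 2 / observed_sd ** 2

print(f"implied luck SD: {luck_sd:.4f}")
print(f"skill share of observed variance: {skill_share:.2f}")
```

On these numbers the implied luck spread (about .022) is slightly larger than the skill spread, and a bit under half of the observed variance in a season's BABIP reflects skill.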

Christina Kahrl:
Going back to something I was working out with Eric a couple of weeks ago, I’d stress a couple of what should be obvious cautions as far as BABIP:

  1. Different types of players can and do wind up with different baselines. Even using something as crude as single-season stolen-base data to describe “speed guys,” you wind up with a group of players whose BABIPs are noticeably higher than the MLB average. Are they “lucky”? Of course not, they just bring a different set of skills to the plate (literally).

  2. Players with 1000 career PAs or more wind up with higher BABIPs than the MLB average; that’s a normal enough survivor effect, but it’s something to keep in mind when people start talking about “luck” when there’s simply a higher average for people who are successful enough in the majors to stick around.

    Or, to pick on a favorite craptastic briefly big-league ballplayer, Sammy Khalifa wasn’t lucky or unlucky relative to a worthwhile major-league average in 1985 when he produced a .278 BABIP; he was Sammy Khalifa, and, shortly thereafter, he was Sammy Khalifa someplace else besides the major leagues.

  3. As ever, I’d argue strongly against the inclusion of pitcher hitting data in any meaningful baseline describing what’s average for a major-league hitter. All you’re doing there is creating a slightly lower average that makes it easier for a few more people to appear average, at which point you get a little further from sorting out who’s lucky or unlucky or good or bad, or just plain.

Russell Carleton:
A few relevant numbers, from data I’ve yet to publish. (If people want it, feel free to steal it… it was for a project that I never quite got off the ground.) I’m a fan of using split-half reliability to determine the persistence of a stat. They are all from the batter’s perspective.

Split-half reliability of:
FB/BIP at 400 BIP: .789
GB/BIP at 400 BIP: .884
LD/BIP at 400 BIP: .565
PU/BIP at 400 BIP: .769

I’m using Retrosheet codes there and assuming that there’s no observer bias. (Please don’t hurt me, Colin.) Colin has shown previously that this is a problem and will need to be corrected.

FB and GB rates are very persistent. LD’s less so. That .50s range is maddening because it means an R-squared around 30 percent.

Singles per FB at 150 FB: .450
2B/3B per FB at 150 FB: .565
HR per FB at 150 FB: .790

Singles per GB at 150 GB: .244
2B/3B per GB at 150 GB: .128

Singles per LD at 100 LD: .035
2B/3B per LD at 100 LD: .562

What happens on GB isn’t very consistent. I have also found evidence that infield and outfield grounders should be considered separately though. Line-drive singles seem to be more a function of whether a fielder got in the way.

Interesting. A few questions:

  1. What years does this cover? I wager you’d get different results looking at, say, ’05 to ’09 than ’89-’99 (’05 and on is Gameday data, ’89 to ’99 is Project Scoresheet).

  2. How are you splitting the halves? Midseason? Even/odds?

Of course, again, if you regress based upon this, you should get some odd results. A player’s GB rate puts a constraint on their LD/FB/PU rate, for instance.

There are other biases as well. The same batted ball caught by an infielder versus an outfielder may be scored differently (if an infielder catches it, it would probably be scored PU, where if an outfielder catches it, it could be a FB).

You also have potential caught/not-caught bias, where if a ball is caught, maybe it’s scored a FB, where if the fielder can’t get to it, it may be scored as a liner.

For these numbers, I used Retrosheet from 2003-2008. I combined ’03-’04, ’05-’06, and ’07-’08, and split it in half based on evens/odds.
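For readers unfamiliar with the even/odd approach, here is a minimal sketch on simulated data. The player count, the skill distribution, and the choice of 400 events per player are assumptions made for illustration; none of it uses Retrosheet or reproduces the figures above.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_half_reliability(players):
    """players: one list of 0/1 outcomes per player, in chronological order.
    Correlates each player's rate on even-numbered vs. odd-numbered events."""
    evens = [sum(p[0::2]) / len(p[0::2]) for p in players]
    odds = [sum(p[1::2]) / len(p[1::2]) for p in players]
    return pearson(evens, odds)

# Simulated hitters with an assumed ground-ball-rate-like skill
rng = random.Random(0)
players = []
for _ in range(200):
    skill = min(max(rng.gauss(0.44, 0.06), 0.05), 0.95)
    players.append([1 if rng.random() < skill else 0 for _ in range(400)])

r = split_half_reliability(players)
print(f"split-half r = {r:.3f}, r squared = {r * r:.3f}")
```

Splitting on evens and odds rather than at midseason keeps both halves spread across the same parks, opponents, and stretch of the calendar.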

You could regress the batted-ball types to league mean and then pro-rate the answers out to 100 percent. Since batted-ball type numbers don’t regress overly much, it probably wouldn’t look too weird. My initial framework was to generate a projected batted-ball profile. Once I knew that Smith was projected to have 100 GB, I would figure out his skill at getting singles/XBH on those and regress to mean as necessary, then prorate them back to 100 percent.
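A sketch of the regress-then-prorate step might look like this. The league-average rates and the hitter's observed rates are made up, and using the split-half figures listed earlier directly as regression weights is a simplifying assumption rather than the method actually used.

```python
def regress_and_prorate(observed, league, weight):
    """Pull each rate toward the league mean, then rescale so rates sum to 1."""
    regressed = {
        k: league[k] + weight[k] * (observed[k] - league[k])
        for k in observed
    }
    total = sum(regressed.values())
    return {k: v / total for k, v in regressed.items()}

# Split-half figures from the list above, used here as assumed weights
weight = {"GB": 0.884, "FB": 0.789, "LD": 0.565, "PU": 0.769}
# Made-up league averages and a made-up ground-ball-heavy hitter
league = {"GB": 0.44, "FB": 0.28, "LD": 0.19, "PU": 0.09}
observed = {"GB": 0.52, "FB": 0.22, "LD": 0.20, "PU": 0.06}

profile = regress_and_prorate(observed, league, weight)
for k, v in profile.items():
    print(f"{k}: {v:.3f}")
```

Because each type is regressed by a different amount, the regressed rates no longer sum to exactly 100 percent, which is why the final rescaling is needed.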

I REALLY liked this article; I love it as a new "format" and would love to see more of them, if you have suitable topics and willing participants.
I do really like this format too. Talk about being a fly on the wall... :)

To Russell, wouldn't outfield grounders also, by definition, be ground balls that infielders didn't get to? And thus, be somewhat useful as a team infielder defensive metric?

Oh and Colin's "When is a flyball a line drive?" article is awesome.

Yes, they certainly would be. However, in this particular context, my point was that, for a batter, getting the ball through the infield on a GB, and getting an infield hit once an infielder has gotten to the ball are separate skills, although both would fall under the category "singles on a GB."
Not just the format but also the topic needs some props here. This did some major whetting. Love it if someone continues with a focused exploration that might come to some conclusions about what can be done about these problems.
Let's send them some Starbucks gold cards and leash them to the espresso machine with a laptop and a streaming podcast and see what happens.
I completely agree. This article was a lot of fun to read. Thanks to whoever proposed this concept.
It was a StatSpeak thing back in the day. I got to do them once a week with Colin, Eric & Russell.
What's (SI+DO+TR)?
Never mind, singles, doubles, triples.
I like the format. It might be more useful to keep these instead of articles explaining new, yet unfinished studies of different metrics.
Love it. Can you also tag these so that related conversations can be followed easily over time?
100% agree. I would be super happy if an unlucky intern got tasked with indexing past articles by topic and BP implemented an article tagging system. It would be extremely cool to be able to look up every article on baserunning that has been written at BP without searching and having some reasonably unrelated articles pop up in the search (i.e. a CK Transaction Analysis, a couple articles by Joe including one on Barry Bonds being a free agent).

Also, count me in as one of the people who totally digs this format.
I had a discussion on this with Matt back at Nats Park in September, but I'd been too busy coding Oliver to get into the research. I've got more free time now, and it is high on my list.

After looking at the hit f/x data for April '09 that was released, I visualized that BABIP likely relates to the vertical angle of the ball, or even better the velocity off the bat times the cosine of the vertical angle, giving the horizontal velocity, which governs the reaction time for the fielder.
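As a quick sketch of that quantity (the function name and the example speeds and angles are hypothetical, chosen only to show the calculation):

```python
import math

def horizontal_speed(speed_off_bat_mph, vertical_angle_deg):
    """Horizontal component of the ball's initial velocity, in mph."""
    return speed_off_bat_mph * math.cos(math.radians(vertical_angle_deg))

# Same speed off the bat, two assumed launch angles
low = horizontal_speed(90, 10)
high = horizontal_speed(90, 35)
print(f"10 degrees: {low:.1f} mph, 35 degrees: {high:.1f} mph")
```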

A cursory review of stats suggests that players with more GB & LD as opposed to FB and especially PU have higher BABIP. Those with higher mean vertical angles, who tend towards more FB & PU, may have more HR, but a lower BABIP. There was a Derek Jeter vs Andruw Jones article a couple months ago which got me thinking about the subject.

You might try a regression to get coefficients for GB, LD, FB or PU, but I am going to look at comparable players - what is the composite BABIP of the x number of players most similar to a given player in their pct of GB, LD, FB & PU. That can be used as an expected value to regress the historical data towards.
That would probably pair well with observations that uppercut swings or even something as simple as "swinging for the fences" is bad. Perhaps it also lends credence to choking up on the bat, which might force a more level swing or, at least, deaden the bat.
The ways you keep a batted ball from being an out are simple. There are really two. One is you put it in the seats.

The other says you should try to keep the launch angle down - I think 15 degrees is about where you want it. Then you want the spray angle as such so that you're hitting the ball between fielders. The second is really the most important - a ground ball up the middle is a hit a lot more often than a ground ball hit straight to the shortstop.
Just wanted to say that I really enjoyed this roundtable and can't wait for more.
In my own analysis of hitf/x data from last year, I pretty much confirm what Brian says about how BABIP depends on the vertical launch angle. Based on analysis of nearly 15k batted balls, I find that if the speed off bat is larger than about 80 mph (a rather modest number), then BABIP peaks in the angular range 10-15 degrees, with BABIP exceeding 85%. If you actually look at the trajectory of a ball hit at 85 mph, 12.5 deg, it lands about 240 ft from home plate with a hang time of 2.6 sec, so falls in front of the outfielders with high probability. Also, it eludes the infielders, since it is too high for them to catch (it is about 18 ft high when about 100 ft from home plate). The maximum height is a bit larger, ~20 ft. Whether you call it a line drive or something else is a matter of semantics. It is a very well-hit ball. If you look at home runs, then the home run probability peaks with a launch angle in the 25-35 deg range.

The message seems to be pretty clear, at least to me. If you want to get on base (as opposed to hit a home run), keep the launch angle low.

Unfortunately, there is not enough data that has been released to look at these numbers in a statistically meaningful way for specific hitters.
Very Cool
More please!
Anyone know what Adam Lind's BABIP and liner rate were for this past season? I've been trying to find it on the stats page, but have had no luck. Also, will either stat be in for hitters in this year's annual?

I ask because the cover of the annual refers to Lind as potentially being Mike Jacobs.