September 10, 2012
Resident Fantasy Genius
Using ISO to Legitimize High/Low BABIPs?
Last week, industry colleague Michael Salfino penned an interesting article for Yahoo! Sports discussing BABIP and how we might be able to tell legitimately good BABIPs apart from lucky ones (and legitimately bad BABIPs apart from unlucky ones). A couple of you pointed the article out to me and asked for my take on it, so I thought it best to simply write up a post in case others were interested. The article opens with:
“I want to drive home one more time that big breakthrough that I think we've had here this year, which is using isolated slugging allowed (slugging average minus batting average) as a projection tool.”
Because it’s never too early to throw an aside into an article or make a point that’s only tangential to the topic at hand, I want to take a second to step onto my soapbox and talk about batting average. People love to talk about batting average as a flawed or overrated stat, but I want to point out that the stat itself isn’t flawed any more than an “advanced” stat like BABIP or HR/FB or FIP is. It does what it’s designed to do and nothing more. It’s only overrated insofar as the mainstream media and old-school thinkers tend to give it too much prominence (which Salfino understands and hints at; this is more of a general rant than one pointed at anyone in particular, to be clear). Take batting average for what it is, and it’s no more flawed than strikeout rate.
The main problem pundits point out with batting average is that all hits are weighted the same. Salfino mentions this as the reason why batting average is overrated and as the reason why BABIP may be flawed:
“With BABIP we're sort of stuck with judging all hits as hits, period, because we're already subtracting homers (which are out of play). We at least need to check BABIP against isolated slugging.”
When talking about a hitter’s contribution to his team, of course it’s incorrect to assume that a home run and a single are worth the same amount, a fact many fans and analysts fail to recognize. Used like this, batting average is a descriptive stat. But when we’re talking about a pitcher’s BABIP, it’s usually as a predictive stat. These are two very different things, and the distinction shapes how we use a particular stat. After all, if we don’t know what we want a stat to accomplish, how can we possibly hope to use it correctly?
The way we generally use BABIP is to look at pitchers with extreme values and say that, in the future, the pitcher’s BABIP will be league average, dragging the pitcher’s ERA along for the ride. It’s a shortcut for regressing to the mean. Essentially, because there is so much noise in BABIP, we say it’s safe enough to assume 100 percent regression to the mean when dealing with single-year samples. And because doing a full regression would give us something close to this anyway, we’re generally okay with using this shortcut.
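To make the shortcut concrete, here is a minimal sketch of the standard regression-to-the-mean blend, which weights an observed rate against the league rate according to sample size. It is an illustration rather than the method behind any numbers in this article: the function name, the .300 league BABIP, and the 3,000 balls-in-play stabilization constant are all assumptions chosen for the example.

```python
def regress_to_mean(observed, n, league_mean, stabilization_n):
    """Blend an observed rate with the league mean.

    stabilization_n is the sample size at which the observed rate and the
    league mean receive equal weight. The value used below is a
    hypothetical placeholder, not a measured constant.
    """
    weight = n / (n + stabilization_n)
    return weight * observed + (1 - weight) * league_mean


# Illustrative inputs only: a pitcher with a .260 BABIP on 600 balls in play.
LEAGUE_BABIP = 0.300       # assumed league average
STABILIZATION_BIP = 3000   # assumed; large because BABIP is noisy

estimate = regress_to_mean(0.260, 600, LEAGUE_BABIP, STABILIZATION_BIP)
print(f"Regressed BABIP: {estimate:.3f}")
```

With these assumed inputs the estimate lands at roughly .293, most of the way back to league average, which is why simply assuming league average is a tolerable shortcut for a single season.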
“My feeling now is that if a pitcher has a good ERA and a good BABIP allowed (which has been viewed as lucky by default) PLUS a good ISO allowed, then he's probably not lucky. At least, he's not nearly as lucky as we think. On the other hand, if a high ERA pitcher has a bad/high BABIP allowed but also a high ISO allowed (about .133 is the league average), then he's probably not unlucky – meaning we wouldn't expect the ERA to be significantly lower the next year.”
On the surface, this seems to make sense. BABIP counts up all the hits but ignores the quality, and intuitively, the quality of the hit should matter. The problem is that in a single-year sample for pitchers, it really doesn’t. I’ve shown before that singles, doubles, and triples take a long time to stabilize, making it reasonable enough to regress them all 100 percent to the mean when using shortcut methods like BABIP. In other words, if you’re using BABIP in the first place, hit quality really doesn’t matter. So while "we're sort of stuck with judging all hits as hits," that's actually okay.
But then what to make of this: “Remember, ISO allowed seems to be pretty consistent year-to-year.” Stats that are consistent from year to year (stats like strikeout rate, walk rate, and groundball rate) are usually important to pay attention to, so what about ISO? The main reason for Salfino’s finding is that home runs, the most stable of all four hit types by a wide margin, make up a big part of ISO (while being ignored by BABIP). Singles, which are highly unstable, are excluded from ISO (but are huge in BABIP). This gives us the illusion of ISO being much more stable than BABIP. The problem is that, while home runs are relatively stable, once we account for the fact that they are fly balls (which are even more stable), they become incredibly unpredictable, more so than BABIP, even!
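The way the two stats split up the hit types is easiest to see from their formulas. The sketch below uses the standard definitions of BABIP and ISO; the function names and the stat line are made up for illustration and do not belong to any real pitcher.

```python
def babip(h, hr, ab, so, sf=0):
    """Batting average on balls in play: (H - HR) / (AB - SO - HR + SF)."""
    return (h - hr) / (ab - so - hr + sf)


def iso(doubles, triples, hr, ab):
    """Isolated slugging, SLG - AVG, which reduces to extra bases per at-bat.

    Singles add nothing to the numerator; each home run adds three bases.
    """
    return (doubles + 2 * triples + 3 * hr) / ab


# Hypothetical line allowed by a pitcher (illustrative numbers only).
line = dict(ab=800, h=200, doubles=40, triples=4, hr=20, so=160, sf=5)

base_babip = babip(line["h"], line["hr"], line["ab"], line["so"], line["sf"])
base_iso = iso(line["doubles"], line["triples"], line["hr"], line["ab"])

# Turn 10 outs on balls in play into singles: BABIP rises, ISO does not move.
singles_babip = babip(line["h"] + 10, line["hr"], line["ab"], line["so"], line["sf"])
singles_iso = base_iso

# Turn 5 fly outs into home runs: ISO jumps, BABIP barely changes.
hr_babip = babip(line["h"] + 5, line["hr"] + 5, line["ab"], line["so"], line["sf"])
hr_iso = iso(line["doubles"], line["triples"], line["hr"] + 5, line["ab"])

# Share of the ISO numerator that comes from home runs alone.
hr_share = 3 * line["hr"] / (line["doubles"] + 2 * line["triples"] + 3 * line["hr"])

print(f"baseline    BABIP {base_babip:.3f}  ISO {base_iso:.3f}")
print(f"+10 singles BABIP {singles_babip:.3f}  ISO {singles_iso:.3f}")
print(f"+5 homers   BABIP {hr_babip:.3f}  ISO {hr_iso:.3f}")
print(f"home-run share of ISO numerator: {hr_share:.1%}")
```

In this made-up line, more than half of the ISO numerator comes from home runs, so a swing in home runs moves ISO far more than it moves BABIP.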
Indeed, if we look at all pitcher seasons from 2005 to 2011 with at least 200 IP, there is a strong .69 correlation between ISO and HR/FB. In other words, a huge chunk of ISO is explained by a stat that is even harder to predict than BABIP!
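For anyone who wants to run the same kind of check on their own data, a correlation like this can be computed with a few lines of code. This is only a sketch: the helper name is my own, and the five pitcher seasons below are invented placeholders, not the 2005-2011 sample behind the .69 figure.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Invented pitcher seasons, for illustration only.
iso_allowed = [0.110, 0.125, 0.133, 0.150, 0.165]
hr_per_fb = [0.095, 0.140, 0.120, 0.135, 0.170]

r = pearson_r(iso_allowed, hr_per_fb)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```

Squaring a correlation gives the share of variance that one stat accounts for in the other under a simple linear fit, so a correlation of .69 works out to a bit under half (.69 squared is about .48).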
Take a look at the pitchers deemed “Not Lucky,” this time with HR/FB attached (keeping in mind that league average HR/FB is 13.6 percent):
As you can see, while the low ISOs of these players may make it seem like they deserve their low BABIPs because they’re not allowing hard contact, they’ve really just been lucky in terms of not allowing home runs (keeping their ISOs so low). As a group, their HR/FB is a paltry 9.9 percent.
Let’s look at the guys who were deemed “Not Unlucky”:
As expected, we see a similar pattern here. Of these high BABIP/high ISO pitchers, just two managed to post better-than-average HR/FB rates. The rest were all unlucky to some degree, with a cumulative 15.4 percent HR/FB. It’s not that these guys are giving up bad contact; they’re just getting unlucky with their home runs.
So to answer the question originally posed by the readers who emailed me: unfortunately, I wouldn’t pay much attention to ISO as a tool for legitimizing high or low BABIPs. It seemed like a nice thought, but it just doesn’t hold up when we examine it a little closer.