Imagine if the entire baseball blogosphere started using the original Runs Created formula (the one Bill James developed circa Off The Wall) as our primary way of valuing a player’s offensive contribution. Forget run environments, linear weights, league adjustments, and all of the other things we’ve learned over the past thirty years; instead, for the sake of efficiency, we went back to (H + BB) * TB / (AB + BB). Maybe it’s not perfect, but hell, it’s easy, and it’s not like Willie Bloomquist is going to come out better than Adam Dunn.
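Just to show how little machinery that old formula needs, here is a minimal sketch of basic Runs Created, using AB + BB as the denominator (the usual textbook form of the basic version). The two stat lines are hypothetical, invented only for illustration.

```python
def basic_runs_created(h: int, bb: int, tb: int, ab: int) -> float:
    """Basic Runs Created: (times on base) * (total bases) / (opportunities)."""
    opportunities = ab + bb
    if opportunities == 0:
        return 0.0  # no plate appearances, no runs created
    return (h + bb) * tb / opportunities


# Hypothetical season lines, not real players.
slugger = basic_runs_created(h=150, bb=100, tb=300, ab=550)
slap_hitter = basic_runs_created(h=160, bb=30, tb=200, ab=600)
print(f"slugger: {slugger:.1f} RC, slap hitter: {slap_hitter:.1f} RC")
```

No park, league, or run-environment adjustment appears anywhere, which is exactly the point of the comparison.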

Sounds ridiculous, right? But that’s more or less what’s happening right now with defense independent pitching stats. Quick, who’s been better this year (numbers as of Tuesday afternoon):

Pitcher          K/9   BB/9   HR/9   GB%    ERA    FIP   SNWP
Jeff Niemann     5.7    3.1    0.8  40.7%  3.62   4.24   .567
Matt Garza       8.0    3.5    1.1  41.7%  3.69   4.23   .569

It looks pretty close. They play in the same ballpark, obviously, so we’ll assume a similar run environment. Niemann and Garza have virtually identical ERAs, FIPs, and Support-Neutral Winning Percentages. Garza has a better K/BB ratio, but Niemann has allowed fewer home runs.

If you follow these things, you know where I’m going with this: HR/9 (or HR/PA, depending on how you want to look at it) was one of Voros McCracken’s original Three True Outcomes way back in 1999, and it’s been treated as such ever since. But that didn’t make sense then, and it doesn’t make sense now: a pitcher’s home-run rate isn’t nearly as stable from year to year as his strikeout and walk rates, a fact that Voros himself noted in his early articles. Logically enough, when it’s used as a major component of a defense-independent pitching stat, it makes that metric less stable as well.
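To see how much weight the home-run term carries, here is a minimal sketch of FIP computed from the per-nine rates in the table, using the commonly published weights (13 for HR, 3 for BB, 2 for K). Two assumptions: HBP is ignored because the table doesn’t list it, and the league constant is a placeholder of 3.2 (the real constant is reset each season so league FIP matches league ERA). Because of both, the output won’t exactly match the 4.24 and 4.23 above.

```python
# Commonly published FIP weights per event.
HR_WEIGHT, BB_WEIGHT, K_WEIGHT = 13.0, 3.0, 2.0


def fip_from_rates(k9: float, bb9: float, hr9: float, constant: float = 3.2) -> float:
    """FIP from per-nine-inning rates, ignoring HBP.

    (13*HR + 3*BB - 2*K) / IP is the same as (13*HR9 + 3*BB9 - 2*K9) / 9.
    The default constant is an assumed placeholder, not a real season's value.
    """
    return (HR_WEIGHT * hr9 + BB_WEIGHT * bb9 - K_WEIGHT * k9) / 9.0 + constant


niemann = fip_from_rates(k9=5.7, bb9=3.1, hr9=0.8)
garza = fip_from_rates(k9=8.0, bb9=3.5, hr9=1.1)
print(f"Niemann {niemann:.2f}, Garza {garza:.2f}")

# Sensitivity: how far does one extra event per nine innings move FIP?
print(f"+1 HR/9: {HR_WEIGHT / 9:+.2f} runs, +1 K/9: {-K_WEIGHT / 9:+.2f} runs")
```

One extra homer per nine innings moves FIP by about 1.44 runs, six and a half times the effect of one extra strikeout, so any noise in HR/9 passes straight through to the final number.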

This isn’t new ground. There’s been plenty of research done on the subject, and the explanation is pretty clear: a pitcher’s HR/FB rate correlates from year to year about as weakly as his BABIP does, especially after you adjust for park effects. Over the course of several years, there will be statistically significant differences between pitchers. But those differences simply don’t show up reliably in any single season, and if you’re trying to evaluate a pitcher on a season-by-season basis, or in the middle of a season, it’s probably better to just leave HR/FB out.
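"Stable from year to year" usually means the correlation between the same pitchers’ rates in consecutive seasons. A minimal sketch of that calculation follows; the HR/FB numbers are made up purely to show the mechanics and are not real data. A real study would also set a minimum number of fly balls or innings per season.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between paired observations (same pitcher, consecutive seasons)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a season with zero variance has no defined correlation")
    return cov / (sx * sy)


# Hypothetical HR/FB rates for five pitchers in year 1 and year 2 (invented for illustration).
year1 = [0.08, 0.11, 0.09, 0.13, 0.10]
year2 = [0.11, 0.10, 0.12, 0.10, 0.09]
r = pearson_r(year1, year2)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

Squaring r gives the share of the year-two variance "explained" by year one, which is the r-squared figure used to compare metrics below.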

Of course, we already have a couple of stats that do just that. If we use Nate Silver’s QuikERA, which uses K%, BB%, and GB%, Garza comes out well ahead, with a 4.22 QERA to Niemann’s 4.95. Another metric, xFIP, works in a similar spirit: it replaces a pitcher’s actual home runs with his fly balls times a league-average HR/FB rate, and it has Garza at 4.39 compared to Niemann’s 5.03. These metrics tend to be much more predictive; the year-to-year r-squared values for QuikERA and xFIP center around 0.45, whereas FIP (which uses HR, K, BB, and HBP) comes in at 0.25, and ERA and RA are around 0.10. (These numbers are based on single-season data from 2004-2008. A different dataset might give you slightly different results, but should lead to the same conclusion.)
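For a sense of how QuikERA works, here is a minimal sketch. The coefficients are my recollection of Baseball Prospectus’s published formula and should be treated as an assumption to check against BP’s glossary. K% and BB% are per batter faced and GB% is per ball in play, so the per-nine rates from the table are converted using an assumed, hypothetical 38.5 batters faced per nine innings.

```python
def quik_era(k_pct: float, bb_pct: float, gb_pct: float) -> float:
    """QuikERA-style estimate; all inputs are fractions (0.20, not 20).

    Coefficients are assumed from memory of BP's published formula.
    """
    return (2.69 - 3.4 * k_pct + 3.88 * bb_pct - 0.66 * gb_pct) ** 2


# Hypothetical conversion factor: roughly 38.5 batters faced per nine innings.
BF_PER_NINE = 38.5


def per_nine_to_pct(rate_per_nine: float) -> float:
    """Convert a per-nine-innings rate into an approximate per-batter rate."""
    return rate_per_nine / BF_PER_NINE


garza = quik_era(per_nine_to_pct(8.0), per_nine_to_pct(3.5), 0.417)
niemann = quik_era(per_nine_to_pct(5.7), per_nine_to_pct(3.1), 0.407)
print(f"Garza QERA ~ {garza:.2f}, Niemann QERA ~ {niemann:.2f}")
```

Under these assumptions the sketch gives roughly 4.25 for Garza and 4.97 for Niemann, close to the 4.22 and 4.95 quoted above; the small gap comes from approximating batters faced instead of using actual totals. Home runs never enter the formula at all.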

Yet, despite the obvious logic of using GB% or a normalized HR/FB number as the third true outcome for single-season stats, the uptake in the blogosphere has been oddly slow. FIP has largely become the de facto rate stat for pitchers, and while it’s useful over the long haul, it leans far too heavily on HR/9 to be used for shorter time spans. Graham MacAree’s tRA has also gained some steam, but it’s based on batted-ball data which, for my tastes, is still far too subjective (although that won’t be the case forever; more on this below).

Going back to Niemann and Garza, just under eight percent of Niemann’s fly balls have left the ballpark this year, while Garza’s rate is just over eleven percent. So while their ground-ball percentages are virtually equal, Niemann’s HR/9 is significantly lower. FIP treats those home runs as reflections of each pitcher’s true talent, and rates the two pitchers equally. In contrast, xFIP and QuikERA treat HR/9 as a function of each pitcher’s fly-ball and ground-ball rates, prone to tremendous amounts of random variation, and give Garza a huge edge. If those are my only two choices, I’m taking the latter.
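The difference between the two approaches is easiest to see with two pitchers who are identical except for how many fly balls left the park. The sketch below compares FIP with xFIP, which swaps actual homers for fly balls times a league HR/FB rate. The league HR/FB of 10.5 percent and the 3.2 constant are assumed placeholders (both change every season), and both pitchers’ lines are hypothetical.

```python
def fip(hr: int, bb: int, hbp: int, k: int, ip: float, constant: float = 3.2) -> float:
    """FIP with the commonly published weights; the constant is an assumed placeholder."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fb: int, bb: int, hbp: int, k: int, ip: float,
         lg_hr_per_fb: float = 0.105, constant: float = 3.2) -> float:
    """xFIP: FIP with actual HR replaced by fly balls times a league-average HR/FB rate."""
    expected_hr = fb * lg_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Two hypothetical pitchers with identical K, BB, HBP, IP and 130 fly balls each,
# differing only in how many of those fly balls became home runs.
common = dict(bb=35, hbp=3, k=70, ip=100.0)
lucky_fip = fip(hr=10, **common)     # 10 of 130 FB, about 7.7%
unlucky_fip = fip(hr=15, **common)   # 15 of 130 FB, about 11.5%
shared_xfip = xfip(fb=130, **common)
print(f"FIP: {lucky_fip:.2f} vs {unlucky_fip:.2f}; xFIP for both: {shared_xfip:.2f}")
```

Five extra home runs open up a 0.65-run gap in FIP, while xFIP gives both pitchers the same number because it credits them only for their fly-ball totals.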

That might seem a bit unfulfilling, and I don’t totally disagree with that sentiment, given that pitchers do have some control over their BABIP and HR/FB over long periods of time. For example, from 2005 to 2008, about 9.4 percent of the fly balls hit off of Chien-Ming Wang went for home runs, while Felix Hernandez was closer to 17 percent. The pitchers’ respective park factors actually widen that gap, as the old Yankee Stadium was about league average for home runs, while Safeco Field depressed them a bit. While that doesn’t prove that Wang’s fly balls are less likely to turn into home runs than Felix’s, it is certainly a very, very strong hint.

So how do we reconcile this for single-season (or in-season, for that matter) data? The best way would probably be to use the regression method outlined in the appendix of The Book, both for park-adjusted HR/FB and BABIP. The FIP formula could be adjusted to use these new estimates, or perhaps a new regression-driven linear weights approach could be built from scratch (a la wOBA, but with “true talent” estimates instead of actual measured components). Either way, this “new” rate could also be used on a game-to-game basis to figure out each pitcher’s SNWP. I’ll leave this to people who didn’t get a two on their AP calc exam (I cheated off of someone who thanked me after the test for giving him all of his answers), but I’m pretty certain this is the right approach to take.
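As a rough idea of what that might look like, here is a minimal sketch of the standard "add k league-average trials" form of regression to the mean, applied to a park-adjusted HR/FB rate. This is a generic formulation, not a reproduction of the exact procedure in The Book’s appendix. The park factor, league HR/FB, regression constant k, and the pitcher’s fly-ball and home-run totals are all hypothetical.

```python
def regress_to_mean(successes: float, trials: float, league_rate: float, k: float) -> float:
    """Shrink an observed rate toward the league rate by adding k league-average trials.

    Equivalent to weighting the observed rate by trials / (trials + k).
    """
    return (successes + k * league_rate) / (trials + k)


def park_adjusted_hr(hr: float, park_hr_factor: float) -> float:
    """Deflate home runs by a park factor (1.00 = neutral, above 1 = homer-friendly)."""
    return hr / park_hr_factor


# Hypothetical inputs: 150 fly balls, 15 home runs, a slightly homer-friendly park,
# a league HR/FB of 10.5%, and a regression constant of 400 fly balls.
# None of these numbers come from real data.
fb, hr = 150, 15
adj_hr = park_adjusted_hr(hr, park_hr_factor=1.05)
true_hr_fb = regress_to_mean(adj_hr, fb, league_rate=0.105, k=400)
print(f"observed HR/FB {hr / fb:.3f}, park-adjusted {adj_hr / fb:.3f}, regressed {true_hr_fb:.3f}")
```

Multiplying the regressed rate by a pitcher’s fly balls gives an expected home-run total that could stand in for actual HR in a FIP-style formula; the same shrinkage would apply to BABIP with its own league rate and k.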

Looking ahead, this will all hopefully become a moot point in the not-too-distant future, with HITf/x, and perhaps even GAMEf/x, becoming full-blown realities. In that case, we could have linear weights for every hit ball depending on its speed, angle, and end location. This would probably require some heavy regression as well, especially for game-to-game comparisons, but we would undoubtedly end up with a much clearer picture of each player’s underlying performance.

For the time being, though, let’s stop relying on FIP for single-season stats. If we can build a regression-driven hybrid, great. Otherwise, let’s stick to QuikERA and xFIP.

Statistics courtesy of Baseball Prospectus’s Bil Burke and The Hardball Times