August 5, 2009
Using Tools: Alphabet Soup
Imagine if the entire baseball blogosphere started using the original Runs Created formula (the one Bill James developed circa Off The Wall) as our primary way of valuing a player's offensive contribution. Forget run environments, linear weights, league adjustments, and all of the other things we've learned over the past thirty years; instead, for the sake of efficiency, we went back to (H + BB) * (TB) / (PA). Maybe it's not perfect, but hell, it's easy, and it's not like Willie Bloomquist is going to come out better than Adam Dunn.

Sounds ridiculous, right? But that's more or less what's happening right now with defense-independent pitching stats. Quick, who's been better this year (numbers as of Tuesday afternoon):

Pitcher        K/9   BB/9   HR/9   GB%     ERA    FIP    SNWP
Jeff Niemann   5.7   3.1    0.8    40.7%   3.62   4.24   .567
Matt Garza     8.0   3.5    1.1    41.7%   3.69   4.23   .569

It looks pretty close. They play in the same ballpark, obviously, so we'll assume a similar run environment. Niemann and Garza have virtually identical ERAs, FIPs, and Support-Neutral Winning Percentages. Garza has a better K/BB ratio, but Niemann has allowed fewer home runs.

If you follow these things, you know where I'm going with this: HR/9 (or HR/PA, depending on how you want to look at it) was one of Voros McCracken's original Three True Outcomes way back in 1999, and it's been treated as such ever since. But that didn't make sense then, and it doesn't make sense now: a pitcher's home run rate isn't nearly as stable from year to year as his strikeout and walk rates, a fact that Voros himself noted in his early articles. Logically enough, when it's used as a major component of a defense-independent pitching stat, it makes that metric less stable as well.

This isn't new ground. There's been plenty of research done on the subject, and the explanation is pretty clear: a pitcher's HR/FB rate correlates about as well as his BABIP from year to year, especially after you adjust for park effects.
Over the course of several years, there will be statistically significant differences between pitchers. But that simply doesn't manifest itself every single year, and if you're trying to evaluate a pitcher on a season-by-season basis, or in the middle of a season, it's probably better to just leave it out.

Of course, we already have a couple of stats that do just that. If we use Nate Silver's QuikERA, which uses K%, BB%, and GB%, Garza comes out well ahead, with a 4.22 QERA to Niemann's 4.95 QERA. Another metric, xFIP, is very similar, in that it normalizes HR/FB, and has Garza at 4.39 compared to Niemann's 5.03. These numbers tend to be very predictive; the year-to-year r-squared coefficients for QuikERA and xFIP center around 0.45, whereas FIP (which uses HR, K, BB, and HBP) comes in at 0.25, and ERA and RA are around 0.10. (These numbers are based on single-season data from 2004-2008. A different dataset might give you slightly different results, but should always lead to the same conclusion.)

Yet, despite the obvious logic of using GB% or a normalized HR/FB number as the third true outcome for single-season stats, the uptake in the blogosphere has been oddly slow. FIP has largely become the de facto rate stat for pitchers, and while it's useful over the long haul, it leans far too heavily on HR/9 to be used for shorter time spans. Graham MacAree's tRA has also gained some steam, but it's based on batted-ball data which, for my tastes, is still far too subjective (although that won't be the case forever; more on this below).

Going back to Niemann and Garza, just under eight percent of Niemann's fly balls have left the ballpark this year, while Garza is just over eleven percent. So while their groundball percentages are virtually equal, Niemann's HR/9 is significantly lower. FIP sees those home runs as reflections of each pitcher's true talent, and rates the two pitchers equally.
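The arithmetic behind that tie can be sketched roughly as follows. This assumes the commonly published FIP weights (13 for HR, 3 for BB, 2 for K, all per inning) and plugs in the rounded per-nine rates quoted above. HBP is left out because it isn't listed for either pitcher, and the league constant is dropped because it cancels when comparing two pitchers. The function and variable names are illustrative only.

```python
# Commonly published FIP weights. HBP is omitted (not listed above), which is
# an assumption; the league constant cancels in a pitcher-to-pitcher comparison.
HR_WEIGHT, BB_WEIGHT, K_WEIGHT = 13.0, 3.0, 2.0


def fip_terms(k9, bb9, hr9):
    """Return each component's contribution to FIP, in runs per nine innings."""
    return {
        "HR": HR_WEIGHT * hr9 / 9,
        "BB": BB_WEIGHT * bb9 / 9,
        "K": -K_WEIGHT * k9 / 9,
    }


niemann = fip_terms(k9=5.7, bb9=3.1, hr9=0.8)
garza = fip_terms(k9=8.0, bb9=3.5, hr9=1.1)

# Garza minus Niemann, component by component.
gap = {name: garza[name] - niemann[name] for name in niemann}
net = sum(gap.values())

for name, value in gap.items():
    print(f"{name:>2}: {value:+.2f}")
print(f"net: {net:+.2f}")
```

With these rounded inputs, Garza's extra 0.3 HR/9 costs him a bit more than 0.4 runs per nine, which wipes out nearly all of his strikeout advantage. The net gap comes to a few hundredths of a run; rounding and the missing HBP would plausibly account for the small difference from the published FIPs.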
In contrast, xFIP and QuikERA see HR/9 as a function of the pitchers' groundball rates, prone to tremendous amounts of random variation, and give Garza a huge edge. If those are my only two choices, I'm taking the latter.

That might seem a bit unfulfilling, and I don't totally disagree with that sentiment, given that pitchers do have some control over their BABIP and HR/FB over long periods of time. For example, from 2005 to 2008, about 9.4 percent of the fly balls hit off of Chien-Ming Wang went for home runs, while Felix Hernandez was closer to 17 percent. The pitchers' respective park factors actually increase that gap, as the old Yankee Stadium was about league average, while Safeco Field depressed home runs a bit. While that doesn't absolutely mean that Wang's fly balls are less likely to turn into home runs than Felix's, it is certainly a very, very strong hint.

So how do we reconcile this for single-season (or in-season, for that matter) data? The best way would probably be to use the regression method outlined in the appendix of The Book, both for park-adjusted HR/FB and BABIP. The FIP formula could be adjusted to use these new estimates, or perhaps a new regression-driven linear weights approach could be built from scratch (a la wOBA, but with "true talent" estimates instead of actual measured components). Either way, this "new" rate could also be used on a game-to-game basis to figure out each pitcher's SNWP. I'll leave this to people who didn't get a two on their AP calc exam (I cheated off of someone who thanked me after the test for giving him all of his answers), but I'm pretty certain this is the right approach to take.

Looking ahead, this will all hopefully become a moot point in the not-too-distant future, with HITf/x, and perhaps even GAMEf/x, becoming full-blown realities. In that case, we could have linear weights for every hit ball depending on its speed, angle, and end location.
This would probably require some heavy regression as well, especially for game-to-game comparisons, but we would undoubtedly end up with a much clearer picture of each player's underlying performance.

For the time being, though, let's stop relying on FIP for single-season stats. If we can build a regression-driven hybrid, great. But otherwise, let's stick to QuikERA and xFIP.

Statistics courtesy of Baseball Prospectus's Bil Burke and The Hardball Times.
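For anyone who wants to try the regression-driven hybrid, here is a minimal sketch of one common form of regression toward the mean: pad the pitcher's observed fly balls with some number of league-average fly balls. Whether this matches the exact procedure in The Book's appendix is an assumption. The league HR/FB rate, the ballast size, and the pitcher counts below are placeholders for illustration, not researched or real values.

```python
def regress_rate(successes, trials, league_rate, ballast):
    """Shrink an observed rate toward the league rate.

    Adds `ballast` league-average trials to the sample, so small samples
    stay close to the league rate and large ones approach the observed rate.
    """
    if trials < 0 or ballast <= 0:
        raise ValueError("trials must be non-negative and ballast positive")
    return (successes + league_rate * ballast) / (trials + ballast)


# Placeholder assumptions, not researched values.
LEAGUE_HR_PER_FB = 0.10
BALLAST_FLY_BALLS = 400

# Two made-up pitchers with the same observed 8% HR/FB but different samples.
small_sample = regress_rate(8, 100, LEAGUE_HR_PER_FB, BALLAST_FLY_BALLS)
large_sample = regress_rate(32, 400, LEAGUE_HR_PER_FB, BALLAST_FLY_BALLS)

print(f"100 fly balls: {small_sample:.3f}")  # 0.096
print(f"400 fly balls: {large_sample:.3f}")  # 0.090
```

The same observed rate is pulled less toward the league the more fly balls back it up. A very large ballast behaves like xFIP (everyone gets the league rate), while a tiny one behaves like FIP (everyone keeps his observed rate). The estimate could then replace actual home runs in a FIP-style formula.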
Shawn Hoffman is an author of Baseball Prospectus. 19 comments have been left for this article.

Outstanding article. I didn't know all of that about FIP and SNWP, and I normally shy away from stat heavy articles, but this was easy to read and understand.