

February 9, 2010

Introducing SIERA
Part 2

Part 1 of this series marked the introduction of Skill-Interactive Earned Run Average, or SIERA, an ERA estimator that more accurately gauges the run-prevention skills of a pitcher relative to his controllable skills. Part 1 focused on the introductory aspects, much like going over a syllabus on the first day of class, but today we'll recap the steps that led to SIERA’s creation.

One of the major reasons for SIERA’s existence is that prior estimators broke plenty of ground. In this respect, SIERA represents another evolutionary step in the process of removing the effects of defense on pitcher statistics, effects that came into play when Henry Chadwick conjured up the earned run average metric over a century ago. Chadwick’s metric proved popular at the time and remains one of the most frequently cited tools for determining the quality of pitchers. Back at the turn of the 21st century, however, Voros McCracken shocked the nation with seminal research on the roles of defense and luck in ERA, finding that hurlers exhibited little persistence in their BABIP (batting average on balls in play), and concluding that more went into Chadwick’s toy than what the pitcher could control.

This led to the invention of FIP, or Fielding Independent Pitching, which estimates ERA from the three statistics McCracken found to be persistent: walks, strikeouts, and home runs. FIP essentially marked the beginning of approximating ERA through defensive independence, and can be calculated as:

FIP = 3.20 + (3*BB - 2*K + 13*HR)/IP

where the 3.20 is a constant contingent upon the league and year, used to place the estimator on the ERA scale. It is very true that FIP will provide a better estimate of a pitcher’s skill level than his ERA, because the latter is open to bloop hits or nabbed line drives.
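As a quick illustration of the FIP formula above, here is a minimal Python sketch. The function name and the sample season line are hypothetical, made up purely to show the arithmetic; only the formula and the 3.20 constant come from the text.

```python
def fip(bb, k, hr, ip, constant=3.20):
    """Fielding Independent Pitching, on the ERA scale.

    `constant` depends on league and year; 3.20 is the value quoted in the text.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return constant + (3 * bb - 2 * k + 13 * hr) / ip


# Hypothetical season line (not a real pitcher): 60 BB, 180 K, 20 HR, 200 IP.
example = fip(bb=60, k=180, hr=20, ip=200.0)
print(round(example, 2))
```

For this made-up line the numerator is 3*60 - 2*180 + 13*20 = 80, so the estimate sits 0.40 runs above the constant, at 3.60.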
Bloops and other unfortunate events can cause BABIP to fluctuate, and those hits, or the lack thereof, can add up to create a rift between measured success and actual talent. The problem lies in the lack of persistence not only in BABIP but also in the rate of home runs per fly ball. Intraclass correlations over the span of 2003-09 show that HR/FB, no matter how one chooses to calculate it (out of outfield flies or total flies), does not produce an r greater than 0.15, and home runs per outfield fly ball net of team home runs per outfield fly ball (to control for park effects) leaves an ICC of only 0.084. FIP attempts to correct for BABIP luck but fails to correct for the luck inherent in HR/FB, perpetually over- or underrating certain types of pitchers in the process.

The natural way to correct for some of this home run luck is to adjust FIP through the use of expected, not actual, dingers. The expected tally is calculated by multiplying the league average rate of home runs per outfield fly (as opposed to also lumping in popups) by the pitcher's total number of outfield flies. These corrections comprise xFIP, created by The Hardball Times and currently housed at FanGraphs. If the league average HR/FB is 18 percent and a pitcher allows 85 outfield fly balls, his expected home run tally would equal 15.3. If he actually allowed 23 home runs, then his xFIP would be lower than his unadjusted FIP, as the poor luck with home runs would be expected to even out in the next year.

Nate Silver introduced QERA to Baseball Prospectus in 2006 using a similar approach, while acknowledging that run scoring is nonlinear because more baserunners lead to more runs allowed. QERA used a quadratic form that incorporated walk, strikeout, and ground ball rates, keeping the use of walks and strikeouts but modeling home runs surrendered more accurately through the ground ball rate.
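The xFIP adjustment described above (swap actual home runs for expected ones) can be sketched as follows. The 18 percent rate, 85 outfield flies, and 23 actual home runs come from the example in the text; the walk, strikeout, and innings totals are hypothetical fillers, and the function names are illustrative rather than any site's actual implementation.

```python
def expected_home_runs(outfield_flies, league_hr_per_of_fly):
    """League-average HR per outfield fly times the pitcher's outfield flies."""
    return league_hr_per_of_fly * outfield_flies


def fip(bb, k, hr, ip, constant=3.20):
    """FIP as given in the text; the constant varies by league and year."""
    return constant + (3 * bb - 2 * k + 13 * hr) / ip


def xfip(bb, k, outfield_flies, ip, league_hr_per_of_fly, constant=3.20):
    """FIP with expected home runs substituted for actual home runs."""
    exp_hr = expected_home_runs(outfield_flies, league_hr_per_of_fly)
    return fip(bb, k, exp_hr, ip, constant)


# From the text: 18% league rate, 85 outfield flies, 23 actual home runs.
exp_hr = expected_home_runs(85, 0.18)  # about 15.3

# Hypothetical BB, K, and IP totals, chosen only to complete the example.
actual = fip(bb=30, k=70, hr=23, ip=90.0)
adjusted = xfip(bb=30, k=70, outfield_flies=85, ip=90.0,
                league_hr_per_of_fly=0.18)
print(adjusted < actual)  # True: the unlucky pitcher's xFIP is below his FIP
```

Because the pitcher allowed more home runs than expected, the adjusted figure comes out lower than his unadjusted FIP, matching the direction described in the text.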
Silver also made another improvement by looking at walk and strikeout rates per plate appearance, instead of per nine innings. The reason is quite intuitive: a lower BABIP leads to more innings pitched and therefore lower K/9, BB/9, and HR/9 rates, even though BABIP is not something that DIPS credits as being in the pitcher’s control.

Unfortunately, the adjustment methodology was still flawed, as QERA used the percentage of ground balls per ball in play, instead of per plate appearance. This has been criticized on the grounds that a formula should use common denominators. Walks and strikeouts were per plate appearance, so why weren’t grounders treated the same way? The criticism is certainly valid, since pitchers who allow fewer balls in play will gain less by having a higher percentage of grounders, while those who allow more will see a commensurate gain. Consider a pitcher who strikes out or walks half of the hitters he faces. Why should his ground ball rate per ball in play be as significant as that of another pitcher who neither strikes out nor walks anybody at all?

SIERA corrects this issue by using a variable suggested at the Inside the Book blog: (GB-(FB+PU))/PA. This variable corrects for the common denominator problem and simultaneously treats line drives neutrally, looking only at the extent to which grounders exceed or fall short of the sum of outfield flies and popups. The latter fix is critical given the lack of persistence of liners: the individual rate, isolated from team, produces an ICC of just .007.

Reverting to QERA for a minute, another advantage it has over competitors is that it implicitly considers nonlinear returns to each term (K%, BB%, GB%), and interactions between those terms. A pitcher’s walk rate impacts his QERA at an increasing rate as additional walks begin to advance batters who have already walked, and pitchers who walk a great many hitters may benefit more from grounders than their strike-zone-stingy compadres.
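To make the common-denominator point concrete, the sketch below compares ground balls per ball in play with the net ground ball rate per plate appearance, (GB-(FB+PU))/PA. Both pitchers and all of their batted-ball counts are hypothetical, constructed so that one ends half his plate appearances with a strikeout or walk and the other never does.

```python
def gb_per_ball_in_play(gb, fb, pu, ld):
    """Ground balls as a share of balls in play (the QERA-style denominator)."""
    return gb / (gb + fb + pu + ld)


def net_gb_per_pa(gb, fb, pu, pa):
    """(GB - (FB + PU)) / PA; line drives drop out of the numerator entirely."""
    return (gb - (fb + pu)) / pa


# Hypothetical pitcher A: 200 PA, half end in a strikeout or walk (100 in play).
a_bip = gb_per_ball_in_play(gb=60, fb=25, pu=5, ld=10)
a_net = net_gb_per_pa(gb=60, fb=25, pu=5, pa=200)

# Hypothetical pitcher B: 200 PA, no strikeouts or walks (200 in play).
b_bip = gb_per_ball_in_play(gb=120, fb=50, pu=10, ld=20)
b_net = net_gb_per_pa(gb=120, fb=50, pu=10, pa=200)

print(a_bip == b_bip)  # True: identical per ball in play
print(a_net < b_net)   # True: A's grounders matter in fewer plate appearances
```

Per ball in play the two look identical, but per plate appearance pitcher B's ground ball tendency carries twice the weight, which is the distinction the text draws.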
The formula for QERA is:

QERA = (a + b*GB% + c*BB% + d*SO%)^2

where a = 2.69, b = -0.66, c = 3.88, and d = -3.4. Unfoiled, this means that the following is also true:

QERA = a^2 + b^2*GB%^2 + 2*a*b*GB% + c^2*BB%^2 + 2*a*c*BB% + d^2*SO%^2 + 2*a*d*SO% + 2*b*c*GB%*BB% + 2*b*d*GB%*SO% + 2*c*d*BB%*SO%

QERA considers that the effect of BB% on ERA may be nonlinear, and that if, for example, c^2 is large, walks may increase ERA at an increasing rate; jumping from 4 to 8 percent may not hurt ERA as much as a jump from 8 to 12 percent. QERA also allows for ground balls to be more beneficial for pitchers who walk a greater percentage of hitters, as the term 2*b*c*GB%*BB% is negative; increasing ground ball percentage from 40 to 45 may do more for a pitcher who has a high walk rate than for one who walks fewer hitters.

Unfortunately, this functional form is very limiting. The three components are not all quadratic: the rates of whiffs and grounders are, but the rate of walks is not. Since a squared term has to be positive, b^2, c^2, and d^2 are all positive, but our results show that the coefficient in place of b^2 should be negative, since more ground balls can drive down ERA at an increasing rate. Another limit of this functional form is that the interaction terms (e.g. 2*b*c for the product of ground ball and walk rates) are constrained by whichever values of a, b, c, and d are most realistic for the earlier terms in the equation. The term in place of 2*c*d should probably be zero, as pitchers who walk a great many hitters do not necessarily benefit any more from strikeouts than pitchers who walk next to nobody. Yet if c or d were zero, QERA would predict that walk or strikeout rates have nothing to do with run prevention, a clearly false result.
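The nonlinearity described above can be checked numerically with a short sketch. It assumes that b and d carry minus signs (consistent with the text's statement that the GB%*BB% interaction term is negative, and with strikeouts lowering ERA); the input rates are hypothetical and the function name is illustrative.

```python
# Assumed signs: b and d negative, so grounders and strikeouts lower QERA.
A, B, C, D = 2.69, -0.66, 3.88, -3.4


def qera(gb_pct, bb_pct, so_pct):
    """QERA as the square of a linear combination; rates are fractions (0.08 = 8%)."""
    return (A + B * gb_pct + C * bb_pct + D * so_pct) ** 2


# Hypothetical pitcher: 45% grounders per ball in play, 18% strikeouts per PA.
low, mid, high = (qera(0.45, bb, 0.18) for bb in (0.04, 0.08, 0.12))
first_jump = mid - low     # cost of going from 4% to 8% walks
second_jump = high - mid   # cost of going from 8% to 12% walks
print(second_jump > first_jump)  # True: walks hurt at an increasing rate

# Gain from raising GB% from 40% to 45%, for high- and low-walk pitchers.
gain_high_bb = qera(0.40, 0.12, 0.18) - qera(0.45, 0.12, 0.18)
gain_low_bb = qera(0.40, 0.04, 0.18) - qera(0.45, 0.04, 0.18)
print(gain_high_bb > gain_low_bb)  # True: grounders help the wilder pitcher more
```

Both comparisons follow directly from squaring the linear form: the second walk jump exceeds the first, and the ground ball gain is larger at the higher walk rate.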
SIERA’s regression treats each of these terms individually, replacing the four parameters used to estimate ERA with 10 to begin the analysis, creating the following formula:

SIERA = a + b*GB%^2 + c*GB% + d*BB%^2 + e*BB% + f*SO%^2 + g*SO% + h*GB%*BB% + i*GB%*SO% + j*BB%*SO%

Then, insignificant terms are removed; in this case, the level of significance is derived from the p-value reported in the regression as well as clinical assumptions. Part 3 will investigate more closely what went into the formula as well as the process of deriving the end result, while the rest of the week will test SIERA against other estimators and highlight specific pitchers for whom this estimator more accurately gauges skill-based contributions.
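One way to see what the ten-parameter form buys is to treat QERA as a constrained special case of it. The sketch below is illustrative only: it contains no actual SIERA coefficients (none are given in this part), it assumes minus signs on QERA's b and d, and it feeds both forms the same generic rate inputs even though SIERA's ground ball variable is defined differently from QERA's.

```python
def quadratic_terms(gb, bb, so):
    """The ten regressors, in the order of the SIERA formula above."""
    return [1.0, gb ** 2, gb, bb ** 2, bb, so ** 2, so, gb * bb, gb * so, bb * so]


def evaluate(coefs, gb, bb, so):
    """Dot product of ten coefficients (a through j) with the ten regressors."""
    return sum(c * x for c, x in zip(coefs, quadratic_terms(gb, bb, so)))


def qera_as_ten_coefs(a, b, c, d):
    """Coefficients implied by expanding (a + b*GB + c*BB + d*SO)^2."""
    return [a * a, b * b, 2 * a * b, c * c, 2 * a * c,
            d * d, 2 * a * d, 2 * b * c, 2 * b * d, 2 * c * d]


# QERA parameters, with signs on b and d assumed negative.
a, b, c, d = 2.69, -0.66, 3.88, -3.4
coefs = qera_as_ten_coefs(a, b, c, d)

# Hypothetical input rates, as fractions.
direct = (a + b * 0.45 + c * 0.08 + d * 0.18) ** 2
expanded = evaluate(coefs, 0.45, 0.08, 0.18)
print(abs(direct - expanded) < 1e-9)  # True: same value either way

# The squared-term coefficients are forced to be non-negative under QERA.
print(all(coefs[i] >= 0 for i in (1, 3, 5)))  # True
```

A regression that estimates all ten coefficients freely is not bound by these constraints, so a squared term can come out negative or an interaction can come out at zero if the data call for it.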
Matt Swartz is an author of Baseball Prospectus.

29 comments have been left for this article.

robustyoungsoul (42732)
Really, really exciting stuff guys. Looking forward to part 3.
Feb 09, 2010 10:33 AM

TGisriel (2498)
I'm glad you disclose the numbers and process. I'm glad you explain what you're doing and why it should be an improvement on earlier metrics. I am reminded, however, of the title of the edited collection of Bill James articles which went something along the lines of this time let's throw out the bones.
Feb 09, 2010 12:50 PM

prospero14 (2206)
Huh. Once you multiply out the equation for QERA, it seems completely natural to ask: which quadratic function of GB%, BB%, and SO% best predicts future performance?
Feb 09, 2010 13:08 PM

John Carter (22689)
Not to take anything away from Voros McCracken, whose work with BABIP and various stats was no doubt very important, but Bill James was the first person I read discussing the importance of team defense (let alone park effects, etc.) on ERA. That was in the 80s. As I recall he coined DER and discussed how non-strikeout pitchers rely more on having good defenses behind them.
Feb 09, 2010 13:42 PM

Eric M. Van (31218)
Here's food for thought for version 2.0.
Feb 09, 2010 15:51 PM

Fresh Hops (41607)
Are you sure that this positive correlation between BB% and "HR rate" isn't an artifact of some sort? I'm worried that once you control for other factors like GB%, it goes away. What do you mean by HR rate? HR/FB? HR/9? (Sinkerballers often don't have the same control skills as other pitchers; this may be because it's harder to control sinking pitches or perhaps it's just that a high GB% means you can have greater success with weaker control, so they just get away with it. I don't know. Such a fact might explain the correlation between BB% and HR rate.)
Feb 09, 2010 16:36 PM

Eric M. Van (31218)
Not an artifact.
Same pitchers, look at year-to-year changes in BB rate, and HR / Contact follows very mildly but with immense statistical significance. See below for the details.
Feb 10, 2010 16:10 PM

Dr. Dave (1652)
Eric, you raise a very important distinction when you mention skill versus approach. We really don't have any idea at all what the effect on ERA would be for a particular individual pitcher to nibble more (or less), given his skill set.
Feb 09, 2010 19:41 PM

Juris (1283)
Your last para is especially valuable, because it reminds us that these metrics can't cover every single contingency, and that while there is value in adding complexity to the measure there's also value in keeping it from being "overfitted" to every single contingency. In the end, the "residuals" such as those you mention will be instructive, but shouldn't necessarily lead to making the indicator itself even more complex.
Feb 09, 2010 21:00 PM

Eric M. Van I'm not sure that there is a correlation between BB% and HR%. I'm finding only 0.03 in my data set. There seems to be a correlation between doubles and walks, and between doubles and home runs, but not between walks and home runs.
Feb 10, 2010 08:54 AM

Eric M. Van (31218)
Change in HR/Contact, adjusted for age and for any change in role, correlates to change in (BB-IBB)/(PA-IBB), r = .130, p = .000015 (n = the 1107 pitchers who faced 200+ BFP in consecutive seasons for the same team playing in the same park, 2002-2009).
Feb 10, 2010 16:05 PM

Fresh Hops (41607)
"It is very true that FIP will provide a better estimate of a pitcher’s skill level than his ERA, because the latter is open to bloop hits or nabbed line drives."
Feb 09, 2010 16:29 PM

John Carter (22689)
I am noticing that PECOTA's projected ERA, including PERA, and EqERA have much higher estimates than CHONE's and Ken Warren's projected ERAs on pitchers with very high GB/FB. Is that something that needs to be adjusted/updated?
Will PECOTA start gearing their ERA projections more towards SIERA-based projections next year?
Feb 09, 2010 20:53 PM

John Carter (22689)
(I don't mean to insinuate that PECOTA is less right than CHONE, etc. - just wondering if there is a GB/FB component to its player matching and if it has been checked as to whether that would improve it or not - and if it would, how strongly that characteristic should be used. I realize GBs haven't even been measured for very long, so I guess there would have to be some translation and migration from HR rate data to GB/FB data.)
Feb 10, 2010 08:23 AM

I think PECOTA does use information on groundout/flyout ratio, but I'm not totally sure. I think SIERA will be more involved next year in the PECOTA process but I'm not sure really where it fits in. I do think that batted ball statistics that have only been collected properly since 2003 would help all projection systems immensely, but that it might be tough to incorporate some of that accumulated knowledge into a model like PECOTA without throwing out 50 years of other data it uses effectively too.
Feb 10, 2010 08:47 AM

myshkin (3684)
I like the ideas so far, but I have something of a nitpick, but I still regret not making any fuss when MGL and others started using "regress" as a transitive verb...
Feb 10, 2010 01:21 AM

goldenyeti (6983)
From http://www.thefreedictionary.com/regress
Feb 10, 2010 08:28 AM

Yeah, maybe we shouldn't have used the word unfoil. It was supposed to be somewhat of a play on first-inner-outer-last, but really we should have stuck with unravel. I think it'll be clearer in today's (Wednesday's) article, though we probably used the word again IIRC.
Feb 10, 2010 08:44 AM

John Carter (22689)
I am getting a little disenchanted with GB/FB data. It seems these numbers vary even more wildly from season to season by the same pitcher than HR/9. What's going on?
Feb 10, 2010 08:28 AM

Hmm...there must be something wrong with your data source.
GB/FB, GB/Batted Ball, FB/Batted Ball: these all have year-to-year correlations of something like .70-.80. I'm pretty sure they are more persistent than even strikeout rates for pitchers. The thing is that you cannot use data before 2003. It's possible that you are looking at Groundout/Flyout data, but even that should be reliable. HR/9, on the other hand, has a year-to-year correlation of something like .2, and that breaks down when you net out team effects and do HR/outfield flyball.
Feb 10, 2010 08:40 AM

John Carter (22689)
Thanks, Matt. I am looking at the GB/FB column data on FanGraphs. Perhaps I am being fooled by seasons with a tiny sample size, as each level (minors, majors) has a separate line, though I don't think so. I am making my own projections from career data, counting the most recent seasons the heaviest, so I am checking each line's inning total to assess the significance of its corresponding GB/FB data. Hence, I can see how some pitcher's GB/FB jumps from 1.50 to .75 and back. In GB% that would be 60% and 43%, if GB% = GB / (GB+FB), so I see that the percentage change in GB/FB is far greater than it is in GB%, even though they are both measures of ground ball tendency. If you say that both year-to-year correlations are in the same .7-.8 range, I guess that difference gets ironed out in the calculation. However, from all my work, those jumps in FG's GB/FB look a heck of a lot heftier than the year-to-year changes in K/9 or K/BB, but I am not looking at those other stats as closely, because projections are already provided.
Feb 10, 2010 12:56 PM

Fascinating stuff, gents. Any chance I can cajole you, or other readers, into giving me a couple values from 2009 season pitchers so I can confirm my spreadsheet is right? Or, could anyone confirm these random values? Aardsma 3.21? Accardo 5.21? Mike Adams 2.12? Proofreading this formula is making my head hurt. I rarely wade too deeply into these more challenging articles, but this one has me glued to my monitor.
Thanks. I think you are typing the formula in wrong though. I'm getting Aardsma 3.41, Accardo 5.20, and Adams 4.17.
He may not be making the squared GB term negative if (GB-(FB+PU))/PA is negative.