Part 1 of this series marked the introduction of Skill-Interactive Earned Run Average, or SIERA, an ERA estimator that more accurately gauges a pitcher’s run prevention based on the skills he can control. Part 1 focused on the introductory aspects, much like going over a syllabus on the first day of class, but today we'll recap the steps that led to SIERA’s creation. One of the major reasons for SIERA’s existence is that prior estimators broke plenty of ground. In this respect, SIERA represents another evolutionary step in the process of removing the effects of defense on pitcher statistics, effects that came into play when Henry Chadwick conjured up the earned run average metric over a century ago.
Chadwick’s metric proved popular at the time and remains one of the most frequently cited tools for determining the quality of pitchers. Back at the turn of the 21st century, however, Voros McCracken shocked the nation with seminal research on the roles of defense and luck in ERA, finding that hurlers exhibited little persistence in their BABIP (batting average on balls in play), and concluding that more went into Chadwick’s toy than what the pitcher could control. This led to the invention of FIP, or Fielding Independent Pitching, which estimates ERA from the three statistics McCracken found to be persistent—walks, strikeouts and home runs. FIP essentially marked the beginning of approximating ERA through defensive independence, and can be calculated as: FIP = 3.20 + (3*BB – 2*K + 13*HR)/IP, where the 3.20 is a constant contingent upon the league and year, used to place the estimator on the ERA scale.
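As a quick illustration of the formula above, here is a minimal Python sketch. The function name, the sample season line, and the treatment of innings as a decimal number are assumptions made for this example; only the weights and the 3.20 constant come from the text.

```python
def fip(bb, k, hr, ip, constant=3.20):
    """FIP = constant + (3*BB - 2*K + 13*HR) / IP, as written in the text.

    ip is assumed to be decimal innings (200 1/3 innings is 200.333, not 200.1).
    constant defaults to the 3.20 quoted above; it varies by league and year.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return constant + (3 * bb - 2 * k + 13 * hr) / ip


# Hypothetical season line, invented for illustration only.
example_fip = fip(bb=50, k=180, hr=20, ip=200.0)
print(round(example_fip, 2))  # 3.45
```

Note that hits allowed never enter the calculation, which is the sense in which the estimate is independent of the defense.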
It is true that FIP will provide a better estimate of a pitcher’s skill level than his ERA, because the latter is open to bloop hits or nabbed line drives. Bloops and other unfortunate events can cause BABIP to fluctuate, and those hits, or the lack thereof, can aggregate to create a rift between measured success and actual talent. The problem is that persistence is lacking not only in BABIP but also in the rate of home runs per fly ball. Intra-class correlations (ICC) over the span of 2003-09 show that HR/FB, no matter how one chooses to calculate it (out of outfield flies or total flies), does not produce an r greater than 0.15, and home runs per outfield fly ball net of team home runs per outfield fly ball (to control for park effects) leaves an ICC of only 0.084. FIP attempts to correct for BABIP luck but fails to correct for the luck inherent in HR/FB, perpetually over- or underrating certain types of pitchers in the process.
The natural way to correct for some of this home run luck is to adjust FIP through the use of expected, not actual, dingers. The expected tally is calculated by multiplying the league average rate of home runs per outfield fly (popups are not lumped in) by the pitcher’s total number of outfield flies. These corrections comprise xFIP, created by The Hardball Times and currently housed at FanGraphs. If the league average HR/FB is 18 percent and a pitcher allows 85 outfield fly balls, his expected home run tally would equal 15.3. If he actually allowed 23 home runs, then his xFIP would be lower than his unadjusted FIP, as the poor luck with home runs would be expected to even out in the next year.
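The arithmetic in that example can be checked with a short Python sketch. The 18 percent rate, the 85 outfield flies, and the 23 actual home runs come from the text; the walk, strikeout, and innings totals are hypothetical, and reusing the 3.20 constant for xFIP is an assumption made here for simplicity.

```python
def fip(bb, k, hr, ip, constant=3.20):
    """FIP with the weights and constant quoted earlier in the article."""
    return constant + (3 * bb - 2 * k + 13 * hr) / ip


def expected_home_runs(outfield_flies, league_hr_per_of_fly):
    """Expected home runs: league HR rate per outfield fly times the pitcher's flies."""
    return outfield_flies * league_hr_per_of_fly


def xfip(bb, k, outfield_flies, league_hr_per_of_fly, ip, constant=3.20):
    """FIP with expected home runs substituted for actual home runs."""
    xhr = expected_home_runs(outfield_flies, league_hr_per_of_fly)
    return fip(bb, k, xhr, ip, constant)


# From the text: 18 percent league rate, 85 outfield flies, 23 actual home runs.
xhr = expected_home_runs(85, 0.18)
print(round(xhr, 1))  # 15.3

# Hypothetical walk, strikeout, and innings totals, for illustration only.
actual = fip(bb=50, k=150, hr=23, ip=180.0)
adjusted = xfip(bb=50, k=150, outfield_flies=85, league_hr_per_of_fly=0.18, ip=180.0)
print(adjusted < actual)  # True: 23 actual homers exceed the 15.3 expected
```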
Nate Silver introduced QERA to Baseball Prospectus in 2006 using a similar approach, while acknowledging that run scoring is non-linear because more base runners lead to more runs allowed. QERA used a quadratic form that incorporated walk, strikeout, and ground ball rates, keeping constant the usage of walks and strikeouts but more accurately modeling home runs surrendered through the ground ball rate. Silver also made another improvement by looking at walk and strikeout rates per plate appearance, instead of per nine innings. The reason is quite intuitive: a lower BABIP will lead to higher innings pitched totals and therefore lower K/9, BB/9, and HR/9 rates, even though BABIP is not something that DIPS (Defense Independent Pitching Statistics) credits as being in the pitcher’s control.
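The per-nine distortion can be seen in a small Python sketch. The model is deliberately simplified and is an assumption of this example, not something from the article: every plate appearance ends in a strikeout, a walk, or a ball in play; every ball in play is either a hit or exactly one out; and home runs, double plays, and outs on the bases are ignored. All numbers are hypothetical.

```python
def k_per_nine(pa, k_rate, bb_rate, babip):
    """Strikeouts per nine innings under the simplified model described above."""
    strikeouts = pa * k_rate
    walks = pa * bb_rate
    balls_in_play = pa - strikeouts - walks
    # Outs come from strikeouts plus balls in play that do not fall for hits.
    outs = strikeouts + balls_in_play * (1 - babip)
    innings = outs / 3
    return 9 * strikeouts / innings


# Same pitcher, same 20 percent strikeout rate per plate appearance,
# but two different BABIP outcomes.
unlucky = k_per_nine(pa=800, k_rate=0.20, bb_rate=0.08, babip=0.300)
lucky = k_per_nine(pa=800, k_rate=0.20, bb_rate=0.08, babip=0.260)
print(lucky < unlucky)  # True: the lower BABIP stretches innings and shrinks K/9
```

The strikeout rate per plate appearance is identical in both cases, which is why it is the steadier input.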
Unfortunately, the adjustment methodology was still flawed, as QERA used the percentage of ground balls per ball in play, instead of per plate appearance. This has been criticized on the grounds that the terms of a formula should share a common denominator. Walks and strikeouts were per plate appearance, so why weren’t grounders treated the same way? The criticism is certainly valid, since pitchers who allow fewer balls in play will gain less by having a higher percentage of grounders, while those who allow more will see a commensurate gain.
Consider a pitcher who strikes out or walks half of the hitters he faces. Why should his ground ball rate per ball in play be as significant as that of another pitcher who neither strikes out nor walks anybody at all? SIERA corrects this issue by using a variable suggested at the Inside the Book blog: (GB-(FB+PU))/PA. This variable corrects for the common denominator problem and simultaneously treats line drives neutrally. The latter fix is critical given the lack of persistence of liners (the individual rate, isolated from team, produces a .007 ICC). The variable as a whole looks at the extent to which grounders exceed or fall short of the sum of outfield flies and popups.
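The two pitchers described above can be put side by side in a short Python sketch. The batted-ball counts are hypothetical and chosen so that both pitchers have the same ground ball rate per ball in play; only the (GB-(FB+PU))/PA expression comes from the text.

```python
def gb_per_ball_in_play(gb, fb, pu, ld):
    """Ground balls as a share of all balls in play."""
    return gb / (gb + fb + pu + ld)


def net_gb_per_pa(gb, fb, pu, pa):
    """(GB - (FB + PU)) / PA; line drives enter only through the PA denominator."""
    return (gb - (fb + pu)) / pa


# Pitcher A strikes out or walks half of his 600 hitters (300 balls in play).
a_bip = gb_per_ball_in_play(gb=150, fb=75, pu=15, ld=60)
a_net = net_gb_per_pa(gb=150, fb=75, pu=15, pa=600)

# Pitcher B never strikes out or walks anyone (600 balls in play).
b_bip = gb_per_ball_in_play(gb=300, fb=150, pu=30, ld=120)
b_net = net_gb_per_pa(gb=300, fb=150, pu=30, pa=600)

print(a_bip, b_bip)  # 0.5 0.5
print(a_net, b_net)  # 0.1 0.2
```

Per ball in play the two look identical, while the per-plate-appearance variable shows that grounders figure into twice as large a share of Pitcher B's outcomes.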
Returning to QERA for a minute, another advantage it has over competitors is that it implicitly considers non-linear returns to each term (K%, BB%, GB%), and interactions between those terms. A pitcher’s walk rate impacts his QERA at an increasing rate, as additional walks begin to advance batters who have already walked, and pitchers who walk a great many hitters may benefit more from grounders than their strike-zone stingy compadres.
The formula for QERA is:
QERA = (a + b*GB% + c*BB% + d*SO%)^2
… where a = 2.69, b = -0.66, c = 3.88, and d = -3.4. Un-foiled, this means that the following is also true:
QERA = a^2 + b^2*GB%^2 + 2*a*b*GB% + c^2*BB%^2 + 2*a*c*BB% + d^2*SO%^2 + 2*a*d*SO% + 2*b*c*GB%*BB% + 2*b*d*GB%*SO% + 2*c*d*BB%*SO%
QERA considers that the effect of BB% on ERA may be non-linear, and that if, for example, c^2 is large, walks may increase ERA at an increasing rate; jumping from 4 to 8 percent may not hurt ERA as much as a jump from 8 to 12 percent. QERA also allows for ground balls to be more beneficial for pitchers who walk a greater percentage of hitters, as the GB%*BB% interaction term is negative (b is negative and c is positive); increasing a ground ball percentage from 40 to 45 may do more for a pitcher who has a high walk rate than for one who walks fewer hitters.
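Both effects can be checked numerically with a short Python sketch. It assumes the squared form QERA = (a + b*GB% + c*BB% + d*SO%)^2, which is what the multiplied-out expression implies, with rates entered as fractions; the pitcher rates used are hypothetical. Multiplying out a squared sum puts a factor of 2 on every cross term, and the second function includes that factor.

```python
A, B, C, D = 2.69, -0.66, 3.88, -3.4  # coefficients quoted in the text


def qera(gb, bb, so):
    """Squared form; rates are fractions (0.45 means 45 percent)."""
    return (A + B * gb + C * bb + D * so) ** 2


def qera_expanded(gb, bb, so):
    """Same quantity multiplied out; each cross term carries a factor of 2."""
    return (
        A**2 + B**2 * gb**2 + C**2 * bb**2 + D**2 * so**2
        + 2 * A * B * gb + 2 * A * C * bb + 2 * A * D * so
        + 2 * B * C * gb * bb + 2 * B * D * gb * so + 2 * C * D * bb * so
    )


# Hypothetical pitcher: 45 percent grounders, 18 percent strikeouts.
GB, SO = 0.45, 0.18

# Non-linear walks: the same four-point jump costs more at a higher walk rate.
first_jump = qera(GB, 0.08, SO) - qera(GB, 0.04, SO)
second_jump = qera(GB, 0.12, SO) - qera(GB, 0.08, SO)
print(second_jump > first_jump)  # True


def grounder_gain(bb):
    """QERA improvement from raising GB% from 40 to 45 at a given walk rate."""
    return qera(0.40, bb, SO) - qera(0.45, bb, SO)


# Interaction: the same grounder gain is worth more to the high-walk pitcher.
low_walk_gain = grounder_gain(0.05)
high_walk_gain = grounder_gain(0.12)
print(high_walk_gain > low_walk_gain)  # True
```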
Unfortunately, this functional form is very limiting. The three components are not all quadratic: while the rates of whiffs and grounders are, the rate of walks is not. Because a squared coefficient cannot be negative, b^2, c^2, and d^2 are all positive, but our results show that the coefficient in place of b^2 should be negative, since more ground balls can drive down ERA at an increasing rate.
Another limit of this functional form is that the interaction terms (e.g. b*c for the product of ground ball and walk rates) are constrained by whichever values of a, b, c, and d are the most realistic for the earlier terms in the equation. The term in place of c*d should probably be zero, as pitchers who walk a great many hitters do not necessarily benefit any more from strikeouts than pitchers who walk next to nobody. Yet the only way to zero out that term in this form is to set c or d to zero, in which case QERA would predict that walk or strikeout rates, respectively, have nothing to do with run prevention, a clearly false result.
Then, insignificant terms are removed; in this case, the level of significance is derived from the p-value reported in the regression as well as clinical assumptions.
Part 3 will investigate more closely what went into the formula as well as the process of deriving the end result, while the rest of the week will test SIERA against other estimators and highlight specific pitchers for whom this estimator more accurately gauges skill-based contributions.