Baseball fans who have no use for advanced metrics can realize the flaws in evaluating pitchers by their won-lost records, but may struggle to understand the inherent flaws in the more commonly used earned run average. Henry Chadwick invented ERA in the 19th century to measure the effect of defense on pitching performance, but not until Voros McCracken explained the concept of Defense Independent Pitching Statistics (DIPS) did our understanding of the relationship between pitching and defense take a big step forward. McCracken explained that pitchers controlled the rates of whiffing, walking, and getting walloped with home runs, showing that the correlation between these statistics in consecutive years was strong. Though he inferred an ability for hurlers to control these numbers, another finding suggested little persistence in their Batting Average on Balls in Play (BABIP), leading to the conclusion that ERAs were dependent on defense (or luck), and therefore very volatile.
Armed with this information, sabermetricians began to develop methods of estimating ERA by controlling for the factors that can muddy the proverbial waters. These estimators enable the evaluation of pitching performance based on what pitchers actually control, rendering more accurate the tracking of their abilities. Watching trends in actual skills that pitchers control can help us better grasp whether shifts in ERA are the result of changes from the individual or from external factors. Since then, many competing estimators have emerged with their accompanying strengths and weaknesses. Perhaps the most popular ERA estimator is Fielding Independent Pitching (FIP), which uses the following straightforward formula: FIP = 3.20 + (3*BB – 2*K + 13*HR)/IP, where IP is innings pitched and the 3.20 is a constant dependent on the league and year, used to place the outputted number on the ERA scale.
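For readers who prefer to see the arithmetic, here is a minimal Python sketch of the FIP calculation. It divides by innings pitched, in line with the later remark that FIP uses IP as a denominator. The function name, the argument names, and the season line in the example are invented for illustration, and the 3.20 default is simply the constant quoted above rather than a value for any particular league or year.

```python
def fip(bb: float, k: float, hr: float, ip: float, constant: float = 3.20) -> float:
    """Fielding Independent Pitching from walks, strikeouts, home runs, and innings.

    `constant` defaults to the 3.20 quoted in the text; in practice it
    depends on the league and year.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return constant + (3 * bb - 2 * k + 13 * hr) / ip


# Hypothetical season line, invented purely for illustration.
example = fip(bb=50, k=180, hr=20, ip=200.0)
print(round(example, 2))  # prints 3.45
```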
Researchers have noted that, among the defense-independent statistics, home runs are by far the least predictable. Although home-run rate has shown itself to be more repeatable than BABIP, the lack of persistence makes such a comparison similar to justifying a D grade by mentioning that other classmates failed the test. Further research revealed that the percentage of fly balls that left the yard (HR/FB) sported about as little persistence as BABIP, and second-generation estimators attempted to eliminate HR/FB luck from estimation. One of the more obvious adjustments is to simply approximate the number of home runs that would have been hit if the pitcher had neutral luck in the fly ball department, and then recompute FIP with this estimate. This metric, known as Expected Fielding Independent Pitching (xFIP), uses the regular FIP formula but replaces HR with xHR, the neutral-luck home run estimate described above. This estimator marked an upgrade over FIP given the accepted notion that HR/FB has much more of a foundation in luck than actual skill, but there was still ample room for improvement.
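The xFIP adjustment can be sketched the same way. In the snippet below, "neutral luck" is modeled as fly balls multiplied by a league-average HR/FB rate; that is our assumption about one reasonable way to compute xHR, and the 0.10 rate, the function names, and both pitcher lines are placeholders rather than measured figures.

```python
def fip(bb, k, hr, ip, constant=3.20):
    """FIP as quoted in the text, with innings pitched as the denominator."""
    return constant + (3 * bb - 2 * k + 13 * hr) / ip


def xfip(bb, k, fly_balls, ip, league_hr_per_fb, constant=3.20):
    """FIP with actual HR replaced by xHR.

    Assumption: xHR = fly balls * league-average HR/FB rate.
    """
    xhr = fly_balls * league_hr_per_fb
    return constant + (3 * bb - 2 * k + 13 * xhr) / ip


# Two hypothetical pitchers with identical walks, strikeouts, fly balls,
# and innings; only their home run totals differ.
LEAGUE_HR_PER_FB = 0.10  # placeholder rate, not a real league figure

fip_lucky = fip(bb=50, k=180, hr=12, ip=200.0)
fip_unlucky = fip(bb=50, k=180, hr=28, ip=200.0)
xfip_lucky = xfip(bb=50, k=180, fly_balls=200, ip=200.0,
                  league_hr_per_fb=LEAGUE_HR_PER_FB)
xfip_unlucky = xfip(bb=50, k=180, fly_balls=200, ip=200.0,
                    league_hr_per_fb=LEAGUE_HR_PER_FB)

print(round(fip_lucky, 2), round(fip_unlucky, 2))
print(round(xfip_lucky, 2), round(xfip_unlucky, 2))
```

FIP separates the two pitchers by their home run totals, while xFIP scores them identically because it only sees their fly balls.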
Nate Silver invented QERA back in 2006 for Baseball Prospectus to adjust for a few issues with FIP and xFIP, and while he referred to the stat as a toy, it represented a big step upward in the methodology of estimators. The formula, QERA = (2.69 – .66*GB% + 3.88*BB% – 3.4*K%)^2, derives one of its main benefits from the fact that it accounts for non-linear run scoring; the more baserunners allowed, the higher the percentage that will score. It also removes the bias that comes from innings-pitched totals being subject to batted-ball luck: a pitcher with a higher BABIP needs more batters to record the same number of outs, so he will have a higher K/IP even if he strikes out the same percentage of hitters. QERA has a problem of its own, however, in that GB% is really GB/Ball in Play (or GB/BIP), while BB% and K% are measured per batter faced (BB/PA and SO/PA).
In other words, for pitchers who strike out and walk large numbers of hitters, changes in ground balls per ball in play affect their QERA as much as they do for pitchers who barely strike out or walk any hitters, even though the latter group’s ground-ball rate actually represents a higher tally. Further, while QERA picks up some of the interaction between walk, strikeout, and ground-ball rates, it does not necessarily weight them correctly.
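The denominator mismatch is easier to see with numbers. The sketch below implements the QERA formula as quoted, treating all three rates as fractions between 0 and 1, and then converts GB/BIP into ground balls per plate appearance under the simplifying assumption that every plate appearance that is not a walk or strikeout ends with a ball in play. Both pitcher profiles are hypothetical.

```python
def qera(gb_pct: float, bb_pct: float, k_pct: float) -> float:
    """QERA as quoted in the text; gb_pct is GB/BIP, the others are per PA."""
    return (2.69 - 0.66 * gb_pct + 3.88 * bb_pct - 3.4 * k_pct) ** 2


def ground_balls_per_pa(gb_per_bip: float, bb_pct: float, k_pct: float) -> float:
    """Convert GB/BIP to GB/PA.

    Simplification: balls in play per PA = 1 - BB% - K% (ignores HBP, etc.).
    """
    return gb_per_bip * (1 - bb_pct - k_pct)


# Hypothetical profiles: a contact pitcher and a high-walk, high-strikeout pitcher.
contact = {"bb_pct": 0.05, "k_pct": 0.10}
three_outcome = {"bb_pct": 0.12, "k_pct": 0.25}

# Raise each pitcher's GB/BIP from 45% to 50%.
contact_gain = (ground_balls_per_pa(0.50, **contact)
                - ground_balls_per_pa(0.45, **contact))
strikeout_gain = (ground_balls_per_pa(0.50, **three_outcome)
                  - ground_balls_per_pa(0.45, **three_outcome))

# QERA's inner (pre-squaring) term moves by the same 0.66 * 0.05 for both,
# even though the contact pitcher gains more ground balls per batter faced.
print(round(contact_gain, 4), round(strikeout_gain, 4))
```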
With that in mind, we have invented a new statistic, Skill-Interactive Earned Run Average (SIERA), which corrects the problems with old estimators while adding a few more realistic assumptions. This was done first by un-foiling all of the individual components in QERA while adjusting for the ground-ball denominator issue, and then by testing which interactions and squared terms were relevant using multiple linear regression analysis. Essentially, we changed the GB/BIP to (GB-FB-PU)/PA and evaluated all of the terms in the regression, removing those with insignificant p-values; while the QERA formula only shows three variables, un-foiling the formula reveals several more. We identified two terms that were not useful: the squared term of walks, and the interaction between walk and strikeout rate. The squared terms on strikeout and ground-ball rates were both significant, and we also found important interactions between walks and grounders and between whiffs and grounders that have strong effects on run scoring.
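To make "un-foiling" concrete: squaring a constant plus three rates produces every pairwise product of those four pieces. The short sketch below just enumerates those products and drops the two terms named above. The labels are shorthand we chose for illustration ("GB" stands for the ground-ball term, "BB" for walk rate, "SO" for strikeout rate); no regression is run here.

```python
from itertools import combinations_with_replacement

# The four pieces inside QERA's parentheses: a constant and three rates.
base = ["1", "GB", "BB", "SO"]


def term_name(a: str, b: str) -> str:
    """Label the product of two pieces, e.g. GB*GB -> 'GB^2'."""
    if a == "1":
        return b  # constant times anything is just that thing
    if a == b:
        return f"{a}^2"
    return f"{a}*{b}"


# Every distinct product that appears when the square is expanded.
all_terms = [term_name(a, b) for a, b in combinations_with_replacement(base, 2)]

# The two terms the text reports as not useful.
dropped = {"BB^2", "BB*SO"}
kept = [t for t in all_terms if t not in dropped]

print(len(all_terms), all_terms)
print(len(kept), kept)
```

The expansion yields ten candidate terms, and removing the two insignificant ones leaves a constant, three linear terms, two squared terms, and two interactions.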
As a result, SIERA accomplishes the following:
- Allows for the fact that a high ground-ball rate is more useful to pitchers who walk more batters, due to the potential that double plays wipe away runners.
- Allows for the fact that a low fly-ball rate (and therefore, a low HR rate) is less useful to pitchers who strike out a lot of batters (e.g., Johan Santana's FIP tends to be higher than his ERA because the former treats all HR the same, even though Santana’s skill set portends that the bombs he allows will usually be solo shots).
- Allows for the fact that adding strikeouts is more useful when you don't strike out many guys to begin with, since more runners get stranded.
- Allows for the fact that adding ground balls is more useful when you already allow a lot of ground balls because there are frequently runners on first.
- Corrects for the fact that QERA used GB/BIP instead of GB/PA (e.g. Joel Pineiro is all contact, so increasing his ground-ball rate means more ground balls than if Oliver Perez had done it, given he's not a high contact guy).
- Corrects for the fact that FIP and xFIP use IP as a denominator which means that luck on balls in play changes one's FIP.
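The last point in the list can be checked with a toy model. The sketch below assumes a pitcher whose plate appearances end only in strikeouts, walks, or balls in play, and whose outs come only from strikeouts and fielded balls in play (no home runs, double plays, or outs on the bases). Those simplifications, along with the function name and every number in the example, are ours.

```python
def strikeouts_per_nine(pa: float, k_pct: float, bb_pct: float, babip: float) -> float:
    """K/9 for a pitcher with fixed per-PA rates and a given BABIP (toy model)."""
    strikeouts = pa * k_pct
    walks = pa * bb_pct
    balls_in_play = pa - strikeouts - walks
    # Outs: every strikeout plus every ball in play that does not fall for a hit.
    outs = strikeouts + balls_in_play * (1 - babip)
    innings = outs / 3
    return 9 * strikeouts / innings


# Same hypothetical pitcher, same 20% strikeout rate, two different BABIPs.
k9_low_babip = strikeouts_per_nine(pa=800, k_pct=0.20, bb_pct=0.08, babip=0.270)
k9_high_babip = strikeouts_per_nine(pa=800, k_pct=0.20, bb_pct=0.08, babip=0.330)

print(round(k9_low_babip, 2), round(k9_high_babip, 2))
```

With the same batters faced and the same strikeouts, the higher BABIP yields fewer outs and therefore fewer innings, so any per-inning rate shifts even though nothing about the pitcher's per-batter skill has changed.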
The new ground-ball statistic used is: (GB-(FB+PU))/PA. Now walks, strikeouts, and grounders use the same denominator, avoiding any type of weighting issues. GB/PA could have been used instead of GB/BIP, but our findings suggested that line drives per ball in play exhibited virtually no persistence, and did not represent a pitcher skill. When a pitcher's line-drive rate is low, he is probably just lucky, but his ground-ball, fly-ball, and pop-up rates will increase to make up the difference. Since the ground-ball rate for the league as a whole is similar to the sum of the fly-ball and pop-up rates, using the difference between the two eliminates some of the luck that would make this estimator look bigger than its britches. For the same reason, pop-up rate was allowed to negatively affect SIERA because it is a symptom of the pitcher throwing a ball that generates an upward trajectory, which could lead to an increase in home runs. A pitcher’s skills are throwing strikes, making hitters miss, and throwing with angles and spins such that the trajectory of the ball is downward when it hits the bat. A popup almost always represents an out, but it also represents a potential problem for the pitcher in the future.
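Here is a small sketch of why the difference is steadier than a raw ground-ball rate. It takes a hypothetical batted-ball profile in which ground balls roughly equal fly balls plus pop-ups, turns some line drives into other batted balls in proportion to the existing mix (the proportional split is our assumption), and compares how much GB/PA and (GB-(FB+PU))/PA move.

```python
def net_ground_ball_rate(gb: float, fb: float, pu: float, pa: float) -> float:
    """The ground-ball statistic from the text: (GB - (FB + PU)) / PA."""
    return (gb - (fb + pu)) / pa


def redistribute_line_drives(gb, fb, pu, ld, ld_removed):
    """Turn `ld_removed` line drives into GB/FB/PU in proportion to the current mix.

    Assumption for illustration: lost line drives are split proportionally.
    """
    scale = ld_removed / (gb + fb + pu)
    return gb * (1 + scale), fb * (1 + scale), pu * (1 + scale), ld - ld_removed


# Hypothetical profile over 700 plate appearances.
PA = 700
gb, fb, pu, ld = 210.0, 150.0, 40.0, 100.0

# A "lucky" version of the same season with 20 fewer line drives.
gb2, fb2, pu2, ld2 = redistribute_line_drives(gb, fb, pu, ld, ld_removed=20)

raw_shift = gb2 / PA - gb / PA
net_shift = (net_ground_ball_rate(gb2, fb2, pu2, PA)
             - net_ground_ball_rate(gb, fb, pu, PA))

print(round(raw_shift, 4), round(net_shift, 4))
```

The raw ground-ball rate moves roughly ten times as much as the net rate in this example, because the extra fly balls and pop-ups offset most of the extra grounders.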
Simply running a regression analysis to predict park-adjusted ERA and developing a statistic that introduces these improvements to Defense Independent Pitching Statistics would be useless if it did not predict ERA better than other statistics. Not only did SIERA emerge as the leader among ERA estimators, but, more importantly, running the same regression analysis on different datasets produced coefficients that continue to predict ERA better than other estimators, indicating that our analysis was not biased by retroactively fitting the mark. Specifically, when we used 2003-08 data to generate a formula and then tested it on 2009 pitchers, SIERA emerged as the best estimator of park-adjusted ERA in the following year and the best at predicting same-year ERA among the estimators that treat HR/FB as luck; FIP and tRA consider it to be more skill-laden.
In other words, it is impossible to best FIP in terms of same-year mirroring unless HR/FB is treated as a skill, but tests have shown that HR/FB itself is unstable and not indicative of something within the control of the pitcher. FIP and tRA lead the estimators that do not credit the pitcher for this luck in predicting same-year Earned Run Average, but SIERA overtakes both in predicting future performance, which is arguably much more important. After all, the primary goal of ERA estimators is to approximate a skill set that can successfully generate low ERAs while being as accurate as possible in the modeling and assumptions deriving the formula.
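A comparison of this kind can be sketched as follows: hold out a season, score each estimator against the ERA it is supposed to anticipate, and rank by error. Root mean squared error is our choice of yardstick for the sketch, not necessarily the one used in the testing described above, and the estimator names and all numbers below are invented placeholders rather than results.

```python
from math import sqrt


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root mean squared error between two equal-length, non-empty lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def rank_estimators(estimates: dict[str, list[float]],
                    target_era: list[float]) -> list[tuple[str, float]]:
    """Return (name, RMSE) pairs sorted from smallest error to largest."""
    scores = {name: rmse(values, target_era) for name, values in estimates.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Invented held-out pitchers: each estimator's value and the ERA to be predicted.
held_out_era = [3.50, 4.20, 4.80]
held_out_estimates = {
    "estimator_a": [3.60, 4.10, 4.70],
    "estimator_b": [3.00, 4.90, 4.20],
}

ranking = rank_estimators(held_out_estimates, held_out_era)
print(ranking[0][0])
```

The key design point is that the coefficients are fit on one set of seasons and scored on another, so an estimator cannot look good merely by fitting the noise in the data it was built from.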
In the coming days, we will explain in more detail the derivation of SIERA, provide some tests to check its performance, and offer examples of pitchers for whom the metric performs vastly better than other estimators. The last part is very important, as a small change in ERA estimation is not necessarily a big deal unless there are pitchers who are perpetually underrated or overrated by similar statistics. This is certainly true in the case of SIERA and FIP for a player like Santana, whose solo home run tendencies are inaccurately punished by FIP in a way that underestimates his skill by a significant amount. The introduction of a metric that properly accounts for all that was mentioned above helps to evaluate pitchers in a more precise and useful way than ever before.
For now, we leave you with the formula for the statistic that will be kept here moving forward and will soon be found on the revamped reports:
SIERA = 6.145 – 16.986*(SO/PA) + 11.434*(BB/PA) – 1.858*((GB-FB-PU)/PA) + 7.653*((SO/PA)^2) +/– 6.664*(((GB-FB-PU)/PA)^2) + 10.130*(SO/PA)*((GB-FB-PU)/PA) – 5.195*(BB/PA)*((GB-FB-PU)/PA)
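A minimal Python sketch of the formula follows. It rests on one stated assumption: we read the "+/–" on the squared ground-ball term as taking the sign opposite to the net ground-ball rate, so the term is subtracted when (GB-FB-PU)/PA is positive and added when it is negative. That reading matches the point that added ground balls help more when a pitcher already gets a lot of them. The function name, argument names, and the sample line are invented for illustration.

```python
def siera(so: float, bb: float, gb: float, fb: float, pu: float, pa: float) -> float:
    """SIERA from raw counts, using the coefficients quoted in the text."""
    if pa <= 0:
        raise ValueError("plate appearances must be positive")
    k = so / pa               # strikeout rate, SO/PA
    w = bb / pa               # walk rate, BB/PA
    g = (gb - fb - pu) / pa   # net ground-ball rate, (GB-FB-PU)/PA

    # Assumption: the "+/-" term is negative for ground-ball pitchers (g >= 0)
    # and positive for fly-ball pitchers (g < 0).
    squared_gb = -6.664 * g * g if g >= 0 else 6.664 * g * g

    return (6.145
            - 16.986 * k
            + 11.434 * w
            - 1.858 * g
            + 7.653 * k * k
            + squared_gb
            + 10.130 * k * g
            - 5.195 * w * g)


# Hypothetical pitcher whose ground balls exactly offset fly balls plus pop-ups,
# so every ground-ball term drops out.
neutral = siera(so=140, bb=60, gb=200, fb=150, pu=50, pa=700)
print(round(neutral, 2))
```

For this neutral profile the result lands at roughly 4.03, driven entirely by the strikeout and walk terms.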