When Eric Seidman and I introduced SIERA last winter, we ran a number of tests to determine if our theoretical foundation of run prevention led to a superior estimation of pitchers’ skill levels. While SIERA had a solid advantage at predicting future ERA over some ERA estimators and a small, last-decimal-point lead over xFIP, we ran the tests again after 2010 to ensure that it held a lead going forward. Although the regression formula did not incorporate future ERAs and should not have been biased, it's still important to test the following year to see how well SIERA held up.
The short story is this: SIERA had a good year in 2010, as it extended its lead over other estimators. Below is the root mean square error (RMSE) of the difference between various estimators and park-adjusted ERA (using the Lahman Database’s three-year pitcher park factors) for all pitchers with two consecutive years of at least 40 innings pitched. (To avoid the mix-up of last year, I obtained xFIP, FIP, and tERA directly from FanGraphs’ website.)
Table 1. Estimators' RMSE of difference between Park-Adjusted ERA (IP>=40 both years), Unweighted, 2009-2010

| Estimator | RMSE |
|---|---|
| SIERA | 1.083 |
| xFIP | 1.139 |
| FIP | 1.180 |
| tERA | 1.220 |
| ERA (pkadj) | 1.322 |
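For readers who want to see the arithmetic behind these figures, here is a minimal sketch of an unweighted RMSE calculation. The function name and the three sample pitchers are hypothetical and exist only for illustration; they are not drawn from the study's data.

```python
import math

def rmse(estimates, actuals):
    """Root mean square error between paired estimates and actual values."""
    if len(estimates) != len(actuals) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(e - a) ** 2 for e, a in zip(estimates, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical numbers: each pitcher's estimator value in one year and his
# park-adjusted ERA in the following year (made up, not from the article).
estimator_this_year = [3.50, 4.20, 4.80]
era_next_year = [3.00, 4.20, 5.80]

print(round(rmse(estimator_this_year, era_next_year), 3))
```

Because the errors are squared before averaging, one large miss costs more than several small ones, which is why RMSE punishes estimators that swing too far from a pitcher's eventual ERA.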
SIERA Tests for 2003-10
This was the typical order of ERA estimation for the previous six years as well:
Table 2. Estimators' RMSE of difference between Park-Adjusted ERA (IP>=40 both years), Unweighted

| Estimator | 2009-10 | 2008-09 | 2007-08 | 2006-07 | 2005-06 | 2004-05 | 2003-04 |
|---|---|---|---|---|---|---|---|
| SIERA | 1.083 | 1.213 | 1.097 | 1.191 | 1.179 | 1.131 | 1.067 |
| xFIP | 1.139 | 1.200 | 1.136 | 1.209 | 1.204 | 1.138 | 1.086 |
| FIP | 1.180 | 1.264 | 1.192 | 1.255 | 1.294 | 1.239 | 1.124 |
| tERA | 1.220 | 1.312 | 1.235 | 1.325 | 1.265 | 1.321 | 1.425 |
| ERA (pkadj) | 1.322 | 1.478 | 1.388 | 1.440 | 1.492 | 1.385 | 1.403 |
SIERA has been ahead in six of seven years, with xFIP in second place in each of those six years; the two switched places in 2009 ERA estimation (using 2008 estimators). FIP typically finished third, tERA typically finished fourth, and the previous year’s park-adjusted ERA finished fifth. The exceptions were 2005-06 (when tERA out-estimated FIP) and 2003-04 (when tERA finished behind park-adjusted ERA itself).
Putting together the complete 2003-10 dataset of seven pairs of years, we get the following RMSEs:
Table 3. Difference between Park-Adjusted ERA (IP>=40 both years) for 2003-10, Unweighted

| Estimator | RMSE |
|---|---|
| SIERA | 1.139 |
| xFIP | 1.160 |
| FIP | 1.223 |
| tERA | 1.301 |
| ERA (pkadj) | 1.416 |
SIERA finished modestly ahead of xFIP overall, by .021 points, and both were ahead of the other three estimators. FIP finished a solid third, though to be fair to it, it isn't a park-adjusted statistic and was never intended to estimate park-adjusted ERA. So, I checked to see how FIP did at predicting next year’s unadjusted ERA; it only did slightly better, with a 1.216 RMSE. Of course, that included pitchers who changed teams and obviously had different park effects, but using only pitchers who pitched on the same team in two consecutive years, the RMSE actually fell to 1.160, the same RMSE that xFIP had for park-adjusted ERA estimation.
Pitchers who did not switch teams threw more innings, and were therefore easier to predict overall, so I ran the same test of other estimators on park-adjusted ERA for only pitchers who did not switch teams, and found that the order remained the same:
Table 4. Pitchers with IP>=40 both years with same team, Unweighted

| Estimator and Estimated | RMSE |
|---|---|
| SIERA with Park-Adjusted ERA | 1.093 |
| xFIP with Park-Adjusted ERA | 1.127 |
| FIP with Unadjusted ERA | 1.160 |
| tERA with Park-Adjusted ERA | 1.235 |
One criticism of this test is that it treats pitchers who had 41 innings the same as pitchers who threw 241 innings, so I reran the test weighted by next year’s innings pitched. The order remained solidly the same, but unsurprisingly, estimation was better overall for pitchers with more innings:
Table 5. Difference between Estimator and Next Year’s Park-Adjusted ERA (IP>=40 both years), Weighted by next year’s IP

| Estimator | RMSE |
|---|---|
| SIERA | 1.013 |
| xFIP | 1.033 |
| FIP | 1.094 |
| tERA | 1.195 |
| ERA (pkadj) | 1.272 |
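Weighting by innings changes only how the squared errors are averaged. The sketch below is one reasonable way to do it, assuming each pitcher's squared error is weighted in proportion to his next-year innings; the exact weighting scheme used for the table is not spelled out, and the two sample pitchers are hypothetical.

```python
import math

def weighted_rmse(estimates, actuals, weights):
    """RMSE in which each squared error counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_squares = sum(
        w * (e - a) ** 2 for e, a, w in zip(estimates, actuals, weights)
    )
    return math.sqrt(weighted_squares / total_weight)

# Hypothetical example: a 41-inning pitcher missed by a full run and a
# 241-inning pitcher estimated exactly (made-up numbers).
estimates = [3.50, 4.50]
next_year_era = [4.50, 4.50]
next_year_ip = [41, 241]

print(round(weighted_rmse(estimates, next_year_era, [1, 1]), 3))         # equal weights
print(round(weighted_rmse(estimates, next_year_era, next_year_ip), 3))   # weighted by IP
```

In this toy case the big miss belongs to the low-innings pitcher, so the weighted figure comes out lower than the unweighted one, the same direction the tables show.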
While a consensus has emerged that RMSE is the best test to run, some still prefer to see correlations, so I checked those as well, with SIERA similarly ahead:
Table 6. Correlation with Next Year’s Park-Adjusted ERA for Pitchers with IP>=40 both years

| Estimator | Correl |
|---|---|
| SIERA | .398 |
| xFIP | .352 |
| FIP | .341 |
| tERA | .347 |
| ERA (pkadj) | .295 |
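One way to see why RMSE and correlation can disagree is that correlation ignores a constant offset while RMSE does not. The sketch below assumes the correlations in the table are ordinary Pearson correlations (the article does not say which kind was used) and uses hypothetical numbers.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)

def rmse(estimates, actuals):
    """Unweighted root mean square error."""
    return math.sqrt(
        sum((e - a) ** 2 for e, a in zip(estimates, actuals)) / len(actuals)
    )

# Hypothetical estimator that is always exactly one run too high.
next_year_era = [3.0, 4.0, 5.0]
estimator = [4.0, 5.0, 6.0]

print(pearson(estimator, next_year_era))  # perfectly correlated
print(rmse(estimator, next_year_era))     # yet off by a full run every time
```

An estimator like this would look flawless by correlation and poor by RMSE, which is part of the case for treating RMSE as the primary test.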
Variance in ERA Estimators
Something that I have always found interesting about these ERA estimators is that, since they are not projections, they do not regress to the mean at all. The larger the variance of a statistic, the less it regresses to the mean by definition, which means that its ability to estimate the next year’s ERA is going to be worse, even though it may be picking up on temporary skill levels. For the record, the standard deviations are as follows:
Table 7. Standard Deviation of Estimators among Pitchers with IP>=40

| Estimator | Standard Deviation |
|---|---|
| SIERA | .740 |
| xFIP | .682 |
| FIP | .901 |
| tERA | 1.029 |
| ERA (pkadj) | 1.247 |
|  | 1.157 |
Unsurprisingly, the statistics that include unregressed home-run-per-fly-ball rates (FIP and tERA) have higher standard deviations, but xFIP has a lower standard deviation than SIERA, meaning that it regresses more to the mean than SIERA. Thus, it is a good sign that SIERA is picking up on enough skill level that it still does better at predicting next year’s ERA despite having less natural regression to the mean built in.
How Often is SIERA Closest?
Other than RMSE and correlations, another way to see how well a statistic does is to simply see how often each statistic is closer to next year’s park-adjusted ERA. I did this, and pitted ten different pairs of ERA estimators against each other to see how often each side won. The following are all very statistically significant (with p<.002), as there are over 1800 pitchers who pitched at least 40 innings in two consecutive years between 2003 and 2010:
Table 8. Percentage of Pitchers (IP>=40) For Whom Estimator is Closest to Next Year’s Park-Adjusted ERA

| Win Rate | vs. xFIP | vs. FIP | vs. tERA | vs. ERA (pkadj) |
|---|---|---|---|---|
| SIERA | 54.6% | 54.1% | 56.8% | 58.9% |
| xFIP |  | 53.9% | 56.0% | 58.0% |
| FIP |  |  | 55.8% | 59.2% |
| tERA |  |  |  | 53.5% |
SIERA finished ahead of xFIP, FIP, tERA, and park-adjusted ERA, each by a noticeable and very statistically significant, if not visually damning, amount. Note that xFIP beat the other three by significant margins as well, FIP beat tERA more often, and tERA beat park-adjusted ERA itself most often.
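A head-to-head win rate and a rough significance check can be sketched as follows. The significance test here is a two-sided sign test using the normal approximation to the binomial, which is an assumption on my part for illustration and not necessarily the test behind the p-values quoted above; the function names are hypothetical, and 1,800 is used as a round stand-in for "over 1800" pitchers.

```python
import math

def win_rate(est_a, est_b, actuals):
    """Share of decided matchups in which estimator A is closer than B.

    Exact ties are dropped. Returns (rate for A, number of decided matchups).
    """
    wins_a = wins_b = 0
    for a, b, actual in zip(est_a, est_b, actuals):
        err_a, err_b = abs(a - actual), abs(b - actual)
        if err_a < err_b:
            wins_a += 1
        elif err_b < err_a:
            wins_b += 1
    decided = wins_a + wins_b
    return wins_a / decided, decided

def sign_test_p_value(rate, n):
    """Two-sided p-value that a win rate over n matchups differs from 50%,
    using the normal approximation to the binomial distribution."""
    z = (rate - 0.5) / math.sqrt(0.25 / n)
    return math.erfc(abs(z) / math.sqrt(2))

# SIERA's 54.6% win rate over xFIP, assuming roughly 1,800 decided matchups.
print(sign_test_p_value(0.546, 1800) < 0.002)
```

Under these assumptions a 54.6% win rate clears the p<.002 bar comfortably, which shows how a modest-looking edge becomes statistically meaningful with a sample this large.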
Comparing Estimators to Years Other Than The Following
When SIERA came out last year, Brian Cartwright came up with another way of testing estimators: he looked at the RMSE between ERA estimators and park-adjusted ERAs in years other than just the next one. After all, these are not meant to be projection systems. They are meant to estimate skill level, and next year’s park-adjusted ERA has been considered a pretty good estimate of this year’s skill level. However, so is last year’s park-adjusted ERA, as is the one from two years from now or two years ago. So I checked all of these, along with same-year ERA (which gives the home-run-inclusive estimators FIP and tERA an obvious leg up), three years ahead, and three years behind:
Table 9. RMSEs of Estimators in Year T with Park-Adjusted ERA in Years T-3 through T+3, Unweighted

| Estimator | T | T+1 | T-1 | T+2 | T-2 | T+3 | T-3 |
|---|---|---|---|---|---|---|---|
| SIERA | 0.997 | 1.139 | 1.138 | 1.209 | 1.200 | 1.234 | 1.271 |
| xFIP | 1.002 | 1.160 | 1.165 | 1.218 | 1.226 | 1.252 | 1.295 |
| FIP | 0.843 | 1.223 | 1.237 | 1.301 | 1.290 | 1.308 | 1.342 |
| tERA | 0.946 | 1.301 | 1.323 | 1.399 | 1.364 | 1.400 | 1.420 |
| ERA (pkadj) |  | 1.416 | 1.416 | 1.509 | 1.509 | 1.516 | 1.516 |
Unsurprisingly, FIP and tERA do best at same-year ERA, since they give credit or blame to pitchers for their home run rates. I was pleased to see that SIERA was closest in every other comparison. The order stayed the same for the other estimators as well.
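The mechanics of comparing a year-T estimator with ERA in a different year come down to matching pitcher-seasons at an offset. Here is a minimal sketch, assuming the data are held in dictionaries keyed by (pitcher, year) and already filtered to seasons of at least 40 innings; the data layout, names, and numbers are all hypothetical.

```python
import math

def pair_with_offset(estimates, eras, offset):
    """Pair each year-T estimate with the same pitcher's ERA in year T+offset.

    Both arguments are dicts keyed by (pitcher_id, year). Pitcher-seasons
    with no qualifying ERA in the target year are skipped.
    """
    pairs = []
    for (pitcher, year), estimate in estimates.items():
        era = eras.get((pitcher, year + offset))
        if era is not None:
            pairs.append((estimate, era))
    return pairs

def rmse_of_pairs(pairs):
    """Unweighted RMSE over (estimate, actual) pairs."""
    return math.sqrt(sum((e - a) ** 2 for e, a in pairs) / len(pairs))

# Made-up estimator values and park-adjusted ERAs for two pitchers.
estimates = {("a", 2008): 3.8, ("a", 2009): 3.6, ("b", 2009): 4.5}
eras = {("a", 2008): 4.1, ("a", 2009): 3.2, ("a", 2010): 3.9, ("b", 2010): 4.5}

for offset in (0, 1, -1):
    pairs = pair_with_offset(estimates, eras, offset)
    print(offset, len(pairs), round(rmse_of_pairs(pairs), 3))
```

Note that the sample shrinks as the offset grows, since fewer pitchers qualify in both years, so the columns of the table are not computed over identical groups of pitchers.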
These are all unweighted observations, meaning that pitchers who throw more innings are treated the same as pitchers who throw fewer. The following table shows the RMSE with each pitcher weighted by their innings pitched in the subsequent year:
Table 10. RMSEs of Estimators in Year T with Park-Adjusted ERA in Years T-3 through T+3, Weighted by IP

| Estimator | T | T+1 | T-1 | T+2 | T-2 | T+3 | T-3 |
|---|---|---|---|---|---|---|---|
| SIERA | 0.875 | 1.013 | 1.019 | 1.070 | 1.074 | 1.088 | 1.142 |
| xFIP | 0.876 | 1.033 | 1.042 | 1.083 | 1.102 | 1.115 | 1.166 |
| FIP | 0.736 | 1.094 | 1.129 | 1.158 | 1.169 | 1.173 | 1.235 |
| tERA | 0.862 | 1.195 | 1.236 | 1.287 | 1.263 | 1.282 | 1.331 |
| ERA (pkadj) |  | 1.272 | 1.314 | 1.354 | 1.403 | 1.364 | 1.415 |
Once again, the order of estimators remained the same in each year in question:
Table 11. Correlation of Estimators with Park-Adjusted ERA in Years T-3 through T+3

| Estimator | T | T+1 | T-1 | T+2 | T-2 | T+3 | T-3 |
|---|---|---|---|---|---|---|---|
| SIERA | .600 | .398 | .357 | .343 | .309 | .305 | .244 |
| xFIP | .606 | .352 | .325 | .305 | .273 | .244 | .214 |
| FIP | .738 | .341 | .303 | .286 | .269 | .270 | .235 |
| tERA | .710 | .347 | .303 | .286 | .280 | .276 | .233 |
| ERA (pkadj) |  | .295 | .295 | .232 | .232 | .237 | .237 |
Correlations seemed to move all over the place, but SIERA did have the highest correlation for every year other than same year.
I also tested "win rate," the percentage of how often each statistic was closest to park-adjusted ERA, looking at different years.
The first table is same-year ERA, in which FIP unsurprisingly wins the most. SIERA is closer slightly more often than xFIP. This is a statistically significant win percentage, albeit a small one (p=.02). All other comparisons are statistically significant except SIERA’s deficit in predicting same-year ERA compared to tERA, which is not (p=.45), despite the fact that tERA credits a pitcher with the full run cost of the actual number of home runs he surrenders.
Table 12. Percentage of Pitchers for Whom Estimator Was Closer to Park-Adjusted ERA in Same Year

| Win Rate (year T) | vs. xFIP | vs. FIP | vs. tERA |
|---|---|---|---|
| SIERA | 52.1% | 41.2% | 49.3% |
| xFIP |  | 39.4% | 48.1% |
| FIP |  |  | 58.0% |
Since we have already looked at predicting next year’s park-adjusted ERA, we jump to "predicting" last year’s park-adjusted ERA.
Table 13. Percentage of Pitchers for Whom Estimator Was Closer to Park-Adjusted ERA in Previous Year

| Win Rate (year T-1) | vs. xFIP | vs. FIP | vs. tERA | vs. ERA (pkadj) |
|---|---|---|---|---|
| SIERA | 54.3% | 55.8% | 59.3% | 57.0% |
| xFIP |  | 53.4% | 58.4% | 58.3% |
| FIP |  |  | 55.9% | 56.9% |
| tERA |  |  |  | 51.2% |
Again, SIERA continues to do better, followed by xFIP, FIP, tERA, and park-adjusted ERA itself.
The following tables look at two years from now, two years ago, three years from now, and three years ago:
Table 14. Percentage of Pitchers for Whom Estimator Was Closer to Park-Adjusted ERA in Two Years

| Win Rate (year T+2) | vs. xFIP | vs. FIP | vs. tERA | vs. ERA (pkadj) |
|---|---|---|---|---|
| SIERA | 52.8% | 53.8% | 57.3% | 58.6% |
| xFIP |  | 54.7% | 57.5% | 59.6% |
| FIP |  |  | 55.4% | 58.4% |
| tERA |  |  |  | 53.3% |
Table 15. Percentage of Pitchers for Whom Estimator Was Closer to Park-Adjusted ERA Two Years Before

| Win Rate (year T-2) | vs. xFIP | vs. FIP | vs. tERA | vs. ERA (pkadj) |
|---|---|---|---|---|
| SIERA | 53.9% | 55.8% | 57.5% | 59.8% |
| xFIP |  | 53.7% | 58.1% | 59.3% |
| FIP |  |  | 56.2% | 58.1% |
| tERA |  |  |  | 53.7% |
Table 16. Percentage of Pitchers for Whom Estimator Was Closer to Park-Adjusted ERA in Three Years

| Win Rate (year T+3) | vs. xFIP | vs. FIP | vs. tERA | vs. ERA (pkadj) |
|---|---|---|---|---|
| SIERA | 53.3% | 55.6% | 55.8% | 59.1% |
| xFIP |  | 53.8% | 55.8% | 57.2% |
| FIP |  |  | 54.0% | 58.8% |
| tERA |  |  |  | 53.2% |
Table 17. Percentage of Pitchers for Whom Estimator Was Closer to Park-Adjusted ERA Three Years Before

| Win Rate (year T-3) | vs. xFIP | vs. FIP | vs. tERA | vs. ERA (pkadj) |
|---|---|---|---|---|
| SIERA | 52.5% | 52.8% | 56.8% | 54.9% |
| xFIP |  | 49.9% | 54.8% | 54.7% |
| FIP |  |  | 56.5% | 54.3% |
| tERA |  |  |  | 51.6% |
The results and order of best statistics are similar in each of these tables. Almost all of these are statistically significant, thanks to the large sample size of pitchers. Exceptions include tERA’s victory over park-adjusted ERA, which is not significant in predicting ERA from three years ago, three years ahead, or one year ago. Additionally, SIERA's victory over xFIP at predicting ERA three years prior is not significant, nor is xFIP’s deficit compared to FIP three years prior.
Averaging Multiple Years of ERA Estimators
In discussing this article with Sky Kalkman, he suggested that I look at whether multiple years of this estimator averaged out would show FIP to be better at picking up the elusive skill level in home runs per fly ball. Considering this very plausible, I checked this a few different ways.
First, I looked at just averaging the previous three years of ERA estimators to predict the fourth year’s park-adjusted ERA, without doing any weighting. I looked at both RMSE and the correlation. Second, I weighted the estimator by innings pitched in each of those first three years. Again, I looked at the RMSE and the correlation. Third, I used that estimate but checked the RMSE while weighting the pitchers by innings pitched in the fourth year. The results?
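Table 18. Three-Year Average of Estimator versus Park-Adjusted ERA in Fourth Year

| Estimator over Three Years | RMSE (unweighted average) | Correl (unweighted average) | RMSE (average weighted by IP) | Correl (average weighted by IP) | RMSE (average weighted by IP, pitchers weighted by fourth-year IP) |
|---|---|---|---|---|---|
| SIERA | 1.104 | .433 | 1.107 | .427 | .968 |
| xFIP | 1.120 | .390 | 1.123 | .386 | .984 |
| FIP | 1.122 | .411 | 1.122 | .410 | .975 |
| tERA | 1.141 | .427 | 1.135 | .425 | 1.016 |
| ERA (pkadj) | 1.177 | .397 | 1.178 | .396 | 1.021 |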
SIERA continued to be the best in each of these five tests, though FIP basically caught up with xFIP. Although tERA did decently well with correlations, it fell short in RMSE.
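The difference between the first two approaches is simply how the three seasons are combined. A minimal sketch, with hypothetical function names and made-up numbers, assuming a plain mean for the unweighted version and an innings-weighted mean for the weighted one:

```python
def simple_average(values):
    """Unweighted mean of a pitcher's yearly estimator values."""
    return sum(values) / len(values)

def ip_weighted_average(values, innings):
    """Mean of yearly estimator values, weighted by innings pitched each year."""
    total_innings = sum(innings)
    if total_innings <= 0:
        raise ValueError("innings must sum to a positive number")
    return sum(v * ip for v, ip in zip(values, innings)) / total_innings

# Hypothetical pitcher: three seasons of an estimator and his innings in each.
yearly_estimator = [3.50, 4.00, 4.50]
yearly_innings = [200, 100, 50]

print(round(simple_average(yearly_estimator), 3))
print(round(ip_weighted_average(yearly_estimator, yearly_innings), 3))
```

The weighted version leans toward the 200-inning season, where the estimator had the most data behind it, while the simple mean treats the 50-inning season as equally informative.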
Mixing and Matching
I also ran some other tests (not included) in which I used weighted averages of xFIP and FIP to see if they did better as a mixture than separately. They did, and the best mixture seemed to be 80%/20% for xFIP/FIP in one-year estimation and 40%/60% for three-year estimation. However, these did not outdo SIERA in any of the cases, except that the three-year estimation did tie it for observations weighted by IP in the fourth year and estimator weighted by IP in the first three years. The rest of the tests had SIERA safely in front.
Squeezing the data every which way, it remains true that 2010 continues to show SIERA to be the best ERA estimator. It is clear that xFIP is almost as good, though if left with one, I would prefer SIERA (perhaps obviously). Interestingly, running a regression of park-adjusted ERA on the previous year’s SIERA and xFIP shows that not only does SIERA do a better job, but you should actually lower the expected ERA of a pitcher with a higher xFIP and the same SIERA. The formula given is:
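A search for the best xFIP/FIP mixture can be sketched as a simple grid search over blend weights. The approach below (trying weights in steps of 10% and keeping the one with the lowest RMSE) is an assumption about how such a test might be run, not a description of the actual procedure, and the data are constructed so that an 80/20 blend fits exactly, purely to show the mechanics.

```python
import math

def rmse(estimates, actuals):
    """Unweighted root mean square error."""
    return math.sqrt(
        sum((e - a) ** 2 for e, a in zip(estimates, actuals)) / len(actuals)
    )

def blend(xfip, fip, xfip_weight):
    """Weighted average of xFIP and FIP for each pitcher."""
    return [xfip_weight * x + (1.0 - xfip_weight) * f for x, f in zip(xfip, fip)]

def best_xfip_weight(xfip, fip, actuals, steps=10):
    """Try xFIP weights 0, 1/steps, ..., 1 and return the lowest-RMSE weight."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: rmse(blend(xfip, fip, w), actuals))

# Made-up values; next_year_era was built as 0.8 * xFIP + 0.2 * FIP.
xfip = [3.0, 4.0, 5.0]
fip = [4.0, 3.0, 6.0]
next_year_era = [3.2, 3.8, 5.2]

print(best_xfip_weight(xfip, fip, next_year_era))
```

With real data the RMSE curve across weights is usually shallow near its minimum, so neighboring mixtures will perform almost as well as the best one.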
ERA (pkadj) = 1.60 + .914*SIERA – .277*xFIP
Both coefficients are statistically significant (p=.000, p=.013 for SIERA and xFIP respectively). This means that xFIP is not giving extra information beyond what SIERA does. This peculiar result of a negative coefficient is probably a result of sampling bias, but it is still worth reporting.
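To make the negative xFIP coefficient concrete, the formula can be applied to two pitchers with the same SIERA and different xFIPs. The function name and the sample inputs are hypothetical; the coefficients are the ones reported above.

```python
def expected_era_pkadj(siera, xfip):
    """Next-year park-adjusted ERA from the regression reported in the text."""
    return 1.60 + 0.914 * siera - 0.277 * xfip

# Two hypothetical pitchers with identical SIERA but different xFIP.
same_both = expected_era_pkadj(siera=4.00, xfip=4.00)
higher_xfip = expected_era_pkadj(siera=4.00, xfip=4.50)

print(round(same_both, 3))
print(round(higher_xfip, 3))
print(round(higher_xfip - same_both, 3))
```

Holding SIERA fixed, each additional half-run of xFIP lowers the expected ERA by about 0.14 runs under this regression, which is the counterintuitive result the text describes.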
Why SIERA Succeeds
The natural question that everyone asked last year when we came out with SIERA was not just if it was the best estimator, but why it is the best estimator. Why is it that a statistic that has fewer years to work with–and therefore does not precisely estimate the run-effect and out-effect of strikeouts, walks, and home runs–does better than statistics like xFIP and FIP, that do precisely estimate those things?
My further research over the last year has helped me understand why. The following are the highlights of this research. The first one listed is the one that we knew already when we published SIERA last year, but it is not the primary reason at all.
1) Ground balls matter more for pitchers who get more walks and fewer strikeouts because they allow more runners to reach first base.
2) Groundball pitchers allow fewer hits and fewer extra-base hits on ground balls than non-groundball pitchers, and SIERA acknowledges this effect due to its negative coefficient on groundball rate squared.
3) Pitchers with higher groundball rates (but not too high) allow the highest BABIPs and SIERA picks up on this reversing effect of ground balls on BABIP due to their correlation.
4) Pitchers with higher strikeout rates allow lower BABIPs and lower HR/FB rates, and SIERA picks up on this correlation. This is why the coefficient on strikeout rate in SIERA is so negative–because pitchers with high strikeout rates not only prevent runs by getting outs, but because they also allow fewer hits on balls in play and fewer home runs on fly balls.
5) Pitchers with higher strikeout rates get more ground balls in double-play situations.
6) Pitchers with lower walk rates issue more of their walks strategically, and thus the average damage of a walk from a high-walk pitcher is higher, another effect which SIERA picks up.
My educated guess is that reasons 2) and 4) are the primary reasons for SIERA's superior estimation skill. The stark difference between QERA’s RMSE and SIERA’s RMSE in last year’s testing was primarily due to the negative coefficient on groundball rate squared in SIERA. When we ran our initial tests on SIERA, the inclusion of a variable for the square of groundball rate often did the most to improve estimation. Further, even though pitchers do have some control over BABIP, the amount that they do control is very similar to the amount that SIERA credits them with through BABIP’s correlation with strikeouts and groundball rates.
With this, we're not done with SIERA. We knew when we introduced it that we only had so many years of batted-ball data, and more years will undoubtedly help us better estimate the many coefficients in SIERA (though we have not yet looked at how much 2010 can help). Also, as Colin Wyers has done some work on park-adjusted batted-ball rates for pitchers, this may or may not help SIERA improve its estimation skill. If it does, we will be able to make these changes as well. Furthermore, as run environments change, SIERA will need to adjust accordingly too.
ERA estimation is clearly a complicated task with a lot of moving parts. SIERA is currently the best way to take a snapshot of a pitcher’s skill level, but with a lot of competition out there, we will continue to work on delivering even better ways of understanding pitchers’ skill levels.
In the end,