Last week, Matt and I introduced and explained the derivation of SIERA, Skill Interactive Earned Run Average, a stat designed to pick up on the interactions between metrics within a pitcher’s control and let us know his park-adjusted ERA based on a set of sustainable skills. One of the introductory articles focused on testing the metric against its peers, and while it stacked up well against the competition, some concerns were raised over the disconnect between FIP and xFIP in our testing, as the latter should theoretically perform better than the former when it comes to predicting ERA in the following season.
It turns out these concerns were well-founded, as a trip back to my coding revealed a flaw in how xFIP and its inputs were being computed. I shouldn’t have made this mistake, and I should have caught it earlier on, but now it is time to rectify the error, as should be the case in any metric-creation process: tests are run, flaws revealed, corrections made, and improvements constantly applied. When the xFIP calculations are corrected, it does beat FIP, as it should. However, it is also in a virtual dead heat with SIERA, with SIERA still coming out ahead in RMSE testing, but barely (1.159 for SIERA vs. 1.162 for xFIP). That does not invalidate anything, but rather offers some food for thought about the similarities and differences between the two stats.
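For readers who want to see what an RMSE comparison of this kind amounts to, here is a minimal sketch in Python. It is not the code behind our testing: the function names `rmse` and `rank_estimators` are hypothetical, no playing-time weighting or sample filtering is applied, and the numbers in the usage example are made-up placeholders rather than real pitcher data. The idea is simply to take each estimator's value in one season, compare it to the pitcher's actual ERA in the following season, and see which estimator has the smallest root mean squared error.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rank_estimators(estimates, next_season_era):
    """Return (name, rmse) pairs sorted from lowest to highest error.

    estimates: dict mapping an estimator name to its year-N values,
               one per pitcher, in the same order as next_season_era.
    """
    scores = {name: rmse(values, next_season_era)
              for name, values in estimates.items()}
    return sorted(scores.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Placeholder numbers for illustration only; not real pitchers.
    next_season_era = [3.50, 4.20, 4.80, 3.90]
    estimates = {
        "estimator_a": [3.60, 4.10, 4.60, 4.00],
        "estimator_b": [3.20, 4.60, 5.30, 3.40],
    }
    for name, score in rank_estimators(estimates, next_season_era):
        print(f"{name}: RMSE = {score:.3f}")
```

A gap as small as the one between SIERA and xFIP would show up here as two scores differing only in the third decimal place, which is why the ordering alone should not be read as a decisive win.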
So what does this tell us? Well, firstly it says that I need to get my head out of my (expletive deleted) and be much more careful in my coding and reviewing, but it also says that SIERA and xFIP are the best estimators around right now, and a thorough picture of run prevention at this stage should entail looking at both metrics. It also tells us that we should continue to work on refining SIERA, factoring in a lot of the excellent suggestions posted by readers on this site and others. Though they are closely linked, SIERA and xFIP are calculated from such different angles that it should be clear each will have its strengths and weaknesses in certain areas.
SIERA remains particularly strong for pitchers with very high and very low ground ball rates, and is very strong for pitchers with relatively average strikeout rates. Additionally, it works very well for above-average pitchers in terms of overall quality, making it a worthwhile tool for fantasy competitors. The differences are still pretty slight, so it bears repeating that SIERA and xFIP should be used in conjunction with one another right now in order to paint the most accurate portrait, but some of the ideas discussed in our threads were fantastic and may well be applied as we continue to develop the metric.
A final thought before signing off for now: There has been plenty of discussion of how we introduced the metric, and ways to improve in that forum. Our goal is to be as transparent as possible, and so as BP moves forward and introduces or refines metrics, is there anything specific that did or didn’t work in terms of really breaking down, in an in-depth fashion, how a stat is derived?