Last week Matt Swartz published an updated analysis of ERA estimators. He was kind enough to share his data so I could take a look at the accuracy of ERA estimators as a function of innings pitched. In other words, is there a difference in accuracy between the estimators given 100 historical innings pitched versus 500?  (Hint: yes.)

We can measure a lot of things that happen while a pitcher is on the mound, but it takes a while for the real information to show itself. As we collect more data, the random noise is more likely to cancel itself out. For example, during any one season you'll see a lot of .270 BABIPs, but once we look at careers over five-season stretches, .270 BABIPs are few and far between.
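To put rough numbers on that intuition, here is a minimal sketch that treats every ball in play as an independent coin flip. The .300 "true" BABIP and the 550 balls in play per season are my own illustrative assumptions, not values from Matt's data, and the normal approximation is only meant to show the direction and rough size of the effect.

```python
import math

def babip_sd(true_babip, balls_in_play):
    """Binomial standard deviation of an observed BABIP around its true value."""
    return math.sqrt(true_babip * (1 - true_babip) / balls_in_play)

def prob_at_or_below(threshold, true_babip, balls_in_play):
    """Normal approximation to P(observed BABIP <= threshold) from luck alone."""
    z = (threshold - true_babip) / babip_sd(true_babip, balls_in_play)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

TRUE_BABIP = 0.300     # assumed true talent, for illustration only
BIP_PER_SEASON = 550   # assumed balls in play for a full-time starter

for seasons in (1, 5):
    n = seasons * BIP_PER_SEASON
    sd = babip_sd(TRUE_BABIP, n)
    p = prob_at_or_below(0.270, TRUE_BABIP, n)
    print(f"{seasons} season(s): sd = {sd:.4f}, P(BABIP <= .270) = {p:.4f}")
```

Under those assumptions, the chance of a .270-or-lower BABIP from luck alone is roughly 6% in a single season and well under 0.1% over five, because the noise shrinks with the square root of the number of balls in play.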
Some peripherals include a lot of noise, such as BABIP, HR/FB% and LOB%. Other peripherals start with a low noise-to-information ratio, such as SO/PA, BB/PA, and GB/BIP. As such, we might guess that metrics like SIERA and xFIP, which only use the latter peripherals as inputs, will more accurately reflect true talent in the short run, while things like ERA will be more accurate in the long run, because they can pick up on the former peripherals. Short term we need to reduce noise, but long term we can maximize information.
Methodology: Using pitchers who threw at least 40 innings in a season from 2004 through 2010, I binned them in 100-IP increments according to total innings pitched over the previous three years. (Note that the data set only goes back to 2003 and doesn't include any seasons with fewer than 40 innings.) The first bin has 40-100 IP in years n-3 through n-1, and the last bin has 600+ IP. For each bin, I calculated the weighted metrics (ERA_adj, FIP, xFIP, SIERA, and tERA) over the preceding three seasons, and then found the RMS error compared to the following season's park-adjusted ERA, weighted by the following season's IP total. A lower number means less disagreement, which is good.
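For anyone who wants to try something similar, here is a rough sketch of the binning and IP-weighted RMS error calculation. It is not the exact code behind the numbers in this post: the field names (`prior_ip`, `estimate`, `next_era`, `next_ip`) are hypothetical, the three sample records are made up, and each `estimate` is assumed to already be the pitcher's weighted three-year value for one metric.

```python
import math
from collections import defaultdict

def ip_bin(prior_ip):
    """Label the 100-IP bin for innings pitched in years n-3 through n-1."""
    if prior_ip >= 600:
        return "600+"
    if prior_ip < 100:
        return "40<100"
    low = int(prior_ip // 100) * 100
    return f"{low}<{low + 100}"

def weighted_rmse(rows):
    """rows holds (estimate, next_era, next_ip); errors are weighted by next_ip."""
    num = sum(ip * (est - era) ** 2 for est, era, ip in rows)
    den = sum(ip for _, _, ip in rows)
    return math.sqrt(num / den)

def rmse_by_bin(records):
    """Group pitcher-seasons by prior-IP bin and score one estimator per bin."""
    bins = defaultdict(list)
    for r in records:
        bins[ip_bin(r["prior_ip"])].append(
            (r["estimate"], r["next_era"], r["next_ip"])
        )
    return {label: weighted_rmse(rows) for label, rows in bins.items()}

# Made-up records, purely to show the shape of the input.
records = [
    {"prior_ip": 80, "estimate": 4.0, "next_era": 5.0, "next_ip": 50},
    {"prior_ip": 90, "estimate": 4.0, "next_era": 4.0, "next_ip": 150},
    {"prior_ip": 650, "estimate": 3.5, "next_era": 3.0, "next_ip": 200},
]
result = rmse_by_bin(records)
for label, err in result.items():
    print(label, round(err, 3))
```

Running the same function once per estimator (swapping in that metric's value as `estimate`) would give one column of RMS errors per metric.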
Results (full data table at end of post):
  • The more historical information, the better future ERA can be predicted by all the metrics. Stunner, I know.
  • Given fewer than 200 innings pitched (a full season for most starting pitchers, or three years for a relief pitcher), SIERA holds a small advantage over xFIP, which in turn holds a small advantage over FIP. ERA_adj and tERA lag behind.
  • With more than 200 innings pitched, SIERA, xFIP and FIP merge in effectiveness.
  • By 500 IP, all five estimators are on equal ground.
  • ERA never surpasses the peripheral-based estimators, but maybe we just haven't included enough history to detect ERA's advantage yet.
(Not that the only job of an ERA estimator is to predict future ERA.  Another important use is to evaluate in-season performance.  Comparison to future ERA remains a handy benchmark, however, because determining in-season true-talent ERA is an important part of projection.)
IP bin     N     RMS error (one column per estimator)
40<100     490   1.14  1.18  1.27  1.44  1.61
100<200    609   1.05  1.09  1.15  1.31  1.26
200<300    326   1.09  1.08  1.09  1.20  1.24
300<400    179   0.92  0.95  0.97  1.04  1.01
400<500    141   0.97  0.96  0.95  1.03  1.02
500<600    149   0.99  1.01  0.98  0.98  0.97
600+       111   0.74  0.75  0.72  0.80  0.73