Baseball Prospectus home
  
  

February 2, 2011

Between The Numbers

Better With Less: ERA Estimators

by Sky Kalkman

Last week Matt Swartz published an updated analysis of ERA estimators. He was kind enough to share his data so I could take a look at the accuracy of ERA estimators as a function of innings pitched. In other words, is there a difference in accuracy between the estimators given 100 historical innings pitched versus 500?  (Hint: yes.)

We can measure a lot of things that happen while a pitcher is on the mound, but it takes a while for the real information to show itself. As we collect more data, the random noise is more likely to cancel itself out. For example, during any one season you'll see a lot of .270 BABIPs, but once we look at careers over five-season stretches, .270 BABIPs are few and far between.
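
To put rough numbers on that, here is a minimal sketch that treats BABIP as a simple binomial outcome. The true BABIP of .300 and the 550 balls in play per season are illustrative assumptions, not figures from the data set.

```python
import math

def babip_sd(true_babip, balls_in_play):
    """Standard deviation of observed BABIP under a simple binomial model."""
    return math.sqrt(true_babip * (1 - true_babip) / balls_in_play)

TRUE_BABIP = 0.300    # assumed true talent (illustrative)
BIP_PER_SEASON = 550  # assumed balls in play per season (illustrative)

sd_one = babip_sd(TRUE_BABIP, BIP_PER_SEASON)
sd_five = babip_sd(TRUE_BABIP, 5 * BIP_PER_SEASON)

# How many standard deviations below true talent is a .270 BABIP?
z_one = (TRUE_BABIP - 0.270) / sd_one
z_five = (TRUE_BABIP - 0.270) / sd_five

print(f"one season:   SD = {sd_one:.4f}, .270 is {z_one:.1f} SD low")
print(f"five seasons: SD = {sd_five:.4f}, .270 is {z_five:.1f} SD low")
```

Under these assumptions a .270 BABIP is only about 1.5 standard deviations from true talent over one season, but more than 3 over five seasons, because the noise shrinks with the square root of the sample size.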
 
Some peripherals include a lot of noise, such as BABIP, HR/FB% and LOB%. Other peripherals start with a low noise-to-information ratio, such as SO/PA, BB/PA, and GB/BIP. As such, we might guess that metrics like SIERA and xFIP, which only use the latter peripherals as inputs, will more accurately reflect true talent in the short run, while things like ERA will be more accurate in the long run, because they can pick up on the former peripherals. Short term we need to reduce noise, but long term we can maximize information.
 
Methodology: Using pitchers who threw at least 40 innings from 2004 through 2010, I binned them by total innings pitched over the previous three years, in 100-IP increments. (Note that the data set only goes back to 2003 and doesn't include any seasons with fewer than 40 innings.) The first bin has 40-100 IP in years n-3 through n-1, and the last bin has 600+ IP. For each bin, I calculated the weighted metrics (ERA_adj, FIP, xFIP, SIERA, and tERA) over the preceding three seasons, and then found the RMS error compared to the following season's park-adjusted ERA, weighted by the following season's IP total. A lower number means less disagreement, which is good.
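
The binning and IP-weighted RMS error described above can be sketched roughly as follows. This is not the code used for the study; the tuple layout, the function names, and the three sample rows are all made up for illustration.

```python
import math
from collections import defaultdict

def ip_bin(prior_ip):
    """Label the 100-IP bin for a prior three-season innings total."""
    if prior_ip >= 600:
        return "600+"
    if prior_ip < 100:
        return "40<100"
    low = int(prior_ip // 100) * 100
    return f"{low}<{low + 100}"

def weighted_rmse_by_bin(rows):
    """rows: (prior_ip, estimate, next_era, next_ip) tuples.

    Returns the RMS error of estimate vs. next_era per bin,
    weighting each pitcher by his following-season innings.
    """
    sq_err = defaultdict(float)
    weight = defaultdict(float)
    for prior_ip, estimate, next_era, next_ip in rows:
        b = ip_bin(prior_ip)
        sq_err[b] += next_ip * (estimate - next_era) ** 2
        weight[b] += next_ip
    return {b: math.sqrt(sq_err[b] / weight[b]) for b in sq_err}

# Hypothetical rows, not real pitchers.
rows = [
    (80.0, 3.50, 4.50, 60.0),
    (95.0, 4.00, 4.00, 60.0),
    (250.0, 4.20, 3.70, 180.0),
]
result = weighted_rmse_by_bin(rows)
for label, rmse in result.items():
    print(label, round(rmse, 2))
```

Running the same function once per estimator (SIERA, xFIP, FIP, tERA, ERA_adj) would give one column of a table like the one at the end of the post.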
 
Results (full data table at end of post):
 
 
Takeaways:
  • The more historical information, the better future ERA can be predicted by all the metrics. Stunner, I know.
  • Given fewer than 200 innings pitched -- a full season for most starting pitchers, or three years for a relief pitcher -- SIERA holds a small advantage over xFIP, which in turn holds a small advantage over FIP. ERA_adj and tERA lag behind.
  • With more than 200 innings pitched, SIERA, xFIP and FIP converge in effectiveness.
  • By 500 IP, all five estimators are on equal ground.
  • ERA never surpasses the peripheral-based estimators, but maybe we just haven't included enough history to detect ERA's advantage, yet.
(Not that the only job of an ERA estimator is to predict future ERA.  Another important use is to evaluate in-season performance.  Comparison to future ERA remains a handy benchmark, however, because determining in-season true-talent ERA is an important part of projection.)
 
RMS error vs. the following season's park-adjusted ERA, by IP over the previous three seasons:

IP Bin    Count  SIERA  xFIP  FIP   tERA  ERA_adj
40<100    490    1.14   1.18  1.27  1.44  1.61
100<200   609    1.05   1.09  1.15  1.31  1.26
200<300   326    1.09   1.08  1.09  1.20  1.24
300<400   179    0.92   0.95  0.97  1.04  1.01
400<500   141    0.97   0.96  0.95  1.03  1.02
500<600   149    0.99   1.01  0.98  0.98  0.97
600+      111    0.74   0.75  0.72  0.80  0.73

Sky Kalkman is an author of Baseball Prospectus. 

24 comments have been left for this article.

Christina Kahrl (BP staff)
(11)

First off, Sky, welcome aboard, it's great to have you here. Second, to follow up on one of Tango's comments last week, I'd be interested to see this sort of evaluation using RA9 instead of ERA. Focus on scoreboard outcomes, and not a scorer's opinion, as it were.

Feb 02, 2011 14:25 PM
 
Sky Kalkman

Thanks, and I totally agree. Would also be great to include a bunch of other ERA estimators, park- and league-adjust all of them, see if the conclusions hold across time, etc. I hope and think we'll get to doing a lot of that, but for now, this data was readily available.

Feb 02, 2011 14:30 PM
rating: 0
 
TangoTiger

So given 400 IP or more, FIP wins? Even though you are comparing against park-adjusted ERA instead of actual ERA?

This seems like a big deal, a huge deal, no? This is saying that the batted ball data is worse than just knowing the number of HR allowed.

Am I misinterpreting?

Feb 02, 2011 20:56 PM
rating: 0
 
Sky Kalkman

I guess if it's worth noting .04 differences, it's worth noting .02 differences. In which case ERA is a co-winner at 500+ IP. The muddling of lines was all I really noticed at that end of the graph. Although I'm very curious if FIP or ERA would pull ahead with more historical data.

We're aware of irregularities in batted ball data, but I also wonder if FIP (and ERA) are picking up on park effects (and defense) over the longer time period, information that might be removed with adjustments.

Lots of fun questions.

And of course, if we require accuracy, we should find ourselves the nearest decent projection.

Feb 03, 2011 04:07 AM
rating: 0
 
TangoTiger

Right, all legitimate questions.

The test is the following: given all known information for each pitcher (his career past performance, his recent past performance, his batted ball distribution, his performance with men on base, the fielding talent of his past fielders, his parks, his past teams, his 2011 team, his 2011 fielders, etc), what will be his RA9 in 2011?

Now, FIP is saying: "I don't care about anything, other than his BB, K, HR, HBP numbers. I'll make my estimate based solely on that."

PECOTA, SIERA, et al would say: "My god, I definitely need all that past information. It's critical that I know all that. I'll make my estimate based solely on that."

And when 2011 comes to a close, what's going to happen? I think you can make a decent case that all that extra effort may bring you very little, and perhaps will even be a negative (i.e., over adjusted).

So, that's the real test. Until then, we're dancing around the entire issue with these various other tests, because they are all going to be biased to some extent toward one metric or another based on however you set up the various other tests.

Feb 03, 2011 04:29 AM
rating: 0
 
Sky Kalkman

Not to be the new guy who shills for the company's stats, but SIERA doesn't "need all that past information". It's SO, BB, and GB/FB (and PAs instead of IP -- heck, maybe FIP and xFIP would be improved simply by upgrading denominators; IP are influenced by things they're trying to ignore). SIERA inputs are basic.

*My* real test was to see which metrics did better short term, not longer term, although all questions are interesting. Which one do I want to use when Livan Hernandez posts a 2.50 ERA the first two months of the season? I think it's pretty clear you want SIERA or xFIP. If he does it for 2+ years, eh, it doesn't much matter (although I'd really like to see if a projection system can beat the estimators.)

It's like a point I made in Dave's Matt Cain post at Fangraphs yesterday. When the Cain hullabaloo started, it was quite fair to challenge his ERA as somewhat fluky, given his xFIP. But now that we have three more years of him posting 7.0% HR/FB rates, we need to focus on FIP instead of xFIP. Our answer changes because the amount of data changes. Now, if there was evidence that his HR/FB was "real" three years ago, that would be awesome to find.

Feb 03, 2011 04:57 AM
rating: 0
 
TangoTiger

It's not clear at all "short term".

If Livan has say a 2.50 FIP over the first two months, but a 5.50 SIERA over those same two months, and the question you are asking: "How will he do over the next 4 months", well, my answer is "Use his entire career."

You are suggesting that if you intentionally limit yourself to only using two months of short-term data, and discard the rest of his past data, then SIERA will do better. Well, given that batted ball distributions stabilize faster than HR rates, you are correct.

But, there is no reason to limit yourself to only looking at his first two months of data.

What we have with Livan is a history, and you use that history. And this is exactly what you have shown, that if you look at all pitchers with a minimum of 400 IP, then FIP does a bit better than SIERA. That is, knowing his HR allowed (that's what's in FIP but not in SIERA) is better than knowing his batted ball distribution (that's what's in SIERA but not in FIP).

So, if you want to argue that for guys with less than 200 career IP you prefer SIERA, then fine.

***

Two more points:
1. I'll keep repeating this, but as long as you compare SIERA to park-adjusted future ERA, and you compare FIP to park-adjusted future ERA, you are biasing the results against FIP. You should no longer perform that test ever. If Ubaldo has a high FIP one year because of HR, he'll have a high FIP the next year because of HR, and you can't compare it to park-adjusted ERA (which presumes a flatter HR rate).

2. FIP is not meant to be predictive! FIP merely represents current performance. In no way should one even think that you would regress K rates the same as HR rates. If I wanted a "predictive FIP", I would probably do something like (5*HR + 2*BB - 2*SO)/PA + constant or something.
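
For readers who want to see the two shapes side by side, here is a minimal sketch. The first function uses the commonly published FIP weights (13, 3, 2 over innings); the second simply transcribes the off-the-cuff formula in the comment above. The constants (3.10 and 0.0) and the sample stat line are illustrative assumptions, and the second formula is left on its raw per-PA scale because the comment does not specify how it would be scaled to runs.

```python
def fip(hr, bb, hbp, so, ip, constant=3.10):
    """Standard FIP: (13*HR + 3*(BB+HBP) - 2*SO) / IP + constant.

    The constant varies by league and season; 3.10 is only a placeholder.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / ip + constant

def predictive_fip_sketch(hr, bb, so, pa, constant=0.0):
    """The comment's rough idea: (5*HR + 2*BB - 2*SO) / PA + constant.

    Home runs carry far less weight relative to walks and strikeouts
    than in standard FIP. Output is per plate appearance, not on the
    ERA scale.
    """
    return (5 * hr + 2 * bb - 2 * so) / pa + constant

# Hypothetical season line, not a real pitcher.
line = dict(hr=20, bb=60, hbp=5, so=180, ip=200.0, pa=830)

standard = fip(line["hr"], line["bb"], line["hbp"], line["so"], line["ip"])
sketch = predictive_fip_sketch(line["hr"], line["bb"], line["so"], line["pa"])
print(round(standard, 3), round(sketch, 4))
```

The point of the comparison is the relative weights: in standard FIP a home run counts 6.5 times as much as a strikeout, while in the rough predictive version it counts only 2.5 times as much.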

I think anyone here can find a stat that predicts future RA9 better than FIP and better than SIERA by focusing only on HR, NIBB+HBP, SO.

There's my next challenge to the community.

Feb 03, 2011 06:29 AM
rating: 0
 
Sky Kalkman

Yeah, "short-term" was a bad way to put it. No reason to toss out historical information if you have it. I should have said that if you have little information (from the past three years) it appears SIERA and xFIP are the metrics to look at. That's mostly younger starters and relief pitchers. Or anyone in any given season if you're into measuring value that way, I suppose.

Feb 03, 2011 06:53 AM
rating: 0
 
TangoTiger

Agreed.

Feb 03, 2011 07:06 AM
rating: 0
 
evo34

Why don't you create and test a "predictive FIP" then?


"2. FIP is not meant to be predictive! FIP merely represents current performance. In no way should one even think that you would regress K rates the same as HR rates. If I wanted a "predictive FIP", I would probably do something like (5*HR + 2*BB - 2*SO)/PA + constant or something."

Feb 03, 2011 13:40 PM
rating: -1
 
TangoTiger

Way ahead of you. Check my blog for FutureFIP.

Feb 03, 2011 16:19 PM
rating: 0
 
evo34

Nice job.

I think the next step is to look at which estimator performs best for in-season projections. That is, compare the first-half ERA estimator with the second-half actual ERA (filtering out team changers and guys with not enough IP in both halves), and assess the average errors.




Feb 02, 2011 22:04 PM
rating: 0
 
JeffZimmerman

Congrats on the new gig. With all the time you have, Sky, could you also look into pitchers who changed teams? How do pitchers perform with a new stadium and defense behind them?

Feb 03, 2011 05:53 AM
rating: 0
 
Sky Kalkman

Thanks, Jeff. Pretty sure I have about as much free time as you do ;)

The data set I have doesn't include team, so I'll add your questions to the queue when someone does a bigger study.

Feb 03, 2011 06:49 AM
rating: 0
 
studes
(280)

Congrats, Sky, and welcome to BPro.

I'm also someone who feels that, once you reach a certain threshold, ERA is just as good a predictor as anything. Happy to see you peg the threshold. I'm pretty comfortable saying that, after a couple of years as a starting pitcher, actual performance is as good an indicator as these other metrics.

Technical question: how good are any of them anyway? At 100 innings, it appears that the average RMSE is 1.2 points of ERA. Is RMSE similar to standard deviation? Can you say that 66% of all estimates fall within one RMSE, or something similar to that? So the best any of these estimates might get is 66% within plus or minus 2.4 ERA?

If so, it makes the difference between the RMSE of these measures kind of trivial.

Feb 03, 2011 08:47 AM
rating: 0
 
Colin Wyers (BP staff)

You figure SD and RMSE the same way, mathematically.

Feb 03, 2011 10:03 AM
 
studes
(280)

Thanks, Colin. So, picking on the 100-200 bin, when you use SIERA to predict future ERA, 66% of the results will be within 2.1 runs of ERA and 95% will be within 4.2 (that's taking the RMSE on both sides).

When you use ERA to predict ERA, 66% of the results will be within 2.5 runs of ERA and 95% will be within 5.0 runs.

Is that right? If so, it's obviously an improvement, but I think somewhere we should acknowledge that ERA is just plain tough to predict, regardless of what measure you use.

Looked at that way, I don't think there's much to choose between SIERA, xFIP and FIP. Just use whatever you're comfortable using.

Feb 03, 2011 10:11 AM
rating: 0
 
TangoTiger

I wouldn't use the word within, as I think it would imply +/-. In your case, you are saying that the *range* is 2.1 runs (i.e., +/- 1.05).

But, yeah, ERA is notoriously difficult to estimate because of BABIP and sequencing.
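
As a quick check on the +/- reading, here is a small sketch under the assumption that the errors are normally distributed with zero mean, so that the RMSE can be treated as a standard deviation. Real ERA errors need not be normal, so take the percentages as approximate.

```python
import math

def normal_coverage(half_width, rmse):
    """Share of errors within +/- half_width, assuming zero-mean
    normal errors whose standard deviation equals rmse."""
    return math.erf(half_width / (rmse * math.sqrt(2)))

rmse = 1.05  # SIERA in the 100<200 IP bin, from the table

within_one = normal_coverage(rmse, rmse)      # +/- 1.05 runs
within_two = normal_coverage(2 * rmse, rmse)  # +/- 2.10 runs

print(f"within +/-{rmse:.2f}: {within_one:.1%}")
print(f"within +/-{2 * rmse:.2f}: {within_two:.1%}")
```

Under that assumption, roughly 68% of next-season ERAs land within 1.05 runs of the SIERA estimate on either side, and roughly 95% within 2.10 runs on either side.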

Feb 03, 2011 10:45 AM
rating: 0
 
Colin Wyers (BP staff)

Well, assuming the error is symmetrical Studes is right. If the error isn't symmetrical (and I would suspect it isn't), then it's more of an approximation, but I think it's still a useful way of putting it.

Feb 03, 2011 10:56 AM
 
TangoTiger

The error definitely can't be symmetrical. But, that's a vagary of using runs per out. If you instead used the square root, you'd get something closer to symmetrical.

That is, if we treat runs as a multiplication of OBP and SLG (for illustration purposes), and if each of those has a symmetrical error, then multiplying the two won't give you a symmetrical error.
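
A tiny numeric illustration of that point, using made-up OBP and SLG values and the simplest case where both components miss by the same amount in the same direction:

```python
# Illustrative values only; not taken from the article's data.
obp, slg, delta = 0.330, 0.420, 0.030

center = obp * slg
upside = (obp + delta) * (slg + delta) - center
downside = center - (obp - delta) * (slg - delta)

# Algebraically, upside - downside = 2 * delta**2, which is always positive.
gap = upside - downside

print(f"upside:   {upside:.4f}")
print(f"downside: {downside:.4f}")
print(f"gap:      {gap:.4f}")
```

Equal-sized misses in each input move the product further up than down, so symmetric errors in the components produce a lopsided error in the product.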

Feb 03, 2011 11:16 AM
rating: 0
 
BarryR

Interesting stuff. What I would like to know is whether there are pitchers who consistently under/over perform the various metrics. If so, is there a common thread among them?
Whether these metrics are intended as predictive or not, they will be used that way, whether to analyze trades, signings, or for fantasy purposes. In order to effectively analyze a pitcher's future value, it is necessary to know if there is reason to believe or doubt the numbers in his specific case.

Feb 03, 2011 11:51 AM
rating: 0
 
TangoTiger

If there was a common thread, it would be identified and included as a parameter in the estimator.

Feb 03, 2011 12:53 PM
rating: 0
 
BarryR

And I feel foolish for optimistically suggesting its existence.
I'd still like to know if there are consistent under/over pitchers though.

Feb 04, 2011 15:25 PM
rating: 0
 
TangoTiger

SIERA excludes HR. So anyone with a HR skill will under/over. Brett Myers for one.

FIP excludes batted balls. So anyone with a batted ball distribution skill will over/under. Felix maybe is one.

Basically, whatever parameter is being ignored is a candidate for being over/under.

These metrics purposefully ignore parameters because they want to, not because they necessarily think there's no skill there.

Feb 04, 2011 17:40 PM
rating: 0
 