February 8, 2010

Introducing SIERA: Part 1

Baseball fans who have no use for advanced metrics can recognize the flaws in evaluating pitchers by their won-lost records, but may struggle to understand the inherent flaws in the more commonly used earned run average. Henry Chadwick invented ERA in the 19th century to measure the effect of defense on pitching performance, but not until Voros McCracken explained the concept of Defense Independent Pitching Statistics (DIPS) did our understanding of the relationship between pitching and defense take a big step forward. McCracken explained that pitchers controlled the rates of whiffing, walking, and getting walloped with home runs, showing that the correlation between these statistics in consecutive years was strong. Though he inferred an ability for hurlers to control these numbers, another finding suggested little persistence in their Batting Average on Balls in Play (BABIP), leading to the conclusion that ERAs were dependent on defense (or luck), and therefore very volatile.

Armed with this information, sabermetricians began to develop methods of estimating ERA by controlling for the factors that can muddy the proverbial waters. These estimators evaluate pitching performance based on what pitchers actually control, which makes it possible to track their abilities more accurately. Watching trends in the skills that pitchers control can help us better grasp whether shifts in ERA are the result of changes from the individual or from external factors.

Since then, many competing estimators have emerged, each with its own strengths and weaknesses. Perhaps the most popular ERA estimator is Fielding Independent Pitching (FIP), which uses the following straightforward formula: FIP = 3.20 + (3*BB - 2*K + 13*HR)/9, where the 3.20 is a constant dependent on the league and year, used to place the output on the ERA scale.
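As a rough illustration, the FIP formula above can be sketched in Python. The article does not state the units of BB, K, and HR; this sketch assumes they are rates per nine innings, in which case dividing by 9 matches the commonly cited counts-over-innings form of FIP. The function names and the default constant argument are illustrative choices, not part of the original.

```python
def fip_from_rates(bb9, k9, hr9, constant=3.20):
    """FIP as written in the article, ASSUMING inputs are per-nine-inning rates.

    `constant` is league- and year-dependent; 3.20 is the value quoted above.
    """
    return constant + (3 * bb9 - 2 * k9 + 13 * hr9) / 9


def fip_from_counts(bb, k, hr, ip, constant=3.20):
    """Same calculation starting from raw season totals and innings pitched."""
    def per_nine(n):
        return 9 * n / ip

    return fip_from_rates(per_nine(bb), per_nine(k), per_nine(hr), constant)
```

For example, a pitcher averaging 3 walks, 9 strikeouts, and 1 home run per nine innings comes out at 3.20 + 4/9, or roughly 3.64.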
Researchers have noted that, among the defense-independent statistics, home runs are by far the least predictable. Although home-run rate has shown itself to be more repeatable than BABIP, the lack of persistence makes such a comparison similar to justifying a D grade by mentioning that other classmates failed the test. Further research revealed that the percentage of fly balls that left the yard (HR/FB) sported about as little persistence as BABIP, and second-generation estimators attempted to eliminate HR/FB luck from estimation. One of the more obvious adjustments is to simply approximate the number of home runs that would have been hit if the pitcher had neutral luck in the fly ball department, and recompute FIP with this estimate. This metric, known as Expected Fielding Independent Pitching (xFIP), uses the regular FIP formula but replaces HR with xHR, the estimate described above. This estimator marked an upgrade over FIP given the accepted notion that HR/FB has much more of a foundation in luck than actual skill, but there was still ample room for improvement.

Nate Silver invented QERA back in 2006 for Baseball Prospectus to adjust for a few issues with FIP and xFIP, and while he referred to the stat as a toy, it represented a big step upward in the methodology of estimators. The formula, QERA = (2.69 - .66*GB% + 3.88*BB% - 3.4*K%)^2, derives one of its main benefits from the fact that it accounts for nonlinear run scoring; the more baserunners allowed, the higher the percentage that will score. It also removes a bias in per-inning rates: innings-pitched totals are subject to batted-ball luck, so a pitcher with a higher BABIP will have a lower K/IP even if he strikes out the same percentage of hitters. QERA has another problem of its own, in that GB% is really GB/Ball in Play (or GB/BIP), while BB% and K% are measured per batter faced (BB/PA and SO/PA).
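A minimal Python sketch of the QERA formula quoted above may help make the nonlinearity concrete. It assumes the rates are entered as fractions (0.45 rather than 45), which is what puts the result on the ERA scale. The helper `expand_square` is a hypothetical illustration, not something from the original: it multiplies out the square to list the squared and cross terms that the compact three-variable form hides.

```python
from itertools import combinations_with_replacement

# Linear coefficients inside the square, as quoted in the article.
# The key "1" stands for the constant term.
QERA_LINEAR = {"1": 2.69, "GB%": -0.66, "BB%": 3.88, "K%": -3.4}


def qera(gb_pct, bb_pct, k_pct):
    """QERA with rates given as fractions (an assumption about units)."""
    linear = 2.69 - 0.66 * gb_pct + 3.88 * bb_pct - 3.4 * k_pct
    return linear ** 2


def expand_square(coeffs):
    """Coefficients of (sum of c_i * x_i)^2, keyed by pairs of variable names.

    Squared terms get c_i^2; cross terms get 2 * c_i * c_j.
    """
    expanded = {}
    for a, b in combinations_with_replacement(list(coeffs), 2):
        factor = 1 if a == b else 2
        expanded[(a, b)] = factor * coeffs[a] * coeffs[b]
    return expanded
```

Calling `expand_square(QERA_LINEAR)` yields ten terms: a constant, three linear terms, three squared terms, and three pairwise interactions (for instance, the implied K%-squared coefficient is 3.4 squared, or 11.56). All ten are fixed by the four numbers inside the parentheses rather than estimated separately.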
In other words, for pitchers who strike out and walk large numbers of hitters, changes in ground balls per ball in play affect their QERA as much as they do for pitchers who barely strike out or walk any hitters, even though the latter group’s groundball rate actually represents a higher tally. Further, while QERA picks up some of the interaction between walk, strikeout, and groundball rates, it does not necessarily weight them correctly.

With that in mind, we have invented a new statistic, Skill-Interactive Earned Run Average (SIERA), which corrects the problems with old estimators while adding a few more realistic assumptions. This was done first by unfoiling all of the individual components in QERA while adjusting for the groundball denominator issue, and then testing which interactions and squared terms were relevant by using multiple linear regression analysis. Essentially, we changed the GB/BIP to (GB-FB-PU)/PA and evaluated all of the terms in the exponential regression, removing those with insignificant p-values; while the QERA formula only shows three variables, unfoiling the formula reveals several more. We identified two terms that were not useful: the squared term of walks, and the interaction between walk and strikeout rate. The squared terms on strikeout and groundball rates were both significant, and we also found important interactions between walks and grounders and between whiffs and grounders that have strong effects on run scoring. As a result, SIERA accomplishes the following:
The new groundball statistic used is: (GB-(FB+PU))/PA. Now walks, strikeouts, and grounders use the same denominator, avoiding any type of weighting issues. GB/PA could have been used instead of GB/BIP, but our findings suggested that line drives per ball in play exhibited virtually no persistence, and did not represent a pitcher skill. When his line-drive rate is low, the pitcher is probably just lucky, but groundball, flyball, and popup rates will increase to make up the difference. Since groundball rate for the league as a whole is similar to the sum of flyball and popup rates, using the difference between the two eliminates some of the luck that would make this estimator look bigger than its britches. For the same reason, popup rate was allowed to negatively affect SIERA because it is a symptom of the pitcher throwing a ball that generates an upward trajectory, which could lead to an increase in home runs. A pitcher’s skills are throwing strikes, making hitters miss, and throwing with angles and spins such that the trajectory of the ball is downward when it hits the bat. A popup almost always represents an out, but it also represents a potential problem for the pitcher in the future.

Simply running a regression analysis to predict park-adjusted ERA and developing a statistic that introduces these improvements to Defense Independent Pitching Statistics would be useless if it did not predict ERA better than other statistics. Not only did SIERA emerge as the leader in ERA estimators, we discovered more importantly that using the same regression analysis on different datasets shows that the coefficients developed continue to predict ERA better than other estimators, proving that our analysis was not biased by retroactively predicting the mark.
Specifically, using 2003-08 data to generate a formula and then testing it on 2009 pitchers, SIERA emerged as the best estimator of park-adjusted ERA in the following year and the best at predicting same-year ERA amongst the estimators that treat HR/FB as luck; FIP and tRA consider it to be more skill-laden. In other words, it is impossible to best FIP in terms of same-year mirroring unless HR/FB is treated as a skill, but tests have shown that HR/FB itself is unstable and not indicative of something within the control of the pitcher. FIP and tRA lead other estimators that do not credit the pitcher for this luck in predicting same-year Earned Run Average, but SIERA overtakes both in predicting future performance, which is arguably much more important. After all, the primary goal of ERA estimators is to approximate a skill set that can successfully generate low ERAs while being as accurate as possible in the modeling and assumptions deriving the formula.

In the coming days, we will explain in more detail the derivation of SIERA, provide some tests to check its performance, and offer examples of pitchers for whom the metric performs vastly better than other estimators. The last part is very important, as a small change in ERA estimation is not necessarily a big deal unless there are pitchers who are perpetually underrated or overrated by similar statistics. This is certainly true in the case of SIERA and FIP for a player like Santana, whose solo home run tendencies are inaccurately punished by FIP in a way that underestimates his skill by a significant amount. The introduction of a metric that properly accounts for all that was mentioned above helps to evaluate pitchers in a more precise and useful way than ever before.
For now, we leave you with the formula for the statistic that will be kept here moving forward and will soon be found on the revamped reports:

SIERA = 6.145 - 16.986*(SO/PA) + 11.434*(BB/PA) - 1.858*((GB-FB-PU)/PA) + 7.653*((SO/PA)^2) +/- 6.664*(((GB-FB-PU)/PA)^2) + 10.130*(SO/PA)*((GB-FB-PU)/PA) - 5.195*(BB/PA)*((GB-FB-PU)/PA)
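The formula above can be transcribed into a short Python sketch. The function and parameter names are illustrative assumptions, and rates are taken as fractions of plate appearances. The formula leaves the sign of the squared groundball term as +/-; an author note in the comments says that sign depends on the sign of (GB-FB-PU)/PA, but since the exact mapping is not spelled out in the formula itself, this sketch does not guess and instead requires the caller to pass the sign explicitly as `sq_sign`.

```python
def net_groundball_rate(gb, fb, pu, pa):
    """(GB - FB - PU) / PA: grounders minus fly balls and popups, per batter faced."""
    return (gb - fb - pu) / pa


def siera(so_rate, bb_rate, net_gb, sq_sign):
    """SIERA from SO/PA, BB/PA, and (GB-FB-PU)/PA.

    sq_sign must be +1 or -1 and is applied to the squared groundball term,
    which the published formula writes as "+/-". Choosing it is left to the
    caller (an assumption of this sketch, not a rule from the article).
    """
    if sq_sign not in (1, -1):
        raise ValueError("sq_sign must be +1 or -1")
    return (
        6.145
        - 16.986 * so_rate
        + 11.434 * bb_rate
        - 1.858 * net_gb
        + 7.653 * so_rate ** 2
        + sq_sign * 6.664 * net_gb ** 2
        + 10.130 * so_rate * net_gb
        - 5.195 * bb_rate * net_gb
    )
```

As a made-up example (not a real pitcher line): 800 batters faced with 160 strikeouts, 64 walks, 250 grounders, 170 fly balls, and 40 popups gives SO/PA = 0.20, BB/PA = 0.08, and a net groundball rate of 0.05; `siera(0.20, 0.08, 0.05, -1)` then evaluates to about 3.94.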
Matt Swartz is an author of Baseball Prospectus. 52 comments have been left for this article.

Second order interaction terms! Feb 08, 2010 10:11 AM

Rowen Bell (5629) Isn't one of Matt/Eric's points that the second-order interaction terms are actually implicit in Nate's QERA formula, it's just that we don't think about them, because of the way in which he left the form of the formula? Feb 08, 2010 11:24 AM

Exactly right. We also let them take on more realistic values because we unfoiled the regression. Feb 08, 2010 11:31 AM

CRP13 (46873) For a lowly engineer like me, paying to be a premium member of this website is like getting to look in through the window of a Mensa meeting. Feb 08, 2010 10:16 AM

dianagram (9530) I bet San Diego pitchers benefit the most from this, hence I shall hereafter refer to this as "The Treasure of the SIERA Padre" Feb 08, 2010 10:16 AM

The Iron_Throne (4630) Could you comment on the use of a second order positive term for K%? Is this a fitting artifact (i.e., no constant could accurately fit the data for a first order term), or is there a concept behind it? If it is the former, might I suggest letting the exponent of the first term float to clean up the equation a bit. Feb 08, 2010 10:40 AM

Very good question. We did run regressions on various subsets of data and this term was basically always positive. It fits with the general theme pretty well too. Think of it this way: when do you want a strikeout most? With runners on base. The more K's you get, the fewer runners are on base though, so it tends to get gradually less effective. Feb 08, 2010 10:48 AM

So what's the ... I don't know the proper math term ... the point where there's diminished return for increased K rate? My guess is somewhere around 7. Feb 08, 2010 11:35 AM

It happens bit by bit, so it's tough to pick an exact number.
I know that SIERA tested particularly well for pitchers in the 6.5-9.0 range of K/9 while doing about as well as everything else for K/9 above that. Feb 08, 2010 11:54 AM

sbrousc (33447) I believe the term is "inflection point," no? Feb 08, 2010 15:01 PM

Gordon (1198) There is no inflection point. The inflection point would be where the second derivative of SIERA with respect to (SO/PA) equals zero. But the second derivative of SIERA with respect to (SO/PA) is a constant (2*10.169). [This assumes that SO is independent of the other parameters, which isn't quite true; in the extreme, a pitcher who strikes out every batter wouldn't have any BB, GB, or FB. But I suspect that this effect is small.] Feb 09, 2010 09:07 AM

The Iron_Throne (4630) Will, there's always a diminished return on increasing K rate per this fit, but if you're asking where the first negative K-rate term is overwhelmed by the two positive K-rate terms, the answer is going to be a function of GB-rate, as there is the +9.561*(SO/PA)*((GB-FB-PU)/PA) term, which is also positive. The answer then, for a GB-rate of 0.49 (Brandon Webb 2006) is K-rate=0.66, for a GB-rate of 0.35 (Adam Wainwright 2009) is K-rate=0.73, and for a GB-rate of 0.28 (Aaron Harang 2007) is K-rate=0.76. These are obviously nonphysical numbers, since you can't strike out even 66% of batters faced, and result from this equation being fit to a data set of real values. Since you can't extrapolate a purely phenomenological equation outside of its set of data, these numbers are meaningless. Feb 08, 2010 15:07 PM

As far as the diminishing run prevention effect of strikeouts, it does really matter where the BB and GB numbers are because those determine the number of baserunners and the double play ability to remove those baserunners. Feb 08, 2010 15:23 PM

Gordon (1198) The positive coefficient on the (SO/PA)^2 term does mean that at some point additional SO/PA increases SIERA.
This is when the first derivative of SIERA with respect to (SO/PA) is equal to zero, which is at a SO/PA of (18.055 - 9.561*(GB-FB-PU)/PA)/(2*10.169). This looks like an outrageous strikeout rate, so it probably isn't an issue. Feb 09, 2010 09:25 AM

sunpar (38553) I like the work the new(er) BP writers are doing. This was great. Feb 08, 2010 11:40 AM

sunpar (38553) By the way, now that we've seen what SIERA can do for Santana, do we find out Wednesday if it solves the "Tom Glavine is not subject to FIP" phenomenon? Feb 08, 2010 11:43 AM

Haha unfortunately, Glavine baffles SIERA as well, at least for 2003-2008 where there are actually batted ball statistics recorded. His SIERAs look similar to his other ERA estimators, all ahead of his ERA (about 4.9 versus 4.2 for those six years). The thing about Glavine was that he was far superior to his peers at pitching to the situation. I think that pitchers with really high ground ball rates may be particularly good at pitching to the situation, but at least from 2003-09, Glavine is pretty average there so he is not a puzzle SIERA can foil. Feb 08, 2010 11:52 AM

sunpar (38553) As a huge Braves fan throughout Glavine's career, I don't ever recall him being an extraordinary groundball pitcher; at the very least, not on Maddux's level. He got his share of GBs, but mostly he just seemed to coax a lot of lazy fly balls to CF and LF. Feb 08, 2010 12:06 PM

Definitely could be part of it. Glavine's career BABIP is .286, while Wakefield's is .281. SIERA and FIP do about as well when it comes to Wakefield. The thing about Glavine's BABIP partly is that he played in front of good defense, so that's not all the effect. It definitely would explain some of it, though. Feb 08, 2010 12:11 PM

sunpar (38553) Good point on the defense. I'm sure Andruw Jones made a ton of those FBs look lazier than they were. Feb 08, 2010 13:14 PM

dantroy (7559) Does SIERA inform the PECOTA projections, or are they entirely separate things?
Feb 08, 2010 12:12 PM

SIERA helped find some of the mistakes in the first round of PECOTAs but it wasn't early enough to actually build it in to 2010 PECOTA. It definitely could be part of the process more next year, though I'm not quite sure about that. Feb 08, 2010 12:15 PM

dantroy (7559) Thanks. Is it safe to assume you will be rolling out 2010 SIERA projections in the near future? Feb 08, 2010 12:27 PM

The 2010 Annual will list 2009 SIERA and will compute 2010 SIERA according to the projection. Pretty soon, the 2003-09 SIERAs will be available on the Statistics Reports and the 2009 SIERAs will do very well at helping predict 2010 ERA, at least net of park effects. Feb 08, 2010 12:29 PM

dianagram (9530) Ruben Sierra just called, and says if you give a guy like Bill Pecota some recognition, at least you could spell the better player's last name right. :) Feb 08, 2010 12:43 PM

Juris (1283) @Matt and Eric: interesting work. I think some of us would like to see a simple correlation matrix of SIERA against the other ERA estimators including FIP, QERA and ERA itself (!) based on cross-sectional (single season) data. Feb 08, 2010 12:55 PM

Juris, Feb 08, 2010 12:59 PM

sroney (1190) This says "a pitcher with a higher BABIP will have a lower K/IP even if he strikes out the same percentage of hitters." Feb 08, 2010 13:19 PM

Sky Kalkman (3454) Are you going to include a BaseRuns-style combination of events in the analysis articles? Especially when PECOTA's spitting out frequency of events for you, it's a pretty good ERA predictor... Feb 08, 2010 13:55 PM

Are you asking how well this would match up with linear weights? If so, that's not directly in subsequent articles, but I think it would probably match it reasonably well, at least as more data is collected on batted ball numbers over the next few years and the coefficients are refined.
Some of the strength in the estimation might come from situational pitching, which probably wouldn't show up quite as much in linear weights as I understand it, but certainly the magnitude of the interaction terms at the end might work out pretty well at least. It's a good question, but I'm not sure yet. Feb 08, 2010 14:00 PM

Quick note that we didn't mention in the formula: for pitchers who give up MORE fly balls and pop ups than ground balls, the ((GB-FB-PU)/PA)^2 term would be positive, but we basically made it negative in that case. So that term should be negative or positive depending on the sign of (GB-FB-PU)/PA. Feb 08, 2010 14:14 PM

sunpar (38553) On a somewhat related note, I'm sitting here at work where we have various TVs tuned to different news networks (I work for a media company), and I just saw Nate Silver discussing politics on MSNBC. From QERA to senatorial elections; what a career path. Feb 08, 2010 14:22 PM

jdtk99 (38768) One of the benefits of qERA was that it tends to stabilize quickly. Will this be true of SIERA? Feb 08, 2010 15:32 PM

TheRealNeal (26363) Are all the predictive systems you mentioned using the same factors when arriving at your "park adjusted ERA"? Feb 08, 2010 18:54 PM

It predicts park-adjusted ERA the following year best, and I have a large doubt that starters are systematically paired with certain weather and umpires, so that should even out when predicting the following year's ERA. It also does better than other HR/FB-luck-neutral estimators in same-year ERA so it covers all the bases. Although there could be parks that favor lefties and righties, I'm not sure how this would correlate with K%, BB%, and GB% enough to affect this. Feb 08, 2010 19:10 PM

McNulty (39298) I'm curious as to the process for coming up with the coefficients in the formula. Feb 08, 2010 21:49 PM

Eric M. Van (31218) This looks like a very nice advance in metrics that assume that pitchers do not have significant variations in hardness of contact allowed.
That is an incorrect assumption*, but the variations are small enough at the MLB level to make such a metric very useful, and may help get a handle on those slight but real differences. Feb 08, 2010 23:07 PM

Thank you for pointing this out. The correlation of BABIP and Defense Independent Pitching Statistics is something I've discussed before. It's small but it's there. The benefit of using regression to do this is that it picks up this effect. Pitchers with higher K-rates have lower BABIPs, and both the extra K/PA and the fewer H/BIP lower ERA, but the regression will pick up both effects. The only thing that SIERA will leave out is BABIP effects that are uncorrelated with ground ball, strikeout, and walk rates, which are very small effects. That's why this is more of an ERA estimator based on skills than based on DIPS. Feb 09, 2010 05:29 AM

Eric M. Van (31218) Good! Grabbing a chunk of the true BABIP skill via the K and BB factors in the regression makes the metric even better than I realized. Feb 09, 2010 15:22 PM

anderson721 (18704) Is there a formula that predicts the relationship between the # of baserunners and the % that score? Feb 09, 2010 15:41 PM

bobbygrace (38384) PU stands for popups, in case anyone else didn't know! (It would be useful to put "PU" in the glossary; it seems to me that it's a less standard stat than, say, GB or FB, which are found there.) Feb 14, 2010 07:58 AM

studes (280) One correction: xFIP is not based on HR/FB, but HR/OF (home runs per outfield fly). Sounds minor, but it's an important distinction. Feb 16, 2010 07:39 AM

sdgeiger (6577) I'm new to these advanced forms of statistical analysis. Aug 15, 2010 16:53 PM

The typical boxscore in a newspaper doesn't record batted ball types, and even FanGraphs boxscores just lump pop ups in with fly balls. That's okay: fly balls and pop ups are always added together in the equation, so you can call all pop ups fly balls and the equation still works.
Any game summary on Gameday will include pop ups too. Aug 30, 2010 15:47 PM

I was told there would be no math questions.
Original cast SNL reference for the win...