March 17, 2010
Ahead in the Count
Why SIERA Doesn't Throw BABIP Out with the Bath Water
It sometimes seems as if the main reason people are wary of Defense Independent Pitching Statistics as a way to measure pitching performance is that they are reluctant to believe the theory that pitchers do not control the hit rate on balls in play (BABIP). It does not make intuitive sense, and it isn't even entirely true. Certainly, fans who disagree loudly with these theories should be reassured by the knowledge that defense-neutral ERA estimators are usually much closer to next year's ERA than the previous year's ERA, but many fans still can't get past the point that ERA estimators usually assume that pitchers do not have control over the outcome of balls in play. That is because these estimators simply look to interpret the effect on scoring of a strikeout, walk, and home run. This gives them the strength to predict ERA well because they are able to explicitly state the effect of each of these outcomes.
Alternatively, you can now find Baseball Prospectus' new defense-neutral ERA estimator, SIERA, in the stat reports. SIERA does not make this assumption. That's because we know that pitchers do control their BABIP to a certain degree. In any given season, the average starting pitcher who can keep his job will have his BABIP determined roughly 75 percent by luck, 13 percent by his team's defense/park, and 12 percent by his own skill.* How do we actually figure out the 12 percent that is skill, when we know that the variance in BABIPs and the limits of sample size imply that such a large fraction is luck? Fortunately, J.C. Bradbury found in 2005 that much of BABIP skill from pitchers can actually be explained by their defense-independent skills. In fact, about 86 percent of the pitcher portion of BABIP skill is explained by these statistics.**
When I say that these defense independent pitching statistics "explain" BABIP skill, I mean that you can figure out how good pitchers are at preventing hits on balls in play by knowing how good they are at these skills. Specifically, pitchers who strike out a lot of hitters also induce a lot of weak contact. Randy Johnson was good at preventing hits on balls in play, and Tim Lincecum looks pretty good, too. Additionally, pitchers who walk a lot of hitters (those who miss the corners of the plate by a few inches) also have trouble with leaving the ball a few inches towards the middle of the plate, and are more likely to give up hits. Greg Maddux is an example of the opposite extreme, as his impeccable control allowed him to keep the hits down. Pitchers who allow a lot of fly balls also induce a lot of pop-ups, easy outs that keep BABIP down. Both Jered Weaver and Ted Lilly are fly ball-prone, and while that hurts their home run numbers, they also induce a fair amount of pop-ups. The ability to control the ball, miss the bat, and hit the top of the bat all correlate very highly and explain most of the pitcher's ability to control BABIP, and this is all utilized by SIERA.
What SIERA does is simply ask the question, "How much did teams score off pitchers with these whiffing, control, and grounder skills?" instead of the question that many estimators ask: "How much do these whiffing, control, and grounder outcomes affect scoring precisely?" The latter is certainly a valuable question, but if you are interested in checking how much pitchers' BABIP skills might affect their ERA, SIERA is the statistic for you. SIERA does not precisely estimate the exact run-scoring impact of those strikeouts and other outcomes because there are only seven years' worth of reliable batted-ball data, so the coefficients will certainly require some fine tuning as we get more data. However, the benefit of a statistic like SIERA is that we are no longer asking how well a pitcher pitched if we ignore his BABIP skill. Instead, we are incorporating the vast majority of BABIP skill in our estimate.
For the Mathematically Inclined
This section is for those people who want to see the math involved in determining my numbers above, and also provides some transparency. It is not necessary to get the main point of the article: SIERA incorporates most of the skill that pitchers do have to influence BABIP.
*: To figure out the percentage of BABIP explainable by skill, luck, and defense, we do not need to guess. We know that the variance of BABIP should equal the sum of the variances in BABIP explainable by luck, defense/park, and the pitcher himself. Because the outcome of each ball in play is a binomial variable, and binomial variables have known variance, we can pin down the luck component pretty exactly. First, the standard deviation of BABIP among all pitchers with 150 innings in a season from 2003-09 was .02125 (variance = 4.52*10^-4). The average number of balls in play for pitchers in the sample was 610, with an average BABIP of .295. Thus, we can figure the standard deviation that we would observe due to luck if all teams, pitchers, and stadiums had the same BABIP: sqrt((.295)*(1-.295)/610)=.0184 (variance = 3.40*10^-4). This gives us the fraction of BABIP variance that is luck: (3.40*10^-4)/(4.52*10^-4) = 75%. The other 25 percent is going to be defense, park, or skill.
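The luck calculation above can be reproduced in a few lines. This is only a sketch using the rounded figures quoted in the text, and the variable names are my own, so the last digit may differ slightly from the numbers printed above.

```python
import math

# Figures quoted in the text (pitchers with 150+ innings, 2003-09).
observed_sd = 0.02125    # spread of single-season BABIP across pitchers
mean_babip = 0.295       # average BABIP in the sample
balls_in_play = 610      # average balls in play per pitcher-season

observed_var = observed_sd ** 2

# Variance of a binomial rate: p * (1 - p) / n.
luck_var = mean_babip * (1 - mean_babip) / balls_in_play
luck_sd = math.sqrt(luck_var)

# Share of the observed variance that luck alone would produce.
luck_share = luck_var / observed_var

print(f"luck sd = {luck_sd:.4f}, luck share = {luck_share:.1%}")
```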
Fortunately, we can figure out the defense and park effects together by looking at overall team BABIPs in this time period. Teams had a standard deviation of BABIP within each season of about .0104 (variance = 1.08*10^-4). However, the standard deviation that we would expect from luck alone among teams with an average BABIP of .298 and about 4,300 balls in play would be, as before, sqrt((.298)*(1-.298)/4300)=.00698 (variance = 4.86*10^-5). That means the variance in actual skill level for the defenses specifically is 1.08*10^-4 - 4.86*10^-5 = 5.93*10^-5, so the standard deviation in team BABIP skill should be .0077. Thus, two-thirds of teams should be between .290 and .306 in BABIP skill level, which sounds reasonable. It also means that team defense and park effects combine to explain (5.93*10^-5)/(4.52*10^-4) = 13 percent of the variance in pitcher BABIP.
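The team-level step works the same way: subtract the binomial luck variance from the observed team variance, then compare what is left to the pitcher-level variance. Again, this is a sketch with the rounded inputs from the text and names of my own choosing.

```python
import math

# Team-level figures quoted in the text.
team_sd = 0.0104         # within-season spread of team BABIP
team_mean = 0.298        # average team BABIP
team_bip = 4300          # approximate balls in play per team-season

# Pitcher-level observed variance (sd of .02125, from the text).
pitcher_observed_var = 0.02125 ** 2

team_var = team_sd ** 2
team_luck_var = team_mean * (1 - team_mean) / team_bip

# What remains after removing luck is attributed to defense and park.
team_skill_var = team_var - team_luck_var
team_skill_sd = math.sqrt(team_skill_var)

# One standard deviation either side of the mean covers about two-thirds of teams.
low, high = team_mean - team_skill_sd, team_mean + team_skill_sd

defense_share = team_skill_var / pitcher_observed_var

print(f"team skill sd = {team_skill_sd:.4f}")
print(f"one-sd range = {low:.3f} to {high:.3f}")
print(f"defense/park share = {defense_share:.1%}")
```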
This means that we have 12 percent of BABIP that cannot be explained by luck, defense, or park effects, and that should mean that 12 percent of BABIP is pitcher skill. Thus, pitcher BABIP skill should have a standard deviation of about .00721, and two-thirds of pitchers probably fall between .291 and .305 in terms of their actual abilities to prevent hits on balls in play. In fact, 95 percent of pitchers should fall between .283 and .313 in their BABIP prevention abilities. That certainly is not a complete lack of skill difference, but it is small compared to the skill difference in strikeouts, walks, and ground balls.
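As a rough check on those ranges, one can treat true pitcher BABIP skill as normally distributed. Both the normal shape and the .298 center are my assumptions (the center is inferred from the midpoint of the quoted ranges); only the .00721 standard deviation and the range endpoints come from the text.

```python
from statistics import NormalDist

# Assumption: skill is roughly normal, centered at .298 (midpoint of the quoted ranges).
skill = NormalDist(mu=0.298, sigma=0.00721)

# Fraction of pitchers whose true skill lies inside each quoted range.
share_narrow = skill.cdf(0.305) - skill.cdf(0.291)
share_wide = skill.cdf(0.313) - skill.cdf(0.283)

print(f"share between .291 and .305: {share_narrow:.1%}")
print(f"share between .283 and .313: {share_wide:.1%}")
```

Under these assumptions the narrow range holds roughly two-thirds of pitchers and the wide range holds slightly more than 95 percent, in line with the figures above.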
**: To determine the amount of pitcher BABIP that can be explained by strikeout, walk, and ground-ball skills, I simply ran a regression of pitcher BABIP (weighted by PA) on the three main variables used in SIERA (SO/PA, BB/PA, and (GB-FB-PU)/PA) for all pitchers with over 40 innings in a season. This gave me a formula for pitcher BABIP skill: .304 - .077*(SO/PA) + .018*(BB/PA) + .052*((GB-FB-PU)/PA). The variance in projected BABIPs for all pitchers with 150 innings was 4.49*10^-5. Since actual pitcher BABIP skill should have a variance of 5.20*10^-5, that means we can explain about 86 percent of pitcher BABIP skill by looking at their three primary skills. Thus, SIERA picks up on the majority of pitchers' actual BABIP skill, which explains its strong estimation abilities even with only seven years of data to work with.
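To make the regression formula concrete, here is a small sketch that applies it to a single stat line and recomputes the explained share. The function name and the sample pitcher's totals are hypothetical, chosen only for illustration; the coefficients and the two variances are the ones quoted above.

```python
def projected_babip_skill(so, bb, gb, fb, pu, pa):
    """Apply the regression formula from the text to one pitcher's season totals."""
    return (0.304
            - 0.077 * (so / pa)
            + 0.018 * (bb / pa)
            + 0.052 * ((gb - fb - pu) / pa))

# Hypothetical stat line (not a real pitcher): 800 PA, 200 SO, 60 BB,
# 250 ground balls, 180 fly balls, 40 pop-ups.
example = projected_babip_skill(so=200, bb=60, gb=250, fb=180, pu=40, pa=800)

# Share of true pitcher BABIP skill captured by the projection,
# using the two variances quoted in the text.
projected_var = 4.49e-5
skill_var = 5.20e-5
explained_share = projected_var / skill_var

print(f"projected BABIP skill = {example:.3f}")
print(f"explained share = {explained_share:.0%}")
```

A high-strikeout pitcher like this hypothetical one projects to a BABIP skill several points below the .304 intercept, which is the direction the article describes.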