There are two more important reasons why Skill-Interactive Earned Run Average (SIERA) is so successful at predicting the following year’s ERA. First, most other Defense-Independent Pitching Statistics, like FIP and xFIP, assume that pitchers have no control over their Batting Average on Balls in Play (BABIP), but we know that they do have some control. I have shown before that pitchers with high strikeout totals and low groundball rates tend to allow fewer hits per ball in play, and thus lower BABIPs. Of course, BABIP is subject to so much luck that it is nearly impossible to discern a pitcher’s true ability to prevent hits on balls in play from his historical BABIP. That is why last year’s FIP is much better at predicting this year’s ERA than last year’s ERA is: FIP strips BABIP (and sequencing) out of ERA altogether, assuming league-average BABIP and random sequencing for all pitchers.
Another reason that SIERA is great at predicting ERA is that it accounts for the run-prevention effect of ground balls; in particular, it controls for the fact that the effect is nonlinear. Not only do more ground balls lead to fewer runs allowed, but the difference in ERA between pitchers who generate 40 to 50 percent ground balls is smaller than the difference in ERA between pitchers who generate 50 to 60 percent ground balls. That is why there is a negative coefficient on the squared groundball rate term in SIERA.
At the time, Eric and I believed that the reduction in runs allowed was just an artifact of groundball double plays erasing singles, but the effect is simply too large for that explanation. The interaction terms actually add only a small (but useful) improvement in predicting ERA; the groundball rate term is really what separates SIERA from its predecessors. Additionally, the implied positive groundball squared coefficient in QERA is actually that statistic’s biggest flaw. In this article, I will look in more detail at groundball pitchers and why they are so good at preventing runs.
The first thing that I checked was whether pitchers with high groundball rates allowed lower hit rates on ground balls. Teams with good infield defense may target groundball pitchers, so to correct for this, I looked at the batting average allowed on ground balls for each pitcher relative to their team. Just to be assured that batted-ball classification did not get in the way (lest Colin Wyers show up at my door with a baseball bat), I looked at pitchers’ groundball rates relative to their team (though the effect was clear either way).
For the 3,297 pitchers who allowed at least 100 balls in play in a season between 2003 and 2010, the correlation between their net groundball rate and their net groundball batting average was -.185.
I was concerned that relievers facing same-handed hitters might be creating a false correlation, so I looked at only pitchers with 300 balls in play in a season and found an even stronger correlation: -.241.
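As a rough illustration of the "net of team" comparison described above, the sketch below subtracts each team's figure from the pitcher's own and then correlates the two net series. The five pitcher-seasons are made-up numbers, the `pearson` helper is written here for the example, and the article does not say exactly how the team baselines were built, so treat this as one plausible reading rather than the study's actual procedure.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical pitcher-seasons (invented for illustration, not from the study):
# (pitcher GB%, team GB%, pitcher BABIP on grounders, team BABIP on grounders)
seasons = [
    (0.62, 0.45, 0.215, 0.240),
    (0.51, 0.44, 0.232, 0.238),
    (0.43, 0.46, 0.245, 0.236),
    (0.38, 0.45, 0.251, 0.241),
    (0.47, 0.43, 0.228, 0.235),
]

# "Net" means pitcher minus team, for both the rate and the outcome.
net_gb_rate = [gb - team_gb for gb, team_gb, _, _ in seasons]
net_gb_babip = [babip - team_babip for _, _, babip, team_babip in seasons]

r = pearson(net_gb_rate, net_gb_babip)
print(f"correlation of net GB rate with net GB BABIP: {r:.3f}")
```

With these invented inputs the correlation comes out negative, which is the direction the article reports; the size of the number here means nothing.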
Pitchers who allow more ground balls allow fewer groundball hits. The following table shows all pitchers who have had consecutive seasons of a 60 percent groundball rate (with at least 300 balls in play) since 2003:
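| Name | Career Groundball BABIP (since 2003) | Teams’ Groundball BABIP (weighted) | Net Groundball BABIP |
|---|---|---|---|
| | .231 | .254 | -.023 |
| | .209 | .227 | -.018 |
| | .221 | .233 | -.012 |
| | .222 | .236 | -.014 |
| | .218 | .248 | -.030 |
| | .205 | .233 | -.028 |
| | .225 | .249 | -.024 |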
Each of these guys has a career BABIP on ground balls that is at least 12 points below their teams’ groundball BABIP. This is not a coincidence. They don’t just induce contact with downward trajectories – they induce ground balls that are easier to field.
This is also true for slugging average on ground balls in play. There is a similar negative correlation for this group.
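| Correlation with Groundball Percentage | BIP minimum > 100 | BIP minimum > 300 (Observations) |
|---|---|---|
| Net Batting Average on Ground Balls (W/ROE) | -.185 | -.242 (1174) |
| Net Slugging Average on Ground Balls (W/ROE) | -.198 | -.260 (1174) |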
Do groundball pitchers induce weak contact on all balls in play? No. The reverse seems to be true for fly balls. Looking at outfield fly balls only, and excluding home runs, groundball pitchers have a distinctly higher BABIP and slugging average on balls in play.
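| Correlation with Groundball Percentage | BIP minimum > 100 | BIP minimum > 300 (Observations) |
|---|---|---|
| Net BABIP on Outfield Fly Balls | .278 | .319 (1174) |
| Net Slugging Average on Outfield Fly Balls | .259 | .313 (1174) |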
However, these groundballers do not exhibit any tendencies toward line-drive BABIPs and infield-popup BABIPs that are different from other pitchers’. There is almost zero correlation year to year for BABIP on line drives or popups for any pitchers.
| Correlation (Observations) | Net Groundball Rate | Net Line-Drive BABIP (Same Year) | Net Line-Drive BABIP (Next Year) |
|---|---|---|---|
| Net Groundball Rate | 1.00 (1174) | | |
| Net Line-Drive BABIP (Same Year) | .033 | 1.00 (1174) | |
| Net Line-Drive BABIP (Next Year) | .022 (549) | .025 (549) | 1.00 (1174) |
| Correlation (Observations) | Net Groundball Rate | Net Popup BABIP (Same Year) | Net Popup BABIP (Next Year) |
|---|---|---|---|
| Net Groundball Rate | 1.00 (1174) | | |
| Net Popup BABIP (Same Year) | .079 | 1.00 (1174) | |
| Net Popup BABIP (Next Year) | .054 (549) | .013 (549) | 1.00 (1174) |
The .025 year-to-year correlation on line-drive BABIP is particularly surprising because it is at odds with previous research. Six years ago, Mitchel Lichtman found that line-drive BABIP was persistent for pitchers, but look at line-drive BABIP net of team line-drive BABIP and this unravels. The net figure reflects a mixture of team defense adjustment and official scorer adjustment, but it unteaches something important about pitcher BABIP that many of us thought we knew.
Contrast the randomness of line-drive BABIP and popup BABIP with groundball and flyball BABIPs, which have year-to-year correlations of .188 and .152, respectively, net of team. Overall, groundball pitchers allow higher BABIPs, but not higher slugging averages on balls in play. This is primarily because ground balls are hits more often than fly balls, but the slugging on the two types of batted balls is similar. The correlations between groundball rate and overall BABIP and SLGBIP are shown below, alongside those for the following year’s BABIP and SLGBIP:
| Correlation with Groundball Percentage | BIP minimum > 100 | BIP minimum > 300 (Observations) |
|---|---|---|
| Net BABIP Same Year (W/ROE) | .193 | .170 (1174) |
| Net BABIP Next Year (W/ROE) | .169 | .141 (1174) |
| Correlation with Groundball Percentage | BIP minimum > 100 | BIP minimum > 300 (Observations) |
|---|---|---|
| Net SLGBIP Same Year (W/ROE) | .033 | .024 (1174) |
| Net SLGBIP Next Year (W/ROE) | .044 | .002 (1174) |
However, correlation is a rough statistic that does not reveal subtleties or curvatures, so it misses the truth of what is going on.
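To see why a near-zero correlation can coexist with a real relationship, consider a purely curved response. The sketch below uses invented net groundball rates that are symmetric around zero and an invented inverted-U outcome with no linear part; the `pearson` helper and the -0.6 scale are assumptions chosen only for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical net groundball rates, symmetric around the team average.
net_gb = [-0.20, -0.10, 0.0, 0.10, 0.20]

# A made-up inverted-U outcome: highest in the middle, lowest at both extremes.
outcome = [-0.6 * x * x for x in net_gb]

r = pearson(net_gb, outcome)
print(f"linear correlation: {r:.6f}")
```

The outcome is completely determined by the groundball rate, yet the linear correlation is zero up to floating-point rounding, because the rise on one side cancels the fall on the other.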
Non-extreme groundball pitchers (those with 45-60 percent of balls in play as ground balls) allow the highest BABIPs, but pitchers with groundball rates over 60 percent actually allow average BABIPs.
To adjust for the fact that some pitchers allowed more ground balls than others, I weighted the BABIP of each pitcher by the actual number of ground balls allowed.
| Groundball Rate | Observations (Total Weight) | Average Net Group BABIP (W/ROE) | Standard Deviation of Net Group BABIP (W/ROE) |
|---|---|---|---|
| > 60% | 294 (28,167) | .0013 | .0390 |
| 55-60% | 378 (41,526) | .0049 | .0356 |
| 50-55% | 648 (80,636) | .0052 | .0305 |
| 45-50% | 1,058 (130,793) | .0040 | .0298 |
| 40-45% | 1,231 (122,382) | .0006 | .0337 |
| 35-40% | 863 (63,500) | .0067 | .0366 |
| < 35% | 877 (27,836) | .0132 | .0501 |
The following graph of the average net BABIP in each group by groundball rate is even clearer:
While the lowest BABIPs belong to pitchers who allow fewer than 40 percent ground balls (these pitchers often have high infield popup rates), the pitchers with the highest groundball rates had lower BABIPs than pitchers with just slightly above-average groundball rates.
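The grouped, weighted averages above can be sketched roughly as follows. The pitcher-seasons are invented, the helper names `bucket` and `weighted_stats` are hypothetical, and two details are assumptions on my part: a rate of exactly 60 percent is placed in the top group, and the standard deviation is computed with the same ground-ball weights as the mean.

```python
from collections import defaultdict

# Lower edges of the groundball-rate groups, checked from the top down.
# Assumption: a rate exactly on an edge belongs to the higher group.
BINS = [
    (0.60, "> 60%"), (0.55, "55-60%"), (0.50, "50-55%"),
    (0.45, "45-50%"), (0.40, "40-45%"), (0.35, "35-40%"),
]

def bucket(gb_rate):
    """Return the label of the groundball-rate group for one pitcher-season."""
    for lower, label in BINS:
        if gb_rate >= lower:
            return label
    return "< 35%"

def weighted_stats(rows):
    """Total weight, weighted mean, and weighted SD of (weight, value) pairs."""
    total = sum(w for w, _ in rows)
    mean = sum(w * v for w, v in rows) / total
    var = sum(w * (v - mean) ** 2 for w, v in rows) / total
    return total, mean, var ** 0.5

# Hypothetical pitcher-seasons: (GB rate, ground balls allowed, net BABIP).
seasons = [
    (0.63, 210, -0.004), (0.61, 150, 0.006),
    (0.52, 180, 0.008), (0.53, 120, 0.002),
    (0.33, 60, -0.020), (0.31, 90, -0.010),
]

groups = defaultdict(list)
for gb_rate, ground_balls, net_babip in seasons:
    # Each season is weighted by the number of ground balls it contains.
    groups[bucket(gb_rate)].append((ground_balls, net_babip))

for label, rows in groups.items():
    total, mean, sd = weighted_stats(rows)
    print(f"{label:>7}  n={len(rows)}  weight={total}  mean={mean:+.4f}  sd={sd:.4f}")
```

Weighting by ground balls keeps a short relief season from counting as much as a full starter's season within the same group.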
Moving on to slugging average on balls in play, we see that the lack of correlation between groundball rate and slugging average on balls in play does not mean no relationship exists. In fact, pitchers with very low and very high groundball rates had the lowest slugging average on balls in play, while the highest slugging average on balls in play belongs to pitchers with average groundball rates.
| Groundball Rate | Observations (Total Weight) | Average Net Group SLGBIP (W/ROE) | Standard Deviation of Net Group SLGBIP (W/ROE) |
|---|---|---|---|
| > 60% | 294 (28,167) | .0154 | .0507 |
| 55-60% | 378 (41,526) | .0035 | .0485 |
| 50-55% | 648 (80,636) | .0012 | .0432 |
| 45-50% | 1,058 (130,793) | .0032 | .0410 |
| 40-45% | 1,231 (122,382) | .0022 | .0467 |
| 35-40% | 863 (63,500) | .0023 | .0529 |
| < 35% | 877 (27,836) | .0066 | .0729 |
Both of these statistics work well with a quadratic fit with respect to groundball rate in a regression analysis.
The BABIP (with errors included) of the 3,297 pitchers with at least 100 balls in play is best predicted by the following formula:
Net BABIP (with ROE) = .002 + .136*(net GB%) – .401*(net GB%)^2
The p-value on both net groundball rate and its square was less than .01.
The SLGBIP (with errors included as singles) of the same 3,297 pitchers with at least 100 balls in play is best predicted by the following formula:
Net SLGBIP (with ROE) = .003 + .097*(net GB%) – .611*(net GB%)^2
Again, the p-value is less than .01 for both net groundball rate and its square.
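To make the curvature in these two fitted formulas concrete, the sketch below plugs in a few net groundball rates and locates the peak of each parabola. It assumes net GB% is expressed as a fraction (0.10 means ten percentage points above the team rate), which the size of the coefficients suggests but the article does not state; the function names are mine.

```python
def net_babip(net_gb):
    """Fitted net BABIP (with ROE) from the article; net_gb is a fraction."""
    return 0.002 + 0.136 * net_gb - 0.401 * net_gb ** 2

def net_slgbip(net_gb):
    """Fitted net SLGBIP (with ROE) from the article; net_gb is a fraction."""
    return 0.003 + 0.097 * net_gb - 0.611 * net_gb ** 2

def peak(linear, quadratic):
    """Input where a + b*x + c*x^2 is highest when c < 0: x = -b / (2c)."""
    return -linear / (2 * quadratic)

babip_peak = peak(0.136, -0.401)
slgbip_peak = peak(0.097, -0.611)

for x in (-0.10, 0.0, 0.10, 0.20, 0.30):
    print(f"net GB% {x:+.2f}: BABIP {net_babip(x):+.4f}, SLGBIP {net_slgbip(x):+.4f}")

print(f"BABIP peaks near net GB% = {babip_peak:.3f}")
print(f"SLGBIP peaks near net GB% = {slgbip_peak:.3f}")
```

Under that assumption, the fitted BABIP curve tops out roughly 17 points of groundball rate above the team average and the SLGBIP curve roughly 8 points above it; beyond those points, additional ground balls push the predicted values back down.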
The best way to adjust for this would be to also adjust for the strikeout and walk rates, in which case we would get the following equations for BABIP and SLGBIP:
Net BABIP (with ROE) = .010 + .121*(net GB%) – .383*(net GB%)^2 – .082*(K%) + .033*(BB%)
Net SLGBIP (with ROE) = .017 + .072*(net GB%) – .580*(net GB%)^2 – .150*(K%) + .065*(BB%)
In both equations, the net groundball rate and its square had p-values that were less than .01, as did the strikeout rate coefficient. The walk rate coefficient had p=.032 for SLGBIP and p=.123 for BABIP, the latter of which is not statistically significant but still suggests an effect.
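A short sketch of how the adjusted equations behave, holding groundball rate fixed and varying strikeout rate. Several things here are assumptions rather than facts from the article: all rates are treated as fractions, K% and BB% are taken to be per batter faced, the third term of the SLGBIP equation is read as the squared net groundball rate (matching the other three equations), and the two pitchers are invented.

```python
def adjusted_net_babip(net_gb, k_rate, bb_rate):
    """Net BABIP (with ROE) from the article's strikeout- and walk-adjusted fit."""
    return (0.010 + 0.121 * net_gb - 0.383 * net_gb ** 2
            - 0.082 * k_rate + 0.033 * bb_rate)

def adjusted_net_slgbip(net_gb, k_rate, bb_rate):
    """Net SLGBIP (with ROE); assumes the third term is on squared net GB%."""
    return (0.017 + 0.072 * net_gb - 0.580 * net_gb ** 2
            - 0.150 * k_rate + 0.065 * bb_rate)

# Two hypothetical pitchers with the same groundball and walk rates,
# differing only in strikeout rate (25 percent versus 15 percent).
high_k = adjusted_net_babip(net_gb=0.05, k_rate=0.25, bb_rate=0.08)
low_k = adjusted_net_babip(net_gb=0.05, k_rate=0.15, bb_rate=0.08)

print(f"high-strikeout pitcher: {high_k:+.4f}")
print(f"low-strikeout pitcher:  {low_k:+.4f}")
print(f"difference:             {high_k - low_k:+.4f}")
```

Because the strikeout term is linear, the gap between the two pitchers is just the coefficient times the difference in strikeout rate, whatever groundball rate they share.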
Given this curvature, it only makes sense that the inclusion of the groundball squared term did so much to help SIERA to fit the data.
The shape is less obvious when looking at the following year’s BABIP and SLGBIP because groundball rates jump around, but we still see a generally similar effect in the two tables and graphs of these statistics below:
| Groundball Rate | Observations (Total Weight) | Average Net Group BABIP (W/ROE) | Standard Deviation of Net Group BABIP (W/ROE) |
|---|---|---|---|
| > 60% | 126 (31,168) | .0040 | .0324 |
| 55-60% | 177 (46,584) | .0015 | .0284 |
| 50-55% | 310 (86,437) | .0049 | .0301 |
| 45-50% | 521 (156,376) | .0050 | .0286 |
| 40-45% | 536 (154,716) | .0029 | .0299 |
| 35-40% | 364 (87,212) | .0038 | .0311 |
| < 35% | 310 (53,627) | .0112 | .0397 |
| Groundball Rate | Observations (Total Weight) | Average Net Group SLGBIP (W/ROE) | Standard Deviation of Net Group SLGBIP (W/ROE) |
|---|---|---|---|
| > 60% | 126 (31,168) | .0085 | .0431 |
| 55-60% | 177 (46,584) | .0058 | .0375 |
| 50-55% | 310 (86,437) | .0019 | .0422 |
| 45-50% | 521 (156,376) | .0054 | .0398 |
| 40-45% | 536 (154,716) | .0016 | .0421 |
| 35-40% | 364 (87,212) | .0009 | .0433 |
| < 35% | 310 (53,627) | .0085 | .0572 |
This data shows that groundball pitchers have a hidden value and that modeling BABIP indirectly, as we did with SIERA, rather than assuming pitchers do not control it, helps improve the prediction of ERA. Statistics like xFIP have the benefit that the run values of defense-independent statistics (strikeouts, walks, and home runs) are set precisely with linear weights, but xFIP does not take into account that BABIP is lower for flyball pitchers. SIERA, however, shows that extreme groundball pitchers also have a skill at suppressing BABIP themselves.
Next time, I will look in more detail at another statistic that pitchers have little control over: the rate of home runs per fly ball. This statistic has a low year-to-year correlation, lower than BABIP does (.07 for HR/FB versus .13 for BABIP, both net of team rates), but SIERA still gives us some help at picking up some of this effect.