Ahead in the Count: Ground-ballers: Better than You Think

December 15, 2010

There are two more important reasons why Skill-Interactive Earned Run Average (SIERA) is so successful at predicting the following year’s ERA. First, most other Defense-Independent Pitching Statistics, like FIP and xFIP, assume that pitchers have no control over their Batting Average on Balls in Play (BABIP), but we know that they do have some control. I have shown before that pitchers with high strikeout totals and low ground-ball rates tend to allow fewer hits per ball in play, and thus lower BABIPs. Of course, BABIP is subject to so much luck that it is nearly impossible to discern a pitcher’s true ability to prevent hits on balls in play from his historical BABIP. That is why last year’s FIP is much better at predicting this year’s ERA than last year’s ERA is. FIP strips ERA of BABIP (and sequencing) altogether, assuming league-average BABIP and random sequencing for all pitchers.

Another reason that SIERA is great at predicting ERA is that it accounts for the run-prevention effect of ground balls; in particular, it controls for the fact that the effect is nonlinear. Not only do more ground balls lead to fewer runs allowed, but the difference in ERA between a pitcher who generates 40 percent ground balls and one who generates 50 percent is smaller than the difference between pitchers at 50 and 60 percent. That is why there is a negative coefficient on the squared ground-ball rate term in SIERA.

At the time, Eric and I believed that the reduction in runs allowed was just an artifact of ground-ball double plays erasing singles, but the effect is simply too large for that explanation. The interaction terms actually add only a small (but useful) improvement in predicting ERA; the treatment of ground-ball rate is really the difference between SIERA and its predecessors. Additionally, the implied positive ground-ball squared coefficient in QERA is actually that statistic’s biggest flaw. In this article, I will look in more detail at ground-ball pitchers and why they are so good at preventing runs.

The first thing that I checked was whether pitchers with high ground-ball rates allowed lower hit rates on ground balls. Teams with good infield defense may target ground-ball pitchers, so to correct for this, I looked at the batting average allowed on ground balls for each pitcher relative to their team. Just to be assured that batted-ball classification did not get in the way (lest Colin Wyers show up at my door with a baseball bat), I looked at pitchers’ ground-ball rates relative to their team (though the effect was clear either way).

For the 3,297 pitchers who allowed at least 100 balls in play in a season between 2003 and 2010, the correlation between their net ground-ball rate and their net ground-ball batting average was -.185.

I was concerned that relievers facing same-handed hitters might be creating a false correlation, so I looked at only pitchers with 300 balls in play in a season and found an even higher correlation: -.241.

Pitchers who allow more ground balls allow fewer ground-ball hits. The following table shows all pitchers who have had consecutive seasons of a 60 percent ground-ball rate (with at least 300 balls in play) since 2003:

Name	Career Ground-ball BABIP (since 2003)	Teams’ Ground-ball BABIP (weighted)	Net Ground-ball BABIP
Fausto Carmona	.231	.254	-.023
Roy Halladay	.209	.227	-.018
Derek Lowe	.221	.233	-.012
Tim Hudson	.222	.236	-.014
Chien-Ming Wang	.218	.248	-.030
Brandon Webb	.205	.233	-.028
Jake Westbrook	.225	.249	-.024

Each of these guys has a career BABIP on ground balls that is at least 12 points below their teams’ ground-ball BABIP. This is not a coincidence. They don’t just induce contact with downward trajectories – they induce ground balls that are easier to field.

This is also true for slugging average on ground balls in play. There is a similar negative correlation for this group.

Correlation with Ground-ball Percentage (Observations)	BIP minimum >100	BIP minimum >300
Net Batting Average on Ground Balls (W/ROE)	-.185 (3297)	-.242 (1174)
Net Slugging Average on Ground Balls (W/ROE)	-.198 (3297)	-.260 (1174)

Do ground-ball pitchers induce weak contact on all balls in play? No. The reverse seems to be true for fly balls. Looking at outfield fly balls only, and excluding home runs, ground-ball pitchers have a distinctly higher BABIP and slugging average on balls in play.

Correlation with Ground-ball Percentage (Observations)	BIP minimum >100	BIP minimum >300
Net BABIP on Outfield Fly Balls (W/ROE)	.278 (3297)	.319 (1174)
Net Slugging Average on Outfield Fly Balls (W/ROE)	.259 (3297)	.313 (1174)

However, these ground-ballers do not exhibit any tendencies toward line-drive BABIPs and infield pop-up BABIPs that are different than other pitchers. There is almost zero correlation year to year for BABIP on line drives or pop-ups for any pitchers.

Correlation (300 Balls in Play or more)	Net Ground-ball Rate	Net Line-drive BABIP (same year)	Net Line-drive BABIP (next year)
Net Ground-ball Rate (Observations)	1.00 (1174)
Net Line-Drive BABIP: Same Year (Observations)	.033 (1174)	1.00 (1174)
Net Line-Drive BABIP (Next Year) (Observations)	.022 (549)	.025 (549)	1.00 (1174)

Correlation (300 Balls in Play or more)	Net Ground-ball Rate	Net Pop-up BABIP- Same Year	Net Pop-up BABIP- Next Year
Net Ground-ball Rate (Observations)	1.00 (1174)
Net Pop-up BABIP: Same Year (Observations)	.079 (1174)	1.00 (1174)
Net Pop-up BABIP (Next Year) (Observations)	.054 (549)	-.013 (549)	1.00 (1174)

The .025 correlation year to year on line-drive BABIP is particularly surprising because it is at odds with previous research. Six years ago, Mitchel Lichtman found that line-drive BABIP was persistent for pitchers, but look at the line-drive BABIP net of team line-drive BABIP and this unravels. This is a mixture of team defense adjustment and official scorer adjustment, but it un-teaches something important about pitcher BABIP that many of us thought we knew.
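For readers who want to see how a year-to-year correlation of this kind can be set up, here is a minimal sketch. It pairs each pitcher-season with the same pitcher's following season and correlates the two net-of-team values. The `(pitcher, year)` keying and the sample values are assumptions made for illustration, not the study's data or code.

```python
from math import sqrt


def next_year_pairs(seasons):
    """Pair each pitcher-season value with the same pitcher's next season.

    seasons maps (pitcher, year) -> net-of-team rate (hypothetical layout).
    Seasons with no following season in the data are dropped.
    """
    pairs = []
    for (pitcher, year), value in seasons.items():
        following = seasons.get((pitcher, year + 1))
        if following is not None:
            pairs.append((value, following))
    return pairs


def year_to_year_correlation(seasons):
    """Pearson correlation between this-year and next-year values."""
    pairs = next_year_pairs(seasons)
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    n = len(pairs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up net line-drive BABIP values, for illustration only.
seasons = {
    ("a", 2008): 0.01, ("a", 2009): 0.02,
    ("b", 2008): -0.01, ("b", 2009): -0.03,
    ("c", 2009): 0.00, ("c", 2010): 0.01,
    ("d", 2005): 0.05,  # no 2006 season, so this one is dropped
}
print(len(next_year_pairs(seasons)), year_to_year_correlation(seasons))
```

Because only pitchers with qualifying seasons in consecutive years form a pair, the number of observations shrinks, which is consistent with the smaller next-year counts in the tables.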

Contrast the randomness of line-drive BABIP and pop-up BABIP with ground-ball and fly-ball BABIPs, which have year-to-year correlations of .188 and .152, respectively, net of team. Overall, ground-ball pitchers allow higher BABIPs, but not higher slugging averages on balls in play (SLGBIP). This is primarily because ground balls become hits more often than fly balls do, while the slugging average on the two types of batted balls is similar. The correlations between ground-ball rate and overall BABIP and SLGBIP are shown below, alongside the correlations with the following year’s BABIP and SLGBIP:

Correlation with Ground-ball Percentage (Observations)	BIP minimum >100	BIP minimum >300
Net BABIP- Same Year (W/ROE)	.193 (3297)	.170 (1174)
Net BABIP- Next Year (W/ROE)	.169 (3297)	.141 (1174)

Correlation with Ground-ball Percentage (Observations)	BIP minimum >100	BIP minimum >300
Net SLGBIP- Same Year (W/ROE)	.033 (3297)	-.024 (1174)
Net SLGBIP- Next Year (W/ROE)	.044 (3297)	-.002 (1174)

However, correlation is a rough statistic that does not reveal subtleties or curvatures, so it misses the truth of what is going on.

Non-extreme ground-ball pitchers (those with 45-60 percent of balls in play as ground balls) allow the highest BABIPs, but pitchers with ground-ball rates over 60 percent actually allow average BABIPs.

To adjust for the fact that some pitchers allowed more ground balls than others, I weighted the BABIP of each pitcher by the actual number of ground balls allowed.

Ground-ball Rate	Observations (Total Weight)	Average Net Group BABIP (W/ROE)	Standard Deviation of Net Group BABIP (W/ROE)
> 60%	294 (28,167)	.0013	.0390
55-60%	378 (41,526)	.0049	.0356
50-55%	648 (80,636)	.0052	.0305
45-50%	1,058 (130,793)	.0040	.0298
40-45%	1,231 (122,382)	-.0006	.0337
35-40%	863 (63,500)	-.0067	.0366
< 35%	877 (27,836)	-.0132	.0501

The following graph of the average net BABIP in each group by ground-ball rate is even clearer:

While the lowest BABIPs belong to pitchers who allow fewer than 40 percent ground balls (these pitchers often have high infield pop-up rates), the pitchers with the highest ground-ball rates had lower BABIPs than pitchers with just slightly above-average ground-ball rates.
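The grouped averages in the table above can be approximated with a weighted mean per ground-ball bucket. The sketch below is one way to do that, assuming each pitcher-season comes as a `(gb_rate, net_babip, weight)` tuple; the tuple layout, the handling of values that fall exactly on a bucket boundary, and the sample numbers are all assumptions for illustration.

```python
from collections import defaultdict

# Lower bounds of each bucket, checked from the top down.
# Treating a rate exactly on a boundary as belonging to the higher
# bucket is an assumption; the article does not say how ties were handled.
BUCKETS = [
    (0.60, "> 60%"),
    (0.55, "55-60%"),
    (0.50, "50-55%"),
    (0.45, "45-50%"),
    (0.40, "40-45%"),
    (0.35, "35-40%"),
]


def bucket_label(gb_rate):
    """Return the ground-ball bucket label for a rate given as a fraction."""
    for floor, label in BUCKETS:
        if gb_rate >= floor:
            return label
    return "< 35%"


def weighted_group_means(pitchers):
    """Weighted mean of net BABIP within each ground-ball bucket.

    pitchers is an iterable of (gb_rate, net_babip, weight) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # label -> [sum(w * x), sum(w)]
    for gb_rate, net_babip, weight in pitchers:
        entry = totals[bucket_label(gb_rate)]
        entry[0] += weight * net_babip
        entry[1] += weight
    return {label: s / w for label, (s, w) in totals.items() if w > 0}


# Made-up pitcher-seasons, for illustration only.
sample = [(0.62, 0.010, 100), (0.65, -0.020, 300), (0.30, -0.010, 50)]
print(weighted_group_means(sample))
```

Weighting keeps a pitcher with few balls in play from counting as much as a full-time starter when the group average is taken.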

Moving on to look at slugging average on balls in play, we see that the non-correlation of ground-ball rate and slugging average on balls in play does not mean no relationship exists. In fact, pitchers with very low and very high ground-ball rates had the lowest slugging average on balls in play, while the highest slugging average on balls in play belongs to pitchers with average ground-ball rates.

Ground-ball Rate	Observations (Total Weight)	Average Net Group SLGBIP (W/ROE)	Standard Deviation of Net Group SLGBIP (W/ROE)
> 60%	294 (28,167)	-.0154	.0507
55-60%	378 (41,526)	-.0035	.0485
50-55%	648 (80,636)	.0012	.0432
45-50%	1,058 (130,793)	.0032	.0410
40-45%	1,231 (122,382)	.0022	.0467
35-40%	863 (63,500)	-.0023	.0529
< 35%	877 (27,836)	-.0066	.0729

Both of these statistics work well with a quadratic fit with respect to ground-ball rate in a regression analysis.

The BABIP (with errors included) of the 3,297 pitchers with at least 100 balls in play is best predicted by the following formula:

Net BABIP (with ROE) = -.002 + .136*(net GB%) – .401*(net GB%)^2

The p-statistic on both net ground-ball rate and its square was less than .01.

The SLGBIP (with errors included as singles) of the same 3,297 pitchers with at least 100 balls in play is best predicted by the following formula:

Net SLGBIP (with ROE) = -.003 + .097*(net GB%) – .611*(net GB%)^2

Again, the p-statistic is less than .01 for both net ground-ball rate and its square.
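To make the curvature concrete, the two fitted formulas can be evaluated directly. The sketch below assumes net GB% is expressed as a fraction (so 0.10 means ten percentage points above the team rate); the article does not state the units, so treat that as an assumption. The `peak` helper is simply the vertex of a downward-opening parabola, -b/(2c) for a + b*x + c*x^2.

```python
def net_babip(net_gb):
    """Net BABIP (with ROE) from the fitted quadratic in net GB%."""
    return -0.002 + 0.136 * net_gb - 0.401 * net_gb ** 2


def net_slgbip(net_gb):
    """Net SLGBIP (with ROE) from the fitted quadratic in net GB%."""
    return -0.003 + 0.097 * net_gb - 0.611 * net_gb ** 2


def peak(linear, quadratic):
    """Net GB% at which linear*x + quadratic*x**2 reaches its extremum."""
    return -linear / (2 * quadratic)


for g in (-0.10, 0.0, 0.10, 0.20):
    print(f"net GB% {g:+.2f}: BABIP {net_babip(g):+.4f}, SLGBIP {net_slgbip(g):+.4f}")

print("BABIP peaks near net GB% =", round(peak(0.136, -0.401), 3))
print("SLGBIP peaks near net GB% =", round(peak(0.097, -0.611), 3))
```

Under the fraction assumption, predicted net BABIP tops out roughly 17 points of ground-ball rate above the team rate and predicted net SLGBIP roughly 8 points above it, after which both decline. That is the same shape the grouped tables show.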

The best way to adjust for this would be to also adjust for the strikeout and walk rates, in which case we would get the following equations for BABIP and SLGBIP:

Net BABIP (with ROE) = .010 + .121*(net GB%) – .383*(net GB%)^2 – .082*(K%) + .033*(BB%)

Net SLGBIP (with ROE) = .017 + .072*(net GB%) – .580*(net GB%)^2 – .150*(K%) + .065*(BB%)

In both equations, the net ground-ball rate and its square had p-statistics that were less than .01, as did the strikeout rate coefficient. The walk rate coefficient had p=.032 for SLGBIP and p=.123 for BABIP; the latter is not statistically significant, but it still suggests an effect.
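A short sketch of the adjusted equations shows how much the strikeout and walk terms move the prediction. It assumes that K% and BB% are fractions (0.20 for 20 percent), which the article does not specify, and that the second ground-ball term in the SLGBIP equation is the squared net rate, as the discussion of "its square" indicates.

```python
def net_babip_adj(net_gb, k_rate, bb_rate):
    """Net BABIP (with ROE), adjusted for strikeout and walk rates."""
    return (0.010 + 0.121 * net_gb - 0.383 * net_gb ** 2
            - 0.082 * k_rate + 0.033 * bb_rate)


def net_slgbip_adj(net_gb, k_rate, bb_rate):
    """Net SLGBIP (with ROE), adjusted for strikeout and walk rates.

    The squared term is assumed to be (net GB%)^2.
    """
    return (0.017 + 0.072 * net_gb - 0.580 * net_gb ** 2
            - 0.150 * k_rate + 0.065 * bb_rate)


# Same ground-ball and walk profile, different strikeout rates
# (hypothetical inputs chosen only to isolate the K% term).
low_k = net_babip_adj(0.05, 0.15, 0.08)
high_k = net_babip_adj(0.05, 0.25, 0.08)
print(f"BABIP change from +10 points of K%: {high_k - low_k:+.4f}")
```

Holding everything else fixed, ten additional points of strikeout rate lower the predicted net BABIP by about eight points under these assumptions.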

Given this curvature, it only makes sense that the inclusion of the ground-ball squared term did so much to help SIERA to fit the data.

The shape is less obvious when looking at the following year’s BABIP and SLGBIP because ground-ball rates jump around, but we still see a generally similar effect in the two tables and graphs of these statistics below:

Ground-ball Rate	Observations (Total Weight)	Average Net Group BABIP (W/ROE)	Standard Deviation of Net Group BABIP (W/ROE)
> 60%	126 (31,168)	.0040	.0324
55-60%	177 (46,584)	.0015	.0284
50-55%	310 (86,437)	.0049	.0301
45-50%	521 (156,376)	.0050	.0286
40-45%	536 (154,716)	-.0029	.0299
35-40%	364 (87,212)	-.0038	.0311
< 35%	310 (53,627)	-.0112	.0397

Ground-ball Rate	Observations (Total Weight)	Average Net Group SLGBIP (W/ROE)	Standard Deviation of Net Group SLGBIP (W/ROE)
> 60%	126 (31,168)	-.0085	.0431
55-60%	177 (46,584)	.0058	.0375
50-55%	310 (86,437)	.0019	.0422
45-50%	521 (156,376)	.0054	.0398
40-45%	536 (154,716)	-.0016	.0421
35-40%	364 (87,212)	-.0009	.0433
< 35%	310 (53,627)	-.0085	.0572

This data shows that ground-ball pitchers have a hidden value and that modeling BABIP indirectly, as we did with SIERA, rather than assuming pitchers do not control it, helps improve the prediction of ERA. Statistics like xFIP have the benefit that the run values of the defense-independent statistics (strikeouts, walks, and home runs) are set precisely with linear weights, but xFIP does not take into account that BABIP is lower for fly-ball pitchers. SIERA, however, also shows that extreme ground-ball pitchers have a skill of their own at preventing hits on balls in play.

Next time, I will look in more detail at another statistic that pitchers have little control over – the rate of home runs per fly ball. This statistic has a low year-to-year correlation, lower than BABIP does (.07 for HR/FB versus .13 for BABIP, both net of team rates) – but SIERA still gives us some help at picking up some of this effect.

Matt Swartz

moscow25

12/15

Lost you in the middle there. If there is a considerable benefit to being a high GB% pitcher that xFIP doesn't capture, wouldn't it take fewer tables to show it?

I don't doubt that your research is correct and there is an effect, but I'm having a hard time grasping the magnitude of this effect. Can you please provide some examples, on a scale like runs per game?

swartzm

12/15

Yes, there is a benefit to being a high GB% pitcher that xFIP doesn't calculate. xFIP assumes BABIP skill is equal for all pitchers so it misses pitcher differences in that skill. xFIP's strength is that it more precisely knows the direct effect on runs of HR, BB, and K than SIERA can tell its own. The tables aren't a takedown of xFIP. They're just a way of highlighting the effect of ground balls that I have found.

The standard deviation of pitcher BABIP skill is about .007, meaning that it's probably about 0.15-0.20 runs per nine innings for the average pitcher. SIERA picks up on this skill pretty well-- and specifically does so because it has a GB^2 term in the equation. The point of the article is to explain why that term came up as it did, and how helpful it is to understanding pitching.

surfdent48

12/15

It seems ground balls result in about a 23% chance of being a hit and fly balls about a 14% chance of being a hit. Whichever starter for the Marlins has the best ground ball % must really be at a strong disadvantage with their very poor infield defense. Not just errors made, but especially easy double plays not turned and grounders not fielded that could have been. If this pitcher switched teams his effectiveness should really go up. Also, what is the possessed skill a pitcher has to induce a lot of infield pop-ups? It seems to me to be just randomness and maybe a "just missed a home run" swing? And if an infielder catches a popup a couple of feet on the outfield grass this would technically not be an infield popup- but just as easy an out. How could this be measured or evaluated?

swartzm

12/15

I'm actually discussing this in the next article which should be out tomorrow. There is actually somewhat of a skill with inducing pop-ups, probably related to movement.

laynef

12/15

The graphs aren't showing up in my browser? Are they working for everyone else?

rawagman

12/15

me neither - I use Chrome

DrDave

12/15

"The following graph of the average net BABIP in each group by ground-ball rate is even clearer:"

I'm sure they would if there weren't a typo in the HTML for most of them. Looks like the "" tag is getting closed prematurely, right in the middle of the "...GB(1).jpg" part of the file name.

DrDave

12/15

Grr. Trying to say "the / tag" (let's see if I escaped that correctly).

DrDave

12/15

Nope. Sigh.

dstamand

12/15

Interesting analysis. What do the low p-statistics tell us about the validity of the independent variables used in the analysis and, therefore, the conclusions?

Also can't see the graphs in my browser (Safari)

swartzm

12/15

The p-statistics being low is good. It means that there is virtually no chance that these variables would be so far from zero if those variables had nothing to do with ERA. Specifically, it's very unlikely that the effect of GB% on BABIP is just linear. The curve that you see in the tables (and hopefully the graphs...they're working on it) is significant enough.

dcbove

12/15

I've always been under the impression that "extreme flyball" pitchers - outside the confines of huge pitcher's parks - should be treated as if their spontaneous combustion was imminent. However, the "NETSLGBIP" and "NETBABIP" graphs seem to denote otherwise. Do they?

So, is the "extreme flyball" flavor of pitcher actually more successful than one would think? Or is it that only truly superlative pitchers (those possessing wicked high heat?) can get away with being "extreme flyball" pitchers and there is some sort of selection bias going on?

swartzm

12/15

Extreme fly-ball pitchers are going to struggle, because they give up a lot of home runs. They may be solo home runs more often due to the infrequent hits on balls in play, but they will still be home runs. Pitchers who give up a lot of pop-ups are bound to do a little better than pitchers who give up only long flies. Regardless, the pitcher would need to have great K/BB numbers to offset the home run issue. There definitely could be a few. I think guys like Ted Lilly, Scott Baker, Jered Weaver all get away with high fly ball rates.

surfdent48

12/15

What is it about Lilly, Weaver and Baker that lets them get away with this? More than just luck or randomness?

swartzm

12/15

They each command the strike zone. Very few walks and plenty of strikeouts. You can be a slow baserunner as a hitter if you have power. You make up for the skill elsewhere. This is a similar thing.

dcbove

12/15

Hmm... I would think that extreme fly ball pitcher = more home runs seems pretty intuitive.

I think I've just been led to believe that extreme fly ball pitchers tend to get lit up like fireworks. I just thought it was an interesting data point that both BABIP and SLGBIP seemed lower for extreme fly ball pitchers than average GB/FB pitchers. And I wondered if there was an explanation?

swartzm

12/15

Oh, that's specifically because fly balls and pop-ups are outs more often than ground balls. About 23-24% of ground balls go for hits, while about 17-18% of fly balls and about 2% of pop-ups do. FB pitchers have fewer hits on balls in play, but those hits do tend to go for extra bases, and they also tend to allow HRs too.

PhilliesRed

12/15

I have to say, I think this is great: very informative and very interesting. Matt, did you run this analysis through any peer review? I'd be very curious to hear what other statisticians had to say about this.

swartzm

12/15

I didn't actually run this through peer review. A lot of the time our writers email each other about articles and talk them out, but short of a little discussion with Eric, I felt comfortable presenting these as facts. The conclusions are pretty evident, so I wasn't too worried about interpreting them incorrectly here. There's been some discussion of some of the peripheral issues on Tom Tango's blog but nothing really about the content at this point.

lucasjthompson

12/15

Man, SIERA's great! Now, if I could only find it on the website...

swartzm

12/15

Statistics tab-- Pitcher-Team-Rates or Pitcher-Season-Rates:
Here's the link to Pitcher-Team-Rates: http://www.baseballprospectus.com/statistics/sortable/index.php?cid=224511

lucasjthompson

12/15

Thanks! Any plans to put it on the player pages? (you should)

swartzm

12/15

I think so. They should be getting overhauled soon, and I've asked to make sure SIERA is on them. In my spreadsheets, I find it helpful to have SIERA year by year for pitchers so I'm sure readers would too!

danmckay

12/16

Yes, please. I'd love to see SIERA on the player pages.

joelefkowitz

12/15

just making sure i'm parsing this right. i'd appreciate if you could tell me if i have any of the following wrong.

if babip on line drives is not persistent, thats saying it's not a skill pitcher's have control over. What about line drive rates in general, are those persistent? If we see two pitchers, pitcher A with an above average line drive rate, but normal babip on LD, and pitcher B with average line drive rates but above average babip on LD, who would we expect more regression from, pitcher B?

swartzm

12/15

Line drive rate for major league pitchers has almost no persistence at all. Something along the lines of <.01 year-to-year correlation. I'm sure that if you included a bunch of A-level pitchers in the major leagues, they'd allow a lot of line drives, but among MLB pitchers who can keep their jobs (i.e. maintain at least a K% of 10% and do at least something else well), their line drive rate is not persistent at all.

hmamis

12/15

Could you please run an article "statistics for dummies" I am a retired physician, not a retired math professor/engineer.
