One of my favorite things to do with baseball statistics is to pick two of them and see what kind of relationship they have.  Many pitchers have changed locations this off-season and will have to get accustomed to the new team defense behind them. Some have made the move to strong defensive teams while others have moved into situations that are a step down from what they have been accustomed to having around them.

Not only are pitchers changing places but so are good and bad defenders, which makes it tough to predict how a team defense could affect a pitcher in 2013, but that does not prohibit us from looking back at how the relationship works. Specifically, we can look at the relationship between a pitcher’s batting average on balls in play and team Defensive Efficiency ratings to see how one impacts the other.

Our crack stats team pulled together a report for me showing all 3,105 pitchers that threw at least 160 innings for a single team in a season from 1974 to 2012. In comparing the two metrics, we see that Defensive Efficiency (DEF_EFF) and batting average on balls in play (BABIP) have a negative correlation of r=-0.57.

Generally, correlations above 0.7 or below -0.7 are considered strong, while anything between -0.3 and 0.3 is considered weak. The correlation between BABIP and DEF_EFF is not strong, but it’s not one that can be completely dismissed either.
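For readers who want to run the same check on their own data, here is a minimal sketch of how a Pearson correlation is computed and then labeled with the rule of thumb above. The function names and the "moderate" label for the middle band are assumptions made for illustration; this is not the report our stats team ran.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def strength(r):
    """Label a correlation with the 0.7 / 0.3 rule of thumb.

    The "moderate" label for the band in between is an assumption;
    the article only names the strong and weak ends.
    """
    size = abs(r)
    if size > 0.7:
        return "strong"
    if size < 0.3:
        return "weak"
    return "moderate"

print(strength(-0.57))
```

Passing each pitcher's team DEF_EFF as xs and his BABIP as ys would give the r value discussed here, assuming the same sample.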

A prime example of why the correlation is not stronger is the recently traded James Shields. If you look to the far left of the image above, you’ll see a small dot near the .28 mark on the y axis to the extreme left of the x axis. That represents the .282 BABIP Shields posted during the 2007 season when the then-Devil Rays defense turned in the worst team defensive efficiency in the entire sample: .670. It was a historical low for a team score, and the improvement from 2007 to 2008 was one of the many reasons the Rays won the American League that season. In 2007, Shields had a .282 BABIP; Scott Kazmir was at .332 and Edwin Jackson was at .341. Jump ahead to the 2010 season when the Rays had a .722 team defensive efficiency, and Shields had a .341 BABIP. Shields owns both the best BABIP against the worst defensive efficiency and the third-highest BABIP for any pitcher on a team with a team defensive efficiency score of at least .720.

The table below shows how the two metrics break down at .025 intervals:

If you are one to avoid risk, the obvious plan of attack would be to target pitchers on teams with the highest defensive efficiency scores.  Unfortunately, only two teams have eclipsed the .725 threshold in the past three seasons: the 2011 Rays (.735) and the 2010 Athletics (.726). The American League does, however, dominate the leaderboard over the last three seasons; it owns each of the top 11 spots and 12 of the top 20. The Rays have three of the top 10 scores pitching in a pitchers’ park, Oakland has two of the top ten in their pitchers’ park, while Texas has two top-ten finishes in their hitters’ park.

Getting back to Shields, he moves from a team that has averaged a .723 defensive efficiency score over the past five seasons to one that has averaged a .696 score over that same time period.  Shields had that rough 2010 season despite the strong team defense mainly due to pitch sequencing issues and giving his cutter usage a baptism by fire. He made some slight mechanical tweaks that off-season and came into 2011 with a different sequencing strategy, and the results since then have been excellent.

Shields has converted himself into a groundball pitcher in recent years. He has both succeeded in spite of poor defensive play and squandered excellent defensive support in the past. While this may be attributable to simple luck, it’s also possible that those worrying about how he’ll fare away from the comforts of Tropicana Field and the Rays defense may be making much ado about nothing.

Just a note: In 2007 Shields' LD% was 16.3% and in 2010 it jumped to 20.3%. That is a pretty large swing, and thus while luck may play a part in this poor 2010 BABIP, it's tough for any good defensive team to catch line drives.
Also important to take into consideration, though, is that line drives are almost as variable as BABIP is.
Sequencing issues? So it is pointless overthinking Shields' value if you don't know how he's going to 'sequence' with a new pitching coach in a new home park? Do we know different pitching coaches'/catchers' sequencing proclivities? In short: is there a stat for that?
Here's some info into the sequencing issue

Shields has been rather consistent the past 2 seasons whereas 2010 was a mess due in part to predictability and not having complete command of his cutter.
Aside from the fact that Defensive Efficiency lumps errors and hits together into "outs not made," BABIP is just the inverse of Defensive Efficiency. Isn't it? As such, the correlation of the two measures within a sample is kind of meaningless, isn't it? A high DE and a low BABIP go together almost by definition, so that finding doesn't really say anything about how a defense impacts a pitcher's BABIP. It just says that if a pitcher has a high BABIP he probably also has a low DE. But who cares? What am I missing?
The actual inverse would be the quick way of 1 minus the BABIP. The way our engine computes it is 1-(H-HR)/(AB-SO-HR+SH+SF). According to our glossary, the ROE component, that was once part of the formula, is no longer.

A high DE & low BABIP have a good relationship, but it's also not impossible for a pitcher to put up a strong BABIP with a very weak DE behind him. In the sample sizes, there is a trend for the highest BABIP to be lower as the DE goes higher, but the low end remained rather static.
Thanks for pointing me to your glossary. Here are the definitions I found:

BABIP = (H - HR) / (AB - SO - HR + SF)
Defensive Efficiency = 1- (H-HR)/(AB-SO-HR+SH+SF)

So, for all intents and purposes, these two numbers are mathematical inverses (the strange omission of SH from the BABIP formula notwithstanding). Within the same set of plate appearances, if you compute the two numbers, BABIP = 1 - DE.
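To make the point concrete, here is a small sketch of the two glossary formulas quoted above written as functions. The stat line is made up purely for illustration and is not any real pitcher's; the function names are likewise assumptions.

```python
def babip(h, hr, ab, so, sf):
    """BABIP = (H - HR) / (AB - SO - HR + SF), as quoted from the glossary."""
    return (h - hr) / (ab - so - hr + sf)

def def_eff(h, hr, ab, so, sh, sf):
    """Defensive Efficiency = 1 - (H - HR) / (AB - SO - HR + SH + SF)."""
    return 1 - (h - hr) / (ab - so - hr + sh + sf)

# Made-up stat line (illustration only, not real data)
line = dict(h=200, hr=20, ab=800, so=150, sf=5)

# With no sacrifice hits, the two numbers sum to 1 on the same plate appearances
no_sh = babip(**line) + def_eff(sh=0, **line)

# With sacrifice hits, the DE denominator grows, so the sum drifts slightly above 1
with_sh = babip(**line) + def_eff(sh=5, **line)

print(no_sh, with_sh)
```

The only gap between BABIP and 1 - DE on the same plate appearances comes from the SH term in the denominator, which is why the sum is so close to 1.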

Each data point of your analysis includes a number for BABIP and a number for DE. But DE does not appear to be 1-BABIP (that is, the points don't all lie along y=1-x). But I don't understand why. Are you computing DE over a larger sample of PA's than the sample used to compute BABIP? Maybe one is computed for just the PA's against the pitcher and the other is computed for all of the defense's PA's? That would make sense. You can ignore this if you think I'm getting too technical but it seems like an ambiguity that obscures the whole focus of your analysis--to me. So maybe I'm not the only one wondering this. No worries either way.
It looks like each observation for the BABIP axis is an individual pitcher and each observation on the DE axis is a team. If you look closely, you'll see that the slope of the regressed line is very close to -1, even though all of the points don't lie on or close to that line. Each of the fuzzy green vertical lines is the conditional distribution of BABIP for the pitchers on each team. The mean of each conditional distribution is 1-DE for each team (apart from the SH difference and apart from the fact that pitchers with less than 160 IP have been excluded).

So, your observation is correct: it looks like the correlation described in this article is tautological.
Yes, the BABIP's are from individual pitchers while the DE results are team based. The slope of the regressed line comes in at -0.57.

The conclusion of this piece is we really don't know what the change of location will do to him, or any pitcher that moves around. There is some correlation there, but with r=-0.57, it isn't terribly strong. The "duh" observation is that your risk of a high BABIP is reduced when pitching on a team with a strong DE.
This chart is the sort of analysis that gives us more information (or at least places to ask questions) about the outliers, than about the middle. Even better would be to plot a pitcher's BABIP against the DE of his team in all other PAs (that is, with the Pitcher Under Test removed from the DE). Then the outliers could be inspected to see if they are particularly well/badly suited to their teams (imagine Derek Lowe in front of Brooks Robinson, Ozzie Smith, Bill Mazeroski and Keith Hernandez {all in-their-prime versions}...) or if something else is going on.
+1 Larry, I couldn't agree more.

Hopefully we are getting beyond the "BABIP = luck" paradigm for pitchers (many thanks to the excellent work by Mike Fast), and there is much to be learned from outlier players.

I think that everyone can agree that Justin Verlander is an outlier, in every sense of the word, and his putting up a .273 BABIP last season in front of the 27th-ranked defense in baseball by Def-Eff (.693) tells us something about the nature of the stat. In 2011, he put up a ridiculous .236 BABIP in front of a .708 Def-Eff Tiger defense (18th in baseball). It is similar with Clayton Kershaw (though Kersh is not so extreme) - these guys live in the lower-left quadrant of the correlation chart.

The middle of the graph is still muddled, either because we lack the tools sensitive enough to measure them with precision (my contention), or because the actual impact is quite small. But it's not a coincidence that the pitchers who are generally regarded as the best in the game also happen to have very low hit rates.

Great work, as always, Jason.
+1 to the idea of removing the pitcher's PA's from his team's BABIP (I wouldn't be surprised if this is what JC did). Within the pitcher's PA's, the relationship between the two statistics is tautological, as we've established, so including them is going to bias the observed relationship toward that y=1-x line. Removing them gets us closer to the relationship we're really trying to study.

Anyway, thanks all. I understand the article's argument much better now as being somewhat about the lack of steepness in the line's slope and somewhat about the high variance of the team-conditional distribution.
That isn't what I did but I've put the request in with our stats group to see if it is possible to pull such a report.
The stats team pulled the data for me. When I run a line of best fit for BABIP against Team DE without that pitcher's PA's included, I get r=-0.36. The average difference between the overall Team DE and the adjusted Team DE was .004. The largest difference belonged to Catfish Hunter, whose overall Team DE was .729 but only .715 when he wasn't pitching. Conversely, the Rays were a .734 DE when Shields wasn't pitching but a .722 overall in 2010.
The complete data set can be found here if you would like to look at it.
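For anyone who wants to try the adjustment on their own numbers, here is a rough sketch of the idea: subtract the pitcher's counting stats from the team totals, then compute DE on what is left. The Line container, the function names, and the totals are all hypothetical and made up for illustration; this is not the stats team's actual query.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Line:
    """Counting stats needed for Defensive Efficiency (hypothetical container)."""
    h: int
    hr: int
    ab: int
    so: int
    sh: int
    sf: int

def def_eff(line):
    """DE = 1 - (H - HR) / (AB - SO - HR + SH + SF)."""
    return 1 - (line.h - line.hr) / (line.ab - line.so - line.hr + line.sh + line.sf)

def without_pitcher(team, pitcher):
    """Team totals with one pitcher's plate appearances removed."""
    return Line(
        h=team.h - pitcher.h,
        hr=team.hr - pitcher.hr,
        ab=team.ab - pitcher.ab,
        so=team.so - pitcher.so,
        sh=team.sh - pitcher.sh,
        sf=team.sf - pitcher.sf,
    )

# Made-up totals (illustration only, not real data)
team = Line(h=1400, hr=150, ab=5500, so=1100, sh=40, sf=40)
pitcher = Line(h=200, hr=20, ab=800, so=150, sh=5, sf=5)

overall = def_eff(team)
adjusted = def_eff(without_pitcher(team, pitcher))
print(round(overall, 3), round(adjusted, 3))
```

Correlating each pitcher's BABIP against the adjusted figure rather than the overall one takes the built-in overlap between the two numbers out of the comparison.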
If I play with the starting point of the X axis only, here are the r= figures

DE >=.720 - r=-0.34
DE >=.700 - r=-0.50
DE >=.680 - r=-0.56

If I tweak the table to ranges of .025 for DE, I get this mean BABIP for each range:

.675-.699 = .302 (14% of the overall sample)
.700-.724 = .284 (67% of the overall sample)
.725-.749 = .264 (19% of the overall sample)
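A breakdown like the one above can be sketched as follows, assuming the input is a list of (team DE, pitcher BABIP) pairs. The function name, its parameters, and the sample rows are assumptions for illustration only; the rows are made up and are not taken from the data set.

```python
from collections import defaultdict

def mean_babip_by_de_range(rows, start=675, width=25):
    """Group (team_de, pitcher_babip) pairs into DE ranges.

    start and width are in thousandths (675 means .675). Rows below the
    starting point are dropped, and each range's share is computed over
    the rows that were kept. Returns {label: (mean BABIP, share of sample)}.
    """
    buckets = defaultdict(list)
    for de, babip in rows:
        de_k = round(de * 1000)  # integer thousandths avoid float edge cases
        if de_k < start:
            continue
        low = start + ((de_k - start) // width) * width
        buckets[low].append(babip)
    total = sum(len(values) for values in buckets.values())
    return {
        f".{low}-.{low + width - 1}": (sum(values) / len(values), len(values) / total)
        for low, values in sorted(buckets.items())
    }

# Made-up rows (illustration only, not real data)
sample = [(0.680, 0.300), (0.699, 0.310), (0.700, 0.290), (0.735, 0.260)]
for label, (mean, share) in mean_babip_by_de_range(sample).items():
    print(label, round(mean, 3), f"{share:.0%}")
```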
How about the type of pitcher and type of fielding excellence? If a pitcher gives up a lot of ground balls how much does the infield help or hurt, and the same for fly ball pitchers and the outfield? My bias would suggest that infield would help ground ball pitchers more than outfielders would help fly ball pitchers, but I was amazed that the Tigers' ERA would go up a run with their infield last year, and I was surprised their pitching looked as good as it did. Has this been well-studied?