September 13, 2012
In A Pickle
Defense in the 2012 Pennant Races
I'd like to tell you an introductory anecdote or post a GIF (soft-g) or something to ease you into what we're about to engage in, but I don't think my chops are up to snuff, so why don't we just dive in?
There's a neat thing we provide in our stat reports on this website. It's called Team Defensive Efficiency and it has two key stats: defensive efficiency (DE), a very basic measure attributable to Bill James that simply says what percentage of balls in play a defense converts into outs, and Park Adjusted Defensive Efficiency (PADE), a metric designed almost a decade ago (gosh I'm old) by James Click, who these days graces the Rays' front office roster, serving as their Director of Baseball Research and Development. PADE, as the name implies, adjusts for each team's park's effect on their defensive performance and spits out a number that represents a percentage above or below average at turning balls in play into outs.[1]
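For the code-inclined, raw DE as described above is just a ratio. Here's a minimal Python sketch of it. The function name and the sample counts are made up for illustration (they are not from the stat report), and it uses the simplified outs-per-ball-in-play definition given here rather than any fuller formula:

```python
def defensive_efficiency(outs_on_balls_in_play: int, balls_in_play: int) -> float:
    """Share of balls in play that the defense converted into outs."""
    if balls_in_play <= 0:
        raise ValueError("balls_in_play must be positive")
    return outs_on_balls_in_play / balls_in_play


# Hypothetical team: 4,000 balls in play, 2,850 of them turned into outs.
de = defensive_efficiency(2850, 4000)
print(f"{de:.4f}")  # 0.7125
```

PADE is not sketched here because its park adjustment is not spelled out in this piece.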
You could click over to the stat report, but I'm afraid if you do that, you'll never come back here, so I'll reproduce an excerpt. Here's a table of baseball teams showing their defensive efficiencies and their PADEs, as grabbed on Wednesday night (i.e. the stats are updated through Tuesday's games), sorted by DE:
If you haven't looked at these stats before, I think the most interesting and important part is to get a sense of scale, an idea of the spread between good and bad. Let's look at a subset (for no real reason): The worst American League defense by DE turns about 69 percent of balls into outs; the best is a touch above 72 percent. Similarly, the PADEs range from two and a half percent below to two and a half percent above average. Those don't seem like huge differences! But that's why I wanted to write this, because those spaces between top and bottom in overall defense[2] don't look that large but will (don't skip to the end!) turn out to be larger than you might realize.
Let's start by adding the balls in play and outs each team has recorded to the above table:[3]
I don't know if there's an easy way to visualize how many balls that is, to make it somehow concrete or to create a useful metaphor. If you can, then I'm happy for you. If you can't, that's OK—we'll get to more concrete ways to understand these numbers below.
There are some interesting pieces to look at, though, like picking out some pairs of teams for comparison. The A's and Royals are separated by 65 balls in play but 161 outs, for instance. Or: Look how few balls in play the horrendous Detroit defense has had to deal with. That's called matching a strength to a weakness. Maybe this Dave Dombrowski guy knows a little somethin' about somethin'.
Back in the main thread of things, when you add up all those balls in play and all those outs, you find that the league defensive efficiency is about .707. We can use this to figure out how many outs each team would have recorded on its balls in play if it had an average defense, and then see how many outs above or below average it actually did record.[4] (I've rounded everything to whole numbers here because, eh, this ain't science. Any precision implied by decimal points would be false.)
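If it helps to see that arithmetic written down, here's a rough Python sketch. The .707 league figure is the one quoted above; the function name and the team's balls-in-play and outs counts are hypothetical, chosen only to show the steps:

```python
LEAGUE_DE = 0.707  # league defensive efficiency quoted in the text


def outs_above_average(balls_in_play: int, outs: int, league_de: float = LEAGUE_DE):
    """Return (average-defense outs, outs above average), rounded to whole outs."""
    avg_outs = round(balls_in_play * league_de)  # what an average defense would record
    return avg_outs, outs - avg_outs


# Hypothetical team: 4,000 balls in play, 2,850 actual outs.
avg_outs, oaa = outs_above_average(4000, 2850)
print(avg_outs, oaa)  # 2828 22
```

A negative second value means the defense recorded fewer outs than an average defense would have on the same number of balls in play.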
This is starting to get a little more concrete, right? We know what outs are. There are 27 of them in a game, and these numbers are small enough that we can compare them to the number of games the teams have played. The Nationals, for instance, have recorded an extra out about once every three games.
Colorado, which as you will note from PADE cannot blame Coors for this, has gifted the bad guys more outs than any other team by a wide margin. One hundred seven outs! That's a lot of outs. Honestly, is it any wonder that Josh Outman (8.72 ERA) isn't living up to his name? It's his defense's fault!
But those are the unadjusted numbers. How can we park-adjust them? Here's what we do. Take PADE and divide by 100 to get it from percentage format into a decimal that we can actually use in maths. Then multiply that by the AvgOuts we figured in Table 3 because what PADE tells us is how many more (or fewer) outs a team gets than the league average. Fairly straightforward, right? Here's the table:
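As a sketch of that two-step recipe in Python (divide PADE by 100, multiply by the average-defense outs), with a hypothetical function name and a made-up team whose AvgOuts is 2,828 and whose PADE sits at the plus or minus 2.5 edge of the range mentioned earlier:

```python
def pade_outs_above_average(pade_pct: float, avg_outs: int) -> int:
    """Outs above average implied by PADE, rounded to whole outs."""
    pade_decimal = pade_pct / 100  # percentage format -> decimal
    return round(pade_decimal * avg_outs)


# Hypothetical inputs, not taken from the article's tables.
print(pade_outs_above_average(2.5, 2828))   # 71
print(pade_outs_above_average(-2.5, 2828))  # -71
```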
Colorado got worse!
Hey, you want to really pile on a team that doesn't need you in its ear? Check out Boston. They've posted a respectable DE this year, just a tad below league average, but that park, per the PADE methodology, means that the Red Sox should have been recording a whole lot more outs, so their PADE-based outs "above" average winds up worse than all but four teams in baseball.
But even if we use the device of "one out every N games" as we did above with Washington, we're still operating in the realm of outs and balls in play, which aren't units that we're naturally used to dealing with. They're a little foreign. Do you, off the top of your head, know what the run value of an out is? I'm sure a couple of you will raise your hands (stop showing off), but I'm equally sure that most of you will need to go look it up, just like I do. Maybe I'm projecting. Either way, though, the question is how to translate these "extra outs" numbers into runs. And here's where it gets complicated and where we have to start getting into estimation, where we leave the realm of actual facts and enter fantasy lands that we hope approximate reality.
The problem is that we don't know which balls were caught by Matt Joyce from Tampa but were not caught by Carlos Gonzalez in Colorado. Which teams are turning gappers (likely doubles and occasional triples) into outs? Which teams are letting infield bleeders (likely singles) through? Which team is really bad at the Bermuda Triangle play where two outfielders and one middle infielder all converge and nobody catches the ball while the runner hustles his way into second for the cheapest "double" you'll ever see?
We don't know. There's data out there that claims to know things of this sort, and if there's an official Baseball Prospectus position on that data, I haven't heard it, but you can count me as one of those convinced by Colin Wyers's work in the area showing that the biases in at least some of that data are too significant to ignore. So I don't want to use that data. I want to figure out what we can know from the objective numbers we have on the defensive efficiency stat report and that we can figure for decades and decades into the past if we want.
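So what we have to do is make some estimates. The two ways to convert Outs Above Average into runs that immediately come to (my) mind are like so:
1. Assume every ball that one defense catches and another misses would have been a single.
2. Assume those balls would have fallen in as the league-wide mix of singles, doubles, and triples on hits in play.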
There are surely other ideas for how to convert outs into runs, but this article is running long enough as it is.
Figuring a value for the Outs Above Average the first way is easy. In 2010,[5] the linear weights value of a single (for the offense) was 0.4595 and the value of an out was -0.1645. Thus the value of turning what would be a single into an out is 0.624 runs saved for the defense. (And the value of turning an out into a single is obviously -0.624.) Let's add those values to our table:
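Here's a small Python sketch of that single-for-out conversion. The two linear weights are the 2010 values quoted above; the function and constant names are mine, and the example simply feeds in the 107-out unadjusted deficit mentioned earlier for Colorado:

```python
SINGLE_LW = 0.4595  # 2010 linear weights value of a single (from the text)
OUT_LW = -0.1645    # 2010 linear weights value of an out (from the text)

# Runs saved each time a would-be single becomes an out.
RUNS_PER_OUT_SINGLES_ONLY = SINGLE_LW - OUT_LW  # about 0.624


def runs_above_average(outs_above_average: int) -> float:
    """Runs above average, assuming every extra (or missing) out was a single."""
    return outs_above_average * RUNS_PER_OUT_SINGLES_ONLY


print(round(RUNS_PER_OUT_SINGLES_ONLY, 4))  # 0.624
print(round(runs_above_average(-107)))      # -67
```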
These acronyms are getting absurd, I realize, but hopefully it's pretty clear what everything means. Is it? Maybe it's not. "PADEOAA-RAA" is "Park Adjusted Defensive Efficiency–based Outs Above Average hyphen Runs Above Average." Got it? That's the runs total for the outs figure that's based on PADE. The column to the left of that is the runs total for the outs figure that's based on raw defensive efficiency. So we've got runs! Finally! We know what runs are.
The Padres, then, if you like PADE and you assume that every ball they caught that other teams missed would have only been a single, have been 49 runs above average on defense this year. In their run environment, that's over five wins easily. Sadly, this is the Padres we're talking about, so we're looking at the difference between 64 wins (17 1/2 games out of first) and their current 69 wins (12 1/2 games out). Defense turned them from pitiful into an also-ran!
On the other end of the spectrum are the Rockies, who currently have a run differential of -101. All else being equal, were their defense average instead of pitiful, they could be ... well, they could be the Padres, who have a -37 differential.
Enough N.L. West talk. Ready for the final addition? Here we go.
In 2010, the breakdown of singles, doubles, and triples[6] on hits in play went like this: about 75.4 percent were singles, 22.4 percent were doubles, and 2.3 percent were triples. (Rounding is why there's an extra 0.1 percent.) Multiplying each of these percentages by the linear weights value of the corresponding event (0.4595 again for singles, 0.7595 for doubles, 1.0295 for triples) and adding up the products results in a "hits in play" linear weights value of 0.5396. Going back to the same outs value of -0.1645 means that a potential hit turned into an out by this accounting saves 0.7041 runs for the defense. Does that make sense?
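For anyone who wants to check that weighted average, here's a hedged Python sketch. The shares and linear weights are the rounded figures quoted above; the variable names are mine, and dividing by the sum of the shares is my own way of handling the extra 0.1 percent. Because it starts from rounded shares, it lands within about a ten-thousandth of the 0.5396 and 0.7041 figures rather than exactly on them:

```python
# Rounded 2010 shares of hits in play, paired with 2010 linear weights (from the text).
HIT_MIX = {
    "single": (0.754, 0.4595),
    "double": (0.224, 0.7595),
    "triple": (0.023, 1.0295),
}
OUT_LW = -0.1645

# Shares sum to 1.001 because of rounding, so normalise by their total (my assumption).
total_share = sum(share for share, _ in HIT_MIX.values())
hit_in_play_lw = sum(share * lw for share, lw in HIT_MIX.values()) / total_share

# Runs saved when a would-be hit in play becomes an out.
runs_saved_per_out = hit_in_play_lw - OUT_LW

print(f"{hit_in_play_lw:.3f} {runs_saved_per_out:.3f}")  # 0.540 0.704
```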
This method, of course, pushes the teams at either end even farther out to the extremes. The A's, for instance, add five runs by this method and clock in at 40 runs above average. The team's pitching staff has gotten a lot of love (not undeservedly! They're fifth in baseball in Fair Run Average) and the offense has received some notice in the second half (fifth in baseball in runs since the All-Star break—no, seriously, go look), but boy howdy, a four- to five-win defense is an awfully nice thing to have, isn't it?
And how about the American League Central? Here are the Tigers: 75-67. Here are the White Sox: 76-66. And here's the gap between the two teams on defense, rounded to a nice round number: 50 runs. I don't know what kind of runs-to-wins conversion you prefer in your baseball analysis, but the ones I favor have 50 runs being worth way more than one win.
Noting, by the way, that the Tigers have received just a .261/.292/.408 line from their designated hitters this season, it's certainly fair to ask whether Brandon Inge's plus defense at third and .218/.275/.383 batting line might have served Motor City better than its collection of DHs and Miguel Cabrera's dastardly defense at the hot corner. (To be fair, Cabrera's FRAA stands at just -2 for the season. On the other hand, Inge's is +3 in less than 3/5 of the playing time, and, again, the gap between Detroit and Chicago is one game.)
So! We've come to the conclusion and my utter lack of narrative structure in this piece is about to be exposed. I don't have a conclusion. I'll have to steal a trick from Tommy Bennett and ask you all a question instead:
Do you like defense? Is defense fun?