October 9, 2003
Improving on Defensive Efficiency
Evaluating defense has always been one of the more difficult tasks for performance analysts. The first reason for this is that looks can be deceiving. Sure, that acrobatic shortstop playing in the country's largest market might appear to be a superior defender to the untrained eye, but all too often we draw our conclusions by putting emphasis on the outcome rather than on the process of fielding the ball itself. The second reason is the still-severe limitations we face in collecting data, and in properly interpreting that data once we have a meaningful amount of it. Granted, there are some statistics that can be used when evaluating defense--errors, fielding percentage, Range Factor, Zone Rating, etc.--but none of them is without its flaws.
Which brings us to one of Bill James' measures for quantifying defensive performance: Defensive Efficiency (provided here by Keith Woolner). Defensive Efficiency is a metric that measures a team's ability to turn balls-in-play into outs, using the formula...
Despite being raw and only applying to entire teams, Defensive Efficiency is a fair measure of overall defensive performance. But that doesn't mean it can't be improved.
Defense can be broken down into several facets, primarily pitching, ballpark, and actual defensive performance. While we've conceded that pitching and defense are extremely difficult to separate, it's much easier to take into account the venue in which the game took place. For example, looking at this season's numbers, Defensive Efficiency rates the Rockies as one of the league's worst defenses, while the Dodgers have one of the best. But how much of that is actual performance, and how much of that is simply a function of each team's playing environment? Can we determine how the Rockies would perform if they played someplace else?
We already use park factors when adjusting hitting and pitching statistics, and they can be applied to defense as well. However, using established park factors to adjust our defensive statistics would yield skewed results, as they take into account the full slate of offensive statistics, most notably home runs. Smaller ballparks are the main concern, as they yield a higher park factor, mostly thanks to home runs. But the fact of the matter is that many small ballparks might actually be easier to play defense in, since their outfields are much smaller.
Since we can't use established park factors, the first step has to be to establish a defensive baseline for each park in the majors. There are several ways to do this. One would be to generate a Def_Eff number for each park in the majors, using James' formula but applying it to parks instead of teams, and using statistics over a wider range of time (say, three to five years). However, doing so would still allow various defenses too much input on the park factors. For instance, how would Turner Field's park factor look if Andruw Jones hadn't been patrolling center field the entire time? Or Torii Hunter in the Metrodome and Ichiro Suzuki and Mike Cameron in Safeco? Even when allowing for visiting teams' numbers, those star defensive players make up half of the available statistics for a park, and could skew the numbers.
Instead, at Keith's suggestion, we'll use the ratio of each team's home Def_Eff to its away Def_Eff, using numbers for the last three years where applicable. There are some adjustments to be made, particularly in Cincinnati, where the Great American Ballpark has only been open for one season, and in Puerto Rico, where there were a mere 22 games played in Hiram Bithorn Stadium. Though a small sample size flag should go up for both of those, it shouldn't increase the amount of statistical noise (the nice way of saying "error") in our numbers. The results are below, complete with Clay Davenport's full park factors for 2001-03. Note that with DE Park Factor, the lower the number, the harder it is to play defense in a particular park; whereas with Clay's Full Park Factor, the higher numbers indicate advantages to the hitters.
                           DE Park    Full Park
Park                       Factor     Factor
---------------------------------------------
Coors Field                0.9544     1112
Kauffman Stadium           0.9773     1100
Fenway Park                0.9779     1010
Ballpark at Arlington      0.9779     1053
Bank One Ballpark          0.9782     1060
Metrodome                  0.9875     1009
Minute Maid Park           0.9880     1038
Olympic Stadium            0.9898     1067
Tropicana Field            0.9899      997
Sky Dome                   0.9915     1034
Edison Field               0.9953      987
PNC Park                   0.9955     1009
Pac Bell Park              0.9984      942
Jacobs Field               1.0021      997
Pro Player Stadium         1.0027      955
Busch Stadium              1.0028      974
Turner Field               1.0038      986
Hiram Bithorn Stadium      1.0092     1120
Comerica Park              1.0093      966
Great American Ballpark    1.0097      998
Wrigley Field              1.0098      976
Shea Stadium               1.0104      950
New Comiskey Park          1.0113     1018
Miller Park                1.0127      995
Network Associates Col.    1.0161     1003
Qualcomm Stadium           1.0169      918
Yankee Stadium             1.0173      976
Camden Yards               1.0206      959
Veterans Stadium           1.0216      939
SafeCo Field               1.0218      949
Dodger Stadium             1.0294      917
There are a few notable parks that move up or down the list. Pac Bell Park, normally one of the game's most pitcher-friendly parks, is actually slightly tougher-than-average on defenses, likely owing much to its expansive outfield. Pro Player Stadium falls into this category as well. Fenway Park, despite having a park factor almost identical to Network Associates Coliseum, is actually one of the most difficult parks on defenders--but for slightly different reasons than Pac Bell. While the Coliseum is a symmetrical ballpark, Fenway's nooks, crannies, and monsters turn many routine fly balls into singles and doubles (or for Bucky Dent, home runs). For the most part, the results hold true to our previous thinking--Dodger Stadium is up at the top, while Coors is way off the bottom--but the adjustments will make our measurements more accurate.
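For readers who want to reproduce the DE Park Factor column, the home-to-away ratio described above can be sketched in a few lines of Python. This is only an illustration: the function name `de_park_factor` and the input figures are hypothetical, and the sketch assumes the home and road Def_Eff values have already been computed and pooled over whatever seasons are available.

```python
def de_park_factor(home_de: float, away_de: float) -> float:
    """Return home Defensive Efficiency divided by road Defensive Efficiency.

    Values below 1.0 suggest a park that is harder on defenses;
    values above 1.0 suggest a park that is easier on them.
    """
    if away_de <= 0:
        raise ValueError("away_de must be positive")
    return home_de / away_de


# Hypothetical inputs, NOT real team data: a club whose defense converts
# 68.0% of balls in play into outs at home and 71.2% on the road.
example_factor = de_park_factor(home_de=0.680, away_de=0.712)
print(f"DE park factor: {example_factor:.4f}")
```

Because the same fielders appear in both the numerator and the denominator, the ratio is meant to cancel out much of the team's own defensive skill and leave mostly the effect of the park.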
Now that we've established a general idea of how difficult it is to play defense in each park, we can see how teams perform against that average. To do so, we need to establish a team baseline for the season. By multiplying the number of games a team plays in each park by that park's defensive average and then dividing by the total number of games played, we can establish a baseline for how difficult a team's schedule was on the defense. To save some space, we won't include those numbers here, but if you really want them, let me know.
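As a rough sketch of that schedule baseline, the games-weighted average might look like the following in Python. The function name `schedule_baseline` and the game counts are hypothetical; the three park factors are taken from the table above, and the schedule is deliberately truncated to 100 games across three parks to keep the arithmetic easy to follow.

```python
def schedule_baseline(games_by_park: dict[str, int],
                      park_factors: dict[str, float]) -> float:
    """Games-weighted average of DE park factors over a team's schedule."""
    total_games = sum(games_by_park.values())
    if total_games <= 0:
        raise ValueError("schedule must contain at least one game")
    weighted_sum = sum(
        games * park_factors[park] for park, games in games_by_park.items()
    )
    return weighted_sum / total_games


# DE park factors copied from the table in this article.
park_factors = {
    "Coors Field": 0.9544,
    "Dodger Stadium": 1.0294,
    "Pac Bell Park": 0.9984,
}

# Hypothetical, truncated schedule (NOT a real team's 2003 slate).
games_by_park = {"Coors Field": 81, "Dodger Stadium": 10, "Pac Bell Park": 9}

baseline = schedule_baseline(games_by_park, park_factors)
print(f"Schedule baseline: {baseline:.5f}")
```

A baseline below 1.0 means the team's schedule, taken as a whole, was tougher on its defense than a neutral slate of parks would have been.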
Then, quite simply, we divide James' raw Defensive Efficiency for each team, re-centered around the league average, by each team's schedule-adjusted Defensive Efficiency. This calculation yields a percentage that gives us an idea of how each team's defense performed against the expected league average, given their schedule. We'll clumsily call this metric PADE--Park Adjusted Defensive Efficiency--and will now open the floor for suggestions for a new acronym. Here are the results for 2003:
Team                        PADE
-------------------------------------
Tampa Bay Devil Rays        2.141
Seattle Mariners            1.887
Houston Astros              1.831
San Francisco Giants        1.774
Oakland Athletics           1.522
Anaheim Angels              1.067
Cleveland Indians           1.020
Chicago White Sox           0.756
Arizona Diamondbacks        0.752
Minnesota Twins             0.570
Atlanta Braves              0.552
Kansas City Royals          0.512
Los Angeles Dodgers         0.148
St Louis Cardinals          0.064
Montreal Expos              0.049
Pittsburgh Pirates         -0.081
Colorado Rockies           -0.171
Philadelphia Phillies      -0.324
Boston Red Sox             -0.390
San Diego Padres           -0.517
Chicago Cubs               -0.679
Detroit Tigers             -1.061
Florida Marlins            -1.116
Cincinnati Reds            -1.162
Toronto Blue Jays          -1.208
New York Mets              -1.226
Milwaukee Brewers          -2.141
Texas Rangers              -2.332
New York Yankees           -2.497
Baltimore Orioles          -2.690
A league-average defense will yield a rating of 0.000. A team with a PADE of 1.000 turns 1% more balls in play (BIP) into outs than an average team would against the same schedule--not an insignificant amount.
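Putting the pieces together, one plausible reading of the PADE calculation is sketched below in Python. The exact scaling is an assumption on my part: the text says only that the league-relative Defensive Efficiency is divided by the schedule baseline and that the result is a percentage centered on 0.000, so the "subtract one, multiply by 100" step is inferred from that description rather than taken from a published formula. The function name `pade` and all inputs are hypothetical.

```python
def pade(team_de: float, league_de: float, schedule_baseline: float) -> float:
    """Park Adjusted Defensive Efficiency, expressed as a percentage.

    ASSUMPTION: the scaling ((ratio - 1) * 100) is inferred from the
    article's statement that a league-average defense rates 0.000 and
    that 1.000 means 1% more balls in play turned into outs.
    """
    if league_de <= 0 or schedule_baseline <= 0:
        raise ValueError("league_de and schedule_baseline must be positive")
    # Re-center the team's raw Defensive Efficiency on the league average.
    relative_de = team_de / league_de
    # Divide by the schedule baseline, then convert to percent above/below average.
    return (relative_de / schedule_baseline - 1.0) * 100.0


# Hypothetical inputs, NOT real 2003 data.
neutral = pade(team_de=0.700, league_de=0.700, schedule_baseline=1.0)
one_pct = pade(team_de=0.707, league_de=0.700, schedule_baseline=1.0)
tough_parks = pade(team_de=0.700, league_de=0.700, schedule_baseline=0.97)

print(f"{neutral:.3f} {one_pct:.3f} {tough_parks:.3f}")
```

Under this reading, a team with a league-average raw Defensive Efficiency but a schedule baseline below 1.0 comes out with a positive PADE, which is the intended effect: it gets credit for matching the league in tougher parks.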
There aren't many teams that suddenly appear to be significantly better or worse defensively than with James' original metric. However, there are a few interesting moves to note.
So what is PADE good for? Primarily, it's a purer metric of a total team's defensive performance that does not punish or reward teams for playing in certain parks. Using PADE does not account for how individual players affect a team's defense, but it can, among other things, give a better estimate of which pitchers benefited the most from nifty glovework, yielding more accurate appraisals of individual pitching performances. There are still many improvements that can be made to our defensive statistics, but removing park factors is a good first step.