Last time, we cooked up a way to remove park effects when looking at Bill James’ Defensive Efficiency, a stat that measures the percentage of balls in play fielded by a team’s defense. The new metric, tentatively called PADE, ranked teams on a zero-centered scale, showing how well a team performed against the league average with their given schedule. The intent was to more fairly judge defenses against each other rather than punish teams like Colorado and Boston for having to play in more difficult venues.

As stated before, defense can be broken down into many facets, but the three most prevalent parts are park factors, pitching, and actual defensive performance. Since we’ve already figured out how to remove the first one–park factors–the next logical step is attempting to correct for pitching, leaving us closer to a metric that measures only defensive performance.

To do this, we’ll take a similar approach to the first version of PADE, but instead of defensive park factors, we’ll use defensive pitcher factors. The first step is to determine an expected defensive efficiency for every pitcher, based on their career history.

Looking for defensive efficiency ratings sorted by pitchers inherently concedes that pitchers have some degree of control over the number of hits per balls in play. This concession will not sit well with many readers, mostly owing to Voros McCracken’s article of a few years ago. To sum up very quickly, McCracken broke down pitching into statistics over which the pitcher has control (BB, HBP, SO, HR) and those over which he does not (hits on balls in play–a stat he called IPAvg for “In Play AVG”); from there, he pointed out that pitchers have little if any control over the percentage of their balls in play that become hits. Since the publication of this article, there has been a great deal of research surrounding his conclusions, most notably by BP’s Keith Woolner and Diamond Mind Baseball’s Tom Tippett. Based on their research, McCracken has softened his original conclusion and it’s now safe to say that pitchers do have some control over their IPAvg, especially when considering large sample sizes.

Back to what we were getting at: in order to attempt to remove the quality of pitching from our defensive metrics, we need to establish just how difficult it is to play defense behind each team’s pitching staff. For a first attempt, we’ll establish an expected IPAvg for all pitchers by averaging their total career numbers. Rather than look only at the last three years as we do with park factors, it’s important to look at a pitcher’s total career because of the large variation in IPAvg from year to year. Armed with an IPAvg for every pitcher, we weight that by how much of the pitching they did for their team and produce an expected defensive efficiency based on the pitching staff (PitchDE). Here are the results:

Team                    PitchDE
Seattle Mariners         .7286
Anaheim Angels           .7227
Tampa Bay Devil Rays     .7221
San Francisco Giants     .7182
Arizona Diamondbacks     .7171
Oakland Athletics        .7146
Chicago White Sox        .7140
Los Angeles Dodgers      .7132
Cleveland Indians        .7131
Baltimore Orioles        .7129
St Louis Cardinals       .7116
New York Yankees         .7108
Philadelphia Phillies    .7099
San Diego Padres         .7096
Atlanta Braves           .7093
Boston Red Sox           .7091
Minnesota Twins          .7082
Chicago Cubs             .7077
Cincinnati Reds          .7072
Montreal Expos           .7068
New York Mets            .7061
Houston Astros           .7060
Kansas City Royals       .7059
Florida Marlins          .7053
Pittsburgh Pirates       .7053
Toronto Blue Jays        .7046
Colorado Rockies         .7041
Detroit Tigers           .7003
Milwaukee Brewers        .7000
Texas Rangers            .6964

Obviously there are some problems here. Primarily, there are quite a few rookie or relief pitchers with small sample size problems, making younger staffs more susceptible to outliers that skew the numbers. Next, there’s the problem of pitchers having spent too much of their time in one park or in front of one defense. This is mainly a problem with teams laden with homegrown talent who pitch in extreme parks (see: Athletics, Oakland). Finally, there’s the slight problem of pitchers who have been around long enough to have significant playing time in the late ’80s and early ’90s when the game was played slightly differently.

In order to correct for some of these factors, we’re going to have to make a few basic adjustments. First, to eliminate the problem of extremely lengthy careers, we’ll limit our numbers to the last nine seasons, starting after the 1994-95 strike. Secondly, similar to what Tom Tippett did, we’ll look at pitchers’ IPAvg versus their team rather than the raw numbers. By doing so, we can better isolate pitchers from park and team defensive quality effects.

Here come the equations, so if that’s not your cup of tea, feel free to scroll down. For each pitcher, we’ll compute a year-by-year weighted IPAvg, basically dividing the team’s IPAvg (tIPAvg) by the player’s IPAvg (pIPAvg). This calculation will give us a number based around 1.000 that tells us whether a player was generally tougher on the defense (lower than 1.000) or easier on the defense (higher than 1.000).

Now that we have this year-by-year IPAvg based on the team environment (tIPAvg/pIPAvg), we need to compute a career average for each pitcher. If we simply averaged the numbers, we’d get some skewed results because of playing time issues–a season in which a pitcher only faced 10 batters would count the same as a season in which he faced 500. So we’ll weight each season by playing time and recalculate using the following formula:

pExIPAvg = Sum((tIPAvg/pIPAvg)*(pBIP/sum(pBIP)))

where pBIP is the number of balls in play a pitcher allows in a given season and we’re summing year-by-year. This metric, called pExIPAvg, is each pitcher’s playing-time-weighted average of his team-relative IPAvg since 1995, and we’ll expect the pitcher to post that same ratio this season.
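
For the code-inclined, the weighting above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Season record, the function name, and the sample numbers are all invented here, and a hitless season (pIPAvg of zero) is not handled.

```python
from dataclasses import dataclass


@dataclass
class Season:
    """One pitcher-season (hypothetical record layout)."""
    p_ip_avg: float  # pIPAvg: hits per ball in play allowed by the pitcher
    t_ip_avg: float  # tIPAvg: hits per ball in play allowed by his team
    p_bip: int       # pBIP: balls in play the pitcher allowed


def career_expected_ratio(seasons):
    """BIP-weighted career average of tIPAvg/pIPAvg (the text's pExIPAvg)."""
    total_bip = sum(s.p_bip for s in seasons)
    if total_bip == 0:
        raise ValueError("no balls in play on record")
    # Assumes p_ip_avg > 0 in every season; a hitless season would divide by zero.
    return sum((s.t_ip_avg / s.p_ip_avg) * (s.p_bip / total_bip) for s in seasons)


# Invented numbers: one full season and one 10-BIP cameo.
seasons = [Season(0.300, 0.290, 500), Season(0.280, 0.290, 10)]
weighted = career_expected_ratio(seasons)
unweighted = sum(s.t_ip_avg / s.p_ip_avg for s in seasons) / len(seasons)
print(f"weighted: {weighted:.4f}, unweighted: {unweighted:.4f}")
```

With these made-up inputs the simple average lands above 1.000 while the weighted figure stays close to the full season’s ratio, which is the playing-time problem the formula is meant to avoid.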

Here’s where we can correct for pitchers with small sample sizes – rookies and specialty relievers. (Thanks again to Keith Woolner for the shove in the right direction.) For pitchers with fewer than 500 BIP, we’ll adjust things slightly, using a combination of their own performance and their team’s overall performance to reel in some of the extreme outliers.

For example, in the original formula, a pitcher who has only 100 BIP, yields 30 hits (a .300 IPAvg), and pitches in front of a defense that yields only a .280 IPAvg would get a very extreme baseline of 0.933 (.280/.300). Instead, we’ll use ((100*.300)+((500-100)*.280))/500 = (30+112)/500 = .284. Then, .280/.284 = 0.986, instead of 0.933. For those of you who like complicated equations, this can be expressed as:

If (pBIP < 500) then adjusted pIPAvg = ((pBIP*pIPAvg)+((500-pBIP)*tIPAvg))/500

with the adjusted pIPAvg then taking the place of pIPAvg in the tIPAvg/pIPAvg ratio.

I used 500 BIP as a minimum because it removed all the drastic outliers while not affecting pitchers with significant playing time. The results do not change dramatically when the limit is moved higher.
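
Here is a rough Python sketch of the small-sample adjustment, reproducing the 100-BIP example above. The function names are invented for illustration, and treating the 500-BIP cutoff as a single threshold on the pitcher’s BIP total is an assumption on my part.

```python
BIP_THRESHOLD = 500  # cutoff from the text; below it we blend in the team rate


def regressed_ip_avg(p_ip_avg, t_ip_avg, p_bip, threshold=BIP_THRESHOLD):
    """Pad a short-sample pitcher's IPAvg with team-level balls in play."""
    if p_bip >= threshold:
        return p_ip_avg  # enough playing time: leave the pitcher's rate alone
    # Pitcher's own BIP count at his rate, the remainder at the team's rate.
    return (p_bip * p_ip_avg + (threshold - p_bip) * t_ip_avg) / threshold


def baseline_ratio(p_ip_avg, t_ip_avg, p_bip, threshold=BIP_THRESHOLD):
    """tIPAvg divided by the (possibly regressed) pitcher IPAvg."""
    return t_ip_avg / regressed_ip_avg(p_ip_avg, t_ip_avg, p_bip, threshold)


# The worked example: 100 BIP, .300 IPAvg, in front of a .280 defense.
print(round(regressed_ip_avg(0.300, 0.280, 100), 3))  # 0.284
print(round(baseline_ratio(0.300, 0.280, 100), 3))    # 0.986
print(round(0.280 / 0.300, 3))                        # 0.933 (unadjusted)
```

Passing a different threshold argument is an easy way to check how sensitive the outliers are to the cutoff.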

Now we can apply our pExIPAvg for each pitcher to his team in a given season and compute an expected tIPAvg for that year. We’ll again need to correct for playing time, but this time as a percentage of the team BIP instead of the sum of a pitcher’s playing time over his career. It’s important to use BIP as the playing time metric instead of batters faced (BFP) because we’re looking at how the pitcher affects the defense rather than run scoring in total. By doing so, we correct for the fact that players like Kerry Wood, who throw more than their fair share of strikeouts and walks, don’t weigh on the defense as much as players like Greg Maddux or Tom Glavine despite having similar playing time levels based on BFP. So we’ll multiply each player’s expected IPAvg by his percentage of team BIP and then sum for each team:

tExIPAvg =  sum((pExIPAvg)*(pBIP/tBIP))

where tBIP is the number of balls in play the team allowed on the season and we’re summing team by team.
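
A minimal Python sketch of that team-level sum follows. The function name and the two-pitcher staff are invented for illustration; the staff is meant only to show why BIP, not BFP, is the weight.

```python
def team_expected_ratio(staff):
    """BIP-weighted team average of pitcher baselines.

    staff: list of (p_ex_ip_avg, p_bip) pairs for one team-season.
    """
    t_bip = sum(bip for _, bip in staff)  # tBIP: team balls in play
    if t_bip == 0:
        raise ValueError("team allowed no balls in play")
    return sum(ratio * (bip / t_bip) for ratio, bip in staff)


# Two hypothetical pitchers with the same batters faced: a strikeout-heavy
# arm who allows 500 BIP and a contact arm who allows 700 BIP.
staff = [(0.98, 500), (1.02, 700)]
print(f"{team_expected_ratio(staff):.4f}")
```

Weighted by batters faced, these two would average out to exactly 1.000; weighted by BIP, the contact pitcher counts for more because he puts more of the work on the fielders.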

We can call this the team expected IPAvg (tExIPAvg). This metric will again be centered around 1.000 with higher numbers meaning it’s easier on the defense, and lower numbers being harder. The numbers are slightly off 1.000, probably because pitchers who tend to get shelled don’t last very long, so we’ll re-center around the actual average. Here are the team-by-team results:

Team                   tExIPAvg
Philadelphia Phillies   0.9781
Houston Astros          0.9786
Detroit Tigers          0.9883
Montreal Expos          0.9886
Pittsburgh Pirates      0.9905
Atlanta Braves          0.9917
Chicago Cubs            0.9924
Texas Rangers           0.9927
Cincinnati Reds         0.9939
New York Yankees        0.9952
Milwaukee Brewers       0.9953
Los Angeles Dodgers     0.9958
St Louis Cardinals      0.9971
Toronto Blue Jays       0.9972
New York Mets           0.9988
Florida Marlins         0.9989
San Diego Padres        1.0004
Cleveland Indians       1.0007
Kansas City Royals      1.0009
Anaheim Angels          1.0026
Chicago White Sox       1.0045
Minnesota Twins         1.0055
Boston Red Sox          1.0085
Arizona Diamondbacks    1.0087
San Francisco Giants    1.0097
Colorado Rockies        1.0110
Oakland Athletics       1.0111
Baltimore Orioles       1.0113
Tampa Bay Devil Rays    1.0209
Seattle Mariners        1.0310

Again, the lower the number, the harder things are on the defense. Based on the numbers above, we can weight every team’s defensive performance like we did with PADE, yielding a zero-based scale that tells us how well a defense did given its particular pitching staff. To do so, we use the following formula:

PIDE = (Def_Eff/LgDef_Eff)/tExIPAvg

where Def_Eff is a team’s raw defensive efficiency on the season and LgDef_Eff is the league average defensive efficiency. We’ll then subtract 1 and multiply by 100 to get a scale of percentages. We’ll call this new metric PIDE for Pitching Independent Defensive Efficiency (though reader AC’s suggestion of “DEREK: Defensive Efficiency Ratings, Even Keeled” was much better than anything I thought up. Seriously, “PIDE”, that’s pathetic.) Anyway, the results:

Team                     PIDE
Houston Astros          3.617
Philadelphia Phillies   3.183
Atlanta Braves          1.867
Los Angeles Dodgers     1.856
Oakland Athletics       1.231
Montreal Expos          1.175
Chicago White Sox       0.894
Cleveland Indians       0.823
Pittsburgh Pirates      0.810
Anaheim Angels          0.787
San Francisco Giants    0.751
Chicago Cubs            0.724
St Louis Cardinals      0.577
Detroit Tigers          0.437
San Diego Padres        0.231
Cincinnati Reds         0.014
Seattle Mariners       -0.117
Tampa Bay Devil Rays   -0.244
New York Mets          -0.483
Minnesota Twins        -0.532
Kansas City Royals     -0.673
Florida Marlins        -0.781
Milwaukee Brewers      -0.860
New York Yankees       -1.185
Arizona Diamondbacks   -1.239
Toronto Blue Jays      -1.301
Boston Red Sox         -2.186
Texas Rangers          -2.417
Baltimore Orioles      -2.633
Colorado Rockies       -3.309
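
As a quick sketch of the PIDE arithmetic, including the subtract-1-and-multiply-by-100 step, something like the following Python would do. The function name and all of the input values are hypothetical; none of them are any team’s actual numbers.

```python
def pide(def_eff, lg_def_eff, t_ex_ip_avg):
    """Percent above or below what the pitching staff would predict."""
    relative = def_eff / lg_def_eff  # team defense relative to league
    return (relative / t_ex_ip_avg - 1.0) * 100.0


# Same made-up raw defense behind an "easy" staff and a "hard" staff.
easy_staff = pide(0.700, 0.710, 1.010)
hard_staff = pide(0.700, 0.710, 0.990)
print(f"easy staff: {easy_staff:.3f}, hard staff: {hard_staff:.3f}")
```

The same raw Def_Eff earns a better PIDE behind the staff with the lower tExIPAvg, since that defense had less help from its pitchers.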

Looking quickly at the results and comparing them with the original list from PADE, there are a few basic points to be made about teams this season:

  • Houston, who finished third in PADE, appears at the top here and seems assured to claim the title as best pure defense in the league this season.
  • Tampa doesn’t look nearly as good on this list, finishing below average after topping the original PADE lists. Their success in the PADE and raw Def_Eff rankings may be due more to a pitching staff that typically yields fewer H/BIP than average no matter what kind of defense they have behind them.
  • No matter how you look at it, the AL East plays some pretty crummy defense; four of the five teams finish in the bottom seven in PIDE and the bottom 12 in PADE.
  • Our claims about the Rockies getting a bad rap from raw Defensive Efficiency may have been premature, as they find themselves bringing up the rear in PIDE. They may, however, be the case study for how to combine the two lists, given their extreme park effects: a move to the expanses of Coors or similarly difficult defensive parks may affect pExIPAvg.
  • No matter how you slice it, the Orioles play some terrible, terrible defense. Ishtar terrible.
  • Based on this season’s numbers, it doesn’t appear that good defense is a prerequisite for success. The eight playoff teams finished 15th on average in both PADE and PIDE. This year’s World Series combatants finished an average of 26th and 23rd, respectively.

It would be ideal to combine the two metrics, PADE and PIDE, into a single measure of defensive efficiency. Simply averaging the pitching and park baselines to find an expected performance is one easy way to accomplish this, but that assumes that park and pitching both have an equal effect on the defense. We don’t know that this is so. Further, the range of baselines in PIDE (5.3%) is much larger than PADE (3.3%), though we could use standard deviations to normalize them to each other before combining. (The most likely explanation for this discrepancy is the fact that each team plays behind a unique pitching staff, while teams play in many of the same parks a comparable number of times.)
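
To make the standard-deviation idea a little more concrete, here is one possible Python sketch. It is not a worked-out method: the function names, the equal default weighting, the choice to shrink the pitching baselines to the park baselines’ spread, and the three-team sample data are all assumptions made purely for illustration.

```python
from statistics import mean, pstdev


def rescale_to(values, target_sd):
    """Keep the mean of values but stretch or shrink their spread to target_sd."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return list(values)  # no spread to rescale
    return [mu + (v - mu) * (target_sd / sd) for v in values]


def combined_baseline(park, pitch, park_weight=0.5):
    """Weighted average of park and pitching baselines on a common spread."""
    if len(park) != len(pitch):
        raise ValueError("need one park and one pitching baseline per team")
    pitch_rescaled = rescale_to(pitch, pstdev(park))
    return [park_weight * a + (1.0 - park_weight) * b
            for a, b in zip(park, pitch_rescaled)]


# Invented three-team league: pitching baselines spread three times as wide.
park = [0.99, 1.00, 1.01]
pitch = [0.97, 1.00, 1.03]
print([round(x, 4) for x in combined_baseline(park, pitch)])
```

The park_weight argument is where the open question lives: 0.5 encodes exactly the equal-effect assumption that we don’t know to be true.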

Regardless, combining PADE and PIDE in a way that accurately reflects events on the field requires a longer look, a longer column, and this one has rambled on long enough for today. In the meantime, looking at the PADE and PIDE lists side by side, broad conclusions about the true defensive performance of teams around the league can be drawn with more confidence than simply looking at Bill James’ Defensive Efficiency.