I suppose I ought to say something about the Gold Gloves, huh?

Now, I’m sure everyone knows what I think about Derek Jeter’s defense—it’s probably being overrated even by “advanced” defensive metrics (which aren’t exactly kind to him). So how did he win?

Yesterday, I was listening to ESPN’s Rob Neyer talking to Michael Kay on the radio about the Gold Glove awards. And what stuck out in my mind was when Kay asked Neyer, “If you need a guy to field a ball hit right at him, who do you pick? Is it Jeter, or Elvis Andrus, or Alexei Ramirez?”

Neyer responded that he’d pick Jeter. And I think that goes a long way towards explaining why Jeter won the Gold Glove—he’s good at the elements of fielding that are easy to notice. He fields the balls he gets to very well.

And I think there’s a growing awareness that fielding the balls one gets to tells only a small part of the story on defense. Getting to balls is also part of the story, and probably a much bigger part. I think a lot of the frustration people feel over things like Jeter getting the Gold Glove comes from the fact that the Gold Glove voters (and a lot of other people) don’t seem to recognize the importance of that aspect of fielding. The Gold Glove ought to go to the people who are good at fielding more balls, shouldn’t it?

But how do you know who’s good at getting to more balls?

Last week, I talked about the idea that sabermetrics is (in part) the scientific study of baseball. I don’t think that all of sabermetrics is science, of course, but I think that’s a large part of it. And I think that more than anything else, a sabermetrics that is manifestly anti-scientific ceases to be sabermetrics at all.

One of the key requirements for scientific research is reproducibility: the idea that independent observers can come to the same conclusions as the original researchers. The reproducible parts of fielding analysis are things like DER (defensive efficiency, the share of balls in play turned into outs). These are measures that tabulate plays made and balls in play, objective facts that anyone can derive using scorekeeping methods that Henry Chadwick came up with in the 1870s. Modern advances in fielding analysis, by contrast, rely on estimates of expected outs from batted-ball data.

And that data has proven not to be reproducible on two counts—different data providers cannot reproduce the same estimates of where a ball landed and how it got there, and different analysts using the same batted-ball data cannot reproduce the same estimates of expected outs.

Last week, Tom Tango published his team defensive ratings, based upon the Fan Scouting Report. He compared the results with team totals of two defensive metrics based on Baseball Info Solutions data:

 I also included the totals from Dewan (DRS) and MGL (UZR).  Dewan’s numbers were dropped by 10 runs per team because the average is +10 per team.  I don’t know why.… The three systems agreed on the Nationals, and strongly disagree on the Indians.  Correlation of Fans to Dewan is r=.60, and Fans to UZR is also r=.60.  UZR to Dewan is r=.57.  That’s pretty whacked out when you consider that UZR and DRS use the same data source.

I decided to check the correlation between all of the measures and DER:

[Table: correlation with DER for the Fans, DRS, UZR, BIS, and All Three]

(“All Three” is the average of the Fans, DRS and UZR; BIS is the average of DRS and UZR.)

What the results suggest is that all of these measures are roughly as well-correlated with DER as they are with each other. And that makes sense, since all of them implicitly include all of the information included in DER in their measurement. Metrics based upon batted-ball data, at the team level, boil down to:

(DER – exDER) * BIP

Which is to say a team’s actual DER, minus what the expected DER would be given the estimated batted-ball distribution, multiplied by the number of balls in play. (This is a bit of a simplification: certain categories of plays are excluded, like popups, and in the case of UZR, plays by the catcher and pitcher are ignored. These measures also look at things like outfield throwing arms, double-play rates, etc. But fundamentally, turning batted balls into outs is the most important aspect of team defense.)
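As a rough illustration of the team-level formula above, here is a minimal Python sketch. The function names and the sample figures (4,400 balls in play, 3,080 of them turned into outs, an expected DER of .690) are made up for the example; they don't come from any real team, and this is not how DRS or UZR are actually computed internally.

```python
def der(plays_made: int, balls_in_play: int) -> float:
    """Defensive efficiency: share of balls in play converted into outs."""
    if balls_in_play <= 0:
        raise ValueError("balls_in_play must be positive")
    return plays_made / balls_in_play


def plays_above_expected(plays_made: int, balls_in_play: int,
                         expected_der: float) -> float:
    """(DER - exDER) * BIP: plays made beyond what the batted-ball mix predicts."""
    return (der(plays_made, balls_in_play) - expected_der) * balls_in_play


# Hypothetical team: the counts are observable facts, expected_der is an estimate.
actual = der(3080, 4400)
extra = plays_above_expected(3080, 4400, expected_der=0.690)
print(f"DER = {actual:.3f}, plays above expected = {extra:.1f}")
```

Note that the first two inputs are counts anyone can tabulate, while `expected_der` is the only place an analyst's judgment enters. Nudge that one estimate from .690 to .695 and the same team drops from about 44 plays above expected to about 22.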

What I want to emphasize here is that there is little room for defensive metrics to differ from one another in measuring DER. Because of their focus on measuring individual player contributions, certain categories of plays are excluded. But outside of that, the number of outs recorded and the number of balls in play are not in dispute. Those are facts; everything else is an estimate.

So let’s turn to partial correlation, which tells us the correlation between two variables after removing the influence of a third variable. In this case, we’ll look at the correlation between the various metrics after controlling for the influence of DER. The partial correlation between UZR and DRS is .28; between the Fans and UZR it’s .32, and between the Fans and DRS it’s .35.

So once DER is controlled for, the correlation between DRS and UZR falls from .57 to .28. That’s very little agreement on estimates of expected outs and elements of defense not included in DER. It’s worth reemphasizing that DRS and UZR both use the same source of batted-ball data, and that different data providers can disagree significantly in terms of hit location and batted-ball type.
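For anyone who wants to check this sort of thing themselves, the first-order partial correlation can be computed from the three pairwise correlations alone. Here is a small Python sketch using only the standard library. The helper names are my own, and the input values in the example are placeholders, not the actual team figures.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear influence of z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_corr_from_data(xs, ys, zs):
    """Same thing, starting from raw per-team values instead of correlations."""
    return partial_corr(pearson(xs, ys), pearson(xs, zs), pearson(ys, zs))


# Placeholder correlations: x and y each correlate .5 with z and with each other.
print(round(partial_corr(0.5, 0.5, 0.5), 3))
```

In the setting here, `xs` and `ys` would be two metrics' team totals and `zs` would be team DER. If two metrics agree only because both contain DER, the result lands near zero.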

And so if we want to know why people don’t trust what we’ve come to call “advanced” fielding analysis, it’s really because we haven’t given them a reason to trust it. And that’s because of a fundamental abandonment of what makes sabermetrics compelling: the search for objective truth. For a time we have stopped doing science when it comes to fielding analysis, and have instead been doing baseball alchemy, trying to transmute lead into Gold Gloves.

The AL Gold Glove voters made a mistake in giving Jeter the award. But I think we make a much bigger mistake if we castigate the Gold Glove voters for their beliefs without a serious effort to give them something in which they ought to believe.