
July 16, 2010

Indefensible

What Do We Really Know About Defense?

by Colin Wyers

Occasionally, I get asked what's going on with my attempts to make a defensive metric. I started off working on a loess-based defensive metric, and then the effort stalled. It's a fair question, and one that's harder to answer than I think the questioners realize, because I've been slowly coming to some realizations about defensive metrics in general, and they aren't encouraging.

The short version: I'm not really sure that we've gotten any further than where we were when Zone Rating and Defensive Average were proposed in the '80s. And if we have gotten further, I'm not sure how we would tell. I've discussed some of this recently, first in a rather sprawling discussion at Tom Tango's blog, and then in a conversation with Kevin Goldstein and Jason Parks on the BP podcast. But now is a good time to step back and compose those thoughts.

Let's start with first principles, I mean really basic stuff: What is sabermetrics? Bill James proposed a definition: “the search for objective knowledge about baseball.” And that really does say a lot, doesn't it? It defines sabermetrics as the search, not the result. It tells us we are looking for knowledge. And it tells us we want to be objective about it.

Now the question comes: Are we being objective about fielding analysis? In other words, do we know what we think we know?

The Trouble with Defense

For the most part, those who are inclined to the sabermetric world view have come to a consensus on the evaluation of offense. There are occasional arguments, but over what I would call "little things." There is more agreement than disagreement, by a long shot.

But now imagine for a second that managers no longer got to set the lineup order. Maybe the umpire throws dice to determine who the next batter is. Or he has a spinner, stolen from a game of Chutes & Ladders. And then imagine that nobody is recording how many times a hitter came to the plate, simply how many innings he played and how many hits, walks, etc. he got.

What would our analysis of offense look like then? Probably a lot like range factor: you'd simply have to hope that, over time, the number of plate appearances per inning played approached the average. And over time, you might even be right. (Of course, there's no guarantee that a single season is enough time for this to happen; in fact, you'd expect it not to even out for a substantial number of players in any one season.)
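
To get a feel for how slowly that evening-out happens, here is a minimal simulation sketch in Python. The spinner league is imaginary, and the 38 team plate appearances per game is a round assumption, not a measured figure:

    import random
    import statistics

    # Hypothetical spinner league: each of a team's ~38 PA per game is
    # handed to one of nine hitters at random. How even is PA per game
    # across a 150-game season? (Both figures are assumptions.)
    random.seed(0)
    GAMES, PA_PER_GAME = 150, 38

    seasons = []
    for trial in range(1000):
        pa = sum(1 for g in range(GAMES) for _ in range(PA_PER_GAME)
                 if random.random() < 1 / 9)
        seasons.append(pa / GAMES)

    print(f"mean PA/game: {statistics.mean(seasons):.2f}")   # ~4.2
    print(f"spread (sd):  {statistics.stdev(seasons):.2f}")  # ~0.16
    # A two-sd gap is roughly 50 PA over a full season, purely from
    # the spinner, before any real difference between hitters.

Even with perfectly random chances, one season isn't long enough to make per-inning opportunities equal, which is exactly the uncertainty range factor inherits.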

And that's where we've been for the longest time when it comes to measuring defense. The solution to this has been to use batted-ball data (both an indicator of how the ball was hit—ground ball, line drive, fly ball, popup—and where it was hit) to approximate chances.

What the Data Says

Now, I've spent a lot of time writing about the data that we're using. To be rather indulgent and quote myself:

A baseball fact is, simply put, something where the decision has a direct effect on the game. Changing a strikeout into a walk has a very large effect, for instance—it both provides a baserunner for the offense and prolongs the inning.

The batted-ball data we have doesn't conform at all to the definition of a baseball fact proposed above, so it's very difficult to say how well those measurements describe the essential reality on the field of play. I have been studying differences in the data, and that study has shed very little light on the subject. What I can say with some certainty:

  • There are definite differences in how different data providers are defining the events that are occurring.
  • We have not yet established which of the data providers is correct, or, more to the point, which is more correct.
  • To the extent that the data providers are erring, it seems that some of the errors are systematic—that is to say, they can be counted upon to repeat themselves in a similar fashion over a long period of time.
  • When multiple data providers are in agreement, we can only say that it is due to something in common between them—we cannot necessarily assume that the underlying reality is the only common element. There is a potential for shared bias, so that multiple data providers are wrong in similar fashions over time.

It's the third point that actually provides the biggest problem for us. If the errors were simple, isolated mistakes, then we could simply address them by adding more data. Over time, we would expect the errors to "wash out." But that is not how bias behaves—we cannot assume that bias will wash out, no matter what the sample size is, or how much we regress a sample to the mean.
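
The distinction is easy to demonstrate with a toy Monte Carlo; every number here is an assumption chosen purely for illustration:

    import random
    import statistics

    random.seed(1)
    TRUE_RATE = 0.800   # a fielder's "true" out-conversion rate (assumed)
    BIAS = 0.030        # a constant scoring bias (assumed)
    NOISE_SD = 0.100    # random per-observation scoring error (assumed)

    def estimate(n_chances):
        """Average n noisy, biased observations of the true rate."""
        obs = [TRUE_RATE + BIAS + random.gauss(0, NOISE_SD)
               for _ in range(n_chances)]
        return statistics.mean(obs)

    for n in (50, 500, 5000):
        est = estimate(n)
        print(f"n={n:5d}  estimate={est:.3f}  error={est - TRUE_RATE:+.3f}")
    # The random noise shrinks like 1/sqrt(n), but the estimate converges
    # to TRUE_RATE + BIAS, not TRUE_RATE. No sample size, and no amount
    # of regression to the mean, removes the 0.030.

More data defeats noise; it does nothing to a consistent error in how plays are scored.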

And so when we look at the repeatability of metrics, we run into a problem: we don't know how much of that repeatability is due to underlying skill, and how much is due to bias.

I've focused on the potential for bias in the batted-ball classifications, largely due to the availability of the data. But there are certainly other ways the data could potentially be biased. Commenter Guy at Tango's blog notes:

The most likely systematic bias in the data will be exacerbated, not remedied, by regression. That is the bias toward rating plays as “easier” when they become outs, or when fielders get to them quickly. Imagine having people rate the difficulty of 200 GBs into the 3B-SS hole from video. Now, imagine that the fielders are digitally removed, and the video stopped before it’s clear whether the ball reaches the OF, and the plays are scored again. Does anyone doubt that the balls that became hits will on average be rated as easier in the second scoring, while the outs become more difficult?

Or, as I put it on the podcast—imagine a ball hit between the shortstop and third baseman. Or imagine several, some where the shortstop gets to the ball, some where the third baseman does, some where it goes past them for a hit. What are your frames of reference, watching on video?

For example, watch this play by Ryan Braun from the All-Star Game. What do you see when the ball is caught, other than Braun, some grass, and maybe a little bit of the outfield fence? And that's a highlight-reel play, where you're getting multiple angles. What about a routine catch? Another clip from the All-Star Game: Marlon Byrd's throw to get David Ortiz at second. How much of a frame of reference are you getting to determine the location of the ball?

One can suppose a range bias for the location data, where a fielder's ability to get close to the ball (much less field it) influences the scoring of where the ball was on the field. Is there any evidence for this sort of a bias? Perhaps. What I did was take all players with at least 100 innings played in back-to-back seasons, and look at their plays made and balls in zone as defined by Baseball Information Solutions (from the leaderboards at Fangraphs.com). This is based upon the same BIS data that is fed into UZR or the Fielding Bible Plus/Minus stats. The data ran from 2003-09.

So I looked at BIZ and total plays (Plays plus OOZ, or "out of zone" plays, as defined on Fangraphs) per inning, and divided that by the positional average for that season. Then I looked at the correlation between years:

 

        BIZ    AllPlays
All     0.14   0.23
IF      0.14   0.26
OF      0.15   0.19

The autocorrelation for how many plays a player makes isn't really that much higher than the autocorrelation for chances, as defined by BIS. This is especially true for outfielders.
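
For anyone who wants to reproduce the table, a sketch of the computation in Python/pandas follows. The file name and column names (Season, playerid, Pos, Inn, BIZ, Plays, OOZ) are assumptions about how a Fangraphs leaderboard export might be laid out, not a documented schema; the IF/OF rows are the same computation run on a position filter:

    import pandas as pd

    # One row per player-season-position, 2003-09.
    df = pd.read_csv("fangraphs_fielding_2003_2009.csv")
    df = df[df["Inn"] >= 100].copy()

    df["AllPlays"] = df["Plays"] + df["OOZ"]
    for col in ("BIZ", "AllPlays"):
        rate = df[col] / df["Inn"]
        # index the rate against the positional average for that season
        df[col + "_idx"] = rate / rate.groupby(
            [df["Season"], df["Pos"]]).transform("mean")

    # pair each player-season with the same player's following season
    nxt = df.copy()
    nxt["Season"] -= 1
    pairs = df.merge(nxt, on=["playerid", "Season", "Pos"],
                     suffixes=("", "_next"))

    for col in ("BIZ_idx", "AllPlays_idx"):
        r = pairs[col].corr(pairs[col + "_next"])
        print(f"{col}: year-to-year r = {r:.2f}")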

So we have questions about the data quality, as yet unresolved. And I wonder—what conclusions can we draw from the data when we don't know these things?

Method Man

Even using the same data, though, you can come up with drastically different results. Fangraphs publishes two defensive metrics, UZR and Defensive Runs Saved. Both are derived from the same BIS batted-ball data, and both purport to measure the same thing (a fielder's value above average, compared to his peers at his position). The correlation between the two for qualified starters in 2009, as reported by Mitchel Lichtman, UZR's creator, is .79.

Compare that to the correlation between the primary offensive rate stat on Fangraphs, wOBA, and a pretty crude bases per plate appearance measure—(TB+BB+HBP)/PA. For qualified starters in 2009, the correlation is .94.
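
The crude measure is trivial to rebuild from counting stats; a sketch, again with assumed Fangraphs-style column names and a placeholder file name:

    import pandas as pd

    # Assumed: a 2009 qualified-batters export with TB, BB, HBP, PA, wOBA.
    bat = pd.read_csv("fangraphs_batting_2009.csv")
    bat["crude"] = (bat["TB"] + bat["BB"] + bat["HBP"]) / bat["PA"]
    print(bat["crude"].corr(bat["wOBA"]))  # the article reports .94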

So you have two methods that seem to disagree quite a bit, at least compared to offensive metrics. And what agreement there is seems to be driven largely by the underlying data—using the plays and ball-in-zone data from BIS, I constructed a quick-and-dirty runs above/below average metric (similar to what I did here). That rubric, with almost no adjustments, correlated with DRS at 0.76 and with UZR at 0.65. It seems that simply using the same batted-ball data (and the same set of underlying facts—so-and-so made so many plays and was on the field so often) will get you most of the way to that level of agreement, regardless of method.
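
For the curious, here is a minimal sketch of that sort of quick-and-dirty zone metric, reusing the columns assumed above. The 0.8 runs per play is a round stand-in for the run swing between a hit and an out, not the value used in the article:

    import pandas as pd

    RUNS_PER_PLAY = 0.8  # assumed run value; a stand-in, not BP's figure

    def quick_runs(df: pd.DataFrame) -> pd.Series:
        """Runs above/below average from zone data: plays actually made,
        minus the plays a positionally average fielder would make on the
        same balls in zone, times a run value."""
        all_plays = df["Plays"] + df["OOZ"]
        grp = df.groupby(["Season", "Pos"])
        # positional plays-per-ball-in-zone for that season
        avg_rate = ((grp["Plays"].transform("sum")
                     + grp["OOZ"].transform("sum"))
                    / grp["BIZ"].transform("sum"))
        return (all_plays - avg_rate * df["BIZ"]) * RUNS_PER_PLAY

Nearly everything in that function is the shared data; the only "method" is a subtraction and a multiplication, which is the point.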

So our metrics don't do a very good job of agreeing. We don't know which methods are "better," only which ones we like more. And our data hasn't been validated against some objective standard.

To me, this opens up a simple question—how good are our defensive metrics? Are they useful? How useful?

 And if we go back to the beginning, where we talked about what sabermetrics is about, it doesn’t seem to me to be good or valid sabermetrics to accept these metrics without some sort of evidence, some objective facts that show they measure what we think they measure. And I think the burden of proof is on those who are making claims based upon these metrics to provide that evidence.  

Colin Wyers is an author of Baseball Prospectus. 
