I have been making something of a ruckus recently about where I feel the state of current defensive analysis is. I have been long on listing problems, and short on proposing solutions.

Well, allow me to make amends there. I don’t pretend to have the problem solved. I’m not sure any of us will ever see it truly solved. But I think—or at least, hope—this can point us in the right direction.

The Two Problems

We can really subdivide our problems neatly in two. One is the issue of bias, the other of uncertainty.

Let us start with the latter. What we are trying to do here is measure, and then compare, two things:

  1. How many plays a player has made, and
  2. How many plays we think an average player at that position would have made, given the same chances.

The first, we all think we can measure directly—given the record, we can readily come up with a total. We may have some disagreement over what to count, but if we agree on what we should be counting, we can come to an agreement. The second is an estimate and, as such, is subject to error. Over time, the error in our estimate should come down (as a proportion of our estimate, that is).

Now, what modern defensive metrics (those based on observational data such as batted-ball types and hit locations) are trying to do is cut down on the effects of measurement error on our estimate of plays made by an average player.

In attempting to reduce measurement error, however, those metrics have introduced the potential for bias into their estimates. The two key biases are:

  1. Park-scorer biases. To the extent that a park influences the scoring of batted balls, that has an impact on our estimates. It could have to do with the identity of the scorer in different parks. It could relate to the vantage point of the scorers in each park. Regardless of the source, it distorts the estimates of a fielder’s chances.
  2. Range biases. To the extent that a fielder’s range (or the range of his teammates) influences the scoring of batted balls (either by type or location), that also distorts the picture of a fielder’s abilities. The most obvious possible effect is that a good fielder will raise the number of estimated chances he gets by getting to more balls (or at least getting closer to them), and vice versa for a poor fielder. This would both artificially compress the observed spread of fielding performance and systematically underestimate fielders with good range (and overestimate fielders with poor range).

So what we have is some presumption of increased accuracy, in exchange for additional bias. What we do not know, as of yet, is how much accuracy we are gaining, at the expense of how much bias. And I think that’s an important thing to know—if your gain in accuracy is less than the amount of bias you’re introducing, you haven’t actually gotten better, you’ve gotten worse.

And we know how to solve the accuracy problem: get more data! Over a long enough timeline, the estimates will improve on their own. Adding more data doesn’t make bias any better, though. In fact, as the random error shrinks over time, the bias accounts for a larger and larger share of the error that remains.
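
As a rough illustration of that trade-off, here is a minimal sketch. It assumes the random error on each play is independent (so it grows like the square root of the number of plays in a total) while the bias is a constant amount per play (so it grows in direct proportion). The function name and every number in it are made up for illustration and are not taken from any actual metric.

```python
import math

def bias_share(per_play_sd, bias_per_play, n_plays):
    """Fraction of the squared total error that comes from bias.

    Assumes independent random errors (total grows like sqrt(n)) and a
    constant per-play bias (total grows like n). Both inputs are
    hypothetical values chosen only to show the shape of the effect.
    """
    random_part = per_play_sd * math.sqrt(n_plays)
    bias_part = bias_per_play * n_plays
    return bias_part ** 2 / (random_part ** 2 + bias_part ** 2)

# With a small bias and a large per-play random error, bias starts out as a
# minor part of the total error and ends up dominating it.
for n in (100, 1_000, 10_000):
    print(n, round(bias_share(0.30, 0.01, n), 3))
```

Under these assumptions the bias goes from about a tenth of the squared error at 100 plays to more than nine-tenths at 10,000 plays, even though the bias itself never changed.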

Just the Facts, Ma’am

So let’s take a different approach. Let’s try to design a fielding metric with no bias—or, at least, attempt to minimize the effect of bias. What we can do is:

  1. Restrict ourselves to looking at only factual data—data we can validate objectively. That means no batted-ball data, no hit location data, etc.
  2. For estimating the number of plays an average player at that position would have made, ignore data about the outcomes of batted balls whenever possible.
  3. Err on the side of caution when deciding whether or not to adjust—in other words, make as few adjustments as possible. We can allow the data to be expressive by getting the metric out of its way whenever we can.

Over time, the potential inaccuracies of our data should wash out, and because we think we are minimizing our potential for bias, over a long period of time we should be able to be confident of our measure of a fielder’s ability.

Figuring Plays Made

Looking at play-by-play data available from Retrosheet, we can start off with counting the plays a player actually made on the field.

Ideally, what we would do is separate the fielding of balls hit on the ground (OK, OK—ground balls) from balls hit in the air (pop-ups, liners, and fly balls). But we’ve already committed to not using that sort of data. Is there anything we can do, simply looking at facts, to determine what sort of plays a player made?

For outfielders, it’s a simple matter. We just count an outfielder’s unassisted putouts as his plays made. (His assists we can examine separately at a later date.)

For an infielder, how are we to determine whether he caught the ball on the fly or fielded it on the ground, without resorting to batted-ball categorizations? It’s simple (if a bit messy for first basemen and pitchers):

  1. An assist by the infielder who first fielded the ball counts as a play made on a “ground ball.” (This is not always the case: a fielder who deflects a ball that is then fielded by another player for an out is credited with an assist. But this is rare enough that over time we can ignore it, and in the short run we can do little about it.)
  2. An unassisted putout of a baserunner, other than the batter, by an infielder is a play made on a “ground ball.” For catchers, second basemen, third basemen, and shortstops, an unassisted putout of the batter is a play made on an “air ball.” There are rare occasions, mostly for second basemen, where this isn’t the case, but again over time we shouldn’t have to worry about this.
  3. For first basemen, an unassisted putout of the batter is a “ground-ball” out when it was either on a bunt attempt or hit by a left-handed batter. For pitchers, an unassisted putout of the batter is a “ground-ball” out on a bunt attempt only. All others are classified as “air-ball” outs. This is probably the least-confident part of the system, but for now we’ll leave it as it is.
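
The rules above, along with the outfielder rule, can be sketched as a small classifier. This is only an illustration: the argument names and the string labels are hypothetical, and they are not Retrosheet field names. A real implementation would first have to derive these flags from the play-by-play record.

```python
OUTFIELD = {"LF", "CF", "RF"}

def classify_play(position, credit, retired_batter=False,
                  bunt=False, batter_hand=None):
    """Label one fielding play as 'ground', 'air', 'outfield', or None.

    position:       'P', 'C', '1B', '2B', '3B', 'SS', 'LF', 'CF', 'RF'
    credit:         'assist' (by the first fielder to handle the ball)
                    or 'unassisted_putout'
    retired_batter: True if the putout was of the batter
    bunt:           True if the ball was put in play on a bunt attempt
    batter_hand:    'L' or 'R'
    """
    if position in OUTFIELD:
        # Outfield assists are set aside to be examined separately.
        return "outfield" if credit == "unassisted_putout" else None
    if credit == "assist":
        return "ground"
    if credit != "unassisted_putout":
        return None
    if not retired_batter:
        # Unassisted putout of a baserunner other than the batter.
        return "ground"
    if position == "1B":
        return "ground" if (bunt or batter_hand == "L") else "air"
    if position == "P":
        return "ground" if bunt else "air"
    # Catchers, second basemen, third basemen, and shortstops.
    return "air"

print(classify_play("1B", "unassisted_putout", retired_batter=True, batter_hand="L"))
```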

So this gives us, at the team level, outs on the ground versus outs in the air. And what we see is a strong negative relationship between ground plays and air plays, with a correlation of -0.77. So when a team makes a lot of ground-ball plays, the most likely explanation is that they saw a lot of ground balls.

So, let’s adjust for that. What we can do is look at how many plays a team made in total, compared to the average team, and then look at how many ground-ball plays a team made compared to how many air-ball plays they made. A team with superior ground-ball fielders will not only have more ground-ball plays but likely more plays made overall.

So for a team that’s above-average on making ground-ball plays but below-average in making total plays, we “shift” the responsibility toward the ground-ball plays (in other words, inflate the number of ground-ball plays we think the team should have made, but deflate the number of air-ball plays we think the team should have made), while keeping the total number of plays we think the team should have made constant.
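
One way such a shift could work, purely as a sketch: move the ground/air split of the expected plays toward the split the team actually showed, and leave the expected total alone. The blending rule and the `weight` knob are my own assumptions for illustration; the exact formula used in the metric is not spelled out here.

```python
def shift_expected_plays(expected_ground, expected_air,
                         actual_ground, actual_air, weight=1.0):
    """Reallocate a team's expected plays between ground and air.

    The expected total is left untouched; only the split changes.
    `weight` (hypothetical, between 0 and 1) controls how far the split
    moves from the league-based share toward the team's observed share.
    """
    expected_total = expected_ground + expected_air
    league_share = expected_ground / expected_total
    team_share = actual_ground / (actual_ground + actual_air)
    ground_share = (1 - weight) * league_share + weight * team_share
    adj_ground = expected_total * ground_share
    return adj_ground, expected_total - adj_ground

# Hypothetical team: more ground-ball plays than expected, fewer plays overall.
ground, air = shift_expected_plays(2000, 2000, 2100, 1800)
print(round(ground, 1), round(air, 1), round(ground + air, 1))
```

In this made-up case the expected ground-ball plays go up, the expected air-ball plays go down by the same amount, and the expected total stays at 4,000.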

This is, for lack of a better term, our “ground-ball rate” adjustment. It’s a bit of a misnomer, because we ignore any scorer data on the number of ground balls a defense saw. And it is possible that including that scorer data could improve the process here as well. But for now, let’s err on the side of excluding that data.

Breaking Down the Fielders

What we do now is apply the process from above to individual fielders. As we did for teams, we break down outfielder plays, infielder plays on the ground, and infielder plays in the air. That tells us how many plays each fielder made.

Then we look at each batted ball and estimate the likelihood that each fielder makes a play on it. The only data we are considering right now is the handedness of the batter who hit the ball. (For first basemen, we’re also considering whether or not they had to hold a runner at first.) We aren’t considering who eventually fielded the ball, whether or not the ball was an out or a hit, etc. Why? Because the outcome of the batted ball is a potential source of bias. By giving up some accuracy in the short run, we allow truly great fielders to look truly great—otherwise, we artificially compact the spread of the impact of top fielders over time.
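
To make the chance estimate concrete, here is a minimal sketch that sums a per-ball play probability over every ball in play a shortstop was on the field for, using only the batter's handedness. The rates are the approximate 2009 shortstop figures quoted elsewhere in this article; the function names, the data layout, and the sample of 1,000 balls in play are hypothetical.

```python
# Approximate chance that a shortstop makes a play on a ball in play,
# keyed by batter handedness (the 2009 figures cited in this article).
SS_PLAY_RATE = {"R": 0.12, "L": 0.06}

def expected_plays(batter_hands, play_rate):
    """Expected plays for an average fielder over a list of balls in play.

    batter_hands is one 'L' or 'R' per ball in play. Nothing about who
    fielded the ball or how it turned out is used.
    """
    return sum(play_rate[hand] for hand in batter_hands)

def plays_above_average(actual_plays, batter_hands, play_rate):
    """Actual plays made minus the average fielder's expected plays."""
    return actual_plays - expected_plays(batter_hands, play_rate)

# Hypothetical sample: 600 balls in play by righties, 400 by lefties.
bip = ["R"] * 600 + ["L"] * 400
print(round(expected_plays(bip, SS_PLAY_RATE), 1))
print(round(plays_above_average(110, bip, SS_PLAY_RATE), 1))
```

For this made-up sample the average shortstop is expected to make about 96 plays, so a shortstop who made 110 would be credited with about 14 plays above average.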

So we have our measure of plays made, and our estimate of chances. We can leave off there, at least for infielders. (Outfielders will require a bit more work, I’m afraid, and that will have to wait for another day.) But we discussed uncertainty; can we at least try to measure it?

Ignore, at least for now, uncertainty about actual plays made. For first basemen and pitchers especially we do have some, but little enough that we can afford to set it aside for a while. But for our estimate of how many plays a fielder should have made, we know there is a margin of error. What we can do is calculate the uncertainty of our estimate per ball in play, and use that to figure our total uncertainty for any given player.

What I did is figure the root mean square error between the expected number of plays made (the average rate) and the actual plays made, ball by ball.

For example: In 2009, with a right-handed hitter batting, a shortstop will make a play on a ball in play roughly 12 percent of the time. (For a left-handed batter, a shortstop will make a play on a ball in play roughly six percent of the time.) But the margin of error around our estimate of how often a shortstop will make any single play is about 30 percent. (Notably the error is asymmetrical—obviously there is no chance of a shortstop making a negative play, even if in exasperation I may have accused Alex Gonzalez of it during the ’03 playoffs.)
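
As a rough check on that figure, suppose each ball in play is a simple yes/no outcome with play probability p, and we predict p every time. Under that assumption (mine, for illustration) the root mean square error for a single ball works out to the square root of p(1 - p). The function name is hypothetical.

```python
import math

def per_play_rmse(p):
    """RMSE of predicting probability p for a single yes/no outcome.

    With probability p the play is made and the error is (1 - p);
    otherwise the error is p.
    """
    mean_sq = p * (1 - p) ** 2 + (1 - p) * p ** 2  # simplifies to p * (1 - p)
    return math.sqrt(mean_sq)

for hand, p in (("R", 0.12), ("L", 0.06)):
    print(hand, round(per_play_rmse(p), 3))
```

For the two shortstop rates above this gives values a little above and a little below 0.3, which is in line with a per-play margin of error of about 30 percent.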

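To attribute that margin of error over a number of chances, we take the per-play error times the square root of the number of balls in play (BIP):

  total margin of error = per-play error × √BIP

Dividing that total by the number of BIP gives the margin of error per BIP, which is the per-play error divided by √BIP.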

What’s interesting about this is that the margin of error per BIP drops, the more BIP we observe. So, after 100 BIP, the margin of error for any one play drops all the way to three percent.
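
The numbers quoted here (about 30 percent for a single play, three percent per BIP after 100 BIP) fit the usual square-root rule for independent errors. A small sketch of that arithmetic, with hypothetical function names and a made-up season length:

```python
import math

def margin_per_bip(per_play_error, n_bip):
    """Margin of error on the per-BIP rate after n_bip balls in play."""
    return per_play_error / math.sqrt(n_bip)

def margin_total(per_play_error, n_bip):
    """Margin of error on the total number of plays over n_bip balls in play."""
    return per_play_error * math.sqrt(n_bip)

print(round(margin_per_bip(0.30, 100), 3))
# 4,400 BIP is a hypothetical workload, not any particular player's.
print(round(margin_total(0.30, 4400), 1))
```

The per-BIP margin keeps shrinking as the sample grows, while the margin on the total keeps growing, only much more slowly than the total itself.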

(That’s why, to me, uncertainty is preferable to bias—with enough statistical power, we can plow through uncertainty readily. Without an accounting of what the bias is, we’re essentially powerless against it.)

Some Examples

After taking you all this way, surely I wouldn’t leave you without something to look at, would I? Here are the top 10 seasons by a shortstop since 1950, according to our new fielding metric:

  Guillen, Ozzie
  Ryan, Brendan
  Fermin, Felix
  Belanger, Mark
  Tulowitzki, Troy
  Sanchez, Rey
  Thon, Dickie
  Smith, Ozzie
  Martinez, Felix
  Sanchez, Rey

I’ve provided a tentative conversion of plays to runs, although it still needs a little work. Note, for instance, that Ozzie Guillen is being credited with about 73 plays above the average shortstop for 1988. That’s pretty impressive. It’s also pretty imprecise, with a margin of error around 20 plays.

What’s important to note is that the error is not symmetrical—we think there’s practically no chance that Guillen really made over 90 plays above average, for instance.

So, on a single-season level, we see some quizzical results. (Brendan Ryan? Really?) The important thing to remember is that we aren’t very confident in those results! Our confidence increases as we move to the career level, though:

  Smith, Ozzie
  Belanger, Mark
  Sanchez, Rey
  Russell, Bill
  Valentin, Jose
  Guillen, Ozzie
  Templeton, Garry
  Groat, Dick
  Maxvill, Dal
  Gagne, Greg

That isn’t to say there’s no uncertainty. We can say, given the statistical evidence we have at hand, that there’s a small (but nonzero) chance that Mark Belanger saved more runs compared to the average shortstop than Ozzie Smith did. And after that, well, nobody else is in the running.

And nobody has really disputed how good Ozzie Smith was—but other metrics haven’t fully captured the magnitude of it. Our own FRAA, for instance, gives Smith 266 runs above average. Sean Smith’s TotalZone says 239 runs above average. In reality, Ozzie was better than that—a lot better.

What’s Next

Well, obviously I have to produce outfielder measurements as well. And there are probably still some tweaks to be made to this system that could improve it.

But past that—these values cannot simply be used in place of FRAA to calculate WARP the way we’re doing it now. We have this measure of uncertainty. We can similarly compute uncertainty for our offensive metrics (and it’s quite a bit smaller on a per-play basis). We cannot, in coming up with a single value to express a player’s season, add defense to offense as though we are equally certain of both.
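
To see why the two can't simply be added as if they were equally certain, here is a sketch under one common assumption: that the errors in the offensive and defensive estimates are independent, so they combine in quadrature. This is not a description of how the revised WARP will work. The function name and the run values are hypothetical.

```python
import math

def combine(offense_runs, offense_err, defense_runs, defense_err):
    """Add two run estimates and combine their margins of error.

    Assumes the two errors are independent, so the combined margin is
    the square root of the sum of squares.
    """
    total = offense_runs + defense_runs
    err = math.hypot(offense_err, defense_err)
    return total, err

# Hypothetical player: a fairly precise offensive estimate and a much
# noisier defensive one.
total, err = combine(30.0, 3.0, 10.0, 12.0)
print(total, round(err, 1))
```

In this made-up example nearly all of the combined margin of error comes from the defensive estimate, even though defense supplies only a quarter of the runs.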

So we’re going to be revising WARP to account for this uncertainty. Along the way, we’ll be adding some other enhancements to WARP as well. And we’ll be looking at pitching—after all, a lot of what we’ve always thought was pitching is fielding, isn’t it? And so any uncertainty we’ve had in measuring fielding spills over into pitching as well.

So, consider this a beginning, not an end.

Notes and Asides

I should give a nod to Bill James’ Fielding Win Shares, which served as an inspiration for some of my efforts here. I should also give a nod to the work Sean Smith has done on TotalZone, which was also something I spent a lot of time thinking about.

For some discussion on the spread of defense, I cannot recommend these enough: