July 28, 2010
Looking Farther Afield
I have been making something of a ruckus recently about the current state of defensive analysis. I have been long on listing problems, and short on proposing solutions.
Well, allow me to make amends there. I don’t pretend to have the problem solved. I’m not sure any of us will ever see it truly solved. But I think—or at least, hope—this can point us in the right direction.
The Two Problems
We can really subdivide our problems neatly in two. One is the issue of bias, the other of uncertainty.
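Let us start with the latter. What we are trying to do here is measure, and then compare, two things: the number of plays a fielder actually made, and the number of plays we would expect an average fielder to have made in his place.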
The first, we all think we can measure directly—given the record, we can readily come up with a total. We may have some disagreement over what to count, but if we agree on what we should be counting, we can come to an agreement. The second is an estimate and, as such, is subject to error. Over time, the error in our estimate should come down (as a proportion of our estimate, that is).
Now, what modern defensive metrics (ones based on observational data, like batted-ball types, hit locations, etc.) are trying to do is cut down on the effects of measurement error on our estimate of plays made by an average player.
By attempting to reduce measurement error, those metrics have introduced the potential for bias into their estimates, however. The two key ones are:
So what we have is some presumption of increased accuracy, in exchange for additional bias. What we do not know, as of yet, is how much accuracy we are gaining, at the expense of how much bias. And I think that’s an important thing to know—if your gain in accuracy is less than the amount of bias you’re introducing, you haven’t actually gotten better, you’ve gotten worse.
And we know how to solve the accuracy problem—get more data! Over a long enough timeline, the estimates will improve on their own. Adding more data doesn’t make bias any better, though—in fact, over time, the effect of bias becomes more powerful.
Just the Facts, Ma’am
So let’s take a different approach. Let’s try to design a fielding metric with no bias—or, at least, attempt to minimize the effect of bias. What we can do is:
Over time, the potential inaccuracies of our data should wash out, and because we think we are minimizing our potential for bias, over a long period of time we should be able to be confident of our measure of a fielder’s ability.
Figuring Plays Made
Looking at play-by-play data available from Retrosheet, we can start off with counting the plays a player actually made on the field.
Ideally, what we would do is separate the fielding of balls hit on the ground (OK, OK—ground balls) from balls hit in the air (pop-ups, liners, and fly balls). But we’ve already committed to not using that sort of data. Is there anything we can do, simply looking at facts, to determine what sort of plays a player made?
For outfielders, it’s a simple matter. We just count an outfielder’s unassisted putouts as his plays made. (His assists we can examine separately at a later date.)
For an infielder, how are we to determine whether he caught the ball on the fly or fielded it on the ground, without resorting to batted-ball categorizations? It’s simple (if a bit messy for first basemen and pitchers):
So this gives us, at the team level, outs on the ground versus outs in the air. And what we see is a strong negative relationship between ground plays and air plays, with a correlation of -0.77. So when a team makes a lot of ground-ball plays, the most likely explanation is that they saw a lot of ground balls.
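As a minimal sketch of that team-level check, here is a plain Pearson correlation in Python. The team totals are made-up illustrative numbers, not Retrosheet data, and the function name `pearson` is hypothetical; run over real team-seasons of ground outs and air outs, this is the kind of calculation that produces a figure like the -0.77 quoted above.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical team-season totals (illustrative only, NOT real data).
ground_outs = [1900, 2050, 1750, 2200, 1980]
air_outs = [1700, 1580, 1850, 1450, 1620]

print(f"r = {pearson(ground_outs, air_outs):.2f}")
```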
So, let’s adjust for that. What we can do is look at how many plays a team made in total, compared to the average team, and then look at how many ground-ball plays a team made compared to how many air-ball plays they made. A team with superior ground-ball fielders will not only have more ground-ball plays but likely more plays made overall.
So for a team that’s above-average on making ground-ball plays but below-average in making total plays, we “shift” the responsibility toward the ground-ball plays (in other words, inflate the amount of ground-ball plays we think the team should have made, but deflate the amount of air-ball plays we think the team should have made), while keeping the total number of plays we think the team should have made constant.
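One simple way to implement that shift is sketched below. It is an assumption for illustration, not necessarily the exact formula behind the metric: fix the team's expected total plays at the league-average rate per ball in play, then split that total in proportion to the team's own ground/air mix rather than the league's. The function and parameter names are hypothetical.

```python
def expected_ground_air(team_ground, team_air, league_plays_per_bip, team_bip):
    """Split a team's expected plays into ground and air plays (hypothetical sketch).

    The expected total is held at the league-average rate times the team's
    balls in play; only the split between ground and air plays moves,
    following the team's own observed share of ground-ball plays.
    """
    expected_total = league_plays_per_bip * team_bip
    ground_share = team_ground / (team_ground + team_air)
    expected_ground = expected_total * ground_share
    expected_air = expected_total - expected_ground
    return expected_ground, expected_air


# Hypothetical team: heavy on ground plays, slightly below average in total plays.
g, a = expected_ground_air(team_ground=1800, team_air=1150,
                           league_plays_per_bip=0.70, team_bip=4300)
print(round(g, 1), round(a, 1))
```

Under this sketch, whatever surplus or deficit the team shows in total plays is divided between ground and air plays in the same proportion as the plays it actually made.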
This is, for lack of a better term, our “ground-ball rate” adjustment. It’s a bit of a misnomer, because we ignore any scorer data on the number of ground balls a defense saw. And it is possible that including that scorer data could improve the process here as well. But for now, let’s err on the side of excluding that data.
Breaking Down the Fielders
What we do now is apply the process from above to individual fielders. As we did for teams, we break down outfielder plays, infielder plays on the ground, and infielder plays in the air. That tells us how many plays each fielder made.
Then we look at each batted ball and estimate the likelihood that each fielder makes a play on it. The only data we are considering right now is the handedness of the batter who hit the ball. (For first basemen, we’re also considering whether or not they had to hold a runner at first.) We aren’t considering who eventually fielded the ball, whether or not the ball was an out or a hit, etc. Why? Because the outcome of the batted ball is a potential source of bias. By giving up some accuracy in the short run, we allow truly great fielders to look truly great—otherwise, we artificially compact the spread of the impact of top fielders over time.
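A minimal sketch of the chance estimate, assuming play records reduced to (batter hand, position credited with the play) pairs: league rates are computed by batter handedness, then applied to every ball in play a fielder was on the field for, regardless of how that ball turned out. The record layout and function names are hypothetical, and the first-baseman adjustment for holding runners is left out.

```python
from collections import Counter


def play_rates_by_hand(bips, position):
    """Share of league balls in play on which `position` made the play, by batter hand.

    `bips` holds (batter_hand, credited_position) pairs; credited_position is
    None when no fielder made a play. Batter handedness is the only context used.
    """
    totals = Counter()
    made = Counter()
    for hand, credited in bips:
        totals[hand] += 1
        if credited == position:
            made[hand] += 1
    return {hand: made[hand] / totals[hand] for hand in totals}


def expected_plays(rates, bip_by_hand):
    """Expected plays for one fielder: league rate times his BIP, summed over hands.

    The outcome of each of his balls in play (out or hit, who fielded it)
    is deliberately ignored.
    """
    return sum(rates[hand] * n for hand, n in bip_by_hand.items())


# Synthetic league sample built to echo the 12%/6% shortstop rates quoted later.
league = ([("R", "SS")] * 12 + [("R", None)] * 88
          + [("L", "SS")] * 6 + [("L", None)] * 94)
rates = play_rates_by_hand(league, "SS")

# Hypothetical shortstop on the field for 400 BIP by righties and 300 by lefties.
print(expected_plays(rates, {"R": 400, "L": 300}))
```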
So we have our measure of plays made, and our estimate of chances. We can leave off there, at least for infielders. (Outfielders will require a bit more work, I’m afraid, and that will have to wait for another day.) But we discussed uncertainty; can we at least try to measure it?
Ignore, at least for now, uncertainty about actual plays made; for first basemen and pitchers especially we do have some, but not so much that we can’t afford to set it aside for a while. But for our estimate of how many plays a fielder should have made, we know there is a margin of error. What we can do is calculate the uncertainty of our estimate per ball in play, and use that to figure our total uncertainty for any given player.
What I did is figure the root mean square error between the average number of plays made and the actual plays made, on an individual basis.
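As a minimal sketch of that calculation, treat each ball in play as a 0/1 outcome (play made or not) predicted at the average rate. For a 0/1 outcome predicted at its own average rate p, the RMSE works out to √(p(1 − p)), about 0.32 when p = 0.12, roughly in line with the "about 30 percent" figure in the shortstop example. The sample and function name are hypothetical.

```python
from math import sqrt


def rmse_per_bip(outcomes, rate):
    """RMSE between a constant predicted play rate and observed 0/1 outcomes."""
    return sqrt(sum((made - rate) ** 2 for made in outcomes) / len(outcomes))


# Synthetic sample: 12 plays made in 100 BIP, predicted at the average rate of 12%.
outcomes = [1] * 12 + [0] * 88
error = rmse_per_bip(outcomes, 0.12)
print(f"{error:.3f}")
```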
For example: In 2009, with a right-handed hitter batting, a shortstop will make a play on a ball in play roughly 12 percent of the time. (For a left-handed batter, a shortstop will make a play on a ball in play roughly six percent of the time.) But the margin of error around our estimate of how often a shortstop will make any single play is about 30 percent. (Notably the error is asymmetrical—obviously there is no chance of a shortstop making a negative play, even if in exasperation I may have accused Alex Gonzalez of it during the ’03 playoffs.)
To attribute that margin of error over a number of chances, we divide it by the square root of the number of balls in play (BIP).
What’s interesting about this is that the margin of error per BIP drops as we observe more BIP. So, after 100 BIP, the margin of error for any one play drops all the way to three percent (30 percent divided by √100).
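The scaling can be sketched in a few lines of Python. This is a minimal sketch, assuming the per-play error shrinks with the square root of the number of balls in play, as the 30-percent-to-three-percent example implies; the function names are hypothetical.

```python
from math import sqrt


def margin_per_bip(per_play_error, n_bip):
    """Margin of error for the average play after n_bip balls in play."""
    return per_play_error / sqrt(n_bip)


def margin_in_plays(per_play_error, n_bip):
    """Margin of error on a total count of plays over n_bip balls in play."""
    return per_play_error * sqrt(n_bip)


for n in (1, 100, 2500):
    print(n, round(margin_per_bip(0.30, n), 4), round(margin_in_plays(0.30, n), 1))
```

The margin on a fielder's total grows only with √n while the total itself grows with n, so the error shrinks relative to what is being measured; that is the sense in which enough data plows through uncertainty.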
(That’s why, to me, uncertainty is preferable to bias—with enough statistical power, we can plow through uncertainty readily. Without an accounting of what the bias is, we’re essentially powerless against it.)
After taking you all this way, surely I wouldn’t leave you without something to look at, would I? Here are the top 10 seasons by a shortstop since 1950, according to our new fielding metric:
I’ve provided a tentative conversion of plays to runs, although it still needs a little work. Note, for instance—Ozzie Guillen is being credited with about 73 plays above the average shortstop for 1988. That’s pretty impressive. It’s also pretty imprecise, with a margin of error around 20 plays.
What’s important to note is that the error is not symmetrical—we think there’s practically no chance that Guillen really made over 90 plays above average, for instance.
So, on a single-season level, we see some quizzical results. (Brendan Ryan? Really?) The important thing to remember is—we aren’t very confident in those results! Our confidence increases as we move to the career level, though:
It isn’t to say there’s no uncertainty. We can say, given the statistical evidence we have at hand, there’s a small (but not impossible) chance that Mark Belanger saved more runs compared to the average shortstop than Ozzie Smith did. And after that, well, nobody else is in the running.
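To illustrate how such a "small but not impossible" chance can be put into numbers, here is a sketch that treats each career estimate as an independent normal distribution, with its margin of error used as the standard deviation. That is a simplifying assumption swapped in for illustration (the text notes the error is actually asymmetric), not necessarily how the comparison above was made, and the figures are hypothetical, not the real values for Belanger or Smith.

```python
from math import erf, sqrt


def prob_a_exceeds_b(mean_a, sd_a, mean_b, sd_b):
    """P(A > B) when A and B are independent, normally distributed estimates."""
    diff_mean = mean_a - mean_b
    diff_sd = sqrt(sd_a ** 2 + sd_b ** 2)
    # Standard normal CDF of diff_mean / diff_sd, written with math.erf.
    return 0.5 * (1 + erf(diff_mean / (diff_sd * sqrt(2))))


# Hypothetical career figures (runs above average, margin of error); NOT real values.
fielder_a = (200, 40)
fielder_b = (300, 45)
p = prob_a_exceeds_b(*fielder_a, *fielder_b)
print(f"P(A saved more runs than B) = {p:.3f}")
```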
And nobody has really disputed how good Ozzie Smith was—but other metrics haven’t fully captured the magnitude of it. Our own FRAA, for instance, gives Smith 266 runs above average. Sean Smith’s TotalZone says 239 runs above average. In reality, Ozzie was better than that—a lot better.
Well, obviously I have to produce outfielder measurements as well. And there are probably still some tweaks to be made to this system that could improve it.
But past that—these values cannot simply be used in place of FRAA to calculate WARP the way we’re doing it now. We have this measure of uncertainty. We can similarly compute uncertainty for our offensive metrics (and it’s quite a bit smaller on a per-play basis). We cannot, in coming up with a single value to express a player’s season, add defense to offense as though we are equally certain of both.
So we’re going to be revising WARP to account for this uncertainty. Along the way, we’ll be adding some other enhancements to WARP as well. And we’ll be looking at pitching—after all, a lot of what we’ve always thought was pitching is fielding, isn’t it? And so any uncertainty we’ve had in measuring fielding spills over into pitching as well.
So, consider this a beginning, not an end.
Notes and Asides
I should give a nod to Bill James’ Fielding Win Shares, which served as an inspiration for some of my efforts here. I should also give a nod to the work Sean Smith has done on TotalZone, which was also something I spent a lot of time thinking about.
For some discussion on the spread of defense, I cannot recommend these enough: