

July 28, 2010

Manufactured Runs: Looking Farther Afield

I have been making something of a ruckus recently about where I feel the state of current defensive analysis is. I have been long on listing problems, and short on proposing solutions. Well, allow me to make amends there. I don’t pretend to have the problem solved. I’m not sure any of us will ever see it truly solved. But I think—or at least, hope—this can point us in the right direction.

The Two Problems

We can really subdivide our problems neatly in two. One is the issue of bias, the other of uncertainty. Let us start with the latter. What we are trying to do here is measure, and then compare, two things:
The first, we all think we can measure directly—given the record, we can readily come up with a total. We may have some disagreement over what to count, but if we agree on what we should be counting, we can come to an agreement. The second is an estimate and, as such, is subject to error. Over time, the error in our estimate should come down (as a proportion of our estimate, that is).

Now, what modern defensive metrics (ones based on observational data, like batted-ball types, hit locations, etc.) are trying to do is to cut down on the effects of measurement error on our estimate of plays made by an average player. By attempting to reduce measurement error, however, those metrics have introduced the potential for bias into their estimates. The two key biases are:
So what we have is some presumption of increased accuracy, in exchange for additional bias. What we do not know, as of yet, is how much accuracy we are gaining, at the expense of how much bias. And I think that’s an important thing to know—if your gain in accuracy is less than the amount of bias you’re introducing, you haven’t actually gotten better, you’ve gotten worse.

And we know how to solve the accuracy problem—get more data! Over a long enough timeline, the estimates will improve on their own. Adding more data doesn’t make bias any better, though—in fact, over time, the effect of bias becomes more powerful.

Just the Facts, Ma’am

So let’s take a different approach. Let’s try to design a fielding metric with no bias—or, at least, attempt to minimize the effect of bias. What we can do is:
Over time, the potential inaccuracies of our data should wash out, and because we think we are minimizing our potential for bias, over a long period of time we should be able to be confident of our measure of a fielder’s ability.

Figuring Plays Made

Looking at play-by-play data available from Retrosheet, we can start off with counting the plays a player actually made on the field. Ideally, what we would do is separate the fielding of balls hit on the ground (OK, OK—ground balls) from balls hit in the air (pop-ups, liners, and fly balls). But we’ve already committed to not using that sort of data. Is there anything we can do, simply looking at facts, to determine what sort of plays a player made?

For outfielders, it’s a simple matter. We just count an outfielder’s unassisted putouts as his plays made. (His assists we can examine separately at a later date.) For an infielder, how are we to determine whether he caught the ball on the fly or fielded it on the ground, without resorting to batted-ball categorizations? It’s simple (if a bit messy for first basemen and pitchers):
So this gives us, at the team level, outs on the ground versus outs in the air. And what we see is a strong negative relationship between ground plays and air plays, with a correlation of -0.77. So when a team makes a lot of ground-ball plays, the most likely explanation is that they saw a lot of ground balls.

So, let’s adjust for that. What we can do is look at how many plays a team made in total, compared to the average team, and then look at how many ground-ball plays a team made compared to how many air-ball plays they made. A team with superior ground-ball fielders will not only have more ground-ball plays but likely more plays made overall. So for a team that’s above-average on making ground-ball plays but below-average in making total plays, we “shift” the responsibility toward the ground-ball plays (in other words, inflate the number of ground-ball plays we think the team should have made, but deflate the number of air-ball plays we think the team should have made), while keeping the total number of plays we think the team should have made constant.

This is, for lack of a better term, our “ground-ball rate” adjustment. It’s a bit of a misnomer, because we ignore any scorer data on the number of ground balls a defense saw. And it is possible that including that scorer data could improve the process here as well. But for now, let’s err on the side of excluding that data.

Breaking Down the Fielders

What we do now is apply the process from above to individual fielders. As we did for teams, we break down outfielder plays, infielder plays on the ground, and infielder plays in the air. That tells us how many plays each fielder made. Then we look at each batted ball and estimate the likelihood that each fielder makes a play on it. The only data we are considering right now is the handedness of the batter who hit the ball. (For first basemen, we’re also considering whether or not they had to hold a runner at first.)
We aren’t considering who eventually fielded the ball, whether or not the ball was an out or a hit, etc. Why? Because the outcome of the batted ball is a potential source of bias. By giving up some accuracy in the short run, we allow truly great fielders to look truly great—otherwise, we artificially compact the spread of the impact of top fielders over time.

So we have our measure of plays made, and our estimate of chances. We can leave off there, at least for infielders (outfielders will require a bit more work, I’m afraid—and that will have to wait for another day). But we discussed uncertainty—can we at least try to measure it?

Ignore, at least for now, uncertainty about actual plays made—for first basemen and pitchers especially we do have some, but little enough that we can afford to at least set it aside for a while. But for our estimate of how many plays a fielder should have made, we know there is a margin of error. What we can do is calculate the uncertainty of our estimate per ball in play, and use that to figure our total uncertainty for any given player.

What I did is figure the root mean square error between the average number of plays made and the actual plays made, on an individual basis. For example: In 2009, with a right-handed hitter batting, a shortstop will make a play on a ball in play roughly 12 percent of the time. (For a left-handed batter, a shortstop will make a play on a ball in play roughly six percent of the time.) But the margin of error around our estimate of how often a shortstop will make any single play is about 30 percent. (Notably the error is asymmetrical—obviously there is no chance of a shortstop making a negative play, even if in exasperation I may have accused Alex Gonzalez of it during the ’03 playoffs.) To attribute that margin of error over a number of chances, we take:
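One way to read this step, consistent with the 100-BIP example that follows and with a reader's note that the margin of error is proportional to the square root of the number of balls in play, is to scale the per-play error by the square root of the number of chances. A minimal Python sketch under that assumption (and assuming errors on separate balls in play are independent) might look like this; the function names and the 400/200 split of balls in play are hypothetical, chosen only for illustration.

```python
import math

# Per-BIP play rates quoted in the article for shortstops in 2009,
# keyed by batter handedness.
SS_PLAY_RATE = {"R": 0.12, "L": 0.06}
# Margin of error on any single play, as quoted in the article.
PER_PLAY_ERROR = 0.30


def expected_plays(bip_by_hand):
    """Plays an average shortstop would make, given BIP counts by batter hand."""
    return sum(SS_PLAY_RATE[hand] * n for hand, n in bip_by_hand.items())


def total_margin(n_bip, per_play_error=PER_PLAY_ERROR):
    """Margin of error in plays over n_bip balls in play.

    Assumption: errors are independent, so they grow with sqrt(n_bip).
    """
    return per_play_error * math.sqrt(n_bip)


def per_bip_margin(n_bip, per_play_error=PER_PLAY_ERROR):
    """Margin of error per ball in play after n_bip balls in play."""
    return total_margin(n_bip, per_play_error) / n_bip


if __name__ == "__main__":
    chances = {"R": 400, "L": 200}  # hypothetical split, not real data
    n = sum(chances.values())
    print(f"expected plays: {expected_plays(chances):.1f}")
    print(f"margin of error: +/- {total_margin(n):.1f} plays")
    print(f"per-BIP margin after 100 BIP: {per_bip_margin(100):.3f}")
```

Under these assumptions, a 30 percent error on a single play works out to 3 percent per ball in play after 100 balls in play. A single symmetric figure like this does not capture the asymmetry of the error noted above.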
What’s interesting about this is that the margin of error per ball in play (BIP) drops, the more BIP we observe. So, after 100 BIP, the margin of error for any one play drops all the way to three percent. (That’s why, to me, uncertainty is preferable to bias—with enough statistical power, we can plow through uncertainty readily. Without an accounting of what the bias is, we’re essentially powerless against it.)

Some Examples

After taking you all this way, surely I wouldn’t leave you without something to look at, would I? Here are the top 10 seasons by a shortstop since 1950, according to our new fielding metric:
I’ve provided a tentative conversion of plays to runs, although it still needs a little work. Note, for instance—Ozzie Guillen is being credited with about 73 plays above the average shortstop for 1988. That’s pretty impressive. It’s also pretty imprecise, with a margin of error of around 20 plays. What’s important to note is that the error is not symmetrical—we think there’s practically no chance that Guillen really made over 90 plays above average, for instance.

So, on a single-season level, we see some quizzical results. (Brendan Ryan? Really?) The important thing to remember is—we aren’t very confident in those results! Our confidence increases as we move to the career level, though:
That isn’t to say there’s no uncertainty. We can say, given the statistical evidence we have at hand, there’s a small (but nonzero) chance that Mark Belanger saved more runs compared to the average shortstop than Ozzie Smith did. And after that, well, nobody else is in the running.

And nobody has really disputed how good Ozzie Smith was—but other metrics haven’t fully captured the magnitude of it. Our own FRAA, for instance, gives Smith 266 runs above average. Sean Smith’s TotalZone says 239 runs above average. In reality, Ozzie was better than that—a lot better.

What’s Next

Well, obviously I have to produce outfielder measurements as well. And there are probably still some tweaks to be made to this system that could improve it.

But past that—these values cannot simply be used in place of FRAA to calculate WARP the way we’re doing it now. We have this measure of uncertainty. We can similarly compute uncertainty for our offensive metrics (and it’s quite a bit smaller on a per-play basis). We cannot, in coming up with a single value to express a player’s season, add defense to offense as though we are equally certain of both.

So we’re going to be revising WARP to account for this uncertainty. Along the way, we’ll be adding some other enhancements to WARP as well. And we’ll be looking at pitching—after all, a lot of what we’ve always thought was pitching is fielding, isn’t it? And so any uncertainty we’ve had in measuring fielding spills over into pitching as well.

So, consider this a beginning, not an end.

Notes and Asides

I should give a nod to Bill James’ Fielding Win Shares, which served as an inspiration for some of my efforts here. I should also give a nod to the work Smith has done on TotalZone, which was also something I spent a lot of time thinking about. For some discussion on the spread of defense, I cannot recommend these enough:
Colin Wyers is an author of Baseball Prospectus. Follow @cwyers
21 comments have been left for this article.

Randy Brown (189) Great piece (again). Every time I see 'Colin Wyers' on the main page nowadays, I make a beeline for that article. Jul 28, 2010 07:54 AM

Mike Fast (4387) As Colin mentioned, the margin of error is proportional to the square root of the number of balls in play. The margin of error as a percentage, then, is proportional to the inverse of the square root of the number of balls in play. Jul 28, 2010 08:27 AM

Mike Fast (4387) Excellent work, Colin! Jul 28, 2010 08:32 AM

Mike Fast (4387) Colin, as I mentioned on Twitter, can you use these numbers to estimate the magnitude of range bias for various advanced fielding systems (and at various positions)? Over a large sample of players, the park-scorer bias should become much less important. Jul 28, 2010 08:56 AM

ScottyB (23917) YAY!!!!!!! I've long thought that more sports analysis needs to include some sort of margin for error or measure of variance. Thank you for doing so (it warms this academic researcher's heart!) Jul 28, 2010 09:07 AM

ScottyB (23917) For example, placing 95th percentile ranges around such stats as MORP would be informative. Hypothetically, Ben Sheets may have a $6M MORP with a WIDE range while, oh... Mike Pelfrey may have a similar MORP with low variance. Jul 28, 2010 09:15 AM

baserip4 (44653) Excellent work, but clearly incorrect since it does not find that Omar Vizquel is the greatest shortstop of all-time. Jul 28, 2010 09:21 AM

Darsox64 (10662) Perhaps I'm misunderstanding the methodology, but I don't understand why one should expect that "calculating the uncertainty of our estimate per ball in play, and use that to figure our total uncertainty for any given player" will give you the correct distribution of the error. Unless you assume that, in the sample size of five or ten years or the length of a career, the error will begin converging to an easy-to-play-with distribution (i.e. one without too much kurtosis), your assertions about the error term don't have a very strong foundation. Jul 28, 2010 10:40 AM

Well, it's an interesting question: are there persistent factors that would keep a fielder's chances from converging over time? Jul 28, 2010 12:08 PM

Darsox64 (10662) I wasn't saying it's not going to converge eventually, but we don't know when that eventually is. At a sample size of infinity, a method like yours will dominate methods that require biases and judgment calls. Before infinity, it is completely possible (probable) that making adjustments which can introduce range or park effect biases will give us better estimators. Jul 28, 2010 14:48 PM

studes (280) I have a stupid question and/or comment. With the "ground ball adjustment," we're hurting above-average infielders who are paired with below-average outfielders, right? And vice versa? Jul 28, 2010 12:41 PM

studes (280) By the way, I do see that you addressed that in your article. I guess I'm just pulling it out a bit more. Jul 28, 2010 12:43 PM

I hardly think it's a stupid question. I think it's a really interesting one. Jul 28, 2010 12:51 PM

studes (280) Big huzzahs for introducing margins of error, by the way. Love it. Another stupid question, though: what range does the margin of error represent? 99% of potential outcomes? Jul 28, 2010 12:49 PM

Mike Fast (4387) It's one standard deviation, which is about 68% of potential outcomes, right? Jul 28, 2010 13:26 PM

studes (280) Yes, that's 68% (34% on either side). Two standard deviations are 95% and three are 99%. Jul 28, 2010 14:20 PM

Tommy Bennett (15654) Those percentages are true if and only if the distribution of error is normal. Jul 28, 2010 14:29 PM

studes (280) Yah, good point. I have no idea how to apply standard deviations to non-normal distributions. Think it substantially changes the 68%? Jul 28, 2010 18:13 PM

Tommy Bennett (15654) Well, you can state the minimum amount included within a given number of standard deviations of the mean using Chebyshev's inequality. It has the benefit of applying even to non-normal distributions. However, it's pretty conservative and usually is an underestimate so is of limited useful value. Jul 28, 2010 21:13 PM

Thanks for the great work. I'd love to see the complete list for infielders, but I just have to know what your system thinks of Nick Punto. Is Ron Gardenhire's deep and abiding love for Punto's get-after-it-ness warranted?