August 4, 2010
By Land, Sea, and Air
Last week, I began to lay out the foundations for evaluating a player’s defensive contributions without relying on subjective evaluations. Let’s try to move along and at least finish up with the basics, and maybe find some areas that we can improve upon later.
As a reminder, all leaderboards are presented from 1952 onward. For players who precede the Retrosheet play-by-play era, this technique is sadly not usable. At a later date, we’ll look at adapting what this teaches us to a non-play-by-play defensive metric.
The park factors here are simple. They could be more complicated, but I thought erring toward simplicity was the correct move.
Right now, I am using single-year park factors, figured by taking a team’s performance (as well as its opponents’) at home divided by performance on the road. These figures are then regressed roughly 40 percent to the mean. (That figure is computed separately for each type of out: grounder, infield air ball, and outfield air ball, as well as for a team’s total plays made.)
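As a rough illustration of the park-factor calculation described above, here is a minimal Python sketch. The function names, the example rates, and the neutral mean of 1.0 are assumptions made for illustration; only the home-over-road ratio and the roughly 40 percent regression come from the text.

```python
def raw_park_factor(home_rate: float, road_rate: float) -> float:
    """Ratio of a rate in a team's home games to the same rate in its road games.

    Each rate is assumed to pool the team and its opponents, as in the text.
    """
    return home_rate / road_rate


def regress_to_mean(factor: float, regression: float = 0.40, mean: float = 1.0) -> float:
    """Pull a raw factor toward the mean by the given fraction (0.40 = 40 percent)."""
    return mean + (1.0 - regression) * (factor - mean)


# Hypothetical ground-ball out rates, not real data.
raw = raw_park_factor(home_rate=0.77, road_rate=0.70)
print(round(raw, 3), round(regress_to_mean(raw), 3))
```

Under these assumptions, a park that inflates a rate by 10 percent in the raw numbers ends up being treated as inflating it by about 6 percent.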
When computing the team ground-ball out and air-ball out rates, those figures are first park adjusted and then used to figure the team ground-ball and air-ball split. That way there’s no “double counting” for a team that plays in a park that favors ground-ball outs over air-ball outs (or vice versa).
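The order of operations matters here, so a small Python sketch may help. It assumes that "park adjusting" means dividing each out total by its park factor, which the text does not spell out; the function name and the numbers are hypothetical.

```python
def park_adjusted_gb_share(gb_outs: float, air_outs: float,
                           gb_factor: float, air_factor: float) -> float:
    """Ground-ball share of a team's outs after removing park effects.

    Assumption: dividing an out total by its park factor neutralizes the park.
    """
    adj_gb = gb_outs / gb_factor      # park-adjust ground-ball outs first
    adj_air = air_outs / air_factor   # park-adjust air-ball outs first
    return adj_gb / (adj_gb + adj_air)  # only then compute the split


# Hypothetical team in a park that favors ground-ball outs.
raw_share = 2120 / (2120 + 1960)
adjusted_share = park_adjusted_gb_share(2120, 1960, gb_factor=1.06, air_factor=0.98)
print(round(raw_share, 3), round(adjusted_share, 3))
```

In this made-up case the raw split leans toward ground balls only because of the park, and the adjusted split comes out even, so the park effect is not counted a second time through the split.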
Does this make the metric better? I think so, and we can actually test the proposition. By applying park factors in this way, we can (slightly) reduce the margin of error of our estimates of average plays made, with an average decline of about .01 for middle infielders. That’s not a big thing (and it doesn’t seem to make much of a difference for our career leaders), but it’s a real improvement and it’s measurable.
Further investigation may indicate that other sorts of park factors work better—maybe ones that apply a different adjustment to shortstops than second basemen, for instance. Or using three-year factors instead of single-year factors. And these are things that we can test, and provide an objective basis for saying—OK, these park factors work better than these other park factors.
With pitchers, we cannot use the team’s average ground-ball/air-ball split to figure out a pitcher’s chances—instead, we have to use a pitcher’s own ground-ball and air-ball out rate to figure it.
Other than that, the pitcher is treated the same as any other infielder, for the purposes of computing his defensive rating. And when we look at the career leaders, what we see doesn’t necessarily surprise us all that much:
Oh look, Greg Maddux was good at fielding. Who would have guessed? What might surprise people is the magnitude. We may be overstating the run value of a play at pitcher a little here; right now I have a generic run value for all plays.
And, of course, we have an issue here in that we know Maddux made a lot more plays than the average pitcher, but we don’t know how many of those balls the other fielders may have made a play on if Maddux hadn’t. There’s a potential ball-hogging effect here, and that could be throwing off the results.
And another look at the career leaderboards, this time at the very bottom:
Jack Morris may have known how to win, but it doesn’t seem as though he knew how to field his position.
We’ve been tracking two kinds of air-ball plays this whole time: ones made by outfielders and ones made by infielders. And it turns out that, just as with ground-ball plays and air-ball plays, there’s a significant negative correlation between the two: Teams with more infield air outs make fewer outfield air outs, all else being equal. So, just as we adjusted for a team’s ground outs versus air outs, we apply the same adjustment process to compensate.
This brings up an interesting question—at least, I find it interesting. What do you do with a team’s defensive ratings on infield air-ball plays made? And let’s think about the nature of these sorts of plays. Some of them will be a snag of a line drive, but most will be popups. Now popups are the easiest plays to make, typically—if a ball stays up in the air that long, it’s almost always caught.
So now here’s the question: Do teams with more infield air-ball plays have more popups (that is to say, an easier set of air balls to catch), or are infielders “hogging” a larger share of air balls than the average team?
As I see it, in neither case does the infielder himself deserve much credit for catching the popup—that is to say, the ball was essentially a sure out the moment it left the bat. For a skied ball like that hit to the outfield grass, if the infielder doesn’t go back and get it, the outfielder will surely come in and get it.
So if it’s simply a question of ball hogging, then our adjustments should settle the matter: the outfielders are credited in a way that adjusts for the opportunities stolen by the infielders. But if it’s a matter of the batted-ball distribution, shouldn’t the pitcher be credited for getting more popups? This isn’t a question that can be solved through recourse to the batted-ball data, of course. The way the batted-ball types are defined requires that all popups are balls caught by infielders, so the data cannot tell us whether it’s a function of ball hogging.
But let’s move on to outfielders for a moment. A look at the career leaders in center field:
Now somebody is going to look at that chart and go, “Are you trying to tell me that Paul Blair was a better defensive center fielder than Willie Mays?” And I think we should be open to the possibility that it’s so. (For what it’s worth, Blair certainly was well-regarded as a defensive center fielder, winning eight Gold Gloves.) But we should emphasize that this isn’t a conclusion that we’re particularly confident in.
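What we know is that Blair made a little over 103 more plays above average than Mays did, given our estimate of average plays made. Now let’s combine our margins of error for each. Because of the way that standard error scales, we can’t simply add them; we square each one, sum the squares, and take the square root:
$$SE_{combined} = \sqrt{SE_{Blair}^2 + SE_{Mays}^2}$$
Then we take the difference and divide by the combined standard error to get the z-score:
$$z = \frac{\text{plays}_{Blair} - \text{plays}_{Mays}}{SE_{combined}}$$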
A z-score of one would indicate 68 percent confidence. We’re a little more confident than that—about 72 percent. That’s certainly not nothing, but it falls well short of 100 percent confidence. So two reasonable people could look at all the available evidence, and come down on different sides of who the best defensive center fielder was. And that’s fine. Once we’re honest with ourselves over our confidence level in how well we can measure things, we realize that while the numbers can start a conversation, on their own they can rarely finish one.
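For readers who want to try the comparison themselves, here is a minimal Python sketch of the steps. The two standard errors are placeholder values chosen only for illustration, not the figures behind the article’s result; the 103-play difference is from the text, and the function names are my own. It assumes the two estimates are independent and that "confidence" means the share of a normal distribution within z standard errors on either side of the mean, which is what makes a z-score of one correspond to 68 percent.

```python
import math


def combined_standard_error(se_a: float, se_b: float) -> float:
    """Standard error of a difference of two independent estimates."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


def z_score(difference: float, se_combined: float) -> float:
    """Difference expressed in units of the combined standard error."""
    return difference / se_combined


def two_sided_confidence(z: float) -> float:
    """Share of a standard normal distribution within +/- z of the mean."""
    return math.erf(abs(z) / math.sqrt(2.0))


# Placeholder standard errors (hypothetical), with the 103-play gap from the text.
se = combined_standard_error(60.0, 80.0)
z = z_score(103.0, se)
print(round(se, 1), round(z, 2), round(two_sided_confidence(z), 2))
```

Swapping in the actual standard errors for the two players would reproduce the kind of figure quoted above; the point of the sketch is only that the errors combine as a root sum of squares rather than a plain sum.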
One of the things that has occurred to me as I looked at how we measure defense is—to the extent that we’re uncertain about how to measure fielding, we’re uncertain about how to measure certain aspects of pitching as well.
This isn’t to repudiate the great work Voros McCracken did with Defense Independent Pitching; it’s probably the single most groundbreaking finding in the history of sabermetrics. But the central finding that “there is little if any difference among major-league pitchers in their ability to prevent hits on balls hit in the field of play” doesn’t mean we can’t go looking for those little differences. And one of those little differences seems to be in how well pitchers field their position. And we can now start incorporating that into how we evaluate pitchers.
But next up, I’m going to look at evaluating offense. I don’t think there’s anywhere near as much ground to break on offense as there is defense. But we certainly know there is some margin of error in calculating a player’s offensive value as well. I’ll let a little bit of the cat out of the bag early: There’s a lot less than in measuring defense. And there are some interesting implications of that—but we’ll save that for another day.