Remember a few weeks ago when Alex Gordon was leading the American League in WAR? No one questions that Gordon is having a(nother) really good season and should rightly get some down-ballot MVP votes, but the best player in the American League? People quickly noticed that a good chunk of Gordon’s WAR came from his defensive ratings, where, at the time, he was picking up roughly two wins worth of value in left field. Gordon’s regarded as a good left fielder, but “good left fielder” is also the “great personality” of fielding aficionados.
It’s been known for a long time that a single year’s defensive ratings, particularly for outfielders, aren’t a reliable indicator of a player’s talent level. They might accurately represent what he did in the past year, to the extent that the data sources we have available can do that, but they don’t tell us what he really is. Commonly, I hear “you need three years’ worth of fielding data to get a reliable sample.” That’s fine if we want to know how good a defender someone really is, but when we are trying to figure out questions like “Who had the best 2014 season among these 15 randomly selected teams?” it means that a good chunk of that value is based on a stat that could just be a mirage.
I often hear people at this point appeal to technology. Statcast, the much-discussed stalking mechanism, erm … tracking technology that will tell us exactly where everyone was at all times on a baseball field will save us. Earlier in the year, MLBAM put up this lovely teaser video of Jason Heyward making a fantastic catch to end a game from July 2013. Wanna watch?
Currently, even the best “advanced” defensive metrics are based on data sources that have a lot of holes. Stringers manually input where a ball landed. They make judgment calls on whether the ball was a line drive or a fly ball. There’s very little data on how long a ball was in the air or how fast it was hit. No one tracks where a fielder was and how far he had to run. The metrics do the best they can with what data are out there (and it’s a heck of a lot better than fielding percentage), but what if we had better data? Statcast is that data set.
Or is it? As we can see from the Heyward video, we can expect to have information on fly ball hang time, fielder positioning (and distance from the eventual landing place), fielder reaction time and foot speed, and the length of the route that he actually took to get to the ball and how efficient that route was. Surely, more refined data will make for a better fielding metric!
Warning! Gory Mathematical Details Ahead!
Let’s talk about why defensive metrics, particularly outfield ones, are unreliable in the first place by starting with a fantastically oversimplified model. Most of what outfielders do to earn their keep is track down fly balls. We’ll start with a fairly standard fly ball with four seconds of hang time that is currently somewhere in the middle of the outfield of your favorite stadium. It would be really helpful for the defense if the center fielder could go get that.
Now let’s assume that as soon as the ball is hit, the center fielder reacts immediately (not true, but we’ll talk about that in a minute), and that he runs in a straight line toward where the ball will land (also not true, but again, we’ll talk about that). If the ball is going to come down right where he was standing to begin with, any minimally competent MLB outfielder wearing a glove could have converted that into an out. There are also sections of the park such that no matter how much range a fielder has, there is no human being that can run that far in four seconds.
So there are balls that can be caught by even the worst outfielder and balls that can’t be caught by the best. Then there are the ones in the middle that can be caught by the good fielders, but not by the bad. In our oversimplified model, we can assume that our center fielder has an effective range in the shape of a circle. But how big is that circle? We don’t have explicit data on players’ range, but we can make a few reasonable estimates using a proxy: speed down the line from home to first. An 80 (elite) runner makes that trip of 90 feet from the right-handed batter’s box in four seconds flat (or less). A 50 (average) runner makes that trip in 4.3 seconds. Let’s just assume that those numbers hold in the outfield as well. On our four-second fly ball, the elite runner has an effective range of 90 feet, while someone running a 4.3 pace could cover only 83.7 feet.
This becomes a geometry problem. A circle with a radius of 90 feet has an area of 25,446.9 square feet. One with a radius of 83.7 feet has an area of 22,009 square feet. That means that the difference between an elite-range center fielder and just a guy is 3,437.9 square feet. Center fielders are generally chosen because they have some wheels, so while the 80 runner probably makes for an elite center fielder, the 50 runner is probably considered a poor one (again, leaving out reaction time and route running).
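For readers following along at home, the whole chain of arithmetic fits in a few lines of Python. The numbers are the ones above; the assumption that home-to-first speed carries over to the outfield is, again, part of the oversimplified model:

```python
import math

# Home-to-first times on the 20-80 scouting scale (right-handed batter's box)
ELITE_TIME = 4.0    # an 80 (elite) runner
AVERAGE_TIME = 4.3  # a 50 (average) runner
DISTANCE = 90.0     # feet from home to first
HANG_TIME = 4.0     # seconds our model fly ball stays in the air

def effective_range(home_to_first, hang_time=HANG_TIME):
    """Feet covered during the hang time, assuming down-the-line
    sprint speed carries over to the outfield."""
    speed = DISTANCE / home_to_first  # feet per second
    return speed * hang_time

elite_radius = effective_range(ELITE_TIME)      # 90.0 ft
average_radius = effective_range(AVERAGE_TIME)  # ~83.7 ft

elite_area = math.pi * elite_radius ** 2        # ~25,446.9 sq ft
average_area = math.pi * average_radius ** 2    # ~22,000 sq ft (22,009 above
                                                # comes from the rounded 83.7)
band = elite_area - average_area                # ~3,400 sq ft

print(f"elite radius: {elite_radius:.1f} ft, average: {average_radius:.1f} ft")
print(f"band between them: {band:.1f} sq ft "
      f"({band / average_area:.1%} of the average fielder's circle)")
```

That last ratio, roughly 16 percent, is the slice of the average fielder's coverage where speed alone separates the elite from the ordinary.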
Consider that your basic MLB park has a fair territory area somewhere around 110,000 square feet. If we are generous and say that the outfielders are only responsible for half of that area (55,000 square feet), and that the difference between good and bad outfielders in their effective ranges is similar at each spot (call it 3,500 square feet each), then 16 percent of outfield area falls into the category of balls that a really good outfielder would get to while a really bad one would not.
In 2013, center fielders handled an average of four balls per game that were classified as fly balls or line drives, whether they caught them on the fly or just picked them up after they fell to the ground for hits. That’s roughly 650 per season. If only 16 percent of those are in the area between the ranges of the good and the bad, then we’re talking about 100 or so fly balls. And that’s assuming that all balls hang in the air for a good four seconds. We’re probably talking about double-digit numbers of fly balls where there’s any chance for the good and bad to show who they really are.
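The ball-count arithmetic is just as quick. The four-per-game figure and the 16 percent share are the estimates from above:

```python
GAMES = 162
BALLS_PER_GAME = 4      # fly balls and line drives an average CF handled, 2013
BAND_SHARE = 0.16       # estimated share only the good fielders reach

season_balls = GAMES * BALLS_PER_GAME       # 648, i.e., "roughly 650"
discriminating = season_balls * BAND_SHARE  # ~104 balls that tell us anything
print(season_balls, round(discriminating))
```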
Let’s go back to some of the assumptions we made about range that were silly. Not all players have the same reaction time. There’s a good deal of scouting that looks at a fielder’s “first step.” Some people react more quickly than others. In general, it’s assumed that humans react to a visual stimulus (that ball … it is coming nigh to me!) within about 200 milliseconds, but we also see that there is considerable variation between people on this. We also know that some fielders take better routes than others (and with the new route efficiency stats, we ought to be able to prove it). With Statcast data, we ought to be able to put together a formula for how quickly, on average, each fielder reacts, how fast he runs, and how well he plans his routes, and from there figure out an effective range for each fielder. How much ground can Alex Gordon really cover? It’s just a matter of getting some math done.
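Nobody has published such a formula, so treat this as a sketch of what one might look like: a hypothetical effective_range function where reaction time and a Statcast-style route efficiency (straight-line distance divided by distance actually run) both shave feet off the raw speed-times-hang-time range. The specific parameter values are illustrative, not measured:

```python
def effective_range(sprint_speed, hang_time, reaction_time=0.2,
                    route_efficiency=1.0):
    """Feet of ground a fielder can cover before a fly ball lands.

    sprint_speed     -- feet per second
    reaction_time    -- seconds before he starts moving (~0.2 s is typical)
    route_efficiency -- straight-line distance / distance actually run
                        (1.0 is a perfect route)
    """
    running_time = max(hang_time - reaction_time, 0.0)
    return sprint_speed * running_time * route_efficiency

# A hypothetical average runner (90 ft / 4.3 s, about 20.9 ft/s) with a quick
# first step and clean routes vs. an elite runner (22.5 ft/s) with a slow
# first step and a sloppy route, on our four-second fly ball:
quick_average = effective_range(20.9, 4.0, reaction_time=0.15,
                                route_efficiency=0.98)
slow_elite = effective_range(22.5, 4.0, reaction_time=0.35,
                             route_efficiency=0.90)
```

Under these made-up but plausible inputs, the slower runner with the better jump and route actually covers more ground, which is exactly why breaking the components apart matters.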
Statcast promises to be a fantastic tool for figuring out what Alex Gordon’s or Jason Heyward’s true range is on your average fly ball. The ability to break apart reaction time from route efficiency from foot speed and even look at it directionally (how is Gordon when going to his left? His right?) will be a great boon to outfield coaches and talent evaluators. Unfortunately, it might not tell us what we actually want to know.
There are two issues that could torpedo our quest for a good reliable fielding metric. One is the issue of fielder positioning. If a fielder starts 70 feet from a ball, it’s a lot easier to catch than if it is 90 feet away. Undoubtedly, once these data are released into the wild, we’ll start to see multi-level analyses looking at whether certain fielders always seem to be nearer to fly balls (or grounders, for that matter) and whether those players tend to cluster on certain teams. Suppose that one team seems to position its players better than the others, regardless of whether the fielders then go on to make the catch. Should we credit the players on that team (or on those teams) for the catches they make or should some of that go to whatever system is telling them where to stand? What if Alex Gordon’s catches are all because the Royals are amazing at positioning?
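To see how much positioning matters, compare the 70-foot and 90-foot starting distances under the same oversimplified no-reaction-time model from above:

```python
ELITE_SPEED = 90 / 4.0   # 22.5 ft/s, our 80 runner
AVG_SPEED = 90 / 4.3     # ~20.9 ft/s, our 50 runner
HANG_TIME = 4.0          # seconds of hang time

def makes_catch(start_distance, speed, hang_time=HANG_TIME):
    """Can the fielder cover the distance before the ball lands?"""
    return start_distance <= speed * hang_time

print(makes_catch(90, ELITE_SPEED))  # True: right at the edge of elite range
print(makes_catch(90, AVG_SPEED))    # False: beyond the average fielder's reach
print(makes_catch(70, AVG_SPEED))    # True: smart positioning erases the gap
```

A 20-foot head start from good positioning makes the ordinary fielder indistinguishable from the elite one on this ball.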
But there’s another threat to a good, reliable fielding metric. Fielding is really rather noisy when you think about it. Consider for a moment that over four seconds of hang time, we estimate the difference between a good fielder and a poor one as roughly six feet of effective range. If for some reason our poor fielder had a few feet of head start, he would begin to look like an elite defender. How easy is it to get a few extra feet? Easier than you imagine, at least once in a while. Next time you are at a game, take a few moments and watch the outfielders in between pitches. Don’t watch the pitches. Watch the outfielders. They move around. Not a lot, mind you, but a lot of them fidget. I can’t say I blame them. As an outfielder, you stand there for minutes on end with nothing to do. And if you take a jump to the left or a step to the right, that’s a couple of feet of movement. Remember that the subset of fly balls that distinguishes good fielders from bad is very small. But suppose that on one of those, the bored fidgeting just happened to take our fielder in the right direction. Over a small sample size, it wouldn’t take more than a couple of (un)lucky strikes to bend the results one way or the other.
Even setting that aside, reaction time itself is highly variable within a person. We know that in the lab, well-rested people who have to perform a sustained-attention task are bound to have moments where their reaction time doubles (or more). Going from a base reaction time of 200 ms to 400 ms might not seem like much to the naked eye, but remember that 200 ms is the difference between a 50 runner down the line and a 70 runner. When someone is sleep deprived, the chances of a big lapse in reaction time go up further. These lapses happen randomly, and losing two tenths of a second is losing 5 percent of the time available to catch that four-second fly ball. It’s probably only a difference of a couple of feet, but we’ve seen that a couple of feet make a big difference. There’s a reason there’s a stimulant problem in baseball. Even slicing imperceptible amounts of time off reaction times in the field can have a big impact.
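In range terms, a single lapse looks like this for our elite runner on the four-second fly ball:

```python
SPEED = 22.5   # ft/s, the 80 runner
HANG = 4.0     # seconds of hang time

normal_range = SPEED * (HANG - 0.2)  # 85.5 ft with a typical 200 ms reaction
lapse_range = SPEED * (HANG - 0.4)   # 81.0 ft when the reaction time doubles
print(normal_range - lapse_range)    # ~4.5 ft lost, most of the gap between
                                     # our elite and average fielders
```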
Fielding, especially in the outfield, is a very statistically noisy process when you break it down. No wonder we have trouble coming to a good consensus on how much value an outfielder added. He might be a very different outfielder from play to play and there are only a small handful of plays over the course of a season that will help us to distinguish who is good and who isn’t. That’s a recipe for a very unreliable metric.
Why Won’t Statcast Solve Our Problems?
Statcast will give us plenty of information on players and individual batted balls. We’ll probably be able to build much better models of how much a ball in the air hit with X distance at Y angle is worth in expected value. We’ll probably have a better idea of the overall true talent of individual players. But the question that we want to know, at least as it pertains to WAR, is “What did Alex Gordon do in 2014?” Did he catch those balls or not? The fact that making or missing those catches might have had more to do with random luck than skill isn’t as important; WAR records what happened.
In theory, something that is very luck-driven can simply count on the law of large numbers to equalize that random variance and bleed it out of the measure. But we’ve seen that over the course of a year, we’re only going to get a double-digit sample of balls that actually matter. Worse, because the difference between the ball being caught and falling for a hit in terms of value is something on the order of three quarters of a run, we can expect big swings in value even based on a couple of lucky catches or unlucky misses. It’s entirely possible that Alex Gordon really is a +20 defender in left field. It’s also possible that he’s really +10 and having a good year in the luck department. Or an average defender having an amazingly lucky year.
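The run math, using the three-quarters-of-a-run figure above and the standard sabermetric rule of thumb that about 10 runs equal one win:

```python
RUN_SWING = 0.75     # run value of a catch vs. a ball falling for a hit
RUNS_PER_WIN = 10.0  # standard rule of thumb

for extra_catches in (3, 5, 10):
    runs = extra_catches * RUN_SWING
    print(f"{extra_catches} lucky catches ~ {runs:.2f} runs "
          f"~ {runs / RUNS_PER_WIN:.2f} wins")
```

Five fortunate bounces is nearly four runs, close to half a win of defensive value that tells us nothing about talent.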
Sometimes the answer isn’t more granular data. The best answer is “Um, it’s really hard to measure this in the allotted time. Sorry.” I wouldn’t suggest giving up, but the reality is that we’re just going to have to live with the uncertainty. We’ve known that WAR has some margin of error that comes built in. I just don’t want people to get their hopes up that there’s something on the horizon that can save us. Outfield defense is just really hard to pin down.