The basic problem with trying to evaluate fielding performance is that it’s too complicated.
The greatest change in baseball thought over the past 20 years has been the shift of focus from one offensive statistic (number of hits / number of trips to the plate that did not result in a walk) to a better one (number of times reached base / number of trips to the plate). Granted, I realize that I’m omitting sacrifice flies, hit batsmen, and catcher’s interference there, but that’s the essence of batting average and on-base percentage. If you knew only on-base percentage, you’d do pretty well comparing players.
Unfortunately, there is no easy way to do this with fielding statistics, and as a result our eyes, our instincts, and what we read often disagree. I’ve been trying to educate myself on fielding statistics for the last couple of years, and I want to admit up front that I have not been able to reconcile them with my own evaluation. When I see Mike Cameron rated as a slightly above-average center fielder last year, I roll my eyes, because I have in my head a mental image of how far he can go to get a ball: a massive expanse few visiting outfielders can cover. The issue, though, is that this mental image is neither an accurate picture nor particularly useful in evaluation.
The first unit of mainstream fielding evaluation is the error. A player gets an error when he makes an obvious gaffe that results in a runner advancing. With this knowledge, raw errors can be compared: Orlando Cabrera led the major league shortstops last year with 29 errors, while Mike Bordick only made one.
That said, there are a few obvious problems with errors. First, they are the subjective judgment of that game’s official scorer. For an error to be awarded, the scorer must believe that an ordinary effort would have gotten the runner out (or kept the runner from taking the extra base). Hitters want to be credited with hits, while fielders don’t want to be charged with errors. This has led players and teams to lobby scorers directly on decisions, and because the scorer is team-employed rather than a member of the umpiring crew, there’s a bias, conscious or not. It also requires the scorer to judge what an average player at that position would have done. If you’re the home-town scorer for a team with a bad shortstop, half the shortstop plays you see are going to be his; he’s going to drag down the average you’re scoring against, no matter how you resist. It’s the nature of perception.
Second, it’s a counting statistic, like home runs. Boston’s Lou Merloni played an inning in left and made no errors. Does that mean he was a better left fielder than Jacque Jones, who made five errors while playing there regularly all season? Hell no.
The error is also somewhat team-dependent: a shortstop with a slick glove man at first base will be charged with fewer errors, because fewer of his errant throws will get past the first baseman, advance the runner, and go into the books as errors on the shortstop.
To turn errors into a rate statistic, we have fielding percentage, calculated as (assists + putouts) / (assists + putouts + errors). This introduces a couple of new counting stats. First, though, a plea for indulgence: as I talk about the different stats, here and elsewhere, I’m going to gloss over many of the messy details for the sake of easier reading and writing. I’m going to talk about putouts without mentioning interference, for instance, and assists without the rulebook wording about deflecting or slowing the ball.
Putouts are outs the fielder is directly involved in: he catches a fly ball, tags a runner, or catches a ball while touching a base for a force-out. Catchers are also credited with a putout on a strikeout.
Assists are awarded when a fielder’s throw results in a putout. For instance, a shortstop who fields a ground ball and throws the runner out at first is awarded an assist, and the first baseman gets a putout. On a six-four-three double play, the second baseman gets both a putout (taking the throw to force the runner at second) and an assist (throwing the batter out at first).
This sounds pretty good, but once again, it relies entirely on the error to judge how effective a shortstop is.
One-error Mike Bordick’s fielding percentage for 2002 was .998, while 29-error Orlando Cabrera’s was .962. To see why that’s a problem, imagine scoring hitters the same way. Say A-Rod comes to the plate ten times:
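The fielding percentage formula is easy to sketch in code. This is a minimal Python illustration; the function name and the season lines fed to it are hypothetical, chosen only to show how a pile of errors moves the number when everything else is held equal.

```python
def fielding_percentage(putouts: int, assists: int, errors: int) -> float:
    """(PO + A) / (PO + A + E): successful chances over total chances."""
    chances = putouts + assists + errors
    if chances == 0:
        raise ValueError("player had no chances")
    return (putouts + assists) / chances


# Hypothetical season lines: identical putouts and assists, different errors.
steady = fielding_percentage(putouts=200, assists=400, errors=1)
shaky = fielding_percentage(putouts=200, assists=400, errors=29)
print(f"steady: {steady:.3f}, shaky: {shaky:.3f}")
```

Note that the number of plays a fielder makes never enters the picture except as padding for the denominator; only the errors distinguish the two lines.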
- Three times A-Rod gets a solid hit
- Five times A-Rod makes a normal out
- One time A-Rod ropes a double on a brilliant curve that only two, three guys could have hit
- One time A-Rod pops up on a hanging curve even Rey Ordonez would have gotten around on
That’s a .400 on-base percentage, right? Four hits, six outs, in ten chances. Not under the fielding-percentage system, though: he’d be charged with an error on that pop-up, the brilliant double would count as no more than another successful chance, and the five routine outs wouldn’t count at all. A-Rod’s “batting percentage” would be (4 hits) / (4 hits + 1 error) = .800. Meanwhile, Rey Ordonez, who goes 2-for-10 on the same day with no great hits and no mistakes, posts a (2 hits) / (2 hits + 0 errors) = 1.000 “batting percentage.”
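Here’s the same thought experiment as a quick Python sketch, using the ten imaginary plate appearances above. The “batting percentage” function is just fielding-percentage logic borrowed for hitters (a made-up stat, as in the text), and walks are left out, so on-base percentage is simply hits over trips to the plate.

```python
from fractions import Fraction

# The ten imaginary trips to the plate from the example above.
# "hit" covers the three solid hits and the brilliant double alike;
# "error" is the pop-up a scorer would charge under fielding-percentage rules.
arod = ["hit"] * 3 + ["out"] * 5 + ["hit"] + ["error"]
ordonez = ["hit"] * 2 + ["out"] * 8  # 2-for-10, no great hits, no mistakes


def on_base_pct(trips: list[str]) -> Fraction:
    """No walks in this example, so OBP is just hits over trips to the plate.
    The pop-up "error" is still an out here, so it stays in the denominator."""
    return Fraction(trips.count("hit"), len(trips))


def batting_percentage(trips: list[str]) -> Fraction:
    """Made-up stat: fielding-percentage logic applied to hitters.
    Routine outs vanish; only hits and "errors" count."""
    hits = trips.count("hit")
    return Fraction(hits, hits + trips.count("error"))


for name, trips in [("A-Rod", arod), ("Ordonez", ordonez)]:
    print(f"{name}: OBP {float(on_base_pct(trips)):.3f}, "
          f"batting pct {float(batting_percentage(trips)):.3f}")
```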
And you can’t build a house on a bad foundation. Where batting average depends a little on errors, fielding percentage depends entirely on errors, and errors aren’t a good statistic.
Enter Range Factor. It’s a statistic popularized by STATS, Inc. (the idea goes back to Bill James), designed to measure the number of plays a player successfully makes. I’ve seen it defined many ways, but the official STATS, Inc. method, from their glossary page, calculates Range Factor as ((assists + putouts) * 9) / (defensive innings played). Innings played is the number of outs recorded while the player is at that position, divided by three. Multiplying the assists and putouts by nine approximates the number of plays the player would make at the position over an entire game.
For example, if I play shortstop for an inning and I catch a pop-up and field a grounder that I throw to first for the out, my Range Factor is (1 + 1) * 9 / 1 = 18. I would be the best shortstop of all time if I kept that up. An average shortstop’s Range Factor is about 4.6.
Range Factor sometimes offers a dramatically different picture of players than fielding percentage. Mike Bordick remains near the top (second, at 5.07), but Colorado’s Juan Uribe jumps to fifth with 4.89. Derek Jeter, who made a modest 14 errors and ranks in the middle of the pack in fielding percentage, drops into a tie with Tony Womack for dead last, both with a Range Factor of just 3.81. Think about that: Juan Uribe makes about one more out a game than Derek Jeter.
Range Factor is flawed too, though in different ways than what we’ve seen so far. Because it measures the number of outs a player creates per unit of playing time, it depends on opportunity, and the number of balls a player sees can vary tremendously. Hypothetically, if I were the shortstop on a team with a staff of flyballers, while my identical (and totally illegal) clone Kered played behind a rotation of groundballers, my clone would make many more plays and post a much higher Range Factor than mine.
Which brings us back to our subjects. Mike Bordick played behind a staff with a ground ball to fly ball ratio of 1.31 in 2002, allowing 2,119 ground balls to 1,620 fly balls. Juan Uribe’s team had a similar G/F ratio of 1.32, allowing 2,108 grounders to 1,603 fly balls. But Jeter? The Yankee staff got far fewer ground balls than either the O’s or the Rockies did; their ratio was 1.15, as they gave up 1,907 ground balls and 1,655 fly balls in 2002.
With roughly 200 extra ground balls over a season, more than one a game, it’s not hard for a shortstop to see his Range Factor climb. Even if you figure those grounders are evenly distributed among the infielders, 50 extra grounders fielded would add about 0.3 to your Range Factor, and when the difference from best to worst is 1.4, that’s significant.
Can you adjust for the difference in opportunity? As a quick diversion, here’s how often a few shortstops made a play (PO + A) as a percentage of all their team’s ground balls:
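A minimal Python sketch of the STATS-style formula as described above; the function names are mine, and innings are derived from outs recorded at the position.

```python
def innings_from_outs(outs: int) -> float:
    """Defensive innings: outs recorded while at the position, divided by three."""
    return outs / 3


def range_factor(putouts: int, assists: int, innings: float) -> float:
    """STATS-style Range Factor: (PO + A) * 9 / defensive innings."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (putouts + assists) * 9 / innings


# One inning at short: a caught pop-up (putout) plus a grounder thrown to first (assist).
print(range_factor(putouts=1, assists=1, innings=innings_from_outs(3)))  # 18.0
```

The same function gives 6.0 for a full nine-inning game with four putouts and two assists.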
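The ratios, and the gap between the Orioles’ and Yankees’ staffs, are easy to check. This sketch only reuses the 2002 counts quoted above:

```python
# Ground balls and fly balls allowed in 2002, as quoted above.
staffs = {
    "Baltimore": (2119, 1620),
    "Colorado": (2108, 1603),
    "New York (AL)": (1907, 1655),
}

ratios = {team: ground / fly for team, (ground, fly) in staffs.items()}
for team, ratio in ratios.items():
    print(f"{team}: G/F = {ratio:.2f}")

extra_grounders = staffs["Baltimore"][0] - staffs["New York (AL)"][0]
print(f"Ground balls allowed, Baltimore minus New York: {extra_grounders}")
```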
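The back-of-the-envelope math from the last paragraph, sketched in Python. The even four-way split among infielders and the roughly 1,440 innings at shortstop (about 160 full games) are assumptions, and it treats every extra grounder as a play made:

```python
EXTRA_GROUNDERS = 212      # Baltimore minus New York, 2002 (from the text)
INFIELDERS = 4             # assumption: extra grounders split evenly
INNINGS_AT_SHORT = 1440.0  # assumption: about 160 full games at shortstop

# Assumes every one of the shortstop's extra grounders becomes a play (PO or A).
extra_plays = EXTRA_GROUNDERS / INFIELDERS
rf_bump = extra_plays * 9 / INNINGS_AT_SHORT  # same shape as the Range Factor formula

print(f"{extra_plays:.0f} extra plays -> Range Factor up about {rf_bump:.2f}")
```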
- 37% Alex Rodriguez, Texas
- 36% Juan Uribe, Colorado
- 33% David Eckstein, Anaheim
- 30% Derek Jeter, New York/A
- 27% Mike Bordick, Baltimore
Now, I have no idea how handy that is. It certainly doesn’t mean that Rodriguez fields 37% of all ground balls, since he accumulates putouts on balls that aren’t grounders as well. At best it’s a ballpark figure, and it’s heavily dependent on playing time. Bordick, for instance, appeared in only 115 games last season, while A-Rod played 160, a fact that drives down Bordick’s numbers tremendously. I could adjust for playing time, but I’ve got an article to finish here.
Back to Range Factor. It’s also affected to some extent by park effects, since a hitter’s park with little foul territory will produce fewer putouts for corner infielders, catchers, and, to a lesser extent, corner outfielders.
To put Range Factor to work, it’s best to compare a player to his teammates at the same position. Unfortunately, sample size gets in the way: if A-Rod plays 160 games at shortstop, there are almost no innings there by other players to compare his numbers against. Still, say you’re interested in whether a particular player has deteriorated over a couple of years in the same park. You can compare his numbers at that position against his backups’ and maybe start to draw some conclusions.
Range Factor’s not really too complicated, but on a game-by-game basis, I don’t look at my hand-scored game and say, “Well, looks like Guillen had a good day in the field today: four putouts and two assists, for a Range Factor of six.” That’s partly because it’s unlikely Carlos Guillen would have a game that good, but also because the scoresheet is offense-oriented (which is an entirely different issue). Still, if MLB keeps packing commercials into the half-innings, it’s something for me to do between hitting the can, buying more beer, and getting into trouble.
Of course, there’s the other problem: it’s still not a per-chance rate stat like batting average or OBP. It’s a purely aesthetic criterion, but it’s weird to explain to a newcomer that Barry Bonds gets on base a spectacular more-than-half the time, but in the field Derek Jeter makes under four plays per game.
Defensive Average (DA) and Zone Rating (ZR) are attempts to resolve some of these issues. Defensive Average grew out of Project Scoresheet and the Baseball Workshop, when widespread data on the location of every ball hit allowed people to assign responsibility for fielding each ball to a particular player. It then becomes possible to measure whether each player fielded the balls he was responsible for. Zone Rating allows for areas of the field where balls aren’t assigned to any player: areas not considered “fieldable,” essentially, where it’s no one’s fault if a ball drops for a hit.
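One crude way to do that playing-time adjustment, sketched in Python. It assumes a 162-game team season, that the team’s grounders were spread evenly across games, and that the player played complete games; none of that is strictly true, and the 27% and 37% inputs are already rounded, so treat the output as a rough illustration at best:

```python
def share_of_team_grounders(putouts: int, assists: int, team_ground_balls: int) -> float:
    """Plays made (PO + A) as a fraction of all ground balls the team allowed.
    Like the list above, this counts putouts on non-grounders too."""
    return (putouts + assists) / team_ground_balls


def prorate_for_playing_time(share: float, games_played: int, team_games: int = 162) -> float:
    """Crude adjustment: scale up as if the player had played every team game.
    Assumes grounders were spread evenly across games and the player played
    complete games; neither is strictly true."""
    return share * team_games / games_played


# Rounded shares and games played from the text; 162 team games is an assumption.
for name, share, games in [("Bordick", 0.27, 115), ("Rodriguez", 0.37, 160)]:
    print(f"{name}: {share:.0%} raw, about {prorate_for_playing_time(share, games):.0%} prorated")
```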
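A sketch of that regular-versus-backups comparison. The `Stint` class and the season lines are hypothetical; the backups’ innings are pooled so that a backup with a handful of innings doesn’t count as much as one who played a month, though with samples this small the result is still shaky:

```python
from dataclasses import dataclass


@dataclass
class Stint:
    """One player's season line at a single position (hypothetical numbers)."""
    name: str
    putouts: int
    assists: int
    innings: float


def pooled_range_factor(stints: list[Stint]) -> float:
    """Range Factor over several stints, weighted by innings played."""
    plays = sum(s.putouts + s.assists for s in stints)
    innings = sum(s.innings for s in stints)
    return plays * 9 / innings


def gap_vs_backups(regular: Stint, backups: list[Stint]) -> float:
    """Regular's Range Factor minus his backups' pooled Range Factor,
    same team, same park, same position."""
    return pooled_range_factor([regular]) - pooled_range_factor(backups)


regular = Stint("Regular", putouts=220, assists=430, innings=1300)
backups = [
    Stint("Backup A", putouts=10, assists=25, innings=70),
    Stint("Backup B", putouts=5, assists=10, innings=40),
]
print(f"Regular vs. backups: {gap_vs_backups(regular, backups):+.2f} plays per nine innings")
```

Running the same comparison for each season in the same park is the sort of trend check the paragraph above describes.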
Both offer what should be the magic bullet: a percentage number we can use like batting average. Looking at STATS, Inc.’s Zone Rating, for instance, we can say that Alex Rodriguez fields 92% of balls hit into his zone, Mike Bordick 90%, with Juan Uribe taking a huge hit to 84%. And Jeter? Second-to-last with 80%, ahead only of Tony Womack at 76%.
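At its core, a zone-based rating is a simple rate. This minimal sketch uses hypothetical counts and covers only balls hit into the zone; how STATS, Inc. actually credits plays made outside the zone is not modeled here:

```python
def zone_rating(outs_in_zone: int, balls_in_zone: int) -> float:
    """Fraction of balls hit into a fielder's assigned zone that he turned into outs.
    Simplified: ignores how plays made outside the zone are credited."""
    if balls_in_zone == 0:
        raise ValueError("no balls hit into the zone")
    return outs_in_zone / balls_in_zone


# Hypothetical counts: 368 outs on 400 balls in the zone.
print(f"{zone_rating(368, 400):.0%}")
```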
Defensive Average and Zone Rating both have deficiencies, though. For one, someone has to record where each ball landed, and that’s done by humans marking it down on a sheet from some terrible seat in the stands. Plus, the areas of responsibility aren’t defined differently for each park, and that opens up a whole set of other problems. They’re defined radially from home plate, and every park is divided into slices the same way. This means that while the infield zones are the same size in every stadium, a park with a deep center field will have much more ground in its outfield zones than a shallow one, and a center fielder assigned to patrol that area has a much harder task ahead of him.
There’s also the issue of balls that can be taken by more than one fielder. Think of easy pop-ups. If the same outfielder gets all of them, his ratings go up while his buddies’ are depressed. One fielder’s gain is a teammate’s loss. After all, unlike the number of runs you can score or times you can get on base, the number of outs your team records on defense is finite: a bit under 4,400 per season (162 games times 27 outs is 4,374, give or take skipped and extra innings).
ZR may also have problems with positioning. If a team has really good data on where balls are hit against it (or, even better, against each pitcher and batter), it may adjust its positioning dramatically (think of the Bonds shift, for instance), while these zone-responsibility measures assume fielders stand around in a standard formation before the ball is hit. That well-positioned outfield might make many more plays in what is usually considered “unplayable” territory, while letting fewer hits into zones that would normally be covered. (Zone Rating does count outs made outside the area of responsibility, though, so shifting shouldn’t hurt a player’s ZR unless the shifts are counterproductive.) It also means a player at a position that doesn’t see many balls could get more credit for playing shifts, if the shifts mean he makes more outs. Now, you’re not going to see that happen much in games, but there it is.
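The geometry behind that last point, as a Python sketch. The 15-degree slice, the 250-foot inner edge of the zone, and the 400- versus 420-foot walls are hypothetical, and real walls aren’t neat arcs, but it shows how the same radial slice covers more ground in a deeper park:

```python
import math


def wedge_area(angle_deg: float, inner_ft: float, outer_ft: float) -> float:
    """Area (sq ft) of a slice of a ring centered on home plate:
    everything between two distances within a fixed angle."""
    theta = math.radians(angle_deg)
    return 0.5 * theta * (outer_ft ** 2 - inner_ft ** 2)


# Hypothetical center-field zone: a 15-degree slice starting 250 ft from home.
shallow_park = wedge_area(15, 250, 400)  # wall at 400 ft
deep_park = wedge_area(15, 250, 420)     # wall at 420 ft
print(f"Deep park zone is {deep_park / shallow_park:.2f}x the size")
```

Because area grows with the square of the distance, a 20-foot-deeper wall adds far more than 20 feet’s worth of ground to cover.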
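A toy illustration of the shared-ball problem in Python. It assumes, for illustration only, that a pop-up in the overlap counts as a chance for both outfielders but an out only for the one who catches it; all the numbers are hypothetical:

```python
def outfielder_rates(a_outs: int, a_chances: int, b_outs: int, b_chances: int,
                     shared: int, a_caught: int) -> tuple[float, float, int]:
    """Catch rates for two outfielders when `shared` easy pop-ups fall in both zones.

    Assumption (for illustration): each shared ball is a chance for both
    fielders but an out for only the one who catches it."""
    if not 0 <= a_caught <= shared:
        raise ValueError("a_caught must be between 0 and shared")
    b_caught = shared - a_caught
    rate_a = (a_outs + a_caught) / (a_chances + shared)
    rate_b = (b_outs + b_caught) / (b_chances + shared)
    team_outs = a_outs + b_outs + shared  # unchanged however the pop-ups are split
    return rate_a, rate_b, team_outs


# Hypothetical: identical fielders, 20 shared pop-ups.
for a_caught in (10, 20):
    ra, rb, outs = outfielder_rates(180, 200, 180, 200, shared=20, a_caught=a_caught)
    print(f"A catches {a_caught}: A {ra:.3f}, B {rb:.3f}, team outs {outs}")
```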
Fielding percentage, Range Factor, and Zone Rating are the three most-commonly available fielding measures. They all have problems, but together we can use them to draw rough conclusions. You can’t hide a truly awful or outstanding defender. Alex Rodriguez is the fourth-best regular shortstop in fielding percentage, ninth-best in Range Factor, and first in Zone Rating. You can rest assured he’s pretty good. Similarly, Tony Womack is third-worst in fielding percentage, tied for last in Range Factor, dead last in Zone Rating. Survey says: the man’s pretty awful with the glove.
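A rough way to formalize “use them together”: average a player’s ranks across the three measures and check the least extreme one. The ranks below come from the paragraph above (A-Rod counted from the top, Womack from the bottom); the averaging scheme itself is just one simple choice, not a standard method:

```python
from statistics import mean

# Ranks among regular shortstops, from the paragraph above.
# A-Rod's count from the top of the list; Womack's count from the bottom
# (tied for last in Range Factor is treated as 1st from the bottom).
arod_from_top = {"fielding pct": 4, "range factor": 9, "zone rating": 1}
womack_from_bottom = {"fielding pct": 3, "range factor": 1, "zone rating": 1}


def consensus(ranks: dict[str, int]) -> tuple[float, int]:
    """Average rank across the measures, plus the least extreme single rank.
    If even the least extreme rank is near the end of the list, the
    measures broadly agree."""
    return mean(ranks.values()), max(ranks.values())


for name, ranks in [("Rodriguez (from top)", arod_from_top),
                    ("Womack (from bottom)", womack_from_bottom)]:
    avg, least_extreme = consensus(ranks)
    print(f"{name}: average rank {avg:.1f}, least extreme {least_extreme}")
```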
And for the most part, that’s where the normal fan’s interest stops. People understand batting average, on-base percentage, and (for the most part) slugging. Does he get hits? Does he get on base? Does he hit for power? The three fielding stats ask similar questions: Does he make errors? Does he make many plays in a game? Does he make an out on a ball he should get to? Except that they don’t really answer those questions, not as well as we’d like.
I took last year’s pitching lines for every team along with their overall fielding ratings. It’s a small sample in one sense (only 30 teams), but it’s not awful considering each team contributes a full season of defensive chances. I looked for correlation between raw runs allowed and the different measures. Raw runs allowed isn’t a particularly clean target: a team like Arizona, with two dominating strikeout pitchers, is going to throw off your lines, but overall it’s worth looking at. And correlation, of course, is not causation.
Quick note on correlation: say you’ve got two series that rise and fall in perfect lockstep. You’ll get a correlation of 1. Or say you’ve got two series, and one counts down to zero from 100 as the other ascends (say, drinks bought versus cash in wallet on a Friday). You’ll get a correlation of -1. Two lists of random numbers will get you something close to 0. The correlation between batting average and runs scored is about .800; between OBP and runs scored, about .900.
Fielding percentage proves okay. I mean, sure, it sucks, but in raw form it has a correlation coefficient of -.398 with runs allowed, which is decent enough: if your team gets charged with more errors, it allows more runs. It’s a modest correlation. It makes me wonder, too, whether suggestions for improving the scoring of errors (adding a fifth man to the umpiring crew to serve as that game’s official scorer, or having the league employ the scorers and put them through training and peer review) would make errors respectable enough to be useful. Zone Rating came out at -.250, which is weak to modest: as Zone Rating goes up, runs allowed go down, sort of. Range Factor had almost no correlation at all, at -.060.
Based on my sketchy sketching, even the most-mocked of individual offensive statistics, batting average, correlates with run scoring about twice as strongly as the best fielding stat correlates with run prevention. In the course of writing this, I did a little further work trying to remove pitching- and baserunning-related events from the run totals, and while fielding percentage stayed about the same, the correlation for Zone Rating jumped considerably (to -.590, a respectable negative correlation).
While offensive stats don’t have much left to explain, there’s still an alphabet soup of additional offensive metrics chasing that last slice of run scoring: RC, RC/27, ISO, SEC, OPS, plus park-adjusted versions of all of these, each offering a slightly different picture of offensive contributions. In the same way, researchers trying to close the much larger gap between fielding and run prevention have started to separate pitching from fielding, and in doing so they have created ever-more-complicated formulas, which we’ll get to later this week.
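For anyone who wants to play along at home, here’s a plain Python version of the Pearson correlation coefficient, the usual measure behind numbers like these (I’m assuming that’s the flavor used here; Python 3.10+ also ships `statistics.correlation`). The drinks-and-cash data are made up to match the example:

```python
import math
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of the same length, at least 2 long")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


drinks = list(range(11))                 # 0 through 10 drinks bought
cash = [100 - 10 * d for d in drinks]    # wallet runs from $100 down to $0
rng = random.Random(0)
noise_a = [rng.random() for _ in range(10_000)]
noise_b = [rng.random() for _ in range(10_000)]

print(f"{pearson(drinks, drinks):+.3f}")    # matching series
print(f"{pearson(drinks, cash):+.3f}")      # one rises as the other falls
print(f"{pearson(noise_a, noise_b):+.3f}")  # two random lists: near zero
```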
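For readers new to the offensive side of the alphabet soup, a few of those metrics are simple enough to sketch. This Python example covers ISO, OPS, and the basic version of Runs Created; the batting line is hypothetical, and park adjustments and the fancier RC versions are left out:

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """A season batting line (the example numbers below are hypothetical)."""
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int
    hbp: int = 0
    sf: int = 0

    @property
    def total_bases(self) -> int:
        singles = self.h - self.doubles - self.triples - self.hr
        return singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr

    @property
    def avg(self) -> float:
        return self.h / self.ab

    @property
    def obp(self) -> float:
        return (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    @property
    def slg(self) -> float:
        return self.total_bases / self.ab

    @property
    def iso(self) -> float:
        """Isolated power: slugging minus batting average (extra bases per at-bat)."""
        return self.slg - self.avg

    @property
    def ops(self) -> float:
        """On-base plus slugging."""
        return self.obp + self.slg

    @property
    def runs_created(self) -> float:
        """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
        return (self.h + self.bb) * self.total_bases / (self.ab + self.bb)


line = BattingLine(ab=500, h=150, doubles=30, triples=2, hr=25, bb=70)
print(f"AVG {line.avg:.3f}  ISO {line.iso:.3f}  OPS {line.ops:.3f}  RC {line.runs_created:.1f}")
```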
Next Time: Advanced concepts.