April 17, 2012
Giving Difficult Plays Their Due
Believe it or not, most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.
Jon Bruschke is a professor at baseball powerhouse CSU Fullerton and is the webmaster for the DebateResults.com and asfbl.com websites. He appeared in the Emmy-nominated documentary RESOLVED. He was awarded the national debate Coach of the Year award in 2004.
I’d like to comment on the overall state of fielding statistics, starting with three observations. First, fielding is tremendously important to the outcome of games. If the name Voros McCracken doesn’t mean anything to you, your time would be best spent by setting this article aside and checking out his seminal work here. His key finding was that once a ball is in play the pitcher has very little to do with the outcome of the at-bat. My own best guess is that fielding works out to be responsible for about a third of the variability in how much a team wins, but serious baseball watchers can sit in front of virtually any game and identify three or four key defensive plays that turned the outcome. Home team up by one, bottom of the ninth, runners on second and third, liner to the gap: a sliding outfield catch of that ball has just as much to do with who wins or loses the game as hitting a clean, utterly un-catchable double in the same situation.
Second, as others (and especially BP’s Colin Wyers) have noted, fielding statistics are way, way, way behind pitching and hitting metrics. Thinking again of our play that could either be a game-winning double or a game-saving out, we have plenty of ways to measure the hit and the runs that it created, and virtually none to distinguish the defensive play from any other. In most places, it’s simply recorded as a putout, impossible to differentiate from the work the catcher did to receive a called third strike with the bases empty in the first inning.
Third, even the best of us, even the most quantoid-minded, live-in-a-basement-and-breathe-spreadsheets-not-oxygen of us, have a collective blind spot about this. We can sit and watch the games and fully appreciate the impact that fielding has had on the final score, and then pick up the box score the next day to check our fantasy points and forget what we had watched the night before. As soon as we bury our heads back into our beloved tables of numbers we instantly forget how the runs our pitchers “allowed” had as much to do with the players behind them as anything else and can’t recall how the “hits” our batters recorded depended on whether the ground balls between third and short were fielded by Derek Jeter or Ozzie Smith.
If you doubt the profound, all-encompassing nature of our collective blind spot, try this experiment: take any smart baseball person you know, bring up any topic of defense that has nothing to do with fielding percentage (catcher ERA, or outfield kills, whatever) and within five minutes they will always be talking about how players with greater range commit more errors. It won’t have anything to do with what you started talking about, it isn’t anything that everyone doesn’t already know, but they can’t not talk about it. Repeat the experiment on the floor of the SABR convention in front of 30 retired NASA scientists with math degrees; the result will always replicate exactly. And it won’t happen because they’re dumb, or can’t stick to the topic, or generally lack conversational skills. It’s because almost none of us can think about defense outside of the box of hits and errors.
The aforementioned Colin Wyers has taken these problems head-on, and when he launched his efforts in July of 2010, his first article dealt with “first principles. I mean really basic stuff.” What I’d like to do here is take up Colin’s fight, start with first principles and even more basic stuff, and see if there’s a way to come to grips with our collective blind spot.
My field is communication, and one of the first things we learn in grad school is that how we name the world changes what we are able to see. You might have heard of the Sapir-Whorf hypothesis, which in its most basic form is that the words we use change the way that we are able to perceive reality. We have a language that describes hits and errors, assists and putouts, and maybe zones. Are these the names we need? Do they describe the world on the diamond in a useful, complete, and informative way? My first-principle question is this: Just how many types of batted balls are there? What is it we need to know to be able to separate the good from the bad defenders? Do we need new categories, and if so, what are they?
I believe our collective blind spot was coded into the original defensive categories of hits and errors. Implicit in this scheme is that there are two basic types of batted balls. The first type is the routine ball that any fielder should turn into an out, and a fielder who botches one is charged with an error. The other type of ball in play is a clean hit, which the fielder can’t do anything about, and so no account is made of it.
In reality, and this is my big take-home observation, there is a group of batted balls that some players can turn into outs and others can’t, or that can be turned into outs only with exceptional effort. My term for them is “difficult plays.” The crazy thing is that the ability to make these plays is almost the only thing that separates good from bad fielders, but we never try to count the number of these plays directly.
What virtually every fielding statistic and measure since fielding percentage has done is to try to estimate (not count) how many of these difficult plays a fielder is making. If just knowing hits and errors were enough, fielding percentage would be a fine measure. But while our various efforts are all attempts to estimate the number of difficult plays, we are all pretty bad about clearly naming the difficult play and being explicit about our goal of trying to find it or guess how many a fielder had a “chance” to make. A quick review is in order.
The zone schemes, which Colin has taken to task elsewhere and for different reasons than I’m discussing here, quietly assume that if we know where the ball lands and how fast it’s going, we can then count how difficult the play is to convert to an out. In other words, they are trying to estimate the number of difficult plays based on zone information, and the hope is that over a lot of plays it will be more evident which players are making more of the difficult plays. If shortstop A converts 80 percent of hard ground balls to fielding zone 27B into outs, and shortstop B converts 60 percent, by the end of the year that difference will tell us who did a better job of making the tough outs.
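The shortstop comparison above can be sketched in a few lines of Python. This is only an illustration of the general idea behind zone schemes, not any published system: the play log, the `conversion_rates` helper, and the ten-ball sample are all made up for the example, and only the zone label is borrowed from the text.

```python
from collections import defaultdict

def conversion_rates(plays):
    """Return {(fielder, zone): fraction of batted balls converted to outs}.

    `plays` is an iterable of (fielder, zone, is_out) tuples. This layout is
    a hypothetical one chosen for the sketch, not a real data feed.
    """
    chances = defaultdict(int)
    outs = defaultdict(int)
    for fielder, zone, is_out in plays:
        key = (fielder, zone)
        chances[key] += 1
        if is_out:
            outs[key] += 1
    return {key: outs[key] / chances[key] for key in chances}

# Made-up log: ten hard ground balls to zone "27B" for each shortstop.
plays = (
    [("A", "27B", True)] * 8 + [("A", "27B", False)] * 2
    + [("B", "27B", True)] * 6 + [("B", "27B", False)] * 4
)
rates = conversion_rates(plays)
for (fielder, zone), rate in sorted(rates.items()):
    print(f"shortstop {fielder}, zone {zone}: {rate:.0%} converted")
```

Note what the sketch does not do: it never marks any single play as difficult. It only hopes that the gap between the two rates reflects who made more of the tough ones.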
Colin and I have taken a different approach; we have both started by measuring raw defensive productivity (roughly, the number of outs a defender is responsible for) and comparing it to something meaningful. Colin’s Fielding Runs Above Average divides them by the average number of outs a player at a given position should make (with lots of adjustments), which is certainly sensible. My most recent scheme, which appeared in the Spring 2012 issue of the Baseball Research Journal, divides them by the total defensive productivity of the team.
The system starts with two premises. First, McCracken is right, and once a ball is in play the pitcher no longer has much to do with the outcome of the play. That means that the differences between teams’ batting average on balls in play (BABIP) are largely due to fielding effort. Second, because we have good defense-independent pitching stats, it’s now possible to figure out what percentage of runs allowed can be attributed to pitching, and if we can do that then the remainder must be due to the fielding. Put together, those two things mean that even though we don’t have great measures of individual fielding yet, we do have very good measures of team fielding.
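For readers who want the first premise in concrete terms, here is a minimal sketch of a team-level BABIP comparison. It uses the commonly cited formula, (H - HR) / (AB - K - HR + SF), which may differ in detail from the version used in the Baseball Research Journal article. The two stat lines are invented for illustration and do not describe real teams.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play, using the commonly cited formula.

    All arguments are totals allowed by a team's pitchers and fielders.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play

# Invented opponent totals for two hypothetical teams that allowed the same
# number of balls in play but a different number of hits.
team_x = babip(hits=1400, home_runs=150, at_bats=5500, strikeouts=1100, sac_flies=50)
team_y = babip(hits=1500, home_runs=150, at_bats=5500, strikeouts=1100, sac_flies=50)

print(f"Team X BABIP allowed: {team_x:.3f}")
print(f"Team Y BABIP allowed: {team_y:.3f}")
print(f"Gap attributed (under the first premise) to fielding: {team_y - team_x:.3f}")
```

Under the first premise, most of that gap belongs to the fielders rather than the pitchers. The second premise, separating the pitching share of runs allowed, is not modeled here.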
That makes the remaining task to figure out which individual fielders are responsible for good or bad team defense. I take an approach almost identical to Colin’s (although he and I were working independently and unaware of each other’s work—great minds thinking alike, I suppose) and simply count the raw productivity: Who is responsible for the outs? Basically, fielders get credited for unassisted putouts and assists. Teams are given total points for the quality of their defense, and individual players on those teams are then given a share of the team’s points. The logic is quite similar to Bill James’ win shares, but this system is dividing up defensive performance and not wins. High scores go to players making a lot of outs on good defensive teams, low scores go to players making few outs on bad defensive teams.
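The sharing step can be sketched as a simple proportional split. This is a simplified illustration under stated assumptions, not the published method: the `out_shares` function, the 100-point team total, and the three stat lines are hypothetical, and the real system presumably includes adjustments that are not described here.

```python
def out_shares(team_points, player_outs):
    """Split a team's defensive points among players in proportion to outs.

    `player_outs` maps a player to unassisted putouts plus assists.
    Assumption for this sketch: a plain proportional split, no adjustments.
    """
    total_outs = sum(player_outs.values())
    if total_outs == 0:
        return {player: 0.0 for player in player_outs}
    return {
        player: team_points * outs / total_outs
        for player, outs in player_outs.items()
    }

# Hypothetical team worth 100 defensive points, with made-up out totals.
shares = out_shares(100, {"SS": 500, "2B": 300, "CF": 200})
for player, points in shares.items():
    print(f"{player}: {points:.1f} out shares")
```

Because each share scales with both the team total and the player's slice of the outs, the same 500 outs would be worth fewer points on a team with a lower defensive total.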
In my humble view, both Colin’s FRAA and my “out shares” have advantages over the zone systems. But, just like the zone systems, it’s worth remembering that we too are estimating the number of difficult plays a fielder makes. Presumably, you make more outs than an average player at your position by making more tough outs. Presumably, your team has a lower BABIP by making more tough outs. If all the defenders are doing is turning more routine plays into outs, Voros McCracken was totally wrong and all that matters are pitchers after all.
I think that a better direction, and one that might be necessary for us to move fielding stats out of the minors and into the big leagues with our batting and pitching stats, is to simply try to define what a difficult play is and count such plays. There are efforts in this direction. I proposed a system in volume 36 of the Baseball Research Journal, and James, John Dewan, and the other minds at Baseball Info Solutions have a scheme that records 58 “defensive misplays” and 27 “good plays”; the good plays are very similar to difficult plays converted to outs. BIS’ system is at least partially proprietary (though you can read about its results in the most recent edition of the Fielding Bible) and beyond the scope of the kind of data observers in the public sector can realistically collect, which limits the extent to which it can catch on among casual fans.
The inevitable bridge to cross is the objectivity/subjectivity divide. Ultimately, what makes a play a difficult-but-makeable one may be hard to pin down with great precision. But I’ll make two points here. First, simply being subjective is not the same as being arbitrary. Social scientists are very interested in the subjective interpretations that human beings give things, and they have developed an impressive array of techniques and measures to determine when a subjective decision is being made reliably (the interested can find more information in my BRJ 36 article). For example, how good-looking someone is would qualify as an undeniably subjective decision. But is Brad Pitt attractive? Catherine Zeta-Jones? Roseanne Barr? Rodney Dangerfield? Run those four faces past anyone with a minimal opinion, and it’s possible to come up with a pretty reliable ranking.
Second (and I’ll base this on the strangely large amount of philosophy of science reading I’ve been doing), none of our most high-end scientists thinks that objectivity is even possible. I’m talking the Stephen Hawking types. The advance of knowledge is no longer thought of as coming from objective rather than subjective observations; it’s thought of as coming from better rather than worse subjective observations. It’s a little esoteric, I know, but I think there is much to be gained from embracing a subjective decision and starting to talk about what makes a play difficult rather than throwing up our hands because it might be subjective. If what separates the good from the bad defenders is the ability to make difficult plays, then by God let’s figure out what we mean by a difficult play in the public sector rather than keep trying to estimate it with increasingly elaborate and uber-adjusted analyses.
In my opinion, what would reduce the size of our collective blind spot without too much extra effort is adding a single category to our current tally: the difficult play. Let it become an official statistic and put it in the box score. We could wake in the morning and say things like “Mario Mendoza was 0-for-4? Oh, but I see he robbed four hits last night, and one came with the winning run on third.”
The last issue for comment is labor. An official scorer would have to record the difficult play, and if not an official scorer, then someone working at a data service. But it’s a lot easier than James’ and Dewan’s 58-by-27 system, or figuring out which of 50-some-odd zones the ball landed in (or would have had it not been fielded). It might not require the scorer to do anything more than ask themselves, “Had that play not been made, would I have ruled it a hit?” It would suffer from its own biases, but it still might just tell more than all that other coding combined.
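To show how little bookkeeping the scorer's question would add, here is a minimal sketch of the tally. The record layout and the `tally_difficult_plays` name are hypothetical; the only rule encoded is the one proposed above, that an out counts as a difficult play when the scorer would have ruled a hit had it not been made.

```python
from collections import Counter

def tally_difficult_plays(plays):
    """Count difficult plays per fielder.

    `plays` is an iterable of (fielder, out_made, hit_if_missed) tuples,
    where `hit_if_missed` is the scorer's subjective yes/no judgment.
    """
    counts = Counter()
    for fielder, out_made, hit_if_missed in plays:
        if out_made and hit_if_missed:
            counts[fielder] += 1
    return counts

# Made-up game log for one fielder: four robbed hits, three routine outs,
# and one ball that got through for a hit.
game = (
    [("Mendoza", True, True)] * 4
    + [("Mendoza", True, False)] * 3
    + [("Mendoza", False, True)]
)
box_score = tally_difficult_plays(game)
print(f"Mendoza, difficult plays: {box_score['Mendoza']}")
```

The scorer's judgment stays subjective. The code only counts what the scorer decides.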
I don’t mention the Sapir-Whorf hypothesis or the philosophy of science here simply to be pretentious (to be pretentious, I lecture undergrads on these topics). I mention them here because, more than a new metric or statistic or analysis or adjustment, what we need is a new way of thinking about defense. A new category. A new name. A new first principle. So let’s start thinking about that third class of batted ball, which is neither a clean hit nor a routine play, and see what we can come up with. I’ll bet it’s something pretty spectacular.