September 14, 2010
Missing the WAR
The end of the season brings with it a lot of miserable things. It brings about the playoffs (and trust me, for a Cubs fan, that’s about as miserable as it gets), and soon thereafter the end of baseball altogether for the season (well, this may be more miserable). It also brings with it awards voting season.
Zounds! Awards voting season. Rarely is so much passion devoted to so little meaning—if by rarely you mean “every year, like clockwork.” It’s important to remember that, except nobody ever does. It’s as if every year September comes around and everyone is a tabula rasa—every argument starts over as if we’ve never been through this before.
Which would be fine, if these were interesting or stimulating debates, but most of them aren’t. Perhaps the most interesting thing you can learn from these discussions is the staggering number of people who can simultaneously believe “too much attention is paid to statistics in baseball” and “so-and-so is obviously the winner because of his impressive [HR | RBI | Win] totals.”
The awards themselves are of little help—most of them consist of only the vaguest of qualifications, so every voter is given wide latitude to define what the awards mean. And most of them are happy to explore every inch of that latitude.
And the focus of all this tempest in a teapot? Finding the best player, or the best pitcher, or the best rookie… so on and so forth. But frankly, the gap between the best and the next best isn’t much of a gap at all. And it turns out that the smaller the gap, the more (not less) contentious things become. Large differences are easy to measure; small differences are hard.
This is something I’ve wanted to discuss ever since Murray Chass wrote about his ex-employer printing Wins Above Replacement in connection to the MVP and Cy Young voting. Very little has changed since the last time Chass made such a broadside, except now Nate Silver has a job at the New York Times and Murray Chass runs a blog.
So in a nod to that look back at Chass’ unchanging views on baseball stats, let’s look at the top 10 position players in the NL in Value Over Replacement Player:
I think if I presented you with that list of players as the “top 10 in the National League,” you’d be inclined to say—yeah, that’s reasonable. (Of course, I can come up with similarly reasonable listings by pulling up the top 10 in WARP, or the top 10 in WAR, or the top 10 in WAR, depending on whether you’re visiting FanGraphs or Baseball Reference.) But in terms of awards voting, the listing of the top 10 is less important than where those players rank in the top 10. To illustrate, those same players’ rankings in those measures:
I expressly didn’t compute an average of each player’s ranking in each measure, because that, I think, would distract rather than enlighten. The point is that the ranking isn’t very helpful in the first place—they’re such fragile things. Add three runs to Carlos Gonzalez’s VORP and he leaps from third to first.
That’s because the precision with which we’re measuring is somewhat less than the differences between these players. And beyond our precision in measurement, what these metrics are is a collection of assumptions about how baseball players create value. And often two very similar metrics will have different assumptions—about positions, about parks, about how to measure defense.
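To make that fragility concrete, here is a minimal sketch in Python. The player names and run values are invented purely for illustration (they are not real VORP figures), and the helper names `rank_players` and `with_bump` are my own. It only shows how a three-run nudge, well within any plausible measurement error, can reshuffle a tightly bunched leaderboard.

```python
# Hypothetical run values, invented for illustration only (not real VORP data).
hypothetical_vorp = {"Player A": 62.0, "Player B": 60.5, "Player C": 59.5}


def rank_players(values):
    """Return player names ordered from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)


def with_bump(values, player, runs):
    """Return a copy of the values with `runs` added to one player's total."""
    bumped = dict(values)
    bumped[player] += runs
    return bumped


before = rank_players(hypothetical_vorp)
after = rank_players(with_bump(hypothetical_vorp, "Player C", 3.0))

print(before)  # Player C starts out third
print(after)   # after a three-run bump, Player C is first
```

Nothing about the players changed between the two printouts, only an amount smaller than the uncertainty in the measurement itself.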
So in the sabermetric approach, there is still room for dissent. We aren’t heading for Chass’ imagined future, where a computer does all the reckoning and humans are mere observers. The computer is a tabulation machine, nothing more or less—it does the rote calculation as instructed by a thinking human being.
I think what the sabermetric approach does best is encourage you to think through the process of evaluating players, independently of the results. That strikes me as distinct from the way a lot of awards voters do it, which is to look at players and then work their way to figuring a definition of value. Much ado has been made of Gonzalez’s home-road splits, for instance. And we know that a player’s home park can affect scoring, and that Coors Field is certainly one of those parks. But once we adjust for that (and again, there are different assumptions you can make in doing so that can affect the outcome), does the magnitude of the home-road split matter? Once you figure out these rather subtle questions, the ordinal ranking of value falls into place rather easily.
But the downside is that, in collating all these assumptions into a single number, we invite people to skip all the details and go straight to the conclusion, sometimes not caring about the operating assumptions (or understanding that they exist at all—I’m still rather astonished by people who will quote “WAR” without bothering to note what site the numbers came from). The assumptions matter—in the aggregate sometimes not a lot, but for an individual player they can make all the difference in the world. And by uncritically quoting any of these metrics without examining the assumptions, you’re letting someone else do your thinking for you.
I think the worst thing that could happen is for people to start treating any “above replacement” measure the way people 50 years ago treated pitcher wins or runs batted in. That would indicate that sabermetrics won some battles but lost, so to speak, the war. Because it’s not about numbers, it’s about a way of thinking about baseball (and the world)—one that admits that there are always new things to learn and new discoveries to make.
So I implore you—get your nose out of a game and watch a spreadsheet once in a while. I don’t mean to look at the results—I mean examine the process. Don’t think about what the numbers say, think about why the numbers are what they are and what assumptions you have to make to get there. Conclusions are boring and sometimes more final than they ought to be. It’s in figuring out how we come to those conclusions that we end up learning something meaningful about baseball.