April 29, 2010
Talking two languages
Sometimes when I'm talking about baseball with someone, I feel like... like we're speaking two different languages, with a common ancestry perhaps. There's a dissociation between what I'm saying and what they're saying. And no matter how long we discuss it, we don't come any closer to real understanding - we just keep circling round and round each other's points.
Of course, I'm using the words "talking" and "with someone" rather loosely - I understand, for instance, that Jon Heyman probably did not write his article about Ryan Howard's contract with me in mind.
And while Heyman is talking about Ryan Howard and RBIs, we could be having this discussion about anything - saves, pitcher wins, batting average with runners in scoring position. (And in fact I probably am - this is, I guess, a response to an article by Jesse Spector about pitching stats and the ensuing conversation on Twitter.)
Of course, there is an easy explanation why I feel like I'm trying to speak a different language when I have these conversations - it's because I am. Granted, we're both speaking English, but at some level, the language that "sabermetricians" use to describe baseball is certainly different than what, well, call them "traditionalists" use. And that difference in languages reflects (and sometimes reinforces) a difference in worldview.
Now, I'm going to let you in on a little secret. The core point of sabermetrics - the reason it exists - isn't statistics at all. Bill James defined sabermetrics as "the search for objective knowledge about baseball." It's a nice, succinct statement that actually hides a profound truth - we're searching for knowledge about baseball because we don't have all of it yet.
At this point I have to make a confession - I don't know everything there is to know about baseball. And I'm pretty sure that statement will be true 30 years from now. And I think we can generalize this to a larger statement: Nobody will ever know everything there is to know about baseball.
So at the very heart of sabermetrics you have this idea of doubt, the persistent questioning of what we know and how well we know it. And then we compound that doubt with questions about how well what we know addresses the question at hand.
What we end up with is two fundamental truths - that what matters is how much something contributes to wins and losses (or at least runs scored and runs allowed, which are the building blocks of wins and losses), and that a player's contributions to wins and losses are shaped by the contributions of his teammates. (Even in "context neutral" measures of performance, such as True Average, what we are considering is how a player's efforts impact the wins and losses of an average team, or at least an abstraction of an average team.)
Circling back around to the subject of RBI again, let us concede that an RBI is a recording of a fact - the fact that it was a specific player's turn to bat when a run scored. Now, we know that all else being equal, runs tend to score more often when good hitters are at bat - as a group, good hitters tend to have more RBIs than bad hitters. But we know that it's also a function of opportunities, both in the number of times at the plate a hitter gets and the number of runners on base when a hitter comes to bat. (As well, it's a function of what base those runners are on.) So an RBI is measuring at least two factors - the quality of a hitter and the quality of his opportunities. What a sabermetrician does is seek to distinguish the two, and evaluate a hitter's quality independent of his opportunities.
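To put some made-up numbers on that, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the three hitters, their runner counts, and the "drive-in rate" (the share of runners on base that a hitter brings home) are hypothetical, and the model ignores which base the runners are on and the run a hitter scores himself on a home run. It is not a real metric, just a way to see the two factors pull apart.

```python
from dataclasses import dataclass


@dataclass
class Hitter:
    name: str
    runners_on: int       # hypothetical: total runners on base across all his plate appearances
    drive_in_rate: float  # hypothetical: share of those runners he drives in


def expected_rbi(hitter: Hitter) -> float:
    """Toy model: RBI total = opportunities x quality.

    Ignores base state and solo home runs; both are simplifying assumptions.
    """
    return hitter.runners_on * hitter.drive_in_rate


# Made-up hitters: A and B are equally good, but B sees fewer runners.
# C is a worse hitter than B, but bats with as many runners on as A.
hitters = [
    Hitter("Hitter A", runners_on=500, drive_in_rate=0.18),
    Hitter("Hitter B", runners_on=350, drive_in_rate=0.18),
    Hitter("Hitter C", runners_on=500, drive_in_rate=0.15),
]

for h in hitters:
    print(f"{h.name}: rate {h.drive_in_rate:.2f}, "
          f"{h.runners_on} runners on, about {expected_rbi(h):.0f} RBI")
```

In this toy setup Hitter B drives in runners at the same rate as Hitter A and at a better rate than Hitter C, yet finishes with the fewest RBIs of the three (about 63, against about 90 and 75), purely because he came to bat with fewer runners on. Ranking by the RBI total and ranking by the rate give different answers, which is the distinction being drawn above.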
When we say a hitter "batted in a run," we reduce the complicated (and, in my opinion, beautiful) mechanism of how an offense scores runs to a simple, flawed and pointless abstraction. It's no more "real" than the abstractions that sabermetricians use, it's just a lot less expressive. It doesn't take into account how players get on base to score, how other players move them over to put them in better position to score, or how players avoid outs to give other players more times at the plate and more chances to drive in runs.
On the pitching side, saying a pitcher "won a game" or "earned a win" is another oversimplifying, misleading abstraction. The pitcher didn't win the game - the team did. The pitcher's efforts are reflected in that win, but so are the efforts of the relievers and position players (both in hitting and in fielding). The very way we talk about pitcher "wins" leads us to discount the contributions of a pitcher's teammates and attribute those contributions to the pitcher himself.
And the language we use to describe these things, and the baggage that comes with it, shapes the way we think about baseball. (Call it the notion of linguistic relativity.) The problem with RBI and pitcher wins and saves isn't just that they're imprecise measures - all measures are to some extent imprecise. It's that they reinforce and encourage bad ways of thinking about baseball.
Now, it's certainly possible to come to the "right" conclusions using traditional stats, assuming one takes care with them and assesses their limitations when coming to conclusions. But you quickly find out that you don't want to - there are easier, less painful ways of doing it. It's like trying to paint a portrait with boxing gloves on. On the flip side, quoting measures like True Average or wOBA in place of RBIs doesn't necessarily make you think any better. If you treat any advanced offensive metric as "more betterer RBIs" then you're more likely to be right than if you were using actual RBIs, but it won't be through any fault of your own.