March 11, 2013
Maybe I'm Wrong
Well, this year's Sloan Sports Analytics Conference has come and gone, and I wasn't able to attend. Worse, I couldn't go to the SABR Analytics Conference either. Of course, I've followed along as best I could, but there's no substitute for actually being in the room.
With analytics having their big moment in the sun, and with the topic of how analytics fit into sports still something of an open cultural question, there have been a few writers who have considered that intersection and written something about it. Over at SB Nation, Andrew Sharp wrote a review of the Sloan Conference (seems that he was in the room), which contained this excerpt:
If there's genius on display at Sloan, it's this: When scouts or coaches or old school GMs get something wrong, it's an example of traditional scouting methods failing. When analytics get something wrong, it's "randomness" that you can't control. A small part of a much bigger process, and teams and fans should trust that process until they get a better outcome.
This might be the most damning critique of sabermetrics (and sports analytics in general) I've ever seen. Worse, it might be true.
Let's first name what we're talking about. It's a well-known phenomenon in psychology called the self-serving bias. When something good happens to you, you will tend to see it as the result of your own hard work and talent. When something bad happens, you will tend to blame bad luck. When something happens to someone else, especially to a rival, those attributions are generally switched. This has been proven in the psychological literature about 500 times. It's everywhere. You will do it today, I promise. I will too.
If you're reading Baseball Prospectus, you're probably tempted to reflexively say "Sharp is wrong." I was too. But, if we're going to be intellectually honest, this critique deserves a more thoughtful answer. One thing that sabermetrics can pride itself on is that we've made great strides in analyzing baseball while minimizing the biases that go along with "the human element." If we're going to be good scientists, we can't let this bias bring us down. Otherwise, we're just cheerleaders for spreadsheets.
There is a lot of randomness in baseball. Balls take weird hops when they hit a pebble. Air currents push a ball just inside the line for a double, rather than a loud foul ball. Teams have years where they win 80 percent of their one-run games. How much can we blame on random noise when we (as a group) don't get something right? How many of our correct predictions are the result of random noise breaking our way, rather than our own brilliant ideas? And is there room to say that "old school" methods (to the extent that the stereotype of the scout going purely by gut feel even exists anymore) might have some credence, but might also have to deal with the problem of bad luck? There isn't a way to answer these questions fully, but the questions themselves raise some important issues.
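One way to at least put a rough number on the "how much can we blame on random noise" question is to ask how often pure chance would produce a given record. The sketch below is only an illustration, not anyone's actual model. It assumes every one-run game is an independent coin flip, and the 40-game schedule, the 32-win threshold (80 percent of 40), and the 30-team league in the usage example are made-up inputs chosen for the example.

```python
from math import comb


def prob_at_least(n_games, min_wins, p_win=0.5):
    """Chance of winning at least `min_wins` of `n_games` when each game
    is an independent trial with win probability `p_win` (binomial model).
    """
    return sum(
        comb(n_games, k) * p_win**k * (1 - p_win) ** (n_games - k)
        for k in range(min_wins, n_games + 1)
    )


def prob_any_team(p_single, n_teams):
    """Chance that at least one of `n_teams` independent teams hits an
    outcome that each team reaches with probability `p_single`.
    """
    return 1 - (1 - p_single) ** n_teams


if __name__ == "__main__":
    # Hypothetical inputs: 40 one-run games, 32 wins = 80 percent, 30 teams.
    one_team = prob_at_least(40, 32)
    some_team = prob_any_team(one_team, 30)
    print(f"one coin-flip team wins 32+ of 40: {one_team:.6f}")
    print(f"at least one of 30 such teams does: {some_team:.6f}")
```

Whatever number comes out, the point is that "it was luck" becomes a claim that can be checked against a stated assumption, rather than a phrase that ends the conversation. Changing `p_win` shows how much true skill would be needed before the same record stops looking surprising.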
Consider that almost no one, sabermetrically inclined or not, predicted the Orioles or A's (see, Moneyball works!) to be even close to the playoffs last year. Everyone missed badly on those two, except for a few optimistic fans. Sure, the Orioles caught some very lucky breaks along the way, and if the season were played again a few million times over in parallel universes, I don't think that they would get that lucky again very often. But here I am blaming luck for something that I didn't get right. Then again, had one of my models predicted the A's as AL West champions, I would certainly point to this as confirmation of my awesome powers of awesomeness. This despite the fact that the model most likely to have produced that prediction would have been picking an AL West team out of a hat and proclaiming it my pre-season favorite.
There are plenty of well-scouted draft picks who busted, and not just due to injuries. Look at the recap of the first round of any draft over the past few years. It's fun to play the "He was drafted in front of these other three really good players" game... unless you're a scouting director. Then again, there are plenty of sabermetric darlings out there who were supposed to be the next big thing, but who just didn't make it either. Matt Murton will patrol the outfield for the Hanshin Tigers this year. If we're going to blame scouts for trying to draft blue jeans models, should we not also point out that OBP is not the only thing in the world that makes a baseball player good?
I think that one blind spot that sabermetrics hasn't yet dealt with is that we've misunderstood what it is exactly that we're good at. Sabermetrics is good at taking a lot of observations and sorting through the patterns in them in an unbiased way, at least within the constraints of how the mathematical model that sorts those data is defined. If the model is a good one, then it will perform well in describing the past and predicting the future. But what if the problem is that we just don't really understand what we're trying to model and our equations are off? It's great to have unbiased residuals, but how's the R-squared doing? (And why do we so rarely report it?)
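To make the residuals-versus-R-squared point concrete, here is a small sketch with simulated data. Nothing in it is real baseball data: the sample size, the weak `signal` slope, and the noise level are assumptions chosen so that the true relationship is mostly drowned out. A least-squares line with an intercept will always have residuals that average to zero, so "unbiased" comes for free, yet the same fit can explain only a sliver of the variance.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residual_summary(xs, ys):
    """Return (mean residual, R-squared) for the least-squares line."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mean_y = sum(ys) / len(ys)
    ss_res = sum(r * r for r in residuals)          # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)     # total variation
    return sum(residuals) / len(residuals), 1 - ss_res / ss_tot


def simulate(n=500, signal=0.2, noise_sd=1.0, seed=0):
    """Hypothetical data: a weak linear signal buried in Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [signal * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = simulate()
    mean_resid, r_squared = residual_summary(xs, ys)
    print(f"mean residual: {mean_resid:.2e}")
    print(f"R-squared:     {r_squared:.3f}")
```

Reporting both numbers side by side is the design choice that matters here: the mean residual says the model is not systematically high or low, while R-squared says how much of what happened the model actually accounts for.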
To be clear, I do believe that the amount of randomness present in baseball can overwhelm even a good model. But the reflexive use of "I got unlucky over a small sample size" to explain away any variance is a problem that needs to be addressed in the field. Maybe it's the truth, but it's too easy to say and accept without really thinking about it. "Luck" gives you a get-out-of-jail-free card from having to examine what really went wrong. It short-circuits critical self-analysis, which is the key to breaking out of the self-serving bias.
To the extent that Andrew Sharp's critique is correct, it means that too often in sabermetrics, we sneakily start with the assumption that our models are correct. The thinking goes: my model for processing information is inherently sound, and any variance is the result of chance. Paired with the other side of the self-serving bias (your model is inherently flawed, no wonder you got such bad results), it begins with the assumption that I'm right and you have no idea what you're doing. There's a PR problem with that approach (one that probably has a lot to do with how contentious sabermetrics remains within baseball more generally), but worse, at that point, we're not even doing science.
Maybe our models are brilliant. But maybe they're not. Maybe we don't know as much as we think we do or as much as we'd like other people to believe that we do. Maybe the "old school" models, even with their flaws, have strengths, but they too got unlucky over a small sample size. Maybe the reason that it stung so much when I read this critique was that when I thought about it, the very unsettling conclusion that I had to come to was "Maybe I'm wrong."