June 1, 2010
Two Dogmas of Sabermetrics
In 1951, W.V.O. Quine published his landmark paper, “Two Dogmas of Empiricism.” His goal was to disprove a certain type of empiricism that was trendy among analytic philosophers in the early 20th century. That set of beliefs, logical positivism, sought to deny that any statement that was not empirically verifiable had any meaning whatsoever. What Quine showed was that the two eponymous beliefs, which he derisively called dogmas, were necessary to logical positivism but also false. That paper remains one of the most important works of 20th-century philosophy because it demonstrated the limits of a system of knowledge based only in observable fact and logic.
There exists, I think, a certain set of analysts in baseball who adhere to a sort of logical positivism. That belief is demonstrated by the drive to completely separate outcomes from processes, despite the fact that such cleavage is not actually possible. For example, to eliminate the effect of luck is one of the guiding goals of data-driven performance analysts. Here are two theorems that reflect that goal:
There are others, too, that are more minor. But these two theorems in particular—commonly accepted among sabermetricians and fellow travelers—are widely rejected by analysts and fans more generally. In each of the two cases, well-reasoned, data-backed arguments from very smart people go all the way to the water’s edge. But the data stops short of allowing analysts to defend conclusive statements—dogmas, if you will—like “pitchers have no influence over the rate of hits on balls in play” and “there is no such thing as clutch hitting.” Such certainty would perhaps be desirable—it’d certainly make analysis easier—but it is simply not supported by the data available.
Much (most? all?) of the easy-to-remember wisdom imparted by “stats geeks,” as the uninitiated seem fond of calling them, has come from attempts to separate processes from outcomes. Most of these insights have been useful, but overreliance on them can be deadly. For precisely the same reason that purely outcome-determined statistics incorporate luck, newer metrics that retain any scintilla of outcome (as opposed to process) data must be regarded with skepticism. But that’s just the problem: every single statistic we have, from pitcher wins to SIERA, is tinged with the bias of knowledge of outcomes.
First, we have to remember why any of this matters. After all, outcomes are what matters! It’s not how, as they say, it’s how many. Right?
No Coincidence You Can Also Open a Can of Prince Albert
Let’s take a recent example. Sunday, Prince Albert completed the Pujolsian feat and hit three home runs in a game. It wasn’t the first time he’d hit a hat trick (he did so twice in 2006 and once in 2004), but hitting three home runs in a single game is rare enough even for inner-circle Hall of Famers. So what was the probability that he would hit three home runs Sunday? This, it turns out, is a very difficult question to answer. Although we know for sure it came about, we cannot say that means there was a 100 percent, 90 percent, or even 50 percent chance it would happen. Of course, even if there was a .001 percent chance, it would happen on average once every 100,000 times, and it could be that we just so happen to live in that particular world.
In fact, we want to know the forward-looking probability from two days ago that he would hit three home runs. Such a figure might incorporate his career home-run rate (about 6 percent of plate appearances—can you believe he has 378 home runs?), his three-year weighted average home-run rate, some medical data about his back that day, the home-run tendencies of the likely pitchers, the park, the temperature, the month, the home plate umpire, what he ate for breakfast, and of course whether or not he is in fact a machine.
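For a sense of scale, here is a deliberately naive back-of-the-envelope sketch in Python. It treats every plate appearance as an independent trial at the career rate of about 6 percent cited above and ignores every other variable on the list. The number of plate appearances (4 or 5) and the function name are assumptions made for illustration, not anything measured.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# From the text: a career home-run rate of about 6 percent of plate appearances.
CAREER_HR_RATE = 0.06

# Assumption: he comes to the plate 4 or 5 times in the game.
for plate_appearances in (4, 5):
    chance = prob_at_least(3, plate_appearances, CAREER_HR_RATE)
    print(f"{plate_appearances} PA: {chance:.4%}")
```

Under those assumptions the answer lands on the order of one or two games in a thousand, which is exactly the kind of tidy figure that should not be mistaken for the true forward-looking probability.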
The point is that there are dozens of variables that need to be considered, and any model that uses only some will be merely an approximation. Even if we put everything we know into a roux, we are still relying on outcome data: career home-run rates are themselves based on previous outcomes, the probabilities of which were just as complex. The eager DB-jockeys among you may be clamoring to regress the rate to the mean, but now we have to figure out which mean is most appropriate—30-year-old Hall of Fame first basemen? Average NL first basemen? Average major-league hitters?—and then we have introduced uncertainty because of our choice of population to which to regress Albert.
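To see how much the choice of population matters, here is a minimal sketch of one common way to regress a rate toward a mean: add a fixed number of phantom plate appearances at the population rate. Every number below (the observed line, the three population rates, and the weight of 200 plate appearances) is hypothetical and chosen only for illustration.

```python
def regress_to_mean(successes, trials, population_rate, prior_weight):
    """Shrink an observed rate toward population_rate.

    prior_weight is the number of phantom trials added at the population rate.
    """
    return (successes + population_rate * prior_weight) / (trials + prior_weight)


# Hypothetical observed line: 20 home runs in 250 plate appearances (8 percent).
observed_hr, observed_pa = 20, 250

# Hypothetical population means; none of these rates is taken from real data.
populations = {
    "elite first basemen": 0.060,
    "average NL first basemen": 0.035,
    "average major-league hitters": 0.025,
}

estimates = {
    name: regress_to_mean(observed_hr, observed_pa, rate, prior_weight=200)
    for name, rate in populations.items()
}

for name, rate in estimates.items():
    print(f"{name}: {rate:.3f}")
```

The same observed line yields three different estimates, and nothing in the data itself says which population is the right one.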
Hey, He Might Have Hung That Slider On Purpose
That was a pretty simple example, don’t you think? But there are areas where we can say with more confidence that we are measuring skills and not outcomes. Take pitching, for example. There is a panoply of rate-based metrics out there designed to disassociate a pitcher’s value from the outcome after the ball leaves the hitter’s bat. The basic DIPS components are K, BB, and HR, and various formulations expand on that to include GBs. Some have gone further, relying on batted-ball types like line drives and fly balls (and even regressing those batted-ball rates to the league mean), but all these stats seek to assign credit and blame to pitchers only for those aspects of pitching over which they have control. That is to say, to assign credit and blame only for processes and not outcomes.
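As a concrete instance, one widely used formulation built from these components is Fielding Independent Pitching (FIP). This article does not name it; it is shown here only as an illustration. The weights 13, 3, and -2 are the commonly published ones, but the additive constant varies by season and league, so the 3.10 below is an assumption, as is the hypothetical stat line.

```python
def fip(hr, bb, k, ip, hbp=0, constant=3.10):
    """Fielding Independent Pitching from home runs, walks, hit batters,
    strikeouts, and innings pitched.

    constant is a league- and season-specific scaling term; 3.10 is an assumed
    placeholder, not a value for any particular year.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical season: 20 HR, 50 BB, 200 K in 200 innings.
example = fip(hr=20, bb=50, k=200, ip=200.0)
print(round(example, 2))
```

Nothing about balls in play enters the formula, which is the point of the DIPS approach.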
But this valiant effort, too, has logical limits. After all, strikeouts and walks are prone to fluctuation based on opponent strength, park factors, and umpire tendencies. Even putting aside disagreement over whether it is more appropriate to use actual home runs or fly balls as the input, it ought to be clear that either is subject to random variation as well as park and other effects. Again, we can correct for most park factors, and we can regress rates to an appropriate mean, but then we are making choices that introduce uncertainty (even as we remove bias).
There is an even bigger problem with our pitching statistics. The fact that we don’t know where a pitcher/catcher battery intends the ball to go means that we can’t separate the pitcher’s approach from his command. It’s impossible to say whether a changeup inside was what the catcher wanted or whether it was meant to be down and away. Because many times the best pitch is the one that is unexpected (even if it might be very hittable if the hitter were looking for it), we can’t even necessarily assume that a pitcher missed simply because he threw a pitch out over the plate. Perhaps we can figure this out for individual pitches based on video data, but aggregating this data over a whole season is downright impossible without standardized camera locations.
The Pure Gardens of Outcomes
The two areas in which we judge based on outcomes the most are the two areas that are the trendiest in the performance analysis world: baserunning and defense. These are two areas in which we simply don’t have the data to make inferences about processes.
In the case of baserunning, it is impossible to separate not only the process from the outcome (as in the unlucky cases where the wind carried the throw unexpectedly, or the dirt had a soft spot), but also the decisions of the players from the decisions of the coaches, as well as the decision-making of the player from the in-game quickness of the player. All of these factors may conspire to make a baserunner seem better or worse than he truly is based on his processes.
Similarly in the case of defense, it is very hard to say even what outcomes resulted, at least at the individual level. It’s harder still to say when good outcomes were the result of good or bad processes, at least in a way that is aggregated over the course of a whole season. Invariably the raw data itself relies on human stringers who vary in their interpretations of the location and type of the batted balls. Even team-level data, like defensive efficiency, suffers from this problem because it only takes into account those balls that were in fact caught, not those balls the team deserved to have caught.
Question of the Day
I’m not suggesting that moving toward processes and away from outcomes isn’t a good thing. But I am suggesting that we ought to apply a similar level of skepticism to those halfway solutions we have at the moment. Perhaps the worst thing a data-loving fan could do is wait for perfect information (say in the form of Hit-f/x), because it not only isn’t coming, it isn’t possible. Am I wrong? Is knowledge of pure processes and skills possible to separate from knowledge of outcomes? Is the separation between truths that are analytic and those that are synthetic workable?