June 13, 2012
The Madness of King Bill
The really great thing about learning from one’s predecessors is that you can learn just as much from when they are wrong as from when they are right, provided you are vigilant enough to recognize that an error was made in the first place.
At its core, sabermetrics got its start because people were able to read things and then ask questions about what they read, rather than taking everything at face value. Does this make sense? Is this really true? And I think that any sabermetrics worthy of that name is as willing to question itself in this fashion as it is everyone else.
(To that end, I think a fair amount of self-doubt is healthy. Too much is crippling, but too little is freeing the same way that taking a life preserver from a drowning man is. Unfortunately, writing with confidence tends to be more interesting than writing with doubt. I think the very best kind of sabermetrics writing is the sort that can make doubt compelling, but it’s hard to do, and especially hard to do often.)
One of the most instructive failures we can look at is the madness of King James the Bill. In the most direct sense, I am referring to his Win Shares system, but I think the lesson we can learn goes well beyond that.
James’ work on Win Shares has long been the “King in Yellow” of sabermetrics, causing many who dare to examine it to grow mad from the very strain of the non-Euclidean geometry within. In my opinion, the best explanation of the system I’ve seen is Patriot’s Win Shares walkthrough. The short version is that Win Shares is an often baroque, sometimes brilliant, always complex system. What separates Win Shares from most modern win-based metrics is that, instead of trying to attribute wins above average or above replacement, James tries to allocate all the wins a team has, period. This leads to some unusual features of Win Shares.
A great example (and I have stripped away the heavy math portions, so those interested should read the whole series) is the splitting of offensive and defensive wins:
The first important step in Win Shares is to divide credit for the team wins between the offense and the defense. This is done by the percentage of marginal runs that each provides. … This is one of the crucial steps in the process, and it is a fairly clever one, certainly not something I would have thought up. I’m not sure it works, but it’s clever either way. What James does is calculate marginal runs against some very low baseline. He never explains exactly what this baseline is supposed to represent, other than that it works. He uses .52 for the offense and 1.52 for the defense. Let’s call the league average runs/game L. James says:
In other words, [runs per win] = 2L, since we can rewrite this equation as:
W% = (R-RA)/(2L) + .5
It does not matter if you use .52 and 1.52, or .5 and 1.5, or .6 and 1.6 – as long as there is a difference of 1 between the two baselines, the team W% formula will hold.
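To make the quoted point concrete, here is a minimal Python sketch. It assumes the projected winning percentage is total marginal runs per game divided by 2L, a form reconstructed from the rewritten equation above rather than quoted from James. The function names and the sample team (5.0 runs scored, 4.2 allowed, league average 4.5 per game) are made up for illustration.

```python
def marginal_runs(r, ra, league, off_base=0.52, def_base=1.52):
    """Return (offensive, defensive) marginal runs per game.

    r, ra   -- team runs scored and allowed per game
    league  -- league average runs per game (L in the text)
    The baselines are multiples of the league average; the default
    values of .52 and 1.52 are the ones cited in the quoted passage.
    """
    return r - off_base * league, def_base * league - ra


def projected_wpct(r, ra, league, off_base=0.52, def_base=1.52):
    """Projected W% as total marginal runs over 2L (assumed form)."""
    off, dfn = marginal_runs(r, ra, league, off_base, def_base)
    return (off + dfn) / (2 * league)


def offensive_share(r, ra, league, off_base=0.52, def_base=1.52):
    """Fraction of the team's marginal runs credited to the offense."""
    off, dfn = marginal_runs(r, ra, league, off_base, def_base)
    return off / (off + dfn)


# Hypothetical team, not real data.
r, ra, league = 5.0, 4.2, 4.5

# Three baseline pairs, each with a difference of exactly 1.
for off_base, def_base in [(0.52, 1.52), (0.5, 1.5), (0.6, 1.6)]:
    wpct = projected_wpct(r, ra, league, off_base, def_base)
    share = offensive_share(r, ra, league, off_base, def_base)
    print(f"baselines {off_base}/{def_base}: W% = {wpct:.4f}, "
          f"offensive share = {share:.4f}")
```

Under these assumptions the projected W% comes out the same for every pair of baselines that sit one apart, while the offensive share of the credit moves with the baselines chosen.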
This, I think, is Win Shares in a nutshell: it works exceedingly hard to present itself as a measure of absolute wins, but at its core it is based on marginal wins and then “fudged” to look as though it were measuring absolute wins. Many of the flaws of Win Shares can be traced back to that fundamental disconnect between what Win Shares really is and what it claims to be.
In some sense, this is a bigger issue for defense than for offense. There is no such thing in Major League Baseball as a team or hitter who, in a reasonable span, will produce zero runs. But that’s a selection problem; you can readily find batters who would manage to put up a zero OBP (and power numbers to match)—if you need someone to demonstrate this, give me a bat and a major-league pitcher and I will strike out as much as you need me to. (In my wildest dreams, I could work myself up to a point where I could muster a groundout against a major-league pitcher.) So measuring offense against a zero baseline, while not realistic, is at least plausible.
If an absolute baseline for offense is a team that scores zero runs, though, what’s the equivalent on defense? A team that scores zero runs is practically impossible, but its defensive mirror image, a team that allows every possible run, is impossible even in theory, since there is no upper limit on the runs a team can allow.
So instead James sets baselines for both offense and defense, mirror images of each other, and sets them low enough that if you look at it in a hazy sort of way, you can see them as absolute, not relative, runs. In the end, though, it doesn’t work—and watching James try to fit that square peg into that round hole is like watching a master craftsman relentlessly working off a set of blueprints from M.C. Escher. As much as you may cheer for him to succeed, the effort is doomed from the start.
I don’t bring this up to try to dissuade you from using Win Shares; I think Win Shares itself has already accomplished that task far better than I ever could. Instead, I’m trying to illustrate a point. Nothing in baseball analysis is absolute; everything is relative. And if you’re asking “relative to what,” you’re starting to ask the useful questions of baseball analysis. Hold onto this like a drowning man clutching a rope, and eventually you’ll be pulled to shore.
As sabermetricians, we often have ready-made baselines on hand, such as average and replacement level. Some will reflexively claim that average is better because it is easier to define—and it is, and sometimes it is useful, but average is an abstract mathematical concept with no baseball logic behind it. Replacement level has its fans who defend it as the solution to the problem of calling an average player with substantial playing time a zero (and it solves that problem, but as replacement players are hypothetical, your starting assumptions can affect what you are doing in a myriad of ways, some more obvious than others). Both are more useful than their detractors would have you believe, and being abstract or hypothetical does not invalidate the utility of a concept.
But at the core of it, figuring out which baseline has the most value needs to start with figuring out what question you are trying to answer. Not everything is a nail, and not every problem requires a hammer. Thinking carefully about what you’re trying to get at, and what sorts of comparisons are useful to that sort of analysis, is invaluable.
And if you aren’t explicit in deciding that what you’re looking at is relative, that doesn’t mean you’ll be answering things in absolute terms. It means you’ll be answering things relative to a baseline that you chose without a clear rationale, and probably without any awareness of how that baseline is affecting your analysis.
So I beseech you, do not give in to the madness of the king. All value is relative. Anything else is self-deception.