May 5, 2011
A Statistician Rereads Bill James
Believe it or not, most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.
Andrew Gelman is a professor of statistics and political science at Columbia University. He occasionally blogs on baseball.
I read my first Bill James book in 1984, took my first statistics class in 1985, and began graduate study in statistics the next year. Besides giving me the opportunity to study with the best applied statistician of the late 20th century (Don Rubin) and the best theoretical statistician of the early 21st (Xiao-Li Meng), going to graduate school at Harvard in 1986 gave me the opportunity to sit in a basement room one evening that October with about 20 other students, screaming at the TV, "Put Stapleton in!" Unfortunately, John McNamara didn't hear us, and the rest was history.
I'm much less of a sports fan than I used to be, but the lessons I've learned from reading the Baseball Abstracts have done much to form me as a statistician. James doesn't write much about statistical methods in any general sense—he comes up with what he needs to solve any particular problem—but from his practice one can extract some general principles:
Methodological pluralism: Rather than try to come up with a single number or a single approach to summarizing player abilities, team strategies, or any other topic, he tried out a bunch of different ideas. In statistics, I like to say that each substantive hypothesis deserves its own analysis: it's generally hopeless to expect that you can run a single regression and pull off the answers to each of your research questions, one coefficient at a time.
Controlled comparisons: Instead of comparing simple aggregates, be more careful and make comparisons on pairs or groups of similar players or teams. As economists Rajeev Dehejia and Sadek Wahba demonstrated in a pair of influential articles (they have been cited over 2400 times since their publication a decade ago), these comparisons work only when you are controlling for appropriate characteristics. In the case of Bill James's analysis, player age is typically a key comparison variable. From the standpoint of applied statistics, controlled comparisons combine the averaging that you get from having a moderate or large sample size with the insight that comes from understanding individual cases.
Conceptual models used as guides to comparisons: James has written many times that he does not study statistical questions, he studies baseball questions. Each analysis is grounded in some goal. A conceptual model such as the defensive spectrum, or the narrowing of abilities, or the contribution of speed to both offense and defense, drives the direction of the study and motivates many of the details of the analysis. I have tried to follow these principles in my own work.
One central method of statistics that Bill James does not draw upon very often (if at all) is fitting parametric models. For example, James found that the power two in the Pythagorean prediction for wins worked pretty well. He didn't try to estimate the power from data, nor did he, for example, try to come up with a conclusion such as, "each additional run is worth 0.093 wins." On the rare occasions that he did estimate a parameter (for example, the relative values of stolen bases and times caught stealing), he buried his methodology and had no interest in making a big deal about the estimation.
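For readers who want to see what "estimating the power from data" would involve, here is a minimal sketch. It relies on the fact that the Pythagorean formula implies W/L = (RS/RA)^k, so log(W/L) is proportional to log(RS/RA) and the power k is the slope of a regression through the origin. The function names and the season lines in the usage example are hypothetical and invented for illustration; this is not a method James or anyone else is claimed to have used.

```python
import math

def pythagorean_win_pct(runs_scored, runs_allowed, power=2.0):
    """Predicted winning percentage: RS^k / (RS^k + RA^k)."""
    rs = runs_scored ** power
    ra = runs_allowed ** power
    return rs / (rs + ra)

def estimate_power(teams):
    """Least-squares estimate of the exponent k.

    teams is an iterable of (runs_scored, runs_allowed, wins, losses).
    The formula implies log(W/L) = k * log(RS/RA), so k is the slope of
    a regression through the origin. At least one team must have
    runs_scored != runs_allowed, or the denominator is zero.
    """
    xs = [math.log(rs / ra) for rs, ra, w, l in teams]
    ys = [math.log(w / l) for rs, ra, w, l in teams]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Made-up season lines (runs scored, runs allowed, wins, losses),
# purely to show the call; they are not real team records.
hypothetical_teams = [(800, 700, 92, 70), (650, 720, 74, 88), (900, 600, 108, 54)]
print(estimate_power(hypothetical_teams))
```

James's choice amounts to fixing `power=2.0` and checking that the predictions look reasonable, rather than treating the exponent as a quantity to be estimated.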
Fitting models is something that statisticians are trained to do and in fact do all the time. Why didn't Bill James follow the example of Pete Palmer and others and try to estimate the relative values of walks, singles, doubles, and other outcomes? I can't really say, but perhaps he felt that the formulas he used, such as runs created, which generally relied on few (if any) estimated parameters, worked well enough.
James's most famous number may be 27—his estimate of the age at which the typical player (including the typical superstar) reaches his peak. James has explained, illustrated, and justified this number in various places, but I've never seen him set it up as a statistical estimation problem: "find the value where the average curve hits its peak." He just doesn't seem to think that way. A statistician would naturally want to estimate the form of the curve (possibly using a nonparametric method such as a spline), estimate the peak, and then see how this peak varies over time, position on the field, player ability, and other measurable factors.
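As a rough illustration of the "find the value where the average curve hits its peak" framing, the sketch below fits a quadratic to age-versus-performance data by least squares and reports the age at the vertex. A quadratic is a deliberately crude stand-in for the spline mentioned above, and the function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_quadratic_peak(ages, values):
    """Fit values ~ a + b*t + c*t^2 (t = age - mean age) and return the peak age.

    Solves the 3x3 normal equations with Cramer's rule. Raises ValueError
    if the fitted curve does not open downward, since it then has no peak.
    """
    n = len(ages)
    mean_age = sum(ages) / n
    t = [age - mean_age for age in ages]  # centering improves conditioning
    s = [sum(ti ** k for ti in t) for k in range(5)]  # sums of t^0 .. t^4
    r = [sum((ti ** k) * y for ti, y in zip(t, values)) for k in range(3)]
    m = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]
    d = det3(m)

    def with_column_replaced(col):
        # Cramer's rule: swap one column of m for the right-hand side r.
        return [[r[row] if j == col else m[row][j] for j in range(3)]
                for row in range(3)]

    b = det3(with_column_replaced(1)) / d
    c = det3(with_column_replaced(2)) / d
    if c >= 0:
        raise ValueError("fitted curve has no interior maximum")
    return mean_age - b / (2 * c)  # vertex of the parabola, in years of age
```

A fuller analysis would replace the quadratic with a more flexible curve and repeat the fit by era, position, and ability level to see how the estimated peak moves.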
There is a mathematical reason, perhaps, for a Jamesian reticence about estimating parameters. It goes like this. Consider some curve (for example, the rising and then falling curve of ability for a single or average player, plotted vs. age). It will have some peak. At the peak, the curve will be flat (mathematically, it has zero derivative) and, as a result, the precise location of the peak in time will be difficult to specify. If a player is expected to have maximum ability around age 27, his actual best season might occur at 25 or 28, or even 35, perhaps. Even with averages it can be difficult to spot the exact peak. So perhaps it is better to come up with a reasonable number such as 27, check that it works with the data, and then use it as a baseline to think about the occasional shooting stars who peak early and the drug-assisted sluggers who have their statistically best years in their late thirties.
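The flat-peak argument can be made concrete with a second-order expansion. As a sketch, assume the ability curve $f(a)$ is smooth with its maximum at age $a^*$; the symbols $f$, $a^*$, and $\delta$ are notation introduced here for illustration.

```latex
% Near the peak the first derivative vanishes, so the curve is locally quadratic:
f(a) \approx f(a^*) + \tfrac{1}{2} f''(a^*)\,(a - a^*)^2,
\qquad f'(a^*) = 0, \quad f''(a^*) < 0 .

% Under this approximation, ages whose ability is within \delta of the maximum satisfy
f(a^*) - f(a) \le \delta
\quad\Longleftrightarrow\quad
|a - a^*| \le \sqrt{\frac{2\delta}{|f''(a^*)|}} .
```

The width of that window shrinks only like the square root of $\delta$, so even a small amount of noise in measured ability leaves a fairly wide band of ages that cannot be distinguished from the true peak.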
Another thing that I do all the time, but that James almost never seems to do, is make graphs. He loves looking at numbers but seems to avoid any and all chances to make scatter plots, line plots, and the rest. This may be simply a matter of taste. Two exercises in which I often use graphs are (1) checking and cleaning data, and (2) exploratory analysis—finding patterns in data beyond what is explained by my existing models. It's possible that James is so in touch with his data that he can do all the checking and cleaning just by looking at the numbers—he thinks of each data point as its own unique person or event rather than merely as one point in a distribution. If so, it may be that the Bill Jameses of the world can do their exploratory data analysis by looking at numbers, but the rest of us may benefit from graphical displays.
My two favorite Bill James lines:
When someone wrote asking him to look into some idea or another, James replied, "I'm not a public utility. If you care so much about this, do the analysis yourself."
Responding to a comment by some humanist type who was yammering on about how there are all sorts of truths that aren't in the numbers, James pointed out that the alternative to good statistics is not "no statistics," it's bad statistics. People who argue against statistical reasoning often end up backing up their arguments with whatever numbers they have at their command, over- or under-adjusting in their eagerness to avoid anything systematic.
I also love how he sprinkles his writing with commonsensical but non-obvious points. For example, when talking about a player being replacement level, he points out that this is not an insult—if you're "replacement-level," you're good enough to play for one of the best baseball teams in the world. Finally, I appreciate James's focus on defining players based on what they can do rather than what they can't. These are insights that don't sound like much in isolation but pack a punch when coming at the end of a statistical analysis.
Let me conclude this appreciation by listing a few things that Bill James has written that baffle me. One of the lessons of statistics, as with science in general, is that we can learn from anomalies. What are some of James's anomalies—those items he has written (or not written) that surprise me?
Quantitative analysis of baseball can take many directions. James has always focused on the decisions of a team's management: which players to hire or let go, what positions to play them at, when to platoon a hitter or rest a pitcher, when to yank a starter and put in a reliever, whether to save your best reliever for "save" situations. Related are other recurrent themes such as rating players or teams, adjusting for park effects, and estimating the offensive value of stolen bases.
But there are other quantitative aspects to the game. I think it's just as well that James has not tried to estimate what factors predict player compensation—I couldn't care less about this one, and I get bored when I open the newspaper and find that the entertainment page or the sports page has become the financial page—but it's notable that he hasn't written much about the topic, especially given his extensive experience in arbitration meetings.
Another much-studied topic in baseball is game strategy. James has occasionally written about when it's advisable to bunt and when a team should use a pinch-hitter, but I haven't seen him spend much time on calculations such as, "If you have a man on second with one out, you can expect to get 1.2 runs," those Markov chain analyses that are a natural part of the sabermetrician's trade. I wonder why James has not written more about these analyses—is it just because others have done it well, so he feels no need to duplicate the effort?
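For readers unfamiliar with these calculations, here is a minimal sketch of a run-expectancy table built from a Markov chain over the 24 base-out states. Everything specific in it is an assumption made for illustration: the event probabilities are invented rather than estimated from any season, and the advancement rules are deliberately simple (a walk only forces runners; on a hit every runner advances exactly as many bases as the batter). The numbers it produces should not be compared with published tables.

```python
from itertools import product

# Hypothetical per-plate-appearance probabilities (illustrative only).
EVENT_PROBS = {"out": 0.68, "walk": 0.08, "single": 0.15,
               "double": 0.05, "triple": 0.01, "homer": 0.03}
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "homer": 4}

def advance(bases, event):
    """Return (new_bases, runs) for a non-out event.

    bases is a tuple of 0/1 flags for (first, second, third).
    """
    first, second, third = bases
    if event == "walk":
        if first and second and third:
            return (1, 1, 1), 1          # bases loaded: a run is forced in
        if first and second:
            return (1, 1, 1), 0
        if first:
            return (1, 1, third), 0      # runner on first is forced to second
        return (1, second, third), 0     # nobody is forced
    n = HIT_BASES[event]
    # Batter starts at position 0, runners at 1..3; reaching 4 or more scores.
    positions = [0] + [i + 1 for i, occupied in enumerate(bases) if occupied]
    moved = [p + n for p in positions]
    runs = sum(1 for p in moved if p >= 4)
    return tuple(1 if b in moved else 0 for b in (1, 2, 3)), runs

def run_expectancy(event_probs=EVENT_PROBS, tol=1e-12):
    """Expected runs in the rest of the inning for each (bases, outs) state.

    Iterates the expectation equations to a fixed point; this converges
    as long as the probability of an out is positive.
    """
    states = [(b, o) for b in product((0, 1), repeat=3) for o in range(3)]
    value = {s: 0.0 for s in states}
    while True:
        new_value = {}
        for bases, outs in states:
            total = 0.0
            for event, p in event_probs.items():
                if event == "out":
                    if outs < 2:         # a third out ends the inning (adds 0)
                        total += p * value[(bases, outs + 1)]
                else:
                    new_bases, runs = advance(bases, event)
                    total += p * (runs + value[(new_bases, outs)])
            new_value[(bases, outs)] = total
        if max(abs(new_value[s] - value[s]) for s in states) < tol:
            return new_value
        value = new_value

table = run_expectancy()
print(table[((0, 1, 0), 1)])  # man on second, one out, under these assumptions
```

Comparing entries of such a table before and after a bunt or a steal attempt is the basic move behind most of the strategy calculations described above.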
Similarly, I've never seen James write much about strategies within a plate appearance. If a pitcher has a few different pitches, should he just throw them at random? Or does it make sense to be more likely to throw a fastball (say) on the first pitch? Which sorts of pitches are more likely to be fouled off, and by how much? I realize that I'm demonstrating my ignorance by even asking these questions in this way; my point here is that I'm surely not the only person whose knowledge of sabermetrics is bounded by the Baseball Abstracts at one end and Moneyball at the other, and I'm surprised that James never seemed interested in tackling these questions systematically. I'm not demanding or even asking that he do so (see the "public utility" quote above), just curious that he hasn't done so already.
A similar line of study concerns a batter's choices. In one of his books, James remarked that if you swing at more pitches, you're likely to end up with fewer walks but a higher batting average. This makes sense, but I'd be interested in seeing a more systematic analysis, along with related issues such as when it makes sense to let a first pitch go by and how effective it is to have a batter exhaust the pitcher by fouling off pitch after pitch after pitch. (That last strategy has always seemed a bit unsportsmanlike to me, but that's another story.) Again, I'm not saying that James should do this or that analysis, just wondering about his choices of what to focus on. He seems more comfortable reimagining the decisions of a team's general manager than thinking about the microdecisions of individual players.
Bill James is now one of the biggest names in baseball, but he used to be an outsider. The very first article in his 1984 Baseball Abstract is called "Inside-Out Perspective," and it expresses his opinion that when studying baseball it is better not to be too close to the individual players and outcomes: "There will be in this book no new tales about the things that happen on a team flight, no sudden revelations about the way that drugs and sex and money can ruin a championship team. I can't tell you what a locker room smells like, praise the Lord. But perspective can be gained only when details are lost..."
Things have changed, though. By the time the updated edition of his Historical Abstract came out, in 2001, James was writing, "Are athletes special people? In general, no, but occasionally, yes. Johnny Pesky at 75 was trim, youthful, optimistic, and practically exploding with energy. You rarely meet anybody like that who isn't an ex-athlete—and that makes athletes seem special." I've met 75-year-olds like that, and none of them was an ex-athlete. That's probably because I don't know a lot of ex-athletes. But Bill James...he knows a lot of athletes. He went to the bathroom with Tim Raines once! The most I can say is that I saw Rickey Henderson steal a couple of bases in a game against the Orioles.
Cognitive psychologists talk about the base-rate fallacy, which is the mistake of estimating probabilities without accounting for underlying frequencies. Bill James knows a lot of ex-athletes, so it's no surprise that the youthful, optimistic, 75-year-olds he meets are likely to be ex-athletes. The rest of us don't know many ex-athletes, so it's no surprise that most of the youthful, optimistic, 75-year-olds we meet are not ex-athletes. The mistake James made in the above quote was to write "You" when he really meant "I." I'm not disputing his claim that athletes are disproportionately likely to become lively 75-year-olds; what I'm disagreeing with is his statement that almost all such people are ex-athletes. Yeah, I know, I'm being picky. But the point is important, I think, because of the window it offers into the larger issue of people being trapped in their own environments (the "availability heuristic," in the jargon of cognitive psychology). Athletes loom large in Bill James's world—I wouldn't want it any other way—and sometimes he forgets that the rest of us live in a different world.
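The base-rate point can be checked with a few lines of arithmetic. In the sketch below, every number is hypothetical: it assumes ex-athletes are four times as likely as other people to be lively at 75, and then varies only the share of ex-athletes among the people one knows.

```python
def share_ex_athletes(base_rate, p_lively_athlete, p_lively_other):
    """Fraction of the lively 75-year-olds you meet who are ex-athletes.

    base_rate is the share of ex-athletes among the 75-year-olds you know;
    the other two arguments are the chances that an ex-athlete or a
    non-athlete is lively at 75. This is Bayes' rule.
    """
    athletes = base_rate * p_lively_athlete
    others = (1 - base_rate) * p_lively_other
    return athletes / (athletes + others)

# Hypothetical inputs: same advantage for athletes, different social circles.
print(share_ex_athletes(0.50, 0.4, 0.1))  # someone who knows many ex-athletes
print(share_ex_athletes(0.01, 0.4, 0.1))  # someone who knows very few
```

With the athletes' advantage held fixed, the answer swings from a large majority to a small minority purely because the base rate changes, which is the gap between "You" and "I" in James's sentence.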
Just last month, James concluded an article in Slate on racism and society by writing, "this situation is not a failing of the sporting world. Rather, it is that the rest of society has been too proud to follow our lead." The ultimate outsider is now in the clubhouse.
I noted above that I like Bill James's methodological pluralism, his willingness to try out lots of ideas and get different insights using different methods. Sometimes, though, the results confuse me. For example, he's argued for decades that on-base percentage and slugging average are more informative than batting average and RBI, but then he provides the following four statistics for every player in his Historical Abstract: games played, home runs, RBI, and batting average. At the very least, why not give on-base percentage and runs scored? Similarly, James was really into the concept of "secondary average" for a few years before it seemed to disappear. I can't tell whether he decided it was a bad idea or simply became interested in other things.
My biggest Bill James puzzle involves batting order. Over and over he talks about bad leadoff men and great leadoff men and criticizes managers who lead off with a speedy "contact hitter" with a .280 OBP. Where to start? The 1985 Abstract features a long discussion of the San Diego Padres' lead-off problem and then continues a few pages later with a lengthy explication of James's frustration with managers who don't know how to set up a lineup.
But then, in his 1997 book on baseball managers, James looked at the subject one more time: "There is probably no subject within the province of managing which draws more comment than batting order...Let's start with the broadest question: How much difference would it make?" He ran a simulation (on the 1930 Cubs) and reported his results: "How much difference was there between the 'correct' batting order, and the same players in an obviously irrational order? Surprisingly enough, very little...50 runs per season [i.e., about 5 games, using the standard 10:1 conversion factor]...if the difference between a reasonable batting order and an unreasonable batting order is only 5%, what do you suppose would be the difference between two reasonable batting orders? That's right: it's nothing." James concludes his discussion in his usual pugnacious style: "Our model is far from perfect...But for now, this discussion has two groups. On the one hand, you have the barroom experts, the traditional sportswriters, the couch potatoes, and the call-show regulars, all of whom believe that batting orders are important. And then, on the other hand, you have a few of us who have actually studied the issue, and who have been forced to draw the conclusion that it doesn't make much difference what order you put the hitters in, they're going to score just as many runs one way as another. You can believe whoever you want to; it's up to you."
My question is, where does the Bill James of the Baseball Abstracts fit into this scheme? It's perfectly fine, admirable even, for him to change his mind on the importance of batting order, but it's odd that he doesn't acknowledge the shift. Was it actually okay all those years for those managers to be leading off with .250 hitters who never drew walks?
I don't want to conclude on a down note, though. It is only because Bill James's ideas, methods, and principles have influenced me so much and have burned themselves into my brain that I am aware of the places where he's changed. In statistics we like to say that God is in every leaf of every tree: whenever we work on any serious problem in a serious way, we find ourselves quickly thrust to the boundaries of what existing statistical methods can achieve.