May 21, 2003
Lies, Damned Lies
Holes isn't just the movie you see begrudgingly upon discovering that The Matrix Reloaded is sold out on all 17 screens at the Springfield GooglePlex. No, "holes" are also one of the big concepts in Michael Lewis' Moneyball, and not just as a part of Billy Beane's vernacular. Rather, Lewis contends that every hitter (excepting Scott Hatteberg, Pickin' Machine) has a hole in his swing, and that the hole will inevitably be discovered and exploited in repeated trials. Unless the hitter is able to make adaptations of his own--retooling his swing, standing in a different place in the batter's box, taking more pitches--the hitter will not be able to survive in the big leagues for long, and will join Kevin Maas and Joe Charboneau in baseball purgatory.
It's a nice concept. Game theory hasn't been this sexy since Russell Crowe played the genius/lunatic somewhat resembling Princeton scholar John Nash in A Beautiful Mind. But is it real? Can it be tested? Does it hold its sabermetric water?
Let's use Reds slugger Adam Dunn as a test case. Apart from the fact that he looks just a tiny bit like a gerbil, Dunn makes a good guinea pig for a number of reasons:
With an assist from Ray Kerby's insanely cool Astros Daily software, I broke down Dunn's performance into individual plays based on the number of times he has faced a given pitcher. If Lewis' theory is applicable here, we might expect to see Dunn have progressively more trouble as he faces a pitcher more frequently, since pitchers would be able to exploit his weaknesses more quickly than he is able to make adjustments of his own.
Here are Dunn's numbers for 2001 and 2002:
Career PA    PA     BA    OBP    SLG     OPS   HR   BB    K
1           246   .260   .386   .529    .916   16   41   62
2           193   .271   .415   .503    .918    7   35   49
3           142   .248   .380   .598    .979   11   20   39
4            87   .217   .379   .406    .785    4   17   27
5            68   .316   .426   .632   1.058    4   11   15
6            58   .174   .345   .391    .736    2   10   18
7            41   .294   .415   .324    .738    0    7    7
8            27   .333   .481   .476    .958    0    6    6
9            21   .250   .429   .375    .804    0    4    6
10           18   .133   .278   .200    .478    0    3    7
11+          57   .200   .368   .356    .724    1   12    8
Most everything in that table should be familiar, except for "Career PA", which represents the number of times that Dunn has faced a given pitcher. For example, the third row down summarizes Dunn's performance in all the plate appearances in which he was facing a pitcher for the third time.
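The "Career PA" bookkeeping is easy to get off by one, so here is a minimal Python sketch of how it could be done. It is not the Astros Daily software, whose internals aren't described here; the function names and the toy log are hypothetical, and the only assumption is that plate appearances arrive in chronological order.

```python
from collections import defaultdict

def tag_career_pa(plate_appearances):
    """Attach a 1-based 'career PA' number to each plate appearance.

    plate_appearances: iterable of (pitcher, outcome) pairs, assumed to be
    in chronological order. Returns a list of (pitcher, outcome, career_pa).
    """
    seen = defaultdict(int)  # pitcher -> number of PAs against him so far
    tagged = []
    for pitcher, outcome in plate_appearances:
        seen[pitcher] += 1
        tagged.append((pitcher, outcome, seen[pitcher]))
    return tagged

def bucket_label(career_pa, cap=11):
    """Collapse everything at or beyond `cap` into one bucket, as in the table's 11+ row."""
    return f"{cap}+" if career_pa >= cap else str(career_pa)

# Hypothetical toy log, for illustration only (not Dunn's actual record).
log = [
    ("Pitcher A", "single"),
    ("Pitcher B", "strikeout"),
    ("Pitcher A", "walk"),
    ("Pitcher A", "home run"),
]
tagged = tag_career_pa(log)
for pitcher, outcome, career_pa in tagged:
    print(pitcher, outcome, bucket_label(career_pa))
```

Each plate appearance lands in the row for its running count against that pitcher, so the third meeting with Pitcher A goes in row 3 no matter how many other pitchers came in between.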
Clearly, there is a trend downward:
The more often Dunn has faced a given pitcher, the more he has struggled. Almost all of the difference comes in his power numbers--entering the 2003 season, Dunn had hit just one home run in 164 PA against pitchers he had faced at least seven times.
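The home run claim can be checked against the 2001-2002 table, and the same totals allow a rough read on how unusual the split is. The sketch below sums the rows and then computes a one-sided hypergeometric tail: the chance of seeing this few homers in the 7+ group if Dunn's homers were scattered at random across all of his plate appearances. That test is my own illustration, not part of the original analysis. It treats plate appearances as interchangeable, so it ignores pitcher quality and timing, and the cutoff at seven was chosen after looking at the data.

```python
from math import comb

# (career PA bucket, PA, HR), copied from the 2001-2002 table above.
ROWS = [
    ("1", 246, 16), ("2", 193, 7), ("3", 142, 11), ("4", 87, 4),
    ("5", 68, 4), ("6", 58, 2), ("7", 41, 0), ("8", 27, 0),
    ("9", 21, 0), ("10", 18, 0), ("11+", 57, 1),
]

def totals(rows, lo, hi=None):
    """Sum PA and HR over buckets numbered lo..hi (hi=None means open-ended)."""
    pa = hr = 0
    for label, bucket_pa, bucket_hr in rows:
        n = int(label.rstrip("+"))
        if n >= lo and (hi is None or n <= hi):
            pa += bucket_pa
            hr += bucket_hr
    return pa, hr

def hypergeom_lower_tail(k, population, successes, draws):
    """P(X <= k) when `draws` items are taken without replacement from a
    population that contains `successes` marked items."""
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k + 1)
    )
    return favorable / comb(population, draws)

early_pa, early_hr = totals(ROWS, 1, 6)
late_pa, late_hr = totals(ROWS, 7)

# Population = all PAs, marked items = all HR, draws = PAs in the 7+ group.
p_value = hypergeom_lower_tail(
    late_hr, early_pa + late_pa, early_hr + late_hr, late_pa
)

print(f"Career PA 1-6: {early_hr} HR in {early_pa} PA")
print(f"Career PA 7+:  {late_hr} HR in {late_pa} PA")
print(f"P({late_hr} or fewer HR in the 7+ group by chance) = {p_value:.4f}")
```

The totals reproduce the one-homer-in-164-PA figure, and the tail probability comes out well under the 5% line. That says the split is unlikely to be pure noise under these simplified assumptions, but it says nothing about why the split exists.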
It might be objected that I'm not working on a level playing field. The pitchers Dunn faces more frequently are likely to be better pitchers, while the Triple-A call-ups and associated retreads that surface during the course of a season are likely going to have only one or two shots at him.
In order to account for differences in the sample of pitchers that Dunn was facing, it'd also be helpful to convert to a slightly more accurate production metric. A linear model will be the most appropriate here since we're breaking things down on a play-by-play basis, and the old standby, Pete Palmer's Batting Runs, is as good a choice as any.
I was able to evaluate Dunn's performance by comparing the Batting Runs he produced in given plate appearances to the average that the opposing pitcher yielded over the course of the season. For example, Dunn's first big league hit was a single, which came off Matt Clement (before his Hyde-to-Jekyll transformation) on 7/20/2001. Here's how the calculation shakes out for that at bat:
Batting Runs value of single        = .0470
Batting Runs / PA for Clement, 2001 = .0038
Adjusted Batting Runs for PA        = .0470 - .0038 = .0432
Clement was a slightly worse than average pitcher that year, so we take a little bit of credit away from Dunn for his hit. By performing this calculation for each of Dunn's plate appearances, and taking the average result, we can do a good job of adjusting for the quality of his competition.
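The adjustment described above boils down to one subtraction per plate appearance followed by an average. Here is a minimal Python sketch; the function names are hypothetical, and the only numbers used are the two quoted for the Clement single. Event values for other outcomes are not given in this article, so none are filled in.

```python
def adjusted_batting_runs(event_value, pitcher_rate):
    """Batting Runs for one PA, minus what the pitcher allowed per PA that season."""
    return event_value - pitcher_rate

def mean_adjusted(plate_appearances):
    """Average adjusted Batting Runs over (event_value, pitcher_rate) pairs."""
    if not plate_appearances:
        raise ValueError("need at least one plate appearance")
    total = sum(adjusted_batting_runs(ev, rate) for ev, rate in plate_appearances)
    return total / len(plate_appearances)

# Figures quoted in the article for Dunn's single off Matt Clement.
SINGLE_VALUE = 0.0470        # Batting Runs value of a single
CLEMENT_2001_RATE = 0.0038   # Batting Runs per PA allowed by Clement, 2001

adj = adjusted_batting_runs(SINGLE_VALUE, CLEMENT_2001_RATE)
print(f"Adjusted Batting Runs for the PA: {adj:.4f}")
```

Grouping the pairs by career PA bucket before calling `mean_adjusted` would give one adjusted figure per row of the earlier table.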
Still, as the chart below suggests, it doesn't make a heck of a lot of difference:
There's a noticeable improvement in the quality of the opposing pitcher among guys that Dunn has faced 11 times or more (fewer Batting Runs allowed implies better pitching), but the trend downward is still evident. Maybe Dunn's approach really is flawed?
(Note also that, although Batting Runs was originally designed such that a league average hitter will produce zero batting runs, the league as a whole produces more runs now than when Palmer designed the metric, throwing the calculations off a little bit. The Adjusted Batting Runs figured here correct for this problem, as well as the quality of Dunn's opponents.)
To borrow one of the pseudo-intellectualisms from The Matrix (as an aside, has there ever been a more disturbing dystopian vision than the possibility that the entire tangible universe was created by Colonel Sanders?), human beings are very good at figuring out the whats, but it's the hows and whys that get us every time. We know that Dunn's performance suffered in the second half of 2002. Was it because he was beginning to see more repeats among the pitchers he was facing, and they were better able to exploit a hole in his swing? Or was it because of some unrelated fact, making it appear as though game theory was the culprit, when really it had nothing to do with it? And where can I get some of that orgasm cake?
To examine the question further, I divided Dunn's career into two portions, the first covering his big league debut through last year's All-Star break, and the second covering the second half of 2002. We can then rerun the Adjusted Batting Runs analysis to compare his performance in the two periods:
When we look at things this way, the pattern we observed before seems to disappear entirely. There's no evident relationship between the number of PAs that Dunn had against a given pitcher, and his performance; in fact, in the second half of 2002, Dunn's greatest struggles came against pitchers he was seeing for the very first time. It appeared that Dunn's struggles were due to difficulties in repeated trials against the same pitchers, but that result was merely an artifact of timing. It's the very definition of a confounding variable. Or is it?
[Insert gravity-defying ninja move here].
Thus far, I've assumed that if Dunn is a particularly easy hitter to adjust to--if his "holes" are especially problematic--the pitchers who see him most frequently will be in the best position to take advantage of him. But that isn't necessarily the case. Baseball hasn't quite reached N.F.L./Operation Iraqi Freedom levels of advanced preparation just yet, but with improvements in information sharing and videotape scouting, it may be silly to assume that the information gathered about a particular hitter by a particular pitcher will remain proprietary for very long. Is it possible that Dunn does have holes in his swing, and that the entire league caught up to him in the second half?
Of course it's possible, and you can answer that question in just about any way you like. Within the analytical community, the responses inevitably run the gamut from the Ockham's Razor crowd, who look at a drop-off in performance of that magnitude and take it as prima facie evidence that the league caught up with the hitter, to the null hypothesis groupies, who assume everything is random until they're 95% sure that it's not. Is there a way out of the conundrum? Is this the end for our dynamic duo?
TO BE CONCLUDED...
Just kidding. Fortunately, we also have Dunn's performance this year to work with. While Dunn hasn't hit for average, he does lead the league in home runs and has drawn plenty of walks, turning in a vintage, pre-creatine Mark McGwire performance. It would certainly be a stretch to say that he's struggling. Who has Dunn gone long against?
Dunn HR, 2003

Opposing Pitcher          PA Against ('01-02)
Glendon Rusch               8
Wayne Franklin (3 HR)       0
Garrett Stephenson          8
Jason Simontacchi (x2)     11
Aaron Cook                  0
Clay Condrey                0
Luis Ayala                  0
Kerry Wood                 18
Kevin Millwood             16
Brett Myers                 0
Carlos Zambrano             8
Mike Remlinger              2
Shawn Estes                 0
While Dunn has victimized some newbies like Clay Condrey and Wayne Franklin, he's also hit homers against some of the pitchers he's faced most frequently, like Kerry Wood and Kevin Millwood. It's possible that game theory really was working against Dunn last year, and that he's since made adjustments of his own; it's also possible that his struggles last year were just a random burp. In either event, there's not much reason for concern going forward: Dunn is no Maas.
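For a quick tally of the 2003 home run list by prior exposure, a short Python sketch follows. It assumes that "(3 HR)" and "(x2)" mean three and two homers off that pitcher and that every other entry is a single home run. That reading is mine, not something the list spells out.

```python
# (pitcher, HR in 2003, PA against in 2001-2002), from the list above.
# HR counts follow the assumed reading of "(3 HR)" and "(x2)".
HR_LOG = [
    ("Glendon Rusch", 1, 8),
    ("Wayne Franklin", 3, 0),
    ("Garrett Stephenson", 1, 8),
    ("Jason Simontacchi", 2, 11),
    ("Aaron Cook", 1, 0),
    ("Clay Condrey", 1, 0),
    ("Luis Ayala", 1, 0),
    ("Kerry Wood", 1, 18),
    ("Kevin Millwood", 1, 16),
    ("Brett Myers", 1, 0),
    ("Carlos Zambrano", 1, 8),
    ("Mike Remlinger", 1, 2),
    ("Shawn Estes", 1, 0),
]

def hr_where(log, predicate):
    """Total HR over entries whose prior-PA count satisfies `predicate`."""
    return sum(hr for _, hr, prior_pa in log if predicate(prior_pa))

hr_new = hr_where(HR_LOG, lambda pa: pa == 0)        # never faced in '01-02
hr_familiar = hr_where(HR_LOG, lambda pa: pa > 0)    # faced at least once
hr_veteran = hr_where(HR_LOG, lambda pa: pa >= 7)    # faced seven or more times

print(f"HR off pitchers with no prior PA:  {hr_new}")
print(f"HR off pitchers faced before:      {hr_familiar}")
print(f"HR off pitchers faced 7+ times:    {hr_veteran}")
```

Under that reading the homers split evenly, eight off pitchers Dunn had never seen and eight off pitchers he had, with seven of the latter coming against pitchers he had faced at least seven times.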
Given the confluence of better, more readily available data, broader acceptance within the game, and an ever-ready supply of creative thinking, it's possible that we're on the cusp of a sort of third wave of sabermetric analysis. Instead of spending our time worrying about whether anyone is going to listen to what we have to say, we can focus on new directions for research. I like to think that some of the stuff we're doing at BP--whether it's Will Carroll's analytical take on injuries, or PECOTA, or Clay Davenport's work on defense--falls into that category.
The point, however, is not to lose sight of one's healthy skepticism when encountering these new approaches. It's possible that Lewis is really on to something with his emphasis on game theory. Maybe not Dunn, but are there certain types of hitters who flounder or excel in repeated trials against the same opponent? Are there certain types of pitchers? Do they stick around long enough for us to identify them? Or is all this talk about game theory the modern equivalent of clutch hitting, something that seems intuitive but is really just another way to get around our reluctance to attribute events to chance? I certainly plan to do my part, and replicate this sort of analysis for other players.
Until next time, we'll see you at the movies. Err...at the ballpark.