April 16, 2003
Lies, Damned Lies
Strikeout Rate, Redefined
[Note: "Lies, Damned Lies" is a column written by Nate Silver, and will run every Wednesday throughout the baseball season.]
In last week's 6-4-3, Gary Huckabay wrote about the fact that our perceptions are more often colored by the way information is presented than by the substance of the information itself. There are plenty of examples of this, drawn both from the ballpark and the world at large. Get your hands on most any media guide, and you're sure to see the familiar rotisserie categories--batting average, home runs, RBI--presented prominently in bold face. Now, a typical media guide runs about 400 pages, and there's plenty of information to go around, ranging from the trivial (Mike Lincoln's career ERA at Busch Stadium is 14.29) to the frivolous (Joe Borowski's wife is named Tatum). Thus, should it really be that much of a surprise to find out that in the thick of that pulp forest, the people who rely on media guides to grab information on the fly--like beat writers pushing on a deadline and radio announcers trying to keep a cadence--gravitate toward those bits of knowledge that are literally staring them boldly in the face?
We know better, of course, just like we know not to trust any financial disclosures coming from Herr Selig, or directions coming from a cabbie at the train station, or poll numbers coming from just about anybody. In a number of surveys, most Americans didn't support unilateral military action in Iraq when multilateral action was first presented to them as an alternative--yet when that option was withdrawn from them, both literally and figuratively, a solid majority came out in favor of going in alone. Now, patriotism has a lot to do with that, and well it should, but so do question order and non-neutral wording. We know all of this, of course, because we're smart.
Hey, I think I'm pretty smart, too, and I have an annoying little hobby of trying to deconstruct print and television advertisements on the spot. It's not out of any postmodern malaise so much as the naive hope that by decomposing something, I'll somehow be able to avoid succumbing to it. But open up my wallet, and you'll find a Starbucks Card, a Banana Republic Card, and two out of the three pieces from the yellow group in the McDonald's Monopoly game from a few summers ago (damned Marvin Gardens).
All of this is a roundabout way to say that, while we can strive to be as critical as we want to be, the problem with taking something for granted is that we never realize that we're doing it until it's too late. One of those things--ranking somewhere in importance between wrinkle-free chinos and the siege of Basra--is the use of pitching statistics that are denominated in innings pitched.
Take a look at the following group of pitchers, sorted by their 2002 strikeout rate:
1. Nick Neugebauer 7.65
2. Juan Cruz 7.49
3. Chan Ho Park 7.47
4. Matt Morris 7.32
5. Barry Zito 7.14
6. LaTroy Hawkins 7.06
7. Dennis Tankersley 6.84
8. Keith Foulke 6.72
9. Odalis Perez 6.28
10. Ruben Quevedo 6.02
11. Derek Lowe 5.20
Heck, it's only fair that Neugie comes out on top; memories of a high strikeout rate in 2002 are about the only thing he has going for him at the moment.
Now take a look at another list of pitchers, also sorted by their 2002 strikeout rate.
1. LaTroy Hawkins 20.3%
2. Matt Morris 19.8%
3. Barry Zito 19.4%
4. Keith Foulke 19.0%
5. Juan Cruz 18.7%
6. Chan Ho Park 18.1%
7. Nick Neugebauer 18.0%
8. Odalis Perez 17.8%
9. Dennis Tankersley 15.9%
10. Derek Lowe 14.9%
11. Ruben Quevedo 14.6%
The first list is sorted by strikeouts per nine innings pitched, the second by strikeouts per batter faced. And no, I don't think I'm pulling the Woolner over anyone's eyes; those are the same pitchers.
What's interesting, though, is that the latter list conforms a lot better than the former with our notions of what a good pitcher is: Keith Foulke ranks ahead of Nick Neugebauer; Barry Zito ahead of Juan Cruz; Derek Lowe ahead of Ruben Quevedo. That isn't entirely a coincidence: evaluating pitchers based on strikeouts per nine innings pitched is deceptive because it doesn't account for the fact that better pitchers throw shorter innings. A Neugebauer inning lasted an average of 4.7 plate appearances last year; a Foulke inning, just 3.9. Sure, you can make the argument that Foulke was helped more by his defense, but most of the difference is entirely Neugebauer's fault--the inevitable result of walking nearly a batter per inning. Ranking pitchers based on strikeouts per inning rewards those pitchers who allow their opponents to reach base more often, something that you'd ordinarily want to avoid. I don't expect anyone to start talking about Billy Wagner as a "30% pitcher," but strikeouts per batter faced is a truer measure of a pitcher's effectiveness.
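The two lists are tied together by simple arithmetic: strikeouts per inning equals strikeouts per batter faced times batters faced per inning. The sketch below recovers innings length from the published figures; the helper name `batters_per_inning` is hypothetical, and because the source rates are rounded, the derived values are approximate.

```python
# 2002 strikeout rates from the two lists above: (K/9, K per batter faced).
RATES_2002 = {
    "Nick Neugebauer": (7.65, 0.180),
    "Juan Cruz": (7.49, 0.187),
    "Chan Ho Park": (7.47, 0.181),
    "Matt Morris": (7.32, 0.198),
    "Barry Zito": (7.14, 0.194),
    "LaTroy Hawkins": (7.06, 0.203),
    "Dennis Tankersley": (6.84, 0.159),
    "Keith Foulke": (6.72, 0.190),
    "Odalis Perez": (6.28, 0.178),
    "Ruben Quevedo": (6.02, 0.146),
    "Derek Lowe": (5.20, 0.149),
}


def batters_per_inning(k_per_nine: float, k_per_bf: float) -> float:
    """Implied batters faced per inning: (K per inning) / (K per batter faced)."""
    return (k_per_nine / 9.0) / k_per_bf


# Re-rank by K per batter faced and show how long each pitcher's innings ran.
by_k_per_bf = sorted(RATES_2002.items(), key=lambda item: item[1][1], reverse=True)
for rank, (name, (k9, kbf)) in enumerate(by_k_per_bf, start=1):
    print(f"{rank:2d}. {name:<18} {kbf:.1%} K/BF  {batters_per_inning(k9, kbf):.1f} BF/IP")
```

Neugebauer works out to about 4.7 batters per inning and Foulke to about 3.9, matching the figures quoted in the text.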
The differences can be even more profound when you're dealing with small sample sizes. Joe Sheehan noted last week that Greg Maddux' disastrous start to the season had come in spite of an odd increase in his strikeout rate. Here were his numbers at the time:
Year   K/9    K/BF
-------------------
2001   6.68   18.7%
2002   5.33   14.4%
2003   8.59   17.1%  (first three starts)
Well, maybe the jump in strikeout rate wasn't so odd after all--as evaluated by Ks per batter faced, Maddux' strikeout rate to start the year wasn't really any better than a rough average of his numbers from the previous two seasons. It's just that when your opponents' OBP jumps from an Ordonezesque .290 to a Thome-like .430, you have a lot more opportunities to get your licks in.
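To see how much the extra opportunities alone can inflate K/9, here is a rough sketch that holds strikeouts per batter fixed at Maddux' 17.1% and varies only opponents' OBP. It rests on a simplifying assumption that is not from the column: every batter who doesn't reach base makes exactly one out, with no double plays or runners thrown out on the bases. The function name `k_per_nine` is hypothetical.

```python
def k_per_nine(k_per_bf: float, opp_obp: float) -> float:
    """K/9 implied by a strikeout rate per batter faced and an opponents' OBP.

    Simplifying assumption: every batter who doesn't reach base makes exactly
    one out (no double plays or baserunning outs), so a nine-inning game's
    27 outs take 27 / (1 - OBP) batters.
    """
    batters_per_nine = 27.0 / (1.0 - opp_obp)
    return k_per_bf * batters_per_nine


# Same strikeout rate per batter (17.1%), two very different opponents' OBPs.
for obp in (0.290, 0.430):
    print(f"OBP {obp:.3f}: {k_per_nine(0.171, obp):.2f} K/9")
```

Under these assumptions the identical per-batter rate comes out to roughly 6.5 K/9 against a .290 OBP and 8.1 K/9 against a .430 OBP, with no change at all in how often the pitcher actually strikes hitters out.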
Implicit in all of this is that ERA doesn't display a particularly linear response to batting events. When evaluating offensive performance, it's safe to ignore the non-linear nature of run production except in the extreme cases--like, say, Barry Bonds and a random group of eight mortals. The same can't be said for pitchers because they, by the very nature of their job description, face a series of opposing batters in sequence. Little differences go a long way.
The problem is, this fact is obscured when the inputs are denominated in the same units as the outputs. Take a look at the following chart, which is based on the formula that PECOTA uses for estimating ERA from a pitcher's peripheral statistics (a very close relative of the Clay Davenport PERA formula for doing the same thing). Let's assume that all of the rest of the pitcher's attributes--his strikeout rate, homer rate, and hits allowed rate--are league average, and hold constant as we increase his walk rate by taking outs and turning them into free passes.
Well, I'll have to ask Gary if I can borrow his level, but that's about as straight as it gets. If we look at things in terms of walk rate per batter faced, however, the graph is about as straight as Corky St. Clair.
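Part of the reason the same relationship can look straight in one unit and bent in the other is the unit conversion itself. If every extra walk replaces an out, then walks per nine innings is a convex function of walks per batter faced, so a line in BB/9 becomes an upward curve in BB/BF. The sketch below shows that conversion only; it is not the PECOTA formula, and the baseline walk and out rates are illustrative assumptions.

```python
# Hypothetical league-average baseline; these two rates are assumptions,
# not figures from PECOTA or from this column.
BASE_WALK_RATE = 0.085  # walks per batter faced
BASE_OUT_RATE = 0.670   # outs per batter faced


def walks_per_nine(walk_rate: float) -> float:
    """Convert walks per batter faced into walks per nine innings, assuming
    every walk above (or below) the baseline replaces an out and all other
    outcomes stay fixed."""
    out_rate = BASE_OUT_RATE - (walk_rate - BASE_WALK_RATE)
    batters_per_nine = 27.0 / out_rate  # 27 outs in nine innings
    return walk_rate * batters_per_nine


# Equal five-point steps in walk rate per batter produce ever-larger steps in BB/9.
previous = None
for walk_rate in (0.05, 0.10, 0.15, 0.20):
    bb9 = walks_per_nine(walk_rate)
    step = "" if previous is None else f"  (+{bb9 - previous:.2f})"
    print(f"BB/BF {walk_rate:.0%}: BB/9 {bb9:.2f}{step}")
    previous = bb9
```

Each step of five points in walk rate adds more to BB/9 than the one before it, which is why a run estimator that is linear in BB/9 bends upward when replotted against BB/BF.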
The notion that a small gain in a pitcher's peripheral statistics can translate into a big improvement in his performance really comes through here. By keeping runners off first base, a pitcher not only reduces the risk that they'll be driven home, but also reduces the number of opportunities that the opposition has to do so.
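A deliberately crude toy model, assumed here for illustration and not drawn from PECOTA, shows both effects at once. Suppose every batter either walks with probability p or makes an out. The number of walks before the third out then follows a negative binomial distribution, the bases load after three walks, and every walk after that forces in a run, so runs = max(walks - 3, 0). Each walk both puts a runner on and buys the offense another batter. The function name is hypothetical.

```python
from math import comb


def expected_runs_walks_only(p: float, max_walks: int = 200) -> float:
    """Expected runs per inning in a toy model where each batter walks with
    probability p and otherwise makes an out (no hits, no other events).

    Walks before the third out are negative binomial:
    P(W = w) = C(w + 2, 2) * (1 - p)**3 * p**w.
    With walks only, a run scores on every walk after the bases are loaded,
    so runs = max(W - 3, 0). The infinite sum is truncated at max_walks.
    """
    q = 1.0 - p
    total = 0.0
    for walks in range(4, max_walks + 1):
        prob = comb(walks + 2, 2) * q**3 * p**walks
        total += (walks - 3) * prob
    return total


for p in (0.1, 0.2, 0.3, 0.4):
    print(f"walk prob {p:.1f}: {expected_runs_walks_only(p):.4f} runs per inning")
```

In this toy setting, doubling the walk probability from .2 to .4 multiplies expected runs per inning by about sixteen, a starkly non-linear response of the kind the charts describe.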
In case you're curious, the PECOTA system does operate in terms of events per batter, rather than events per inning, translating everything to the more familiar metric only as a last step prior to publication. As a result, PECOTA is better able to reward pitchers like Foulke and Mariano Rivera who are fantastically good at shortening the length of innings, while withholding its enthusiasm for the Brad Penningtons of the world.
And if you still don't get it, well, Life Goes On. Whoops--wrong Corky!
As some of you may have noticed, I've gone from being very good about responding to e-mails to being very sporadic about it within the span of about a month. It's a tired refrain, but between holding down two jobs and trying to maintain some semblance of a social life--all while catching a couple of ballgames each week--I don't always have the time or the energy to give even the thoughtful messages the reply they deserve. I do sincerely appreciate all the feedback, and look forward to resuming the tit-for-tat when things are a little less hectic.