March 22, 2013
Lies, Damned Lies, And That One Barry Zito Factoid
Most of what I write centers on The Factoid. I like to organize the world into surprising, easily digested chunks, so I love factoids. My job usually requires me to write longer pieces than a simple factoid, so I keep writing and writing, but if you strip away the stalling, the GIFs, the jokes, the pointlessly long lead-ins, the repetition, and the tables, it’s usually just a factoid that I wanted to find a place for. Here’s a factoid: In the average day, I spend approximately 25 minutes looking for factoids, and 18 minutes interacting with my family. Factoid!
But the very pithiness and the juxtapositions that make factoids awesome also make them easily deceptive. As much as I love a good factoid, I am skeptical of a good factoid. There’s one baseball factoid that I see more than any other factoid, and it has always made me uncomfortable, and I’m finally getting around to exploring that factoid. That factoid is a variation on this:
For example, when Barry Zito was a free agent, this was reportedly in his Boras Binder:
When the Oakland A's score three or more runs for him, Barry's record is 93-11.
And at the end of the 2012 season, the excellent Henry Schulman wrote this about Barry Zito:
The Giants took advantage of pitcher Lance Lynn's throwing error to score four unearned runs in the fourth inning. Fans on both sides of San Francisco Bay know what four runs of support has meant for Zito: a 126-7 record for his teams.
I don’t blame anybody, Boras or Schulman or Wikipedia or anybody else, for using this. I mean daaaaaaaaang 126-7. How you going to beat 126-7? It’s a mighty impressive factoid, which is why I hear it about 15 times a year, every time the Giants score four runs in a game while Zito is on the mound. So why does it make me uncomfortable? Well, because I suspect it’s a total fraud. And until I get to the bottom of it, I’ll never feel any peace. So here we go.
What is This Factoid Trying to Say?
How Does it Work?
Alternately, when used in real time during a game, this factoid takes as a given that Barry Zito is exceptional (at least, in some situations), and uses the fourth run as an indicator of certain Giants victory. It is, essentially, a Mission Accomplished banner that makes the audience feel good. (Note: Some Mission Accomplished banners accompany actual accomplished missions. I don’t mean to suggest that the Barry Zito factoid is prima facie bogus. Though I suspect that the Barry Zito factoid is bogus.)
Finally, on the longest scale, the factoid suggests that Barry Zito’s performance is tied to his run support, that he finds a way to win within the context of run support, that he is in some way perhaps pitching to the score, and that he is therefore better at winning than he is at producing a good ERA. This is always a suspect claim. But it is a claim that, if true, would be pleasing to Baseball Men who prefer winners to stat producers.
Why Does it Make Me Uncomfortable?
2. The language is subtly deceptive. I remember Columbia House telling me that I had to buy only six CDs that cost “$15.99 and up.” Of those three words, there are only two that are important: "and," and "up." In the same way, “Four runs or more” usually means “more.” Sometimes it means four, of course. But it also means five, six, seven, a billion. A different way of phrasing that factoid is “Barry Zito is 126-7 when he gets a tankful of run support,” but by phrasing it as “four or more” the factoid plants the number four in your head and there you are, thinking about the number four and how reasonable it sounds.
3. Much like objection no. 2: “Four or more” creates a line of demarcation that is utter nonsense. Consider a small tweak of the factoid, if you were to hear a broadcaster announce it upon the Giants’ scoring their fourth run for Zito in the seventh inning:
For that matter, if they score a fifth run, and a sixth run, and even a seventh run, they’re still likely to lose by this factoid’s logic:
Far, far, far more telling (and still not all that telling, for reasons we’ll get to) would be a factoid that doesn’t lie a doggone bit. Such as: When Barry Zito’s teams score exactly four runs while he is in the game, his record is 14-6.
4. The X variable is not one we intuitively understand. Shoot. Is that the X variable? I think it’s the X variable. The one that goes “when his team scores four runs or more.” Four runs seems relatable, because we all know pretty well what four runs represents in a baseball game: something pretty close to the median runs scored by a team in a game. So, by that standard, a pitcher who gets four runs of support should be close to .500, on average. That’s what the factoid is saying to you, with cold, deceptive intentions.
In fact, though, this factoid includes only runs scored while Barry Zito is in the game. Four runs might be the median offense for an entire game, but it is far less than the median for the portion of a game in which the starting pitcher is still on the mound. In his career, Zito has averaged 6.17 innings per start. So four runs during those 6.17 innings prorates out to about six runs in a full nine-inning contest. Six runs is a lot of runs for a baseball game! Teams mostly win those games. A pitcher who doesn’t win those games is not good. A pitcher who does win those games is often not good! Barry Zito wins those games and Barry Zito is Barry Zito!
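For anyone who wants to check the prorating, here is a minimal Python sketch of the arithmetic. The function name `prorate_runs` is made up for illustration. The only inputs are the four runs and the 6.17 innings per start cited above, and the sketch assumes runs arrive at a constant pace per inning, which real offenses don't.

```python
def prorate_runs(runs, innings_pitched, full_game_innings=9.0):
    """Scale runs scored over part of a game to a full-game pace.

    Assumes a constant scoring rate per inning (a simplification).
    """
    return runs * full_game_innings / innings_pitched


# 6.17 is Zito's career innings per start, as cited in the article.
zito_innings_per_start = 6.17
full_game_pace = prorate_runs(4, zito_innings_per_start)
print(round(full_game_pace, 2))  # 5.83, i.e. "about six runs"
```

Four runs over a full nine innings stays four runs; it only looks like a median offense if you forget the starter is gone for the last third of the game.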
(We should note that Barry Zito is, if taken over the course of his career, good. That is a true statement, regardless of our dispute with this factoid.)
5. The X variable (or maybe the Y variable; whatever we decided in the point above) is explicitly dependent on a different variable that is not acknowledged. That is to say, if “run support” is limited only to those innings in which Barry Zito remains in a game, “run support” is going to be positively correlated with how long Barry Zito remains in a game, which is itself positively correlated with how well Barry Zito pitches. Simply: The better Zito pitches, the more likely his team is to score four or more runs for him. The factoid is backward! Rather than stating a fact about Zito’s pitching, it states a factoid about the unexceptional and mathematically predictable results of Zito’s pitching.
It also ruins the sample. It is overwhelmingly skewed toward counting only the games he is going to win. If he allows 10 runs in the first three innings, it is very unlikely he will have gotten the “run support” to qualify the start for the factoid’s parameters, even if the Giants go on to score four runs in the fifth inning or whatever.
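To see how the filter stacks the deck, here is a toy simulation in Python. Everything in it is an assumption made up for illustration, not real data: the per-inning scoring odds, the rule that the starter is pulled after allowing four runs or finishing seven innings, and the names `simulate_start` and `summarize`. The pitcher and the offense are completely independent in this sketch, so any gap between the two averages comes from the filter alone.

```python
import random


def simulate_start(rng, max_innings=7, hook_runs=4):
    """Simulate one made-up start.

    Returns (innings_pitched, runs_allowed, support_while_in_game).
    The starter stays in until he allows `hook_runs` runs or completes
    `max_innings` innings (both thresholds are assumptions).
    """
    innings = allowed = support = 0
    while innings < max_innings and allowed < hook_runs:
        innings += 1
        # Toy odds: each half-inning scores 0, 1, or 2 runs.
        allowed += rng.choices([0, 1, 2], weights=[70, 20, 10])[0]
        support += rng.choices([0, 1, 2], weights=[70, 20, 10])[0]
    return innings, allowed, support


def summarize(n_starts=20000, seed=2013, threshold=4):
    """Compare average innings in all starts vs. starts with 4+ support."""
    rng = random.Random(seed)
    starts = [simulate_start(rng) for _ in range(n_starts)]
    qualifying = [s for s in starts if s[2] >= threshold]

    def mean_innings(games):
        return sum(g[0] for g in games) / len(games)

    return mean_innings(starts), mean_innings(qualifying)


all_ip, qualifying_ip = summarize()
print(f"all starts: {all_ip:.2f} IP")
print(f"starts with 4+ runs of support: {qualifying_ip:.2f} IP")
```

In this sketch, the starts that clear the "four or more" bar are, on average, the longer ones. An early exit leaves fewer innings in which to collect support, so the bad outings quietly drop out of the sample.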
So let’s unpack the Zito factoid for a minute. What’s the baseline for this sort of thing? I went through five random Saturdays during Zito’s career, one each in 2012, 2010, 2008, 2006, and 2004, and looked at all the starters. This isn’t a totally convincing sample size, but it’s about 150 starts, so what the heck, let’s just assume, for the purposes of this here piece, and to avoid me having to go through and get 10 days’ worth of starts, or even 20 days’ worth of starts, or even better 1,000 days’ worth of starts, that these 150 starts are a perfect representation of league averages. Can we do that? Let’s just agree to do that.
Of those 150 or so starts, here’s how often the starting pitcher’s team gave him:
And here’s how often Zito’s team gave him:
Don’t get hung up on staring at these tables, which are basically pointless. The small point, the only point, is this: among games in which the starter gets four or more runs of support, a larger share of Zito’s (75 percent) are games with five or more runs than is true for the league as a whole (67 percent). Or, simply, a smaller portion of his qualifying games are four-run-support games, which I would hypothesize turns out to be the most important factor in a factoid like this one.
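Why would the mix matter? A "four or more" record is a weighted average of the exactly-four games and the five-or-more games. The Python sketch below uses the 75 percent and 67 percent shares from above, but the two win rates (.600 and .850) are hypothetical placeholders chosen only for illustration, as is the function name `blended_win_pct`. They are not taken from the tables.

```python
def blended_win_pct(share_five_plus, win_pct_four, win_pct_five_plus):
    """Weighted-average win rate across games with 4+ runs of support."""
    share_four = 1.0 - share_five_plus
    return share_four * win_pct_four + share_five_plus * win_pct_five_plus


# HYPOTHETICAL win rates, identical for Zito and the league on purpose.
WIN_PCT_FOUR = 0.60
WIN_PCT_FIVE_PLUS = 0.85

# Shares of 4+ games that are really 5+ games, from the article.
zito = blended_win_pct(0.75, WIN_PCT_FOUR, WIN_PCT_FIVE_PLUS)
league = blended_win_pct(0.67, WIN_PCT_FOUR, WIN_PCT_FIVE_PLUS)
print(f"Zito mix: {zito:.4f}  league mix: {league:.4f}")
```

With identical win rates at each level, the Zito mix still comes out about .020 higher, purely because more of his qualifying games sit in the five-or-more pile. Under these made-up rates, that is a gap the pitcher did nothing to earn.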
How does Zito's team do in each type of start, compared to other pitchers' teams? For various reasons, I didn’t look at pitcher wins and losses; I just looked at team wins and losses, based on the support that the starting pitcher got. Here's his team's winning percentage at each level of support:
So, again, assuming that we have an accurate baseline, Zito’s talent becomes clear: It’s about normal. Teams do as well when they score four runs for Barry Zito as they do when they score four runs for the league-average starter. They do a little better with five runs, and a little worse with three, and ever, ever so slightly worse when they score six or more. The conclusion is not that Barry Zito is better, or isn’t better, than other good pitchers. The conclusion is that this factoid is a lot of noise, dressed up as something special. If you want to know how good Barry Zito is, there are better places to start.