Most of what I write centers on The Factoid. I like to organize the world into surprising, easily digested chunks, so I love factoids. My job usually requires me to write longer pieces than a simple factoid, so I keep writing and writing, but if you strip away the stalling, the GIFs, the jokes, the pointlessly long lead-ins, the repetition, and the tables, it’s usually just a factoid that I wanted to find a place for. Here’s a factoid: In the average day, I spend approximately 25 minutes looking for factoids, and 18 minutes interacting with my family. Factoid!

But the very pithiness and the juxtapositions that make factoids awesome also make them easily deceptive. As much as I love a good factoid, I am skeptical of a good factoid. There’s one baseball factoid that I see more than any other factoid, and it has always made me uncomfortable, and I’m finally getting around to exploring that factoid. That factoid is a variation on this:

When his team scores at least X runs in his start, Barry Zito is XXX-X.

For example, when Barry Zito was a free agent, this was reportedly in his Boras Binder:

When the Oakland A's score three or more runs for him, Barry's record is 93-11.

And at the end of the 2012 season, the excellent Henry Schulman wrote this about Barry Zito:

The Giants took advantage of pitcher Lance Lynn's throwing error to score four unearned runs in the fourth inning. Fans on both sides of San Francisco Bay know what four runs of support has meant for Zito: a 126-7 record for his teams.

I don’t blame anybody, Boras or Schulman or Wikipedia or anybody else, for using this. I mean daaaaaaaaang 126-7. How you going to beat 126-7? It’s a mighty impressive factoid, which is why I hear it about 15 times a year, every time the Giants score four runs in a game while Zito is on the mound. So why does it make me uncomfortable? Well, because I suspect it’s a total fraud. And until I get to the bottom of it, I’ll never feel any peace. So here we go.

What is This Factoid Trying to Say?
This factoid is trying to say that Barry Zito is better than you think. In its original form, it was trying to say that Barry Zito is elite, but that is a step too far for anybody to accept these days, so now it’s just trying to say that Barry Zito is better than you think.

How Does it Work?
Without being explicit, this factoid suggests that, in a way that traditional numbers can’t capture, Barry Zito gives his team a better chance to win than other good pitchers. While a person could simply cite Barry Zito’s record overall, the implication of this factoid is that Barry Zito is so exceptional in certain common situations that his other stats must not quite be doing him justice.

Alternately, when used in real time during a game, this factoid takes as a given that Barry Zito is exceptional (at least, in some situations), and uses the fourth run as an indicator of certain Giants victory. It is, essentially, a Mission Accomplished banner that makes the audience feel good. (Note: Some Mission Accomplished banners accompany actual accomplished missions. I don’t mean to suggest that the Barry Zito factoid is prima facie bogus. Though I suspect that the Barry Zito factoid is bogus.)

Finally, on the longest scale, the factoid suggests that Barry Zito’s performance is tied to his run support, that he finds a way to win within the context of run support, that he is in some way perhaps pitching to the score, and that he is therefore better at winning than he is at producing a good ERA. This is always a suspect claim. But it is a claim that, if true, would be pleasing to Baseball Men who prefer winners to stat producers.

Why Does it Make Me Uncomfortable?
1. There is no baseline. Most of us have no idea how often a pitcher should win when he gets three or four runs of support. There are a lot of ways factoids tell us the truth but lie, such as using arbitrary endpoints, or by ignoring relevant and distorting context. Disorienting juxtapositions are another way. The factoid seems incredibly impressive because it is a big number alongside a small number. But, as you might intuit, most pitchers will have a big number next to a small number when they get four (or more) runs of support. So how big, and how small, must the numbers be to be impressive? None of us would have any idea.

2. The language is subtly deceptive. I remember Columbia House telling me that I had to buy only six CDs that cost “$15.99 and up.” Of those three words, there are only two that are important: "and," and "up." In the same way, “Four runs or more” usually means “more.” Sometimes it means four, of course. But it also means five, six, seven, a billion. A different way of phrasing that factoid is “Barry Zito is 126-7 when he gets a tankful of run support,” but by phrasing it as “four or more” the factoid plants the number four in your head and there you are, thinking about the number four and how reasonable it sounds.

3. Much like objection no. 2: “Four or more” creates a line of demarcation that is utter nonsense. Consider a small tweak of the factoid, if you were to hear a broadcaster announce it upon the Giants’ scoring their fourth run for Zito in the seventh inning:

“Well, the Giants shouldn’t feel too confident. When Barry Zito gets four runs or fewer of support, his record is just 49-130.”

For that matter, if they score a fifth run, and a sixth run, and even a seventh run, they’re still likely to lose by this factoid’s logic:

“Well, the Giants shouldn’t feel too confident after that seventh run. When Barry Zito gets seven runs or fewer of support, his record is just 124-131.”  

Far, far, far more telling (and still not all that telling, for reasons we’ll get to) would be a factoid that doesn’t lie a doggone bit. Such as: When Barry Zito’s teams score exactly four runs while he is in the game, his record is 14-6.

4. The X variable is not one we intuitively understand. Shoot. Is that the X variable? I think it’s the X variable. The one that goes “when his team scores four runs or more.” Four runs seems relatable, because we all know pretty well what four runs represents in a baseball game: something pretty close to the median runs scored by a team in a game. So, by that standard, a pitcher who gets four runs of support should be close to .500, on average. That’s what the factoid is saying to you, with cold, deceptive intentions.

In fact, though, this factoid includes only runs scored while Barry Zito is in the game. Four runs might be the median offense for an entire game, but it is far less than the median for the portion of a game in which the starting pitcher is still on the mound. In his career, Zito has averaged 6.17 innings per start. So four runs during those 6.17 innings prorates out to about six runs in a full nine-inning contest. Six runs is a lot of runs for a baseball game! Teams mostly win those games. A pitcher who doesn’t win those games is not good. A pitcher who does win those games is often not good! Barry Zito wins those games and Barry Zito is Barry Zito!
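The proration in that paragraph is simple enough to check by hand, but here is a minimal sketch of it. It assumes run support scales linearly with innings, which is a simplification, and the function name `prorate_runs` is made up for illustration.

```python
# Assumption: a team scores at a constant rate per inning, so support
# received over a partial game can be scaled up linearly to nine innings.
ZITO_INNINGS_PER_START = 6.17  # career average quoted in the article


def prorate_runs(runs, innings, full_game=9.0):
    """Scale runs scored over `innings` to a `full_game`-inning pace."""
    return runs * full_game / innings


if __name__ == "__main__":
    pace = prorate_runs(4, ZITO_INNINGS_PER_START)
    print(f"Four runs in {ZITO_INNINGS_PER_START} innings is a {pace:.2f}-run pace over nine")
```

Under that assumption, four runs of support works out to a little under six runs per nine innings, which is the "about six" above.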

(We should note that Barry Zito is, if taken over the course of his career, good. That is a true statement, regardless of our dispute with this factoid.)

5. The X variable (or maybe the Y variable; whatever we decided in the point above) is explicitly dependent on a different variable that is not acknowledged. That is to say, if “run support” is limited only to those innings in which Barry Zito remains in a game, “run support” is going to be positively correlated to how long Barry Zito remains in a game, which is itself positively correlated to how well Barry Zito pitches. Simply: The better Zito pitches, the more likely his team is to score four or more runs for him. The factoid is backward! Rather than stating a fact about Zito’s pitching, it states a factoid about the unexceptional and mathematically predictable results of Zito’s pitching.

It also ruins the sample. It is overwhelmingly skewed toward counting only the games he is going to win. If he allows 10 runs in the first three innings, it is very unlikely he will have gotten the “run support” to qualify the start for the factoid’s parameters, even if the Giants go on to score four runs in the fifth inning or whatever.
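To see how that dependence plays out, here is a toy simulation. Everything in it is an assumption made for illustration: the per-inning scoring odds, the rule that a starter is pulled once he has allowed four runs, and the seven-inning cap. None of it comes from real data. The only point is that when offense and pitching are completely independent, starts where the pitcher survives longer still qualify for "four or more runs of support" more often.

```python
import random

# Hypothetical per-inning scoring odds (about 3.9 runs per nine innings).
RUNS = [0, 1, 2, 3]
WEIGHTS = [0.73, 0.15, 0.08, 0.04]


def inning_runs(rng):
    """Draw the runs scored in one half-inning from the toy distribution."""
    return rng.choices(RUNS, weights=WEIGHTS)[0]


def simulate_start(rng, max_innings=7, hook=4):
    """Return (innings pitched, run support received while in the game).

    Assumption: the starter leaves after the inning in which his runs
    allowed reach `hook`, or after `max_innings`, whichever comes first.
    """
    allowed = support = innings = 0
    while innings < max_innings and allowed < hook:
        allowed += inning_runs(rng)  # runs the starter gives up
        support += inning_runs(rng)  # runs his own team scores
        innings += 1
    return innings, support


def qualifying_rates(n_starts=20000, seed=0, max_innings=7):
    """Share of starts with 4+ runs of support, split by outing length."""
    rng = random.Random(seed)
    counts = {"full": [0, 0], "short": [0, 0]}  # [qualifying, total]
    for _ in range(n_starts):
        innings, support = simulate_start(rng, max_innings=max_innings)
        key = "full" if innings == max_innings else "short"
        counts[key][0] += support >= 4
        counts[key][1] += 1
    return (counts["full"][0] / counts["full"][1],
            counts["short"][0] / counts["short"][1])


if __name__ == "__main__":
    full, short = qualifying_rates()
    print(f"Qualifying rate, full outings:  {full:.1%}")
    print(f"Qualifying rate, short outings: {short:.1%}")
```

The exact rates depend entirely on the invented inputs. The direction is what matters: the short, bad outings mostly never make it into the factoid's sample.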

So let’s unpack the Zito factoid for a minute. What’s the baseline for this sort of thing? I went through five random Saturdays during Zito’s career, one in 2012, in 2010, in 2008, in 2006, and in 2004, and looked at all the starters. This isn’t a totally convincing sample size, but it’s about 150 starts, so what the heck, let’s just assume, for the purposes of this here piece, and to avoid me having to go through and get 10 days’ worth of starts, or even 20 days’ worth of starts, or even better 1,000 days’ worth of starts, that these 150 starts are a perfect representation of league averages. Can we do that? Let’s just agree to do that.

Of those 150 or so starts, here’s how often the starting pitcher’s team gave him:

Runs   Frequency
0      18.5%
1      13.7%
2      19.2%
3      10.3%
4      12.3%
5      8.9%
6      5.5%
7      6.2%
8      2.1%
9+     3.4%

And here’s how often Zito’s team gave him:

Runs   Frequency
0      8.4%
1      15.5%
2      18.0%
3      16.8%
4      10.4%
5      7.9%
6      5.8%
7      8.1%
8      2.3%
9+     6.9%

Don’t get hung up on staring at these tables, which are basically pointless. The small point, the only point, is that if we focus on games in which Zito gets four or more runs of support, we can see that a larger percentage (75 percent) of those games involve him getting five or more runs than the league average (67 percent). Or, simply, he has a smaller portion of four-run-support games, which I would hypothesize turns out to be the most important factor in a factoid like this one.  
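For anyone who wants to check that share against the tables, here is a short sketch. The frequencies are copied from the two tables above, with the key 9 standing in for the "9+" row. The function name is made up for illustration.

```python
# Percent of starts at each level of run support, from the tables above.
# Key 9 stands in for the "9+" row.
LEAGUE = {0: 18.5, 1: 13.7, 2: 19.2, 3: 10.3, 4: 12.3,
          5: 8.9, 6: 5.5, 7: 6.2, 8: 2.1, 9: 3.4}
ZITO = {0: 8.4, 1: 15.5, 2: 18.0, 3: 16.8, 4: 10.4,
        5: 7.9, 6: 5.8, 7: 8.1, 8: 2.3, 9: 6.9}


def share_above_floor(freq, floor=4):
    """Among starts with at least `floor` runs, the share with more than `floor`."""
    qualifying = sum(pct for runs, pct in freq.items() if runs >= floor)
    above = sum(pct for runs, pct in freq.items() if runs > floor)
    return above / qualifying


if __name__ == "__main__":
    print(f"Zito:   {share_above_floor(ZITO):.0%}")
    print(f"League: {share_above_floor(LEAGUE):.0%}")
```

With the rounded table values this gives about 75 percent for Zito and about 68 percent for the league. The league figure is a point off the 67 percent quoted above, presumably because the table percentages are themselves rounded.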

How does Zito's team do in each type of start, compared to other pitchers' teams? For various reasons, I didn’t look at pitcher wins and losses; I just looked at team wins and losses, based on the support that the starting pitcher got. Here's his team's winning percentage at each level of support:

Run Support   Zito W%   League W%
0             3%        0%
1             25%       30%
2             41%       36%
3             48%       60%
4             61%       61%
5             97%       92%
6             100%      100%
7             94%       100%
8             100%      100%
9+            100%      100%

So, again, assuming that we have an accurate baseline, Zito’s talent becomes clear: It’s about normal. Teams do as well when they score four runs for Barry Zito as they do when they score four runs for the league-average starter. They do a little better with five runs, and a little worse with three, and ever, ever so slightly worse when they score six or more. The conclusion is not that Barry Zito is better, or isn’t better, than other good pitchers. The conclusion is that this factoid is a lot of noise, dressed up as something special. If you want to know how good Barry Zito is, there are better places to start.
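One way to put a number on the missing baseline is to blend the tables into a single "four or more runs of support" winning percentage for each side. The sketch below does that with the rounded figures from the tables above. It leans on the same assumption the article asks for, that the roughly 150-start sample is representative. It also measures team wins, not the pitcher's own won-lost record, so it is not directly comparable to 126-7.

```python
# (percent of starts, team winning percent) at each level of run support,
# copied from the tables above. Key 9 stands in for the "9+" row.
ZITO = {4: (10.4, 61), 5: (7.9, 97), 6: (5.8, 100),
        7: (8.1, 94), 8: (2.3, 100), 9: (6.9, 100)}
LEAGUE = {4: (12.3, 61), 5: (8.9, 92), 6: (5.5, 100),
          7: (6.2, 100), 8: (2.1, 100), 9: (3.4, 100)}


def blended_win_pct(rows):
    """Frequency-weighted team winning percentage across the given rows."""
    total_freq = sum(freq for freq, _ in rows.values())
    weighted = sum(freq * win for freq, win in rows.values())
    return weighted / total_freq


if __name__ == "__main__":
    print(f"Zito, 4+ runs of support:   {blended_win_pct(ZITO):.1f}%")
    print(f"League, 4+ runs of support: {blended_win_pct(LEAGUE):.1f}%")
```

On these numbers Zito's teams win roughly 88 percent of such games and the sampled league starters' teams win roughly 86 percent. Most of that small gap traces back to Zito's mix containing fewer exactly-four-run games, not to anything he does differently in them.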

Another great one, Sam.
Only tangentially related to the actual article (which was great, don't get me wrong!) but in the last table, am I reading correctly that the average MLB team's expected winning percentage jumps from 36% to 60% if they score (at least) the 3rd run while the starting pitcher is still in? If so, it's really remarkable that one extra run, by definition relatively early in the game, can swing the ExpW% so much!
I know I said we'll assume for this article that those numbers are real and representative, but if anything surprises you about those tables I would assume for real life that those numbers are not real and representative. When I have a moment maybe I'll broaden the sample and do an Unfiltered on it.
Excellent work. I get annoyed every time that factoid is mentioned in a broadcast. "There are three types of lies. Lies, damn lies, and statistics."
Actually, there are four types of lies: "Lies, damn lies, statistics and baseball statistics".
Ugh, yes, thank you. My old man loooves this "stat" and won't listen to me when I tell him it's meaningless.
It's obvious that Zito isn't a better than average pitcher but rather that he is a much better hitter than his career -31 OPS+ would suggest. How else can you explain all those extra 7 to 9 run games? I'll bet that in games that Zito gets 1 or more hits, the Giants win some of them. :-) Having never looked at it before, I'm impressed with Zito's consistency at the plate as a Giant; 0.113 BA his first year, and then 3 years in a row at 0.118. In 2008 and 2009 he had the exact same stat line. What are the odds of that?
As a famous Russian once said, "There are no small coincidences and big coincidences, only coincidences."
I am amused by the prospect that in six percent of Barry Zito's starts in which he has received 7 runs of support, he is not doing so in the major leagues. If he were pitching in the majors in those starts, after all, he would have won 100% of the time, just as all other major leaguers have.
It seems to me this is an issue not wholly unrelated to Doug Thorburn's Raising Aces today on blow-up starts. The factoid is misleading; I know that's really what the article is about. But one thing the factoid does capture, however roughly and imperfectly, is that Barry Zito rarely shits the bed when you make it for him really nicely. I have often wondered what the standard deviation of game scores over a given season is, whether pitchers have a demonstrable consistency skill, and at what inflection point or extreme a starter's performance really becomes the driving force in determining the game's outcome. I mean, if the Giants score three or four runs with Zito in the game, it's pretty clear that not only is Zito not driving the outcome, but virtually no pitcher in baseball would be driving the outcome. Three runs scored in the first six or seven innings means the starting pitcher is not likely to factor in the decision that day, and will leave, unless he pitched really, really well or really, really badly, without having changed the win probability for his team much at all that day. This is where I start thinking starters, while important narratively and important because teams couldn't possibly keep 12 good relievers healthy all year, are drastically overpaid, and their importance overstated. If I had to guess, I would guess that starting pitching determines the outcome of fewer than 35 percent of all games.
I too have yearned for standard deviation of game score data.
Another way that this factoid is misleading is that it's talking about runs scored while Zito is in the game. Four runs while the starting pitcher is STILL IN THE GAME is a lot. We just see the "four runs" and think of that as average support. Considering that and the "or more" add-on that we tend to overlook, this is doubly like the way we see $1.99 and just think of it as a buck.
Sam, this is good stuff as usual. But #3 doesn't hold water. His team won't score "or fewer" runs. But at the time the announcer utters the phrase, his team still might score "or more." The announcers really should just say, "4 runs. The [other team] might as well go home, because even Zito doesn't blow many games when he gets that much support before getting yanked."
Your columns would be significantly more interesting if I'd always read them in their entirety as I've noticed a strong positive correlation between how much of your column I read and how interesting it is.
Your research actually demonstrates that Barry Zito is a magician. He manages to win 3% of games in which his team does not score for him.
Read carefully!
This is the kind of article that microscopically restores my faith in the human race. There are an awful lot of tricky factoids in our world, and they all deserve to be deconstructed this way.
Thanks for debunking a token from the famed Boras Binder- this could be a fun regular segment. I noticed those first two numbers jumped out: 7 and 126. Didn't Boras get the Giants to pay $126 million for 7 years of Zito?