In Episode 1080 of the Effectively Wild podcast, cohost Jeff Sullivan noted that several of the batters who’ve reached base via error three times in a game did so in doubleheaders (1:06:06):
"This seems like something that somebody could research, to see if errors have been more prevalent in doubleheaders. If you adjust for era, there could be something there. I am not nearly interested enough to do the research … so somebody out there, do the hard work!"
I took up the mantle of somebody, but as you’ll see, it was not I who did the hard work.
I didn’t exactly follow Sullivan’s edict, though. I concentrated only on recent doubleheaders, with recent defined as since the 30-team era began in 1998. Doubleheaders have become uncommon in contemporary baseball, enough so that the difference between a twinbill and a regular game may not be the same as when they were more common. The reasons for the decline in doubleheaders have a little to do with collective bargaining (players don’t like them) and a lot to do with economics.
Fifty years ago, the Cardinals, Red Sox, Dodgers, and Mets were the only teams that drew over 1.5 million fans to their home games. A single-admission doubleheader in 1967 could mean a nice boost in ticket sales (those four teams were the only ones to average over 20,000 per game), along with extra concessions. So far this year, the only teams that haven’t drawn 20,000 per game are the A’s and Rays. Nearly half of all teams are averaging over 30,000 per game. There is no economic reason to schedule a doubleheader; you’re not going to get much of an attendance bump.
As a result, doubleheaders are mostly unscheduled, the result of postponements, and even then they’re largely of the day-night, two-separate-admission variety. There were 72 doubleheaders in 1998, the first season of the 30-team era. There were only 28 last year.
Still, Sullivan’s question is valid. Doubleheaders may be infrequent, and the break between the games of a day-night twinbill is longer than for a single-admission double dip, but they put similar strains on a team. You go through a bunch of pitchers. You have to split catching duties. And for those players who play both games, the mental and physical fatigue of a nine-inning major-league game is doubled. Those strains, and that fatigue, could affect play on the field. But do they?
To answer that question, our data genius Rob McQuown did the hard work. He retrieved data on every doubleheader played from 1998 through last week. That gave me 994 doubleheaders, comprising 1,988 team games. That number is dwarfed, of course, by the 94,170 non-doubleheader games since 1998. But it’s a robust sample, equivalent to over two-fifths of a full season, or over a dozen team seasons. That’s enough to draw some conclusions.
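The "two-fifths of a full season" and "dozen team seasons" comparisons are straightforward arithmetic. Here is a minimal Python sketch, assuming a full season means 30 teams each playing 162 games (4,860 team games); the constant names are just for illustration.

```python
# Doubleheader team games since 1998, as reported above.
DH_TEAM_GAMES = 1988
TEAMS = 30            # assumption: the 30-team era
GAMES_PER_TEAM = 162  # assumption: a standard 162-game schedule

full_season_team_games = TEAMS * GAMES_PER_TEAM  # 4,860 team games per season
share_of_season = DH_TEAM_GAMES / full_season_team_games
team_seasons = DH_TEAM_GAMES / GAMES_PER_TEAM

print(f"{share_of_season:.3f} of a full season")  # 0.409 of a full season
print(f"{team_seasons:.1f} team seasons")         # 12.3 team seasons
```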
Now, some of you are going to object to this data. You’re going to say that I should be comparing the doubleheaders between, say, the Tigers and Royals to the non-doubleheader games between the Tigers and Royals, not combining all doubleheader games and all non-doubleheader games. I should be taking the ballpark into account, and the time of year, given how offense tends to rise with temperatures.
Those are all valid observations! And I’m going to ignore them. I want to get a rough idea of what goes on in doubleheaders, that’s all. Close is good enough. It’s not like this is going to change the way the game is played. Here are my results. I think this table is pretty self-explanatory. The only abbreviation that may be unfamiliar is UER, for unearned runs.
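| Metric | Doubleheaders | Other Games | Difference |
|---|---:|---:|---:|
| Runs/Game | 4.718 | 4.600 | 0.119 |
| Errors/Game | 0.663 | 0.634 | 0.029 |
| UER/Game | 0.376 | 0.363 | 0.013 |
| BA | .2642 | .2615 | .0027 |
| OBP | .3331 | .3293 | .0038 |
| SLG | .4169 | .4163 | .0006 |
| OPS | .7499 | .7456 | .0044 |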
One of the problems I find when presenting tables of this type, in which one of the columns is labeled difference, is the sign to use. Should I put a plus sign in front of positive differences and a minus sign in front of negative differences? Or do positive numbers get presented as is, with parentheses around negative results? In this case, I don’t have to make this decision. Every single difference is positive! There’s more offense, more runs, and more errors in doubleheaders than in other games. Without exception.
But some of those changes are pretty small. A .4169 slugging percentage vs. a .4163 slugging percentage … that’s a practically imperceptible difference. On the other hand, as I pointed out, we’ve got a robust sample here. With enough data, even small changes can be statistically significant.
So I’m going to present the same table, but this time I’m going to add a column labeled P Value. That’s the significance level of the difference between the two numbers, using this online calculator. If you’re not into P values, just know this: A value below 0.10 is somewhat statistically significant. A value below 0.05 is pretty clearly statistically significant. To make it easy, I’ll bold the strongly significant differences and put the more weakly significant ones in italics. (If, on the other hand, you are into P values, you probably want to read this footnote. [1])
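| Metric | Doubleheaders | Other Games | Difference | P Value |
|---|---:|---:|---:|---:|
| Runs/Game | 4.718 | 4.600 | 0.119 | **.023** |
| Errors/Game | 0.663 | 0.634 | 0.029 | **<.0001** |
| UER/Game | 0.376 | 0.363 | 0.013 | **.001** |
| BA | .2642 | .2615 | .0027 | *.097* |
| OBP | .3331 | .3293 | .0038 | **.0005** |
| SLG | .4169 | .4163 | .0006 | .656 |
| OPS | .7499 | .7456 | .0044 | *.0673* |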
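The sign check and the bold/italics rule described above fit in a few lines. This is a minimal Python sketch using the means and P values from the table; the thresholds (below 0.05 strong, below 0.10 weak) come from the text, and "<.0001" is entered as 0.0001, an assumption that doesn't change its label. The names `TABLE` and `significance` are hypothetical.

```python
# metric: (doubleheaders, other games, reported P value), from the table
TABLE = {
    "Runs/Game": (4.718, 4.600, 0.023),
    "Errors/Game": (0.663, 0.634, 0.0001),  # reported as <.0001
    "UER/Game": (0.376, 0.363, 0.001),
    "BA": (0.2642, 0.2615, 0.097),
    "OBP": (0.3331, 0.3293, 0.0005),
    "SLG": (0.4169, 0.4163, 0.656),
    "OPS": (0.7499, 0.7456, 0.0673),
}


def significance(p):
    """Label a P value using the thresholds described in the text."""
    if p < 0.05:
        return "strong (bold)"
    if p < 0.10:
        return "weak (italics)"
    return "not significant"


# Every difference (doubleheaders minus other games) should be positive.
all_positive = all(dh > other for dh, other, _ in TABLE.values())

labels = {metric: significance(p) for metric, (_, _, p) in TABLE.items()}

print("All differences positive:", all_positive)
for metric, label in labels.items():
    print(f"{metric:12} {label}")
```

Four metrics (runs, errors, unearned runs, and OBP) come out strongly significant, BA and OPS weakly, and SLG not at all.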
There are two conclusions here, I think.
1. Sullivan’s intuition was right. There is a statistically significant increase in errors in doubleheaders compared to other games, resulting in a statistically significant increase in unearned runs. We can imagine plenty of reasons for the sloppier play: tired players who play both games, and rusty, less talented substitutes who play just one game.
2. The increase in unearned runs contributes to, but does not solely cause, an increase in runs overall in doubleheaders. That increase is caused largely by a statistically significant increase in on-base percentage, with only a little attributable to more base hits. I didn’t show it in the table, but there’s a statistically significant increase in both walks per game and hit batters per game in doubleheaders, again likely due to fatigued and/or underused pitchers. (We can all recall minor leaguers being called up for a doubleheader and sent back down once it’s over.)
The trends are significant, but are they interesting? Probably not. With so few doubleheaders being played now, the difference isn’t very important. I randomly looked up the 1938 Boston Bees, just to see how much things have changed. They played a doubleheader in May, three in June, eight in July (including July 1, 3, and 4!), eight in August, and nine in September. That’s 29 doubleheaders! Almost two-fifths of their games were in doubleheaders!
Their successor, the Atlanta Braves, played one doubleheader in 2014, one in 2015, and one in 2017. That’s it over the past four seasons. Whether those two games per year featured a few more errors and runs than the other 160 is a curiosity, not a feature.
[1] For those of you statistically inclined, you’ll know that t-tests such as these require means, which are in my table, and sample sizes, which are 1,988 for doubleheaders and 94,170 for non-doubleheaders. They also require standard deviations. Here’s what I did there. The standard deviation for runs per game is about two-thirds of the mean calculated on a per-game basis. It’s more like 12% or so on a per-team basis. I wanted to set my deviations fairly wide, so I arbitrarily set it at 50% of the mean (i.e., closer to the game-to-game variation than team-to-team), and did the same for unearned runs. For the four batting statistics, the team standard deviations are .010 for BA, .012 for OBP, .015 for SLG, and .026 for OPS. Similar to what I did for runs per game, I simply quadrupled them for this experiment: .0429 for BA, .0484 for OBP, .0594 for SLG, .1037 for OPS. Finally, the standard deviation of team errors per game is 0.011, which is really, really small, so I set the standard deviation at 50% of the mean, as for runs per game.
I know, I know, these are borderline laughably imprecise; the key here, I think, is that I chose pretty wide standard deviations across the board, reducing the chance of false positives.
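The footnote describes a t-test computed from summary statistics alone. The online calculator isn't named, so this minimal Python sketch assumes the standard pooled-variance (Student's) two-sample test and swaps in a normal approximation for the p-value, which is reasonable with roughly 96,000 degrees of freedom. Means, sample sizes, and standard deviations are the ones given in the article; the function name `pooled_t_test` is hypothetical.

```python
import math


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample Student's t-test (pooled variance) from summary statistics.

    Returns (t, two-sided p). With ~96,000 degrees of freedom the t
    distribution is practically the standard normal, so the p-value here
    is a normal approximation computed with math.erfc.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # P(|Z| > |t|)
    return t, p


N_DH, N_OTHER = 1988, 94170  # sample sizes from the footnote

# Runs/Game: standard deviation set at 50% of each mean, per the footnote.
t_runs, p_runs = pooled_t_test(4.718, 0.5 * 4.718, N_DH, 4.600, 0.5 * 4.600, N_OTHER)

# OBP: standard deviation of .0484 (the quadrupled team value) for both groups.
t_obp, p_obp = pooled_t_test(0.3331, 0.0484, N_DH, 0.3293, 0.0484, N_OTHER)

print(f"Runs/Game: t = {t_runs:.2f}, p = {p_runs:.4f}")
print(f"OBP:       t = {t_obp:.2f}, p = {p_obp:.4f}")
```

For runs per game this gives t of about 2.26 and a p-value of about .024, close to the table's .023 (the table's difference of 0.119 suggests it was computed from unrounded means). For OBP it gives about .0005, matching the table.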