Now, more than at any time in baseball history, games are won and lost in
the bullpen. As a result, more attention has focused on the importance of a
good bullpen as one significant difference between a playoff team and an
underachieving also-ran. Whether it’s explaining the Mariners’ inability to
contend despite fielding two of the 50 greatest players in history, or
defining how the Reds are in first place with Steve Avery in the rotation
and Dmitri Young riding the bench, the fortunes of a team’s bullpen seem to
dictate the fortunes of the team as a whole.

We recently published the results of a study that looked at whether a good
bullpen could add some sort of synergy to a team’s win-loss record above
and beyond the runs that they save, and conversely, whether a collection of
pitchers throwing AckerCurves and WengerTaters would snatch more defeats
from the jaws of victory than the run totals would suggest.
In that study, we looked at two sets of teams–those with the best bullpens
in their league and those with the worst–and compared the records for those
teams with their expected records, as calculated by the Pythagorean Method.

What we found was that teams with good bullpens actually won more
games–about 1.3 more, on average–than would be expected from their totals
of runs and runs allowed, while teams with bad bullpens won about 1.6 fewer
games than expected. This is, we believe, the first time any study has
pinpointed a subset of teams which routinely outperform or underperform
their Pythagorean projection.

Having established that a good bullpen is important, and estimated just how
important, let’s test the conclusion a bit.

A Control Group

Any statistical study worth its salt has to account for bias. You must show
that the results of your study are not skewed by hidden factors. The best
way to do that is to use a control group.

What we showed in the earlier study was that teams with the best bullpens,
defined as the bullpens which allowed the lowest OPS in the late innings of
close games (LICG), won more games–on average–than their Pythagorean
projection, and teams with the worst bullpens won fewer games. But does
that prove that good bullpens lead to overachievement, or is it possible
that the study’s design was flawed?

For example, is it possible that the reason the teams overachieved was that
they simply had good pitching, period, whether in the first inning or in
the eighth? Or that teams with a low opponents’ OPS tend to play in pitchers’
parks, where runs are more valuable? There are dozens of ways in which the
design of the study may have unintentionally skewed the results, giving us
a conclusion which may not be warranted.

So we designed a control group to see if we could eliminate that bias as
much as possible. The original study compared the three best and three
worst bullpens in each league, from 1980 to 1998 (except 1981 and 1994),
based on OPS by relievers in LICG. Here are those results again:

                           Record vs.           Avg.
                   Pythagorean Record
                       SO  WO  WU  SU
Best Bullpens          34  32  25  11   + 1.28 Games
Worst Bullpens          8  32  27  35   - 1.57 Games

SO – Strong Overachiever (won 3 or more games more than expected)
WO – Weak Overachiever (won fewer than 3 games more than expected)
WU – Weak Underachiever (won fewer than 3 games fewer than expected)
SU – Strong Underachiever (won 3 or more games fewer than expected)
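
The four categories above can be sketched as a small function. This is only an illustration of the definitions, not the code used for the study; the function name, the `cutoff` parameter, and the choice to leave a team unclassified when it lands exactly on its expectation are assumptions.

```python
from typing import Optional


def classify(actual_wins: float, expected_wins: float, cutoff: float = 3.0) -> Optional[str]:
    """Label a team SO, WO, WU or SU from its wins versus its expected wins.

    Assumption: a team that exactly matches its expectation gets no label.
    """
    diff = actual_wins - expected_wins
    if diff == 0:
        return None      # neither over- nor underachiever
    if diff >= cutoff:
        return "SO"      # 3 or more games above expectation
    if diff > 0:
        return "WO"      # above expectation by fewer than 3 games
    if diff > -cutoff:
        return "WU"      # below expectation by fewer than 3 games
    return "SU"          # 3 or more games below expectation


# Hypothetical team: 90 actual wins against 86.5 expected wins.
print(classify(90, 86.5))
```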

To briefly review the results: 66 of the 102 teams with the best bullpens
(65%) won more games than projected. For teams with the worst bullpens, 62
out of 102 (61%) won fewer games than projected. On average, teams with
excellent bullpens won about 1.3 games more than projected, while teams
with poor bullpens won about 1.6 games fewer, a swing of almost three games.

To construct a control group, we ranked the same teams by the effectiveness
of their starting pitchers, based on the OPS allowed by all starting
pitchers used by that team. Mimicking our original design as much as
possible, we took the top three and bottom three rotations in each league
from the same years, and compared their won-loss record against their
expected performance.

Here are those results:

                            Record vs.           Avg.
                    Pythagorean Record
                        SO  WO  WU  SU
Best Rotations          23  34  24  21   + 0.04 Games
Worst Rotations         24  35  22  20   + 0.11 Games

(One of the teams with a poor rotation, the 1983 Padres, went 81-81 and
also scored and allowed the exact same number of runs (653), which is why
the second group only contains 101 teams.)

As you can see, while slightly more than half (56%) of the teams with good
rotations overachieved, the overall effect was extremely small, just 0.04
games above expectation. And teams with bad rotations had an almost
identical breakdown to the teams with the best rotations. In fact, on
average the teams with bad starters did slightly better (0.11 games above
expectation) than the first group, although the difference is so slight as
to be statistically insignificant.
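
One rough way to check the over/under splits in the two tables is a sign test: if beating or missing the Pythagorean projection were a coin flip, how unusual would each split be? The sketch below is not part of the original study. It uses only the counts printed in the tables, ignores how far each team finished from its projection, and assumes the team-seasons are independent; the function name is hypothetical.

```python
from math import comb


def sign_test_p(over: int, under: int) -> float:
    """Two-sided binomial p-value for an over/under split under a 50/50 null."""
    n = over + under
    k = max(over, under)
    # Probability of a split at least this lopsided in one direction, then doubled.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# (overachievers, underachievers) = (SO + WO, WU + SU) from the tables above.
groups = {
    "Best Bullpens": (34 + 32, 25 + 11),
    "Worst Bullpens": (8 + 32, 27 + 35),
    "Best Rotations": (23 + 34, 24 + 21),
    "Worst Rotations": (24 + 35, 22 + 20),
}

for name, (over, under) in groups.items():
    p = sign_test_p(over, under)
    print(f"{name:16s} over={over:3d} under={under:3d} p={p:.3f}")
```

A small p-value for the bullpen groups next to a large one for the rotation groups would be consistent with the reading above, although a test on the actual game differentials would be more informative than one on the counts alone.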

In summary, it appears that the results of the original study are not due
to some hidden bias, and that the relationship is a true one: good
bullpens do correlate with better-than-expected records.

Effect on One-Run Games

It stands to reason that a team with a great bullpen should be able to
prevent runs at the most crucial times and thus win more games than their
ratio of runs scored to runs allowed would predict. It would also make
sense that those same teams would win more than their fair share of tight
games, with a record in one-run ballgames better than would be expected
from their overall record. Actually, it stands to reason that these two
factors would be tightly correlated for all teams–that a team which wins
more than its share of one-run games should end up with more wins overall
than would be expected.

It stands to reason, and it stands up to the facts as well. Here is a chart
comparing the two (for all teams from 1980 – Present, save 1981 & 1994):

                             Record vs.           Avg.
                     Pythagorean Record
                         SO  WO  WU  SU
                  SO     30  20   5   1   + 3.09 Games
One-Run Record    WO     54  64  29  19   + 1.35 Games
vs.               WU     17  35  46  44   - 1.17 Games
Overall Record    SU      2  10  21  35   - 3.53 Games

Some explanation is needed on what we are calling a team’s "expected"
record. As before, for "Record vs. Pythagorean Record", we are comparing a
team’s overall won-loss record with their record as predicted by the
formula (Runs Scored^2) / (Runs Scored^2 + Runs Allowed^2). The use of 2 as
the exponent for this formula is traditional, but in fact is not the most
accurate number to use. Previous studies have shown that the most accurate
value for the exponent is about 1.87 (see the new study by Clay Davenport
for more on this), but the difference is quite small, and for purposes of
this study we’ll use 2 as our exponent to keep this study in line with the
previous one.
In comparing "One-Run Record vs. Overall Record", it is important
to realize that a team which plays .600 ball overall is *not* expected to
play .600 ball in one-run games. The common perception is that the best
teams win the close games, and that a mark of great teams is the ability to
pull out one-run games. It’s a silly perception.

Here’s an example of why: when the best team in the league takes on the worst,
the better team is probably going to win around three-fourths of the games.
If the Indians play the Twins 20 times, Cleveland is probably going to go 15-5
or so. Included in those 15 wins are going to be games with scores of 14-1,
12-2 and, in today’s game, 18-8. How many of the Twins’ victories are going
to be blowouts? When the Twins do squeeze out a win, it’s likely to be a
7-5 score or something similar.

In reality, one of the marks of a good team is the ability to blow out its
opponents. And in fact, what we find is that all teams play towards the
center in one-run games: a .600 team will play around .565 ball, while a
.400 team will play around .435. So in comparing records, each team’s
record in one-run games was compared to their expected record in one-run
games, based on their overall record.
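
The two examples above (.600 becomes about .565, .400 becomes about .435) are consistent with pulling a team’s overall record 35% of the way back toward .500. The sketch below assumes that simple linear rule; the exact method used in the study is not spelled out here, so the `shrink` value of 0.65 and the function names are assumptions fitted to those two examples.

```python
def expected_close_pct(overall_pct: float, shrink: float = 0.65) -> float:
    """Expected winning percentage in one-run games, pulled toward .500.

    Assumption: a linear rule that keeps `shrink` of the distance from .500.
    """
    return 0.5 + shrink * (overall_pct - 0.5)


def games_vs_expected(close_wins: int, close_losses: int,
                      overall_pct: float, shrink: float = 0.65) -> float:
    """One-run wins above (or below) what the overall record predicts."""
    games = close_wins + close_losses
    return close_wins - games * expected_close_pct(overall_pct, shrink)


print(round(expected_close_pct(0.600), 3))  # 0.565
print(round(expected_close_pct(0.400), 3))  # 0.435

# Hypothetical .600 team that went 30-20 in one-run games.
print(round(games_vs_expected(30, 20, 0.600), 2))
```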

The results are striking. Among teams that did extremely well in one-run
games, 50 out of 56 (89%) also beat their Pythagorean projection. On the
flip side, 56 out of 68 (82%) of teams that played poorly in one-run games
also fell short of their Pythagorean projection. The results among teams
that over- or under-achieved by lesser amounts follow the same overall trend.
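
The percentages quoted above can be recomputed directly from the chart. The short sketch below copies the counts from the chart; the variable and function names are just for illustration.

```python
# Rows: one-run record vs. expected record.
# Columns: record vs. Pythagorean record, in the order SO, WO, WU, SU.
chart = {
    "SO": [30, 20, 5, 1],
    "WO": [54, 64, 29, 19],
    "WU": [17, 35, 46, 44],
    "SU": [2, 10, 21, 35],
}


def share_over(row: list) -> float:
    """Fraction of teams in a row that beat their Pythagorean projection."""
    return (row[0] + row[1]) / sum(row)


def share_under(row: list) -> float:
    """Fraction of teams in a row that fell short of their Pythagorean projection."""
    return (row[2] + row[3]) / sum(row)


for label, row in chart.items():
    print(f"{label}: n={sum(row):3d} over={share_over(row):.0%} under={share_under(row):.0%}")
```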

To make a long story short: the correlation between a team’s performance in
one-run games and their performance compared to their Pythagorean record is
+0.56, indicating a strong if not overwhelming correlation.

So, since we found a high correlation between good bullpens and exceeding a
team’s Pythagorean win total, we should expect that good bullpens would
also correlate with a better-than-expected record in one-run games, right?
Here’s the data:

                    One-Run Record vs.           Avg.
                       Expected Record
                        SO  WO  WU  SU
Best Bullpens           12  45  31  14   + 0.22 Games
Worst Bullpens          11  39  34  18   - 0.26 Games

While there does appear to be a trend, it’s a small one. Just 56% of the
teams with great bullpens did better than expected in one-run games, and
barely 51% of the teams with bad bullpens did worse than expected.
On average, the teams with the best bullpens played just a half-game better
in one-run contests than teams with the worst pens. That’s just one-sixth
of the three-game disparity we found in their performance against their
Pythagorean record.

So if these teams aren’t doing that much better or worse in one-run games,
how are they winning more games than their Pythagorean projection? Let’s
look at their record in two-run games against their expected record. Keep
in mind that just as in one-run games, teams play towards the center in
two-run games, although the effect is not as significant: a .600 team
should play around .580 ball in two-run games.

                    Two-Run Record vs.           Avg.
                       Expected Record
                        SO  WO  WU  SU
Best Bullpens           13  44  34  10   + 0.21 Games
Worst Bullpens           8  33  48  13   - 0.54 Games

The same general trend is followed in two-run games as in one-run games,
and in fact the correlation appears to be a little stronger. When we
combine the results of both one- and two-run contests, teams with good
bullpens win 0.47 games more than expected, while teams with bad bullpens
win about 0.80 games fewer than expected. Together, this still explains less
than half of the disparity between actual records and Pythagorean projections.
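
The arithmetic behind "one-sixth" and "less than half" can be laid out in a few lines. This sketch uses the averages as quoted in the text, with the combined one- and two-run figures taken as stated rather than recomputed from the tables; the names are for illustration only.

```python
def share_explained(part_gap: float, total_gap: float) -> float:
    """Fraction of the total gap accounted for by one component."""
    return part_gap / total_gap


# Gap between the best and worst bullpens, in games.
pythagorean_gap = 1.28 + 1.57   # versus Pythagorean record
one_run_gap = 0.22 + 0.26       # versus expected record in one-run games
close_game_gap = 0.47 + 0.80    # one- and two-run games combined, as quoted

print(f"one-run share:    {share_explained(one_run_gap, pythagorean_gap):.2f}")
print(f"close-game share: {share_explained(close_game_gap, pythagorean_gap):.2f}")
```

The first share comes out near one-sixth and the second a little under one-half, which leaves more than half of the gap unexplained by close games.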

One possibility that would explain the difference is that teams with strong
bullpens– remember, in this case we’re defining "bullpen" as only
those relievers used in tight games–may have focused their resources on
acquiring good late-inning relievers to the detriment of the rest of the
team. This might cause such teams to get blown out of games more often than
usual. If a team has two great relievers but a lousy starting rotation,
there’s going to be a lot of big losses that the bullpen isn’t going to
bail you out of. Those blowouts would damage the team’s runs scored/runs
allowed ratio, and hence their Pythagorean record, but would cause only
minor damage to their overall win-loss record. However, it’s just a theory,
and more research may be needed to determine the true source of this
disparity.

So maybe this isn’t the last word after all.
