The furor over Game of Shadows, the forthcoming book detailing Barry Bonds’ alleged use of performance-enhancing drugs, has thrust the issue of steroids in baseball back into the spotlight. Baseball Prospectus’ new book, Baseball Between the Numbers, includes a chapter titled “What Do Statistics Tell Us About Steroids?” Written by BP’s Nate Silver, the chapter takes both a numerical and historical look at this hot-button topic. Read on to see what this chapter has to say about Baseball Between the Numbers’ cover boy.
In December of 2004, with the frenzy over the BALCO investigation at
its peak, Alan Schwarz of the New York Times asked Baseball Prospectus
to assist him with an analysis of Barry Bonds and Jason Giambi.
The idea was to use BP’s projection system, Player Empirical Comparison
and Optimization Test Algorithm (PECOTA), to compare how
Bonds and Giambi might have been expected to perform based on
their statistics up through 2000, against what actually happened to
their careers from that point forward.
To retell the story: Entering the 2000 season, each of these players
was at a career crossroads. Bonds would turn thirty-five that year–the age at which even great players can begin to struggle–and was
coming off an injury-plagued season in 1999. Giambi was a slow-footed
first baseman about to enter his thirties; he’d had a good season
in 1999, but it looked like a career year. Instead of withering,
however, both players blossomed. Giambi won the MVP Award in
2000, and Bonds set a career high in home runs, launching an upward
trajectory that would see him rewrite baseball’s record books. Needless
to say, PECOTA found that Bonds and Giambi had far outperformed
reasonable expectations. Bonds produced 142 more home
runs between 2000 and 2004 than PECOTA would have guessed and
hit .339 rather than the projected .272. Giambi produced 60 percent
more home runs and 50 percent more RBI than PECOTA expected.
Lots of players have had unusual career paths, dating
back to the days when ballplayers’ drugs of choice were Schaefer
Beer and Vitalis Hair Tonic. Starting in 1953, a twenty-eight-year-old
Ted Kluszewski, who had averaged just 15 home runs a season to that
point in his big-league career, reeled off consecutive seasons of 40, 49,
and 47. In 1973, Davey Johnson, who had just turned thirty, hit 43
home runs; he had never hit more than 18 before (and would never hit
more than 15 thereafter). Even Hank Aaron defied expectations. In
1971, a season in which he missed more than twenty games, he set a
career high in home runs with 47. Aaron was thirty-seven years old at the time.
It is natural to tie together cause and effect. These days, it has become
just as natural to attribute any unexpected change in performance
to ulterior motives. Eric Gagne adds 5 mph to his fastball? He’s
juicing. Albert Pujols, who was considered a second-tier prospect,
bursts onto the scene with a performance worthy of Joe DiMaggio?
He’s juicing–unless he faked his birth certificate. Sammy Sosa? Not
only was he juicing, he was also corking his bat, using a laser-eye
mechanism in his batting helmet, and bribing the opposing pitcher to
throw him hanging sliders.
The Indirect Evidence
One way to examine this question is by looking at what I’ll call a
Power Spike. A Power Spike occurs when a player “suddenly” starts
hitting home runs more frequently than he used to. More specifically,
we can define a Power Spike as follows:
- A player is an established major league veteran, at least twenty-eight
years old, with at least 1,000 plate appearances (PA) accumulated
between his previous three seasons; and
- The player improves upon his established home-run rate by at
least 10 HR per 650 PA, in a season in which he had at least 500 PA.
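The two criteria above can be sketched in a few lines of Python. This is only an illustration, not the code behind the chapter's figures. It assumes the established rate is a simple pooled rate (total HR over total PA) across the previous three seasons, that age is measured in the spike season, and that the final threshold reads as 500 plate appearances. The `Season` class, the function names, and the sample hitter are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Season:
    age: int  # player's age during the season (assumed convention)
    pa: int   # plate appearances
    hr: int   # home runs


def hr_per_650(hr: int, pa: int) -> float:
    """Scale a home-run total to a 650-PA season."""
    return 650.0 * hr / pa


def is_power_spike(previous_three: Sequence[Season], current: Season) -> bool:
    """Apply the two Power Spike criteria to one hitter-season."""
    prior_pa = sum(s.pa for s in previous_three)
    prior_hr = sum(s.hr for s in previous_three)
    # Eligibility: veteran aged 28 or older, 1,000+ PA over the previous
    # three seasons, and (assumed) 500+ PA in the current season.
    if current.age < 28 or prior_pa < 1000 or current.pa < 500:
        return False
    # Established rate: pooled HR per 650 PA over the prior three seasons
    # (an assumption; the text does not say whether seasons are weighted).
    established = hr_per_650(prior_hr, prior_pa)
    return hr_per_650(current.hr, current.pa) - established >= 10.0


# Hypothetical hitter, not a real player: 15 HR in 600 PA three years running.
history = [Season(27, 600, 15), Season(28, 600, 15), Season(29, 600, 15)]
print(is_power_spike(history, Season(30, 650, 30)))  # True: 30.0 vs. 16.25
print(is_power_spike(history, Season(30, 650, 24)))  # False: 24.0 vs. 16.25
```

Pooling the three seasons keeps the sketch short; a weighted average that favors the most recent season would be an equally plausible reading of "established."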
We can look at the frequency of Power Spikes throughout different
eras in baseball’s recent history. Although there are many permutations
in how we might define such eras, I prefer the following:
- Golden Age (1949-1957). Runs from the complete reestablishment
of baseball following World War II until the movement of
the Giants and Dodgers from New York to California in the 1958
season. A last period of stability featuring relatively high levels of offense.
- Expansion Era (1958-1969). Coincides with the westward expansion
of baseball, the expansion in the number of franchises (from
sixteen to twenty-four during this period), and the full racial expansion
of the sport. The instability off the field is paralleled by
instability in offensive levels, which varied maniacally from year to year.
- Dynasty Era (1970-1976). The period immediately preceding the
implementation of full-blown free agency in 1977. Three great
dynasties–those of the Cincinnati Reds, Oakland A’s, and Baltimore
Orioles–accounted for six of the seven World Series championships
during the period and nine of the sixteen league
pennants. Offense was relatively low, prompting the American
League to implement the DH in 1973.
- Balanced Era (1977-1985). The 1977 season was marked by a
sharp increase in offense as a result of the expansion to 26
clubs and a new manufacturer of baseballs. The offensive improvement
brought the game back into balance, and the era is
remembered for the wide variety of styles that prospered during it.
- Canseco Era (1986-1993). Begins with Jose Canseco’s Rookie of
the Year award in 1986 and ends with the last full season before
the 1994 strike. The Canseco Era saw the resumption of large
year-to-year fluctuations in offensive levels. The 1987 season, in
particular, featured the highest levels of run scoring seen in either
league since the 1950s.
- Juiced Era (1994-2004). One of the great boom periods in baseball
history, along with the Roaring ’20s. Offensive levels improved
sharply between 1993 and 1995, escalated further in
1999, and have remained high since then. Associated with small
ballparks, small strike zones, and the allegation of widespread steroid use.
Table 9-1.1 provides the average number of runs and home runs
produced per game in each era.
TABLE 9-1.1 Average Number of Runs and Home Runs Produced per Game in Different Eras
                              American League    National League
Era                             R/G     HR/G       R/G     HR/G
Golden Age (1949-1957)         4.50     0.73      4.48     0.89
Expansion Era (1958-1969)      4.09     0.86      4.11     0.81
Dynasty Era (1970-1976)        4.03     0.73      4.11     0.71
Balanced Era (1977-1985)       4.44     0.85      4.10     0.69
Canseco Era (1986-1993)        4.50     0.89      4.15     0.77
Juiced Era (1994-2004)         5.06     1.12      4.68     1.04
Tracking the number of Power Spikes is relatively simple, once we
have these definitions in place. Figure 9-1.1 presents the frequency of
Power Spikes per 100 eligible hitters in each of our six eras. The
dashed line in Figure 9-1.1 indicates the average frequency of Power
Spikes between 1949 and 1993–about 5.8 per 100 hitters. Since 1994,
the frequency has increased to 9.1 per 100 hitters. Just how much emphasis
you want to place on the increase is a matter of perspective.
Power Spikes have been 57 percent more common during the Juiced
Era than they had been previously, which is certainly statistically significant.
On the other hand, some number of Power Spikes has always
occurred, and the difference amounts to only a handful of “extra”
Power Spikes per season.
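The arithmetic behind those two readings is easy to check. The sketch below uses the 5.8 and 9.1 figures quoted above; the count of 150 eligible hitters per season is a made-up placeholder, since the text does not give the real number, and the function names are hypothetical.

```python
def relative_increase(before: float, after: float) -> float:
    """Fractional change from one rate to another."""
    return (after - before) / before


def extra_spikes(before: float, after: float, eligible_hitters: int) -> float:
    """Additional Power Spikes implied for a pool of eligible hitters,
    given rates expressed per 100 hitters."""
    return (after - before) * eligible_hitters / 100.0


PRE_JUICED = 5.8  # Power Spikes per 100 eligible hitters, 1949-1993
JUICED = 9.1      # Power Spikes per 100 eligible hitters since 1994

print(round(100 * relative_increase(PRE_JUICED, JUICED)))  # 57 (percent)

# ASSUMPTION: 150 eligible hitters in a season is an invented figure,
# used only to show how the rate gap turns into a per-season count.
print(extra_spikes(PRE_JUICED, JUICED, 150))  # about 4.95
```

The same 3.3-point gap reads as a large relative jump or as a few extra player-seasons a year, depending on which function you look at.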
In some sense, however, Figure 9-1.1 is telling us something that we
already knew. We know that there has been an increase in home runs
in recent seasons, and that somebody has to be responsible for providing
those extra home runs. If home runs have become easier to hit for
some reason other than steroids, be it smaller ballparks, inferior pitching,
juiced baseballs, or something else, then Power Spikes will be easier
to come by.
In fact, if we rerun the numbers to account for macroscopic
changes to the offensive environment, then the increase in Power
Spikes disappears. Figure 9-1.2 presents the same information but incorporates
an adjustment for league and park effects rather than using
raw totals. More specifically, all the historical home run numbers
are adjusted to the standards of the 2004 American League. There
were about 20 percent more home runs hit per game in the 2004 AL,
for example, than there were in 1986. So a player who hit 30 home
runs in 1986 is credited with 36 adjusted home runs (20 percent
more). An identical technique is applied to account for park effects.
By this definition, Power Spikes have been neither any more nor
any less frequent in the Juiced Era than in previous periods.
FIGURE 9-1.1 Power Spikes per 100 hitters in different eras
FIGURE 9-1.2 Power Spikes per 100 hitters in different eras, adjusted for park and league effects
Instead, the period that stands out is the Dynasty Era of the early and mid-’70s,
which interestingly enough corresponds with the widespread introduction
of “greenies” (amphetamines) into major league clubhouses.
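The league adjustment described above amounts to a simple rescaling, sketched here in Python. The function and parameter names are hypothetical, and the per-game rates in the example (1.00 and 1.20) are placeholders chosen only to reproduce the 20 percent gap cited in the text, not actual league figures. The park step assumes a multiplicative park factor in which values above 1.0 mark a homer-friendly park; the text says only that "an identical technique" is used.

```python
def adjust_home_runs(
    raw_hr: float,
    league_hr_per_game: float,
    reference_hr_per_game: float,
    park_factor: float = 1.0,
) -> float:
    """Re-express a home-run total in a reference league's environment.

    league_hr_per_game:    HR per game in the player's own league-season
    reference_hr_per_game: HR per game in the reference league
                           (the 2004 AL in the text)
    park_factor:           ASSUMED convention: > 1.0 means the home park
                           inflates home runs, so we divide it out
    """
    league_scale = reference_hr_per_game / league_hr_per_game
    return raw_hr * league_scale / park_factor


# Placeholder rates standing in for "the 2004 AL saw about 20 percent
# more home runs per game than 1986"; only the ratio matters here.
adjusted = adjust_home_runs(30, league_hr_per_game=1.00, reference_hr_per_game=1.20)
print(round(adjusted, 1))  # 36.0, matching the 30 -> 36 example in the text
```

Once every season is rescaled this way, the Power Spike test is run on adjusted totals, so a hitter gets credit only for gains beyond what his league and park would explain.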
Then again, perhaps the league adjustment is not the right thing to
do after all. This gets to what I call a “chicken-and-egg” problem: Are
there more home runs hit because there are more Power Spikes? Or
are there more Power Spikes because there are more home runs?
One way to refocus the question is to look at which hitters are responsible
for the increase in home runs. Are home runs up because
shortstops who look like Bugs Bunny are suddenly turning in 20-homer seasons? Because players like Barry Bonds and Mark McGwire, who were already very good, have taken their power output to
unprecedented levels? Or is the difference felt universally–a rising
tide lifts all boats?
Figure 9-1.3 returns to the unadjusted data set but breaks the frequency
of Power Spikes down based on the number of home runs that
the player had hit previously. We call this his “established” home-run
rate–his frequency of home runs per 650 PA in the three seasons before
the Power Spike occurred. The figure is further broken down
between the Juiced Era and the “Pre-Juiced” years of 1949-1993.
FIGURE 9-1.3 Power Spikes per 100 hitters, compared to established home-run rates, 1949-1993 and post-1993
This figure reveals something very interesting: Power Spikes have
occurred more frequently in the Juiced Era, but the increase in frequency
is almost entirely attributable to certain types of hitters. In particular,
Power Spikes have become more frequent among hitters with
average power–those guys who will hit more than 10 home runs but
fewer than 30 in a typical season. Power Spikes have not become more
frequent among hitters who have no power at all. It has never been
very common for a hitter who has a weak, slap-hitting swing to transform
into a power threat, and it is no more common today.
But there is also no increase in Power Spikes among players who
were already very good power hitters, capable of hitting at least 30
home runs per year. Sometimes a very good power hitter will turn into
an insanely great one, as Bonds and McGwire did. But this is no more
common today than it had been previously. The players who have
been most responsible for the Juiced Era home-run boom are the
middle-of-the-road players: those guys who used to hit 15 or 20
homers a season and are now hitting 25 or 30.
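The breakdown behind that conclusion can be sketched as a group-and-count over eligible hitter-seasons. This is an illustration under stated assumptions, not the chapter's actual method: the tier cutoffs apply the text's "more than 10 but fewer than 30" and "at least 30" descriptions to the established rate per 650 PA, the era split follows the 1949-1993 and post-1993 division, and the sample rows are invented. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def power_tier(established_rate: float) -> str:
    """Bucket a hitter by established HR per 650 PA (assumed cutoffs)."""
    if established_rate >= 30:
        return "strong"   # already a very good power hitter
    if established_rate > 10:
        return "average"  # more than 10 but fewer than 30
    return "low"          # little or no power


def era_label(year: int) -> str:
    """Split seasons into the Juiced Era and everything before it."""
    return "juiced" if year >= 1994 else "pre-juiced"


def spikes_per_100(
    hitter_seasons: Iterable[Tuple[int, float, bool]]
) -> Dict[Tuple[str, str], float]:
    """Power Spikes per 100 eligible hitters for each (era, tier) group.

    Each row is (year, established HR per 650 PA, had a Power Spike).
    """
    eligible: Dict[Tuple[str, str], int] = defaultdict(int)
    spikes: Dict[Tuple[str, str], int] = defaultdict(int)
    for year, rate, spiked in hitter_seasons:
        key = (era_label(year), power_tier(rate))
        eligible[key] += 1
        if spiked:
            spikes[key] += 1
    return {key: 100.0 * spikes[key] / eligible[key] for key in eligible}


# Invented rows for illustration only; not real player data.
sample = [
    (1975, 18.0, True),
    (1975, 20.0, False),
    (1998, 18.0, True),
    (1998, 35.0, False),
]
rates = spikes_per_100(sample)
print(rates[("pre-juiced", "average")])  # 50.0
print(rates[("juiced", "average")])      # 100.0
print(rates[("juiced", "strong")])       # 0.0
```

Comparing the two eras within each tier, rather than overall, is what isolates the middle-of-the-road hitters as the group whose spike rate changed.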
The typical steroid user might not be the prima donna slugger who
endorses Budweiser between innings but the “hardworking late
bloomer” who is struggling to maintain his spot in the lineup or is trying
to leverage a good season into a big free-agent contract.
Certainly these players might have more economic incentive to enhance
their performance, as compared to their counterparts who have
already signed multiyear, guaranteed major league contracts. Among
professional athletes, the decision about whether to use steroids is not
a result of locker-room peer pressure but rather a relatively rational
calculation about the medical, moral, and financial costs and the risk
of getting caught as compared to the potential upside. In that sense, it
is just like any other form of cheating. The anonymous minor leaguer
profiled in Will Carroll’s book The Juice, who used steroids at a time
when he was struggling to maintain his status as a credible major
league prospect, expressed this calculation succinctly: “Look, if you
told me shooting bull piss was going to get me ten more home runs,
Baseball Between the Numbers is now available in major bookstores nationwide.