March 9, 2006

Baseball Between the Numbers

What Do Statistics Tell Us About Steroids?

by Baseball Prospectus

The furor over Game of Shadows, the forthcoming book detailing Barry Bonds' alleged use of performance-enhancing drugs, has thrust the issue of steroids in baseball back into the spotlight. Baseball Prospectus' new book, Baseball Between the Numbers, includes a chapter titled "What Do Statistics Tell Us About Steroids?" Written by BP's Nate Silver, the chapter takes both a numerical and historical look at this hot-button topic. Read on to see what this chapter has to say about Baseball Between the Numbers' cover boy.


In December of 2004, with the frenzy over the BALCO investigation at its peak, Alan Schwarz of the New York Times asked Baseball Prospectus to assist him with an analysis of Barry Bonds and Jason Giambi. The idea was to use BP's projection system, Player Empirical Comparison and Optimization Test Algorithm (PECOTA), to compare how Bonds and Giambi might have been expected to perform based on their statistics up through 2000, against what actually happened to their careers from that point forward.

To retell the story: Entering the 2000 season, each of these players was at a career crossroads. Bonds would turn thirty-five that year--the age at which even great players can begin to struggle--and was coming off an injury-plagued season in 1999. Giambi was a slow-footed first baseman about to enter his thirties; he'd had a good season in 1999, but it looked like a career year. Instead of withering, however, both players blossomed. Giambi won the MVP Award in 2000, and Bonds set a career high in home runs, launching an upward trajectory that would see him rewrite baseball's record books. Needless to say, PECOTA found that Bonds and Giambi had far outperformed reasonable expectations. Bonds produced 142 more home runs between 2000 and 2004 than PECOTA would have guessed and hit .339 rather than the projected .272. Giambi produced 60 percent more home runs and 50 percent more RBI than PECOTA expected.

Lots of players have had unusual career paths, back from the days when ballplayers' drugs of choice were Schaefer Beer and Vitalis Hair Tonic. Starting in 1953, a twenty-eight-year-old Ted Kluszewski, who had averaged just 15 home runs a season to that point in his big-league career, reeled off consecutive seasons of 40, 49, and 47. In 1973, Davey Johnson, who had just turned thirty, hit 43 home runs; he had never hit more than 18 before (and would never hit more than 15 thereafter). Even Hank Aaron defied expectations. In 1971, a season in which he missed more than twenty games, he set a career high in home runs with 47. Aaron was thirty-seven years old at the time.

It is natural to tie together cause and effect. These days, it has become just as natural to attribute any unexpected change in performance to ulterior motives. Eric Gagne adds 5 mph to his fastball? He's juicing. Albert Pujols, who was considered a second-tier prospect, bursts onto the scene with a performance worthy of Joe DiMaggio? He's juicing--unless he faked his birth certificate. Sammy Sosa? Not only was he juicing, he was also corking his bat, using a laser-eye mechanism in his batting helmet, and bribing the opposing pitcher to throw him hanging sliders.

The Indirect Evidence

One way to examine this question is by looking at what I'll call a Power Spike. A Power Spike occurs when a player "suddenly" starts hitting home runs more frequently than he used to. More specifically, we can define a Power Spike as follows:

  1. A player is an established major league veteran, at least twenty-eight years old, with at least 1,000 plate appearances (PA) accumulated across his previous three seasons; and
  2. The player improves upon his established home-run rate by at least 10 HR per 650 PA, in a season in which he had at least 500 PA.
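The two-part definition above can be sketched in code. This is a minimal illustration, not the chapter's actual implementation: the `Season` type and function names are hypothetical, and pooling the three prior seasons (total HR over total PA) is an assumption about how the "established" rate is computed.

```python
from dataclasses import dataclass


@dataclass
class Season:
    pa: int  # plate appearances
    hr: int  # home runs


def hr_per_650(hr: int, pa: int) -> float:
    """Home runs scaled to a 650-PA season."""
    return 650.0 * hr / pa


def is_power_spike(age: int, previous_three: list[Season], current: Season) -> bool:
    """Apply the two criteria from the text to one player-season."""
    prior_pa = sum(s.pa for s in previous_three)
    prior_hr = sum(s.hr for s in previous_three)

    # Criterion 1: veteran, at least 28, with 1,000+ PA over the prior three seasons.
    # Criterion 2 also requires at least 500 PA in the season being tested.
    if age < 28 or prior_pa < 1000 or current.pa < 500:
        return False

    # Assumption: the established rate pools the three prior seasons.
    established = hr_per_650(prior_hr, prior_pa)
    return hr_per_650(current.hr, current.pa) - established >= 10.0


# Hypothetical player: 15 HR in 600 PA in each of three prior seasons.
history = [Season(pa=600, hr=15)] * 3
print(is_power_spike(30, history, Season(pa=620, hr=40)))  # jump of about 26 HR per 650 PA
print(is_power_spike(30, history, Season(pa=620, hr=20)))  # jump of about 5 HR per 650 PA
```

The first call qualifies as a Power Spike and the second does not, since a gain of roughly 5 home runs per 650 PA falls short of the 10-homer threshold.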

We can look at the frequency of Power Spikes throughout different eras in baseball's recent history. Although there are many permutations in how we might define such eras, I prefer the following:

  • Golden Age (1949-1957). Runs from the complete reestablishment of baseball following World War II until the movement of the Giants and Dodgers from New York to California in the 1958 season. A last period of stability featuring relatively high levels of offense.

  • Expansion Era (1958-1969). Coincides with the westward expansion of baseball, the expansion in the number of franchises (from sixteen to twenty-four during this period), and the full racial expansion of the sport. The instability off the field is paralleled by instability in offensive levels, which varied maniacally from year to year.

  • Dynasty Era (1970-1976). The period immediately preceding the implementation of full-blown free agency in 1977. Three great dynasties--those of the Cincinnati Reds, Oakland A's, and Baltimore Orioles--accounted for six of the seven World Series championships during the period and nine of the fourteen league pennants. Offense was relatively low, prompting the American League to implement the DH in 1973.

  • Balanced Era (1977-1985). The 1977 season was marked by a sharp increase in offense as a result of the expansion to 26 clubs and a new manufacturer of baseballs. The offensive improvement brought the game back into balance, and the era is remembered for the wide variety of styles that prospered during the period.

  • Canseco Era (1986-1993). Begins with Jose Canseco's Rookie of the Year award in 1986 and ends with the last full season before the 1994 strike. The Canseco Era saw the resumption of large year-to-year fluctuations in offensive levels. The 1987 season, in particular, featured the highest levels of run scoring seen in either league since the 1950s.

  • Juiced Era (1994-2004). One of the great boom periods in baseball history, along with the Roaring '20s. Offensive levels improved sharply between 1993 and 1995, escalated further in 1999, and have remained high since then. Associated with small ballparks, small strike zones, and the allegation of widespread steroid usage.

Table 9-1.1 provides the average number of runs and home runs produced per game in each era.

TABLE 9-1.1 Average Number of Runs and Home Runs Produced per Game in Different Eras

                             American League    National League
Era                            R/G    HR/G        R/G    HR/G
Golden Age (1949-1957)         4.50   0.73        4.48   0.89
Expansion Era (1958-1969)      4.09   0.86        4.11   0.81
Dynasty Era (1970-1976)        4.03   0.73        4.11   0.71
Balanced Era (1977-1985)       4.44   0.85        4.10   0.69
Canseco Era (1986-1993)        4.50   0.89        4.15   0.77
Juiced Era (1994-2004)         5.06   1.12        4.68   1.04

Tracking the number of Power Spikes is relatively simple, once we have these definitions in place. Figure 9-1.1 presents the frequency of Power Spikes per 100 eligible hitters in each of our six eras. The dashed line in Figure 9-1.1 indicates the average frequency of Power Spikes between 1949 and 1993--about 5.8 per 100 hitters. Since 1994, the frequency has increased to 9.1 per 100 hitters. Just how much emphasis you want to place on the increase is a matter of perspective. Power Spikes have been 57 percent more common during the Juiced Era than they had been previously, which is certainly statistically significant. On the other hand, some number of Power Spikes has always occurred, and the difference amounts to only a handful of "extra" Power Spikes per season.
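The 57 percent figure follows directly from the two frequencies quoted above. A short sketch of the arithmetic, with a hypothetical function name:

```python
def relative_increase(new_rate: float, old_rate: float) -> float:
    """Fractional change from old_rate to new_rate."""
    return (new_rate - old_rate) / old_rate


pre_juiced = 5.8  # Power Spikes per 100 eligible hitters, 1949-1993 (from the text)
juiced = 9.1      # Power Spikes per 100 eligible hitters, 1994-2004 (from the text)

increase = relative_increase(juiced, pre_juiced)
print(f"{increase:.0%}")  # prints "57%"
```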

In some sense, however, Figure 9-1.1 is telling us something that we already knew. We know that there has been an increase in home runs in recent seasons, and that somebody has to be responsible for providing those extra home runs. If home runs have become easier to hit for some reason other than steroids, be it smaller ballparks, inferior pitching, juiced baseballs, or something else, then Power Spikes will be easier to come by.

In fact, if we rerun the numbers to account for macroscopic changes to the offensive environment, then the increase in Power Spikes disappears. Figure 9-1.2 presents the same information but incorporates an adjustment for league and park effects rather than using raw totals. More specifically, all the historical home run numbers are adjusted to the standards of the 2004 American League. There were about 20 percent more home runs hit per game in the 2004 AL, for example, than there were in 1986. So a player who hit 30 home runs in 1986 is credited with 36 adjusted home runs (20 percent more). An identical technique is applied to account for park effects (Figure 9-1.2). By this definition, Power Spikes have been neither any more nor any less frequent in the Juiced Era than in previous periods.
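The adjustment described above amounts to scaling each raw total by the ratio of the reference league's home-run rate to the player's own league rate, and then treating the park the same way. The sketch below assumes that form; the function name is hypothetical, and the `park_factor` convention (1.0 is neutral, above 1.0 favors home runs) is an assumption rather than something the chapter specifies.

```python
def adjust_home_runs(
    raw_hr: float,
    league_hr_rate: float,
    reference_hr_rate: float,
    park_factor: float = 1.0,
) -> float:
    """Translate a raw home-run total into the reference environment.

    league_hr_rate and reference_hr_rate only need to share a unit
    (for example, home runs per game). park_factor is an assumed
    multiplier where 1.0 means a neutral park.
    """
    league_scale = reference_hr_rate / league_hr_rate
    return raw_hr * league_scale / park_factor


# The text's example: the reference league (2004 AL) has about 20 percent
# more home runs per game than 1986, so the ratio of rates is 1.2.
print(round(adjust_home_runs(30, league_hr_rate=1.0, reference_hr_rate=1.2), 1))  # 36.0
```

Only the ratio of the two rates matters here, so the values 1.0 and 1.2 stand in for the real per-game figures.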

FIGURE 9-1.1 Power Spikes per 100 hitters in different eras

FIGURE 9-1.2 Power Spikes per 100 hitters in different eras, adjusted for park and league effects

Instead, the period that stands out is the Dynasty Era of the early and mid-'70s, which interestingly enough corresponds with the widespread introduction of "greenies" (amphetamines) into major league clubhouses. Then again, perhaps the league adjustment is not the right thing to do after all. This gets to what I call a "chicken-and-egg" problem: Are there more home runs hit because there are more Power Spikes? Or are there more Power Spikes because there are more home runs? One way to refocus the question is to look at which hitters are responsible for the increase in home runs. Are home runs up because shortstops who look like Bugs Bunny are suddenly turning in 20-homer seasons? Because players like Barry Bonds and Mark McGwire, who were already very good, have taken their power output to unprecedented levels? Or is the difference felt universally--a rising tide lifts all boats?

Figure 9-1.3 returns to the unadjusted data set but breaks the frequency of Power Spikes down based on the number of home runs that the player had hit previously. We call this his "established" home-run rate--his frequency of home runs per 650 PA in the three seasons before the Power Spike occurred. The figure is further broken down between the Juiced Era and the "Pre-Juiced" years of 1949-1993.

FIGURE 9-1.3 Power Spikes per 100 hitters, compared to established home-run rates, 1949-1993 and post-1993

This figure reveals something very interesting: Power Spikes have occurred more frequently in the Juiced Era, but the increase in frequency is almost entirely attributable to certain types of hitters. In particular, Power Spikes have become more frequent among hitters with average power--those guys who will hit more than 10 home runs but fewer than 30 in a typical season. Power Spikes have not become more frequent among hitters who have no power at all. It has never been very common for a hitter who has a weak, slap-hitting swing to transform into a power threat, and it is no more common today.

But there is also no increase in Power Spikes among players who were already very good power hitters, capable of hitting at least 30 home runs per year. Sometimes a very good power hitter will turn into an insanely great one, as Bonds and McGwire did. But this is no more common today than it had been previously. The players who have been most responsible for the Juiced Era home-run boom are the middle-of-the-road players: those guys who used to hit 15 or 20 homers a season and are now hitting 25 or 30.
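The breakdown by established home-run rate can be sketched as a simple grouping. The three buckets below follow the cutoffs mentioned in the text (under 10, 10 to 30, and 30 or more home runs per 650 PA); the exact bins used in the figure are not given, so the boundaries, the function names, and the sample data are all assumptions for illustration.

```python
from collections import defaultdict


def bucket(established_rate: float) -> str:
    """Assumed bins, based on the cutoffs mentioned in the text."""
    if established_rate < 10:
        return "little power (<10)"
    if established_rate < 30:
        return "average power (10-30)"
    return "strong power (30+)"


def spikes_per_100_by_bucket(hitters):
    """hitters: iterable of (established HR per 650 PA, had_power_spike)."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [eligible hitters, spikes]
    for rate, spiked in hitters:
        tally = counts[bucket(rate)]
        tally[0] += 1
        tally[1] += int(spiked)
    return {name: 100.0 * spikes / eligible for name, (eligible, spikes) in counts.items()}


# Made-up eligible hitter-seasons, not real data.
sample = [(5, False), (5, False), (15, True), (20, False), (35, False), (40, True)]
print(spikes_per_100_by_bucket(sample))
```

Running the same grouping separately on the 1949-1993 and post-1993 seasons would give the two series the figure compares.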

The typical steroid user might not be the prima donna slugger who endorses Budweiser between innings but the "hardworking late bloomer" who is struggling to maintain his spot in the lineup or is trying to leverage a good season into a big free-agent contract. Certainly these players might have more economic incentive to enhance their performance, as compared to their counterparts who have already signed multiyear, guaranteed major league contracts. Among professional athletes, the decision about whether to use steroids is not a result of locker-room peer pressure but rather a relatively rational calculation about the medical, moral, and financial costs and the risk of getting caught as compared to the potential upside. In that sense, it is just like any other form of cheating. The anonymous minor leaguer profiled in Will Carroll's book The Juice, who used steroids at a time when he was struggling to maintain his status as a credible major league prospect, expressed this calculation succinctly: "Look, if you told me shooting bull piss was going to get me ten more home runs, fine."

Baseball Between the Numbers is now available in major bookstores nationwide.