March 9, 2006
Baseball Between the Numbers
What Do Statistics Tell Us About Steroids?
Game of Shadows, the forthcoming book detailing Barry Bonds' alleged use of performance-enhancing drugs, has thrust the issue of steroids in baseball back into the spotlight. Baseball Prospectus' new book, Baseball Between the Numbers, includes a chapter titled "What Do Statistics Tell Us About Steroids?" Written by BP's Nate Silver, the chapter takes both a numerical and historical look at this hot-button topic. Read on to see what this chapter has to say about Baseball Between the Numbers' cover boy.
In December of 2004, with the frenzy over the BALCO investigation at its peak, Alan Schwarz of the New York Times asked Baseball Prospectus to assist him with an analysis of Barry Bonds and Jason Giambi. The idea was to use BP's projection system, Player Empirical Comparison and Optimization Test Algorithm (PECOTA), to compare how Bonds and Giambi might have been expected to perform based on their statistics up through 2000, against what actually happened to their careers from that point forward.
To retell the story: Entering the 2000 season, each of these players was at a career crossroads. Bonds would turn thirty-five that year--the age at which even great players can begin to struggle--and was coming off an injury-plagued season in 1999. Giambi was a slow-footed first baseman about to enter his thirties; he'd had a good season in 1999, but it looked like a career year. Instead of withering, however, both players blossomed. Giambi won the MVP Award in 2000, and Bonds set a career high in home runs, launching an upward trajectory that would see him rewrite baseball's record books. Needless to say, PECOTA found that Bonds and Giambi had far outperformed reasonable expectations. Bonds produced 142 more home runs between 2000 and 2004 than PECOTA would have guessed and hit .339 rather than the projected .272. Giambi produced 60 percent more home runs and 50 percent more RBI than PECOTA expected.
Lots of players have had unusual career paths, back from the days when ballplayers' drugs of choice were Schaefer Beer and Vitalis Hair Tonic. Starting in 1953, a twenty-eight-year-old Ted Kluszewski, who had averaged just 15 home runs a season to that point in his big-league career, reeled off consecutive seasons of 40, 49, and 47. In 1973, Davey Johnson, who had just turned thirty, hit 43 home runs; he had never hit more than 18 before (and would never hit more than 15 thereafter). Even Hank Aaron defied expectations. In 1971, a season in which he missed more than twenty games, he set a career high in home runs with 47. Aaron was thirty-seven years old at the time.
It is natural to tie together cause and effect. These days, it has become just as natural to attribute any unexpected change in performance to ulterior motives. Eric Gagne adds 5 mph to his fastball? He's juicing. Albert Pujols, who was considered a second-tier prospect, bursts onto the scene with a performance worthy of Joe DiMaggio? He's juicing--unless he faked his birth certificate. Sammy Sosa? Not only was he juicing, he was also corking his bat, using a laser-eye mechanism in his batting helmet, and bribing the opposing pitcher to throw him hanging sliders.
The Indirect Evidence
One way to examine this question is by looking at what I'll call a Power Spike. A Power Spike occurs when a player "suddenly" starts hitting home runs more frequently than he used to. More specifically, we can define a Power Spike as follows:
We can look at the frequency of Power Spikes throughout different eras in baseball's recent history. Although there are many permutations in how we might define such eras, I prefer the following:
TABLE 9-1.1 Average Number of Runs and Home Runs Produced per Game in Different Eras
                             American League    National League
Era                            R/G     HR/G       R/G     HR/G
Golden Age (1949-1957)         4.50    0.73       4.48    0.89
Expansion Era (1958-1969)      4.09    0.86       4.11    0.81
Dynasty Era (1970-1976)        4.03    0.73       4.11    0.71
Balanced Era (1977-1985)       4.44    0.85       4.10    0.69
Canseco Era (1986-1993)        4.50    0.89       4.15    0.77
Juiced Era (1994-2004)         5.06    1.12       4.68    1.04
Tracking the number of Power Spikes is relatively simple, once we have these definitions in place. Figure 9-1.1 presents the frequency of Power Spikes per 100 eligible hitters in each of our six eras. The dashed line in Figure 9-1.1 indicates the average frequency of Power Spikes between 1949 and 1993--about 5.8 per 100 hitters. Since 1994, the frequency has increased to 9.1 per 100 hitters. Just how much emphasis you want to place on the increase is a matter of perspective. Power Spikes have been 57 percent more common during the Juiced Era than they had been previously, which is certainly statistically significant. On the other hand, some number of Power Spikes has always occurred, and the difference amounts to only a handful of "extra" Power Spikes per season.
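The 57 percent figure follows directly from the two rates quoted above. As a quick check, here is a minimal sketch of the arithmetic; the function and variable names are illustrative only and do not come from the book.

```python
def relative_increase(baseline: float, current: float) -> float:
    """Return the fractional increase of `current` over `baseline`."""
    return (current - baseline) / baseline


# Rates quoted in the text, in Power Spikes per 100 eligible hitters.
pre_juiced_rate = 5.8   # average for 1949-1993
juiced_rate = 9.1       # average since 1994

increase = relative_increase(pre_juiced_rate, juiced_rate)
extra_per_100 = juiced_rate - pre_juiced_rate

print(f"{increase:.0%} more common, {extra_per_100:.1f} extra per 100 hitters")
```

Both readings of the data come out of the same two numbers: the relative increase is large, while the absolute difference is only about three additional Power Spikes per 100 eligible hitters.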
In some sense, however, Figure 9-1.1 is telling us something that we already knew. We know that there has been an increase in home runs in recent seasons, and that somebody has to be responsible for providing those extra home runs. If home runs have become easier to hit for some reason other than steroids, be it smaller ballparks, inferior pitching, juiced baseballs, or something else, then Power Spikes will be easier to come by.
In fact, if we rerun the numbers to account for macroscopic changes to the offensive environment, then the increase in Power Spikes disappears. Figure 9-1.2 presents the same information but incorporates an adjustment for league and park effects rather than using raw totals. More specifically, all the historical home run numbers are adjusted to the standards of the 2004 American League. There were about 20 percent more home runs hit per game in the 2004 AL, for example, than there were in 1986. So a player who hit 30 home runs in 1986 is credited with 36 adjusted home runs (20 percent more). An identical technique is applied to account for park effects (Figure 9-1.2). By this definition, Power Spikes have been neither any more nor any less frequent in the Juiced Era than in previous periods.
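The league adjustment can be sketched in a few lines. This is an illustration under stated assumptions, not the exact method behind the figures: the function name is hypothetical, the league rates in the example are placeholders chosen only to reproduce the "20 percent more" ratio, and the park factor (treated here as a simple divisor, where values above 1.0 mean a homer-friendly park) is an assumed form of the "identical technique" the text mentions.

```python
def adjust_home_runs(raw_hr: float,
                     league_hr_per_game: float,
                     reference_hr_per_game: float,
                     park_factor: float = 1.0) -> float:
    """Scale a raw home run total to a reference league environment.

    park_factor is an assumption of this sketch: a value above 1.0 means
    the player's home park inflated home runs, so the total is scaled down.
    """
    if league_hr_per_game <= 0 or park_factor <= 0:
        raise ValueError("league rate and park factor must be positive")
    league_scale = reference_hr_per_game / league_hr_per_game
    return raw_hr * league_scale / park_factor


# Placeholder rates: only their ratio (1.2, i.e. 20 percent more home runs
# per game in the reference league) is taken from the text.
league_1986 = 1.00
reference_2004_al = 1.20

adjusted = adjust_home_runs(30, league_1986, reference_2004_al)
print(round(adjusted, 1))  # 30 raw home runs become about 36 adjusted
```

Because every season is rescaled to the same reference environment, a Power Spike under this definition has to exceed the league-wide rise in home runs, not merely keep pace with it.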
FIGURE 9-1.1 Power Spikes per 100 hitters in different eras
FIGURE 9-1.2 Power Spikes per 100 hitters in different eras, adjusted for park and league effects
Instead, the period that stands out is the Dynasty Era of the early and mid-'70s, which interestingly enough corresponds with the widespread introduction of "greenies" (amphetamines) into major league clubhouses. Then again, perhaps the league adjustment is not the right thing to do after all. This gets to what I call a "chicken-and-egg" problem: Are there more home runs hit because there are more Power Spikes? Or are there more Power Spikes because there are more home runs? One way to refocus the question is to look at which hitters are responsible for the increase in home runs. Are home runs up because shortstops who look like Bugs Bunny are suddenly turning in 20-homer seasons? Because players like Barry Bonds and Mark McGwire, who were already very good, have taken their power output to unprecedented levels? Or is the difference felt universally--a rising tide lifts all boats?
Figure 9-1.3 returns to the unadjusted data set but breaks the frequency of Power Spikes down based on the number of home runs that the player had hit previously. We call this his "established" home-run rate--his frequency of home runs per 650 PA in the three seasons before the Power Spike occurred. The figure is further broken down between the Juiced Era and the "Pre-Juiced" years of 1949-1993.
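For readers who want to compute an established home-run rate themselves, here is a minimal sketch. It assumes the rate is pooled across the three seasons (total home runs divided by total plate appearances, scaled to 650 PA); the text does not say whether seasons are pooled or averaged individually, and the function name and sample player are hypothetical.

```python
from typing import Sequence, Tuple


def established_hr_rate(seasons: Sequence[Tuple[int, int]],
                        pa_basis: int = 650) -> float:
    """Home runs per `pa_basis` plate appearances over the given seasons.

    `seasons` holds (home_runs, plate_appearances) pairs, one per season.
    Pooling the totals is an assumption of this sketch.
    """
    total_hr = sum(hr for hr, _ in seasons)
    total_pa = sum(pa for _, pa in seasons)
    if total_pa == 0:
        raise ValueError("no plate appearances in the sample")
    return pa_basis * total_hr / total_pa


# Hypothetical player: (home runs, plate appearances) in the three seasons
# before the candidate Power Spike.
prior_seasons = [(15, 600), (18, 650), (12, 500)]

rate = established_hr_rate(prior_seasons)
print(f"{rate:.1f} home runs per 650 PA")
```

Pooling weights each season by playing time, so a short, injury-shortened year counts for less than a full one.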
FIGURE 9-1.3 Power Spikes per 100 hitters, compared to established home-run rates, 1949-1993 and post-1993
This figure reveals something very interesting: Power Spikes have occurred more frequently in the Juiced Era, but the increase in frequency is almost entirely attributable to certain types of hitters. In particular, Power Spikes have become more frequent among hitters with average power--those guys who will hit more than 10 home runs but fewer than 30 in a typical season. Power Spikes have not become more frequent among hitters who have no power at all. It has never been very common for a hitter who has a weak, slap-hitting swing to transform into a power threat, and it is no more common today.
But there is also no increase in Power Spikes among players who were already very good power hitters, capable of hitting at least 30 home runs per year. Sometimes a very good power hitter will turn into an insanely great one, as Bonds and McGwire did. But this is no more common today than it had been previously. The players who have been most responsible for the Juiced Era home-run boom are the middle-of-the-road players: those guys who used to hit 15 or 20 homers a season and are now hitting 25 or 30.
The typical steroid user might not be the prima donna slugger who endorses Budweiser between innings but the "hardworking late bloomer" who is struggling to maintain his spot in the lineup or is trying to leverage a good season into a big free-agent contract. Certainly these players might have more economic incentive to enhance their performance, as compared to their counterparts who have already signed multiyear, guaranteed major league contracts. Among professional athletes, the decision about whether to use steroids is not a result of locker-room peer pressure but rather a relatively rational calculation about the medical, moral, and financial costs and the risk of getting caught as compared to the potential upside. In that sense, it is just like any other form of cheating. The anonymous minor leaguer profiled in Will Carroll's book The Juice, who used steroids at a time when he was struggling to maintain his status as a credible major league prospect, expressed this calculation succinctly: "Look, if you told me shooting bull piss was going to get me ten more home runs, fine."
Baseball Between the Numbers is now available in major bookstores nationwide.