September 19, 2013
The Importance of a Living Replacement Level
Last week, we talked about what replacement level is, and why we need it. Now let’s talk about who replacements are, and how to find them.
What we uncovered last time is that replacement level is not some wholly arbitrary baseline concocted by evil sabermetric geniuses. It, along with the consequences of replacement level-based analysis, in fact grows out of something fundamental to the game: the distribution of talent in baseball and the limited number of roster spots available. So the question becomes: how do we measure the distribution of talent? What do we need to make sure we’re capturing?
Sabermetrics is a field of inquiry open to just about anyone, and while the field is known for its own share of geniuses, occasionally one comes to it on loan from other fields. Stephen Jay Gould was a prominent paleontologist and evolutionary biologist, but he was also a passionate baseball fan. And Gould noticed something interesting, as recounted in his essay "Why No One Hits .400 Any More."
When we contrast these numbers of the past and present, we encounter the well-known and curious phenomenon that inspired this article: great players of the past often stand further apart from their teammates. Consider only the principal measures of hitting and pitching: batting average and earned run average. No one has hit .400 since Ted Williams reached .406 nearly half a century ago in 1941, yet eight players exceeded .410 in the fifty years before then. Bob Gibson had an earned run average of 1.12 in 1968. Ten other pitchers have achieved a single season ERA below 1.30, but before Gibson we must go back a full fifty years to Walter Johnson’s 1.27 in 1918. Could the myths be true after all? Were the old guys really better? Are we heading toward entropic homogeneity and robotic sameness?
As Gould notes, it seems unlikely that the players of old really were better than today’s players. “We live better, eat better, provide more opportunity across all social classes,” he said. “Moreover, the pool of potential recruits has increased fivefold in one hundred years by simple growth of the American population.” And baseball is no longer a wholly American game; Major League Baseball can now draw the best talent from across the world.
It is difficult to attribute the absence of .400 hitters to changes in the game as a whole: while the league batting average has fluctuated since the game began, it has been reasonably stable over time. The American League hit .266 the year Williams hit .406, compared to .256 this year and .255 the year before. A decrease, yes, but an altogether modest one. We don’t have to guess at the physical improvement of our athletes; we can measure it objectively in sports like track, where an athlete performs against a clock. So why can’t we see it in baseball? Gould offers this explanation:
In a system with relative standards (person against person) – especially when rules are subtly adjusted to maintain constancy in measures of average performance – this general improvement is masked and cannot be recovered when we follow our usual traditions and interpret figures for average performances as measures of real things. We can, however, grasp the general improvement of systems with relative standards by a direct study of variations – recognizing that variation itself is a decline in variation. Paradoxically, this decline produces a decrease in the difference between average and stellar performance. Therefore, modern leaders don’t stand so far above their contemporaries. The “myth” of ancient heroes—the greater distance between average and best in the past—actually records the improvement of play through time.
In other words, what Gould found was evidence that while the best baseball players of today are better than the best baseball players of years past, the worst baseball players have been improving at an even greater rate than the best.
We can observe the same behavior at the team level. Let’s look at the standard deviation of team wins by year:
The blue line represents the raw standard deviation; the orange line uses the binomial distribution to model a “true” standard deviation that controls for the number of teams in the league and the number of games played. Both lines tell a very similar story: the distance between the best and worst baseball teams has shrunk substantially over the history of the game.
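For readers who want to see how a binomial adjustment of this kind can work, here is a minimal sketch. It is not necessarily the exact method behind the orange line. It assumes that the observed variance in winning percentage is the sum of a true-talent variance and a binomial "luck" variance, with every game treated as a coin flip at p = 0.5. The function name `talent_sd` and the sample winning percentages are hypothetical.

```python
import math
import statistics

def talent_sd(win_pcts, games_per_team):
    """Estimate the spread of underlying team quality from observed records.

    Assumption (not taken from the article): observed variance in winning
    percentage = true-talent variance + binomial luck variance, with each
    game treated as a coin flip at p = 0.5.
    """
    observed_var = statistics.pvariance(win_pcts)
    # Variance of a binomial proportion over one team's schedule.
    luck_var = 0.5 * 0.5 / games_per_team
    # Clamp at zero: in a small league, luck alone can exceed the observed spread.
    return math.sqrt(max(observed_var - luck_var, 0.0))

# Hypothetical winning percentages for a six-team league, 162 games each.
sample = [0.600, 0.550, 0.520, 0.480, 0.450, 0.400]
raw_sd = statistics.pstdev(sample)
est_sd = talent_sd(sample, 162)
print(f"raw SD: {raw_sd:.4f}, estimated talent SD: {est_sd:.4f}")
```

Because the luck term shrinks as the schedule gets longer, the same raw spread implies a wider talent gap in a 162-game season than in a 60-game one, which is why a comparison across eras needs some adjustment of this sort.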
What this tells us is that the freely available talent of 2013 is much, much better than the freely available talent of 1913. And this has significant consequences for how we evaluate players when we use freely available talent as the baseline for our value systems.
There’s also been another substantial change in the game. Let’s look at the rate of the so-called three true outcomes (walks, strikeouts and home runs) over time:
From 1872 to 1875, the league averaged roughly three percent of plate appearances having a “true outcome.” In 2012, over thirty percent of plate appearances ended in the same way. The three true outcomes are so called because they are largely immune to the interference of the fielders (home run grabs off the wall notwithstanding), being wholly in the purview of the batter-pitcher matchup. (Although—and we’ll get to this later—it may be better to think of it as the batter-battery matchup.)
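The rate itself is simple arithmetic: walks plus strikeouts plus home runs, divided by plate appearances. A minimal sketch follows; the function name `tto_rate` and the league totals are hypothetical, chosen only to land near the modern figure quoted above.

```python
def tto_rate(walks, strikeouts, home_runs, plate_appearances):
    """Share of plate appearances ending in a walk, strikeout, or home run."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return (walks + strikeouts + home_runs) / plate_appearances

# Hypothetical league totals, not real data.
rate = tto_rate(
    walks=15_000,
    strikeouts=36_000,
    home_runs=5_000,
    plate_appearances=184_000,
)
print(f"{rate:.1%}")
```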
The importance of fielding has declined steadily and dramatically over the game’s history; by the same token, the importance of pitching has increased. At the same time, the number of pitchers has increased dramatically. Looking at pitchers per game:
What we see is a four-fold increase in the pitchers used in a major-league game over the game’s history. (It is tempting to draw a connection between the way pitchers are used and the rate of true outcomes over time, and while there are probably other factors in play as well, I suspect there is a causal link there.) Behind this change in pitcher usage is the simple fact that the longer a pitcher is in a game, the less effective he is.
Replacement players get their name because they step in for the player a team would rather be using. But consider a starting pitcher. We know that the better starters will pitch deeper into games, because even though they decline over the course of a game, they start from a higher point and thus can remain effective longer. So if a team has to replace a pitcher who can reliably pitch six or seven innings a game, they won’t force their replacement starter to pitch the same number of innings. Instead, they’ll figure out the right number of innings for that pitcher to pitch in a game given his abilities, and they’ll fill in the innings gap with relief pitcher performance. What this means is that a starting pitcher does not have only one replacement in the modern era, even at the level of the single game; he can have two or three or more replacements. Taken together, all this means that the replacement level for pitching is going to shift more dramatically over time than the replacement level for hitting.
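One way to picture a replacement made of several pitchers is as an innings-weighted average. The sketch below is only an illustration of that idea, not the method used in this series. It assumes the replacement starter and the bullpen each allow runs at a constant rate, and every number in it is hypothetical.

```python
def blended_ra9(starter_innings, starter_ra9, bullpen_ra9, target_innings):
    """Runs allowed per nine innings when a shorter-stint starter plus
    relievers cover the innings the original starter would have thrown.

    Assumption: a simple innings-weighted average of two constant rates.
    """
    if target_innings <= 0 or not 0 <= starter_innings <= target_innings:
        raise ValueError("need 0 <= starter_innings <= target_innings, target > 0")
    bullpen_innings = target_innings - starter_innings
    weighted = starter_innings * starter_ra9 + bullpen_innings * bullpen_ra9
    return weighted / target_innings

# Hypothetical case: the regular starter goes 7 innings; his replacement
# goes 5 at 5.50 runs per nine, and relievers cover 2 at 4.50 runs per nine.
composite = blended_ra9(
    starter_innings=5.0,
    starter_ra9=5.50,
    bullpen_ra9=4.50,
    target_innings=7.0,
)
print(f"composite replacement RA9: {composite:.2f}")
```

Under these made-up inputs, the composite comes out better than the replacement starter alone, which is the sense in which heavier bullpen use can move the pitching baseline.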
This has two consequences. One is that replacement level needs to shift over time in order to properly account for the split between pitching and hitting. The other is that comparing raw performance above replacement for players across different seasons will tend to overrate players from older seasons. We’ll investigate both conclusions next week.