We are getting close to finishing up the revisions to Wins Above Replacement Player, and before we get to wrapping up the pitchers, I want to step back for a second and take a look at replacement level from a different perspective.

One of the important points of analyzing baseball is that everything that matters happens at the team level. This isn’t to say individual players are unimportant. You can’t have a baseball team without them, but you can only win or lose as a team, not as an individual. Having a system that makes sense at the team level is a necessary (but not sufficient) criterion for having a good system of evaluating baseball players at the individual level.

But where do you find a replacement-level team?

Oh, sure, baseball fans can readily supply you with examples of teams that—in a word—sucked. The 2003 Detroit Tigers, for instance. Or, if you want to go all the way back, the 1899 Cleveland Spiders (the Spiders can perhaps be excused, having been sabotaged by their owners and later disbanded—proving once again that those who fail to learn from history are doomed to Jeffrey Loria).

But that only tells us who played poorly. A decent team can find itself dragged down by a spate of terrible misfortune; the ’04 Tigers were certainly not a great team, but they were a far sight better than the ’03 Tigers while sharing many of the same players (and the ’02 Tigers were a very bad team but still 12 wins above the ’03 team, again with many of the same players in common). The ’03 Tigers may be a helpful cautionary tale as to the worst possible outcome for a team, but they don’t tell us much about the most likely outcome for a team full of replacements.

What is more helpful to us is finding teams that were filled with players nobody else wanted to begin with. History very helpfully provides us with a serviceable set of such teams—ones populated primarily through expansion drafts. These aren’t perfect examples, of course—sometimes good players get moved in expansion drafts (the Rays picked up Bobby Abreu, for instance, though of course they immediately traded him for Kevin Stocker). But they should serve as useful guides, so long as we realize that they’re imperfect examples.

There are two things we have to be careful of here. One is comparing teams that played in 1962 to teams that played in 1998. We need to control for run environments. The other thing we have to look out for is that, among the 14 expansion teams from 1961 to now, one of them was the Colorado Rockies. So if you don’t control for park effects, you see a definite skew to the analysis.

What we see looking at those teams is that they had an average winning percentage of .375. That’s a nice, round figure—probably a bit higher than we think a team full of replacement-level players would fare but still a pretty bad team. The best of them, the ’61 Angels, posted a .435 record; the worst, the ’62 Mets, posted a .250 record.

In terms of runs scored, these teams produced at about 86 percent of the league average; in a league where 4.5 runs per game is average, they would have scored roughly 3.9 runs per game. Things look a little better on defense, where they allowed runs at 111 percent of the league average; in the same league, they would have allowed about five runs per game (this means these teams modestly under-performed their Pythagorean records, for those curious).
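As a quick sanity check, here is a sketch in Python of how those league-relative rates translate into runs per game and a Pythagorean record. The 4.5 runs-per-game environment, the 86/111 percentages, and the .375 observed winning percentage are from the article; the 1.83 exponent is a common Pythagorean convention and an assumption on my part.

```python
# Translate league-relative scoring rates into runs per game,
# then compare Pythagorean expectation to the observed record.

LEAGUE_RPG = 4.5        # assumed league-average runs per game
OFFENSE_PCT = 0.86      # expansion teams scored 86% of league average
DEFENSE_PCT = 1.11      # and allowed 111% of league average
OBSERVED_WPCT = 0.375   # average winning percentage of the sample

runs_scored = LEAGUE_RPG * OFFENSE_PCT    # ~3.87 R/G
runs_allowed = LEAGUE_RPG * DEFENSE_PCT   # ~5.00 R/G

def pythagorean_wpct(rs: float, ra: float, exponent: float = 1.83) -> float:
    """Expected winning percentage from runs scored and runs allowed."""
    return rs ** exponent / (rs ** exponent + ra ** exponent)

expected = pythagorean_wpct(runs_scored, runs_allowed)
print(f"RS/G: {runs_scored:.2f}, RA/G: {runs_allowed:.2f}")
print(f"Pythagorean: {expected:.3f}, observed: {OBSERVED_WPCT:.3f}")
```

With these inputs the Pythagorean expectation comes out around .385, a hair above the observed .375, which matches the modest under-performance noted above.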

I’ve talked about offensive replacement levels before—I don’t think anything from the replacement-level teams disagrees strongly with that analysis (again, these teams are a little better than what we think teams of ONLY replacement players would be). Looking at run prevention seems to be more interesting.

Expansion teams were wholly average in the BABIP department—a victory for DIPS theory. It’s also an indicator that those teams were average on defense, which squares with the current conception of replacement level for position players including average performance on defense.

And so now we move to pitching. Starting pitchers for expansion teams allowed (on a rate basis) 113 percent as many runs as the average starting pitcher for those seasons. That works out to a 5.22 RA in a 4.5 RA league (the average starting pitcher is in fact slightly worse than the league average overall—more on that in a minute). That’s substantially better than the current conception of a replacement pitcher in WARP—in other words, right now our WARP values are systematically overrating starting pitching.
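The parenthetical about the average starter being slightly worse than the league can be checked by backing the baseline out of the figures above. A minimal sketch, using only the article's numbers:

```python
# Back out the implied RA of the *average* starter from the article's figures:
# expansion starters posted a 5.22 RA at 113% of the average starter's rate.
LEAGUE_RA = 4.5              # league-wide runs allowed per game
EXPANSION_STARTER_RA = 5.22  # expansion-team starters, per the article
RELATIVE_RATE = 1.13         # 113% of the average starter

avg_starter_ra = EXPANSION_STARTER_RA / RELATIVE_RATE
print(f"Implied average-starter RA: {avg_starter_ra:.2f}")  # ~4.62
```

The implied average-starter RA of about 4.62 sits above the 4.5 league mark, which is why 113 percent of it works out to 5.22 rather than 5.09.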

This is because up until now we’ve assumed that a replacement-level offense and a replacement-level defense were proportionate to each other—if a replacement-level team scored 80 percent of the runs an average team scored, they would allow 120 percent as many runs. But primarily because of the replacement level for fielding being average, that finding is not borne out in the data.

Another interesting feature is that these starters didn’t pitch as many innings as average starters: as a whole, starters on expansion teams pitched 94 percent as many innings as starters on the average team. That puts more of the burden on the bullpen.

This is probably a net positive for these expansion teams, as (outside of defense) the bullpen was the closest they came to being average, allowing only 109 percent as many runs as the average team’s relievers on a rate basis. In a 4.5-run-per-game league, that works out to an RA of 4.88, about a third of a run less than the starting pitchers. That’s because relievers tend to pitch better than the league average (this is not to say that they are better pitchers—they most clearly aren’t, or most of them would have starting jobs—but relievers, who face batters less often and see favorable platoon match-ups more often, are at an advantage compared to starters).
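The same back-calculation works for the bullpen, and it illustrates the reliever advantage directly. Again, the 4.88 RA and the 109 percent figure are from the article; the division is the only step I've added:

```python
# Back out the implied RA of the *average* reliever: expansion bullpens
# posted a 4.88 RA at 109% of the average reliever's rate.
LEAGUE_RA = 4.5
EXPANSION_RELIEVER_RA = 4.88  # expansion-team relievers, per the article
RELATIVE_RATE = 1.09          # 109% of the average reliever

avg_reliever_ra = EXPANSION_RELIEVER_RA / RELATIVE_RATE
print(f"Implied average-reliever RA: {avg_reliever_ra:.2f}")  # ~4.48
```

The implied average-reliever RA of roughly 4.48 lands just under the 4.5 league mark, the mirror image of the starters' 4.62: relievers as a group out-pitch the league average, starters as a group fall short of it.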

There is a confound here, of course—how teams use relievers has changed drastically over that time period. There is a danger, then, in extrapolating these findings too far. In the late '60s and early '70s, relievers did not enjoy anywhere near as great an advantage as they do in the modern era. In fact, in 1969 (when four of the 14 expansion teams were added) starters actually out-pitched relievers. The matter deserves more careful attention, then, and there simply aren’t enough expansion teams to provide it. As with the hitters, we’ll next look at replacement pitchers (starters and relievers) on the individual level.

I don't understand the premise here. As you acknowledge, there has never been a team consisting solely of replacement-level players. The expansion drafts are the non-exception that proves the rule: Existing teams were only allowed to protect 15 players, with the result that drafted players were generally the 16th-18th man on a team's roster, which is above replacement level by definition.

So why does looking at bad teams consisting of somewhat-above-replacement-level players help us understand which individual players are replacement level? If we're looking for real-world applications of replacement level, why not focus on actual replacement players, which would seem easy enough to define?
It's not clear to me that a -14% finding for hitters and a +13% finding for starting pitchers (or +11% for all pitchers) suggests a lack of symmetry for replacement level. Seems pretty close given the limitations of the methodology. And a crucial difference is the ability to assign playing time more efficiently among pitchers. Let's say each expansion team has an average of 2 above-replacement hitters and 2 above-replacement pitchers. The hitters might account for 20% of the team's PA. But two so-so starting pitchers could account for 30% of opposing hitter PAs. So assuming these teams are not truly 100% replacement players, I'd expect the pitching to look a little less bad.

You also have a potential survivor bias at work here. Expansion teams get to throw a lot of guys out there and see who succeeds. The expansion pool includes a few better-than-replacement players, and these will then get a disproportionate amount of playing time. So these team performances will be skewed upward, although that presumably affects hitting and pitching equally.
Something that's been on my mind lately - if a certain amount of WARP should be equal to a win (all else being equal), what does it say about teams/managers/WARP (the concept) if a given team's total WARP is not commensurate with their WARP-estimated win total? Furthermore, how different is that from comparing actual wins to 1st-, 2nd- and 3rd-order wins?
Colin, great article.

With pitching, and especially reliever, performance being more volatile than hitting, and maybe pitching having a flatter distribution of "true ability/performance"...wouldn't it be counter-intuitive to expect replacement-level pitching to be the same as replacement-level hitting? Isn't the average AAA relief pitcher far closer in ability to the MLB average than the average AAA position player?