In the pages of yesterday’s Boston Globe, veteran sports reporter Bob Ryan declared war on WAR. We get that one a lot. But the unusual part of this particular declaration was that it was based on the belief that the “RP” in WARP—for “replacement player”—was a "judgment call" rather than the product of a mathematical formula. Ryan argued that the "replacement level" comparison, as currently constituted, is just a matter of opinion, and therefore arbitrary and unreliable. It's not often that we’re told that we’re not using enough math.
It seems that Mr. Ryan might be misunderstanding what replacement level is and how it’s calculated, mistaking a mathematical abstraction for "something that we make up as we go along." In fact, replacement level is the result of a perfectly logical calculation. So let me take a moment to set the record straight.
WARP seeks to answer this basic question: If Smith suddenly vanished from the face of the earth, how much production would his team lose as a result? The general idea is that his team would do the best that it could, either promoting a guy from the bench to the starting job, bringing someone up from the minors, or signing a scrap-heap free agent who plays the same position. It wouldn’t get the same production it would have gotten from Smith, but it would get something. We need a way to compare the value that Smith supplies to the value of these guys on the bench, in the minors, and on the scrap heap.
Mr. Ryan correctly points out that WARP converts a player's exploits on the diamond into run values, and for hitters it includes hitting, defense, and baserunning contributions. We might say that Mike Trout contributed five billion runs (okay, the number might have been slightly smaller) to the Angels last year, all told. But to what shall we compare him? A summer's day? No, we compare him to the value of the "replacement players," who are the bench/minor league/scrap heap guys. Because Trout played center field last year, we need to find all the bench/minors/scrap heap center fielders out there. The 30 guys who led their teams in time spent in CF don't count. But everyone else who primarily played center field (i.e., center field was the position where he personally spent the most time) does. We can then look at how much value these guys collectively brought to their teams.
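The pool selection just described can be sketched as a simple filter over player-seasons. This is a hypothetical illustration, not Baseball Prospectus's actual code: the `PlayerSeason` fields and the `backup_center_fielders` function are invented for the example, ties for a team's CF lead go to whichever player is listed first, and a real system may define "primary position" differently.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    name: str
    team: str
    primary_position: str  # position where the player spent the most time
    cf_innings: float      # innings played in center field


def backup_center_fielders(players: list[PlayerSeason]) -> list[PlayerSeason]:
    """Primary center fielders who did not lead their own team in CF innings."""
    # Track, for each team, the player with the most center-field innings.
    # On a tie, the player listed first stays the leader.
    leaders: dict[str, PlayerSeason] = {}
    for p in players:
        current = leaders.get(p.team)
        if current is None or p.cf_innings > current.cf_innings:
            leaders[p.team] = p
    leader_keys = {(p.team, p.name) for p in leaders.values()}
    # Keep everyone whose primary position is CF, minus each team's CF leader.
    return [
        p for p in players
        if p.primary_position == "CF" and (p.team, p.name) not in leader_keys
    ]


# Hypothetical rosters (made-up players and innings).
players = [
    PlayerSeason("A", "LAA", "CF", 1200.0),
    PlayerSeason("B", "LAA", "CF", 200.0),
    PlayerSeason("C", "BOS", "CF", 1100.0),
    PlayerSeason("D", "BOS", "LF", 150.0),
    PlayerSeason("E", "BOS", "CF", 300.0),
]
print([p.name for p in backup_center_fielders(players)])  # ['B', 'E']
```

Here A and C are excluded as their teams' CF leaders, and D is excluded because his primary position is left field.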
Had Trout himself disappeared, the Angels probably would have responded by playing Peter Bourjos and Torii Hunter more often. But we don't want to credit or blame Trout for the presence of other players who just happen to be on his team, so we take an average of what everyone else's bench players might have done in Trout's place, rather than compare him just to the Angels’ backup options. Then we look at how much value those backup center fielders, on average, would have provided in the amount of time that Trout played last year.
Replacement level is a mathematical abstraction in that no such "replacement player" actually exists; you can’t point to Larry over there and say that he is the gold standard of replacement level. But really, replacement-level production is just the weighted average performance of all backup center fielders per plate appearance (or per inning), multiplied by the number of plate appearances (or innings) that Trout (or any other player whose value we want to assess) played.
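The calculation above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Baseball Prospectus's actual WARP code: each backup center fielder is summarized as a hypothetical (runs, plate appearances) pair, the pairs are pooled into a PA-weighted rate, and that rate is scaled to the target player's playing time. The function name `replacement_runs`, the hypothetical player Smith, and all numbers are made up; innings would work the same way as plate appearances.

```python
def replacement_runs(backups: list[tuple[float, int]], target_pa: int) -> float:
    """Runs a composite replacement player would produce in target_pa plate appearances.

    backups: (total run value, plate appearances) for each backup center fielder.
    """
    total_runs = sum(runs for runs, _ in backups)
    total_pa = sum(pa for _, pa in backups)
    # Pooling runs and PA gives the PA-weighted average rate, so a backup with
    # 250 PA counts for more than one with 50 PA.
    rate_per_pa = total_runs / total_pa
    return rate_per_pa * target_pa


# Hypothetical league-wide pool of backup center fielders (made-up numbers).
backups = [(3.0, 150), (-1.0, 250), (4.0, 200)]

# Hypothetical target player "Smith": 60 runs of total value in 650 PA.
smith_runs, smith_pa = 60.0, 650
baseline = replacement_runs(backups, smith_pa)
print(f"replacement baseline: {baseline:.1f} runs")
print(f"Smith above replacement: {smith_runs - baseline:.1f} runs")
```

With these invented inputs the pooled rate is 6 runs per 600 PA, so the baseline over 650 PA is 6.5 runs and Smith comes out 53.5 runs above replacement. Converting runs into wins is a separate step not shown here.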
In using this composite sketch of the state of backups in MLB, we trade the ability to answer the question, "What really would have happened to the Angels if Trout had vanished into thin air?" for the ability to compare everyone in MLB against a common baseline. Depending on the question that you want to answer, this may or may not be a beneficial trade. It has advantages and disadvantages, but I'd argue that the advantages outweigh the disadvantages here.
If you'd like to take issue with how WAR defines value (and the assumptions inherent in it), then that's fine. If you'd like to take issue with the methodology used to calculate it, perhaps to say that the math and the definition don't fully match, that's fine too. A good scientist—and I consider myself to be a proper scientist—should give a fair hearing to a reasonable argument. But as always, we've started with a reasonable definition of what we're looking for, tried to create the best mathematical model that we can based on that definition, and then let the numbers fall where they will. That’s a better approach than making it up as we go along.
This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.
One thing I've wondered about from time to time is how well a team made up entirely of replacement players would fare against the rest of the league. Also, does aggregate team WARP scale linearly with team wins? That is, do excellent teams outperform what their total WARP predicts relative to a replacement-level team, perhaps because of lineup synergy or unusually good bench players?
Let's be honest here: there IS a component of judgment (or, if you prefer, "subjectivity") in where replacement level is set. If there weren't, everybody's values for players' WAR (or WARP) would be the same. They aren't. Most of the time they are fairly similar, but there are occasional extreme outliers. This is in contrast to traditional stats, where everyone's understanding of batting average, HRs, RBIs, ERA, etc., is at least exactly the same (which is not to say that that understanding then translates into a correct understanding of value).
Furthermore, we ARE making it up as we go along. The calculation of WAR/WARP/whatever is constantly being fine-tuned. I submit that this is a good thing, not a bad thing; it means that we are continuing to think hard about what it means to be good at baseball. The "making it up" happens on a long-term basis, rather than pulling numbers and formulas out of some body orifice, different today than yesterday, so that we can "demonstrate" the superiority of whoever we perceive today as the best player. (Ryan apparently doesn't get that.) But it still goes on.
This doesn't mean that I agree with Ryan; at the 90% level, I don't. However, I do agree with him that it's important to avoid overclaiming what WAR can tell us about players, and about baseball.
(Incidentally, I'm also a physicist.)
There are a LOT of things that differentiate WAR calculations aside from the chosen replacement level:
http://www.baseball-reference.com/about/war_explained_comparison.shtml
I'd assume that the new RL would be lower than the old one, but the setting on the "replacement level-o-stat" would need to be turned down, so that your new replacement-level player would still be ~0.0 WAR / WARP / WOW. You would then no longer be comparing apples to apples.
Logic tells us the replacement level would be lower. It would have to be, since a number of the previous season's "replacements" were now regulars, or in the major leagues. But something else happens in expansion seasons: the best players and pitchers take advantage of the watered-down league. Watching the 1962 Mets against quality pitchers was truly sad (not for the pitchers, of course). So the better players have better seasons because they're facing replacement-level players, and then their WARP rises even further because the replacement baseline also got weaker.
The reverse should have happened between 1947 and 1960, when integration added a steadily increasing stream of quality players to the mix. Players who had been regulars were pushed to the bench or the minors, which raises the replacement level, while at the same time pitchers had to deal with Jackie Robinson and Willie Mays, and hitters had to face Don Newcombe.
Maybe WAR/WARP/whatever accounts for this in some way. I'd be curious as to how.
"We are going to WARP on this thing."
Contrast this with the scenario where a replacement-level player gets injured. Typically he can be replaced by another player who would be willing to play for the league minimum. So, in a sense, a full season of 0 WAR and a half season of 0 WAR have the same value, whereas the same cannot be said of a full season of WAA vs. half a season of WAA.
There can, of course, be legitimate reasons to favor other baselines.
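The playing-time point about WAR vs. WAA can be made concrete with a little arithmetic. This is a minimal sketch under a loudly hypothetical assumption: that a replacement-level player sits a flat 2 wins below average per 600 plate appearances. That constant and the helper names `waa_from_war` and `war_from_waa` are invented for illustration; real systems set their own gap.

```python
# Hypothetical assumption: replacement level is 2 wins below average per
# 600 plate appearances. Real WAR/WARP systems set their own gap.
GAP_WINS_PER_600_PA = 2.0


def waa_from_war(war: float, pa: float) -> float:
    """Convert wins above replacement into wins above average for a given PA total."""
    return war - GAP_WINS_PER_600_PA * pa / 600


def war_from_waa(waa: float, pa: float) -> float:
    """Inverse conversion: wins above average into wins above replacement."""
    return waa + GAP_WINS_PER_600_PA * pa / 600


# A replacement-level player (0 WAR) over a full vs. half season. Whoever
# fills in for him is also replacement level, so the team's outcome is the
# same either way, but his WAA shifts with playing time.
for pa in (600, 300):
    print(f"replacement-level, {pa} PA: WAR {0.0:+.1f}, WAA {waa_from_war(0.0, pa):+.1f}")

# An exactly average player (0 WAA) over a full vs. half season. WAA is 0
# either way, but WAR credits the extra playing time.
for pa in (600, 300):
    print(f"average player,    {pa} PA: WAA {0.0:+.1f}, WAR {war_from_waa(0.0, pa):+.1f}")
```

Under the assumed gap, the replacement-level player shows WAA of -2.0 and -1.0 while his WAR stays at 0.0, and the average player shows WAR of +2.0 and +1.0 while his WAA stays at 0.0.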