Lies, Damned Lies: Rethinking Replacement Level

We often interchange the phrases “freely available talent” and “replacement level player.” But what does a freely available player actually look like?

To answer this simple question, I searched for all players since 1985 who met the following criteria:

The player was paid no more than twice the league minimum salary.
The player was at least 27 years old or had at least 1950 PA (three “seasons” of 650 PA) in his major league career.

The first criterion is straightforward enough–I built in a little bit of wiggle room for players that weren’t quite “free,” but were certainly available very cheaply. The second requires a bit more finesse, and is designed to eliminate players who were still subject to the reserve clause–Miguel Cabrera would meet the first criterion, but not the second. I’ve probably included a few players who were late-developing rookies, but most players who haven’t had a couple of years in the show by age 27 will have become eligible for minor league free agency or will be on their second or third organization.

The other “trick” is in figuring out how to treat playing time when evaluating these players’ statistics. Say that the Mariners’ Pacific Rim scout is on a flight with a pilot who has had far too much sake and happens to land in Yakutsk, Siberia instead of Sapporo, Japan. The intrepid scout can’t get a flight back until the next day and decides to take in a Yakutsk Yaks game, where he discovers a set of twin left-handed pitchers, Miroslav Borscht and Radoslav Borscht, who each hit 95 on his JUGS gun. The twins are given non-roster invitations, and an all-expenses paid trip to Peoria, Arizona. Miroslav turns out to be the Ozzie Canseco of the pair, with a weakness for flavored vodkas and Maricopa County’s finest topless bars, and is cut a week into camp, while Radoslav emerges as the team’s second starter.

Obviously, we’d have a selective sampling issue if we gave full credit to Radoslav’s performance, while forgetting about Miroslav entirely–one of the risks when signing an unknown player is that you may waste a significant number of at bats on him before you figure out that he’s not even qualified to carry Mario Mendoza’s jock. The chosen solution was to “max” everyone’s playing time at the rookie minimum of 130 PA. So, even if the player played a full season, his statistics were weighted as though he’d only had 130 PA. If the player had fewer than 130 PA, then his playing time was taken as is.
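
The capping rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual code behind the study: the two players and their numbers are made up, the function names are hypothetical, and the sketch weights a rate stat directly by capped PA, which is one reasonable reading of "weighted as though he'd only had 130 PA."

```python
PA_CAP = 130  # the "rookie minimum" cap described in the text


def capped_weight(pa, cap=PA_CAP):
    """Playing-time weight: actual PA, but never more than the cap."""
    return min(pa, cap)


def weighted_rate(players, stat, cap=PA_CAP):
    """Average a rate stat across players, weighting each by capped PA."""
    weights = [capped_weight(p["pa"], cap) for p in players]
    total = sum(weights)
    if total == 0:
        raise ValueError("no playing time to weight")
    return sum(w * p[stat] for w, p in zip(weights, players)) / total


# Made-up players for illustration only (not from the study's data).
players = [
    {"name": "full-season regular", "pa": 650, "obp": 0.340},
    {"name": "cut after a month", "pa": 40, "obp": 0.250},
]

capped = weighted_rate(players, "obp")                # weights 130 and 40
uncapped = weighted_rate(players, "obp", cap=10**9)   # weights 650 and 40
print(f"capped: {capped:.3f}  uncapped: {uncapped:.3f}")
```

With the cap in place, the player who washed out quickly counts for much more of the average than he would under raw PA weighting, which is the point of the adjustment.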

Here are the weighted average performances of the players who met those criteria, broken out by primary defensive position. These statistics are normalized to a neutral park and a .270/.340/.440 league.

        BA     OBP     SLG
C     .238    .303    .373
1B    .251    .328    .416
2B    .253    .317    .373
3B    .249    .315    .391
SS    .244    .301    .352
LF    .252    .321    .396
CF    .250    .319    .375
RF    .254    .322    .409
DH    .268    .335    .446

My first reaction when I saw these numbers was “that looks about right.” That is, the numbers looked very close to what I’d have expected them to look like if we’d used another replacement level definition like VORP. However, there are some interesting differences in the relative placements of different positions, which we’ll discuss in a moment. In the meantime, let’s look at another part of the equation–positional defense.

       D-Rate
C       99.2
1B     101.5
2B      99.7
3B      99.7
SS      96.6
LF      98.9
CF     100.1
RF     100.1
DH       N/A

These are the weighted average defensive Rate performances, measured in terms of extra runs saved or allowed per 100 games. In general, the freely available players were about average defensive performers at their respective positions–and so it would be a mistake to “double credit” a player for being better than replacement level both in the field and at the plate. But a couple of positions are exceptions. Freely available first basemen were notably above average defensive performers, probably because replacement-type first basemen are often converts from the middle infield or the outfield who will have no trouble at all handling an easier position.

Freely available shortstops, on the other hand, cost their teams 3-4 runs per 100 games with their glove. I tend to think of shortstops like NBA point guards: there are 30 NBA teams, but perhaps only 18 or 20 players in the league at any given time who can really handle the point. Similarly, there are a finite number of players who can really play a major league average-to-plus shortstop. Why do shortstops exhibit this pattern and not, say, catchers? I suspect this is because a shortstop’s defensive skills are more likely to atrophy with age, meaning that the player will already have lost a step or two by the time he becomes a freely available commodity. Catchers, while their position is very difficult, don’t require much mobility and keep their throwing arm and game-calling skills intact more or less until they retire. This is why Alberto Castillo was in a major league uniform last year.

We can combine the offensive and defensive numbers to create an overall scorecard, converting the BA/OBP/SLG numbers into runs per 162 games by means of the Marginal Lineup Value formula. For snickers and giggles, I’ve also added basestealing to the mix. We’ll abbreviate this method FAT, for Freely Available Talent.

       Hitting    SB/CS     Defense     TOTAL
C       -28.6      -0.3       -1.3      -30.1
1B      -10.8      -0.3       +2.4       -8.7
2B      -22.9      +0.4       -0.5      -22.9
3B      -20.1      -0.2       -0.5      -20.7
SS      -33.0      +0.0       -5.5      -38.5
LF      -16.9      +0.2       -1.8      -18.5
CF      -22.2      +1.0       +0.2      -21.0
RF      -13.6      +0.2       +0.2      -13.3
DH       -0.4      -0.2                  -0.5
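
The Defense column appears to be the D-Rate figures rescaled from runs per 100 games to runs per 162 games. The Python sketch below assumes exactly that (Rate centered on 100, multiplied by 162/100); the conversion is an inference from the two tables rather than a documented formula, and the names are hypothetical. The hitting figures come from the Marginal Lineup Value formula, which is not reproduced here, so they are simply copied from the table.

```python
GAMES = 162

# D-Rate by position, copied from the defensive table above.
d_rate = {"C": 99.2, "1B": 101.5, "2B": 99.7, "3B": 99.7,
          "SS": 96.6, "LF": 98.9, "CF": 100.1, "RF": 100.1}

# (Hitting, SB/CS, published TOTAL) copied from the scorecard above.
scorecard = {
    "C": (-28.6, -0.3, -30.1), "1B": (-10.8, -0.3, -8.7),
    "2B": (-22.9, 0.4, -22.9), "3B": (-20.1, -0.2, -20.7),
    "SS": (-33.0, 0.0, -38.5), "LF": (-16.9, 0.2, -18.5),
    "CF": (-22.2, 1.0, -21.0), "RF": (-13.6, 0.2, -13.3),
    "DH": (-0.4, -0.2, -0.5),
}


def defense_runs(rate, games=GAMES):
    """Assumed conversion: runs vs. average per 100 games -> per `games`."""
    return (rate - 100.0) * games / 100.0


# DH has no D-Rate, so its defensive component is treated as zero.
defense = {pos: round(defense_runs(d_rate[pos]), 1) if pos in d_rate else 0.0
           for pos in scorecard}

for pos, (hitting, steals, published) in scorecard.items():
    total = hitting + steals + defense[pos]
    print(f"{pos:<3}{defense[pos]:>6.1f}{total:>8.1f}  (published {published})")
```

Summing the rounded components lands within 0.1 of each published total; the small differences are presumably because the published totals were computed before rounding.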

So, a shortstop who hits and fields at the league average should be credited with 38.5 runs above replacement under this method, and a league average first baseman with 8.7 runs. How does the FAT approach compare to VORP, or the implicit positional ratings embedded in our WARP rankings? Here are the replacement level thresholds produced by the three metrics, taken as runs below average per 162 games. (Note: the VORP numbers change from year to year, depending on league average performances in a particular year. The figures included here represent the cumulative league averages across both leagues from 1996-2005.)

       FAT      VORP     WARP
SS    -38.5    -32.9    -33.0
C     -30.1    -27.4    -39.0
2B    -22.9    -28.3    -29.0
CF    -21.0    -22.0    -24.0
3B    -20.7    -21.6    -22.0
LF    -18.5    -12.2    -14.0
RF    -13.3    -10.0    -14.0
1B     -8.7    -10.6    -10.0
DH     -0.5    -20.2      0.0

As I’ve said, the three metrics are in almost perfect agreement in the aggregate (although WARP may be subject to the double-counting phenomenon that I described before when Batting Runs Above Replacement are added to the mix). But the relative values of the positions are quite different.

FAT sees shortstop as far and away the most difficult position on the diamond, mainly because it’s able to recognize that freely available shortstops usually give something up with their gloves as well as with their bats. VORP still has shortstops in front, but by a smaller margin. WARP goes in the other direction, and gives much more credit to catchers.

FAT, however, is considerably more skeptical than the other two metrics about second basemen. Whereas VORP and WARP posit a difference of 4 to 5 runs between shortstops and second basemen, FAT puts the gap at nearly 16 runs. Remember when I suggested that it seemed strange that so many second basemen rated so highly in the PECOTA prospect rankings? Those rankings were derived based on offshoots of VORP and WARP, and so this may be the reason why.
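
The shortstop versus second base gap can be read straight off the threshold table. A small Python sketch, with the dictionary and function names chosen here purely for illustration:

```python
# Replacement thresholds (runs below average per 162 games), from the table above.
thresholds = {
    "FAT":  {"SS": -38.5, "2B": -22.9},
    "VORP": {"SS": -32.9, "2B": -28.3},
    "WARP": {"SS": -33.0, "2B": -29.0},
}


def positional_gap(metric, harder, easier):
    """Runs by which replacement level sits lower at `harder` than at `easier`."""
    row = thresholds[metric]
    return row[easier] - row[harder]


gaps = {m: round(positional_gap(m, "SS", "2B"), 1) for m in thresholds}
for metric, gap in gaps.items():
    print(f"{metric:<5} SS vs 2B gap: {gap:.1f} runs")
```

On these figures the gap is 15.6 runs under FAT, against 4.6 under VORP and 4.0 under WARP.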

At the risk of starting a firestorm within the authors group, let me argue that FAT gets it right on this particular question. Although the original formulation of VORP was based on looking explicitly at the performances of reserve players (very similar to the FAT approach), the version that we use more commonly takes the slight shortcut of backing into replacement level based on a comparison to league positional averages. The problem is that shortstop tends to be a feast-or-famine position: the shortstop is usually one of the very best players in the everyday lineup (Derek Jeter, Michael Young), or perhaps the very worst (Neifi Perez, Angel Berroa). Second basemen, on the other hand, tend to cluster around league average–you have your Placido Polancos and your Aaron Hills. The overall positional averages may not be that different, since the Jeters and Youngs lift the numbers, but they understate just how much easier it is to find a credible second baseman than a credible shortstop.

Psychologists talk about something called g, or General Intelligence Factor, the notion that abilities in certain seemingly unrelated mental fields are positively correlated. It seems probable that there is a baseball analog to this, which we might call General Athletic Ability. That is, although the specific skills and motor abilities required to field a good defensive shortstop are quite different than those required to hit a curveball, a truly elite overall athlete will be able to do both things well. I suspect that major league shortstops, as a group, have quite a bit more g than major league second basemen. You could put Miguel Tejada at second base if you wanted to, and he’d still outhit pretty much everyone at the position, but there’s no reason to since his defense is more valuable at short. You couldn’t put Jeff Kent at shortstop, however, without your pitching staff chipping in on a bounty against you.

The three metrics are in strong agreement in their treatment of center fielders and third basemen. However, FAT sees more separation between first basemen and corner outfielders than its counterparts. Once again, I suspect this has to do with the practicalities of finding a freely-available player who can handle a given defensive position: virtually any outfielder can play a decent first base, but not the other way around. FAT does posit a fairly large difference between left fielders and right fielders, with LF rating as the more difficult position. This is a bit strange, since I can’t think of any reason why LF should be more difficult than RF, and is probably a sample size fluke. I’m open to hearing explanations to the contrary, however.

Finally, FAT works around any issues with designated hitters. DHs have been outhit as a group by outfielders and first basemen at most points in the recent past, and so a strict league average based notion of replacement level will actually give more credit to a DH than a LF or RF with the same statistics. But it’s very rare to see a truly awful regular DH, since teams have so much flexibility at the position.
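
To see the DH effect in numbers, take a hypothetical hitter who is 10 runs above the overall league average per 162 games (an assumed figure, chosen only for illustration) and credit him against the FAT and VORP thresholds from the comparison table. The function name below is likewise just a label for this sketch.

```python
# Replacement thresholds (runs below average per 162 games), from the
# comparison table earlier in the article.
thresholds = {
    "FAT":  {"DH": -0.5, "LF": -18.5, "RF": -13.3},
    "VORP": {"DH": -20.2, "LF": -12.2, "RF": -10.0},
}


def runs_above_replacement(runs_above_average, metric, position):
    """Credit = distance from average plus distance from average to replacement."""
    return runs_above_average - thresholds[metric][position]


RUNS_ABOVE_AVERAGE = 10.0  # hypothetical hitter, same stats at each position

credit = {
    metric: {pos: round(runs_above_replacement(RUNS_ABOVE_AVERAGE, metric, pos), 1)
             for pos in by_pos}
    for metric, by_pos in thresholds.items()
}
for metric, by_pos in credit.items():
    print(metric, by_pos)
```

Under the VORP thresholds the same bat is worth the most at DH (30.2 runs, against 22.2 in left and 20.0 in right); under FAT it is worth the least there (10.5, against 28.5 and 23.3).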

If it sounds like I’m suggesting that we reinvent the replacement level wheel–well, I guess that I am. We’re fighting over table scraps and percentage points, but replacement level is so vital to what we do that the fight may be worth having.

Nate Silver
