The first edition of Prospectus Toolbox generated a lot of feedback, so today’s column is going to be in a mailbag format. First of all, I’d like to apologize for one bit of confusion raised by last week’s post. Some of the sample VORP sortables I provided used an all-years search function that I wasn’t aware was unavailable to our users. On the bright side, the next upgrade to our statistics page should be coming soon, and the emails on this topic have given me the chance to stress to the powers-that-be that this search function is something our readers really want. Hopefully, this is a toy we’ll all be able to play with soon.


One of the questions that came up a lot after last week’s discussion of VORP was how it differs from Wins Above Replacement Player (WARP), and why we have two systems at all. I won’t go into as much detail about WARP this time as I did with VORP last week, but I will briefly compare the qualities and limitations of the two metrics. (I’ll do a much more extensive piece on WARP soon.)

WARP was created by Clay Davenport, and it is part of the Davenport Translations system. The DTs, as we like to call them, come from the perspective of trying to translate player performances in different contexts, and adjust them to a single standard. Using the translations, you can contrast the performances of a Deadball Era player operating in a pitcher’s park, like Joe Connolly with the 1914 Boston Braves, and a player operating under the opposite circumstances, like Ellis Burks on the 1997 Colorado Rockies.

WARP is measured in wins, rather than runs, but like VORP, the values are compared to the performance of a theoretical replacement player. Like VORP, WARP is a counting stat, it is ballpark adjusted, it factors in all components of run scoring (within its data set), and it includes pitchers as well as hitters, placing both on the same scale.

In addition to those qualities, which it shares with VORP, WARP addresses all three of the limitations to VORP that I mentioned in last week’s column. First, WARP is available for players well past VORP’s 1959 limit; second, it includes an adjustment for pitcher leverage; and lastly, WARP integrates a measure of the quality of a player’s defense. The first and the last elements are the ones that make the biggest difference between WARP and VORP as tools for evaluating players.

The DTs measure a player’s fielding performance in terms of fielding runs above replacement (FRAR), and that contribution is added to a player’s contributions on offense and/or the mound to make up their final WARP score. Unlike VORP, there is no positional adjustment in WARP. Two players who had the same fielding performance (say, 10 FRAR) are considered to have made the same defensive contribution, regardless of whether one is a first baseman and the other a shortstop.
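To make that arithmetic concrete, here’s a toy sketch of how the pieces combine: runs above replacement from every phase of the game are summed, then converted to wins. The function name and the ten-runs-per-win conversion are illustrative assumptions of mine, not Clay Davenport’s actual formulas.

```python
# Toy sketch of WARP's composition: sum a player's runs above
# replacement in every phase of the game, then convert runs to wins.
# The 10-runs-per-win figure is a common rule-of-thumb approximation,
# not the conversion the Davenport Translations actually use.

RUNS_PER_WIN = 10.0

def warp(batting_rar: float, pitching_rar: float, fielding_rar: float) -> float:
    """Combine batting, pitching, and fielding runs above replacement
    and express the total in wins."""
    return (batting_rar + pitching_rar + fielding_rar) / RUNS_PER_WIN

# Two players with identical fielding contributions (10 FRAR) get the
# same defensive credit, regardless of position:
shortstop = warp(batting_rar=35.0, pitching_rar=0.0, fielding_rar=10.0)
first_baseman = warp(batting_rar=35.0, pitching_rar=0.0, fielding_rar=10.0)
print(shortstop, first_baseman)  # 4.5 4.5
```

Note that because there’s no positional adjustment, the shortstop and the first baseman above come out identical; only the raw FRAR totals matter.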

As measured in WARP, defense is based on the defensive statistics that have been maintained since the beginning of baseball’s history: assists, putouts, errors, and hits allowed on balls in play. This allows the DTs to compare the defensive performance (and, for that matter, the offensive and pitching performances) of players throughout baseball history. So, if you want to compare Albert Pujols’s defense at first to that of Keith Hernandez or Frank Chance, the data is available to you. The drawback of this approach, as opposed to one that uses play-by-play data, is that mathematical assumptions have to be made about how defensive opportunities were distributed among the various fielders. Sadly, play-by-play data isn’t available for the nineteenth or early twentieth centuries, so the DTs (and WARP) provide historical and more broadly comparative scope at the expense of the more recent, more detailed data for the seasons where we have it.

Now, most people don’t keep just one screwdriver in their toolbox. Even though VORP and WARP measure some of the same things, they use different processes, different sets of data, and come at the question of measuring performance over replacement from distinctly different perspectives. Each has certain tasks for which it is better suited than the other.

The Replacement DH and the Positional Average Sortable

Reader Richard Hanna chimed in with this question about VORP:

The question is: what is a replacement-level DH? I asked Keith Woolner in a chat why a replacement level DH wasn’t as good a hitter as a replacement level 1B or LF (you can see this by comparing the VORPs of Ortiz and Hafner to similar hitters). His answer was: DHs aren’t as good hitters as you think they are.

This answer completely misses the point of the question. If David Ortiz and Justin Morneau were to put up identical numbers, VORP would value Ortiz higher than Morneau. This is just plain crazy; I don’t know anyone who could argue convincingly that an identical offensive performance from someone who never takes the field has more value to a team. Shouldn’t a replacement-level DH be a replacement level hitter for the whole league, rather than something based only on actual DHs? I understand that the definition of a replacement level player is a statistical construct, but it is a seriously flawed one when it produces these results.

The positional adjustment in VORP isn’t there to account for a player’s relative defensive contribution; it’s there to account for the replacement cost of a player, based on offensive performance. We can look at this quickly in the League Batting by Position report:

POS   G       PA   AVG   OBP   SLG
1     21      57  .125  .160  .167
2    320   2,373  .257  .320  .391
3    320   2,521  .270  .346  .449
4    320   2,488  .273  .333  .401
5    320   2,519  .256  .329  .430
6    320   2,507  .264  .319  .382
7    320   2,541  .251  .312  .384
8    320   2,651  .264  .337  .419
9    320   2,574  .269  .347  .449
10   299   2,516  .262  .353  .431
11   186     281  .189  .300  .307
12    99       0  .000  .000  .000

I’ve trimmed down the report results in the chart above, so all we’re looking at is this year’s American League values. One aspect of the Batting by Position reports that throws people off is the use of numbers to represent positions. It looks opaque, but it’s derived from the way Retrosheet does things. Positions 1-9 are the same as they would be on a scorecard (i.e., 1 is pitcher, 2 is catcher, 3 is first base, straight through to the right fielder at number 9). After those old standbys, 10 represents designated hitters, 11 players who come in as pinch-hitters, and 12 those who come in as pinch-runners (which is why they don’t have any plate appearances).
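That numbering can be captured in a small lookup table. Codes 1-9 are the standard scorecard/Retrosheet positions; 10-12 follow the extensions the BP report uses:

```python
# Scorecard/Retrosheet position numbering, plus the Batting by
# Position report's extensions for DH, pinch-hitters, and pinch-runners.
POSITION_CODES = {
    1: "P",  2: "C",  3: "1B", 4: "2B", 5: "3B",
    6: "SS", 7: "LF", 8: "CF", 9: "RF",
    10: "DH",  # designated hitter
    11: "PH",  # pinch-hitter
    12: "PR",  # pinch-runner (hence zero plate appearances)
}
print(POSITION_CODES[3], POSITION_CODES[10])  # 1B DH
```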

Now, if you compare the rate stats for DH (line 10) and first base (line 3), you’ll see that the first basemen are slugging a bit more and getting on base a sliver less. If you go back through several seasons of the report, you’ll see AL designated hitters performing roughly on the same level as AL first basemen, sometimes a bit worse. As the reader points out, since VORP’s positional adjustment is based on the actual performance of players at the position, a DH with a performance identical to a first baseman’s would have an equal or better VORP.
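To see the mechanics at work, here’s a deliberately simplified sketch. Real VORP rests on Keith Woolner’s full run-estimation machinery; the runs-per-PA rates below are invented numbers, chosen only to show how a lower positional baseline inflates the result:

```python
# Simplified sketch of the positional baseline's effect on VORP.
# The runs-per-PA rates are invented for illustration; actual VORP
# uses Keith Woolner's full run estimators and positional baselines.

def value_over_replacement(runs_per_pa: float,
                           repl_runs_per_pa: float,
                           pa: int) -> float:
    """Runs above a positional replacement level over a season's PA."""
    return (runs_per_pa - repl_runs_per_pa) * pa

# Identical offensive lines (0.15 runs/PA over 600 PA), measured
# against an assumed lower DH baseline and a higher 1B baseline:
dh_vorp = value_over_replacement(0.15, 0.105, 600)  # about 27 runs
fb_vorp = value_over_replacement(0.15, 0.110, 600)  # about 24 runs
# The DH comes out ahead purely because actual DHs set a lower bar.
```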

Is that fair? Not particularly. The offensive performance of the DH position is likely suppressed by the way that the designated hitter slot is used as a repository for older and/or injured players. According to William Burke, one of the many people here at BP who’s smarter than me, the average American League DH is about two years older than the next-oldest position player in the lineup, and three years older than the average first baseman. The performance-based value we place on the replacement DH reflects that reality.

Toolbox will be on vacation next week, but in two weeks, we’ll be back with a look at BP’s reliever metrics.