When it comes to baseball’s numbers, I’m often more interested in how they function as language than anything else. As Joe Sheehan recently pointed out, the 500- and 600-home run milestones don’t, for analytical purposes, have that much value over the numbers 499 and 599, or 501 and 601, for that matter. They’re round numbers that end in double zeroes, and that’s why we paid more attention to Manny Ramirez’s at-bats this weekend than we did his at-bats the weekend before, and more attention than we’ll pay to his at-bats during the weekend to come.
However, as artifacts of language, both numbers have an unmistakable resonance. Even people who don’t know much about baseball understand that 500 home runs is a standard for being an all-time great slugger. The number communicates more than the concept of “more homers than Lou Gehrig hit” or “only twenty-some-odd players have ever hit this many homers in major league history.” As for 600 homers (and this could just be me), does anyone hear that number and think about anything but Willie Mays? For a long time, the home run leaderboard featured only a large handful of men in the 500s, two men who’d broken the 700-homer mark, and Mays sitting all by his lonesome between 600 and 700 homers. Barry Bonds cruised through the neighborhood on the way to his fateful meeting with Hank Aaron last year, and Sammy Sosa now keeps Mays company in the 600-homer range, but when Ken Griffey Jr. finally hits that 600th homer, the image that will flood some fans’ minds (or maybe just mine) will be a picture of Mays, his Giants uniform flapping in the Candlestick breeze.
Baseball is full of numbers with that sort of power as language. Standards like the .400 batting average and the Mendoza line help us define what is great and what is awful in terms of performance, and experience helps us define the gradations in between. These associations-the connection between a .300 batting average and the concept “good”-anchor the way that we process statistical information.
When we introduce advanced metrics, one of the things that stands in the way of their acceptance is the lack of familiar signposts connecting the metric to language. In the year that this column has run, the most frequent request has been that we establish a scale for our stats, so that new readers can identify the good, the bad, and the ugly more easily. Even though many of the metrics we use here establish their own benchmarks by being fixed to a league average (the league-average EqA, for example, is always .260) or to replacement level, very few come with easy descriptions of what’s good or bad. Generally I, and most of my brethren, make such calls by sight, based, for example, on my experience that a VORP of 50 is very good, and that a VORP of 10 over a full season is sub-par. Given the work that we do here, though, it seemed odd to publish these “gut” estimates when a more scientific method could be applied. Each time I tried to work one out, I wasn’t satisfied with the results, so I put things off.
Recently, a reader question brought this 2007 article to my attention, in which Kevin Goldstein and William Burke work out positional averages for the purpose of explaining to readers the cost, in terms of expected value, when a prospect is moved from a position on the left side of the defensive spectrum to one on the right side. What Kevin and William did was take the 30 players who had the most starts at each position over the past three seasons, rank them by OPS, then average out the top third (Good), middle third (Average), and bottom third (Bad), and separately average out the top five at each position (the “Elite”). That way you could see that if a player with the ability to hit for an .853 OPS was moved from catcher to first base, he’d essentially go from being an elite player behind the plate to a league-average player at first.
It occurred to me that you could use the same basic method to construct a scale for performance in any statistic, so I tried it out for starting pitchers’ ERAs. I made a couple of modifications: rather than use individual players’ three-year playing-time records, I collected the top 150 player-seasons, by Starters’ Innings Pitched, from each of the past three seasons, then ranked all of those seasons by ERA. Using the same division into thirds set out in the Goldstein/Burke method, and establishing the top 25 (rather than the top five) as the “Elite” starters, this is what the ERA Scale looks like, along with some additional averaged-out statistics from each group:
           ERA     W     L     IP    BB      K   VORP
Elite     3.14  13.3   7.4  188.6  52.1  156.0   50.9
Good      3.47  12.9   8.2  187.6  54.3  147.3   43.8
Average   4.40  10.5   9.9  170.0  54.4  111.9   21.2
Bad       5.50   7.6  10.6  142.9  53.4   88.9   -0.3
Superbad  6.28   6.9  10.9  135.1  53.9   84.2   -7.9
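The grouping behind this table can be sketched in a few lines. This is a minimal illustration, not the code used for the study: the 450 ERAs below are randomly generated stand-ins for the top 150 starter-seasons (by innings pitched) from each of the past three seasons, and the group labels match the table above.

```python
# Sketch of the thirds-plus-extremes scale method, on made-up data.
import random
from statistics import mean

random.seed(1)

# 450 hypothetical starter-seasons (150 per season for three seasons),
# each represented here only by its ERA.
seasons = [round(random.uniform(2.5, 6.5), 2) for _ in range(450)]

def era_scale(eras, extreme_n=25):
    """Average the top, middle, and bottom thirds of the ranked ERAs,
    plus the best and worst `extreme_n` seasons (Elite/Superbad)."""
    ranked = sorted(eras)  # lower ERA is better, so best seasons first
    third = len(ranked) // 3
    return {
        "Elite":    mean(ranked[:extreme_n]),
        "Good":     mean(ranked[:third]),
        "Average":  mean(ranked[third:2 * third]),
        "Bad":      mean(ranked[2 * third:]),
        "Superbad": mean(ranked[-extreme_n:]),
    }

for label, value in era_scale(seasons).items():
    print(f"{label:9s} {value:.2f}")
```

For a lower-is-better stat like ERA the ranking is a plain ascending sort; a higher-is-better stat would simply sort in reverse before slicing.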
I tacked on the “Superbad” group (apologies to McLovin) as a mirror of the Elite: the bottom 25 players by ERA. The most interesting part of this scale is the “Average,” which is pretty close to the league’s average ERA from 2005-2007 (4.42). Also, looking just at 2007, you’ll find that only eight ERA title qualifiers (Jake Peavy, John Lackey, Brandon Webb, Brad Penny, Fausto Carmona, Danny Haren, John Smoltz, and Chris Young) fall under the Elite mark, a fairly elite group. So things seemed to work out well enough to try scaling out a more sophisticated statistic, Support-Neutral Lineup-adjusted Value Added above Replacement (SNLVAR):
          SNLVAR     W     L     IP    BB      K   ERA   VORP
Elite       6.43  14.9   8.8  213.1  59.3  167.4  3.34   53.8
Good        5.61  13.8   9.2  203.2  58.5  157.1  3.59   45.2
Average     3.23  10.3   9.9  167.7  54.9  110.3  4.46   19.8
Bad         1.18   6.9   9.5  129.6  48.7   80.8  5.45   -0.1
Superbad    0.58   6.2  10.1  124.9  48.9   76.6  5.85   -6.7
I have some concerns about this batch. The Average ERA is still in the right range, but the relatively relaxed Superbad ERA makes me wonder if that group was catching pitchers who weren’t exactly horrible, but who just didn’t get enough playing time. Overall, however, I’m happy using this method, but it certainly isn’t the only way to go about this task, nor is it necessarily the best. So I’m going to throw this out to the insanely smart crowd this space draws: how would you improve upon the method? Does it provide what you want to see?
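Whatever refinements readers suggest, the point of a scale like this is to give newcomers signposts. As a purely hypothetical convenience (not anything from our published stat reports), the SNLVAR group averages from the table above could be treated as rough cutoffs for labeling an individual season:

```python
# Hypothetical helper: label a season's SNLVAR against the group
# averages computed above, treating the averages as rough cutoffs.
SNLVAR_SCALE = [
    ("Elite", 6.43),
    ("Good", 5.61),
    ("Average", 3.23),
    ("Bad", 1.18),
    ("Superbad", 0.58),
]

def snlvar_label(value):
    # Walk the benchmarks from best to worst; anything below the
    # Superbad average simply stays "Superbad".
    for name, benchmark in SNLVAR_SCALE:
        if value >= benchmark:
            return name
    return "Superbad"

print(snlvar_label(6.5))   # -> "Elite"
print(snlvar_label(2.0))   # -> "Bad" (between the Average and Bad marks)
```

Since the benchmarks are group averages rather than true boundaries, a pitcher just below a mark really sits between groups; the labels are signposts, not bins.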
Next time around, we’ll be back with reliever stats, a few alternate methods of setting out the scale, and, hopefully, your feedback.