
This time of year is a busy stretch if you're a Hall of Fame buff, or at least this particular Hall of Fame buff. The 2012 BBWAA ballot was released on Wednesday, adding 13 new candidates to the 14 holdovers from last year's ballot. I'll start digging into the details of those candidacies late next week. Meanwhile, the vote on the Golden Era candidates will take place at the Winter Meetings in Dallas this coming Monday, December 5; alas, I think I'm actually going to be in the air when the results are announced, but I'll weigh in upon arrival. Earlier this week, I had the opportunity to discuss some of the Golden Era candidates on television as part of my debut appearance on MLB Network's new show, "Clubhouse Confidential." It wasn't my first time on TV, but I believe it was my first time discussing JAWS in that medium. Explaining the system concisely AND discussing the merits of a handful of candidates in a four-minute span was certainly a challenge, but host Brian Kenny and his producers seemed quite pleased with the segment, and there's reason to believe that it won't be the last time I appear on the show.

Here is a clip of the appearance, if you didn’t get to see it:

Between trying to boil down the system to its essential talking points and continuing a discussion of the top candidates on the Golden Era ballot, I've been thinking about a few issues pertaining to JAWS and wondering if it isn't time for a tweak or two to the system. First off, it's important to recognize that the system just underwent a seismic shift, in that this year's data set marks the first one using Colin Wyers' formulation of WARP instead of Clay Davenport's. Higher replacement levels and different methods of measuring offensive, defensive, and pitching value have shaken up the standings of some candidates relative to the standards, which have shifted as well—after all, they're averages of individual player values. In general, the WARP values for most players are lower, and in some cases very different from what previous iterations or various competing systems have told us.

The changes themselves were a topic for considerable discussion in the comment thread of my Golden Era piece. One loyal reader voiced concern over the underlying changes to WARP and the way that they rendered previous work—not only on JAWS but with respect to a whole lot of other BP studies—outdated, asking us to rename the new methodology to avoid confusion and to maintain the old version as well. This is impractical. Consider how dismissive certain members of the mainstream sports media are of WAR because golly, there are two popular formulations of it out there. Evan Grant, the Dallas Morning News writer who gave Michael Young his sole first place vote in the 2011 AL MVP race, explained his vote by noting, "When somebody can quickly explain the complexities of the concept and standardize the WAR formula, I’ll spend more time with it. In the meantime, I’ll go with what my eyes told me."

After you're done rolling your eyes at that one, consider that here at BP, we've spent the past year retiring or retooling some overlapping metrics so that, say, VORP and RARP aren't telling us two different things when they not only should be saying the same thing, they should be the same thing. No attempt to expand the audience for sabermetrics is going to convince many people if we have to resort to explaining, "This is the original Port Huron WARP, man, not the watered-down second draft." We might as well retire to our bathrobes and drink White Russians, a huge problem for me, since the mere thought of drinking milk makes me gag.

Most of us do not love change, because it decreases our comfort level, threatens our understanding of the world, and forces us to act in new ways and digest new information. I don't love everything about the new WARP relative to the old, and I don't love change in general, but I do think that the new system's improvements are very worthwhile. Colin has incorporated baserunning into WARP, as well as play-by-play defense going back to 1950. He has attempted to tackle some age-old problems that smart people had with WARP, things like the counterintuitive calculations involved in Equivalent Average/True Average, or the assumption that replacement-level players were both replacement-level hitters and replacement-level fielders, when further study has suggested that such hitters are generally average defensively. I'm paraphrasing a more detailed exchange with Colin on the topic here, but formerly, we were measuring each pitcher against a replacement-level pitcher backed by replacement-level fielders, when instead we should have been measuring them relative to a replacement-level pitcher backed by average fielders. The new WARP also measures starters and relievers against different baselines, based upon the fact that the historical record strongly suggests that replacement relievers are better pitchers than replacement starters.

Furthermore, it's important to remember that the "old WARP" of yesteryear was hardly static. It underwent countless revisions without Clay calling a lot of attention to them; if I wanted to do a JAWS study in July, I'd have to get an updated data set because January's was out of date. Some of the revisions were minor, some were jarring; a given set might show that 19th century first basemen like Roger Connor and Dan Brouthers had climbed the first-base rankings significantly, but they'd fall back down to a more typical ranking with a subsequent batch.

Sabermetrics is the pursuit of objective truth about baseball. If our understanding of the truth changes—"Hey, we've been underestimating where the replacement-level line should be set" or, "Pitchers deserve a bit less credit than we have been giving them for controlling balls in play"—we owe it to ourselves and our audience to review our previously-held assumptions and revise our thinking, without worrying too hard about what information we've superseded. At any given moment, our WARP numbers represent our systematic best estimates of value, but they're still just estimates, not permanent figures carved in stone. We make a grave mistake when we think we've found the answer once and for all. As the PITCHf/x and HITf/x stuff that Mike Fast is doing revises our thinking about the nature of DIPS, you can bet that we'll find a way to incorporate that, and likewise with his brilliant catcher framing study. Sooner or later, someone may come along and solve another defensive quandary—maybe it's the adjacent fielder ballhog effect—that will make our current assumptions look dated and naive.

Enough inside baseball, at least on that front. One thing that several people pointed out with regard to Santo's case is that the standards that ran in last week's piece showed third base to have the second-highest JAWS of any position, behind only center field:

| Position | #   | JAWS | +/-  |
|----------|-----|------|------|
| 1B       | 18  | 52.6 | -2.1 |
| 2B       | 19  | 55.6 | 0.9  |
| 3B       | 11  | 59.6 | 4.9  |
| SS       | 21  | 51.7 | -3.0 |
| LF       | 20  | 55.2 | 0.5  |
| CF       | 18  | 61.8 | 7.1  |
| RF       | 23  | 55.1 | 0.4  |
| C        | 13  | 45.2 | -9.5 |
| Average  | 143 | 54.7 | 0.0  |

The last column is the gap between the position in question and the overall average; leaving catchers out of the equation for the moment, the spread is about 10 points from center field to shortstop. This is not a new phenomenon, though the identity of the particular outliers is. Here's what the previous set of standards looked like:

| Position | JAWS | +/-  |
|----------|------|------|
| 1B       | 53.5 | -4.4 |
| 2B       | 67.3 | 9.4  |
| 3B       | 59.5 | 1.6  |
| SS       | 59.0 | 1.1  |
| LF       | 53.7 | -4.2 |
| CF       | 56.1 | -1.8 |
| RF       | 61.2 | 3.3  |
| C        | 50.8 | -7.1 |
| Average  | 57.9 |      |

Again excluding catchers, the spread here is even wider, almost 14 points between second base and first base, though third base wasn't the outlier at that point. Santo was at 62.4 in the old set, 2.9 points above the third-base standard, so that particular point wasn't germane to the discussion the last time I reviewed his candidacy. Now it is, given a JAWS of 58.2, a mark that's 1.4 points below the third-base standard, but 3.5 points higher than the average hitter.

An aside: I've hidden the career and peak columns for the purposes of this demonstration, but as I argued the other day, the fact that Santo has a peak higher than the standard while being short on career means he still has a strong case. Had he hung around two years and squeezed out 3.0 WARP—enough to put him over the JAWS standard—it wouldn't have mattered much either to his teams or to our notion of his greatness. The same is true for certain other candidates, Minnie Minoso among them. Careers cut short by injury, or that come up short because of time lost to military service or the color line—those are among the reasons why peak is an important facet of a Hall of Fame argument in the first place, and why I generally use the career/peak/JAWS triumvirate for a three-dimensional picture of a given candidacy.
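For anyone who wants the arithmetic behind that triumvirate spelled out, here's a minimal sketch in Python. The season-by-season WARP values are invented purely for illustration; they're chosen so that the totals match the Santo line cited later in this piece (66.1 career, 50.3 peak, 58.2 JAWS), with peak defined as the best seven seasons and JAWS as the average of career and peak.

```python
# A minimal sketch of the career/peak/JAWS arithmetic. The season-by-season
# WARP values below are invented for illustration; only the totals are meant
# to line up with Santo's published 66.1/50.3/58.2 line.

def jaws_line(season_warps, peak_seasons=7):
    """Return (career, peak, JAWS): career is total WARP, peak is the sum of
    the best `peak_seasons` seasons, and JAWS is the average of the two."""
    career = sum(season_warps)
    peak = sum(sorted(season_warps, reverse=True)[:peak_seasons])
    return career, peak, (career + peak) / 2

seasons = [1.2, 3.5, 6.8, 7.4, 8.1, 6.9, 7.7, 6.8, 6.6, 5.2, 4.1, 1.8]
career, peak, jaws = jaws_line(seasons)
print(f"{career:.1f} career / {peak:.1f} peak / {jaws:.1f} JAWS")
# -> 66.1 career / 50.3 peak / 58.2 JAWS
```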

Another point to be made about Santo is that there are only 11 third basemen in the Hall, and that's if you count Paul Molitor, as I do. Molitor played 791 of his 2,683 games there, and another 644 elsewhere in the infield, compared to 1,171 at DH; his defense (22 FRAA in our current build) had real value, and since there are no "pure" designated hitters in the Hall with which to group him, I've argued that he belongs there. Meanwhile, there are an average of 20 players at the other defensive positions besides catcher, ranging from 18 at first base and center field to 23 in right. What we're measuring Santo against is a small sample size, even relative to the other small sample sizes.

That's one existing problem with the system. Another is the way that the various position rankings are generally top-heavy: of the 143 Hall of Fame hitters for whom we can do JAWS, just 58 (40.6 percent) clear the standard, because the inner-circle greats pull the averages upward. To be clear, this doesn't mean that I advocate the ouster of more than half of the Hall of Fame hitters as unworthy; this isn't Lake Wobegon, where all of the children are above average. JAWS is meant to spotlight above-average candidates as a means of improving the Hall of Fame.

Anyway, the lowest percentage of Hall of Famers above the JAWS standard at their position is in right field, where just eight out of 23 are above the bar:

| Player           | JAWS  |
|------------------|-------|
| Babe Ruth        | 133.4 |
| Hank Aaron       | 104.5 |
| Frank Robinson   | 86.5  |
| Mel Ott          | 75.6  |
| Al Kaline        | 71.5  |
| Roberto Clemente | 67.7  |
| Reggie Jackson   | 62.8  |
| Dave Winfield    | 55.8  |
| Avg HOF RF       | 55.1  |
| Sam Crawford     | 51.3  |
| Paul Waner       | 49.2  |
| Tony Gwynn       | 48.1  |
| King Kelly       | 47.0  |
| Harry Heilmann   | 46.9  |
| Elmer Flick      | 43.2  |
| Enos Slaughter   | 42.6  |
| Chuck Klein      | 39.2  |
| Sam Thompson     | 36.6  |
| Willie Keeler    | 36.6  |
| Kiki Cuyler      | 34.3  |
| Sam Rice         | 28.7  |
| Harry Hooper     | 27.7  |
| Ross Youngs      | 22.2  |
| Tommy McCarthy   | 20.6  |

Not surprisingly, all eight of those players above the standard are BBWAA choices, while only four of the 15 players below the standard—Waner, Gwynn, Heilmann, and Keeler—were elected via that route (and yes, I am surprised that Gwynn has fallen). The rest came via one iteration of the Veterans Committee or, as it was known from 1939 to 1949, the Old-Timers Committee. In the current incarnation of JAWS, the lowest score at each position (that of McCarthy, in this case) is dropped, and then the rest are averaged.
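To make that last step concrete, here's a quick sketch of the modified mean in Python, using the right-field JAWS scores from the table above; dropping McCarthy's 20.6 and averaging the remaining 22 scores returns the published 55.1 standard.

```python
# The Hall of Fame right fielders' JAWS scores from the table above.
rf_jaws = [133.4, 104.5, 86.5, 75.6, 71.5, 67.7, 62.8, 55.8, 51.3, 49.2,
           48.1, 47.0, 46.9, 43.2, 42.6, 39.2, 36.6, 36.6, 34.3, 28.7,
           27.7, 22.2, 20.6]

def modified_mean(scores):
    """The current JAWS standard: drop the single lowest score, average the rest."""
    kept = sorted(scores)[1:]  # discards McCarthy's 20.6 in this case
    return sum(kept) / len(kept)

print(round(modified_mean(rf_jaws), 1))  # 55.1, the right-field standard
```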

Would removing that dropped player at each position even things out with regard to the spread of standards between positions? Yes, but only a little; the distance between the center-field and shortstop standards drops from 10.1 to 9.3 points. Incidentally, this only bumps three more players above the standard at their given positions: one at third base, one at shortstop, and one at catcher.

Over the years, multiple readers have suggested that I use the median score at each position instead of the modified mean. I wrote about this back in 2007, incidentally, in the context of the player who happens to be the top newcomer on the 2012 BBWAA ballot, Bernie Williams. A reader argued that because the scores at each position were not normally distributed, and the populations of each position were small, it was inappropriate to use the mean. In his view, the median presented a better alternative; by definition, half of the players at a given position would be above the standard.

I considered that suggestion but ultimately rejected it, because one consequence is that it lowers the standard scores too drastically—10 points, in the case of center fielders in the current data set—to the point that a given BBWAA ballot of 30-some-odd candidates might have well over 10 flagged as above the median at their position. Voters can only list 10 players on their ballots, and aside from one crackpot whose ballot I came across years ago, no voter or credible observer of the process has suggested that at a given time there are too many Hall-worthy players to vote for. When the next two classes (Barry Bonds, Roger Clemens, Mike Piazza, Sammy Sosa, Craig Biggio, and Curt Schilling in 2013, and Frank Thomas, Tom Glavine, Mike Mussina, Jeff Kent, and Jim Edmonds in 2014) arrive, that may be the case anyway, but that's a problem for the coming years, and relaxing the standards so drastically won't help. Switching to the median doesn't do much to decrease the spread between the highest standard and the lowest non-catcher standard, either; it would still be 9.3 points.
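For a sense of how much lower a median-based bar sits, here's the same right-field list run through the median instead; this isn't the center-field calculation cited above, just an illustration of the size of the gap.

```python
from statistics import median

# Right-field JAWS scores again; the published modified-mean standard is 55.1.
rf_jaws = [133.4, 104.5, 86.5, 75.6, 71.5, 67.7, 62.8, 55.8, 51.3, 49.2,
           48.1, 47.0, 46.9, 43.2, 42.6, 39.2, 36.6, 36.6, 34.3, 28.7,
           27.7, 22.2, 20.6]

print(median(rf_jaws))  # 47.0 -- roughly eight points below the 55.1 standard
```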

Ruminating on this for the past week, I've come up with an idea that doesn't entirely solve the problem but reduces it: regress each positional standard toward the average Hall of Fame hitter. Now, this isn't rigorously scientific, in that I don't have any empirical data to tell me how heavily to weight the average to get to a valid sample size. Plonking around for a couple of hours, I settled on a sample size of 23, the existing maximum at any position, for each non-catcher position. That means that if there are 20 left fielders in the Hall, the standard is the average of those 20 left fielders plus three copies of the average Hall hitter (23 values in all); for third base, it's 11 third basemen plus 12 average Hall hitters, again divided by 23. If I do that, first restoring the lowest-ranked player at each position to the data set to simplify the process, this is where the standards end up:

| Pos | #   | GE   | WS   | Change |
|-----|-----|------|------|--------|
| 1B  | 18  | 52.6 | 51.4 | -1.3   |
| 2B  | 19  | 55.6 | 53.8 | -1.8   |
| 3B  | 11  | 59.6 | 54.8 | -4.8   |
| SS  | 21  | 51.7 | 50.7 | -1.0   |
| LF  | 20  | 55.2 | 53.5 | -1.7   |
| CF  | 18  | 61.8 | 58.3 | -3.5   |
| RF  | 23  | 55.1 | 53.6 | -1.5   |
| C   | 13  | 45.2 | 43.7 | -1.5   |
| Avg | 143 | 54.7 | 52.9 | -1.8   |

GE is the set I unveiled last week for the Golden Era ballot. WS is the new weighted set, with (23 – n) average Hall of Fame hitters thrown in. I did that for the catchers as well, taking 85 percent of the average at the other positions to preserve the existing ratio between catcher JAWS and the overall position player JAWS. Note that I’m not doing anything to pitcher JAWS at this point.

The spread between the center fielders and the shortstops is just 7.6 points, down from 10.1; the standard deviation of the various positions falls from 3.3 to 2.3. Again, this only bumps three existing Hall of Famers back above the standards at their position; for some reason, there seems to be a donut hole near the center of the rankings at most positions. Consider the center fielders:

| Player           | JAWS  |
|------------------|-------|
| Willie Mays      | 117.3 |
| Ty Cobb          | 112.1 |
| Tris Speaker     | 101.2 |
| Mickey Mantle    | 92.7  |
| Joe DiMaggio     | 69.3  |
| Richie Ashburn   | 66.1  |
| Avg HOF CF (GE)  | 61.8  |
| Duke Snider      | 61.2  |
| Billy Hamilton   | 59.1  |
| Avg HOF CF (WS)  | 58.3  |
| Larry Doby       | 49.8  |
| Andre Dawson     | 49.5  |
| Earl Averill     | 48.1  |
| Max Carey        | 43.4  |
| Kirby Puckett    | 42.3  |
| Hugh Duffy       | 39.7  |
| Hack Wilson      | 36.2  |
| Earle Combs      | 32.8  |
| Edd Roush        | 30.1  |
| Lloyd Waner      | 25.3  |
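Under the hood, the weighted standard amounts to a straightforward weighted mean. Here's a rough sketch of that arithmetic in Python, based on my description above; it treats the 52.9 overall figure from the weighted-set table as the average Hall of Fame hitter, uses the 18 center-field scores above (with Lloyd Waner restored), and comes back with the published 58.3. The catcher adjustment described earlier is handled separately and isn't shown.

```python
# A rough sketch of the weighted standard: n actual Hall of Famers at the
# position, padded out to 23 "players" with copies of the average Hall hitter.
cf_jaws = [117.3, 112.1, 101.2, 92.7, 69.3, 66.1, 61.2, 59.1, 49.8, 49.5,
           48.1, 43.4, 42.3, 39.7, 36.2, 32.8, 30.1, 25.3]  # lowest restored

AVG_HOF_HITTER = 52.9  # overall figure from the weighted-set table above

def weighted_standard(scores, overall=AVG_HOF_HITTER, target_n=23):
    padded = list(scores) + [overall] * (target_n - len(scores))
    return sum(padded) / target_n

print(round(weighted_standard(cf_jaws), 1))  # 58.3, the new center-field bar
```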

I don't think this perfectly solves the multiple issues I've cited, but it does even out the terrain, setting a more reasonable bar for third base and center field, two of the least represented but highest-JAWS positions. Note that I'm actually doing this process for the career and peak scores underneath the hood; this is where the new standards would land:

| Pos | Career | Peak | JAWS |
|-----|--------|------|------|
| 1B  | 61.7   | 41.0 | 51.4 |
| 2B  | 64.6   | 43.0 | 53.8 |
| 3B  | 66.4   | 43.2 | 54.8 |
| SS  | 60.9   | 40.4 | 50.7 |
| LF  | 65.0   | 42.0 | 53.5 |
| CF  | 70.9   | 45.7 | 58.3 |
| RF  | 66.2   | 40.9 | 53.6 |
| C   | 52.9   | 34.6 | 43.7 |
| Avg | 64.1   | 41.7 | 52.9 |

Relative to the just-reviewed slate of Golden Era candidates, this lands both Santo (66.1/50.3/58.2) and Minoso (61.4/46.1/53.8) above the bar; both have peak scores significantly above average and JAWS scores that previously fell just a few hairs short. That's where I had already concluded they belonged anyway, after considering the extenuating circumstances surrounding their careers and the odd lumps under the standards rug. The above set doesn't alter my conclusions about any of the other Golden Era candidates, none of whom had JAWS scores above 49.9. It may carry ramifications for holdover candidates on the BBWAA ballot, though not to the same extent as those wrought by the new WARP changes in the first place. A word of warning: not all of our pet candidates fare so well in the move.

 This is the first major change I’ve made to JAWS since the 2006 ballot, when I redefined peak as best seven seasons instead of best five consecutive seasons. Before I go ahead and lock this methodology in for the new cycle, I'd like to mull the change over a bit longer, and consider the feedback I get from colleagues and readers. This article, and my attempt to smooth out some of the rougher edges in JAWS, is a response to previous feedback. If you have a strong opinion on the matter, I'd welcome your comments.  

KBarth
12/02
You wrote: "WS is the new weighted set, with (30 - n) average Hall of Fame hitters thrown in."

Did you mean (23 - n)?

Your idea is interesting and I think I see why you are doing it. It pretty clearly pushes all the JAWS norms toward the average hitter and as long as that's what you *want* to do this will certainly accomplish it. I'm not sure I buy your rationale for pushing all the positional norms toward the mean, but for now I am thinking of it as an interesting exercise that is not yet finalized.

Fun stuff. Thanks!
jjaffe
12/02
Yup, I meant 23 - n. Fixed.
marshaja
12/02
I like the changes. They take a little bit of the position bias out of calculating where players fall. 3B should not be punished because the BBWAA has shunned the position, letting in only the all-time greats while admitting less-than-stellar 1B.
BurrRutledge
12/02
Thanks, Jay! My first take is that I like this tweak. I am going to mull this over more carefully, and I'll offer any thoughts later.

In the meantime, I did notice that in the list of CF, unless there's a typo in the scores, then Billy Hamilton should be listed above the weighted mean.

Thanks also for actively seeking to improve the (already great) JAWS system. Complacence is for the weak!
jjaffe
12/02
Gah, typo on Hamilton. It's almost like updating my table manually while editing the piece at midnight wasn't entirely foolproof.
BurrRutledge
12/02
Another quick observation with regard to the "median" line, with the HOF CF list:

With the weighted set, and if Hamilton is above the new line, then there are 8 players above and 10 below.

Previously, there were 6 above and 12 below.

Would be interesting to see how the WS relates to the median line at the other positions...
KJOKBASEBALL
12/02
This is a little nit-picky, but it's also an important concept:

"..the historical record strongly suggests that replacement relievers are better pitchers than replacement starters."

It's actually that the record suggests replacement-level relievers PERFORM better than replacement-level starters, not that they ARE better talent-wise, as the whole idea is to set the replacement level higher for relievers so that replacement-level relievers EQUAL replacement-level starters (i.e., they have the same WARP of 0).

jjaffe
12/02
Agreed. Sloppy phrasing preserved from a fairly informal conversation on the topic.
denny187
12/02
Explain why it is better to compare a shortstop to HOF shortstops, a right fielder to HOF right fielders, a first baseman to HOF first basemen, etc. Why not just compare all players to the WARP of the average hall of famer? WARP already adjusts for position so it seems weird to use it to only compare like-positions instead of using it to compare all players.
jjaffe
12/02
Talent is not distributed evenly across positions, and skills certainly are not. While WARP compresses the offensive and defensive contributions of a hitter into one handy number, the offensive and defensive components of what make that up are still important, and I often make reference to them in the course of my analysis. I also make extensive references to things that JAWS doesn't capture - awards and counting stats. If I'm examining the Hall of Fame cases of Barry Larkin or Alan Trammell, it doesn't help me so much to measure them against Eddie Murray as it does to compare them to Cal Ripken.
JimmyJack
12/02
I appreciate your rethinking and retooling on this. The hardest part for me to get my brain around (regardless of the Davenport/Wyers incarnations) has been defense. I have never felt comfortable with defensive metrics, as they can show variance from year to year with players. I understand that is possible; it just seems defense would be the most consistent year-to-year part of a player's game. What am I missing?
jjaffe
12/02
I highly suggest you dig back into Colin's archive and read http://www.baseballprospectus.com/article.php?articleid=11476 and
http://www.baseballprospectus.com/article.php?articleid=11589 where he hits some of the big stumbling blocks that one encounters when trying to measure defense. Long story short: a single year isn't enough to reliably measure defense, so take any yearly number with a grain of salt.

But remember that over time, the larger sample size should build confidence in what's being measured, at least if the methodology is sound. When it comes to measuring the Hall of Fame candidates' defense, we generally have 10-20 years worth of data, and while multiple sources may disagree as to the value of those years, they're hopefully pointing in the same direction.



gpurcell
12/03
Like you, I tend toward suspicion of defensive metrics, but it's mainly an issue for me because they aren't great predictors (e.g., they take a long time to stabilize, long enough that skill can degrade). If you are looking at them to establish, as a historical statistic, how good a guy was on average during his career, and you aren't concerned with prediction, I think they are reasonable.
fgreenagel2
12/02
Very cool that you are on TV. Your multi-media assault begins. I'm looking forward to a time when BBWAA writers cite JAWS on a regular basis.

Raines. Please.
newsense
12/03
Is it wrong to think that HOF voters have been biased toward positions (i.e., 3rd base vs. RF) and that there ought to be approximately the same number of HOF players at each position? If so, wouldn't a more elegant solution be to set the JAWS standard at the 15th (or 12th or 20th) best JAWS score at that position (regardless of whether the player is in the HOF, especially since the performance of ineligible or not-yet-eligible players is relevant to the discussion)?
jjaffe
12/03
Using a given ranking at a position produces a set of standards that's no less uneven than what the raw JAWS standards produce. It's more artificial as well. JAWS is less an attempt to define an ideal Hall of Fame - which I would do if I were advocating starting from scratch, a la the Hall of Merit - than it is to recognize what's in line with the Hall of Fame as it exists.

gpurcell
12/03
I like it. The only beef I have is with including ANY of the clearly ridiculous 20 or 30 WARP guys in the average calculation. I'd be more in favor of a "mistake rule", just accept that they are in as historical curiosities, and then remove them from the analysis.
aNonBiasAttempt
12/05
I trust anyone with a mustache like that!
AWBenkert
6/10
With the mustache and those sideburns, Jaffe looks like he just arrived via a time warp from the 70s. All that's missing is a leisure suit!