I was flipping through Bill James’ 1987 Baseball Abstract the other day, because that’s the sort of thing that passes for fun in my house, and came across this little nugget:
…I wrote a program to compare the 1986 rookies with those in the 1948-1975 database so as to find the ten most comparable players to each of the 1986 rookies… One way to project what a player will do is to find similar players in the past and look at what they have done… the ten comparable players will subsequently diverge… but it gives us the range of normal expectation for a player of this type.
For the crop of significant position players who were rookies in 1986 (a total of 18), James estimates their career totals based on the weighted average of the career performances of their 10 most similar players. In other words, the most similar career counts more heavily than the 10th most similar career, with various gradations between those two endpoints.
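For the curious, here is a minimal sketch of what that kind of weighted average might look like. James doesn't spell out his actual weights in the passage quoted above, so the linear 10-down-to-1 scheme here is purely an assumption, and the function names and the three sample careers are made up for illustration. Batting average is derived from the projected hits and at-bats rather than averaged directly.

```python
def linear_weights(n):
    """Assumed weighting: most similar comp gets n, least similar gets 1."""
    return list(range(n, 0, -1))


def weighted_projection(comps, weights=None):
    """Weighted average of career counting stats.

    comps   -- list of dicts of career totals, ordered most similar first
    weights -- optional list of weights; defaults to the assumed linear scheme
    """
    if not comps:
        raise ValueError("need at least one comparable player")
    if weights is None:
        weights = linear_weights(len(comps))
    if len(weights) != len(comps):
        raise ValueError("one weight per comparable player is required")
    total = float(sum(weights))
    return {
        stat: sum(w * c[stat] for w, c in zip(weights, comps)) / total
        for stat in comps[0]
    }


# Hypothetical careers, NOT real players, ordered most similar first.
comps = [
    {"G": 1500, "AB": 5000, "H": 1400, "HR": 200},
    {"G": 900, "AB": 3000, "H": 780, "HR": 80},
    {"G": 300, "AB": 900, "H": 207, "HR": 14},
]

proj = weighted_projection(comps)
proj["BA"] = proj["H"] / proj["AB"]  # derive BA from projected H and AB
print(proj)
```

With ten comps instead of three, the same call would weight the closest match ten times as heavily as the tenth, which is one plausible reading of "various gradations."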
Inasmuch as we can predict the future from the past, and acknowledging that every individual is unique, it’s a fun toy that sometimes yields startlingly accurate results. To use a fairly trivial example, here’s how James’ estimate for Andy Allanson compares with Allanson’s actual career:
| | G | AB | BA |
|---|---|---|---|
| Allanson (James) | 518 | 1415 | .243 |
| Allanson (actual) | 512 | 1486 | .240 |
Of course, it doesn’t always work out. Barry Bonds, for example, ends up with 122 homers according to the method. (James notes that there were no truly similar players to Bonds and that such a low home run total would be a “mighty disappointment.”)
Anyway, James runs through this exercise for several rookies. It’s pretty hit-or-miss, but check out the two American League right fielders, Danny Tartabull and Ruben Sierra:
| | G | BA | HR | RBI |
|---|---|---|---|---|
| Tartabull (James) | 1418 | .269 | 200 | 742 |
| Tartabull (actual) | 1406 | .273 | 262 | 925 |
| Sierra (James) | 1922 | .280 | 271 | 1026 |
| Sierra (actual) | 2186 | .268 | 306 | 1322 |
With just one season’s worth of data, those aren’t bad guesses at all. For grins, here’s how the entire crop turned out (James doesn’t provide all numbers for all players):
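| Player | G (James) | BA (James) | HR (James) | RBI (James) | G (actual) | BA (actual) | HR (actual) | RBI (actual) |
|---|---|---|---|---|---|---|---|---|
| Andy Allanson | 518 | .243 | - | - | 512 | .240 | - | - |
| Barry Bonds | 1111 | - | 122 | 470 | 2986 | .298 | 762 | 1996 |
| John Cangelosi | 848 | .247 | 19 | - | 1038 | .250 | 12 | 134 |
| Jose Canseco | 1825 | .262 | 290 | 998 | 1887 | .266 | 462 | 1407 |
| Will Clark | 1399 | .274 | 165 | 658 | 1976 | .303 | 284 | 1205 |
| Andres Galarraga | 1085 | .265 | 107 | - | 2257 | .288 | 399 | 1425 |
| Pete Incaviglia | 1823 | .265 | 282 | 982 | 1284 | .246 | 206 | 655 |
| Wally Joyner | 897 | .262 | 89 | - | 2033 | .289 | 204 | 1106 |
| John Kruk | 713 | .273 | 41 | 247 | 1200 | .300 | 100 | 592 |
| Mike LaValliere | 632 | .240 | 25 | 166 | 879 | .268 | 18 | 294 |
| Steve Lombardozzi | 900 | .246 | 37 | 259 | 446 | .233 | 20 | 107 |
| Kevin Mitchell | 1269 | .266 | 168 | 654 | 1223 | .284 | 234 | 760 |
| Bip Roberts | 1002 | .258 | - | - | 1202 | .294 | 30 | 352 |
| Ruben Sierra | 1922 | .280 | 271 | 1026 | 2186 | .268 | 306 | 1322 |
| Cory Snyder | 1254 | .277 | 189 | - | 1068 | .247 | 149 | 488 |
| Kurt Stillwell | 523 | .241 | 10 | - | 998 | .249 | 34 | 310 |
| Danny Tartabull | 1418 | .269 | 200 | 742 | 1406 | .273 | 262 | 925 |
| Robby Thompson | 928 | .260 | 35 | - | 1304 | .257 | 119 | 458 |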
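If you'd rather not eyeball all of that, here is a quick sketch that tallies how often the actual home run total beat the projection. The numbers are copied straight from the table above; Allanson and Roberts are left out because no projected HR total is listed for them. The variable names are just for illustration.

```python
# (projected HR, actual HR) for every player with a listed HR projection.
hr = {
    "Barry Bonds": (122, 762),
    "John Cangelosi": (19, 12),
    "Jose Canseco": (290, 462),
    "Will Clark": (165, 284),
    "Andres Galarraga": (107, 399),
    "Pete Incaviglia": (282, 206),
    "Wally Joyner": (89, 204),
    "John Kruk": (41, 100),
    "Mike LaValliere": (25, 18),
    "Steve Lombardozzi": (37, 20),
    "Kevin Mitchell": (168, 234),
    "Ruben Sierra": (271, 306),
    "Cory Snyder": (189, 149),
    "Kurt Stillwell": (10, 34),
    "Danny Tartabull": (200, 262),
    "Robby Thompson": (35, 119),
}

# Split the group by whether the real career out-homered the projection.
more = [name for name, (projected, actual) in hr.items() if actual > projected]
fewer = [name for name, (projected, actual) in hr.items() if actual < projected]

print(f"hit more HR than projected:  {len(more)} of {len(hr)}")
print(f"hit fewer HR than projected: {len(fewer)} of {len(hr)}")
```

By this count, 11 of the 16 players with a home run projection topped it and 5 fell short.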
Well, that was fun. You know, if you like that sort of thing.
That was great. Thanks. Other than the Bonds misfire, I'd say he was amazingly prescient. The most obvious divergence is that the hitters consistently hit more HRs than predicted, and there's no way James could have predicted the offensive environment of the 90s (whether as a result of PEDs or whatever).
There's also the fact that any comparison of multiple similar players would tend towards "average," even for all-time greats. What 10 comps could he have that would end up with an expected HR total over 700?
Had Ruben Sierra not hung around so long after he was clearly past his prime, his average probably would have been a bit higher, and he would have ended up with fewer HRs...closer still to James' projection!
James had an interesting comment about Sierra in his 1988 Abstract. Sierra apparently idolized Roberto Clemente and modeled his game after Clemente. That struck James as odd since Sierra was just seven when Clemente died in 1972. He seemed to be questioning Sierra's stated birth date, the first time I can recall hearing such skepticism. In any event, Sierra was a player whose best years came when he was quite young.
I idolized Bill Buckner when I was seven, though all I have to show for it is bad knees. Anyway, I don't think 7 is too young to have a favorite baseball player, especially for a Latin player (and maybe one from the same country) like Sierra.
Bill's program must not have understood that Bonds was the clear cream of the crop.