“The present is never our goal: the past and present are our means: the future alone is our goal. Thus, we never live but we hope to live; and always hoping to be happy, it is inevitable that we will never be so.”
–Blaise Pascal

Well, it’s that time of year again. The season that was is past, and plans are being laid for the season to come. From a performance analyst’s perspective this can only mean that a part of our attention turns to player forecasts, as Joe Sheehan’s recap of “ShandlerFest” reminded me. I also received The Bill James Handbook this week, which has a “Hitter Projections” chapter written by James himself. In that chapter, we find a list of projections that went very well for Baseball Info Solutions (Brad Ausmus, Rob Mackowiak, Jeff Conine, Pedro Feliz, Carlos Beltran, and Adrian Beltre) as well as those that didn’t (Carlos Quentin, Carlos Pena, Prince Fielder, B.J. Upton, Marcus Giles, and Richie Sexson). In thinking about cases like Quentin (a bust for the Diamondbacks) and Pena (a boom for the Devil Rays), I started to wonder about the biggest booms and busts of all time in terms of a reasonable projection of performance. Which teams were most let down by a player’s performance, and which were most surprised, assuming of course that they had reasonable expectations in mind? (I grant you, that’s a huge and demonstrably false assumption that bursts the entire bubble, but play along.)

Projections on the Cheap

We’ll start by examining almost 17,000 batter seasons from 1903 through 2006 in an effort to sort out the booms and busts. The first task here is to create some “projections.” I say “projections” because the projections we’re creating are for seasons that have already occurred, so there really is no future aspect to this. We can estimate a performance level for a given season and league based on a player’s performance prior to the season in question. Since we’re not trying to replicate something as complex as PECOTA or even the BIS projections, we’ll base ours on normalized park-adjusted OPS (NOPS/PF). I’ve used this measure in the past to boil offensive production down into a single number on a rate basis, where 100 is a perfectly league average OPS when adjusted for the home park of the player. Although very simple to calculate, OPS is a great proxy for run production and matches up very well with more complex run estimators, including BaseRuns and Runs Created.

So the methodology for creating the projections goes like this:

  1. Select all players with 300 or more plate appearances for a major league team from 1903 through 2006, excluding the Federal League.
  2. Calculate the NOPS/PF, plate appearances, and age for the set of players from step one. This is the actual performance against which we’ll measure our projection.
  3. Weight the previous three years’ NOPS/PF using a system where the previous year is weighted at seven, year minus two is weighted at four, and year minus three gets a two.
  4. Take the value from step three and regress it to the mean by calculating a weighted average of the plate appearances for the previous three years, multiplying it by three, and combining the result with enough league-average plate appearances to total 2,000. In other words, if a player had accumulated 2,000 weighted plate appearances in the previous three years, his calculated NOPS/PF from step three would not be regressed at all. However, if he had, for example, a weighted average of 312 plate appearances in his previous three seasons (as Tadahito Iguchi did coming into the 2006 season), his NOPS/PF would be regressed to the mean with 1,064 additional league-average plate appearances, since (312*3)+1064 = 2000.
  5. Given the NOPS/PF calculated in step four, apply an aging adjustment. This curve is similar to the kinds of aging curves that Nate Silver has talked about in the past, although I adjusted it somewhat to be less severe, since I’m pre-selecting players with 300 or more plate appearances, and in this population declines are not so precipitous.
  6. Finally, apply a league adjustment using the Level Indexes that I discussed in a previous column. This allows us to account both for players who switch leagues and for the natural decline that happens as the league improves over time. In the former adjustment, for example, both Frank Howard in 1965 and Frank Robinson in 1966 have their projections increased slightly by moving from the National to the American League.
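
Steps three and four can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the column: the function names are made up, each season's rate is weighted by its weight times its plate appearances (the text does not say how playing time enters the weighting), missing seasons count as zero plate appearances, and league average is taken to be exactly 100. The aging and league adjustments of steps five and six are left out because their values are not given here.

```python
WEIGHTS = (7, 4, 2)      # year-1, year-2, year-3 weights from step three
TARGET_PA = 2000         # plate appearances after padding, from step four
LEAGUE_AVG = 100.0       # NOPS/PF of a league-average hitter


def weighted_inputs(seasons):
    """Return (weighted NOPS/PF, weighted average PA).

    seasons: up to three (nops_pf, pa) pairs, most recent first.
    Assumption: each rate is weighted by weight * PA; missing seasons are 0 PA.
    """
    pairs = list(zip(WEIGHTS, seasons))
    weighted_pa = sum(w * pa for w, (_, pa) in pairs)
    if weighted_pa == 0:
        return LEAGUE_AVG, 0.0   # no history: fall back to league average
    rate = sum(w * pa * nops for w, (nops, pa) in pairs) / weighted_pa
    return rate, weighted_pa / sum(WEIGHTS)


def padding_pa(avg_pa):
    """League-average PA needed to bring three years of weighted PA up to 2,000."""
    return max(0.0, TARGET_PA - 3 * avg_pa)


def regress(rate, avg_pa):
    """Blend the player's weighted rate with league-average padding."""
    pad = padding_pa(avg_pa)
    own = TARGET_PA - pad
    return (rate * own + LEAGUE_AVG * pad) / TARGET_PA


print(padding_pa(312))   # 1064, the padding in the Iguchi example
```

Under these assumptions a player with no usable history comes out at exactly 100, and a player whose weighted plate appearances already reach 2,000 keeps his weighted rate unchanged.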

We end up with 16,900 projected NOPS/PF values to compare against the actual NOPS/PF values, with a correlation coefficient of r=0.64. When compared with how other projection systems perform, that doesn’t seem bad at all; in fairness, our correlation should be pretty good since we’re using a backwards-looking approach and have selected only players who accumulated 300 or more plate appearances in the season we’re trying to project. In short, we have a selection bias in play. Players who suffered injuries that wiped out their seasons, or who retired, or who simply performed so poorly that they didn’t even garner the minimum playing time for what could be considered a regular player are all excluded from the projections. In 2007 terms, think Nick Johnson.

On the other hand, we haven’t set minimums on the number of plate appearances for previous seasons; even if a player has one plate appearance in the previous three seasons, we’ll create a projection, as we did for Hanley Ramirez in 2006 (who not surprisingly came in with a projection of 100). When we remove all players who had fewer than 100 projected plate appearances, the correlation coefficient goes up to 0.66.
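
For readers who want to reproduce this kind of check, here is a minimal sketch of the correlation calculation and the playing-time filter. The record layout and the four sample rows are invented for illustration; only the 100-plate-appearance cutoff comes from the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation(rows, min_projected_pa=0):
    """rows: (projected_nops, actual_nops, projected_pa) tuples (hypothetical layout)."""
    kept = [r for r in rows if r[2] >= min_projected_pa]
    return pearson_r([r[0] for r in kept], [r[1] for r in kept])


# Made-up rows, not real projections.
sample = [(100, 120, 40), (110, 112, 520), (95, 90, 480), (130, 125, 640)]
print(round(correlation(sample), 2), round(correlation(sample, 100), 2))
```

In the made-up sample, dropping the one row with very little projected playing time raises the correlation, which is the same direction as the move from 0.64 to 0.66 described above.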

To get a feel for the actual shape of the distribution, the following chart shows the histogram of the differences between the actual and projected NOPS/PF values, along with the cumulative frequency (the orange line).

[Chart 1: histogram of actual minus projected NOPS/PF, with cumulative frequency shown as an orange line]

From this you can see that almost equal numbers of projections fall on either side of zero, with the average difference being -0.25 points of NOPS/PF. Altogether, 6,630 of the projections (39 percent) fall within five points of the actual value.
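
The two summary figures here, the mean difference and the share of projections within five points, are simple to compute once the differences are in hand. The sketch below uses a handful of invented values in place of the real 16,900; the function name is hypothetical, and treating "within five points" as inclusive of exactly five is my assumption.

```python
def summarize(actual, projected, band=5):
    """Mean of (actual - projected) and the share of misses within +/- band points."""
    diffs = [a - p for a, p in zip(actual, projected)]
    mean_diff = sum(diffs) / len(diffs)
    within = sum(1 for d in diffs if abs(d) <= band) / len(diffs)
    return mean_diff, within


# Invented NOPS/PF values for illustration only.
actual = [138, 116, 73, 144, 103]
projected = [138, 116, 119, 95, 100]
mean_diff, within = summarize(actual, projected)
print(f"mean difference {mean_diff:+.2f}, within five points {within:.0%}")

# The share quoted in the text: 6,630 of 16,900 projections.
print(f"{6630 / 16900:.1%}")   # 39.2%
```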

As an aside, one of the paragraphs that caught my eye in The Handbook chapter on projections was this one relating to the nature of projections:

We project, basically, that every player will continue to do in the future whatever he has done in the past. If a player has hit .250 in the past, we project that he will continue to hit .250. If he has hit .350 in the past, we project that he will continue to hit .350. If he hit .350 in 2006 and .250 in 2007, we project that he will hit .300 in 2008. We’re pretty close to right most of the time, because most players in any season will continue to do about what they have done in the past.

I wholeheartedly agree with the last sentence; after all, past performance is what we use to project the future. But still, I may be reading the previous sentences too literally. I found that when doing these projections it was essential to employ regression to the mean and specifically not assume that a .350 hitter one year (or a player with an NOPS/PF of say 135) was necessarily going to be a .350 hitter the next. As has been shown time and time again, it turns out that when a player hits .350–which is near the far right end of the distribution for a major leaguer–it is much more likely that he’ll hit something less than that the next season, simply because staying out on that tail is extremely difficult and more likely achieved with a little push from lady luck. In any case, of steps four through six discussed above, the regression to the mean in step four had the greatest impact on improving the correlation coefficient, moving it from 0.53 to very nearly its final value.

As you might expect, there were a few players that the system nailed both in terms of projected NOPS/PF and in plate appearances, some of whom are shown in the following table:

                                                  Actual                  Projected
Year    Name              Team   Lg     Age     PA  NOPS/PF      G      PA  NOPS/PF
1960    Mickey Mantle      NYA   AL      28     644     138     153     641     138
1966    Bill White         PHI   NL      32     659     116     159     657     116
1969    Ron Hunt           SFN   NL      28     569     103     128     569     103
1976    Carl Yastrzemski   BOS   AL      36     636     110     155     636     110
1978    Oscar Gamble       SDN   NL      28     437     114     126     433     114
1985    Dwayne Murphy      OAK   AL      30     619     105     152     617     105
1987    Randy Bush         MIN   AL      28     349      99     122     352      99
1991    Barry Bonds        PIT   NL      26     634     135     153     637     135
1991    Kent Hrbek         MIN   AL      31     534     112     132     534     112
1992    Matt Nokes         NYA   AL      28     430     102     121     427     102
2001    Jeromy Burnitz     MIL   NL      32     651     111     154     654     111
2001    John Vander Wal    PIT   NL      35     360     110      97     360     110

There were many more projections (over 600) that matched the actual NOPS/PF exactly but missed on plate appearances, and around 300 that matched the actual number of plate appearances. However, the players in the previous table aren’t the interesting ones, nor are they the focus of this column: let’s focus on the booms and busts.

A Question of Measures

Before we actually come up with the top ten booms and busts, we need to determine what we should use to rank them. At first glance it would seem the simplest thing to do would be to take the difference between the actual NOPS/PF and the projected NOPS/PF. If we do that, we’ll end up with some extreme differences. However, that won’t necessarily give us the biggest differences in terms of the expected performance relative to our projection. So instead, we’ll measure the percentage by which the projection missed, and go from there. Without further ado, here are the top ten booms and busts:

                                            Actual                Projected
Year  Name             Team   Lg    Age      PA NOPS/PF     G      PA NOPS/PF PctDiff
2003  Javy Lopez        ATL   NL     32     495     144   129     436      95   50.8
1961  Norm Cash         DET   AL     26     672     157   159     271     108   45.0
1904  Mike Grady        SLN   NL     34     363     140   101      58      97   43.6
1911  Joe Jackson       CLE   AL     21     641     153   147      55     107   42.8
1989  Lonnie Smith      ATL   NL     33     577     138   134     215      97   42.3
1995  Mark McGwire      OAK   AL     31     422     153   104     213     108   42.2
1947  Harry Walker      PHI   NL     30     569     133   130     207      94   42.1
1905  Frank Isbell      CHA   AL     29     389     131    94     454      93   40.5
1980  Mike Easler       PIT   NL     29     445     140   132      36     100   40.0
1954  Ted Williams      BOS   AL     35     526     155   117     166     111   39.1
1974  Dave Roberts      SDN   NL     23     358      73   113     407     109  -33.0
1985  George Wright     TEX   AL     26     393      66   109     519      99  -33.3
1984  Houston Jimenez   MIN   AL     26     317      65   108      49      98  -33.9
1995  Pat Listach       MIL   AL     27     369      66   101     254     102  -35.3
1979  Mario Mendoza     SEA   AL     28     401      62   148      74      97  -36.0
1987  Angel Salazar     KCA   AL     25     332      60   116     196      95  -36.8
2000  Homer Bush        TOR   AL     27     325      65    76     307     103  -37.2
1977  Doug Flynn        NYN   NL     26     300      63    90     170     100  -37.3
1909  Bill Bergen       BRO   NL     31     372      53   112     273      85  -37.4
1968  George Scott      BOS   AL     24     387      73   124     554     119  -38.5
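
The PctDiff column appears to be the miss expressed as a percentage of the projection. A minimal sketch, assuming that definition (the function name is mine, not the column's):

```python
def pct_diff(actual, projected):
    """Percentage by which the actual NOPS/PF beat (+) or trailed (-) the projection."""
    return 100.0 * (actual - projected) / projected


# Rounded values from the table above; the published figures were presumably
# computed from unrounded inputs, so small discrepancies are expected.
print(round(pct_diff(144, 95), 1))   # 51.6 (table: 50.8)
print(round(pct_diff(73, 119), 1))   # -38.7 (table: -38.5)
```

Dividing by the projection is what lets a miss on a below-average hitter rank alongside a miss on a star: the same raw gap counts for more when less was expected.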

The biggest boom of the past 103 years turns out to be Javy Lopez in 2003. The reason this ranks at the top is that Lopez was entering his age-32 season, and having been a regular for nine seasons had never eclipsed a .317 average (1999), 34 home runs (1998), 28 doubles (1997), or 106 RBI (1998). As a result, his projection came in at an NOPS/PF of 95, actually below league average, and heavily weighted on his 2002 campaign, where he hit .233/.299/.372 in almost 400 plate appearances; in 2003 he wound up at 144, a 51 percent trouncing of the projection. In real terms his projected NOPS/PF of 95 would have translated into 51 runs contributed, but his actual performance was more like 88 runs, a difference of 37.

A few notes on the rest of the booms:

  • Norm Cash, in just his second full season, put up a 157 when projected to be around 108 based on a solid 1960 rookie season. The fact that he would never come close to matching that 1961 performance makes it fitting that this age-26 season feat would be seen as an outlier even at the outset of his career.
  • At the age of 21, Shoeless Joe Jackson hit .408 after not collecting more than 75 at-bats in any of his previous three seasons. As a result, his projection of 107 was heavily regressed to the mean, despite a .387/.446/.587 line in 1910 in 20 games played for the Indians. In real terms his NOPS/PF of 153 was good for 124 runs, when his projection called for a mere seven. To a lesser extent, the same thing can be said about the “Hit Man,” Mike Easler, who in his age-29 season hit .338/.396/.583 in just over 400 plate appearances; he had never been given a real shot in six previous seasons spent with the Astros and Pirates.
  • In the only year of his career where he hit more than nine home runs, Lonnie Smith smacked 21 in 1989 and put up a .315/.415/.533 performance for the Braves. Although a fine player for the Phillies and Cardinals earlier in the decade, he became involved in the scandal of that era which diminished his performance, eventually leading to his release by the Royals in 1987. Teetering on the brink he was offered a minor league contract during spring training in 1988, and played well enough at Richmond to get some at-bats that season, and a look in 1989. His below league-average projection is largely based on those 1987 and 1988 seasons. The rest, as they say, is history, including his Comeback Player of the Year Award.
  • Mark McGwire came back from two injury-plagued seasons in 1995 at the age of 31 to beat the projection by 42 percent.
  • In tenth place, Ted Williams’ 1954 season is an example of where regression to the mean doesn’t serve us particularly well. He had the highest projection of any in the top ten (111), but that was based primarily on his 1951 season (not a great one by his standards) and then heavily regressed to the mean, as he had only 120 plate appearances combined in 1952 and 1953 due to the Korean War. Even coming into his age-35 season, Williams would have been a good bet to put up a number more like 130 or 140.
  • Other seasons of note that didn’t make the top ten include Tito Francona in 1959, who hit .363 (an NOPS/PF of 142) but was projected to be about league average (102), good enough to rank 11th overall; Barry Bonds’ 2001 campaign, projected at a healthy 139 but actually reaching a ridiculous 191 (ranked 15th); Rico Carty in 1964, where reality beat the projection 137 to 100 (17th); and Tony Clark’s 2005 season with the Diamondbacks (ranked 26th), where he put up a 132 in 393 plate appearances but was projected at 97 in 284.

On the busts side, George “Boomer” Scott burst onto the scene in 1966 as a 22-year-old rookie, playing in all 162 games and hitting 27 home runs. He bested that performance during the “impossible dream” season of 1967 (wonderfully told, I might add, by our own Jay Jaffe in It Ain’t Over) by hitting .303/.373/.465. But the year of the pitcher would prove to be more than a little rough for Boomer, as he put up a .171/.236/.237 line in 387 plate appearances, good for an NOPS/PF of just 73 against a projection of 119. That’s 38.5 percent off the mark (shown as -38.5 in the table); in real terms, a difference of more than 30 runs in production. Boomer would rebound somewhat in 1969 through 1971, but he didn’t return to his 1967 level until he was with the Brewers in 1973.

Other notes from busts:

  • One wouldn’t think that light-hitting Doug Flynn would have much room to disappoint, but disappoint he did in 1977. After he put up league-average numbers in limited playing time with Cincinnati in 1975 and 1976, the projection was for a league-average season in 1977, again in limited playing time. After just a handful of plate appearances he was dealt to the Mets in June of 1977 as a part of the Tom Seaver deal, and he hit just .191 for them en route to a .197/.223/.232 season. His actual NOPS/PF of 63 was 37 percent below the projected mark.
  • Homer Bush was a one-hit wonder with the Blue Jays in 1999 when at the age of 26 he hit .320/.353/.421. That, combined with his age and good marks in limited time in 1997 and 1998 translated to a projection of 103 for 2000. He couldn’t recapture the magic however, and wound up at 65, which is fourth from the bottom on our list.
  • Sort of like Ted Williams in reverse, Mario Mendoza’s expectations should never have been that high in 1979, but limited playing time regressed to the mean will do that to a guy.
  • Pat Listach won the Rookie of the Year Award in 1992 at the age of 24, but he struggled the next two seasons in playing time limited by injuries. In 1995, the projection was for him to jump back to league average at 102, but he struggled mightily and finished at .219/.276/.254 and an NOPS/PF of 66.
  • If you’re wondering where Super Joe Charboneau is on the list, he’s not. After batting more than 500 times in 1980, he didn’t get back to 300 plate appearances in either 1981 or 1982.
  • Although not in the bottom ten, Jimmy Wynn’s 1971 season with the Astros is similar to Boomer’s 1968. Older at 29, he was coming off three very solid seasons, including a phenomenal 1969 where he walked 148 times and hit 33 home runs. His projection for 1971 was 133, but he ended up at 89, a real difference of 49 runs, and “good” for 12th from the bottom at -33 percent.
  • Most recently, Jason Giambi’s injury-riddled 2004 season ranks 32nd, where a projected value of 135 turned into a 95, a real difference of an astounding 82 runs.

While measuring the difference from the projection in terms of percentage is certainly adequate, there is also another equally valid way to look at this. Given that we know the actual number of plate appearances and have calculated a projected number of plate appearances, we can–as hinted at above–use these facts to calculate the number of runs contributed given the NOPS/PF values for both reality and the projection. So, the following top and bottom ten are calculated on this basis:

                                            Actual                        Projected
Year  Name           Team  Lg  Age    PA  NOPS/PF   G    Runs      PA  NOPS/PF  Runs Diff
1911  Joe Jackson     CLE  AL   21   641     153   147    124      55     107     7   117
1934  Hal Trosky      CLE  AL   21   685     131   154    119      25     103     3   116
1996  Alex Rodriguez  SEA  AL   20   677     132   146    124      98      98    13   111
1987  Mark McGwire    OAK  AL   23   641     135   151    112      31     101     4   108
1929  Lefty O'Doul    PHI  NL   32   731     134   154    138     210     101    30   108
1950  Al Rosen        CLE  AL   26   668     128   155    112      30     100     4   108
1964  Dick Allen      PHI  NL   22   708     138   162    107      13     105     1   105
1929  Chuck Klein     PHI  NL   24   679     132   149    126     148     107    22   104
1925  Earle Combs     NYA  AL   26   673     115   150    106      21     102     3   103
1964  Tony Oliva      MIN  AL   25   718     131   161    103       7     103     1   102
2004  John Olerud     SEA  AL   35   312      97    78     40     651     111    95   -55
1923  Bobby Veach     DET  AL   35   339     109   114     47     693     117   104   -56
1994  John Kruk       PHI  NL   33   301     110    75     41     631     126    98   -57
1922  Babe Ruth       NYA  AL   27   495     148   110     93     645     184   151   -58
1995  Ken Griffey     SEA  AL   25   314     110    72     46     573     137   104   -58
2000  Vinny Castilla  TBA  AL   32   354      71    85     35     680     100    94   -59
2000  Mark McGwire    SLN  NL   36   321     159    89     67     666     149   131   -64
1950  Ted Williams    BOS  AL   31   416     138    89     75     696     153   139   -64
1925  Babe Ruth       NYA  AL   30   426     124    98     72     657     162   145   -73
2004  Jason Giambi    NYA  AL   33   322      95    80     40     686     135   122   -82

The top of this table is dominated by players who had excellent rookie seasons after garnering scattered playing time in one or more previous seasons. The bottom of the list primarily includes established stars who were the victims of injuries, such as Williams fracturing his elbow in the 1950 All-Star Game, or the Bambino’s famous intestinal problems in 1925, or the Babe’s other more self-imposed problems in drawing three suspensions in 1922. As you can see, in the most extreme cases a team might lose 60 to 80 runs, which translates to roughly six to eight wins, when a superstar doesn’t perform as expected. On the other end of the spectrum, players who contributed much more than expected were typically counted on to produce prior to the season, and so these gains are not wholly unexpected.
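
The arithmetic behind the Diff column and the runs-to-wins conversion can be sketched as follows. The ten-runs-per-win figure is the usual rule of thumb implied by the paragraph above rather than a value taken from the table, and the function names are hypothetical. How NOPS/PF and plate appearances are turned into runs is not spelled out in this column, so the sketch starts from the run values already shown.

```python
RUNS_PER_WIN = 10   # assumption: the usual rule of thumb implied by the text


def run_diff(actual_runs, projected_runs):
    """Runs contributed above (+) or below (-) the projection."""
    return actual_runs - projected_runs


def wins(runs):
    """Convert a run difference to an approximate win difference."""
    return runs / RUNS_PER_WIN


# Rounded run values from the table above.
giambi_2004 = run_diff(40, 122)
jackson_1911 = run_diff(124, 7)
print(giambi_2004, wins(giambi_2004))     # -82 -8.2
print(jackson_1911, wins(jackson_1911))   # 117 11.7
```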

Moving Forward

One of the great things about baseball is its variation from season to season within a larger stable structure. Part of that variation is our fascination with projections and what they ultimately translate to in terms of wins and losses for our favorite teams. On that note, I’ll leave you with one other quote from Blaise Pascal to ponder through this season of projections:

The reason people find it so hard to be happy is that they always see the past better than it was, the present worse than it is, and the future less resolved than it will be.