“The present is never our goal: the past and present are our means: the future alone is our goal. Thus, we never live but we hope to live; and always hoping to be happy, it is inevitable that we will never be so.”
Well, it’s that time of year again. The season that was is past, and plans are being laid for the season to come. From a performance analyst’s perspective this can only mean that a part of our attention turns to player forecasts, as Joe Sheehan’s recap of “ShandlerFest” reminded me. I also received The Bill James Handbook this week, which has a “Hitter Projections” chapter written by James himself. In that chapter, we find a list of projections that went very well for Baseball Info Solutions (Brad Ausmus, Rob Mackowiak, Jeff Conine, Pedro Feliz, Carlos Beltran, and Adrian Beltre) as well as those that didn’t (Carlos Quentin, Carlos Pena, Prince Fielder, B.J. Upton, Marcus Giles, and Richie Sexson). In thinking about cases like Quentin (a bust for the Diamondbacks) and Pena (a boom for the Devil Rays) I started to wonder about the biggest booms and busts of all time in terms of a reasonable projection of performance. Which teams were most let down by a player’s performance, and which were most surprised, assuming of course that they had reasonable expectations in mind? (I grant you, that’s a huge and demonstrably false assumption that bursts the entire bubble, but play along).
Projections on the Cheap
We’ll start by examining almost 17,000 batter seasons from 1903 through 2006 in an effort to sort out the booms and busts. The first task here is to create some “projections.” I say “projections” because the projections we’re creating are for seasons that have already occurred, so there really is no future aspect to this. We can estimate a performance level for a given season and league based on a player’s performance prior to the season in question. Since we’re not trying to replicate something as complex as PECOTA or even the BIS projections, we’ll base ours on normalized park-adjusted OPS (NOPS/PF). I’ve used this measure in the past to boil offensive production down into a single number on a rate basis, where 100 is a perfectly league average OPS when adjusted for the home park of the player. Although very simple to calculate, OPS is a great proxy for run production and matches up very well with more complex run estimators, including BaseRuns and Runs Created.
So the methodology for creating the projections goes like this:
- Select all players with 300 or more plate appearances for a major league team from 1903 through 2006, excluding the Federal League.
- Calculate the NOPS/PF, plate appearances, and age for the set of players from step one. This is the actual performance against which we’ll measure our projection.
- Weight the previous three years’ NOPS/PF using a system where the previous year is weighted at seven, year minus two is weighted at four, and year minus three gets a two.
- Take the value from step three and regress it to the mean by calculating a weighted average of the plate appearances for the previous three years and combining it with enough league-average plate appearances to total 2,000. In other words, if a player had accumulated 2,000 weighted plate appearances in the previous three years, his calculated NOPS/PF from step three would not be regressed at all. However, if he had, for example, a weighted average number of plate appearances of 312 (as Tadahito Iguchi did coming into the 2006 season) in his previous three seasons, his NOPS/PF would be regressed to the mean with 1,064 additional league average plate appearances, since (312*3)+1064 = 2000.
- Given the NOPS/PF calculated in step four, apply an aging adjustment. This curve is similar to the kinds of aging curves that Nate Silver has talked about in the past, although I adjusted it somewhat to be less severe, since I’m pre-selecting players with 300 or more plate appearances, and in this population declines are not so precipitous.
- Finally, apply a league adjustment using the Level Indexes that I discussed in a previous column. This allows us to account both for players who switch leagues and for the natural decline that happens as the league improves over time. In the former adjustment, for example, both Frank Howard in 1965 and Frank Robinson in 1966 have their projections increased slightly by moving from the National to the American League.
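Steps three and four can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the column: the function names are invented here, and the handling of seasons in which a player did not appear (skipped for the rate, counted as zero plate appearances for playing time) is an assumption. The aging and league adjustments from steps five and six are left out because their values are not given above.

```python
WEIGHTS = (7, 4, 2)   # year-1, year-2, year-3, as in step three
TARGET_PA = 2000      # plate appearances at which no regression is applied
LEAGUE_AVG = 100.0    # NOPS/PF of a league-average hitter


def weighted_nops(nops_by_year):
    """Weighted NOPS/PF over the previous three seasons, most recent first.

    Use None for a season with no playing time; such seasons are skipped
    (an assumption, since the column does not say how they are handled).
    """
    pairs = [(w, n) for w, n in zip(WEIGHTS, nops_by_year) if n is not None]
    if not pairs:
        return LEAGUE_AVG
    return sum(w * n for w, n in pairs) / sum(w for w, _ in pairs)


def weighted_pa(pa_by_year):
    """Weighted average plate appearances, most recent first (0 if none)."""
    return sum(w * pa for w, pa in zip(WEIGHTS, pa_by_year)) / sum(WEIGHTS)


def regress_to_mean(nops, avg_pa):
    """Blend a player's rate with league-average plate appearances (step four)."""
    player_pa = min(3 * avg_pa, TARGET_PA)   # e.g. 312 * 3 = 936
    league_pa = TARGET_PA - player_pa        # e.g. 2000 - 936 = 1064
    return (nops * player_pa + LEAGUE_AVG * league_pa) / TARGET_PA


# A hypothetical hitter with a weighted NOPS/PF of 120 and a weighted average
# of 312 plate appearances is blended with 1,064 league-average ones.
print(regress_to_mean(120, 312))
```

With these inputs the hypothetical hitter lands a little above 109, roughly halfway back toward league average, which is the pull that regression to the mean is meant to supply for players with thin track records.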
We end up with 16,900 projected NOPS/PF values for the actual NOPS/PF values, with a correlation coefficient of r=0.64. When compared with how other projection systems perform, that doesn’t seem bad at all; in fairness, our correlation should be pretty good since we’re using a backwards-looking approach and have selected only players who accumulated 300 or more plate appearances in the season we’re trying to project. In short, we have a selection bias in play. Players who suffered injuries that wiped out their seasons, or who retired, or who simply performed so poorly that they didn’t even garner the minimum playing time for what could be considered a regular player are all excluded from the projections. In 2007 terms, think Nick Johnson.
On the other hand, we haven’t set minimums on the number of plate appearances for previous seasons; even if a player has one plate appearance in the previous three seasons, we’ll create a projection, as we did for Hanley Ramirez in 2006 (who not surprisingly came in with a projection of 100). When we remove all players who had fewer than 100 projected plate appearances, the correlation coefficient goes up to 0.66.
To get a feel for the actual shape of the distribution, the following chart shows the histogram of the differences between the actual and projected NOPS/PF values, along with the cumulative frequency (the orange line).
From this you can see that there are almost an equal number of projections that fall on either side of zero, with the average difference being -.25 points of OPS. Altogether, 6,630–or 39 percent–of the projections fall within five points.
As an aside, one of the paragraphs that caught my eye in The Handbook chapter on projections was this one relating to the nature of projections:
We project, basically, that every player will continue to do in the future whatever he has done in the past. If a player has hit .250 in the past, we project that he will continue to hit .250. If he has hit .350 in the past, we project that he will continue to hit .350. If he hit .350 in 2006 and .250 in 2007, we project that he will hit .300 in 2008. We’re pretty close to right most of the time, because most players in any season will continue to do about what they have done in the past.
I wholeheartedly agree with the last sentence; after all, past performance is what we use to project the future. But still, I may be reading the previous sentences too literally. I found that when doing these projections it was essential to employ regression to the mean and specifically not assume that a .350 hitter one year (or a player with an NOPS/PF of say 135) was necessarily going to be a .350 hitter the next. As has been shown time and time again, it turns out that when a player hits .350–which is near the far right end of the distribution for a major leaguer–it is much more likely that he’ll hit something less than that the next season, simply because staying out on that tail is extremely difficult and more likely achieved with a little push from lady luck. In any case, of steps four through six discussed above, the regression to the mean in step four had the greatest impact on improving the correlation coefficient, moving it from 0.53 to very nearly its final value.
As you might expect, there were a few players that the system nailed both in terms of projected NOPS/PF and in plate appearances, some of whom are shown in the following table:
                                      Actual               Projected
Year  Name              Team Lg  Age    PA  NOPS/PF     G    PA  NOPS/PF
1960  Mickey Mantle     NYA  AL   28   644      138   153   641      138
1966  Bill White        PHI  NL   32   659      116   159   657      116
1969  Ron Hunt          SFN  NL   28   569      103   128   569      103
1976  Carl Yastrzemski  BOS  AL   36   636      110   155   636      110
1978  Oscar Gamble      SDN  NL   28   437      114   126   433      114
1985  Dwayne Murphy     OAK  AL   30   619      105   152   617      105
1987  Randy Bush        MIN  AL   28   349       99   122   352       99
1991  Barry Bonds       PIT  NL   26   634      135   153   637      135
1991  Kent Hrbek        MIN  AL   31   534      112   132   534      112
1992  Matt Nokes        NYA  AL   28   430      102   121   427      102
2001  Jeromy Burnitz    MIL  NL   32   651      111   154   654      111
2001  John Vander Wal   PIT  NL   35   360      110    97   360      110
There were many more (over 600) that projected the NOPS/PF exactly but not the exact number of plate appearances, and around 300 that had the correct number of plate appearances. However, the players in the previous table aren’t the interesting ones, nor are they the focus of this column: let’s focus on the booms and busts.
A Question of Measures
Before we actually come up with the top ten booms and busts, we need to determine what we should use to rank them. At first glance it would seem the simplest thing to do would be to take the difference between the actual NOPS/PF and the projected NOPS/PF. If we do that, we’ll end up with some extreme differences. However, that won’t necessarily give us the biggest differences in terms of the expected performance relative to our projection. So instead, we’ll measure the percentage by which the projection missed, and go from there. Without further ado, here are the top ten booms and busts:
                                      Actual               Projected
Year  Name              Team Lg  Age    PA  NOPS/PF     G    PA  NOPS/PF  PctDiff
2003  Javy Lopez        ATL  NL   32   495      144   129   436       95     50.8
1961  Norm Cash         DET  AL   26   672      157   159   271      108     45.0
1904  Mike Grady        SLN  NL   34   363      140   101    58       97     43.6
1911  Joe Jackson       CLE  AL   21   641      153   147    55      107     42.8
1989  Lonnie Smith      ATL  NL   33   577      138   134   215       97     42.3
1995  Mark McGwire      OAK  AL   31   422      153   104   213      108     42.2
1947  Harry Walker      PHI  NL   30   569      133   130   207       94     42.1
1905  Frank Isbell      CHA  AL   29   389      131    94   454       93     40.5
1980  Mike Easler       PIT  NL   29   445      140   132    36      100     40.0
1954  Ted Williams      BOS  AL   35   526      155   117   166      111     39.1
---------------------------------------------------------------------------------
1974  Dave Roberts      SDN  NL   23   358       73   113   407      109    -33.0
1985  George Wright     TEX  AL   26   393       66   109   519       99    -33.3
1984  Houston Jimenez   MIN  AL   26   317       65   108    49       98    -33.9
1995  Pat Listach       MIL  AL   27   369       66   101   254      102    -35.3
1979  Mario Mendoza     SEA  AL   28   401       62   148    74       97    -36.0
1987  Angel Salazar     KCA  AL   25   332       60   116   196       95    -36.8
2000  Homer Bush        TOR  AL   27   325       65    76   307      103    -37.2
1977  Doug Flynn        NYN  NL   26   300       63    90   170      100    -37.3
1909  Bill Bergen       BRO  NL   31   372       53   112   273       85    -37.4
1968  George Scott      BOS  AL   24   387       73   124   554      119    -38.5
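The ranking measure in the table above is the miss expressed as a percentage of the projection. A rough Python sketch follows, using three rows from the table; the function name is made up for illustration. Because the inputs here are the rounded NOPS/PF values as printed, the results differ by a few tenths from the PctDiff column, which was presumably computed from unrounded figures.

```python
def pct_diff(actual, projected):
    """Percentage by which actual NOPS/PF beat (+) or missed (-) the projection."""
    return (actual - projected) / projected * 100.0


# (label, actual NOPS/PF, projected NOPS/PF), rounded values from the table
seasons = [
    ("George Scott 1968", 73, 119),
    ("Javy Lopez 2003", 144, 95),
    ("Norm Cash 1961", 157, 108),
]

# Sort from biggest boom to biggest bust.
ranked = sorted(seasons, key=lambda s: pct_diff(s[1], s[2]), reverse=True)
for name, actual, projected in ranked:
    print(f"{name}: {pct_diff(actual, projected):+.1f}")
```

Dividing by the projection rather than ranking on the raw point difference means a 40-point miss counts for more against a projection of 95 than against one of 140.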
The biggest boom of the past 103 years turns out to be Javy Lopez in 2003. The reason this ranks at the top is that Lopez was entering his age-32 season, and having been a regular for nine seasons had never eclipsed a .317 average (1999), 34 home runs (1998), 28 doubles (1997), or 106 RBI (1998). As a result, his projection came in at an NOPS/PF of 95, actually below league average, and heavily weighted on his 2002 campaign, where he hit .233/.299/.372 in almost 400 plate appearances; in 2003 he wound up at 144, a 51 percent trouncing of the projection. In real terms his projected NOPS/PF of 95 would have translated into 51 runs contributed, but his actual performance was more like 88 runs, a difference of 37.
A few notes on the rest of the booms:
- Norm Cash, in just his second full season, put up a 157 when projected to be around 108 based on a solid 1960 rookie season. The fact that he would never come close to matching that 1961 performance makes it fitting that this age-26 season feat would be seen as an outlier even at the outset of his career.
- At the age of 21, Shoeless Joe Jackson hit .408 after not collecting more than 75 at-bats in any of his previous three seasons. As a result, his projection of 107 was heavily regressed to the mean, despite a .387/.446/.587 line in 1910 in 20 games played for the Indians. In real terms his NOPS/PF of 153 was good for 124 runs, when his projection called for a mere seven. To a lesser extent, the same thing can be said about the “Hit Man,” Mike Easler, who in his age-29 season hit .338/.396/.583 in just over 400 plate appearances; he had never been given a real shot in six previous seasons spent with the Astros and Pirates.
- In the only year of his career where he hit more than nine home runs, Lonnie Smith smacked 21 in 1989 and put up a .315/.415/.533 performance for the Braves. Although a fine player for the Phillies and Cardinals earlier in the decade, he became involved in the drug scandal of that era, which diminished his performance and eventually led to his release by the Royals in 1987. Teetering on the brink, he was offered a minor league contract during spring training in 1988, and played well enough at Richmond to get some at-bats that season and a look in 1989. His below league-average projection is largely based on those 1987 and 1988 seasons. The rest, as they say, is history, including his Comeback Player of the Year Award.
- Mark McGwire came back from two injury-plagued seasons in 1995 at the age of 31 to beat the projection by 42 percent.
- In tenth place, Ted Williams’ 1954 season is an example of where regression to the mean doesn’t serve us particularly well. He had the highest projection of any in the top ten (111), but that was based primarily on his 1951 season (not a great one by his standards) and then heavily regressed to the mean, as he had only 120 plate appearances combined in 1952 and 1953 due to the Korean War. Even coming into his age-35 season, Williams would have been a good bet to put up a number more like 130 or 140.
- Other seasons of note that didn’t make the top ten include Tito Francona in 1959, who hit .363 (an NOPS/PF of 142) but was projected to be league average (102), good enough to rank 11th overall; Barry Bonds’ 2001 campaign, projected at a healthy 139 but actually reaching a ridiculous 191 (ranked 15th); Rico Carty in 1964, where reality beat the projection 137 to 100 (17th); and Tony Clark’s 2005 season with the Diamondbacks (ranked 26th), where he put up a 132 in 393 plate appearances but was projected at 97 in 284.
On the busts side, George “Boomer” Scott burst onto the scene in 1966 as a 22-year-old rookie, playing in all 162 games and hitting 27 home runs. He bested that performance during the “impossible dream” season of 1967 (wonderfully told, I might add, by our own Jay Jaffe in It Ain’t Over) by hitting .303/.373/.465. But the year of the pitcher would prove to be more than a little rough for Boomer, as he put up a .171/.236/.237 line in 387 plate appearances, good for an NOPS/PF of just 73 against a projection of 119. That’s 38.5 percent off the mark (or -38.5, as represented); in real terms, a difference of more than 30 runs in production. Boomer would rebound somewhat in 1969 through 1971, but he didn’t return to his 1967 level until he was with the Brewers in 1973.
Other notes from busts:
- One wouldn’t think that light-hitting Doug Flynn would have much room to disappoint, but that he did in 1977. After putting up league-average numbers in limited playing time with Cincinnati in 1975 and 1976, the projection was for a league-average season in 1977, in limited playing time. After just a handful of plate appearances he was dealt in June of 1977 to the Mets as a part of the Tom Seaver deal, and he hit just .191 for them en route to a .197/.223/.232 season. His actual NOPS/PF of 63 was 37 percent below the projected mark.
- Homer Bush was a one-hit wonder with the Blue Jays in 1999 when at the age of 26 he hit .320/.353/.421. That, combined with his age and good marks in limited time in 1997 and 1998, translated to a projection of 103 for 2000. He couldn’t recapture the magic, however, and wound up at 65, which is fourth from the bottom on our list.
- Sort of like Ted Williams in reverse, Mario Mendoza’s expectations should never have been that high in 1979, but limited playing time regressed to the mean will do that to a guy.
- Pat Listach won the Rookie of the Year Award in 1992 at the age of 24, but he struggled the next two seasons in playing time limited by injuries. In 1995, the projection was to jump back to league average at 102, but he struggled mightily and finished at .219/.276/.254 and an NOPS/PF of 66.
- If you’re wondering where Super Joe Charboneau is on the list, he’s not. After batting more than 500 times in 1980, he didn’t get back to 300 plate appearances in either 1981 or 1982.
- Although not in the bottom ten, Jimmy Wynn’s 1971 season with the Astros is similar to Boomer’s 1968. Although older at 29, he was coming off three very solid seasons, including a phenomenal 1969 where he walked 148 times and hit 33 home runs. His projection for 1971 was 133, but he ended up at 89, a real difference of 49 runs, and “good” for 12th from the bottom at -33 percent.
- Most recently, Jason Giambi’s injury-riddled 2004 season ranks 32nd, where a projected value of 135 turned into a 95, a real difference of an astounding 82 runs.
While measuring the difference from the projection in terms of percentage is certainly adequate, there is also another equally valid way to look at this. Given that we know the actual number of plate appearances and have calculated a projected number of plate appearances, we can–as hinted at above–use these facts to calculate the number of runs contributed given the NOPS/PF values for both reality and the projection. So, the following top and bottom ten are calculated on this basis:
                                      Actual                     Projected
Year  Name              Team Lg  Age    PA  NOPS/PF     G  Runs    PA  NOPS/PF  Runs  Diff
1911  Joe Jackson       CLE  AL   21   641      153   147   124    55      107     7   117
1934  Hal Trosky        CLE  AL   21   685      131   154   119    25      103     3   116
1996  Alex Rodriguez    SEA  AL   20   677      132   146   124    98       98    13   111
1987  Mark McGwire      OAK  AL   23   641      135   151   112    31      101     4   108
1929  Lefty O'Doul      PHI  NL   32   731      134   154   138   210      101    30   108
1950  Al Rosen          CLE  AL   26   668      128   155   112    30      100     4   108
1964  Dick Allen        PHI  NL   22   708      138   162   107    13      105     1   105
1929  Chuck Klein       PHI  NL   24   679      132   149   126   148      107    22   104
1925  Earle Combs       NYA  AL   26   673      115   150   106    21      102     3   103
1964  Tony Oliva        MIN  AL   25   718      131   161   103     7      103     1   102
------------------------------------------------------------------------------------------
2004  John Olerud       SEA  AL   35   312       97    78    40   651      111    95   -55
1923  Bobby Veach       DET  AL   35   339      109   114    47   693      117   104   -56
1994  John Kruk         PHI  NL   33   301      110    75    41   631      126    98   -57
1922  Babe Ruth         NYA  AL   27   495      148   110    93   645      184   151   -58
1995  Ken Griffey       SEA  AL   25   314      110    72    46   573      137   104   -58
2000  Vinny Castilla    TBA  AL   32   354       71    85    35   680      100    94   -59
2000  Mark McGwire      SLN  NL   36   321      159    89    67   666      149   131   -64
1950  Ted Williams      BOS  AL   31   416      138    89    75   696      153   139   -64
1925  Babe Ruth         NYA  AL   30   426      124    98    72   657      162   145   -73
2004  Jason Giambi      NYA  AL   33   322       95    80    40   686      135   122   -82
The top of this table is dominated by players who had excellent rookie seasons after garnering scattered playing time in one or more previous seasons. The bottom of the list primarily includes established stars who were the victims of injuries, such as Williams fracturing his elbow in the 1950 All-Star Game, or the Bambino’s famous intestinal problems in 1925, or the Babe’s other more self-imposed problems in drawing three suspensions in 1922. As you can see, at the maximum a team might lose 60 to 70 runs, which can be translated to six or seven wins when a superstar doesn’t perform as expected. On the other end of the spectrum, players who contributed much more than expected were typically counted on to produce prior to the season and so these gains are not wholly unexpected.
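The arithmetic behind the runs-to-wins remark can be sketched as follows. The ten-runs-per-win rate is only what the “60 to 70 runs … six or seven wins” statement implies, and the function names are hypothetical. The runs figures are taken as given from the table, since the formula that turns NOPS/PF and plate appearances into runs is not spelled out here.

```python
RUNS_PER_WIN = 10.0  # assumption: rate implied by "60 to 70 runs ... six or seven wins"


def run_gap(actual_runs, projected_runs):
    """Runs contributed above (+) or below (-) the projection."""
    return actual_runs - projected_runs


def runs_to_wins(runs):
    """Convert a run difference to an approximate win difference."""
    return runs / RUNS_PER_WIN


# Ted Williams, 1950: 75 actual runs against 139 projected (from the table).
gap = run_gap(75, 139)
print(gap, runs_to_wins(gap))
```

On these figures Williams’s fractured elbow cost the Red Sox 64 runs against expectations, or a little over six wins.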
One of the great things about baseball is its variation from season to season within a larger stable structure. Part of that variation is our fascination with projections and what they ultimately translate to in terms of wins and losses for our favorite teams. On that note, I’ll leave you with one other quote from Blaise Pascal to ponder through this season of projections:
The reason people find it so hard to be happy is that they always see the past better than it was, the present worse than it is, and the future less resolved than it will be.