February 21, 2003
PECOTA At Altitude: A Review of Major League Hitters in Colorado
Up until now, the Coors Field Wars have been fought from the top down. There have been plenty of theories advanced about what sort of hitter should do well at Coors. Joe Sheehan presented one theory (players who put the ball in play make best use of Coors), Rany Jazayerli presented another (high altitude provides a comparative advantage to whiff-prone hitters by reducing strikeouts), and Dan O'Dowd has tested out both theories and then some in his manic building and rebuilding of the Rockies.
What hasn't been done, at least so far as I am aware, is a systematic study of what sort of hitters actually have benefited from high altitude. Baseball in Denver is no longer a novelty; the Rockies have accumulated tens of thousands of plate appearances in their decade of existence. There is enough evidence to perform at least an exploratory empirical analysis of what types of skills are best accentuated by the ballpark.
Including the Mile High years, there have been 29 hitters with significant major league experience in another organization who accumulated at least 130 plate appearances in a season in purple pinstripes. Although it would be a stretch to call any of those hitters an established superstar prior to his initiation as a Rockie - Larry Walker can make the best case - they represent every possible permutation of strength and deficiency. It would be hard to identify two more opposite players than Dante Bichette and Alex Cole, who took the outfield together in the Rockies' first ever home game on April 9, 1993.
I turned back the clock and ran PECOTA projections for each of these 29 players. There are only a couple of differences between this set of forecasts and those that appear in this year's book. First, because we do not have Davenport Translations that far back in time, only major league stats were used; thus the emphasis on established major leaguers. Second, all players were projected into a neutral park and league. The PECOTA system makes certain assumptions about how to apply park effects - not all players are treated equally. In this case, however, we're using our forecasting system to test out certain theories about actual performance, and not the other way around; introducing PECOTA's notions about park effects would bias the analysis.
We can get away with comparing park-neutral forecasts to park-affected results by using a measure for value that places all players back on an equal footing - in this case, Equivalent Average. Our nouveau Rockies are listed in the table below, sorted by the difference between their expected and actual EQA.
Table 1: Projected versus actual performance of hitters in first season in Denver

                                      EQA
Year  Player                Forecast  Actual  Delta
1993  Andres Galarraga        .247     .321   +.074
2002  Jay Payton              .239     .302   +.063
1993  Charlie Hayes           .239     .287   +.048
1994  Ellis Burks             .280     .323   +.043
1993  Joe Girardi             .216     .257   +.041
1998  Darryl Hamilton         .250     .289   +.039
1996  Jeff Reed               .234     .261   +.027
2000  Todd Hollandsworth      .257     .283   +.026
2000  Tom Goodwin             .235     .260   +.025
1993  Dante Bichette          .259     .284   +.025
1998  Greg Colbrunn           .242     .258   +.016
2001  Greg Norton             .250     .263   +.013
1993  Jerald Clark            .244     .255   +.011
2000  Jeff Hammonds           .280     .287   +.007
2002  Todd Zeile              .256     .260   +.004
2000  Brian L. Hunter         .230     .227   -.003
2000  Brent Mayne             .261     .258   -.003
1994  Walt Weiss              .227     .222   -.005
2000  Jeff Cirillo            .282     .276   -.006
1995  Larry Walker            .319     .308   -.011
1998  Curtis Goodwin          .230     .217   -.013
1993  Alex Cole               .250     .235   -.015
1997  Kirt Manwaring          .210     .184   -.026
1999  Lenny Harris            .245     .218   -.027
1998  Mike Lansing            .266     .239   -.027
1993  Daryl Boston            .285     .256   -.029
1994  Howard Johnson          .285     .247   -.038
2000  Darren Bragg            .252     .206   -.046
2001  Alex Ochoa              .282     .227   -.055
At first glance, it's tough to know what to make out of that list; Andres Galarraga, of course, did well in Denver, but so did Darryl Hamilton. Fortunately, one of PECOTA's strengths is its ability to break down a hitter's value into its component parts. Based on each hitter's previous performance, the program generates baseline values for five primary production metrics: batting average, isolated power, unintentional walk rate, strikeout rate, and speed score.
The following table presents the correlations between each of these attributes and the difference between a player's projection and his actual performance ("EQA Delta"). Keep in mind that the baseline attributes are determined based on a player's performance prior to the season that we've forecasted for him in Coors.
Table 2: Correlation between hitting attributes and EQA Delta

Batting Average   -.17
Isolated Power    -.05
Walk Rate         -.44
Strikeout Rate    +.29
Speed Score       -.23
PECOTA is designed to be neutral with respect to any particular hitting attribute. The positive effect of strong isolated power, the negative effect of a low walk rate, and so on, are accounted for appropriately in the forecast; given a sufficiently large sample, the correlation between any particular hitting attribute and the forecast error will approach zero.
Instead, several of the correlations for the sample of Rockies hitters appear to be significant. Especially affected is the behavior of the two categories representing plate discipline. Relative to their performance at sea level, strikeout-prone hitters benefited from the high altitude, while players with a high walk rate were disadvantaged. (The table also contains a noteworthy non-result; at least based on our limited sample, players with high isolated power did not appear to enjoy any particular advantage from playing in Denver.)
The relationship between walk rate, strikeout rate, and performance in Denver deserves further exploration. In the chart below, I have plotted each player's baseline walk and strikeout rates against his EQA Delta. Players who exceeded their forecast are indicated in green, and players who fell below it in red. Differences of at least 25 points of EQA above or below their forecast are plotted with larger data points. The chart is arranged such that league average walk and strikeout rates - about 7.6% and 16.5%, respectively - are represented by the middle of each axis.
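The marker scheme described above (color by direction, size by a 25-point threshold) can be sketched directly from the Table 1 figures. This is only an illustration, not the code used to draw the chart: the names `TABLE_1`, `BIG_MISS`, and `classify` are made up for the example, and EQA is stored as integer thousandths so that a gap such as .284 versus .259 compares as exactly 25 points rather than falling foul of floating-point rounding.

```python
# (year, player, forecast EQA, actual EQA) copied from Table 1.
# Values are integer thousandths (247 means .247) to keep comparisons exact.
TABLE_1 = [
    (1993, "Andres Galarraga", 247, 321),
    (2002, "Jay Payton", 239, 302),
    (1993, "Charlie Hayes", 239, 287),
    (1994, "Ellis Burks", 280, 323),
    (1993, "Joe Girardi", 216, 257),
    (1998, "Darryl Hamilton", 250, 289),
    (1996, "Jeff Reed", 234, 261),
    (2000, "Todd Hollandsworth", 257, 283),
    (2000, "Tom Goodwin", 235, 260),
    (1993, "Dante Bichette", 259, 284),
    (1998, "Greg Colbrunn", 242, 258),
    (2001, "Greg Norton", 250, 263),
    (1993, "Jerald Clark", 244, 255),
    (2000, "Jeff Hammonds", 280, 287),
    (2002, "Todd Zeile", 256, 260),
    (2000, "Brian L. Hunter", 230, 227),
    (2000, "Brent Mayne", 261, 258),
    (1994, "Walt Weiss", 227, 222),
    (2000, "Jeff Cirillo", 282, 276),
    (1995, "Larry Walker", 319, 308),
    (1998, "Curtis Goodwin", 230, 217),
    (1993, "Alex Cole", 250, 235),
    (1997, "Kirt Manwaring", 210, 184),
    (1999, "Lenny Harris", 245, 218),
    (1998, "Mike Lansing", 266, 239),
    (1993, "Daryl Boston", 285, 256),
    (1994, "Howard Johnson", 285, 247),
    (2000, "Darren Bragg", 252, 206),
    (2001, "Alex Ochoa", 282, 227),
]

BIG_MISS = 25  # points of EQA; threshold for the larger data points


def classify(forecast, actual):
    """Return (direction, size) describing one player's chart marker."""
    delta = actual - forecast
    if delta > 0:
        direction = "above"   # exceeded forecast (green in the chart)
    elif delta < 0:
        direction = "below"   # fell short of forecast (red in the chart)
    else:
        direction = "even"    # no player in Table 1 lands here
    size = "large" if abs(delta) >= BIG_MISS else "small"
    return direction, size


markers = {player: classify(f, a) for _, player, f, a in TABLE_1}

for player, (direction, size) in markers.items():
    print(f"{player:<20} {direction:<6} {size}")
```

With this representation, `markers["Dante Bichette"]` evaluates to `("above", "large")`, since his delta is exactly 25 points.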
Five hitters in the sample had walk rates significantly below the league average, and strikeout rates significantly above it. All of them exceeded their PECOTA forecast, and four of them - Bichette, Galarraga, Charlie Hayes, and Todd Hollandsworth - exceeded it by at least 25 points of EQA. Certainly, your mileage may vary. With an interspersion of red triangles and green diamonds that is neither wholly ordered nor wholly random, the chart looks a little bit like a thirty dollar rug from IKEA (PEKÖTA?). There are players like Hamilton who buck the trend. But for the most part, the favorable outcomes are concentrated in the northwest part of the graph (high strikeout/low walk), while the unfavorable outcomes are concentrated in the opposite corner (low strikeout/high walk).
Given the limited sample size, it is worth testing to see whether the pattern is statistically significant. This can be done by placing baseline walk rate, strikeout rate, and EQA delta into a regression equation. Contrary to my better judgment, I am going to reproduce the raw output from my statistics package here; those with an aversion to such things are advised to avert their eyes, flick on whichever Michael Jackson special is running at the moment, and rejoin us further down the page.
Table 3: Dirty Laundry

. regress eqadelta bb k [aw=pa], noc
(sum of wgt is   1.0488e+04)

      Source |       SS       df       MS              Number of obs =      29
-------------+------------------------------           F(  2,    27) =    5.54
       Model |  .007804868     2  .003902434           Prob > F      =  0.0096
    Residual |  .019022657    27  .000704543           R-squared     =  0.2909
-------------+------------------------------           Adj R-squared =  0.2384
       Total |  .026827525    29  .000925087           Root MSE      =  .02654

------------------------------------------------------------------------------
    eqadelta |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          bb |  -.4144639   .1442057    -2.87   0.008    -.7103497   -.1185782
           k |   .2231925   .0672948     3.32   0.003     .0851149    .3612701
------------------------------------------------------------------------------
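For readers who stayed, the summary statistics in Table 3 can be cross-checked against one another using nothing but the printed sums of squares, coefficients, and standard errors. The sketch below does not re-run the regression (the underlying player-level inputs are not reproduced in this article); it only confirms that the reported figures hang together. The adjusted R-squared line uses the n / (n - k) scaling, which is an assumption on my part about the no-constant case, chosen because it matches the printed value.

```python
import math

# Values copied from the Stata output in Table 3.
n_obs = 29
n_params = 2                      # bb and k; no constant term (the noc option)
ss_model = 0.007804868
ss_resid = 0.019022657
coef = {"bb": -0.4144639, "k": 0.2231925}
std_err = {"bb": 0.1442057, "k": 0.0672948}

df_resid = n_obs - n_params       # 27 residual degrees of freedom
ss_total = ss_model + ss_resid    # uncentered total, since there is no constant

ms_model = ss_model / n_params
ms_resid = ss_resid / df_resid

r_squared = ss_model / ss_total
# Assumed form for a no-constant model; it reproduces the printed 0.2384.
adj_r_squared = 1 - (1 - r_squared) * n_obs / df_resid
f_stat = ms_model / ms_resid
root_mse = math.sqrt(ms_resid)
t_stats = {name: coef[name] / std_err[name] for name in coef}

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"F(2, 27)      = {f_stat:.2f}")
print(f"Root MSE      = {root_mse:.5f}")
for name, t in t_stats.items():
    print(f"t({name:>2})         = {t:.2f}")
```

Each derived figure rounds to the value printed in the table, which is a useful check that the output survived transcription intact.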
Both walk rate and strikeout rate have a significant effect on performance at Coors at the 99% confidence level. For each one percentage point increase in his strikeout rate, a Rockie hitter receives a "bonus" of about two points of EQA above and beyond his performance in a neutral park; for each one percentage point increase in his walk rate, he receives a penalty of about four points of EQA.
What does that mean for players that will be new to Coors this year? All three of the Rockies' major position player acquisitions - Preston Wilson, Jose Hernandez, and Charles Johnson - are in a position to benefit. Below, I have presented each player's baseline walk rate and strikeout rate, the bonus he can expect to receive from playing in Coors, and a revised PECOTA forecast.
Table 4: New Rockies

            Baseline  Baseline              EQA Projection
            BB Rate   K Rate    EQA Bonus   Neutral  Coors
Wilson        7.7%     23.1%      +.020      .279    .299
Hernandez     6.6%     29.3%      +.038      .240    .278
Johnson       8.3%     22.8%      +.016      .261    .277
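The bonus column in Table 4 follows from the two coefficients in Table 3. As a minimal sketch, assuming the rates enter the equation as fractions (7.7% as 0.077), which is the scaling that reproduces the published bonuses, the arithmetic looks like this. The names `coors_bonus` and `NEW_ROCKIES` are illustrative only.

```python
# Coefficients from the regression in Table 3 (no constant term).
BB_COEF = -0.4144639   # EQA penalty per unit of baseline walk rate
K_COEF = 0.2231925     # EQA bonus per unit of baseline strikeout rate

# Baseline BB rate, baseline K rate, and neutral-park EQA from Table 4.
# Rates are expressed as fractions (an assumption; see the note above).
NEW_ROCKIES = {
    "Wilson": (0.077, 0.231, 0.279),
    "Hernandez": (0.066, 0.293, 0.240),
    "Johnson": (0.083, 0.228, 0.261),
}


def coors_bonus(bb_rate, k_rate):
    """Expected EQA gain at altitude implied by the Table 3 coefficients."""
    return BB_COEF * bb_rate + K_COEF * k_rate


for name, (bb, k, neutral) in NEW_ROCKIES.items():
    bonus = coors_bonus(bb, k)
    print(f"{name:<10} bonus {bonus:+.3f}   Coors EQA {neutral + bonus:.3f}")
```

For Hernandez, the calculation is (-0.414 x 0.066) + (0.223 x 0.293), or roughly -0.027 + 0.065, which gives the +.038 shown in the table.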
The differences are substantial. Our model believes that Jose Hernandez could gain as many as forty points of EQA by virtue of being a Rockie. Because that score accounts for his performance both at home and on the road, his performance at Coors could be truly lights out. (Hernandez' own performance history backs this prediction as well; for his career, his batting line at Coors reads 327/374/720.)
Directions for Further Research
Certainly, there are a variety of approaches that might be applied to tackle a question like this one, and I expect to receive objections both that I have oversimplified, and that the analysis is unnecessarily complicated. We have leveraged off of the PECOTA system here because one of its strengths is analyzing the different components of offensive performance. However, the results are robust enough, at least with respect to strikeout and walk rates, that a simpler approach, such as using the player's statistics from the previous season, would likely have produced the same conclusion.
On the other hand, I have focused primarily on walk rate and strikeout rate while ignoring the other components of offensive performance. All of these skills are interrelated, and twenty-nine data points simply aren't enough to tackle more than a couple of explanatory variables at once. An approach that would permit more observations, such as looking at the performance of visiting players, might allow for greater flexibility.
It also might be objected that I shouldn't have lumped home and road performance together in conducting this analysis. However, this is the approach that best replicates the decision that the Rockies must face. As Keith Scherer points out in this year's book, the Rockies have been overwhelmingly successful at winning games at home, but utterly inept on the road. It may be that the advantage of a free-swinger like Hernandez or Galarraga is not merely that he'll cut down on his strikeouts at Coors, but also that his approach - swing at everything, and swing hard - is less subject to deterioration in road games.
The point has been made in many places that the Rockies are subject to profound structural disadvantages as they attempt to build a winning ballclub; the longer innings required of pitchers performing in a high-run environment, and the roadtrip hangover that seems to result after long homestands at Coors, are particularly prickly problems to overcome.
However, the Rockies may also have a profound structural advantage that they stand to benefit from: based on the history to date of performance in Denver, certain types of hitters appear to benefit disproportionately from the high-altitude environment. In particular, high-strikeout, low-walk hitters - players like Wilson, Hernandez, and Johnson - are more valuable to the Rockies than they are to anyone else. In stockpiling these players, potentially at a market discount, the Rockies have a comparative advantage.