July 12, 2001
Aim For The Head
Walk Rate Spikes
...my June 26 column "the most boring thing I've EVER managed to read in its entirety." Hopefully this week's question, from Mikael Haxby, won't put too many of my readers to sleep...or at least will put different ones to sleep than I did two weeks ago.
My question has to do with walk rates. Whenever a player--say, Alex Rodriguez last year--has a great leap forward in walk rate, sabermetricians attribute it to an added skill, and expect him to keep it up. But if a player has a similar leap forward in batting average or power--say, Deivi Cruz last year--sabermetricians (not to generalize or anything) tend to assume the player will fall back some, that some of the leap forward was based on luck.
Thanks for the question, Mikael.
There is a tendency to dismiss jumps in batting average as flukes while praising comparable jumps in walk rate as improved plate discipline. I suspect part of it has to do with the sabermetric love affair with the walk in general, and the corresponding denigration of batting average, the conventional metric for hitting. However, rather than psychoanalyze my fellow analysts, let's broaden Mikael's question to address unexpected jumps (or drops) in production, and whether they represent a true change in skill level.
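In order to determine whether we have a fluke or an actual change in skill, we need to define several data points:
1. The established level of skill before the change
2. The level of performance in the season showing the change (the "spike" season)
3. The level of performance in the seasons following the spike
If #3 is close to #2, then we could reasonably say that a significant change in skill has occurred. If #3 reverts to near #1, then #2 could be considered a fluke.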
As you've probably guessed by now, I've run a statistical analysis that looks for flukes and changes in ability. I looked at several types of production: hits, home runs, total bases, walks, on-base percentage, and strikeouts. In all cases, I computed the skill on a per-PA basis (as opposed to, say, batting average, which is on a per-AB basis). My goal was to see whether players are more likely to retain gains made in spike seasons in certain skills than in others; in other words, whether flukes are more likely in batting average or in walk rate.
For purposes of this study, I used all major-league players from 1954-2000. For each player, I looked at a seven-year span. For the established level of skill, I used the average over seasons 1-3 (min: 1000 PA). Season 4 was the candidate spike season, which was compared against the established level of skill (min: 400 PA). Seasons 5-7 made up the future level of performance (min: 1000 PA) following the spike season. I then looked at how much of the spike season's gain (season 4) over the established level (seasons 1-3) was retained in the following years (seasons 5-7).
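The windowing scheme above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the study: the `Season` record and the `pooled_rate` and `span_rates` functions are hypothetical names, and the sketch assumes, as the Wynn example below does, that each multi-season level is computed by pooling totals (total events divided by total PA) rather than by averaging single-season rates.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Season:
    events: int  # count for the skill being measured (e.g. walks = BB+HBP)
    pa: int      # plate appearances


def pooled_rate(seasons: Sequence[Season]) -> Tuple[float, int]:
    """Pool several seasons into one per-PA rate; return (rate, total PA)."""
    events = sum(s.events for s in seasons)
    pa = sum(s.pa for s in seasons)
    return (events / pa if pa else 0.0), pa


def span_rates(span: Sequence[Season]) -> Optional[Tuple[float, float, float]]:
    """Split a seven-season span into (established, spike, future) rates.

    Returns None when the span misses the PA minimums used in the study:
    1000 PA over seasons 1-3, 400 PA in season 4, 1000 PA over seasons 5-7.
    """
    if len(span) != 7:
        raise ValueError("expected exactly seven consecutive seasons")
    established, est_pa = pooled_rate(span[0:3])
    spike, spike_pa = pooled_rate(span[3:4])
    future, fut_pa = pooled_rate(span[4:7])
    if est_pa < 1000 or spike_pa < 400 or fut_pa < 1000:
        return None
    return established, spike, future
```

Sliding this seven-season window forward one year at a time over a career is what produces the overlapping player-spans described below.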
An example may be more illustrative. Let's use Jimmy Wynn's 1969 as our spike season, and focus on his walk rate as the skill being measured:
Established level: From 1966-68, Wynn had 213 walks (which, throughout this analysis, are defined as BB+HBP) in 1782 PA, for a walk rate of .120
Spike season: In 1969, he walked 151 times in 651 PA, for a walk rate of .232
Future level: From 1970-72, he walked 270 times in (coincidentally) 1782 PA, for a walk rate of .152
His walk rate rose .112 (.232-.120) in his spike season over his established level.
Following his spike season, his walk rate established a new level .032 (.152-.120) above his old established rate.
Thus Wynn retained .032 / .112, or about 28.5% (computed from the unrounded rates), of the gain in walk rate following his spike season. Though he never again walked as much as he did in 1969, neither did he revert all the way back to his prior level of production. He hung on to some portion of the improvement he showed in 1969. We'll refer to this figure (28.5%) as Wynn's retention percentage in walks after 1969.
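Here is the same calculation as a minimal Python sketch, using the Wynn totals given above. The function names are hypothetical, and the rates are left unrounded.

```python
def rate(events: int, pa: int) -> float:
    """Per-PA rate for a skill (here, walks defined as BB+HBP)."""
    return events / pa


def retention(established: float, spike: float, future: float) -> float:
    """Fraction of the spike-season change still present in the future level."""
    gain = spike - established
    if gain == 0:
        raise ValueError("no change in the spike season; retention is undefined")
    return (future - established) / gain


# Jimmy Wynn's walk totals, from the text
established = rate(213, 1782)  # 1966-68
spike = rate(151, 651)         # 1969
future = rate(270, 1782)       # 1970-72
print(f"{retention(established, spike, future):.1%}")  # prints 28.5%
```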
Harkening back to Mikael's question: if walk rate shows a higher retention percentage than hit rate (batting average), then there is some basis for calling gains in the latter more "lucky" and gains in the former more "improvement." To see whether this was true, I looked at all qualifying players from 1954-2000 (all but two; see below), and computed the established levels, spike-season rates, and future levels in every skill under consideration. The data set comprised 3,220 player-spans, which often overlapped, since a player could qualify in 1990-96, 1991-97, 1992-98, and so on.
(In the interest of disclosure, I should mention that I omitted Ozzie Smith's 1982 and Frank Taveras's 1977, because their established level in home runs was zero: neither hit a single home run in a three-year span of 1000+ PA. In earlier stages of this research, I used ratios of spike-to-established rates, in which a zero in the denominator was problematic. Later on, I switched to using differences, but neglected to return those particular Smith and Taveras seasons to the data set. I hope you won't miss them too much.)
First, I looked at the magnitude of the differences of both the spike-season rate and the future level over the established level (e.g., [.112, .032] in the Wynn/1969 example). I computed the correlation between these two differences across all 3,220 samples, without trying to isolate the large spikes in performance. This correlation indicates how closely gains in future level track gains in the spike season (its square gives the share of the variation in future gains that is explained by spike-season gains), or roughly whether a large gain in a spike season is associated with a large gain in future level (and vice versa).
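For readers who want to try this kind of check, here is a minimal Python sketch of a Pearson correlation between spike-season gains and future-level gains. It assumes the gains have already been computed for each player-span; the function name is hypothetical, and apart from Wynn's (.112, .032) the sample pairs are made up for illustration, not taken from the study.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 each")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of cross-products and squared deviations (the 1/n factors cancel)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sample has no variation")
    return sxy / sqrt(sxx * syy)


# Hypothetical (spike gain, future gain) pairs; only the first is Wynn's.
spike_gains = [0.112, -0.020, 0.045, 0.010, -0.060]
future_gains = [0.032, -0.015, 0.010, 0.004, -0.025]
r = pearson(spike_gains, future_gains)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

Squaring r, as in the last line, is what converts the correlation into a share of variation explained.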
The first thing to note is that all of the correlations are positive, and pretty significantly so. In all cases, a change in one season is likely to be followed by additional seasons with at least some of those gains retained. Second, while all of the correlations are fairly close to one another, it's interesting to see that hit rate has the lowest correlation, suggesting that changes in batting average are slightly less likely to be predictive of future performance, and more likely to be a fluke. Among the skills I looked at, strikeout rate had the highest correlation, meaning that changes in that stat are slightly more likely to be real than those in other areas.
However, this analysis looked at every player. Mikael was asking about the "great leap forward" in a skill, not the more typical player whose improvement is minimal. To look only at the players with the largest changes, I sorted the data by the difference in rates between the spike season and the established level (the .112 figure in the Wynn example). Then, in each skill, I selected the 300 players with the biggest gains and the 300 with the biggest losses (roughly the top 10% and the bottom 10%), and looked specifically at how much of their spike-season change was retained going forward. Using both ends of the distribution means I'm looking not only at those with the greatest improvements, but also at those with the largest declines.
For each player, I computed the retention percentage in each skill, and took the average retention percentage across the group of 300. Those figures are reported below:
Note that in the table, I'm using "Increase" and "Decrease" for each statistic to indicate the direction that is desirable for the batter; that is, "Increase" means a higher home-run rate, but a lower strikeout rate.
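The selection-and-averaging step can be sketched as follows. This is a minimal Python illustration, not the study's code: `extreme_retention` and the `(established, spike, future)` tuple layout are hypothetical, and the `higher_is_better` flag is an assumed way to encode the "Increase"/"Decrease" convention described above.

```python
from statistics import mean
from typing import Sequence, Tuple

# One player-span in a single skill: (established, spike, future) per-PA rates.
Span = Tuple[float, float, float]


def extreme_retention(
    spans: Sequence[Span], n: int = 300, higher_is_better: bool = True
) -> Tuple[float, float]:
    """Return (avg retention of the n biggest Increases, avg retention of the n biggest Decreases).

    Set higher_is_better=False for a skill like strikeout rate, where a lower
    rate is the desirable direction for the batter.
    """
    sign = 1.0 if higher_is_better else -1.0
    # Orient each spike-season change so that a positive value is good for the
    # batter; skip spans with no change, where retention is undefined.
    scored = [
        (sign * (spike - est), est, spike, fut)
        for est, spike, fut in spans
        if spike != est
    ]
    if len(scored) < 2 * n:
        raise ValueError("need at least 2*n spans so the two groups don't overlap")
    scored.sort(key=lambda item: item[0])
    decreases, increases = scored[:n], scored[-n:]

    def avg_retention(group) -> float:
        # Retention is a ratio of two changes, so the sign flip cancels out;
        # orientation only decides which spans land in which group.
        return mean((fut - est) / (spike - est) for _, est, spike, fut in group)

    return avg_retention(increases), avg_retention(decreases)
```

Because retention divides one change by another, flipping the sign for strikeouts does not alter any individual retention percentage; it only changes which players count as improvers.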
We see some very interesting results here. First, there is some empirical support for Mikael's observation that gains in walk rate are more real than gains in hit rate. The 300 players who increased their walk rate the most in a spike season typically retained over half of the gain in future seasons. By contrast, the players with the biggest gains in batting average held on to only about one-fifth of their improvements. Flukes are more likely in batting average than in walk rate.
However, changes in strikeout rate were even more likely to be kept, with a retention percentage of nearly 60%. Home-run rate and on-base percentage showed middling retention percentages, while total-base rate was fairly low at about 30%. There is, of course, some overlap between these skills. On-base rate is a combination of hit rate and walk rate, and sure enough its retention rate falls between the rates for its two components. Likewise, total-base rate (comparable to slugging average) is affected by both HR rate and hit rate. It would be possible to separate things further (using something like isolated slugging), or to compute retention percentages for more complex composite measures (such as OPS, EqA, or VORP).
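The "sure enough" can be made precise for a single player-span. The short derivation below assumes that on-base rate per PA is simply hit rate plus walk rate (ignoring sacrifice flies and the other small differences between PA and the official OBP denominator), and writes E, S, and F for the established, spike, and future rates, with subscripts h for hits and w for walks:

```latex
R = \frac{F - E}{S - E}, \qquad
R_{ob} = \frac{(F_h - E_h) + (F_w - E_w)}{(S_h - E_h) + (S_w - E_w)}
       = \frac{R_h (S_h - E_h) + R_w (S_w - E_w)}{(S_h - E_h) + (S_w - E_w)}
```

When the spike-season changes in hit rate and walk rate have the same sign, this is a weighted average of R_h and R_w, so the on-base retention must lie between them. The reported figures are averages over different groups of 300 players, so this is a plausibility argument for the pattern rather than an exact identity.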
Though improvements in skills get most of the attention, I am somewhat perversely interested in the analysis of decline phases of players' careers. The second column of numbers shows the retention rate following a sharp decline in skill. Unlike the first column, high percentages are not desirable here, since they mean that a player continues to perform close to the new, lower level of performance. Lower retention percentages reflect both a greater likelihood and a greater magnitude of a player bouncing back toward his original established level of performance.
The contrast is sharp, especially for batting average. Players whose batting average drops sharply hold on to far more of that decline than players whose batting average spikes hold on to their gains. Drops in home-run rate are less likely to be fluky than HR jumps, and a falling total-base rate is perhaps the strongest sign of decline among the skills analyzed. On-base percentage, often thought of as an "old-player" skill, is in fact the area where players tend to bounce back the most.
It's important to note that I did the analysis with numbers unadjusted for either park or league. A player who moved to Colorado for several years may appear to have made a substantial jump in skill level (Andres Galarraga comes to mind), even though his actual change in underlying ability was smaller than it appeared. Players whose careers spanned the strike-zone changes of the 1960s could be similarly affected. Nor did I account for age in this analysis. A sharp increase in home-run rate for a 25-year-old is probably more real than the same gain for a 36-year-old. However, I suspect that a retention percentage analysis that accounts for park, league, and age could be a useful tool in devising more accurate player projections (indeed, most projection systems do some version of this, combining age adjustments with some form of regression to the mean).
Keith Woolner is an author of Baseball Prospectus.