
Evidently, not everyone finds articles on attendance all that compelling. In
fact, one response called my June 26 column "the most boring thing I’ve EVER
managed to read in its entirety." Hopefully this week’s question, from Mikael
Haxby, won’t put too many of my readers to sleep…or at least will put
different ones to sleep than I did two weeks ago.


My question has to do with walk rates. Whenever a player--say, Alex Rodriguez
last year--has a great leap forward in walk rate, sabermetricians attribute
it to an added skill, and expect him to keep it up. But if a player has a
similar leap forward in batting average or power--say, Deivi Cruz last
year--sabermetricians (not to generalize or anything) tend to assume the
player will fall back some, that some of the leap forward was based on luck.

My question is, is there a statistical backing for these assumptions? Put
another way, how much does luck play into a player's walk rate? Is there
less of an element of luck in walk rate than in other statistical
categories?

Thanks for the question, Mikael.

There is a tendency to dismiss jumps in batting average as flukes, while
praising comparable jumps in walk rate as improved plate discipline. I
suspect part of it has to do with the sabermetric love affair with the walk
in general, and a matching habit of denigrating batting average, the
conventional metric for hitting. However, rather than psychoanalyze my fellow
analysts, let’s broaden Mikael’s question to address unexpected jumps (or
drops) in production, and whether they represent a true change in skill level.

In order to determine whether we have a fluke or an actual change in skill
we need to define several data points:

  1. an established level of skill
  2. a "spike" season with a sudden and large change in that level
  3. a new level of performance going forward

If #3 is close to #2, then we could reasonably say that a significant change
in skill has occurred. If #3 reverts back to near #1, then #2 could be
considered a fluke.

As you’ve probably guessed by now, I’ve run a statistical analysis that
looks for flukes and changes in ability. I looked at several types of
production: hits, home runs, total bases, walks, on-base percentage, and
strikeouts. In all cases, I’m computing the skill on a per-PA basis (as
opposed to, say, batting average, which is on a per-AB basis). My goal was to
see whether players are more likely to retain gains made in spike seasons in
certain skills than in others. In other words, are flukes more likely in
batting average or in walk rate?

For purposes of this study, I used all major-league players from 1954-2000.
For each player, I looked at a seven-year span. For the established level of
skill, I used the average over seasons 1-3 (min: 1000 PA). Season 4 was the
spike season candidate (min: 400 PA), to be compared against the established
level of skill. Seasons 5-7 comprised the future level of performance (min:
1000 PA) following the spike season. I looked at how much of the gain in the
spike season (season 4) over the established level of performance (seasons
1-3) was retained in the following years (seasons 5-7).
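
The windowing described above might be sketched in Python as follows. This is only an illustration, not the code used for the study: the input shape (a dict mapping season to an (events, PA) pair), the function names, and the choice to treat a missing season as zero PA are all assumptions. Rates are pooled over each three-year window (total events over total PA), which matches how the Wynn example below is computed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input shape: {season_year: (events, plate_appearances)}
Seasons = Dict[int, Tuple[int, int]]

MIN_WINDOW_PA = 1000  # seasons 1-3 and seasons 5-7, per the text
MIN_SPIKE_PA = 400    # season 4, per the text


def pooled(seasons: Seasons, years: range) -> Optional[Tuple[float, int]]:
    """Return (rate, total PA) pooled over the years; a missing year adds 0 PA."""
    events = sum(seasons.get(y, (0, 0))[0] for y in years)
    pa = sum(seasons.get(y, (0, 0))[1] for y in years)
    if pa == 0:
        return None
    return events / pa, pa


def qualifying_spans(seasons: Seasons, first_year: int = 1954,
                     last_year: int = 2000) -> List[dict]:
    """Slide a seven-year window over one player's career and keep the
    windows that meet the PA minimums. Overlapping windows are allowed."""
    spans = []
    for start in range(first_year, last_year - 6 + 1):
        spike_year = start + 3
        est = pooled(seasons, range(start, start + 3))           # seasons 1-3
        spk = pooled(seasons, range(spike_year, spike_year + 1))  # season 4
        fut = pooled(seasons, range(spike_year + 1, start + 7))   # seasons 5-7
        if est is None or spk is None or fut is None:
            continue
        if (est[1] < MIN_WINDOW_PA or spk[1] < MIN_SPIKE_PA
                or fut[1] < MIN_WINDOW_PA):
            continue
        spans.append({
            "spike_year": spike_year,
            "established": est[0],
            "spike": spk[0],
            "future": fut[0],
        })
    return spans
```

Because the window slides one season at a time, a long career contributes several overlapping spans, one per qualifying spike-season candidate.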

An example may be more illustrative. Let’s use Jimmy Wynn’s 1969 as
our spike season, and focus on his walk rate as the skill being measured:

Established level: From 1966-68, Wynn had 213 walks (which, throughout this
analysis, are defined as BB+HBP) in 1782 PA, for a walk rate of .120.

Spike season: In 1969, he walked 151 times in 651 PA, for a walk rate
of .232.

Future level: From 1970-72, he walked 270 times in (coincidentally)
1782 PA, for a walk rate of .152.

His walk rate rose .112 (.232-.120) in his spike season over his established
level.

Following his spike season, his walk rate established a new level .032
(.152-.120) above his old established rate.

Thus Wynn retained .032 / .112, or 28.5% when computed from the unrounded
rates, of the gain in walk rate following his spike season. Though he didn’t
again walk as much as he did in 1969, neither did he revert all the way back
to his prior level of production. He hung on to some portion of the
improvement he showed in 1969. We’ll refer to this figure (28.5%) as Wynn’s
retention percentage in walks after 1969.
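
The Wynn arithmetic above can be reproduced with a few lines of Python. The function names here are made up for illustration; the walk and PA totals are the ones quoted in the example.

```python
def rate(events: int, pa: int) -> float:
    """Per-PA rate, e.g. (BB + HBP) / PA for the walk rate used here."""
    return events / pa


def retention_percentage(established: float, spike: float, future: float) -> float:
    """Share of the spike-season change that persists in the future level."""
    change = spike - established
    if change == 0:
        raise ValueError("no change in the spike season, so retention is undefined")
    return (future - established) / change


established = rate(213, 1782)  # 1966-68
spike = rate(151, 651)         # 1969
future = rate(270, 1782)       # 1970-72

print(f"{retention_percentage(established, spike, future):.1%}")  # 28.5%
```

Dividing the rounded differences (.032 / .112) gives 28.6% instead; the 28.5% figure comes from carrying the unrounded rates through the calculation.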

Harkening back to Mikael’s question, if walk rate shows a higher retention
percentage than hit rate (batting average), then there is some basis for
calling the latter more "lucky," and the former more
"improvement." To see if this was true, I looked at all qualifying players
from 1954-2000 (with two exceptions, see below), and computed the established
levels, spike season rates, and future levels in every skill under
consideration. There were 3,220 player-spans that comprised the data set
(which often overlapped, as a player could qualify in 1990-96, and 1991-97,
and 1992-98, etc.).

(In the interest of disclosure, I should mention that I omitted Ozzie
Smith’s 1982 and Frank Taveras’s 1977, because their established
level in home runs was zero (they didn’t hit a single home run in a
three-year span of 1000+ PA). In earlier stages of this research, I used
ratios of spike-to-established rates, in which having a zero in the
denominator was problematic. Later on, I switched to using differences, but
neglected to return those particular Smith and Taveras years to the data
set. I hope you won’t miss them too much.)

First, I looked at the magnitude of the differences of both the spike season
rate and the future level over the established level (e.g. [.112, .032] from
the Wynn/1969 example). I computed the correlation between those two
differences across all 3,220 samples, without trying to separate spikes in
performance. This correlation measures how closely gains in future level
track gains in the spike season, or roughly whether a large gain in a spike
season is associated with a large gain in future level (and vice versa).
Strictly speaking, it is the square of the correlation that gives the share
of the variation in future gains explained by spike-season gains.

Statistic   Correlation
H           .4168
HR          .4703
TB          .4542
SO          .5139
OBP         .4544
WALK        .4751

The first thing to note is that all of the correlations are positive, and
pretty significantly so. In all cases, a change in one season is likely to
be followed by additional seasons with at least some of those gains
retained. Second, while all of the correlations are fairly close to one
another, it’s interesting to see that hit rate has the lowest correlation,
suggesting that changes in batting average are slightly less likely to be
predictive of future performance, and more likely to be a fluke. Among the
skills I looked at, strikeout rate had the highest correlation, meaning that
changes in that stat are slightly more likely to be real than those in other
areas.
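
For readers who want to try this on their own data, the correlation between spike-season change and future change could be computed along these lines. This is a minimal sketch assuming a plain Pearson correlation (the article does not say which coefficient was used), and `pearson_r` is a name chosen for illustration.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return sxy / math.sqrt(sxx * syy)


# xs: spike rate minus established rate, one entry per player-span
# ys: future rate minus established rate, for the same player-spans
# r = pearson_r(spike_changes, future_changes)
```

Squaring the result gives the share of variance explained: a correlation of .4751 for walks, for instance, corresponds to roughly 23% (.4751 squared is about .226).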

However, this analysis looked at every player. Mikael was asking about the
"great leap forward" in a skill, not the more typical average
player whose improvement is minimal. To look only at the players with the
largest changes, I sorted the data by the difference in rates between the
spike season and the established level (the .112 figure in the Wynn
example). I then selected the 300 player-spans with the biggest gains and the
300 with the biggest losses (roughly the top 10% and the bottom 10%) in each
skill, and looked specifically at how much of their spike season change was
retained going forward. Note that I’m using both the best and worst changes,
so that I’m looking not only at those with the greatest improvements, but
also at those with the largest declines.

For each player, I computed the retention percentage in each skill, and took
the average retention percentage across the group of 300. Those figures are
reported below:

         Players who          Players who
Skill    increased in skill   decreased in skill
H        21.9%                45.8%
HR       41.5%                47.5%
TB       29.3%                51.4%
SO       59.7%                43.1%
OBP      42.6%                37.4%
WALK     51.7%                42.1%

Note that in the table, I’m using "Increase" and
"Decrease" for each statistic to point in the direction of
desirability for the batter, that is, "Increase" means higher
home-run rates, but lower strikeout rates.
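
The selection and averaging steps behind the table could be sketched as below. The `Span` type and the function names are hypothetical, and dropping spans with exactly zero spike-season change (where retention is undefined) is an assumption of this sketch rather than something the study states. The `higher_is_better` flag handles the point just made about strikeouts, where a lower rate counts as an increase in skill.

```python
from statistics import mean
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    established: float  # pooled rate, seasons 1-3
    spike: float        # rate in season 4
    future: float       # pooled rate, seasons 5-7


def retention(span: Span) -> float:
    """Share of the spike-season change retained in the future level."""
    return (span.future - span.established) / (span.spike - span.established)


def average_retention_at_extremes(spans: List[Span], n: int = 300,
                                  higher_is_better: bool = True
                                  ) -> Tuple[float, float]:
    """Return (average retention for the n biggest improvements,
    average retention for the n biggest declines)."""
    sign = 1 if higher_is_better else -1
    # Assumption: skip spans with no change, since retention is undefined there.
    usable = [s for s in spans if s.spike != s.established]
    if len(usable) < 2 * n:
        raise ValueError("not enough spans for two non-overlapping groups")
    ordered = sorted(usable, key=lambda s: sign * (s.spike - s.established))
    declines = ordered[:n]        # worst changes, in skill terms
    improvements = ordered[-n:]   # best changes, in skill terms
    return (mean(retention(s) for s in improvements),
            mean(retention(s) for s in declines))
```

For strikeouts, calling the function with `higher_is_better=False` puts the biggest drops in strikeout rate in the improvement group.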

We see some very interesting results here. First, we see some empirical
support for the observation Mikael made, which is that gains in walk rate
are more real than gains in hit rate. The 300 players who increased their
walk rate the most in a spike season typically retained over half of
the gains in future seasons. By contrast, the players with the biggest gains
in batting average were only able to hold on to about 1/5th of their
improvements. Flukes are more likely in batting average than in walk rate.

However, changes in strikeout rate were even more likely to be kept, with a
retention percentage of nearly 60%. Home-run rate and on-base percentage
showed medium levels of retention, while total-base rate was
pretty low at about 30%. There is, of course, some overlap between these
skills. On-base rate is a combination of hit rate and walk rate, and sure
enough its retention rate falls between the rates for each component.
Likewise, total-base rate (comparable to slugging average) is affected by
both HR rate and hit rate. It would be possible, of course, to separate
things further (using something like isolated slugging), or to compute
retention percentages for more complex measures (such as OPS, EqA, or VORP).

Though improvements in skills get most of the attention, I am somewhat
perversely interested in the analysis of decline phases of players’ careers.
The second column of numbers shows the retention rate following a sharp
decline in skill. Unlike the previous set of numbers, high percentages are
not desirable here, since they would mean that a player continues to perform
close to the new, lower level of performance. Lower retention percentages
mean that a player is more likely to bounce back, and by a larger amount,
towards his original established level of performance.

The contrast is sharp, especially for batting average. Players whose batting
average drops sharply are far more likely to be stuck with most of the
decline than players whose average spikes are to keep most of the gain. Drops
in home-run rates are somewhat less likely to be fluky than HR jumps, and a
falling total-base rate is perhaps the strongest sign of decline among the
skills analyzed. On-base percentage, often thought of as an
"old-player" skill, is in fact the area where players tend to bounce
back the most.

It’s important to note that I did the analysis with numbers unadjusted for
either park or league. A player who moved to Colorado for several years
may seem to have a substantial jump in skill level (Andres Galarraga
comes to mind), when his actual change in underlying ability was somewhat
smaller than it appeared. Players active during the 1960s across the
change in strike zones could be similarly affected. Neither did I account
for age in this analysis. A sharp increase for a 25-year-old in home-run
rate is probably more real than the same gain for a 36-year-old. However, I
suspect that a retention percentage analysis, accounting for park, league,
and age, could be a useful tool in devising more accurate player predictions
(indeed, most projection systems do some version of this, combining age
adjustments with some form of regression to the mean).

Keith Woolner is an author of Baseball Prospectus.
