Baseball Prospectus Basics: Statistical Consistency

February, in the baseball world, is the month of predictions. Every
analyst, writer, web site, undefeatable computer program, guy with a beer,
and book (some better than
others) will spend the next month looking over the offseason wasteland
and espousing conclusions. The method behind these processes varies more
widely than Johnny Depp’s acting roles; some are based purely on numbers,
some purely on empirical data, some purely on names, and some purely on
nothing. So what can you count on?

For one thing, you can count on me not offering you any spectacular
predictions, guaranteed to be more accurate than anything on the market. If
you want that, read up on BP’s own PECOTA projection system. Instead, the aim will be to lay a basic groundwork for your
expectations of the consistency of basic statistics from season to season.
Gauging how volatile various metrics are from year to year is the primary goal.

To accomplish this, I’m going to start with batting statistics, which are traditionally more stable than pitching statistics. To reduce outliers and
the game’s inherent degree of chance, only seasons in which a player
accumulated at least 200 ABs will be used. All seasons from 1991 to 2003
were considered, looking particularly for consecutive seasons of sufficient
sample size. This process yielded 3066 sample seasons from which to draw
data.
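For readers who want to try this at home, the selection step might be sketched as follows. This is only an illustration of the filter described above, not the code used for the study; the record layout (`player`, `year`, `ab`) and the function name are made up for the example.

```python
from collections import defaultdict

MIN_AB = 200  # playing-time cutoff stated in the article


def consecutive_pairs(seasons, min_ab=MIN_AB):
    """Return (season_n, season_n_plus_1) pairs for each player.

    `seasons` is a list of dicts with the hypothetical keys
    'player', 'year', and 'ab'. Seasons under the cutoff are dropped
    before pairing, so a short season breaks the chain.
    """
    by_player = defaultdict(dict)
    for s in seasons:
        if s["ab"] >= min_ab:
            by_player[s["player"]][s["year"]] = s

    pairs = []
    for years in by_player.values():
        for year, season in sorted(years.items()):
            following = years.get(year + 1)
            if following is not None:
                pairs.append((season, following))
    return pairs


# Invented sample rows, purely for illustration.
sample = [
    {"player": "A", "year": 2001, "ab": 550},
    {"player": "A", "year": 2002, "ab": 150},  # below the cutoff
    {"player": "A", "year": 2003, "ab": 600},
    {"player": "B", "year": 2002, "ab": 400},
    {"player": "B", "year": 2003, "ab": 420},
]
pairs = consecutive_pairs(sample)
```

In the invented sample, player A contributes nothing because his short 2002 season separates two qualifying years, while player B contributes one pair.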

The variety of statistics that can be tested is understandably large, but
it’s important to use only rate statistics such as AVG, OBP, and SLG because
of the large variance allowed in ABs and PAs. For the purposes of the study,
20 home runs in 300 AB is considered the same as 40 HR in 600 AB; the
difference between 20 and 40 actual home runs is irrelevant. To this end,
AVG, OBP, SLG, BB% (Walk Rate, BB/PA), K% (Strikeout Rate), XBA% (Extra-Base
Hit Percentage, XBA/H), HR% (Home Run Rate), and ISO (Isolated Power,
SLG-AVG) were considered. Each is a rate statistic that reveals information
about certain parts of a player’s composition at the plate. Looking at the
results both individually and in concert will yield some conclusions about
year-to-year statistical consistency.
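As a rough sketch, most of these rates can be computed from a standard batting line like so. The definitions of AVG, SLG, ISO, BB%, and XBA% follow the text above; the denominators for K% (per PA) and HR% (per AB) are my assumptions, since the text doesn't spell them out, and the function and argument names are invented for the example. OBP is left out because it needs HBP and SF as well.

```python
def batting_rates(ab, pa, h, doubles, triples, hr, bb, so):
    """Compute the article's batting rate stats from counting stats."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    avg = h / ab
    slg = total_bases / ab
    return {
        "AVG": avg,
        "SLG": slg,
        "ISO": slg - avg,                            # SLG-AVG, as in the text
        "BB%": 100 * bb / pa,                        # BB/PA, as in the text
        "K%": 100 * so / pa,                         # assumption: per PA
        "XBA%": 100 * (doubles + triples + hr) / h,  # extra-base hits per hit
        "HR%": 100 * hr / ab,                        # assumption: per AB
    }


# The 20-HR-in-300-AB hitter and his 40-HR-in-600-AB twin (invented lines).
half = batting_rates(ab=300, pa=340, h=90, doubles=15, triples=2,
                     hr=20, bb=35, so=60)
full = batting_rates(ab=600, pa=680, h=180, doubles=30, triples=4,
                     hr=40, bb=70, so=120)
```

Every rate comes out the same for the two lines, which is exactly why the study ignores the raw totals.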

 

Metric   R-Squared   Standard Deviation 
AVG       0.1761           0.031 
OBP       0.3820           0.041 
SLG       0.4171           0.080 
BB %      0.5745           3.520
K %       0.6884           5.230
XBA %     0.4634           8.820
HR %      0.5751           1.730
ISO       0.5510           0.064

Before we get to the results, however, let’s do some housekeeping. On the far left we have our offensive metrics, followed by the R-Squared and the Standard Deviation. For the uninitiated, R-Squared is another term for the “coefficient of determination,” a measure of how strongly two sets of numbers (here, a player’s marks in consecutive seasons) are correlated. The higher the R-Squared value, the greater the correlation, and thus, the more consistent the metric. Depending on how it’s being used, an R-Squared below 0.5000 is typically considered too low to justify any sort of predictive value. Standard deviation, meanwhile, is simply a measure of spread: the higher the number, the more volatile the metric.
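For the curious, here is one way an R-Squared like those in the table could be computed. It is a minimal sketch that assumes a simple one-variable comparison of each season against the next, in which case R-Squared is just the square of the ordinary correlation coefficient. The function name and the five-player sample are invented, and the exact regression setup behind the table isn't stated here.

```python
import statistics


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Invented batting averages for five hitters in back-to-back seasons.
year_one = [0.250, 0.270, 0.300, 0.280, 0.260]
year_two = [0.265, 0.255, 0.290, 0.300, 0.250]
consistency = r_squared(year_one, year_two)
```

A value near 1 would mean last year's number tells you nearly everything about this year's; a value near 0 would mean it tells you almost nothing.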

With that being said, of these metrics, batting average has the least consistency, and thus the least predictive ability. Meanwhile, four metrics cleared the fabled 0.5000 line: Walk-Rate, K-Rate, HR-Rate, and ISO. The first three are defense-independent. This fact supports the
idea that the hitters remain consistent from year-to-year, while much of the
volatility of AVG and, to a lesser extent, OBP and SLG, can be attributed to
the opposing defense. Removing the defense from the equation greatly
increases the predictability of batting statistics, a fact that reinforces
the idea that there is a significant amount of luck involved in AVG. This
finding isn’t really big news, but it’s always nice to reconfirm something
some of us might take for granted.

(As a brief aside, it’s important to clarify what is meant by batting
average being subject to a great deal of “luck.” This is not to say that all
major league hitters are equal when it comes to AVG, and that the differences
evident between them are entirely random. Rather, players have a
theoretical AVG-ability that varies from player-to-player, but the sample
size of a season is too small to accurately reveal that every year. The
high volatility of AVG from year-to-year–the statistical “noise,” if you will–is
large enough to obscure the differences between many major league
hitters of similar ability. The book Curve Ball, by
Jim Albert and Jay Bennett, has some excellent discussion on this topic.)
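To put a rough number on that noise, here is a back-of-the-envelope sketch. It assumes every at-bat is an independent trial with the same chance of a hit (a simple binomial model, which is my simplification rather than anything claimed above), and the .280 hitter with 500 AB is invented for the example.

```python
import math


def avg_standard_error(true_avg, at_bats):
    """Standard deviation of observed AVG under a binomial model."""
    return math.sqrt(true_avg * (1 - true_avg) / at_bats)


# A hypothetical "true" .280 hitter over a 500 AB season.
noise = avg_standard_error(0.280, 500)  # roughly 0.020, i.e. about 20 points
```

Under that assumption, a true .280 hitter can land anywhere from about .260 to .300 in an ordinary season on chance alone, which is wider than the gap between plenty of real hitters.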

When looking at pitchers, many of the same constraints were placed on the
data as for batters. The minimum playing time for pitchers was set at 50 IP in
any given season. This yielded 2695 sample seasons from 1991-2003.
Statistics considered were, again, entirely rate metrics: starting with the
mainstream ERA and WHIP, and moving on to K/9 (Strikeouts per 9 IP), BB/9,
H/9, HR/9, K/BB, and GB/FB. (Data for GB/FB was only available from 1999
on, yielding a much smaller sample size of 912 seasons.) Let’s see how it turned out:
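For reference, these pitching rates can be sketched from a standard pitching line using the usual definitions. The function and argument names are invented for the example, and innings pitched is assumed to be a true decimal (200.333 rather than the box-score style 200.1).

```python
def pitching_rates(ip, er, h, bb, so, hr, gb, fb):
    """Compute the article's pitching rate stats from counting stats."""
    def per_nine(count):
        return 9 * count / ip

    return {
        "ERA": per_nine(er),
        "WHIP": (bb + h) / ip,
        "K/9": per_nine(so),
        "BB/9": per_nine(bb),
        "H/9": per_nine(h),
        "HR/9": per_nine(hr),
        "K/BB": so / bb,
        "GB/FB": gb / fb,
    }


# An invented 200-inning season.
line = pitching_rates(ip=200, er=80, h=190, bb=60, so=180, hr=22,
                      gb=250, fb=200)
```

Note that WHIP is simply (BB/9 + H/9) divided by nine, so it inherits whatever noise those two components carry.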


Metric   R-Squared   Standard Deviation 
ERA       0.1091           1.20 
WHIP      0.1410           0.20 
K/9       0.5627           1.82 
BB/9      0.3413           1.09 
H/9       0.1745           1.45 
HR/9      0.1273           0.41 
K/BB      0.3610           1.00 
GB/FB     0.5591           0.50

If you’re a regular visitor to BP, the fact that ERA is, so far, worse
than any other statistic at maintaining consistency from year-to-year should
come as no surprise. Its volatility approaches total
randomness due to the variety of game events it attempts to take into
account: the official scorer’s decisions, defense, the sequence of events,
and the pitcher’s actual ability, just to name a few. Interestingly, WHIP doesn’t
fare quite as well as expected when comparing it to H/9 and BB/9, two
statistics that should map to it rather well since they take into account
two of the three stats used in WHIP. Instead, by combining two inconsistent
statistics, WHIP comes out worse overall. The only two metrics that seem
to have any consistent value are K/9 and GB/FB–once again, statistics that
do not involve the defense.

Considering that much of the inconsistency of AVG,
ERA, and other statistics has thus far been blamed on the defense, it would
be unfair not to check and see how variable defense is. Measuring defense,
though, is sticky business. It’s best to read the results below with
large grains of salt, constantly reminding yourself that defensive
statistics don’t always reflect the events on the field, and that
defense is inherently a team activity. Adjusting for players switching
positions over the course of the year also threw a wrench into the
works.

The sample group was once again drawn from the same years, with the
added requirement that a player accumulate at least 100 innings at any one
position. Further, if players accumulated over 100 innings at more than one
position, those positions were only considered together if they were similar
defensively. For instance, a player who played 200 innings in RF and 200 in
LF had his total defensive line added together; likewise players who played
2B, SS, and 3B. Players moving around between 1B and the outfield were
assigned the stats from the position they played the most in the
following season. (For example, if a player played 1000 innings at 1B in
2002 and split time between 1B and OF in 2001, only his 1B stats from 2001
were considered. Likewise with catchers and anyone named Craig
Biggio or Chuck Knoblauch.) These conditions
yielded a sample size of 5606 seasons.

The three statistics considered were again rate stats based on the
(rather limited) defensive stats available. First is fielding percentage
(FP, pronounced “Santangelo” if you like) which is Putouts (PO) plus
Assists (A) over Total Chances (TC). Second is Total Chances per 9 Innings
(TC/9), a measure that’s almost the exact same stat as range factor, but
with errors included. Finally, Defensive
Efficiency (DE) was included because it more accurately reflects the
team aspect of defense. Admittedly, this is a very small range of
statistics to consider, but the current crop of available defensive
statistics yields few options and instills limited confidence that the
numbers are an accurate reflection of the events on the field (which, of
course, is the whole point of stats).
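The two individual measures can be sketched as follows, assuming the standard definition of Total Chances as putouts plus assists plus errors. The function name and the sample line are invented. Defensive Efficiency is left out because its formula isn't given here.

```python
def fielding_rates(po, a, e, innings):
    """Fielding percentage and total chances per nine innings."""
    tc = po + a + e  # assumption: standard Total Chances definition
    return {
        "FP": (po + a) / tc,
        "TC/9": 9 * tc / innings,
    }


# An invented infielder's season: 250 PO, 400 A, 15 E in 1200 innings.
glove = fielding_rates(po=250, a=400, e=15, innings=1200)
```

Because errors sit in the numerator of TC/9 but not of FP, a fielder's miscues lower the first stat while still counting toward the second.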

 

Metric   R-Squared   Standard Deviation 
FP        0.1183           0.030 
TC/9      0.8056           2.580
DE        0.2767           0.011

While there is little hope for FP, TC/9 looks more impressive than any
statistic sampled thus far. The only drawback to this is the fact that TC/9
doesn’t reveal very much about the actual player involved. It’s at least as
dependent on the GB/FB and handedness of the pitcher or the quality of other
defenders as it is on the ability of the player in question. Its
year-to-year consistency does little more than reveal that balls put into
play, for the most part, are distributed around the field in a consistent
manner from season-to-season. The consistency of Defensive Efficiency falls
towards the middle of the pack when compared with other metrics viewed so
far, but its variance helps explain the high variance of H/9 and ERA, as
expected. It does not, however, explain batting average, since league-wide
DE stays very stable from year to year.

While the idea that defense-independent statistics are steadier than
defense-dependent ones is not new, it’s worthwhile to clarify within
those ranges which ones are the most constant. In the rather simple cases
looked at here, the hierarchy would start with strikeouts, drop slightly to
walks, then to home runs, and finally to anything involving balls in play.
Obviously, there are ways to improve the year-to-year consistency–looking
at more than the immediate previous season, adjusting for age, park, team,
etc.–but for now, when various publications are predicting big things for
this season based on last year’s numbers, remember that things aren’t quite
as consistent as you might expect. That’s why they play the games.

James Click
