February, in the baseball world, is the month of predictions. Every

analyst, writer, web site, undefeatable computer program, guy with a beer,

and book (some better than

others) will spend the next month looking over the offseason wasteland

and espousing conclusions. The method behind these processes varies more

widely than Johnny Depp’s acting roles; some are based purely on numbers,

some purely on empirical data, some purely on names, and some purely on

nothing. So what can you count on?

For one thing, you can count on me not offering you any spectacular

predictions, guaranteed to be more accurate than anything on the market. If

you want that, read up on BP’s own PECOTA projection system. Instead, the aim will be to lay a basic groundwork for your

expectations of the consistency of basic statistics from season to season.

Surmising the volatility of various metrics, and their consistency from year-to-year, is the primary goal.

To accomplish this, I’m going to start with batting statistics, which are traditionally more stable than pitching statistics. To reduce outliers and

the game’s inherent degree of chance, only seasons in which a player

accumulated at least 200 ABs will be used. All seasons from 1991 to 2003

were considered, looking particularly for consecutive seasons of sufficient

sample size. This process yielded 3066 sample seasons from which to draw

data.
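The selection step above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual process used for the study: the record layout (`player`, `year`, `ab`) and the sample rows are made up, and a player with two stat lines in one year (after a trade, say) would need to be merged first.

```python
from collections import defaultdict

MIN_AB = 200  # playing-time cutoff stated in the article


def consecutive_pairs(seasons, min_ab=MIN_AB):
    """Return (year N, year N+1) season pairs for each player.

    `seasons` is a list of dicts with hypothetical keys
    'player', 'year', and 'ab'.
    """
    by_player = defaultdict(dict)
    for s in seasons:
        if s["ab"] >= min_ab:  # drop small-sample seasons
            by_player[s["player"]][s["year"]] = s

    pairs = []
    for years in by_player.values():
        for year in sorted(years):
            if year + 1 in years:  # both seasons qualified
                pairs.append((years[year], years[year + 1]))
    return pairs


# Made-up rows, for illustration only.
sample = [
    {"player": "A", "year": 2001, "ab": 550},
    {"player": "A", "year": 2002, "ab": 180},  # below the cutoff
    {"player": "A", "year": 2003, "ab": 600},
    {"player": "B", "year": 2002, "ab": 410},
    {"player": "B", "year": 2003, "ab": 395},
]
pairs = consecutive_pairs(sample)
```

Here player A contributes no pair, because the short 2002 season breaks the chain, while player B contributes one.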

The variety of statistics that can be tested is understandably large, but

it’s important to only use rate statistics such as AVG, OBP, and SLG because of the large variance allowed in ABs and PAs. For the purposes of the study, 20 home runs in 300 AB is considered the same as 40 HR in 600 AB; the difference between 20 and 40 actual home runs is irrelevant. To this end,

AVG, OBP, SLG, BB% (Walk Rate, BB/PA), K% (Strikeout Rate), XBA% (Extra-Base

Hit Percentage, XBA/H), HR% (Home Run Rate), and ISO (Isolated Power,

SLG-AVG) were considered. Each is a rate statistic that reveals information

about certain parts of a player’s composition at the plate. Looking at the

results both individually and in concert will yield some conclusions about

year-to-year statistical consistency.
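As a rough sketch of how these rate metrics fall out of a season’s counting stats, consider the Python below. The dictionary keys and the sample line are hypothetical. The denominators for K% (per PA) and HR% (per AB) are assumptions, since the article does not spell them out, and the percentage metrics are scaled by 100, which seems to fit the magnitudes reported in the tables.

```python
def batting_rates(s):
    """Compute rate metrics from one season of counting stats.

    `s` is a dict of raw totals with hypothetical key names.
    """
    singles = s["h"] - s["2b"] - s["3b"] - s["hr"]
    total_bases = singles + 2 * s["2b"] + 3 * s["3b"] + 4 * s["hr"]
    extra_base_hits = s["2b"] + s["3b"] + s["hr"]
    avg = s["h"] / s["ab"]
    slg = total_bases / s["ab"]
    on_base = s["h"] + s["bb"] + s["hbp"]
    return {
        "AVG": avg,
        "OBP": on_base / (s["ab"] + s["bb"] + s["hbp"] + s["sf"]),
        "SLG": slg,
        "BB%": 100 * s["bb"] / s["pa"],
        "K%": 100 * s["so"] / s["pa"],   # denominator is an assumption
        "XBA%": 100 * extra_base_hits / s["h"],
        "HR%": 100 * s["hr"] / s["ab"],  # denominator is an assumption
        "ISO": slg - avg,
    }


# A made-up part-time season, and the same season doubled.
part_time = {"ab": 300, "pa": 340, "h": 84, "2b": 15, "3b": 1,
             "hr": 20, "bb": 35, "so": 60, "hbp": 3, "sf": 2}
full_time = {key: 2 * value for key, value in part_time.items()}

same_rates = batting_rates(part_time) == batting_rates(full_time)
```

The doubled season has 40 home runs in 600 AB instead of 20 in 300, yet every rate comes out identical, which is the point of restricting the study to rate statistics.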

Metric   R-Squared   Standard Deviation
AVG      0.1761      0.031
OBP      0.3820      0.041
SLG      0.4171      0.080
BB %     0.5745      3.520
K %      0.6884      5.230
XBA %    0.4634      8.820
HR %     0.5751      1.730
ISO      0.5510      0.064

Before we get to the results, however, let’s do some housekeeping. On the far left we have our offensive metrics, followed by the R-Squared and the Standard Deviation. For the uninitiated, R-Squared is another term for the “coefficient of determination,” a measurement of correlation, here between one season’s numbers and the next’s. The higher the R-Squared value, the greater the correlation, and thus the more consistent the metric. Depending on how it’s being used, an R-Squared below 0.5000 is typically considered too low to justify any sort of predictive value. Standard deviation, meanwhile, is simply a measure of spread: the higher the number, the more volatile the metric.
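For readers who want to see the arithmetic, here is a minimal Python sketch of an R-Squared calculation. It assumes the simplest case, a straight-line fit of year two against year one, where R-Squared equals the square of the Pearson correlation. The article does not say exactly how its figures were computed, so treat this as one plausible reading, and note that the strikeout rates below are invented.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation between paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


# Made-up strikeout rates for five hitters in back-to-back seasons.
year_one = [12.0, 18.5, 22.0, 15.0, 27.5]
year_two = [13.5, 17.0, 24.0, 14.0, 25.0]
print(round(r_squared(year_one, year_two), 3))
```

A value near 1 means a hitter’s year-one number tells you most of what you need to know about year two; a value near 0 means it tells you almost nothing.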

With that being said, of these metrics, batting average has the least consistency, and thus the least predictive ability. Meanwhile, four metrics cleared the fabled 0.5000 line: walk rate, strikeout rate, home run rate, and ISO. The first three are defense-independent. This fact supports the

idea that the hitters remain consistent from year-to-year, while much of the

volatility of AVG and, to a lesser extent, OBP and SLG, can be attributed to

the opposing defense. Removing the defense from the equation greatly

increases the predictability of batting statistics, a fact that reinforces

the idea that there is a significant amount of luck involved in AVG. This

finding isn’t really big news, but it’s always nice to reconfirm something

some of us might take for granted.

(As a brief aside, it’s important to clarify what is meant by batting

average being subject to a great deal of “luck.” This is not to say that all

major league hitters are equal when it comes to AVG, and the differences

evident between them are entirely random. Rather, players have a

theoretical AVG-ability that varies from player-to-player, but the sample

size of a season is too small to accurately reveal that every year. The

high volatility of AVG from year-to-year–the statistical “noise,” if you will–is

large enough to obscure the differences between many major league

hitters of similar ability. The book *Curve Ball*, by

Jim Albert and Jay Bennett, has some excellent discussion on this topic.)
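To put a rough number on that noise, one common simplification (an assumption here, not something the article itself uses) is to treat every at-bat as an independent trial with a fixed chance of a hit. Under that model the spread of a season’s AVG around the hitter’s true level follows the binomial formula, sketched below in Python.

```python
from math import sqrt


def avg_sampling_sd(true_avg, at_bats):
    """Standard deviation of observed AVG under a simple binomial model.

    Assumes each at-bat is an independent trial with hit
    probability `true_avg` (a simplification).
    """
    return sqrt(true_avg * (1 - true_avg) / at_bats)


# A hypothetical "true" .270 hitter over a 500 AB season.
sd = avg_sampling_sd(0.270, 500)
print(round(sd, 4))
```

Under these assumptions the standard deviation works out to roughly 20 points of batting average, so a true .270 hitter landing at .250 one year and .290 the next would be nothing unusual.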

When looking at pitchers, many of the same constraints were placed on the data as for batters. The minimum playing time for pitchers was set at 50 IP in

any given season. This yielded 2695 sample seasons from 1991-2003.

Statistics considered were, again, entirely rate metrics: starting with the

mainstream ERA and WHIP, and moving on to K/9 (Strikeouts per 9 IP), BB/9,

H/9, HR/9, K/BB, and GB/FB. (Data for GB/FB was only available from 1999

on, yielding a much smaller sample size of 912 seasons.) Let’s see how it turned out:

Metric   R-Squared   Standard Deviation
ERA      0.1091      1.20
WHIP     0.1410      0.20
K/9      0.5627      1.82
BB/9     0.3413      1.09
H/9      0.1745      1.45
HR/9     0.1273      0.41
K/BB     0.3610      1.00
GB/FB    0.5591      0.50

If you’re a regular visitor to BP, the fact that ERA is, so far, worse

than any other statistic at maintaining consistency from year-to-year should

be of no surprise. Its volatility approaches total randomness due to the variety of game events it attempts to take into account: the official scorer’s decisions, defense, the sequence of events, and the pitcher’s actual ability, just to name a few. Interestingly, WHIP doesn’t fare quite as well as expected when comparing it to H/9 and BB/9–two

statistics that should map to it rather well since they take into account

two of the three stats used in WHIP. Instead, by combining two inconsistent

statistics, WHIP comes out worse overall. The only two metrics that seem

to have any consistent value are K/9 and GB/FB–once again, statistics that

do not involve the defense.
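The link between WHIP and its two component rates is plain arithmetic, and a short Python sketch makes it concrete. It uses the conventional definition of WHIP, walks plus hits divided by innings pitched; the pitching line is made up for illustration.

```python
def per_nine(count, innings):
    """Scale a counting stat to a per-9-innings rate."""
    return 9 * count / innings


def whip(hits, walks, innings):
    """Walks plus hits per inning pitched."""
    return (hits + walks) / innings


# A made-up season: 190 hits and 60 walks in 200 innings.
hits, walks, innings = 190, 60, 200.0
h9 = per_nine(hits, innings)
bb9 = per_nine(walks, innings)

# WHIP is just the sum of the two per-9 rates, rescaled to one inning.
whip_from_rates = (h9 + bb9) / 9
```

Because WHIP is simply H/9 plus BB/9 divided by nine, any instability in either component flows straight into it.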

Considering the fact that much of the blame for the inconsistency of AVG, ERA, and other statistics has thus far been placed on the defense, it would

be unfair not to check and see how variable defense is. Measuring defense,

though, is sticky business. It’s best to read the results below with

large grains of salt, constantly reminding yourself that defensive

statistics don’t always reflect the events on the field, and that

defense is inherently a team activity. Adjusting for players switching

positions over the course of the year also threw a wrench into the

works.

The sample group was once again drawn from the same years, but the

caveats included having to accumulate at least 100 innings at any one

position. Further, if players accumulated over 100 innings at more than one

position, those positions were only considered together if they were similar

defensively. For instance, a player who played 200 innings in RF and 200 in

LF had his total defensive line added together; likewise players who played

2B, SS, and 3B. Players moving around between 1B and the outfield were assigned the stats from the position they played the most in the following season. (For example, if a player played 1000 innings at 1B in 2002 and split time between 1B and OF in 2001, only his 1B stats from 2001 were considered. Likewise with catchers and anyone named Biggio.) This yielded a sample size of 5606 seasons.

The three statistics considered were again rate stats based on the

(rather limited) defensive stats available. First is fielding percentage

(FP, pronounced “Santangelo” if you like) which is Putouts (PO) plus

Assists (A) over Total Chances (TC). Second is Total Chances per 9 Innings

(TC/9), a measure that’s almost the exact same stat as range factor, but

with errors included. Finally, Defensive

Efficiency (DE) was included because it more accurately reflects the

team aspect of defense. Admittedly, this is a very small range of

statistics to consider, but the current crop of available defensive

statistics yields few options and instills limited confidence that the

numbers are an accurate reflection of the events on the field (which, of

course, is the whole point of stats).
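The first two of these are easy to compute from a standard fielding line, as the Python sketch below shows. It takes total chances to be putouts plus assists plus errors, the usual definition; the sample numbers are invented. Defensive Efficiency is left out because the article does not give the formula it used.

```python
def fielding_rates(putouts, assists, errors, innings):
    """Fielding percentage and total chances per 9 innings."""
    total_chances = putouts + assists + errors
    return {
        # Share of chances handled without an error.
        "FP": (putouts + assists) / total_chances,
        # Chances per 9 defensive innings, errors included.
        "TC/9": 9 * total_chances / innings,
    }


# A made-up infielder's season.
rates = fielding_rates(putouts=250, assists=400, errors=15, innings=1200.0)
print(round(rates["FP"], 3), round(rates["TC/9"], 2))
```

Note that neither number knows anything about the balls the fielder never reached, which is a large part of why these stats deserve the grains of salt mentioned above.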

Metric   R-Squared   Standard Deviation
FP       0.1183      0.030
TC/9     0.8056      2.580
DE       0.2767      0.011

While there is little hope for FP, TC/9 looks more impressive than any

statistic sampled thus far. The only drawback to this is the fact that TC/9

doesn’t reveal very much about the actual player involved. It’s at least as

dependent on the GB/FB and handedness of the pitcher or the quality of other

defenders as it is on the ability of the player in question. Its

year-to-year consistency does little more than reveal that balls put into

play, for the most part, are distributed around the field in a consistent

manner from season-to-season. The consistency of Defensive Efficiency falls

towards the middle of the pack when compared with other metrics viewed so

far, but its variance helps explain the high variance of H/9 and ERA, as

expected. It does not, however, explain batting average, since league-wide

DE stays very stable from year to year.

While the idea that defense-independent statistics are steadier than defense-dependent ones is not new, it’s worthwhile to clarify within

those ranges which ones are the most constant. In the rather simple cases

looked at here, the hierarchy would start with strikeouts, drop slightly to

walks, then to home runs, and finally to anything involving balls in play.

Obviously, there are ways to improve the year-to-year consistency–looking

at more than the immediate previous season, adjusting for age, park, team,

etc.–but for now, when various publications are predicting big things for

this season based on last year’s numbers, remember that things aren’t quite

as consistent as you might expect. That’s why they play the games.