February 20, 2004
Baseball Prospectus Basics
Statistical Consistency
February, in the baseball world, is the month of predictions. Every analyst, writer, web site, undefeatable computer program, guy with a beer, and book (some better than others) will spend the next month looking over the offseason wasteland and espousing conclusions. The method behind these processes varies more widely than Johnny Depp's acting roles; some are based purely on numbers, some purely on empirical data, some purely on names, and some purely on nothing. So what can you count on?
For one thing, you can count on me not offering you any spectacular predictions, guaranteed to be more accurate than anything on the market. If you want that, read up on BP's own PECOTA projection system. Instead, the aim will be to lay a basic groundwork for your expectations of the consistency of basic statistics from season to season. Surmising the volatility of various metrics, and their consistency from year-to-year, is the primary goal.
To accomplish this, I'm going to start with batting statistics, which are traditionally more stable than pitching statistics. To reduce outliers and the game's inherent degree of chance, only seasons in which a player accumulated at least 200 ABs will be used. All seasons from 1991 to 2003 were considered, looking particularly for consecutive seasons of sufficient sample size. This process yielded 3066 sample seasons from which to draw data.
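For readers who want to try this at home, the selection step can be sketched roughly as follows. This is only an illustration of the idea, not the code used for the study: the record layout (dicts with "player", "year", and "ab" keys) and the function name are assumptions, and the sample rows are made up.

```python
from collections import defaultdict

MIN_AB = 200  # playing-time threshold described in the article


def consecutive_pairs(seasons, min_ab=MIN_AB):
    """Return (year N, year N+1) season pairs for each player.

    `seasons` is an iterable of dicts with the hypothetical keys
    "player", "year", and "ab". Seasons under `min_ab` are dropped
    before pairing, so a short season breaks the chain.
    """
    by_player = defaultdict(dict)
    for s in seasons:
        if s["ab"] >= min_ab:
            by_player[s["player"]][s["year"]] = s

    pairs = []
    for years in by_player.values():
        for year, season in sorted(years.items()):
            following = years.get(year + 1)
            if following is not None:
                pairs.append((season, following))
    return pairs


# Made-up rows, purely for illustration.
sample = [
    {"player": "A", "year": 2001, "ab": 550},
    {"player": "A", "year": 2002, "ab": 180},  # below threshold, dropped
    {"player": "A", "year": 2003, "ab": 600},
    {"player": "B", "year": 2002, "ab": 410},
    {"player": "B", "year": 2003, "ab": 325},
]
pairs = consecutive_pairs(sample)
```

In this toy data only player B contributes a pair, since player A's 2002 season falls under the cutoff and 2001 and 2003 are not consecutive.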
The variety of statistics that can be tested is understandably large, but it's important to use only rate statistics such as AVG, OBP, and SLG because of the large variance allowed in ABs and PAs. For the purposes of the study, 20 home runs in 300 AB is considered the same as 40 HR in 600 AB; the difference between 20 and 40 actual home runs is irrelevant. To this end, AVG, OBP, SLG, BB% (Walk Rate, BB/PA), K% (Strikeout Rate), XBA% (Extra-Base Hit Percentage, XBA/H), HR% (Home Run Rate), and ISO (Isolated Power, SLG-AVG) were considered. Each is a rate statistic that reveals information about certain parts of a player's composition at the plate. Looking at the results both individually and in concert will yield some conclusions about year-to-year statistical consistency.
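As a rough sketch of what "rate statistic" means here, the metrics above can be computed from a counting-stat line like so. Several details are my assumptions rather than anything stated in the study: K% is taken per PA (the article only gives the denominator for BB%), HR% is taken per AB (to match the 20-in-300 example), plate appearances are simplified to AB + BB + HBP + SF, and the function and key names are made up.

```python
def batting_rates(ab, h, doubles, triples, hr, bb, so, hbp=0, sf=0):
    """Compute the article's batting rate stats from counting stats.

    Denominators for K% and HR% are assumptions; see the lead-in.
    """
    pa = ab + bb + hbp + sf              # simplified plate appearances
    total_bases = h + doubles + 2 * triples + 3 * hr
    avg = h / ab
    slg = total_bases / ab
    return {
        "AVG": avg,
        "OBP": (h + bb + hbp) / pa,
        "SLG": slg,
        "BB%": 100 * bb / pa,            # walks per 100 PA
        "K%": 100 * so / pa,             # assumed: strikeouts per 100 PA
        "XBA%": 100 * (doubles + triples + hr) / h,  # extra-base hits per 100 hits
        "HR%": 100 * hr / ab,            # assumed: home runs per 100 AB
        "ISO": slg - avg,
    }


# The article's point: 20 HR in 300 AB and 40 HR in 600 AB are the same rate.
half = batting_rates(ab=300, h=90, doubles=20, triples=2, hr=20, bb=30, so=60)
full = batting_rates(ab=600, h=180, doubles=40, triples=4, hr=40, bb=60, so=120)
same_rates = all(abs(half[k] - full[k]) < 1e-12 for k in half)
```

Doubling every counting stat leaves every rate unchanged, which is exactly why the study can mix part-time and full-time seasons.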
Metric    R-Squared    Standard Deviation
AVG       0.1761       0.031
OBP       0.3820       0.041
SLG       0.4171       0.080
BB %      0.5745       3.520
K %       0.6884       5.230
XBA %     0.4634       8.820
HR %      0.5751       1.730
ISO       0.5510       0.064
Before we get to the results, however, first let's do some house-cleaning. In the left-most column we have our offensive metrics, followed by the R-Squared, as well as the Standard Deviation. For the uninitiated, R-Squared is another term for "coefficient of determination"--a measurement of correlation. The higher the R-Squared value, the greater the correlation between one season and the next, and thus, the more consistent the metric. Depending on how it's being used, an R-Squared below 0.5000 is typically considered too low to justify any sort of predictive value. Standard deviation, meanwhile, is simply a measure of spread--the higher the number, the more volatile the metric.
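For anyone who wants to see the machinery, here is a minimal sketch of the two calculations. With one predictor (last year's value) and one outcome (this year's value), R-Squared is just the square of the ordinary correlation coefficient. The article does not spell out which values the standard deviation is taken over, so the pooled spread below is an assumption, and the five pairs of batting averages are invented for illustration.

```python
from statistics import stdev


def r_squared(xs, ys):
    """Squared Pearson correlation between paired values.

    For a one-variable linear fit this equals the coefficient of
    determination.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Invented year N / year N+1 batting averages for five hitters.
year_n = [0.250, 0.300, 0.275, 0.310, 0.260]
year_n_plus_1 = [0.265, 0.280, 0.290, 0.295, 0.255]

consistency = r_squared(year_n, year_n_plus_1)
spread = stdev(year_n + year_n_plus_1)  # assumed: spread of the pooled values
```

A metric that repeated perfectly from one year to the next would score 1.0, and one with no carryover at all would score 0.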
With that being said, of these metrics, batting average has the least consistency, and thus the least predictive ability. Meanwhile, four metrics cleared the fabled 0.5000 line--Walk Rate, K Rate, HR Rate, and ISO--and the first three are defense-independent. This fact supports the idea that hitters remain consistent from year-to-year, while much of the volatility of AVG and, to a lesser extent, OBP and SLG, can be attributed to the opposing defense. Removing the defense from the equation greatly increases the predictability of batting statistics, a fact that reinforces the idea that there is a significant amount of luck involved in AVG. This finding isn't really big news, but it's always nice to reconfirm something some of us might take for granted.
(As a brief aside, it's important to clarify what is meant by batting average being subject to a great deal of "luck." This is not to say that all major league hitters are equal when it comes to AVG, and the differences evident between them are entirely random. Rather, players have a theoretical AVG-ability that varies from player-to-player, but the sample size of a season is too small to accurately reveal that every year. The high volatility of AVG from year-to-year--the statistical "noise," if you will--is large enough to obscure the differences between many major league hitters of similar ability. The book Curve Ball, by Jim Albert and Jay Bennett, has some excellent discussion on this topic.)
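To put a rough number on that noise, here is a small simulation. It is a deliberately simplified sketch: it assumes every at-bat is an independent trial with the same hit probability, and the .280 "true" average, the 500 at-bats, and the 2000 simulated seasons are all hypothetical values chosen for illustration.

```python
import random
from math import sqrt
from statistics import pstdev

TRUE_AVG = 0.280   # hypothetical "AVG-ability"
AT_BATS = 500      # hypothetical season length
SEASONS = 2000     # number of simulated seasons


def simulate_season(rng, true_avg=TRUE_AVG, at_bats=AT_BATS):
    """Observed AVG for one season of independent at-bats."""
    hits = sum(1 for _ in range(at_bats) if rng.random() < true_avg)
    return hits / at_bats


rng = random.Random(2004)  # fixed seed so the run is repeatable
observed = [simulate_season(rng) for _ in range(SEASONS)]

simulated_sd = pstdev(observed)
# Binomial standard deviation of a proportion: sqrt(p * (1 - p) / n)
theoretical_sd = sqrt(TRUE_AVG * (1 - TRUE_AVG) / AT_BATS)
```

Under these assumptions the theoretical standard deviation works out to about 0.020, or roughly 20 points of batting average. In other words, the same hitter with the same ability can easily look like a .260 hitter one year and a .300 hitter the next.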
When looking at pitchers, many of the same constraints were placed on the data as batters. The minimum playing time for pitchers was set at 50 IP in any given season. This yielded 2695 sample seasons from 1991-2003. Statistics considered were, again, entirely rate metrics: starting with the mainstream ERA and WHIP, and moving on to K/9 (Strikeouts per 9 IP), BB/9, H/9, HR/9, K/BB, and GB/FB. (Data for GB/FB was only available from 1999 on, yielding a much smaller sample size of 912 seasons.) Let's see how it turned out:
Metric    R-Squared    Standard Deviation
ERA       0.1091       1.20
WHIP      0.1410       0.20
K/9       0.5627       1.82
BB/9      0.3413       1.09
H/9       0.1745       1.45
HR/9      0.1273       0.41
K/BB      0.3610       1.00
GB/FB     0.5591       0.50
If you're a regular visitor to BP, the fact that ERA is, so far, worse than any other statistic at maintaining consistency from year-to-year should be of no surprise. Its volatility is approaching almost total randomness due to the variety of game events it attempts to take into account: the official scorer's decisions, defense, the sequence of events, and the pitcher's actual ability, just to name a few. Interestingly, WHIP doesn't fare quite as well as expected when comparing it to H/9 and BB/9--two statistics that should map to it rather well since they take into account two of the three stats used in WHIP. Instead, by combining two inconsistent statistics, WHIP comes out worse overall. The only two metrics that seem to have any consistent value are K/9 and GB/FB--once again, statistics that do not involve the defense.
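The link between WHIP and its two components is easy to see in code. The sketch below computes the pitching rates from a counting-stat line; the function name and the sample line are made up, innings are assumed to be true fractions (so 200 and a third innings is 200.333..., not the box-score shorthand 200.1), and GB/FB is left out because it needs batted-ball data.

```python
def pitching_rates(ip, h, bb, so, hr, er):
    """Compute the article's pitching rate stats from counting stats.

    `ip` must be innings as a true fraction, not box-score notation.
    """
    def per_nine(count):
        return 9 * count / ip

    return {
        "ERA": per_nine(er),
        "WHIP": (bb + h) / ip,
        "K/9": per_nine(so),
        "BB/9": per_nine(bb),
        "H/9": per_nine(h),
        "HR/9": per_nine(hr),
        "K/BB": so / bb,
    }


# Hypothetical season line, for illustration only.
line = pitching_rates(ip=200.0, h=180, bb=60, so=200, hr=20, er=80)

# WHIP is just the two per-nine rates added together and rescaled.
whip_from_parts = (line["BB/9"] + line["H/9"]) / 9
```

Since WHIP is literally (BB/9 + H/9) / 9, any year-to-year wobble in either component flows straight into it.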
Considering the fact that much of the blame for the inconsistency of AVG, ERA, and other statistics has thus far been placed on the defense, it would be unfair not to check and see how variable defense is. Measuring defense, though, is sticky business. It's best to read the results below with large grains of salt, constantly reminding yourself that defensive statistics don't always reflect the events on the field, and that defense is inherently a team activity. Adjusting for players switching positions over the course of the year also threw a wrench into the works.
The sample group was once again drawn from the same years, but the caveats included having to accumulate at least 100 innings at any one position. Further, if players accumulated over 100 innings at more than one position, those positions were only considered together if they were similar defensively. For instance, a player who played 200 innings in RF and 200 in LF had his total defensive line added together; likewise players who played 2B, SS, and 3B. Players moving around between 1B and the outfield were assigned the stats from the position they played the most in the following season. (For example, if a player played 1000 innings at 1B in 2002 and split time between 1B and OF in 2001, only his 1B stats from 2001 were considered. Likewise with catchers and anyone named
The three statistics considered were again rate stats based on the (rather limited) defensive stats available. First is fielding percentage (FP, pronounced "Santangelo" if you like) which is Putouts (PO) plus Assists (A) over Total Chances (TC). Second is Total Chances per 9 Innings (TC/9), a measure that's almost the exact same stat as range factor, but with errors included. Finally, Defensive Efficiency (DE) was included because it more accurately reflects the team aspect of defense. Admittedly, this is a very small range of statistics to consider, but the current crop of available defensive statistics yields few options and instills limited confidence that the numbers are an accurate reflection of the events on the field (which, of course, is the whole point of stats).
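The first two of these are simple enough to sketch directly from the definitions above. Total chances are taken as putouts plus assists plus errors, the usual definition; the function name and the sample fielding line are made up. Defensive Efficiency is left out because its formula is not given here.

```python
def fielding_rates(po, a, e, innings):
    """Fielding percentage, TC/9, and range factor for one fielding line."""
    tc = po + a + e                      # total chances include errors
    return {
        "FP": (po + a) / tc,             # share of chances handled cleanly
        "TC/9": 9 * tc / innings,        # chances per nine innings
        "RF": 9 * (po + a) / innings,    # range factor: same, minus errors
    }


# Hypothetical infielder line, for illustration only.
fielder = fielding_rates(po=250, a=400, e=15, innings=1200)
gap = fielder["TC/9"] - fielder["RF"]    # errors per nine innings
```

The only thing separating TC/9 from range factor is the player's errors per nine innings, which is why the two behave almost identically.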
Metric    R-Squared    Standard Deviation
FP        0.1183       0.030
TC/9      0.8056       2.580
DE        0.2767       0.011
While there is little hope for FP, TC/9 looks more impressive than any statistic sampled thus far. The only drawback to this is the fact that TC/9 doesn't reveal very much about the actual player involved. It's at least as dependent on the GB/FB and handedness of the pitcher or the quality of other defenders as it is on the ability of the player in question. Its year-to-year consistency does little more than reveal that balls put into play, for the most part, are distributed around the field in a consistent manner from season-to-season. The consistency of Defensive Efficiency falls towards the middle of the pack when compared with other metrics viewed so far, but its variance helps explain the high variance of H/9 and ERA, as expected. It does not, however, explain batting average, since league-wide DE stays very stable from year to year.
While the idea that defense-independent statistics are steadier than defense-dependent ones is not new, it's worthwhile to clarify within those ranges which ones are the most constant. In the rather simple cases looked at here, the hierarchy would start with strikeouts, drop slightly to walks, then to home runs, and finally to anything involving balls in play. Obviously, there are ways to improve the year-to-year consistency--looking at more than the immediate previous season, adjusting for age, park, team, etc.--but for now, when various publications are predicting big things for this season based on last year's numbers, remember that things aren't quite as consistent as you might expect. That's why they play the games.