


April 8, 2005
Averages and Extremes
Gleaning Information from Performance Spikes
Batting average gets no love anymore. Three decades of sabermetric analysis have diminished the once-proud Triple Crown stat through bites (on-base percentage) and nibbles (batting average on balls in play). It's enough to drive any self-respecting free-swinger to distraction.

But batting average is important. In 2004, hits accounted for more than 70% of the on-base events in on-base percentage across the majors. The first base of each hit accounted for more than 60% of total bases in the majors, a key to slugging average. A single is more valuable than a walk, because it carries runner-advancement potential the walk mostly lacks. So a better understanding of batting average seems like useful knowledge to have.

Batting average is a skill. Being able to choose to swing at a ball, put it in play and successfully reach first base before being thrown out is an ability which some players possess at a higher level than others. Of course, it's also a famously volatile stat.

Since it is a skill, if a player shows a large single-season deviation from his previously established level of hitting, you'd expect him to regress toward that established level the next year, wouldn't you? This was the theory behind my analysis of Pat Burrell's poor 2003 season. In that analysis, I looked only at players who saw a big drop in batting average while their patience and isolated power remained about the same. Why not look at a much larger pool of players, without that constraint, and see how they performed?

Using stats from the Baseball Archive, I looked at players who logged at least 300 plate appearances for three consecutive years (to try to filter out the noise of very small sample sizes). The first two years form the "baseline," while the third is the candidate "spike" year (S). A year counted as a spike if the player's batting average (AVG) in that year was above the higher of his AVGs in the two baseline years, or below the lower.
I used translated batting averages for every year starting with 1946, which yielded 6550 data points. I plotted each spike year (S) against the player's regression year (S+1), without putting a PA requirement on the S+1 season. The spike year was measured by the magnitude of the spike, in AVG points, while the regression year was measured by the magnitude of the change from the spike year. So if a player dropped 50 points of AVG in one year, but gained 30 of it back the next year, his S value is -50 and his S+1 value is +30. Of our sample, 2237 players (34%) fell between the two baseline numbers and so have S values of 0 (zero). Everyone else had a spike (if only a very small one) outside their baseline range. The data points produce this chart:
Regression to the mean is a powerful thing. According to the trend line, players whose batting average dropped from their baseline regained about 60% of it the next year. Players whose batting average increased lost about 80% of their gains.

(For those keeping score at home, the single biggest one-year drop in the set belongs to one of the Dave Robertses. In 1974 with San Diego, Roberts lost more than 100 points of AVG from his baseline, then regained more than 120 points the next year, albeit in just 132 PA. The guy at the other extreme is more heralded: Andres Galarraga made a run at a .400 average in 1993 with Colorado, gaining about 90 points of AVG, then gave back about two-thirds of it the next year.)

What's going on here is a combination of not-so-mysterious things:
The table suggests that luck is a very strong factor in the performance of batting average, which explains its volatility. But the indication that players with an upward spike tend to lose more of their gains than players with a downward spike recoup their losses suggests that the improvement/decline elements also play a role. In particular, we could infer that skills decline is a more likely occurrence than skills improvement. (This baseball game must be a tough one to master!)

Let's look more closely at the characteristics of the data set. First, what really constitutes a "large" spike? It's one which is sufficiently rare to be interesting. Players experience 5- or 10-point changes in their batting average year-over-year all the time; indeed, it's unusual when a player doesn't see at least a small change.

The following tables break down the tails of the chart, showing the population of the tail for a spike of a given size or greater, and the percentage of players in that tail who continued to improve after an upward spike, or decline after a downward spike. This gauges, for a spike of a given magnitude, how likely a player is to regress towards the mean the next year:

Min Down     % Further
Amt (S)      Decline (S+1)    #Players
   -5           34.4%           2054
  -10           31.1%           1668
  -15           28.1%           1321
  -20           24.7%           1041
  -25           21.5%            808
  -30           20.6%            613
  -35           20.0%            450
  -40           16.5%            328
  -45           15.7%            230
  -50           12.6%            151

Min Up       % Further
Amt (S)      Improve (S+1)    #Players
   +5           20.6%           1655
  +10           17.6%           1323
  +15           16.6%           1024
  +20           14.8%            777
  +25           12.2%            572
  +30           10.3%            416
  +35           10.5%            293
  +40            8.0%            199
  +45            7.5%            132
  +50            7.9%             88

The 30-point spikes seem like an interesting dividing line: About one player in five with a down spike of 30 points or more will continue to decline, while only one in 10 with a 30+ point spike upwards will continue to improve. 40- and 50-point spikes show an increasing trend in this regard, albeit with a much smaller sample set to work with.
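For readers who want to build the same kind of tail table from their own data, here is a rough Python sketch. It assumes each player-season is a pair of signed AVG-point values (S, S+1); the function name, the sample pairs, and the choice to count an S+1 of exactly zero as "not continuing" are all assumptions rather than anything specified above.

```python
def tail_summary(pairs, threshold):
    """Summarize the tail of spikes at least as large as threshold.

    pairs:     iterable of (S, S+1) values in signed AVG points
    threshold: signed cutoff, e.g. -30 for down spikes of 30+ points,
               +30 for up spikes of 30+ points
    Returns (players_in_tail, share_continuing), where share_continuing
    is the fraction whose S+1 change went further in the same direction
    (None if the tail is empty).
    """
    if threshold < 0:
        tail = [(s, s1) for s, s1 in pairs if s <= threshold]
        continued = sum(1 for _, s1 in tail if s1 < 0)  # declined further
    else:
        tail = [(s, s1) for s, s1 in pairs if s >= threshold]
        continued = sum(1 for _, s1 in tail if s1 > 0)  # improved further
    share = continued / len(tail) if tail else None
    return len(tail), share


if __name__ == "__main__":
    # Hypothetical (S, S+1) pairs, not taken from the study's data.
    sample = [(-50, 30), (-35, -5), (-31, 12), (-10, 4),
              (32, -20), (45, 3), (8, -8), (0, 5)]
    for cutoff in (-30, -5, 30):
        print(cutoff, tail_summary(sample, cutoff))
```

Running the function once per cutoff (5, 10, 15 and so on, in each direction) would reproduce the layout of the two tables, one row per call.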
But these tables provide a continuum of performance trends which can be used to study players who displayed a batting average spike in a given year. The population distribution suggests another avenue: The 6550 data points in our set work out to about 4.8 players per team-year. (In other words, in a season a team is likely to have just under five players on it who have collected at least 300 PA in each of the last two seasons.) Of these players, about 1 in 6 will have a 30-point spike (or more, and either up or down), about 1 in 12 will have a 40-point spike (and will be a subset of the first set), and about 1 in 27 will have a big 50-point spike. So with the current 30 major-league teams, this means that a season will have about 152 players who meet the PA requirement. About 25 of them should have had 30-point spikes, 13 of them 40-point spikes, and four or five should have had big 50-point spikes.

So let's look at the 2003 season's batting-average spikes, and see how those players did in 2004:

PLAYER                    S Age   S (2003)   S+1 (2004)   S+1 PA
Jason Giambi/NYA             32        -69          -42      322
David Bell/PHI               30        -67          +92      603
Larry Walker/COL             36        -51          +13      316
Pat Burrell/PHI              26        -49          +44      534
Mark McLemore/SEA            38        -46          +12      295
Paul Konerko/CHA             27        -46          +36      435
Bernie Williams/NYA          34        -44           -1      651
Bobby Higginson/DET          32        -42           +9      531
Craig Counsell/ARI           32        -42          +12      545
Tony Womack/ARI-COL-CHN      33        -41          +86      606
Kevin Millar/BOS             31        -41          +19      588
Todd Zeile/NYA-MON           37        -40           +6      396
John Olerud/SEA              34        -38          -12      500
Ryan Klesko/SDN              32        -37          +37      480
Eric Young/MIL-SFN           36        -33          +28      402
Desi Relaford/KCA            29        -32          -24      430
Todd Helton/COL              29        +30          -21      683
Ben Molina/ANA               28        +31          -13      363
Albert Pujols/SLN            23        +33          -32      692
Kenny Lofton/PIT-CHN         36        +33          -21      313
Mark Grudzielanek/CHN        33        +34          -14      278
Richard Hidalgo/HOU          28        +34          -70      578
Matt Stairs/PIT              35        +39          -24      496
Mike Young/TEX               26        +42           +2      739
Jason Kendall/PIT            29        +45           -7      658
Javy Lopez/ATL               32        +64          -20      638
Melvin Mora/BAL              31        +66          +13      636

That makes 27 spikes of 30+ points, 15 of 40+ and five of 50+: a little on the high side, but not far off.
There were more down spikes than up spikes, as expected. Down-spike players were a little more likely to continue going down than up-spike players were to keep going up. The down-spike players tended to be in their 30s, while the up-spike players tended to be in their (late) 20s.

This brings us to the matter of age. To come clean, there is an age bias in this study, because in order to qualify a player has to have three years of semi-regular major-league service. The median age at which a player logs his first 300-PA season is 25, so at least half the players wouldn't even have a chance to be part of the study before age 27. By contrast, many old players will qualify near the end of their careers. With that caveat, the following graphs show the age distribution of players in the data set with spikes of at least 30 points:
Despite the bias against young players due to the PA requirement, up-spike players are still a little younger than down-spike players, as a group. Ages 23 and 24 seem somewhat more likely to produce an up spike than a down spike, while ages 33 and 34 are the reverse. But otherwise there isn't a big difference between up spikes and down spikes based on age.

What about the impact of age on a player's regression in his S+1 year? These next two charts show the average regression by age of players with at least a 30-point spike, where the regression is expressed as a percentage of the spike. So if a player's up spike was +50 and his regression the next year was -30, then his regression percentage was 60%. As a cutoff, I excluded ages for which there were fewer than 10 players (which I admit is somewhat arbitrary). For visual consistency between the two charts, the regression from up spikes is expressed as a negative percentage.
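The regression percentage and the by-age averaging described above might look like this in Python. It is a sketch under assumptions: the function names and the record layout (age, S, S+1, in signed AVG points) are invented for illustration, and the result is left as a positive percentage for both up and down spikes rather than flipped in sign for charting.

```python
from collections import defaultdict


def regression_pct(s, s1):
    """Percentage of a spike undone the following year.

    s:  signed spike in AVG points (must be nonzero)
    s1: signed change from the spike year to the next year
    A +50 spike followed by -30 gives 60.0, as does -50 followed by +30.
    A negative result means the player moved further from his baseline.
    """
    return -s1 * 100 / s


def mean_regression_by_age(records, up, min_spike=30, min_players=10):
    """Average regression percentage per age for one spike direction.

    records: iterable of (age, S, S+1) tuples (hypothetical layout)
    up:      True for up spikes, False for down spikes
    Ages with fewer than min_players qualifying spikes are dropped.
    """
    by_age = defaultdict(list)
    for age, s, s1 in records:
        if abs(s) >= min_spike and (s > 0) == up:
            by_age[age].append(regression_pct(s, s1))
    return {
        age: sum(pcts) / len(pcts)
        for age, pcts in sorted(by_age.items())
        if len(pcts) >= min_players
    }


if __name__ == "__main__":
    print(regression_pct(50, -30))   # the worked example from the text
    print(regression_pct(-50, 30))   # the mirror-image down spike
```

Keeping the direction as an explicit argument mirrors the two separate charts; the sign flip for up spikes is purely a display choice and is left out here.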
Here I think we've found something: the older a player is, the less likely he is to bounce back from a big down spike, and the more likely he is to come back to Earth after a big up spike. In particular, once a player enters his 30s, a big down spike may indicate a significant skills erosion (he's likely to regain only about half of his losses), while he's also unlikely to maintain more than a small fraction of gains from a big up spike.

As I said earlier, many of these behaviors are not very surprising: We expect hitting statistics to regress to the mean, and for older players to struggle more than younger players. But it's also valuable to have an understanding of how likely a player is to regress, and by how much. With that in mind, I'll close out with your class of 2004 30-point batting average spikes:

PLAYER                      S Age   S (2004)
Chipper Jones/ATL              32        -62
Scott Spiezio/SEA              31        -52
Jacque Jones/MIN               29        -52
Geoff Blum/TBA                 31        -44
Jason Giambi/NYA               33        -42
AJ Pierzynski/SFN              27        -39
Juan Encarnacion/LAN-FLA       28        -36
Bret Boone/SEA                 35        -35
Luis Gonzalez/ARI              36        -34
Cliff Floyd/NYN                31        -33
Sammy Sosa/CHN                 35        -33
Doug Mientkiewicz/MIN-BOS      30        -32
Jay Payton/SDN                 31        -31
Edgar Renteria/SLN             28        -30
Timo Perez/CHA                 29        -30
Juan Uribe/CHA                 24        +32
Cesar Izturis/LAN              24        +33
Tony Womack/SLN                34        +36
Sean Casey/CIN                 29        +38
Aramis Ramirez/CHN             26        +38
Carlos Guillen/DET             28        +41
Ichiro!/SEA                    30        +44
JT Snow/SFN                    36        +48
Jack Wilson/PIT                26        +51
Terrence Long/SDN              28        +51
Adrian Beltre/LAN              25        +70
Brandon Inge/DET               27        +77

Inge and Beltre rank seventh and 10th in up spikes since 1946 by my numbers. I don't expect them to repeat their lofty performances in 2005. Beltre was one of the big free-agent signings this past offseason, and even if he retains half of his spike his AVG will fall from over .330 to under .300 (his career high before 2004 was .290). That probably isn't what the Mariners thought they were paying for when they signed him. Caveat Emptor.

Michael Rawdon is a programmer and happy Red Sox fan who lives in Silicon Valley, CA.
His cats have remained strangely indifferent to the Red Sox championship, possibly because they're from Wisconsin.
