June 1, 2005
Lies, Damned Lies
Strikeouts and Hitter Projections
Here's a secret: strikeouts are a good thing for a young power hitter.
That quote is mine, from a chat session about four weeks ago. It's pretty tempting to come up with punchy, ill-considered one-liners like that in the midst of a chat, and indeed the comment triggered plenty of e-mails, as well as a persuasive counter-argument by Rich Lederer. So let me try to explain myself a little more thoroughly.
My comment was based on something I discovered when revising PECOTA a couple of years ago. Specifically, I found that when everything else is held equal, higher strikeout rates have a somewhat positive predictive effect on power output. For example, the regression equations I use for PECOTA suggest that--all else being equal--45 extra strikeouts in the previous year are "worth" about one additional home run in the upcoming year. The effect is not enormous, but it's there, and it's one reason why folks like Adam Dunn and Hee Seop Choi and Wily Mo Pena tend to get such favorable PECOTA projections.
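To put the "45 strikeouts for one home run" figure in per-strikeout terms, here is a minimal sketch. It assumes a simple linear partial effect. The only number taken from the text is the 45-to-1 ratio. The constant `HR_PER_EXTRA_K` and the function `extra_projected_hr` are hypothetical names for illustration and are not PECOTA's actual equations.

```python
# Hypothetical linear sketch. The article reports only that ~45 extra
# strikeouts in one season are "worth" about one extra home run the next,
# all else held equal. Everything else here is illustrative.
HR_PER_EXTRA_K = 1 / 45  # implied partial effect, roughly 0.022 HR per strikeout


def extra_projected_hr(extra_strikeouts: float) -> float:
    """Marginal change in projected HR from extra prior-year strikeouts,
    holding every other input to the projection fixed."""
    return HR_PER_EXTRA_K * extra_strikeouts


for k in (45, 90, -45):
    print(f"{k:+d} K -> {extra_projected_hr(k):+.2f} HR")
```

At about 0.02 home runs per strikeout, the effect only becomes noticeable for hitters whose strikeout totals differ by dozens. This is consistent with "not enormous, but it's there."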
Strikeout rates also have an inverse predictive effect on base hits, and consequently on batting average (to be a bit more specific, a player who strikes out more can be expected to have somewhat fewer singles; there is no discernible impact on his predicted rate of doubles or triples). This is to be expected. While hitters' batting averages on balls in play do not exhibit the same strong regression to the mean that pitchers' BABIPs do, hitter BABIP is at least somewhat a matter of luck, and you can't get lucky if you don't put the ball in play.
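The "you can't get lucky if you don't put the ball in play" point can be shown with plain arithmetic. The sketch below uses the standard BABIP definition, (H - HR) / (AB - K - HR), and ignores sacrifice flies. It holds at-bats, home runs and BABIP fixed so that only strikeouts vary. The hitter's numbers are hypothetical and are not drawn from any real player or projection.

```python
def batting_average(ab: int, hr: int, k: int, babip: float) -> float:
    """Batting average implied by a fixed BABIP (sacrifice flies ignored).

    Balls in play = AB - K - HR, so hits = HR + BABIP * (AB - K - HR).
    """
    balls_in_play = ab - k - hr
    hits = hr + babip * balls_in_play
    return hits / ab


# Hypothetical hitter: same at-bats, homers and BABIP; only strikeouts differ.
for k in (100, 150):
    print(k, round(batting_average(ab=550, hr=30, k=k, babip=0.300), 3))
```

With these inputs, 50 extra strikeouts remove 50 balls in play. At a .300 BABIP that costs 15 hits, so batting average falls from about .284 to about .256.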
Finally, all else being equal, strikeouts have a positive predictive effect on walk rate, as both strikeouts and walks tend to result from going deep into the count.
Further complicating things is that these general principles can also have some quirky effects depending on the particular characteristics of the player. For example, PECOTA predicted Albert Pujols, a hitter who does not strike out very much, to have a batting line this year of .334/.419/.633. If we go back into the system and double Pujols' previous strikeout rates, PECOTA instead comes up with a projection of .325/.411/.633. Pujols' expected BA declines by nine points. He makes up for some of this with a nine-point increase in his expected isolated power and a slight increase in his "isolated walk rate," but the overall effect on his value is negative. PECOTA is happy, in other words, that Pujols does not strike out very much.
On the other hand, if we perform a similar exercise for Adam Dunn, we get the opposite result. Dunn, with his actual strikeout rate, had a PECOTA projection of .270/.395/.562. If we cut Dunn's historic strikeout rate in half, PECOTA instead produces a projection of .276/.390/.539. The slight increase in batting average is not worth the significant declines in walk rate and isolated power, and Dunn's overall projection is notably worse. PECOTA is happy that Dunn strikes out often.
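The Pujols and Dunn comparisons can be checked by splitting each projected slash line into batting average, isolated power (SLG minus BA) and "isolated walk rate" (OBP minus BA). This sketch uses only the four projections quoted above and stores them in points (thousandths) so that the arithmetic is exact. The `SlashLine` type and `compare` helper are hypothetical names for illustration.

```python
from typing import NamedTuple


class SlashLine(NamedTuple):
    """A projected batting line in points (thousandths): .334 -> 334."""
    ba: int
    obp: int
    slg: int

    @property
    def iso(self) -> int:
        # Isolated power: slugging percentage minus batting average.
        return self.slg - self.ba

    @property
    def iso_walk(self) -> int:
        # "Isolated walk rate": on-base percentage minus batting average.
        return self.obp - self.ba


def compare(name: str, actual: SlashLine, altered: SlashLine) -> dict:
    """Point changes in BA, ISO and isolated walk rate between two projections."""
    delta = {
        "BA": altered.ba - actual.ba,
        "ISO": altered.iso - actual.iso,
        "isoBB": altered.iso_walk - actual.iso_walk,
    }
    print(name, delta)
    return delta


pujols = compare("Pujols, K rate doubled", SlashLine(334, 419, 633), SlashLine(325, 411, 633))
dunn = compare("Dunn, K rate halved", SlashLine(270, 395, 562), SlashLine(276, 390, 539))
```

For Pujols, doubling the strikeout rate changes BA by -9, ISO by +9 and isolated walk rate by +1. For Dunn, halving it changes BA by +6, ISO by -29 and isolated walk rate by -11. Dunn's small batting-average gain is swamped by his losses in power and walks.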
This stuff is complex. PECOTA is not just using the regression equations I've discussed above, but also a complicated system of adjustments based on comparable players. In fact, the whole motivation for PECOTA is to identify certain player typologies, and to understand how these player typologies progress over time. A player with the Pujols typology will develop better with a lower strikeout rate, but a player with the Dunn typology would prefer a higher strikeout rate.
What in the hell is going on here?
What I think is going on--bear with me here--is that all great hitters can be categorized more or less into one of two typologies:
Early-Count Hitters: These hitters have extremely quick bats, excellent plate coverage, and will not take many pitches, especially for strikes. They tend to have very high batting averages, moderate-to-strong isolated power, moderate walk rates and low strikeout rates. They also tend to be reasonably good athletes, often playing premium defensive positions. Examples include Vladimir Guerrero, Joe DiMaggio, Derek Jeter and George Brett.
Late-Count Hitters: These hitters have outstanding pitch-recognition skills. Rather than force the issue, they wait for the pitcher to make a mistake with the pitch type or location they find most favorable. These hitters hit for moderate batting averages, strong or very strong isolated power, high walk rates and high strikeout rates. They tend to be big, bulky and slow. Examples include Jim Thome, Mark McGwire and Reggie Jackson.
Typology             BA         ISO            BB Rate    K Rate   Speed
Early-Count Hitters  Very High  Moderate-High  Moderate   Low      Moderate-High
Late-Count Hitters   Moderate   Very High      High       High     Low-Moderate

My hypothesis is that the closer a hitter is to one of these idealized typologies, the better he is likely to do. Vladimir Guerrero, for example, doesn't make any "sense" with a high strikeout rate, since his hitting approach involves swinging at pitches that lesser hitters wouldn't dream of attacking. If Guerrero weren't phenomenally good at actually making contact with these pitches, his approach would not work nearly as well. Conversely, a hitter like Jim Thome wouldn't make any sense with a lower strikeout rate. Because Thome does have some holes in his swing, he needs to work the pitcher and the count until he gets a pitch to his liking, which he will then hit very, very far. High strikeout rates and high walk rates are a necessary consequence of this approach, since he will go deep into so many counts. (I think it's important to note that under this theory, a hitter like Thome isn't "choosing" to take a lot of walks. The walks, rather, are a beneficial side effect of the way in which he finds it most natural to go about getting base hits and home runs.)
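The idea that a hitter is "closer to" one idealized typology than another can be made concrete with a nearest-prototype comparison. The sketch below is purely an illustration and not how PECOTA works. It maps the qualitative labels from the typology table onto a made-up numeric scale (`LEVEL`), treats the two table rows as prototypes, and measures Euclidean distance. The scale, the distance measure and the sample profiles are all assumptions, not player data.

```python
import math

# Hypothetical numeric scale for the qualitative labels in the typology table.
LEVEL = {
    "Low": 1.0,
    "Low-Moderate": 1.5,
    "Moderate": 2.0,
    "Moderate-High": 2.5,
    "High": 3.0,
    "Very High": 3.5,
}

# Idealized prototypes transcribed from the table.
# Attribute order: BA, ISO, BB Rate, K Rate, Speed.
PROTOTYPES = {
    "Early-Count": ("Very High", "Moderate-High", "Moderate", "Low", "Moderate-High"),
    "Late-Count": ("Moderate", "Very High", "High", "High", "Low-Moderate"),
}


def distance(profile, prototype) -> float:
    """Euclidean distance between two label tuples on the LEVEL scale."""
    return math.sqrt(sum((LEVEL[a] - LEVEL[b]) ** 2 for a, b in zip(profile, prototype)))


def nearest_typology(profile):
    """Return (typology name, distance) for the closest idealized prototype."""
    return min(
        ((name, distance(profile, proto)) for name, proto in PROTOTYPES.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical profiles, not real player data.
contact_hitter = ("Very High", "Moderate-High", "Moderate", "Low", "Moderate")
slugger = ("Moderate", "Very High", "High", "High", "Low")
print(nearest_typology(contact_hitter))  # ('Early-Count', 0.5)
print(nearest_typology(slugger))         # ('Late-Count', 0.5)
```

Under this hypothesis, a smaller distance to either prototype would go with better expected development. A hitter who sits between the two prototypes would be the awkward case.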
We might think of Guerrero as a "generalist", who hits lots of pitches pretty well, and Thome as a "specialist", who hits certain pitches extremely well. Of course, there are a very few hitters, like Barry Bonds and perhaps Pujols, who hit lots of pitches extremely well. Most everyone else has to settle.
I say that high strikeout rates may be a favorable sign for certain types of young hitters because strikeout rates are an indicator of "count-working" ability. If we had more detailed data on things like called versus swinging strikes, and a hitter's performance on different counts, then we would not need to look at strikeout rates, and it seems unlikely that they would show up as a positive developmental sign in any way, shape, or form. But because we do not have this information on a wide-scale basis, we must use strikeout rate and walk rate as proxies.
One of the perverse consequences of this is that a specialist hitter, who may not hit certain types of pitches or pitches in certain parts of the strike zone especially well (e.g., a hitter "with some holes in his swing") will appear to benefit more from a higher strikeout rate than a generalist type of hitter. Hee Seop Choi must work the count and wait for his pitch because he can't hit certain pitches very well. That he's striking out a lot is an indicator that he is in fact going deep into many counts, which is helpful for him on balance, even though the strikeouts themselves are not favorable outcomes.
One thing it would be fascinating to study is how sensitive these different hitter typologies are to different pitcher typologies. My guess is that a hitter like Guerrero will do relatively better against "good" pitchers, but relatively less well against "bad" pitchers, than a hitter like Thome, since Thome excels at hitting mistakes, and Guerrero excels at hitting non-mistakes.
In any event, this is just the tip of the iceberg, and I hope that it inspires further research, either on my part or someone else's.