May 1, 2009
Checking the Numbers
On May 28, 2007, Freddy Garcia took the hill for the Phillies, squaring off against the Diamondbacks in a standard, run-of-the-mill game that would ultimately have no bearing on the standings. Nor would it boast any outstanding feats you'd have cause to recall. Quite simply, the meeting was the perfect example of a nondescript game that lives in the memories of a few avid baseball fans for one minuscule reason or another. Though Garcia proved to be a bust for the Phillies, he pitched very effectively in this particular outing, missing a flurry of bats. By missing bats, I am not referring to the shorthand for fanning a hitter, but to the literal definition: he induced a lot of swings and misses. A quick glance at the box score shows that Garcia recorded 18 swinging strikes against the Snakes, an impressive tally and one he had not reached in almost two years. Swinging strikes are rare in major league baseball, especially compared to the other events that can occur on a pitch. They usually signal that the pitcher has overpowered the hitter in some way, whether with a deceptive off-speed delivery or an ample supply of late movement.
How rare is the swinging strike? Consider that last season Juan Cruz posted the highest swinging-strike rate among pitchers with 50 or more frames logged, at 15.5 percent. The average swinging-strike rate was considerably lower, at 8.6 percent. In big-league action, then, it would be perfectly normal for some other outcome to occur on roughly nine of every ten pitches thrown. The highest swinging-strike rate of this decade belongs to Brad Lidge, who recorded whiffs on 24.2 percent of his pitches during the 2004 season, a mark still short of a quarter of the time. Since 2000, there have been just five instances of a pitcher exceeding 20 percent, all of them by either Lidge or Eric Gagne. Relievers dominate the leaderboard over this span. Raising the playing-time qualifier to 120 or more innings, however, places Francisco Liriano's 15.9 percent in 2006 atop all others. Perhaps unsurprisingly, names like Randy Johnson, Roger Clemens, and Johan Santana appear most frequently among the starters.
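Every rate quoted here boils down to one ratio: swinging strikes divided by total pitches. The minimal Python sketch below shows why an 8.6 percent average leaves other outcomes on roughly nine of every ten pitches. The function name is hypothetical, and the pitch totals are invented for illustration rather than taken from any real pitcher's line.

```python
def swinging_strike_rate(swinging_strikes: int, total_pitches: int) -> float:
    """Swinging strikes as a percentage of all pitches thrown."""
    if total_pitches <= 0:
        raise ValueError("total_pitches must be positive")
    return 100.0 * swinging_strikes / total_pitches


# Invented totals: 86 whiffs in 1,000 pitches matches the 8.6 percent average.
rate = swinging_strike_rate(86, 1000)
print(f"{rate:.1f}% whiffs, {100 - rate:.1f}% everything else")
# prints: 8.6% whiffs, 91.4% everything else
```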
Earlier this year, Matthew Carruth of FanGraphs penned a very interesting article that broke down swinging-strike rates by pitcher, pitch, and batter handedness using Pitch-f/x data. The research concluded that Ryan Madson's changeup, when delivered to same-handed hitters, produced the highest percentage of whiffs, at 36 percent. The Pitch-f/x data set is still only a toddler, which makes similar inquiries into past years impossible. Retrosheet, however, has tracked pitch sequences for quite a while, and its data makes at least some semblance of this kind of research possible for earlier seasons. With this idea in tow, I first calculated the overall pitch breakdowns for every pitcher from 2003-08, then partitioned the data by batter handedness. Here are the swinging-strike percentages for the four types of matchups over those six seasons:
Pitcher  Batter  Swinging Strike %
LHP      LHH     9.3
LHP      RHH     8.7
RHP      LHH     7.9
RHP      RHH     9.9
As expected, pitchers missed more bats against same-handed hitters. Against opposite-handed hitters, though, righties (7.9 percent) trailed lefties (8.7 percent) by almost a full percentage point.
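The matchup breakdown above comes from tallying Retrosheet pitch sequences. The rough sketch below performs that tally under several assumptions that do not come from the article:

- Each plate appearance is a hypothetical (pitcher hand, batter hand, pitch string) tuple, with switch-hitters already resolved to the side they actually batted from.
- Only the code "S" counts as a swinging strike, so missed bunts and swings on pitchouts are ignored.
- The set of characters treated as pitches reflects my reading of Retrosheet's event-file pitch codes and is worth checking against Retrosheet's own documentation.

```python
from collections import Counter

# Characters treated as pitches (an assumed reading of Retrosheet pitch codes).
# Anything else in a sequence is skipped: pickoff throws ("1", "2", "3"),
# "." / ">" / "*" / "+" markers, and "N" (no pitch).
PITCH_CODES = set("BCFHIKLMOPQRSTUXY")
SWINGING_STRIKE = "S"


def whiff_rates_by_matchup(plate_appearances):
    """Return {(pitcher_hand, batter_hand): swinging-strike %} from pitch strings.

    plate_appearances: iterable of (pitcher_hand, batter_hand, pitch_sequence),
    a hypothetical layout, e.g. ("R", "L", "CBFS").
    """
    pitches = Counter()
    whiffs = Counter()
    for pitcher_hand, batter_hand, sequence in plate_appearances:
        key = (pitcher_hand, batter_hand)
        for code in sequence:
            if code in PITCH_CODES:
                pitches[key] += 1
                if code == SWINGING_STRIKE:
                    whiffs[key] += 1
    # Only matchups with at least one counted pitch appear, so no zero division.
    return {key: 100.0 * whiffs[key] / pitches[key] for key in pitches}


sample = [("R", "R", "CBS"), ("R", "R", "BX"), ("L", "R", "B1FSB")]
print(whiff_rates_by_matchup(sample))
# {('R', 'R'): 20.0, ('L', 'R'): 25.0}
```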
Beyond the averages, who are this decade's leaders against hitters from each side of the plate? The tables below list the top swinging-strike rates among pitchers who faced at least 150 batters from a given side of the plate in a season this decade. Note the much higher percentages for northpaws against same-handed hitters, which makes sense given that this matchup posted the highest average rate. Also of interest: the top southpaw against right-handed hitters posted a lower rate than all six pitchers shown in the righties vs. left-handed hitters table, even though that matchup had the lowest average rate. This suggests that the righty-lefty matchup produced much more extreme results, with certain righties faring very well and others missing bats at a Jon Garland-like clip.
LHP vs. LHH
Pitcher             Year  Swinging Strike %
Randy Johnson       2005  15.4
Andy Pettitte       2005  14.8
Scott Olsen         2008  14.7
CC Sabathia         2007  14.6
Randy Johnson       2004  14.3
CC Sabathia         2008  14.3

LHP vs. RHH
Pitcher             Year  Swinging Strike %
Johan Santana       2002  16.5
Randy Johnson       2002  15.9
Billy Wagner        2005  15.8
Francisco Liriano   2006  15.8
Billy Wagner        2002  15.8
Randy Johnson       2001  15.4

RHP vs. LHH
Pitcher             Year  Swinging Strike %
Brad Lidge          2004  21.6
Eric Gagne          2003  20.4
Eric Gagne          2002  19.3
F. Rodriguez        2000  19.2
John Smoltz         2002  18.9
Eric Gagne          2004  18.6

RHP vs. RHH
Pitcher             Year  Swinging Strike %
Brad Lidge          2004  26.8
Eric Gagne          2003  23.8
Eric Gagne          2004  23.2
Ugueth Urbina       2001  20.5
Octavio Dotel       2002  20.4
Akinori Otsuka      2004  20.3
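The leaderboards amount to a filter and a sort. First drop pitcher-seasons below the 150-batter floor for a given matchup, then rank the rest by swinging-strike rate. The minimal sketch below assumes a hypothetical per-split record type; the field names are mine, and the sample rows are invented.

```python
from dataclasses import dataclass


@dataclass
class SplitSeason:
    """One pitcher-season against one side of the plate (hypothetical layout)."""
    pitcher: str
    year: int
    split: str  # e.g. "RHP-LHH"
    batters_faced: int
    pitches: int
    swinging_strikes: int

    @property
    def swstr_pct(self) -> float:
        return 100.0 * self.swinging_strikes / self.pitches


def split_leaders(rows, split, min_bf=150, top=6):
    """Top swinging-strike rates for one matchup among rows meeting the BF floor."""
    eligible = [r for r in rows if r.split == split and r.batters_faced >= min_bf]
    return sorted(eligible, key=lambda r: r.swstr_pct, reverse=True)[:top]


rows = [
    SplitSeason("Pitcher A", 2004, "RHP-RHH", 200, 1000, 150),  # 15.0%
    SplitSeason("Pitcher B", 2005, "RHP-RHH", 149, 600, 150),   # 25.0%, below floor
    SplitSeason("Pitcher C", 2006, "RHP-RHH", 300, 1200, 120),  # 10.0%
    SplitSeason("Pitcher D", 2006, "RHP-LHH", 250, 900, 180),   # other matchup
]
for r in split_leaders(rows, "RHP-RHH"):
    print(r.pitcher, r.year, f"{r.swstr_pct:.1f}")
```

Pitcher B has the best rate in the sample, but he falls one batter short of the floor, which is exactly the kind of small-sample fluke the qualifier exists to screen out.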
Although the data mining can be very enjoyable (yeah, I'm a nerd), the more important question is whether recording swinging strikes is an actual ability. In other words, are these rates consistent from year to year for each pitcher? An AR(1) intraclass correlation (ICC) is a valuable tool for measuring how consistent a metric is over several seasons. It works the same way as a year-to-year correlation but incorporates more than two years of data. For this study I queried all data from 2003-08, since it is important to avoid very large time frames with such tests. Comparing a pitcher's age-27 season to his age-37 season will naturally produce different numbers due to changes in style and approach, skewing the results. Before crunching the numbers, I hypothesized that the correlations would be close to zero, meaning that swinging-strike rates would be more random than a sustainable skill. What does the actual data say?
Split     Swinging Strike % ICC
Overall   0.60
LHP-LHH   0.55
LHP-RHH   0.61
RHP-LHH   0.57
RHP-RHH   0.65
These coefficients work like standard bivariate correlations in that values closer to 1.0 indicate a stronger relationship. All five relationships proved quite stable over the last six seasons, defying my hypothesis and suggesting that cajoling barren swings is more a skill than a random outcome. The overall swinging-strike rate also shared a moderate 0.44 correlation with total strike percentage, meaning all strikes, including balls in play, relative to all pitches thrown. Squaring that coefficient gives an r-squared of about 0.19, so roughly 20 percent of the variation in strike percentage can be chalked up to swinging-strike rates. It would make sense for these rates to go hand in hand with foul-ball rates, since neither involves solid contact, yet their 0.05 correlation indicates essentially no relationship. Strikeout and walk rates were also largely independent of swinging-strike rates, with low correlation coefficients of their own. Both of those rates are decidedly stable themselves, and both are direct components of the more accurate metrics that measure success through controllable skills. It therefore stands to reason that swinging strikes really have no advantage over fouls or called strikes.
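The article's AR(1) intraclass correlation is presumably estimated from a mixed model with a first-order autoregressive covariance structure, which is more than a short example can fit. As a simpler stand-in, here is the classic one-way random-effects ICC(1), computed from a balanced pitcher-by-season table. It is a different estimator, so its values need not match the tables above. It captures the same idea, though: how much of the variation in a rate lies between pitchers rather than within a single pitcher across seasons. The function name and sample rates are hypothetical.

```python
def icc1(table):
    """One-way random-effects ICC(1) for a balanced pitcher-by-season table.

    table: one row per pitcher, each row holding the same number (k) of
    seasonal rates. Returns (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and
    MSW are the between-pitcher and within-pitcher mean squares.
    """
    n = len(table)
    k = len(table[0]) if table else 0
    if n < 2 or k < 2 or any(len(row) != k for row in table):
        raise ValueError("need a balanced table with at least 2 pitchers and 2 seasons")
    grand_mean = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(table, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all rates are identical; ICC is undefined")
    return (ms_between - ms_within) / denominator


# Invented rates: each pitcher repeats his own level exactly, so ICC(1) = 1.
print(icc1([[8.0, 8.0, 8.0], [10.0, 10.0, 10.0], [12.0, 12.0, 12.0]]))
# 1.0
```

Adding season-to-season noise within each row pulls the value below 1.0. Rates that are pure noise around a common mean push it toward zero or slightly below.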
At this juncture, it dawned on me that certain pitchers might post lower swinging-strike rates relative to all pitches thrown yet be much more apt to record whiffs relative only to non-BIP strikes: the swinging strikes themselves, called strikes, and foul balls. If balls and balls in play are removed from the number of pitches thrown, do these correlations change at all?
Split     SS%/K ICC
Overall   0.58
LHP-LHH   0.53
LHP-RHH   0.58
RHP-LHH   0.54
RHP-RHH   0.63
SS%/K = Swinging-strike rate relative to non-BIP strikes
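The rate relative to non-BIP strikes changes only the denominator: swinging strikes divided by the sum of swinging strikes, called strikes, and foul balls. In the sketch below, the two pitchers and their pitch totals are invented. They share an identical 9.0 percent overall rate but separate once balls and balls in play are set aside. Whether foul tips count as fouls or swinging strikes depends on the data source and is left as an assumption here.

```python
def swstr_rate(swinging: int, total_pitches: int) -> float:
    """Swinging strikes as a percentage of all pitches."""
    return 100.0 * swinging / total_pitches


def swstr_per_nonbip_strike(swinging: int, called: int, fouls: int) -> float:
    """Swinging strikes as a percentage of non-BIP strikes (swinging + called + foul)."""
    non_bip_strikes = swinging + called + fouls
    if non_bip_strikes == 0:
        raise ValueError("no non-BIP strikes")
    return 100.0 * swinging / non_bip_strikes


# Hypothetical totals: (total pitches, swinging strikes, called strikes, fouls)
pitchers = {"Pitcher A": (1000, 90, 180, 180), "Pitcher B": (1000, 90, 130, 140)}
for name, (total, swinging, called, fouls) in pitchers.items():
    print(f"{name}: SwStr% {swstr_rate(swinging, total):.1f}, "
          f"SS%/K {swstr_per_nonbip_strike(swinging, called, fouls):.1f}")
# Pitcher A: SwStr% 9.0, SS%/K 20.0
# Pitcher B: SwStr% 9.0, SS%/K 25.0
```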
Only a minimal drop-off is observed when comparing the overall swinging-strike rates to the rates relative to non-BIP strikes. Interestingly, the two rates themselves share virtually no relationship, with an r of -0.032. In fact, the only significant relationship I found involving the rate per strike was with foul balls per strike, at -0.38. As the distribution of non-BIP strikes tilted in favor of swings and misses, pitchers recorded fewer foul balls. Thinking of pitches in terms of non-BIP strikes paves the way for some other intriguing ideas, such as quantifying the term "effectively wild." Such pitchers are notorious for being all over the place with their location and throwing higher percentages of balls, yet they possess "stuff" capable of getting the job done.
Matt Clement instantly springs to mind as an example of such a pitcher. His control could not be predicted, and yet for a while the wildness worked to his advantage. This is merely a cursory attempt to quantify the scouting term. My reasoning was that effectively wild pitchers would throw strikes, including balls in play, less than half of the time, but would still post high swinging-strike marks. I queried for pitchers with at least 120 innings in a season and a strike percentage below 50 percent, then sorted them by swinging-strike rate relative to non-BIP strikes. That produced the following list:
Pitcher           Year  Strike %  SS%/K
Victor Zambrano   2004  47.8      23.1
Daniel Cabrera    2006  49.1      22.8
Jason Jennings    2003  48.8      21.7
Victor Zambrano   2003  47.2      21.5
Zach Day          2003  48.4      21.2
Jason Jennings    2004  48.4      21.1
Victor Zambrano   2005  49.6      20.3
Jason Jennings    2005  47.2      19.5
Damian Moss       2003  45.6      19.4
Barry Zito        2006  49.2      18.8
SS%/K = Swinging-strike rate relative to non-BIP strikes
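The query behind this list is another filter-and-sort, this time on innings and overall strike percentage. The minimal sketch below assumes hypothetical season records; the field names are mine and the sample rows are invented. It also assumes innings have already been converted to true decimals, since box-score notation like 120.1 means 120 and one-third innings.

```python
from typing import NamedTuple


class PitcherSeason(NamedTuple):
    """Hypothetical season record; rates are percentages."""
    pitcher: str
    year: int
    innings: float      # true decimal innings (box-score 120.1 = 120.333...)
    strike_pct: float   # all strikes, including balls in play, per pitch thrown
    ss_per_k: float     # swinging strikes per non-BIP strike


def effectively_wild(seasons, min_innings=120.0, max_strike_pct=50.0, top=10):
    """Seasons meeting the innings floor with a sub-50% strike rate, ranked by SS%/K."""
    pool = [s for s in seasons
            if s.innings >= min_innings and s.strike_pct < max_strike_pct]
    return sorted(pool, key=lambda s: s.ss_per_k, reverse=True)[:top]


seasons = [
    PitcherSeason("Pitcher A", 2004, 130.0, 48.0, 22.0),
    PitcherSeason("Pitcher B", 2006, 110.0, 45.0, 30.0),  # too few innings
    PitcherSeason("Pitcher C", 2005, 150.0, 52.0, 25.0),  # throws too many strikes
    PitcherSeason("Pitcher D", 2003, 121.0, 49.0, 19.0),
]
for s in effectively_wild(seasons):
    print(s.pitcher, s.year, s.strike_pct, s.ss_per_k)
```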
Swinging-strike rates are a very stable metric from year to year for individual pitchers. However, they lack a noteworthy relationship with any other performance-based metric that makes logical sense, and that makes them much less meaningful. (Swinging-strike rates have no relationship with shutouts, no matter what any correlation coefficient argues.) What would be of interest is the underlying root of these fruitless swings: the patterns in pitch repertoire and sequencing that lead to the whiffs. This sort of research will not be meaningful until the Pitch-f/x data set is expanded. Once it is, the work may help us determine solid ways to attack a hitter, as well as potential out-pitches not currently being used in the correct fashion.
For example, it would be very valuable to know whether Ryan Madson's changeup to righties remains stable in the swing-and-miss department over a predetermined time span, or whether the pitch loses its effectiveness. Swinging and missing at the major league level is very rare, especially on fastballs. That reflects the level of talent: a hitter who whiffs too often had better exhibit tremendous ability in some other facet of the game, or he will not last very long in the big leagues. Because of this rarity, pitchers who miss bats with the greatest of ease are incredibly appealing assets. Still, the lack of a relationship between this feat and other performance-based metrics suggests that they hold no true advantage over those who struggle to elude the whooping sticks.