One reason baseball lends itself so easily to statistics is that
most of its outcomes are very clean and usually binary. A batter either reaches
base or he doesn’t; a ball is a hit or not; even if it is a hit, there are only four
possible degrees of hit. Even more important, the data are, with occasional
exceptions, perfectly gathered: an official scorer sits at each game,
carefully noting the outcome of each event in the commonly accepted manner. This
runs counter to the application of statistics in the real world, where data can be
not only incorrect but also outstandingly complex.

With play-by-play data, however, the events we’ve taken for granted as simple
and binary can suddenly become more complex and, if the data are properly applied, a
more accurate reflection of the action on the field. Not entirely unlike J.J. Thomson
and others discovering an internal structure to the atom (but without the massive
scientific and physical impacts on our understanding of the universe), breaking
large, binary blocks of baseball stats into smaller, more descriptive pieces can
yield more information. While this method has largely been applied to more
advanced defensive metrics such as UZR, it can also be applied to events such as
singles, doubles, triples, and, in particular, strikeouts.

On the surface, the strikeout seems to be a very clean statistic, much like walks
and home runs, its cousins in the Three True Outcomes tree of knowledge and
certainty. Yet each of those three stats can be broken up into
smaller pieces: Walks can be intentional, unintentional, or semi-intentional; home
runs can be inside-the-park, opposite field, towering, or line drive; and
strikeouts can be swinging, looking, bunting, or even dropped. In the box score,
they all look the same: a line-drive home run counts the same as a blast of Ruthian
proportions. But being able to break strikeouts into separate categories may yield
additional insight into player approaches, both on the mound and at the plate, as
well as predictive value about players who may be under- or over-performing
reasonable expectations.

With that in mind, let’s take a closer look at every pitcher’s
friend–the whiff. In 2004, 73.1% of strikeouts were swinging (either
complete whiffs or foul tips), 26.3% were looking, and the remaining 0.6% were
either missed bunt attempts or foul bunts. These data give us a handy baseline for
seeing who’s above and below average when it comes to types of strikeouts. Let’s
check out the leaders and trailers in 2004 (minimum 50 strikeouts) to get an idea of
what kind of pitchers inhabit both ends of the spectrum:

Pitcher         Year  Swinging  Looking  Total  Swing_Perc  SO/PA
-------         ----  --------  -------  -----  ----------  -----
Dave Burba      2004     47        2      50      95.9%     15.3%
Mike Wood       2004     50        4      54      92.6%     12.5%
Esteban Yan     2004     63        6      69      91.3%     18.2%
Guillermo Mota  2004     75        8      85      90.4%     21.6%
Brad Lidge      2004    140       16     157      89.7%     42.5%
Luis Vizcaino   2004     56        7      63      88.9%     21.1%
Salomon Torres  2004     54        7      62      88.5%     16.3%
Danny Baez      2004     46        6      52      88.5%     17.6%
Jon Lieber      2004     89       12     102      88.1%     13.6%
Brad Radke      2004    125       18     143      87.4%     15.9%
Ismael Valdez   2004     39       28      67      58.2%      8.9%
Carlos Silva    2004     44       32      76      57.9%      8.7%
Darrell May     2004     69       51     120      57.5%     14.4%
Woody Williams  2004     73       55     131      57.0%     16.0%
Scot Shields    2004     62       47     109      56.9%     24.0%
Jeff Weaver     2004     84       67     153      55.6%     16.4%
Esteban Loaiza  2004     63       54     117      53.8%     14.3%
Chad Cordero    2004     44       39      83      53.0%     23.2%
Dave Weathers   2004     31       30      61      50.8%     17.1%
Jaret Wright    2004     70       87     159      44.6%     20.4%
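
For readers who want to check the Swing_Perc column, here is a minimal Python sketch.
It assumes Swing_Perc is swinging strikeouts divided by swinging plus looking
strikeouts, with bunt strikeouts left out of the denominator; the article does not
state the formula, but that reading matches the listed values for the rows used here.
The function name swing_pct is made up for illustration.

```python
# Sketch: recompute Swing_Perc from the swinging and looking counts.
# Assumption (not stated in the article): bunt strikeouts are excluded,
# so Swing_Perc = swinging / (swinging + looking).

def swing_pct(swinging: int, looking: int) -> float:
    """Share of non-bunt strikeouts that came on a swing, as a percentage."""
    decided = swinging + looking
    if decided == 0:
        raise ValueError("no swinging or looking strikeouts to divide by")
    return 100.0 * swinging / decided

# (swinging, looking) counts copied from three rows of the table above.
rows = {
    "Mike Wood": (50, 4),
    "Brad Lidge": (140, 16),
    "Jaret Wright": (70, 87),
}

for name, (swinging, looking) in rows.items():
    print(f"{name:<14} {swing_pct(swinging, looking):5.1f}%")
```

Rounded to one decimal, these three rows come out to 92.6%, 89.7%, and 44.6%, the
same figures shown in the table.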

The top group–those who cause the most swings and misses–looks mostly like a
pretty hard-throwing, walk-stingy group with a couple of exceptions. The bottom group
is a slightly different brand of pitcher–not as many closers and not as many players
with a reputation for missing bats. Interestingly for Yankee fans, the top group
includes discarded rotation member Jon Lieber; newly acquired Jaret Wright trails
the bottom group by a wide margin. These two players frame the next natural question
stemming from breaking strikeouts into sub-categories: By looking at one type of
strikeout or another, could the Yanks have seen Wright’s disappointing (and brief)
performance coming?

To check it out, let’s first see how consistent something like the percentage of
strikeouts swinging (S% for short) is from year to year. Unfortunately, reliable
play-by-play data doesn’t always include accurate pitch-by-pitch information, so
we’ll only have 2003-2005 data to use. Obviously, 2005 is far too young to use
when determining the consistency of a stat from year to year, so we’ll have to
settle for two consecutive seasons of data as a first pass. As more accurate data
from earlier seasons becomes available, we’ll be able to add more confidence to these
findings, but with the limited data on hand, the r-squared of S% from 2003 to 2004 is
.3022.
That’s not entirely insignificant, falling just below stats like BB/9 and OBP in
terms of statistical consistency.
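
As a rough illustration of what that consistency check involves, the sketch below
computes the r-squared between two seasons of S% for the same pitchers. The numbers
in it are hypothetical placeholders, not real pitcher data, and the function names
are made up; it shows the calculation only and will not reproduce the .3022 figure.

```python
# Sketch: year-to-year consistency of S% measured as r-squared.
# The sample values below are HYPOTHETICAL, not real pitcher data.
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

def r_squared(xs, ys):
    """Share of the variance in ys explained by a straight-line fit on xs."""
    return pearson_r(xs, ys) ** 2

# One entry per pitcher, in the same order in both lists (hypothetical values).
s_pct_2003 = [0.88, 0.74, 0.61, 0.79, 0.55, 0.70]
s_pct_2004 = [0.81, 0.77, 0.66, 0.70, 0.62, 0.75]

print(f"r-squared, 2003 to 2004: {r_squared(s_pct_2003, s_pct_2004):.4f}")
```

With real data, each list would hold one S% value per pitcher who met the playing-time
minimum in both seasons.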

Given that S% is somewhat consistent from year to year, perhaps it could help us
predict an imminent change in K/PA. It’s certainly possible that some
pitchers appear to keep up their K/PA rate–a critical stat for predicting pitcher
success–with a few more favorable umpire calls on third strikes rather than
missing bats. To test that, we can run a multivariable regression of each player’s
K/PA in 2004 on his K/PA and S% in 2003.
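
The regression described above can be sketched in a few lines of Python. This is one
possible reading, not necessarily the method used for this article: it measures each
predictor's contribution as incremental r-squared (the r-squared from 2003 K/PA alone,
then the extra r-squared gained by adding 2003 S%). The data are hypothetical and all
function names are made up for illustration.

```python
# Sketch: K/PA in 2004 regressed on 2003 K/PA and 2003 S%.
# ASSUMPTION: contributions are read as incremental r-squared; the article
# does not say how it split the explained variation. Data are HYPOTHETICAL.
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def two_predictor_r2(x1, x2, y):
    """R-squared of a least-squares fit of y on x1 and x2 (with intercept),
    computed from the three pairwise correlations."""
    r1 = pearson_r(x1, y)
    r2 = pearson_r(x2, y)
    r12 = pearson_r(x1, x2)
    if 1 - r12 * r12 < 1e-12:
        raise ValueError("predictors are perfectly collinear")
    return (r1 * r1 + r2 * r2 - 2 * r1 * r2 * r12) / (1 - r12 * r12)

def incremental_r2(x1, x2, y):
    """Return (r-squared from x1 alone, extra r-squared from adding x2)."""
    base = pearson_r(x1, y) ** 2
    full = two_predictor_r2(x1, x2, y)
    return base, full - base

# One entry per pitcher, same order in every list (hypothetical values).
k_pa_2003 = [0.24, 0.15, 0.19, 0.12, 0.21, 0.17]
s_pct_2003 = [0.88, 0.74, 0.61, 0.79, 0.55, 0.70]
k_pa_2004 = [0.22, 0.16, 0.18, 0.13, 0.19, 0.18]

from_k_pa, added_by_s_pct = incremental_r2(k_pa_2003, s_pct_2003, k_pa_2004)
print(f"explained by 2003 K/PA alone: {from_k_pa:.2%}")
print(f"added by 2003 S%:             {added_by_s_pct:.2%}")
```

Under this reading, a small second number means S% tells us little about next year's
K/PA that last year's K/PA did not already say.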

Unfortunately, the previous year’s K/PA dominates S% in the regression analysis,
accounting for 59.67% of the variation while S% manages only a meager 2.49%. It’s
not quite Royals-Yankees or Koror-Ulong, but it’s close. Because the previous
year’s K/PA rate so thoroughly dominates the prediction, S% doesn’t
yield any significant predictive value when looking for an edge in predicting
pitcher breakout or decline in terms of K/PA. It’s certainly possible that with
more years of data available, a more discernible trend could be found, but with
regard to predicting K/PA changes the following season, a whiff is a whiff no
matter how you get it.