Checking the Numbers: Whiffing the Pitcher, Part Two

Earlier this month we took a look at pitcher-on-pitcher violence, computing the rates at which hurlers fan their counterparts and seeing how the results compared to overall strikeout rates. Two stats were introduced: PW%, which divides pitcher whiffs by pitcher plate appearances, and PWO%, which divides pitcher whiffs by the overall strikeout tally. From 1979-2008, the league averages for those with 20 or more pitcher plate appearances against were 31.4 percent for PW% and 14.5 percent for PWO%. The hypothesis presented at the outset of that article was eventually confirmed: when compared to a control group, pitchers exceeding the league-average PWO% in one year experienced a more substantial decline in K/9 in the subsequent season. Pitchers remaining in the same league from one year to the next experienced an aggregate decline of 0.05 in their K/9, with those making the AL-to-NL conversion gaining 0.25 whiffs per nine, and senior circuit tossers heading to the tougher league dropping 0.31 whiffs per nine.

When the PWO% increased, the declines for pitchers remaining in the National League followed suit, to the tune of 0.20 to 0.40 points off of their strikeout rates. The results suggested that teams would do well to avoid pitchers posting PW% marks greater than 31.4 percent and PWO% rates in excess of 14.5 percent, as their strikeout success was contingent upon feasting on these, the feeblest of all hitters. This week, the goal shifts to investigating a few aspects left untouched, such as how BABIP factors into the mix, aggregate leaders and trailers over the past 30 years, and whether a regression equation can produce a lower root mean square error as a predictor of K/9 in Year X+1 than a few other commonly used methods.

BABIP and Pitcher Whiffs

The idea of researching how the batting average on balls in play relates to pitchers cannibalizing their brethren is the byproduct of a conversation with Matt Swartz, wherein we theorized that pitchers with low BABIPs would face pitchers with men on less often. In turn, this would decrease the likelihood of a sac-bunt attempt and increase the probability of a strikeout in the process. After all, pitchers fan pitchers about 31.4 percent of the time on average, with the bulk of the other plate appearances involving sacrifices. Remove the bunt attempts, and that league-average strikeout rate rises, lending plenty of credence to the theory. Essentially, the luck and peripherals would follow one another in this case, as lower BABIPs are often deemed fluky, yet the inherent luck would add to the strikeout chances.

Taking all pitchers with 20 or more Pitcher PAs in a season from 1979-2008, here are the BABIPs for the corresponding PW%, the rate of P-SO out of P-PA:


BABIP Against, by PW% Intervals

  PW%      BABIP
<= 0.20    .289
<= 0.30    .289
>= 0.31    .288
>= 0.40    .288
>= 0.50    .287

And how about if the data is reverse-engineered, instead searching for the average PW% based on different BABIP filters? Again, keep in mind that the league-average PW% is .314:


PW% by BABIP Intervals

  BABIP       PW%
<= 0.270    .317
<= 0.280    .317
<= 0.290    .317
>= 0.291    .315
>= 0.300    .312
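
Both tables above come down to filtering pitcher-seasons by a threshold on one column and averaging the other. A minimal sketch of that bookkeeping follows; the season rows are made up purely for illustration, and the function names are assumptions rather than anything from the original study.

```python
from statistics import mean

# Hypothetical pitcher-seasons, NOT real data:
# (pitcher PA faced, pitcher strikeouts, BABIP against)
seasons = [
    (45, 9, 0.301),
    (60, 21, 0.284),
    (18, 10, 0.250),  # fewer than 20 pitcher PA, so it is excluded
    (72, 38, 0.279),
    (33, 14, 0.295),
]

MIN_PITCHER_PA = 20  # the single-season cutoff used in the article


def qualified(rows):
    """Return (PW%, BABIP) pairs for seasons meeting the pitcher-PA cutoff."""
    return [(k / pa, babip) for pa, k, babip in rows if pa >= MIN_PITCHER_PA]


def mean_babip_where(rows, keep):
    """Average BABIP over qualified seasons whose PW% passes `keep`."""
    picked = [babip for pw, babip in qualified(rows) if keep(pw)]
    return mean(picked) if picked else None


def mean_pw_where(rows, keep):
    """Average PW% over qualified seasons whose BABIP passes `keep`."""
    picked = [pw for pw, babip in qualified(rows) if keep(babip)]
    return mean(picked) if picked else None


# One row of each table, using the same style of filter as above.
print(mean_babip_where(seasons, lambda pw: pw >= 0.40))
print(mean_pw_where(seasons, lambda babip: babip <= 0.290))
```

Passing the filter in as a function keeps the "<=" and ">=" rows of each table to a single helper.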

The data does technically bear out the idea that a greater propensity for fanning pitchers goes hand in hand with lower BABIPs; however, the discrepancies are minuscule even before any statistical scrutiny that might deem them insignificant. T-tests compare the means of two sets of data, but with five different groupings in the sample, an ANOVA (Analysis of Variance) will do its best Lenny Harris impression and pinch-hit admirably. Much like the AR(1) Intra-Class Correlation used so frequently in this space to measure stability over a span larger than two years, the ANOVA compares the means for more than two sets of numbers.

The analogy of the two tests is not perfect, but it works in the sense that one form provides a measure on a straight one-to-one level, while the other allows for more multi-tiered analysis. When running an ANOVA, the reported result of interest is the p-value, or level of significance, which we want to be 0.05 or below. In this specific run, the resulting p-value soared way above that threshold, indicating that the BABIP and PW% means presented above, which look no different from one another even at a naked glance, are not statistically different. The underlying theory continues to make intuitive sense, but it is not borne out by the more advanced test.
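
For readers curious about what the ANOVA is doing under the hood, here is a minimal sketch of the one-way F-statistic in plain Python. The BABIP groups are invented for illustration and are not the study's data. Turning F into a p-value requires the F distribution, which this sketch leaves out; a statistics package such as SciPy (`scipy.stats.f_oneway`) handles that step.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical BABIP samples for three PW% groupings, NOT real data.
low_pw = [0.280, 0.295, 0.291, 0.288]
mid_pw = [0.285, 0.292, 0.287, 0.290]
high_pw = [0.283, 0.294, 0.289, 0.286]

f_stat, df_b, df_w = one_way_anova_f([low_pw, mid_pw, high_pw])
print(f"F = {f_stat:.3f} on ({df_b}, {df_w}) degrees of freedom")
```

An F near zero, as group means this close together would produce, goes with a large p-value, which is the pattern described above.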

Overall Leaders and Trailers

Following the publication of the first part of this article, which featured leaders and trailers for single season PW and PWO marks, I received several requests for the same sort of data, applied over the entire 30-year span in the sample. Amongst pitchers who faced their counterparts at least 100 times overall from 1979-2008, continuing to satisfy the 20+ Pitcher PA single-season criteria, here are the leaders and trailers in PW%:


                PA vs    K vs                            PA vs   K vs
Pitcher          Ps       Ps    PW%    Pitcher            Ps      Ps     PW%
Randy Johnson    418     217   .519    Charlie Hough     118      15    .127
Bruce Chen       113      58   .513    Kent Bottenfield  153      24    .157
Sid Fernandez    470     240   .511    Allen Watson      100      16    .160
Mark Prior       181      87   .481    Scott Karl        133      23    .173
Wandy Rodriguez  142      68   .479    Kevin Ritz        181      32    .177

And for PWO%:


Pitcher           K   K vs P   PWO%   Pitcher             K   K vs P   PWO%
Dick Ruthven     519   117    .225    Charlie Hough      202    15    .074
Mike Harkey      161    36    .224    Doyle Alexander    253    20    .079
Dan Schatzeder   233    52    .223    Kent Bottenfield   285    24    .084
Chris Holt       251    56    .223    David Weathers     226    20    .088
Aaron Cook       260    57    .219    Allen Watson       172    16    .093

Randy Johnson struck out over half of the pitchers he faced in the years in which he logged at least 20 plate appearances against them, but whiffed so many position players that his PWO% registered a lower-than-average 9.7 percent. Relating back to the prior article, for fantasy owners or real teams looking to acquire a pitcher, the red flags should wave for those exceeding the league average in both metrics; if only one could be chosen, PWO% provides more utility, in that it identifies the pitchers padding their strikeout totals against their counterparts.
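
The two rates are simple enough to reproduce from the counts in the tables above. The sketch below does so for the three pitchers who appear in both trailer lists, and applies a red-flag check that is read here as "above the league average in both PW% (.314) and PWO% (.145)"; that reading, along with the function names, is an assumption for illustration.

```python
LEAGUE_PW = 0.314   # league-average PW%, 1979-2008
LEAGUE_PWO = 0.145  # league-average PWO%, 1979-2008


def pw_pct(k_vs_pitchers, pa_vs_pitchers):
    """Pitcher whiffs divided by pitcher plate appearances faced."""
    return k_vs_pitchers / pa_vs_pitchers


def pwo_pct(k_vs_pitchers, total_k):
    """Pitcher whiffs divided by overall strikeouts."""
    return k_vs_pitchers / total_k


def red_flag(pw, pwo):
    """Assumed rule: flag only when both rates exceed the league average."""
    return pw > LEAGUE_PW and pwo > LEAGUE_PWO


# (PA vs pitchers, K vs pitchers, total K), taken from the two tables above.
trailers = {
    "Charlie Hough": (118, 15, 202),
    "Kent Bottenfield": (153, 24, 285),
    "Allen Watson": (100, 16, 172),
}

for name, (pa, k_p, k_all) in trailers.items():
    pw, pwo = pw_pct(k_p, pa), pwo_pct(k_p, k_all)
    print(f"{name}: PW% {pw:.3f}, PWO% {pwo:.3f}, red flag: {red_flag(pw, pwo)}")
```

None of these three trips the flag, which is expected given that they sit at the bottom of both lists.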

When the pitchers and position players are separated, just how small a sample the pitchers comprise becomes much clearer, especially when placed under the microscope of an ICC. The overall K/9 produced a 0.76 level of stability, with whiffs of solely position players at 0.75. Suffice it to say that, at 0.40, pitcher whiffs are moderately stable but make up such a small share of the sample that only the extremes are really going to affect valuation methods. This does not mean that applying the PW and PWO analysis in the future becomes irrelevant, but rather that the results will be negligible outside of the outliers. Unfortunately, it does make the next step a bit less valid, even if the results themselves are interesting.

PW%, RMSE, and Regression

Knowing what we do with regard to the effects of pitcher whiffs on the overall strikeout rate, as well as the vast decline in the following season for those deviating from the mean, I decided to test out the predictive accuracy of three different measures. The goal involves finding the most accurate predictor of the strikeout rate in a subsequent season. The first method uses the straight, unadjusted K/9 in Year X to predict K/9 in Year X+1. The second method adjusts the Year X strikeout rate to incorporate the normal league-to-league drops or gains (as mentioned in the beginning, -0.05 for those remaining in the same league, 0.25 from AL to NL, and -0.31 from NL to AL) as well as a bit of aging. Lastly, the third method runs a linear regression on Year X+1 strikeout rate using the K/9 in Year X and PW% as independent variables.
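
The first two methods can be sketched in a few lines. The league shifts below are the ones quoted above; the aging component is left out because its size is not given, so this is only a partial stand-in for the second method, and the function names are illustrative.

```python
# K/9 shift by (league in Year X, league in Year X+1), as quoted in the article.
LEAGUE_SHIFT = {
    ("AL", "AL"): -0.05,
    ("NL", "NL"): -0.05,
    ("AL", "NL"): 0.25,
    ("NL", "AL"): -0.31,
}


def naive_projection(k9_year_x):
    """Method one: next year's K/9 is simply this year's K/9."""
    return k9_year_x


def league_adjusted_projection(k9_year_x, league_from, league_to):
    """Method two without the aging term, which the article does not quantify."""
    return k9_year_x + LEAGUE_SHIFT[(league_from, league_to)]


# Hypothetical pitcher with a 7.00 K/9 moving from the NL to the AL.
print(naive_projection(7.00))
print(round(league_adjusted_projection(7.00, "NL", "AL"), 2))
```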

The regression produced the formula: K/9 in YR X+1 = 1.413 + (.762*K/9 in YR X) + (-.121*PW%). As a cherry-picked example, Joaquin Andujar struck out 4.22 batters per nine innings in the 1981 season, with a lower-than-average PW% of .272. The regression predicted a 4.60 K/9 in 1982, which compared favorably to the actual 4.64 rate. His rate of whiffing pitchers was expected to rise in the following season, a facet of the projection that the straight comparison and age/standard adjustments ignored. Not everyone worked out as well as Andujar did here, but perhaps the predictive accuracy of this formula could best the other two methods in this area. As previously mentioned, the smaller sample of pitcher whiff data precludes this from being concrete or definitive in utility, and with a p-value right around 0.45, the PW% coefficient is not statistically significant. However, the root mean square error comparative results remain very interesting, and perhaps a foreshadowing of what will surface with a larger sample.
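
Applying the formula is plain arithmetic. The sketch below uses the rounded coefficients exactly as printed, so results may differ slightly from what the unrounded fit would give; the function name is an assumption for illustration.

```python
INTERCEPT = 1.413
K9_COEF = 0.762
PW_COEF = -0.121


def predict_next_k9(k9_year_x, pw_pct):
    """Year X+1 K/9 from Year X K/9 and PW%, per the printed coefficients."""
    return INTERCEPT + K9_COEF * k9_year_x + PW_COEF * pw_pct


# Joaquin Andujar's 1981 inputs from the article: 4.22 K/9 and a .272 PW%.
andujar_1982 = predict_next_k9(4.22, 0.272)
print(f"Projected 1982 K/9: {andujar_1982:.2f}")
```

Note how little the PW% term moves the projection: across the full range of PW% in the leaderboards (.127 to .519), it shifts the result by less than 0.05 strikeouts per nine.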

Root mean square error may sound intimidating, but it’s fairly easy to calculate and is one of the best statistical measures of predictive accuracy around, basically measuring the average discrepancy between actual data and predicted data. To calculate the RMSE, begin by computing the error, which is actual minus predicted. Square the results so any negatives become positive, find the mean, and take the square root. The three methods proposed above produced RMSEs of 1.13, 1.08, and 0.97, respectively, and since lower is better in this case, the regression won out. As the sample grows, the PW% term may reach statistical significance, transforming this two-part theory from anecdotal and interesting into a serious measure worthy of being incorporated into projections.
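
Those steps translate directly into code. Here is a minimal sketch; the K/9 values are hypothetical and serve only to show the mechanics.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: error, square, mean, square root."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two equal-length, non-empty sequences")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical Year X+1 strikeout rates and projections, NOT real data.
actual_k9 = [6.1, 7.4, 5.0, 8.2]
predicted_k9 = [5.8, 7.9, 5.5, 7.6]

print(f"RMSE: {rmse(actual_k9, predicted_k9):.2f}")
```

Running the same function over each method's projections and comparing the results is all the comparison above requires.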

Marrying the Parts

This entire study began as nothing more than a fun exercise in seeing which pitchers padded their stats by taking advantage of less capable hitters. Though the results and lack of statistical significance in certain areas may bring skepticism out of the woodwork, the underlying theory remains valid, and most of the findings support the initial hypothesis in one form or another, even if more data is required to feel confident and comfortable in reporting. Pitchers drastically deviating from the mean in terms of striking out opposing pitchers as a function of pitcher plate appearances and/or overall strikeouts should be expected to regress the following season, as the ICC indicated that pitcher whiffs owe more to randomness than to skill, especially when stacked up against the inherent stability of overall strikeout rates.

In the end, all of this may only amount to a useful fantasy tip, but even though the aforementioned measures lacked statistical significance, the original idea and theory make enough sense that pitcher whiffs deserve to be evaluated on some front, especially when making decisions about bringing in pitchers from the opposite league. American League teams should be very wary of senior circuit pitchers with consistently high PWO% numbers, and National League teams holding onto their pitchers should be aware that certain strikeout rates might not hold up from year to year if a higher than average percentage of pitchers comprise the overall total. Then again, all of this may be for naught, but the idea makes enough sense that it becomes worthy of pursuing to some extent moving forward.

Thank you for reading

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.

BurrRutledge

9/18

Eric, this is fascinating regardless of the results of the ANOVA. Great job!

As a suggestion, it would be very useful to this fantasy player to see how your findings compare to what PECOTA foresees for 2010, for all the pitchers who you'd red-flag based on the PW% and PWO% filter you've identified.

Thanks again!

EJSeidman

Burr, when it comes out next year, I'll definitely run through the 2009 data and point out guys who may experience regression due to the PW and PWO marks.

adenzeno

9/22

One of the records at which I have always marveled (and did at the time) is Nolan Ryan's 383 Ks in 1973. First year of the DH, so he did not have any pitchers to strike out. Of course he was able to stay in some games due to the DH (no pinch hitter), but STILL that is impressive!

Eric Seidman
