September 18, 2009
Checking the Numbers
Whiffing the Pitcher, Part Two
Earlier this month we took a look at pitcher-on-pitcher violence, computing the rates at which hurlers fan their counterparts and seeing how the results compared to overall strikeout rates. Two stats were introduced: PW%, which divides pitcher whiffs by pitcher plate appearances, and PWO%, which expresses pitcher whiffs as a share of a hurler's overall strikeout tally. From 1979-2008, the league averages among those with 20 or more pitcher plate appearances against were 31.4 percent for PW% and 14.5 percent for PWO%. The hypothesis presented at the outset of that article was eventually confirmed: compared to a control group, pitchers exceeding the league-average PWO% in one year experienced a more substantial decline in K/9 in the subsequent season. Pitchers remaining in the same league from one year to the next experienced an aggregate decline of 0.05 in K/9, while those making the AL-to-NL conversion gained 0.25 whiffs per nine, and senior circuit tossers heading to the tougher league dropped 0.31 whiffs per nine.
When the PWO% increased, the declines for pitchers remaining in the National League followed suit to the tune of 0.20 to 0.40 points off of their strikeout rates. The results suggested that teams would do well to avoid pitchers posting PW% marks greater than 31.4 percent and PWO% rates in excess of 14.5 percent, as their strikeout success was contingent upon feasting on these, the feeblest of all hitters. This week, the goal shifts to investigating a few aspects left untouched: how BABIP factors into the mix, the aggregate leaders and trailers over the past 30 years, and whether a regression equation can produce a lower root mean square error as a predictor of K/9 in Year X+1 than a few other commonly used methods.
The idea of researching how the batting average on balls in play relates to pitchers cannibalizing their brethren is the byproduct of a conversation with Matt Swartz, wherein we theorized that pitchers with low BABIPs would face opposing pitchers with men on base less often. In turn, this would decrease the likelihood of a sac-bunt attempt and increase the probability of a strikeout in the process. After all, pitchers fan pitchers about 31.4 percent of the time on average, with the bulk of the other plate appearances involving sacrifices. Remove the bunt attempts, and that league-average strikeout rate rises, lending plenty of credence to the theory. Essentially, the luck and peripherals would follow one another in this case: lower BABIPs are often deemed fluky, yet that same luck would add to the strikeout chances.
BABIP Against, by PW% Intervals

PW%        BABIP
<= 0.20    .289
<= 0.30    .289
>= 0.31    .288
>= 0.40    .288
>= 0.50    .287
And how about if the data is reverse-engineered, instead searching for the average PW% based on different BABIP filters? Again, keep in mind that the league-average PW% is .314:
PW% by BABIP Intervals

BABIP       PW%
<= 0.270    .317
<= 0.280    .317
<= 0.290    .317
>= 0.291    .315
>= 0.300    .312
The data does technically bear out the idea that a greater propensity for fanning pitchers goes hand in hand with lower BABIPs; however, the discrepancies are minuscule even before any statistical scrutiny that might deem the differences insignificant. A t-test compares the means of two sets of data, but with five different groupings in the sample, an ANOVA (Analysis of Variance) will do its best Lenny Harris impression and pinch-hit admirably. Much like the AR(1) Intra-Class Correlation used so frequently in this space to measure stability over a span larger than two years, the ANOVA compares the means for more than two sets of numbers.
The analogy between the two pairs of tests is not perfect, but it works in the sense that one form provides a measure on a straight one-to-one level, while the other allows for more multi-tiered analysis. When running an ANOVA, the reported result of interest is the p-value, or level of significance, which we want to be 0.05 or below. In this specific run, the resulting p-value soared way above that threshold, indicating that the BABIP and PW% means presented above, which look no different from one another at a naked glance, are indeed not statistically different. The underlying theory continues to hold validity and makes intuitive sense, but it is not evident in the results of the more advanced test.
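For readers curious about the mechanics, here is a minimal sketch of the F statistic behind a one-way ANOVA, written in plain Python. The function name and the sample BABIP buckets are made up purely for illustration and are not the data used in this study; the p-value itself would then be read from the F distribution with the returned degrees of freedom, which a stats package (for example, `scipy.stats.f_oneway`) handles for you.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a list of samples.

    Each element of `groups` is a list of observations for one bucket.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the bucket means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual observations around their own bucket mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical BABIP-against values for three made-up PW% buckets.
buckets = [
    [0.285, 0.292, 0.290],
    [0.288, 0.291, 0.287],
    [0.286, 0.289, 0.290],
]
f_stat, df_b, df_w = one_way_anova_f(buckets)
print(f"F = {f_stat:.3f} with ({df_b}, {df_w}) degrees of freedom")
```

An F statistic near or below 1 means the bucket means differ by no more than the noise within each bucket, which is the situation described above.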
Overall Leaders and Trailers
Following the publication of the first part of this article, which featured leaders and trailers for single-season PW% and PWO% marks, I received several requests for the same sort of data applied over the entire 30-year span in the sample. Amongst pitchers who faced their counterparts at least 100 times overall from 1979-2008, counting only seasons that satisfy the single-season criterion of 20 or more pitcher plate appearances, here are the leaders and trailers in PW%:
Leaders
Pitcher            PA vs Ps   K vs Ps   PW%
Randy Johnson      418        217       .519
Bruce Chen         113        58        .513
Sid Fernandez      470        240       .511
Mark Prior         181        87        .481
Wandy Rodriguez    142        68        .479

Trailers
Pitcher            PA vs Ps   K vs Ps   PW%
Charlie Hough      118        15        .127
Kent Bottenfield   153        24        .157
Allen Watson       100        16        .160
Scott Karl         133        23        .173
Kevin Ritz         181        32        .177
And for PWO%:
Leaders
Pitcher            K      K vs P   PWO%
Dick Ruthven       519    117      .225
Mike Harkey        161    36       .224
Dan Schatzeder     233    52       .223
Chris Holt         251    56       .223
Aaron Cook         260    57       .219

Trailers
Pitcher            K      K vs P   PWO%
Charlie Hough      202    15       .074
Doyle Alexander    253    20       .079
Kent Bottenfield   285    24       .084
David Weathers     226    20       .088
Allen Watson       172    16       .093
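As a quick check on the two tables above, both rates can be recomputed from the counting columns. This is only a sketch; the function names `pw_pct` and `pwo_pct` are my own labels, while the inputs are taken straight from the tables.

```python
def pw_pct(k_vs_pitchers, pa_vs_pitchers):
    """PW%: strikeouts of opposing pitchers per pitcher plate appearance."""
    return k_vs_pitchers / pa_vs_pitchers


def pwo_pct(k_vs_pitchers, total_k):
    """PWO%: share of all strikeouts that came against opposing pitchers."""
    return k_vs_pitchers / total_k


# Randy Johnson: 217 strikeouts in 418 plate appearances against pitchers.
print(round(pw_pct(217, 418), 3))   # 0.519
# Dick Ruthven: 117 of his 519 strikeouts came against pitchers.
print(round(pwo_pct(117, 519), 3))  # 0.225
# Charlie Hough trails in both: 15 whiffs in 118 PA, out of 202 total strikeouts.
print(round(pw_pct(15, 118), 3))    # 0.127
print(round(pwo_pct(15, 202), 3))   # 0.074
```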
Randy Johnson struck out over half of the pitchers he faced in the years in which he faced at least 20 pitchers, but whiffed so many position players that his PWO% registered a lower-than-average 9.7 percent. Relating back to the prior article on the subject, for fantasy players or real teams looking to acquire a pitcher, the red flags should wave for hurlers exceeding the league average in both metrics; if only one could be chosen, PWO% provides more utility, in that it identifies the pitchers who pad their strikeout totals.
When the pitchers and position players are separated, the small sample that the pitchers comprise becomes much clearer, especially when placed under the microscope of an ICC. The overall K/9 produced a 0.76 level of stability, with whiffs of solely position players at 0.75. Suffice to say, at 0.40, pitcher whiffs are moderately stable but of such a minute relative sample that only the extremes are really going to affect valuation methods. This does not mean that applying the PW and PWO analysis in the future becomes irrelevant, but rather that the results will be negligible outside of the outliers. Unfortunately, it does make the next step a bit less valid, even if the results themselves are interesting.
PW%, RMSE, and Regression
Knowing what we do with regard to the effects of pitcher whiffs on the overall strikeout rate, as well as the vast decline in the following season for those deviating from the mean, I decided to test out the predictive accuracy of three different measures. The goal involves finding the most accurate predictor of the strikeout rate in a subsequent season. The first method uses the straight, unadjusted K/9 in Year X to predict K/9 in Year X+1. The second method adjusts the Year X strikeout rate to incorporate the normal league-to-league drops or gains (as mentioned in the beginning, -0.05 for those remaining in the same league, +0.25 from AL to NL, and -0.31 from NL to AL), as well as a bit of aging. Lastly, the third method runs a linear regression on the Year X+1 strikeout rate, using the K/9 in Year X and PW% as independent variables.
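The league-switch portion of the second method can be sketched in a few lines. This is a simplified illustration: the lookup table and function name are my own, and the aging component is left out entirely because its size is not spelled out here.

```python
# Aggregate K/9 changes by league path, as quoted in the text.
# Keys are (league in Year X, league in Year X+1).
LEAGUE_ADJUSTMENT = {
    ("AL", "AL"): -0.05,
    ("NL", "NL"): -0.05,
    ("AL", "NL"): 0.25,
    ("NL", "AL"): -0.31,
}


def adjusted_k9(k9_year_x, league_from, league_to):
    """Apply only the league adjustment to a Year X K/9 (no aging term)."""
    return k9_year_x + LEAGUE_ADJUSTMENT[(league_from, league_to)]


# Hypothetical pitcher with a 7.00 K/9 moving from the NL to the AL.
print(round(adjusted_k9(7.00, "NL", "AL"), 2))  # 6.69
```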
The regression produced the formula: K/9 in Year X+1 = 1.413 + (0.762 * K/9 in Year X) - (0.121 * PW%). As a cherry-picked example, Joaquin Andujar struck out 4.22 batters per nine innings in the 1981 season, with a lower-than-average PW% of .272. The regression predicted a 4.64 K/9 in 1982, which compared favorably with the actual 4.60 rate. His rate of whiffing pitchers was expected to rise in the following season, a facet of the projection that the straight comparison and age/standard adjustments ignored. Not everyone worked out as well as Joaquin Andujar did here, but perhaps the predictive accuracy of this formula could best the other two methods. As previously mentioned, the smaller sample of pitcher whiff data precludes this from being concrete or definitive in utility, and with a p-value right around 0.45, the PW% coefficient is not statistically significant. However, the comparative root mean square error results remain very interesting, and are perhaps a foreshadowing of what will surface with a larger sample.
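The formula above can be evaluated directly for the Andujar example. In this sketch the function and constant names are my own, and PW% is entered as a decimal (.272) as it is written in the text. Note that plugging in the printed, rounded coefficients gives roughly 4.60 rather than the 4.64 quoted above; the gap may stem from rounding in the published coefficients, though that is only an assumption.

```python
# Coefficients exactly as printed in the text (rounded to three decimals).
INTERCEPT = 1.413
COEF_K9 = 0.762
COEF_PW = -0.121


def predict_k9_next(k9_year_x, pw_pct):
    """Predict Year X+1 K/9 from Year X K/9 and PW% (as a decimal, e.g. 0.272)."""
    return INTERCEPT + COEF_K9 * k9_year_x + COEF_PW * pw_pct


# Joaquin Andujar, 1981: 4.22 K/9 with a .272 PW%.
print(round(predict_k9_next(4.22, 0.272), 2))  # 4.6
```

Because the PW% coefficient is so small, the second term moves the projection by only a few hundredths of a strikeout per nine; nearly all of the prediction comes from the intercept and the Year X K/9.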
Root mean square error may sound intimidating, but it's fairly easy to calculate and is one of the best statistical measures of predictive accuracy around, basically measuring the average discrepancy between actual data and predicted data. To calculate the RMSE, begin by computing the error, which is actual minus predicted. Square the results so any negatives become positive, find the mean, and take the square root. The three methods proposed above produced RMSEs of 1.13, 1.08, and 0.97, respectively, and since lower is better in this case, the regression won out. As the sample grows, the results may reach statistical significance, transforming this two-part theory from anecdotal and interesting into a serious measure worthy of being incorporated into projections.
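The steps described above translate almost word for word into code. This is a minimal sketch with a function name of my choosing, and the K/9 figures in the example are invented for illustration rather than drawn from the study.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and the same length")
    # Error is actual minus predicted; squaring makes every term non-negative.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    # Average the squared errors, then take the square root.
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical Year X+1 K/9 values and two competing sets of predictions.
actual_k9 = [4.60, 7.10, 6.25, 8.40]
method_a = [4.22, 7.90, 5.80, 9.30]
method_b = [4.64, 7.40, 6.00, 8.70]

print(f"Method A RMSE: {rmse(actual_k9, method_a):.2f}")
print(f"Method B RMSE: {rmse(actual_k9, method_b):.2f}")
```

Whichever method posts the lower figure missed by less on average, with large misses penalized more heavily than small ones because of the squaring step.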
Marrying the Parts
This entire study began as nothing more than a fun exercise in seeing which pitchers padded their stats by taking advantage of less capable hitters. Though the results and lack of statistical significance in certain areas may bring skepticism out of the woodwork, the underlying theory remains valid, and most of the findings support the initial hypothesis in one form or another, even if more data is required to feel confident and comfortable in reporting. Pitchers drastically deviating from the mean in terms of striking out opposing pitchers as a function of pitcher plate appearances and/or overall strikeouts should be expected to regress the following season, as the ICC indicated that pitcher whiffs are more random than skill, especially when stacked up against the inherent stability of overall strikeout rates.
In the end, all of this may only amount to a useful fantasy tip, but even though the aforementioned measures lacked statistical significance, the original idea and theory make enough sense that pitcher whiffs deserve to be evaluated on some front, especially when making decisions about bringing in pitchers from the opposite league. American League teams should be very wary of senior circuit pitchers with consistently high PWO% numbers, and National League teams holding onto their pitchers should be aware that certain strikeout rates might not hold up from year to year if opposing pitchers make up a higher-than-average percentage of the overall total. Then again, all of this may be for naught, but the idea makes enough sense that it becomes worthy of pursuing to some extent moving forward.