When I wrote about pitchers with major divides between their ERAs and SIERAs two weeks ago, a reader asked why Clay Buchholz had such a pedestrian strikeout rate despite an above-average swinging-strike rate. Buchholz has mustered just 6.2 K/9, nearly a full strikeout below the 7.1 league average, but has induced batters to swing and miss on 9.5 percent of his pitches according to FanGraphs, a full percentage point above the 8.5 percent league average. The question was apparent: Do pitchers who get a lot of whiffs increase their strikeout rates over time?

The question is a logical one to ask. Inducing a swing and a miss could be more indicative of skill than getting an umpire to call a strike on a pitch that a hitter opted to take. The most notorious strikeout kings of all time have always been those who can get hitters to swing and miss. The pitchers with the top 10 swinging-strike rates in 2010, according to FanGraphs, have high strikeout rates:

Pitcher            Swinging Strike %   K/9
Francisco Liriano  12.5                9.54
Cole Hamels        11.9                9.29
Josh Johnson       11.8                9.11
Jered Weaver       11.1                9.62
Tim Lincecum       11.0                9.62
Shaun Marcum       10.9                7.72
Mat Latos          10.8                9.38
Hiroki Kuroda      10.8                7.37
Ryan Dempster      10.7                8.52
Ricky Nolasco      10.5                8.39

All are above average, and the top five strike out more than a hitter per inning. Clearly, swinging strikes are highly correlated with strikeouts—in fact, they have a correlation of .84 among starting pitchers.

Swinging strikes are also heavily correlated year-to-year. Among all pitchers with at least 80 innings as starting pitchers from 2002-09, there was a .79 correlation for swinging-strike rate, slightly above the .77 correlation for strikeout rate (SO/PA) itself.
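As a rough illustration of what a year-to-year correlation measures, the sketch below pairs each pitcher-season with the same pitcher's following season and computes a Pearson correlation over those pairs. The pitcher labels and rates are hypothetical toy values, not the 2002-09 sample, and the helper names are my own.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def year_to_year_pairs(seasons):
    """Pair each (pitcher, year) rate with the same pitcher's rate in year + 1."""
    pairs = []
    for (pitcher, year), rate in seasons.items():
        following = seasons.get((pitcher, year + 1))
        if following is not None:
            pairs.append((rate, following))
    return pairs

# Hypothetical swinging-strike rates (fractions of pitches), for illustration only.
toy_seasons = {
    ("A", 2007): 0.110, ("A", 2008): 0.105,
    ("B", 2007): 0.070, ("B", 2008): 0.076,
    ("C", 2007): 0.090, ("C", 2008): 0.088, ("C", 2009): 0.093,
}

pairs = year_to_year_pairs(toy_seasons)
r = pearson([first for first, _ in pairs], [second for _, second in pairs])
print(f"{len(pairs)} season pairs, year-to-year correlation {r:.2f}")
```

A pitcher with three consecutive seasons contributes two pairs, which is why pitcher C appears twice in the toy sample.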

Further evidence that this information could be useful is that swinging strikeouts per PA have a .77 correlation year-to-year, while called strikeouts per PA have a .59 correlation. 

I ran a regression of strikeout rate on the previous year’s swinging-strike rate, controlling for age and year, and found that a one percentage point increase in swinging-strike rate was associated with a 1.55 percentage point increase in SO/PA the following year, a result that was extremely significant. Thus, in the absence of information about the previous year’s strikeout rate, knowing that a pitcher had more swinging strikes implies that he is likely to have more strikeouts the following year.
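For readers unfamiliar with the mechanics, here is a minimal sketch of an ordinary least squares fit with more than one regressor, solved through the normal equations. It is not the code behind the results in this article: the `ols` helper is my own, the data are toy values generated from a known linear rule, and the year controls are left out to keep it short.

```python
def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for row in range(k):
            if row != col:
                factor = A[row][col]
                A[row] = [vr - factor * vc for vr, vc in zip(A[row], A[col])]
    return [A[i][k] for i in range(k)]

# Toy data: (previous-year swinging-strike rate, age). Hypothetical values.
rows = [(0.08, 25), (0.10, 30), (0.12, 27), (0.09, 33), (0.11, 24)]
# Next-year SO/PA built from a known rule so the fit can be checked.
y = [0.10 + 1.5 * swstr - 0.001 * age for swstr, age in rows]

intercept, b_swstr, b_age = ols(rows, y)
print(intercept, b_swstr, b_age)
```

Because the toy outcome is an exact linear function of the inputs, the fit recovers the generating coefficients up to rounding. With real data the coefficients would come with standard errors, which this sketch does not compute.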

Interestingly, the swinging SO/PA and called SO/PA do a fairly good job of predicting each other. The correlation between called SO/PA and swinging SO/PA the following year is .26, and the correlation between swinging SO/PA and called SO/PA the following year is .25. 

The question remains whether swinging strikes provide any information beyond what strikeouts already do, so I ran a regression (again controlling for age and year) of strikeout rates on the previous year’s strikeout rate and the previous year’s swinging-strike rate, and found an interesting result.

Variable          Coefficient  P-Stat
Constant          .0606        .000
SO/PA             .7294        .000
Year 2002-04      -.0056       .026
Year 2008         .0060        .074
Age               -.0010       .000
Swinging Strike%  .1236        .251

This implies that the extra information that swinging-strike rate provides, once the previous year’s strikeout rate is already known, is not very useful at all. For every one percentage point above average in the previous year’s strikeout rate, the following year’s strikeout rate is likely to be about 0.73 percentage points above average. However, among pitchers with the same strikeout rate the previous year, a pitcher with a swinging-strike rate one percentage point higher will have a strikeout rate only 0.12 percentage points higher, a difference that is not statistically significant. The value added by this information is virtually nil.
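To make the size of that effect concrete, the sketch below plugs the coefficients from the table above into a prediction for two hypothetical pitchers who differ only in swinging-strike rate. I am assuming that all rates enter as fractions (0.18 for 18 percent) and that the year dummies refer to the first of the two seasons; the function name and the example pitchers are illustrative.

```python
# Coefficients copied from the regression table above; 2005-07 is the baseline period.
COEF = {
    "constant": 0.0606,
    "so_pa": 0.7294,
    "year_2002_04": -0.0056,
    "year_2008": 0.0060,
    "age": -0.0010,
    "swinging_strike": 0.1236,
}

def predict_next_so_pa(so_pa, swinging_strike, age, first_year):
    """Predicted next-year SO/PA. Rates are fractions (an assumption about units)."""
    pred = (
        COEF["constant"]
        + COEF["so_pa"] * so_pa
        + COEF["age"] * age
        + COEF["swinging_strike"] * swinging_strike
    )
    if 2002 <= first_year <= 2004:
        pred += COEF["year_2002_04"]
    elif first_year == 2008:
        pred += COEF["year_2008"]
    return pred

# Two hypothetical 27-year-olds in 2006 with the same 18 percent strikeout rate,
# one percentage point apart in swinging-strike rate.
low = predict_next_so_pa(0.18, 0.085, 27, 2006)
high = predict_next_so_pa(0.18, 0.095, 27, 2006)
gap = high - low
print(f"{low:.4f} vs {high:.4f}, gap of {100 * gap:.2f} percentage points")
```

The gap between the two predictions is just the swinging-strike coefficient times one percentage point, which is the 0.12 figure quoted above.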

(Note that 2005-07 is not included as a coefficient. Those familiar with regression analysis will recall the coefficients for 2002-04 and for 2008 are both measured relative to the 2005-07 effect.)

The R² statistic, which measures how much of the variation in the dependent variable (following year’s strikeout rate) can be explained by the variables used, tells the same story. The R² for the regression above is .6118, just a tiny fraction above the .6110 R² for the same regression run without swinging-strike rate.

In other words, the value added by knowing the swinging-strike rate when the strikeout rate is already known is less than a tenth of a percent of the differences in players’ strikeout rates the following year.
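The arithmetic behind that statement is short enough to write out. The two R² values are the ones reported above; the variable names are mine.

```python
r2_with_swstr = 0.6118     # regression including swinging-strike rate
r2_without_swstr = 0.6110  # same regression without it

added = r2_with_swstr - r2_without_swstr
added_pct = 100 * added    # share of variance explained, in percent

print(f"Added R-squared: {added:.4f} ({added_pct:.2f} percent of the variance)")
```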

Running the same regression on pitchers who were less than 28 years old in the first year actually reduced the coefficient to a statistically insignificant negative number (-.169), suggesting that swinging-strike rate for younger pitchers provides no additional information that the strikeout rate does not already provide.

I decided to check whether getting more of one’s strikeouts swinging was helpful in predicting which direction a pitcher’s strikeout rate was headed, and found that this was not useful either. I ran a regression of strikeout rate on the previous year’s strikeout rate, controls for year and age, and swinging-strikeout rate, and got the following results:

Variable          Coefficient  P-Stat
Constant          .0633        .000
SO/PA             .7551        .000
Year 2002-04      -.0045       .053
Year 2008         .0058        .087
Age               -.0010       .000
Swinging SO/PA    .0266        .752


In other words, knowing how much of a pitcher’s SO/PA came on swings versus called strikes was not useful at all.

In fact, running an equivalent regression with called SO/PA and swinging SO/PA as separate variables makes this clearer. The coefficients on called and swinging strikeouts are almost exactly the same:

Variable          Coefficient  P-Stat
Constant          .0633        .000
Year 2002-04      -.0045       .053
Year 2008         .0058        .087
Age               -.0010       .000
Called SO/PA      .7551        .000
Swinging SO/PA    .7817        .000

The information provided by knowing the form of those strikeouts is not all that useful.
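The last two tables describe the same model written two ways. Assuming every strikeout is either called or swinging, so that SO/PA = called SO/PA + swinging SO/PA, a regression with coefficient a on SO/PA and b on swinging SO/PA is identical to one with coefficient a on called SO/PA and a + b on swinging SO/PA. The check below uses only the published coefficients; the variable names are mine.

```python
# Regression with total SO/PA plus swinging SO/PA (the earlier of the two tables).
a_total = 0.7551      # coefficient on SO/PA
b_swinging = 0.0266   # coefficient on swinging SO/PA

# Implied coefficients when called and swinging SO/PA enter separately:
# a*(called + swinging) + b*swinging = a*called + (a + b)*swinging
implied_called = a_total
implied_swinging = a_total + b_swinging

# Coefficients reported in the split regression.
reported_called = 0.7551
reported_swinging = 0.7817

print(f"{implied_called:.4f} {reported_called:.4f}")
print(f"{implied_swinging:.4f} {reported_swinging:.4f}")
```

This is also why the constant, year, and age coefficients are unchanged between the two tables: only the labeling of the strikeout terms differs.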

Does this mean that none of the pitch information that we find in the “Plate Discipline” section on FanGraphs is useful when we have the regular box scores? Answering this requires using the same approach to look at each of the other variables. The following table shows the coefficients from a series of regressions of strikeout rate on the previous year’s strikeout rate, age and year controls, and a different additional statistic in each column. The P-Stats are in parentheses underneath the coefficients. Statistically significant coefficients are bolded, while weakly statistically significant coefficients are bolded and italicized.

Variable          Coef      Coef      Coef      Coef      Coef      Coef      Coef      Coef      Coef
                  (P-Stat)  (P-Stat)  (P-Stat)  (P-Stat)  (P-Stat)  (P-Stat)  (P-Stat)  (P-Stat)  (P-Stat)

Constant          .0606     .0526     .0808     .1007     .0748     -.0037    .0536     .0470     .0449
                  (.000)    (.004)    (.000)    (.057)    (.000)    (.946)    (.000)    (.122)    (.034)

SO/PA             .7294     .7736     .7754     .7458     .7586     .8132     .7563     .7764     .7704
                  (.000)    (.000)    (.000)    (.000)    (.000)    (.000)    (.000)    (.000)    (.000)

Year ’02-’04      -.0056    -.0045    -.0040    -.0052    -.0054    -.0029    -.0022    -.0051    -.0046
                  (.026)    (.056)    (.091)    (.038)    (.030)    (.280)    (.399)    (.044)    (.048)

Year ’08          .0060     .0058     .0054     .0060     .0067     .0058     .0041     .0062     .0058
                  (.074)    (.084)    (.110)    (.078)    (.055)    (.084)    (.246)    (.073)    (.087)

Age               -.0010    -.0010    -.0010    -.0010    -.0010    -.0010    -.0010    -.0010    -.0010
                  (.000)    (.000)    (.000)    (.000)    (.000)    (.000)    (.000)    (.000)    (.000)

Swinging Strike%  .1236
                  (.251)

F-Strike%                   .0201
                            (.496)

Zone%                                 -.0341
                                      (.327)

Contact%                                        -.0399
                                                (.474)

O-Contact%                                                -.0158
                                                          (.328)

Z-Contact%                                                          .0686
                                                                    (.218)

Swing%                                                                        .0620
                                                                              (.060)

O-Swing%                                                                                .0233
                                                                                        (.575)

Z-Swing%                                                                                          .0422
                                                                                                  (.339)


Definitions:

Swinging Strike% = Percent of pitches thrown that were swung at and missed
F-Strike% = Percent of hitters faced for which the first pitch of PA was a strike
Zone% = Percent of pitches thrown in the strike zone
Contact% = Percent of hitters’ swings that were fouls or hit into play
O-Contact% = Contact% on pitches out of the strike zone
Z-Contact% = Contact% on pitches in the strike zone
Swing% = Percent of pitches at which hitters swung
O-Swing% = Swing% on pitches out of the strike zone
Z-Swing% = Swing% on pitches in the strike zone
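The pitch-level definitions above can be sketched as a small calculation over a list of pitches. The `Pitch` record and its three flags are a hypothetical layout of my own, not the Baseball Info Solutions format, and F-Strike% is left out because it needs plate-appearance information rather than pitch flags alone.

```python
from dataclasses import dataclass

@dataclass
class Pitch:
    in_zone: bool   # pitch was recorded in the strike zone
    swung: bool     # hitter swung
    contact: bool   # swing produced a foul or a ball in play

def pct(numerator, denominator):
    """Percentage, or None when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else None

def plate_discipline(pitches):
    swings = [p for p in pitches if p.swung]
    in_zone = [p for p in pitches if p.in_zone]
    out_zone = [p for p in pitches if not p.in_zone]
    z_swings = [p for p in in_zone if p.swung]
    o_swings = [p for p in out_zone if p.swung]
    return {
        "Swinging Strike%": pct(sum(not p.contact for p in swings), len(pitches)),
        "Zone%": pct(len(in_zone), len(pitches)),
        "Contact%": pct(sum(p.contact for p in swings), len(swings)),
        "O-Contact%": pct(sum(p.contact for p in o_swings), len(o_swings)),
        "Z-Contact%": pct(sum(p.contact for p in z_swings), len(z_swings)),
        "Swing%": pct(len(swings), len(pitches)),
        "O-Swing%": pct(len(o_swings), len(out_zone)),
        "Z-Swing%": pct(len(z_swings), len(in_zone)),
    }

# Four made-up pitches: contact in the zone, a take in the zone,
# a whiff out of the zone, and a take out of the zone.
sample = [
    Pitch(in_zone=True, swung=True, contact=True),
    Pitch(in_zone=True, swung=False, contact=False),
    Pitch(in_zone=False, swung=True, contact=False),
    Pitch(in_zone=False, swung=False, contact=False),
]
rates = plate_discipline(sample)
for name, value in rates.items():
    print(name, value)
```

Note that every statistic except Swinging Strike%, Contact%, and Swing% depends on the `in_zone` flag, so any error in judging pitch location flows directly into those rates.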

Nearly every bit of information that the pitch data gave us was useless. For each of the regressions above, we had the relevant information we needed by knowing the pitcher’s strikeout rate, age, and what year it was. The coefficients on eight of the nine variables are not even weakly statistically significant, but there is one variable that has a weakly significant (positive) effect on predicting the next year’s strikeout rate: O-Swing%, the rate at which pitchers get hitters to chase pitches (supposedly) out of the zone.

Part of the reason that this statistic’s weak significance was so surprising to me is that I did not expect this information to be useful due to measurement error. The information, which FanGraphs obtains from Baseball Info Solutions, is determined by watching each pitch from the center-field camera. Given the issue of parallax, the center-field camera gives a distorted view and the observer can be fooled. This is an important point—the data appears rather questionable when looking at the yearly averages. League average O-Swing% across the years has moved around and mostly increased:

2002: 18.1%
2003: 22.2%
2004: 16.6%
2005: 20.3%
2006: 23.5%
2007: 25.0%
2008: 25.4%
2009: 25.1%
2010: 29.3%

Perhaps pitchers were gradually getting better at inducing batters to swing at the right pitches? That seems unlikely given the change in league-average Zone%, the percentage of pitches in the strike zone:

2002: 54.6%
2003: 51.4%
2004: 55.1%
2005: 53.8%
2006: 52.6%
2007: 50.3%
2008: 51.1%
2009: 49.3%
2010: 46.6%

It seems far more likely that pitches were increasingly being recorded as out of the strike zone over the years: the recorded share of pitches in the zone fell just as the recorded share of swings at pitches supposedly out of the zone rose. The overall swing rate has only ranged from 45.3 to 46.5 percent over the years. It is therefore more plausible that those swings were increasingly being classified as coming on pitches out of the zone later in the decade than that hitters were chasing more pitches out of the zone while pitchers threw a roughly offsetting number of additional pitches out of the zone.
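One quick way to see how tightly the two league-average series move in opposite directions is to correlate them directly. The figures are the yearly values listed above; the `pearson` helper is my own, and nine data points are far too few for anything beyond a descriptive check.

```python
from math import sqrt

years = list(range(2002, 2011))
# League-average values by year, in percent, as listed above.
o_swing = [18.1, 22.2, 16.6, 20.3, 23.5, 25.0, 25.4, 25.1, 29.3]
zone = [54.6, 51.4, 55.1, 53.8, 52.6, 50.3, 51.1, 49.3, 46.6]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

r = pearson(o_swing, zone)
print(f"Correlation between league O-Swing% and Zone%, {years[0]}-{years[-1]}: {r:.2f}")
```

A strongly negative value is what one would expect if the same pitches were being shifted from "in the zone" to "out of the zone" in the records, though it cannot by itself rule out a real change in behavior.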

I thought that normalizing the data might lead to stronger results, by measuring the O-Swing% relative to league average. This did not work:

Variable          Coefficient  P-Stat
Constant          .0659        .000
SO/PA             .7654        .000
Year 2002-04      -.0046       .049
Year 2008         .0057        .090
Age               -.0010       .000
Net O-Swing%      .0340        .386

The O-Swing% relative to the league average is now useless. Chances are this is for the reason that Baseball Prospectus' Colin Wyers has expressed concerns about the data: the year-to-year fluctuations in league-average O-Swing% are probably a result of moving center-field cameras. The average for each park is probably vastly different, so adjusting by a single league-average figure is probably not very useful.
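For concreteness, here is a sketch of the normalization, assuming "Net O-Swing%" means the pitcher's O-Swing% minus that year's league average (a ratio to league average would be another reasonable reading). The league figures are the yearly values listed earlier; the function name and the example rate are illustrative.

```python
# League-average O-Swing% by year, in percent, as listed earlier in the article.
LEAGUE_O_SWING = {
    2002: 18.1, 2003: 22.2, 2004: 16.6, 2005: 20.3, 2006: 23.5,
    2007: 25.0, 2008: 25.4, 2009: 25.1, 2010: 29.3,
}

def net_o_swing(o_swing_pct, year):
    """Pitcher's O-Swing% minus the league average for that year (percentage points)."""
    return o_swing_pct - LEAGUE_O_SWING[year]

# The same raw 25.0 percent reads very differently depending on the season.
early = net_o_swing(25.0, 2004)
late = net_o_swing(25.0, 2010)
print(f"2004: {early:+.1f} points, 2010: {late:+.1f} points")
```

A single league-wide offset like this cannot correct for differences between parks, which is the concern raised above.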

None of the other statistics as measured relative to the league average yielded remotely significant coefficients when included in regressions, either.

The most promising pitch data is the rate at which pitchers can get hitters to chase pitches out of the strike zone. The pitchers that tend to do so are more likely to see their strikeout rates increase the following year. However, the measurement error in these statistics is currently so large that it is difficult to glean any major insight from them. Chances are that this information could be more useful if measured more scientifically, and this could be one of the areas where pitch data could move our understanding of baseball forward.

However, the most important information to take away from this article is that even more objective statistics like swinging-strike rate, swing rate, and contact rate, as well as called versus swinging strikeout rates, are all of very little added value beyond what the pitcher’s strikeout rate will tell you.

 Of course, strikeout rate for pitchers is one of the quickest to stabilize among all baseball statistics, and so the added value of information beyond knowing historical strikeout rate is least likely to be significant for strikeout rate as compared with any other statistic. Thus, next week I will look at walk rates and attempt to determine whether this type of information can inform our knowledge about walk rates any more than it could have informed us about strikeout rates.  

mikefast
9/24
Ricky Zanker looked briefly at the changes in O-Swing% earlier this year:
http://www.hardballtimes.com/main/blog_article/is-the-bis-data-right/

He found that PITCHf/x did not show an increase in O-Swing% from 2008 to 2010.
ravenight
9/24
Might also be interesting to see if it has an effect on BABIP. In other words, does a pitcher who gets more swings and misses tend to get hitters to make weaker contact also?
mikefast
9/24
J.C. Bradbury back in 2005 found that strikeout rate did correlate with lower allowed BABIP:
http://www.hardballtimes.com/main/article/another-look-at-dips1/

I'm not aware that he looked for a difference between the impact of swinging strikes and called strikes on BABIP. That would be an interesting research question.
swartzm
9/24
I definitely have found the BABIP vs. SO% inverse correlation and written about it some before. I would suspect that is stronger than the swinging strike vs. BABIP correlation, but it's worth checking at some point when I can merge all that data in one set. Good idea, thanks.
blcartwright
9/24
I believe you have to look at contact rate and strike rate at the same time. A pitcher who throws more balls may have a higher whiff rate, but will walk a few more batters before getting the strikeout.

Two inputs (strike rate, contact rate), three outcomes (walk, strikeout, ball in play).
blcartwright
9/24
and for BABIP - it may be that higher strikeout rates correlated with lower BABIP allowed not because one causes the other, but that they may be both results of the same cause - high strikes. Low groundball pitchers have lower BABIPs allowed, and also have higher strikeout rates than high groundball pitchers.
swartzm
9/24
I think the last correlation you found is a result of selection bias. There are pitchers who are good at getting ground balls, pitchers who are good at striking guys out, pitchers who are good at both, and pitchers who are good at neither. Halladay is good at both. Hamels is good at Ks. Hudson is good at GBs. The guy who is good at neither is in AAA. Hence you get negative correlation.

I'm not sure that high strikes are the best explanation, because when I ran my regression in the spring in the "Why SIERA Doesn't Throw the BABIP Out with the Bathwater" article, I found that even controlling for ground ball rate, high-K pitchers had lower BABIPs. Controlling for Ks, high ground ball pitchers had higher BABIPs. I think both are distinct effects. One is about the types of batted balls most likely to go for hits (GBs vs. FBs) and the other is about batter's decisions with how hard they swing, etc.
blcartwright
9/25
Cool, just wanted to make sure that possibility had been considered. I didn't recall that you had studied it already, hence my "it may be..."
vetrini
9/24
"The coefficients on eight of the nine variables are not even weakly statistically significant, but there is one variable that has a weakly significant (positive) effect on predicting the next year’s strikeout rate: O-Swing%, the rate at which pitchers get hitters to chase pitches (supposedly) out of the zone."

Did you mean Swing% - not O-Swing% - instead or am I misreading the associated table?
swartzm
9/24
Weird, I somehow mistyped the tables! The one labeled Swing% is O-Swing%, the one labeled O-Swing% is Z-Swing%, and the one labeled Z-Swing% is Swing%. O-Swing% was weakly statistically significant.
j2avage
9/26
I wonder if treating age linearly is the best specification approach.

I would also have found it more interesting to see the relationships between the explanatory variables before regressions were presented. If the correlations between SO/PA and the other strike variables are really high, then I don't see a point in running regressions that include both variables.
swartzm
9/26
If you read J.C. Bradbury's article on aging, he found that strikeouts peak so early that having a normal aging curve wouldn't have been appropriate.

The correlation between SO/PA and the other variables is very high, which is why I needed a regression in the first place, to see which effect was driving it. They aren't collinear or anything, just highly correlated, so it's appropriate to run the regression with both K% and the correlated BIS variables like swinging strike rate.