Roy Halladay. Carlos Zambrano. Derek Lowe. Greg Maddux.
Some pitchers’ most readily identifiable characteristic is their ability to induce groundballs. Indeed, for pitchers like Lowe, Zambrano, and Brandon Webb, inducing groundballs is an essential part of their game plan. Zambrano, for instance, can get away with maintaining a relatively high walk rate because he induces a lot of double plays and avoids giving up home runs, which are especially costly with runners on base.
As I discussed in the PECOTA essay in this year’s book, groundball induction–or, more specifically, groundball-to-flyball ratio–is remarkably consistent from year to year, more so than virtually any other measure of pitching performance. Here is a plot containing the groundball percentages of all major league pitchers who faced at least 200 batters in both 2002 and 2003:
Look at how nicely the points form a line–the correlation described in the chart is .75, remarkably high as far as baseball statistics go. Compare that to what we get, using the same group of pitchers, for a more commonly-used metric like walk rate (measured as walks as a percentage of batters faced):
Here, we get a much sloppier array of values; the year-to-year correlation for walk rate is just .51. And, yet, walk rate is used as one of the primary drivers of many forecasting systems, whereas groundball rate is not (we introduced groundball rate into PECOTA this year).
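For readers who want to reproduce this kind of year-to-year comparison, here is a minimal sketch of the Pearson correlation calculation in plain Python. The five pairs of groundball percentages are made-up placeholders for hypothetical pitchers, not the 2002-2003 sample plotted above, and the function and variable names are illustrative assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)

# Made-up groundball percentages for five hypothetical pitchers,
# one value per pitcher in each of two consecutive seasons.
gb_year_1 = [38.0, 42.5, 47.0, 51.5, 60.0]
gb_year_2 = [40.0, 41.0, 49.0, 50.0, 58.0]

r = pearson(gb_year_1, gb_year_2)
print(f"year-to-year correlation: {r:.2f}")
```

Running the same function on each pitcher's walk rate in the two seasons would give the comparison figure for walks.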
One might protest, and rightly so, that the fact that a variable has strong correlation from season to season ought not automatically accord it a high priority in a projection system. Something like, say, uniform number is very highly correlated from season to season, yet its influence in predicting ERA is zero.
Where groundball rate is valuable is in serving as a proxy for predicting home runs. The number of home runs allowed has a very large influence on a pitcher’s ERA. At the same time, home run rate varies quite sharply from season to season–I won’t run another graph, but the correlation coefficient on HR% for the pitchers in the sample above is just .25. This phenomenon can be thought of as a corollary of sorts to DIPS theory, which suggests that pitchers have little influence on what happens to a batted ball once it is in play. Though DIPS theory explicitly excludes home runs from consideration, it is proper to say that a pitcher can influence (through pitch location and pitch type) whether a ball is hit in the air or on the ground–but there’s considerable luck involved in determining whether a ball hit in the air will turn into a home run, a warning-track shot, or just a routine putout for the right fielder.
This is why I assert in the book essay that groundball ratio is a better predictor of home run rate than is home run rate itself. I looked at league- and park-adjusted statistics for all pitchers from 1975 onward who faced at least 500 batters in two consecutive seasons (1975 is the year in which reliable groundball-flyball data begins to be available from Retrosheet):
- The correlation between home run rate in year N and home run rate in year N-1 is .326 (note that it is a little bit higher than in the previous example since we’ve increased the batters faced threshold).
- The correlation between home run rate in year N and groundball rate in year N-1 is -.345. Though the sign preceding the correlation figure is negative (since a higher groundball ratio tends to predict a lower home run rate), the magnitude of the correlation is a bit higher.
Of course, we can do better still if we account both for home run rate and for groundball rate in the previous season. A simple regression model that uses home run rate in year N as the dependent variable, and home run rate in year N-1 as the independent variable, is capable of explaining only about 11% of the variance in home run rate for the sample of pitchers we’ve taken above. If groundball rate in year N-1 is included as a second independent variable, the explanatory power increases sharply to 16%. We can get up closer to 20% if we include other factors like strikeout rate and walk rate (and do considerably better than that if we look at three years’ worth of previous seasons’ data, as PECOTA does)–but all the while, groundball rate maintains the largest influence on predicting home runs allowed.
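The link between the correlations and the "variance explained" figures can be sketched in a few lines. With a single predictor, the share of variance explained by a simple regression is just the square of the correlation, so .326 squared gives the roughly 11% quoted above. With two predictors, the standard formula also needs the correlation between the predictors themselves, which is not reported here; the value of -0.4 below is a hypothetical placeholder, and the function names are illustrative.

```python
def r_squared_one_predictor(r_y1):
    """Share of variance explained by a simple one-variable regression."""
    return r_y1 ** 2

def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Share of variance explained by a two-variable regression,
    computed from the three pairwise correlations."""
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

# Correlations with home run rate in year N, as reported in the text.
R_HR_PREV_HR = 0.326    # home run rate in year N-1
R_HR_PREV_GB = -0.345   # groundball rate in year N-1

hr_only = r_squared_one_predictor(R_HR_PREV_HR)   # 0.106..., about 11%
gb_only = r_squared_one_predictor(R_HR_PREV_GB)   # 0.119..., about 12%

# ASSUMPTION: correlation between prior-year home run rate and prior-year
# groundball rate. Not reported in the text; hypothetical value for illustration.
R_PREV_HR_PREV_GB = -0.4

both = r_squared_two_predictors(R_HR_PREV_HR, R_HR_PREV_GB, R_PREV_HR_PREV_GB)

print(f"prior HR rate alone:        {hr_only:.1%}")
print(f"prior groundball rate alone: {gb_only:.1%}")
print(f"both (hypothetical r_12):    {both:.1%}")
```

The two-predictor figure moves with the assumed correlation between the predictors, so it should be read as an illustration of the mechanics rather than a reproduction of the regression described above.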
Keeping something like this in mind can be useful when explaining the struggles of a pitcher like Barry Zito. Zito turned in a successful campaign last season in spite of mediocre strikeout and walk numbers in part because he allowed just 19 home runs in 231.2 innings. At the same time, Zito posted a GB/FB ratio of 0.89. This year, Zito has yielded eight home runs in his first 48 innings, and his ERA has jumped from 3.30 to 5.63. Jason Schmidt, though he’s had a better season than Zito, belonged in a similar category–a flyball pitcher whose success was brought about in large part by a low home run rate.
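The size of Zito's swing is easier to see on a per-nine-innings basis. The sketch below converts the innings figures quoted above and computes home runs per nine innings; note that this is a different denominator from the per-batter HR% used earlier, chosen only because innings are the figures given. It assumes the usual box-score notation in which the digit after the decimal point counts outs, so "231.2" means 231 and two-thirds innings. The helper names are illustrative.

```python
def innings_to_float(ip):
    """Convert box-score innings notation ('231.2' = 231 and 2/3) to a number."""
    whole, _, outs = ip.partition(".")
    outs = int(outs) if outs else 0
    if outs not in (0, 1, 2):
        raise ValueError("the digit after the decimal point must be 0, 1, or 2")
    return int(whole) + outs / 3

def hr_per_nine(home_runs, innings):
    """Home runs allowed per nine innings pitched."""
    return 9 * home_runs / innings

last_year = hr_per_nine(19, innings_to_float("231.2"))  # about 0.74
this_year = hr_per_nine(8, innings_to_float("48"))      # 1.5

print(f"last season: {last_year:.2f} HR/9")
print(f"this season: {this_year:.2f} HR/9")
```

On these figures, the home run rate has roughly doubled.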
One of the “tricks” to building a good projection system is to figure out the myriad direct and indirect ways in which certain variables can have a predictive influence on others. Guys like Zito and Schmidt who fit that pattern need to have red flags by their names, just as a pitcher might if he had allowed very few hits in spite of a low strikeout rate, or a hitter might if he improved his batting average while his plate discipline fell apart. We think that incorporating groundball ratio is one of PECOTA’s best tricks.