One of the challenges of bringing BP's new pitching data to light is figuring out whether it’s useful and how we can leverage it to better understand what is happening on the field. As mentioned previously, we look at this in much the same way we look at pitch movement or velocity: we need to figure out how these tunnel data points interact with other components of a player’s performance to unlock a deeper understanding of what is happening.
Cubs right-hander Kyle Hendricks is a perfect subject to start with. As we mentioned in "Two Ways to Tunnel," Hendricks has some of the smallest pitch tunnels in all of baseball. Hendricks is often compared to Greg Maddux (including by us!), and we can see how he is in fact like Maddux in certain respects. It gives us an idea of how he’s successful, but only an abstract one. That is, we rationalize Hendricks’ success because we’ve seen Maddux do it before, but we don’t really know how all of the moving pieces come together.
In order to better understand how Hendricks is successful, we’ll have to dig into some of our new data to see what that can tell us about how he pitches.
Hendricks has steadily learned how to strike out opposing batters, increasing his K% by 55 percent from 2014 to 2015 and 2016, and the effect on his game is clear. In fact, Hendricks’ newfound ability to strike batters out has made him one of the best pitchers in baseball: he has posted a sub-3.50 DRA in each of the past two seasons despite getting dinged for pitching (and winning an ERA title) in front of an elite defense.
Hendricks’ ability to strike batters out despite modest stuff has been a key to his success in the major leagues. The following graphic shows his 20-80 scouting grades for velocity and movement, which range from below average to solidly average.
So how does Hendricks do it? To answer that question, we started with his strikeout rate. From 2014 to 2015 Hendricks increased it by 55 percent, from 14.6 percent to 22.6 percent. In addition to generating weak contact, something we’ll get to later, Hendricks has become an above-average strikeout pitcher.
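Note that the "55 percent" is a relative increase, not a gain of 55 percentage points: the strikeout rate rose by 8 points on a base of 14.6. A quick check using the two rates quoted above:

```python
def relative_change(before: float, after: float) -> float:
    """Percent change from `before` to `after`, measured relative to `before`."""
    return (after - before) / before * 100


# Hendricks' strikeout rate (percent of batters faced), 2014 -> 2015
k_2014, k_2015 = 14.6, 22.6

points_gained = k_2015 - k_2014              # about 8.0 percentage points
pct_increase = relative_change(k_2014, k_2015)
print(f"+{points_gained:.1f} points, {pct_increase:.1f}% relative increase")
# prints "+8.0 points, 54.8% relative increase"
```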
We know that Hendricks is a good command pitcher, and his ability to throw strikes is key to his success. To wit, below is a table showing Hendricks’ Called Strike Probability (CS Prob) and Called Strikes Above Average (CSAA) over his short three-year career:
Interestingly, Hendricks’ dramatic increase in strikeouts came with a corresponding drop in Called Strike Probability. This fits the trend we discussed in our introduction of CS Prob and CSAA: pitchers with good command (Hendricks’ CSAA scores were among the top five in the league in both 2015 and 2016) actually pitch less in the zone, to greater effect.
Hendricks learned that he could pitch effectively out of the zone while grabbing an extra strike here or there through his superb command. This of course is just one piece of the puzzle.
From 2014 to 2015 Hendricks actually saw a nominal decrease in swinging strike rate, but it jumped by more than 23 percent from 2015 to 2016. So in addition to getting extra strikes through enhanced command and by pitching farther from the center of the zone, Hendricks improved his ability to generate whiffs.
The first step in analyzing Hendricks’ swinging strikes was to build a model that explains swinging strikes across all of baseball. That let us identify the main drivers league-wide, and then see what those findings tell us about a specific player, in this case Hendricks. We fed our pitch tunnels data and a variety of other inputs into the model to determine the major factors behind swinging strike rates.
Our model identified two major factors as influential in getting whiffs:
1. The interaction between xangle (mean)—the horizontal approach angle of a pitcher’s offerings—and the tunnel differential.
2. The interaction between xangle (standard deviation)—how much variance there is in horizontal approach angle—and the differences in flight time between sequential pitches.
In layman’s terms, we discovered that there was a convergence point where a specific horizontal trajectory—data pulled from PITCHf/x, as corrected by PitchInfo—and a few components of our tunnels data could explain some of his increased whiff tendencies. Specifically, we found that a low average xangle coupled with a low tunnel differential is a strong driver of swinging strikes. Pitchers who have a more over-the-top motion (low average xangle) and a small tunnel differential make it difficult for opposing hitters to identify pitches.
This is likely a combination of deception (hiding the ball through the delivery) and difficulty in pitch identification. We’ve established previously that Hendricks is among the league leaders in minimizing tunnel differential, so we know he qualifies from that standpoint. Hendricks is also on the low end for average xangle, meaning that his horizontal approach angle is consistent with an over-the-top throwing motion. Take a look at this gif, noting specifically how the ball appears to follow right behind Hendricks’ head as he releases it:
We also found that having low variance in your horizontal approach angle (xangle), along with a certain range of flight time difference, nearly doubles the likelihood that you’ll be able to generate whiffs. Not only does Hendricks have a low mean xangle, which we detailed above, he also has low variance within his horizontal approach to the plate. There is very little deviation in Hendricks’ offerings as they come out of his hand, once again making pitch identification difficult for opposing batters.
Finally, there’s the matter of changing speeds. We often discuss changing speeds in absolute terms, but the reality is that many things can impact the effectiveness of changing speeds in a particular sequence. For example, a three-pitch sequence of:
1. 95 mph fastball
2. 76 mph curveball
3. 95 mph fastball
isn’t necessarily better than a three-pitch mix of:
1. 95 mph fastball
2. 88 mph cut fastball
3. 76 mph curveball
Our model suggests that smaller flight time differences are better at driving swings and misses. This makes some intuitive sense. Swinging strikes largely come from a pitcher’s ability to confuse a batter with pitch movement or by disguising one pitch as another. Called strikes, on the other hand, are driven by surprising a hitter with a dramatic change in speeds or a combination of location and movement.
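To put rough numbers on the two sequences above, here is a back-of-the-envelope sketch of sequential flight time differences. It is not how our tunnels data measures flight time: it assumes each pitch travels a hypothetical 55 feet from release to the plate at constant speed, whereas real pitches lose several mph in flight and release points vary. The absolute values are therefore crude, but the comparison between sequences holds.

```python
MPH_TO_FPS = 5280 / 3600     # 1 mph = 1.4667 feet per second
RELEASE_TO_PLATE_FT = 55.0   # hypothetical release-to-plate distance


def flight_time(mph: float, distance_ft: float = RELEASE_TO_PLATE_FT) -> float:
    """Crude flight time in seconds, assuming the pitch holds a constant speed."""
    return distance_ft / (mph * MPH_TO_FPS)


def sequential_differences(speeds_mph):
    """Absolute flight-time gap between each pitch and the one thrown before it."""
    times = [flight_time(s) for s in speeds_mph]
    return [abs(later - earlier) for earlier, later in zip(times, times[1:])]


fb_cb_fb = sequential_differences([95, 76, 95])   # fastball, curveball, fastball
fb_ct_cb = sequential_differences([95, 88, 76])   # fastball, cutter, curveball

print([round(d, 3) for d in fb_cb_fb])   # [0.099, 0.099]
print([round(d, 3) for d in fb_ct_cb])   # [0.031, 0.067]
```

Under these assumptions the cutter splits the speed gap into two smaller steps, so the second sequence never asks the hitter to adjust by as much between consecutive pitches.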
Hendricks fits this aspect of our swinging strikes model as well. He slots in on the lower end of the scale for changing speeds, meaning that his approach is driven by the kind of subtlety that leads to swings that don’t connect with their intended targets.
For Hendricks, it’s his approach and ability to effectively sequence his pitches that makes him successful. Couple those skills with his elite command, and you have a Greg Maddux starter kit on your hands. Hendricks doesn’t need to have the best stuff in baseball to be successful, and has proven that a guy with average stuff can still strike out nearly a quarter of the batters he faces.
Maddux often extolled the benefits of generating weak contact as opposed to trying to strike out every opposing hitter. In fact, he has become the namesake for a quirky piece of baseball trivia: “throwing a Maddux.” When a pitcher throws a complete-game shutout on fewer than 100 pitches, the effort is termed a “Maddux.” And for good reason: from 1988 to 2013, Maddux did it 13 times, almost twice as many as the next closest pitcher.
Hendricks isn’t racking up “Madduxes” yet, but he’s certainly following in the famed pitcher’s footsteps. In our recent piece on pitch tunnels we noted that Hendricks’ narrow pitch tunnels make it very difficult for the batter to tell which pitch is coming until it's too late. In effect, Hendricks’ lackluster stuff plays to his advantage, allowing him to generate weak contact because hitters are only slightly miscalculating when swinging at his offerings.
Working down in the zone also helps Hendricks, because the contact he does give up is often from locations that are difficult to drive.
Rob Arthur recently showed that Hendricks’ Statcast data supports this idea as well. He noted that a confluence of factors, including Hendricks’ pitch location, command (though Rob measures it differently), and propensity for getting ahead in the count, has helped him suppress exit velocity and even launch angle on opposing batted balls. Arthur notes:
Statcast data also shows that Hendricks isn’t just getting lucky. A model (specifically, a mixed model of exit velocity, with terms for the pitcher, batter, and park) of exit velocity shows that Hendricks is driving down the speed of his batted balls by about 1 mph, which is in the top 10 among all pitchers in baseball.
A similar launch angle model shows Hendricks pushing batted balls downward by four full degrees, which meshes well with his excellent 54 percent ground ball rate. For any given batted ball, a four degree decline in angle and a 1 mph drop in speed may not prevent a hit — but added together over 237 tracked batted balls, and it’s enough to significantly depress opponents’ success on balls in play.
Arthur suggests that a combination of factors is at play here, helping Hendricks limit the damage when opposing hitters do make contact. It’s possible, if not likely, that Hendricks’ tight pitch tunnels and general deception play a big role here as well.
Hendricks and Maddux
We’ve spent a lot of time trying to convince ourselves first, then others, that these Hendricks and Maddux comparisons might be legitimate. We’ve hopefully laid out a compelling case here, but don’t let us be the only ones who try to convince you.
Maddux has said of Hendricks:
I like watching him pitch. I like guys that rely on movement and location. I can relate to him. That's what I had to do. I'd rather watch him pitch than some lefty throwing 95 mph. I think it's great. I look at the way he pitches off his fastball kind of like I did, and you look for guys who pitch the way you did.
Hendricks, for his part, is modest. He has heard people compare him to Maddux, but he’s not buying it just yet:
[The comparison] is hard for me to take, honestly, because it's Greg Maddux. I get that I'm a similar pitcher to his type. I've learned from what he did and how he approached hitters. But I have a long, long way to go.
Hendricks might be a lot closer than he thinks.
The Swinging Strike Model Detail
We modeled the average relationship between tunneling characteristics and swinging strike rate, using aggregate data of various pitching statistics for the 2016 season, as compiled and corrected by PitchInfo. This requires us to balance several goals:
(1) We want a reasonable amount of explanatory power (we ended up with an R-squared around .4);
(2) We want to control for other factors that might otherwise be driving whiff rate;
(3) We want a modeling method that (a) is flexible, (b) will accommodate non-linearities, and (c) is interpretable and thus will allow meaningful coefficient inference.
As to Factor (1), to test the explanatory power of tunnels, we included all of our new tunneling statistics:
· breaktotunnelratio, and
All of these are explained in the introductory tunneling article.
For Factor (2), we added a representative mix of covariates that are already well-associated with strikeout rate:
· fastball velocity (we used 95th percentile pitch velocity for each pitcher),
· pitch velocity standard deviation,
· each pitcher’s mean CS_Prob (see the introductory article),
· each pitcher’s standard deviation in CS_Prob,
· each pitcher’s average and standard deviation of x-angle (horizontal motion),
· each pitcher’s average and standard deviation of z-angle (vertical motion),
· each pitcher’s average and standard deviation of spin-rate (raw spin).
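The covariates above imply an aggregation step from individual pitches to one row per pitcher. Here is a minimal sketch of that step, assuming pitch rows arrive as dicts with hypothetical keys ('pitcher', 'mph', 'xangle', 'zangle', 'spin', 'cs_prob'). The text doesn't say how the 95th percentile was computed, so linear interpolation (NumPy's default method) is our assumption.

```python
from collections import defaultdict
from statistics import mean, stdev


def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100] (NumPy's default method)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def pitcher_features(pitches):
    """Collapse pitch-level rows into one covariate row per pitcher.

    Keys are hypothetical; each pitcher needs at least two pitches,
    because statistics.stdev is undefined for a single value.
    """
    by_pitcher = defaultdict(list)
    for p in pitches:
        by_pitcher[p["pitcher"]].append(p)

    rows = {}
    for name, ps in by_pitcher.items():
        speeds = [p["mph"] for p in ps]
        row = {"velo_p95": percentile(speeds, 95), "velo_sd": stdev(speeds)}
        # Mean and standard deviation for the angle, spin, and CS_Prob covariates
        for key in ("xangle", "zangle", "spin", "cs_prob"):
            vals = [p[key] for p in ps]
            row[f"{key}_mean"] = mean(vals)
            row[f"{key}_sd"] = stdev(vals)
        rows[name] = row
    return rows
```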
Factor (3) guides the actual choice of modeling method and, as usual, requires a great deal of thought. As noted above, tunneling impacts are likely to be non-linear and interactive, playing off of other pitching characteristics. That rules out your typical generalized linear model. Furthermore, we are interested in inference rather than prediction: we already know what each pitcher’s swinging strike rate is. What we want to know is how tunneling drives that rate.
This means we are looking for specific coefficients assigned to each variable, which rules out most machine learning methods. The easiest way to get interpretable coefficients is to use a generalized linear model, which we’ve already ruled out. That leaves us with a non-parametric, but additive, compromise of some sort.
The typical default for many modelers would be a generalized additive model, or GAM. GAMs have become the workhorse of non-linear modeling; they easily fit curves, usually with what are known as smoothing splines, while offering some protection from overfitting. However, GAMs struggle to provide useful inference, at least in a way that can be easily explained to others. Typical GAM packages (such as mgcv) don’t provide useful coefficients for the non-linear contributors. Instead, they give estimates in terms of “effective degrees of freedom.” Good luck explaining that to the coaching staff.
Instead, the modeler is left to interpret the (admittedly rather nice) curves drawn by the software. Those plots aren’t bad, but they give you neither the underlying coefficients nor the precise break point(s) at which particular factors become relevant. For this reason, our model of choice for this application is Multivariate Adaptive Regression Splines (MARS). MARS can fit non-linear data just like GAMs, but it offers further advantages. MARS creates fixed hinge points at which each combination of variables starts to produce uniquely important and quantifiable effects.
In this regard, MARS performs a lot like tree-based models such as random forests, while still providing interpretable coefficients and often better performance on numeric inputs. MARS also performs variable selection, keeping only those variables most likely to contribute to explaining the outcome and pruning out those that do not. All of these features need to be used with care and good judgment, but the same is ultimately true of any modeling solution.
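To make the hinge idea concrete, here is a minimal sketch of the basis functions MARS is built from. For a knot c, MARS uses the mirrored pair max(0, x - c) and max(0, c - x), so the fitted slope can change at c while the fit stays continuous. The knot and coefficients in the example are made up for illustration; they are not values from our model.

```python
def hinge(x: float, knot: float) -> float:
    """max(0, x - knot): zero below the knot, rising linearly above it."""
    return max(0.0, x - knot)


def mirrored_pair(x: float, knot: float):
    """The two reflected hinges MARS considers for one knot: (x - c)+ and (c - x)+."""
    return hinge(x, knot), max(0.0, knot - x)


def one_knot_fit(x, intercept, knot, coef_above, coef_below):
    """Hypothetical one-variable MARS fit with a different slope on each side of `knot`."""
    above, below = mirrored_pair(x, knot)
    return intercept + coef_above * above + coef_below * below


# Made-up example: knot at 0, slope +2 above it and slope +1 below it
# (a coefficient of -1 on max(0, c - x) means the fit rises as x approaches c from below).
for x in (-2, 0, 3):
    print(x, one_knot_fit(x, intercept=10, knot=0, coef_above=2, coef_below=-1))
# -2 8
# 0 10
# 3 16
```

The coefficient on each hinge is directly readable as a slope on one side of the knot, which is the interpretability advantage over GAM smoothing splines described above.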
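The pruning step can be illustrated with generalized cross-validation (GCV), the criterion the earth package documents for choosing how many terms to keep. As we understand its documentation, GCV = RSS / (N * (1 - enp/N)^2), where the effective number of parameters is enp = nterms + penalty * (nterms - 1) / 2, with a default penalty of 3 when interactions are allowed (2 for additive models). The sketch below only shows the scoring; earth's actual backward pass also searches which terms to drop, and the RSS values here are invented to show the mechanics.

```python
def gcv(rss: float, n: int, nterms: int, penalty: float = 3.0) -> float:
    """GCV score for a MARS model with `nterms` terms (intercept included)."""
    enp = nterms + penalty * (nterms - 1) / 2   # effective number of parameters
    return rss / (n * (1 - enp / n) ** 2)


# Hypothetical backward-pass results: best residual sum of squares at each model size.
rss_by_size = {1: 120.0, 3: 80.0, 5: 70.0, 7: 68.0, 9: 67.5}
n_obs = 100

scores = {size: gcv(rss, n_obs, size) for size, rss in rss_by_size.items()}
best_size = min(scores, key=scores.get)
print(best_size)   # 5: beyond five terms, the RSS gains no longer cover the penalty
```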
The software of choice for performing MARS in the R programming environment is the free earth package, which provides all of the features above and also offers cross-validation to help select the best combination of predictors. We used all pitches from the 2016 season and specified five repeats of 10-fold cross-validation.
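"Five repeats of 10-fold cross-validation" means five independent shuffles of the data, each cut into ten folds, for 50 held-out evaluations in total. In earth this is requested through its nfold and ncross arguments, as we read the package documentation. The pure-Python function below is a hypothetical illustration of the splitting scheme, not earth's internal code, and the seed is arbitrary.

```python
import random


def repeated_kfold(n, k=10, repeats=5, seed=2016):
    """Yield (repeat, fold, train_idx, test_idx) for `repeats` shuffled k-fold splits.

    Within each repeat, every observation lands in exactly one test fold.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                          # fresh shuffle for each repeat
        folds = [idx[f::k] for f in range(k)]     # k nearly equal folds
        for f, test in enumerate(folds):
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield r, f, train, test


splits = list(repeated_kfold(100))
print(len(splits))   # 50 model fits: 5 repeats x 10 folds
```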
A full listing and plot of the various coefficients is beyond the scope of this article, but should certainly be reproducible by statistically capable readers. Consistent with our analysis above, though, we found that certain inflection points caused tunneling characteristics to dramatically improve a pitcher’s whiff rate. These include:
· Pitchers with a mean xangle less than ~ -1.3 degrees and less than ~0.7 feet of tunnel distance. The further the pitcher is below either of those thresholds, the larger the effect.
· Pitchers with an xangle standard deviation of less than ~1.5 degrees and a flight time difference between pitches of greater than 0.03 seconds. The more extreme a pitcher is in these two categories, the larger the effect.
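The first inflection point corresponds to a degree-2 MARS term: the product of two hinges, so it is nonzero only when a pitcher is below both thresholds and grows as he moves further below either one. A minimal sketch of that basis term, using the approximate knots from the first bullet; the fitted coefficient isn't published here, so the sketch computes only the term itself, and the pitcher values are hypothetical. The second bullet follows the same pattern, except that the flight time hinge points the other way, max(0, x - 0.03).

```python
def hinge_below(x: float, knot: float) -> float:
    """max(0, knot - x): grows as x falls further below the knot."""
    return max(0.0, knot - x)


def xangle_tunnel_term(mean_xangle_deg: float, tunnel_diff_ft: float,
                       xangle_knot: float = -1.3, tunnel_knot: float = 0.7) -> float:
    """Interaction of two hinges (knots approximate, from the text).

    Zero unless the pitcher is below BOTH thresholds; larger the further below each.
    """
    return hinge_below(mean_xangle_deg, xangle_knot) * hinge_below(tunnel_diff_ft, tunnel_knot)


# Hypothetical pitchers: (mean xangle in degrees, tunnel differential in feet)
print(round(xangle_tunnel_term(-2.3, 0.5), 3))   # 0.2: below both knots
print(xangle_tunnel_term(-2.0, 0.9))             # 0.0: tunnel differential too large
print(xangle_tunnel_term(-1.0, 0.3))             # 0.0: mean xangle above the knot
```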
There almost certainly are more and better models to be fitted along these or even other lines. Our hope is that by making this data available to the baseball community, its next generation of innovators will set their sights on unlocking all of the information these new measurements have to offer.
Special thanks to Sahadev Sharma and Rob Arthur for their assistance with getting this piece together.
R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Wood, S.N. (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society (B) 73(1):3-36.
Milborrow, S. (2016). earth: Multivariate Adaptive Regression Splines. R package version 4.4.7. Derived from mda:mars by Trevor Hastie and Rob Tibshirani; uses Alan Miller's Fortran utilities with Thomas Lumley's leaps wrapper. URL https://CRAN.R-project.org/package=earth.