January 22, 2015
Forecasting With Fastball Frequency
The fastball is the meat and potatoes of the batter-pitcher contest. Variations in fastball velocity and movement explain a lot of the differences between pitchers, and a good heater can set up a whole arsenal of other pitches to boot. Fastballs are the most commonly thrown pitch by a wide margin, and so they determine to a great extent the results of any given matchup.
It’s no surprise then that pitchers tend to vary how much they use their fastballs on a hitter-by-hitter basis. Some hitters see fastballs rarely, others overwhelmingly, and the difference between hitters tells us something about their power (as well as their proficiency against fastballs). Because they are the main offering of most pitchers, fastballs are the easiest to tee off against, and so they are thrown more rarely against powerful hitters.
In the past, I’ve advanced the idea that changes in the manner in which pitchers handle each batter can be used as a forecasting tool. Given the noted correlation between fastball percentage and power, it seems plausible that when fastball percentage changes, it reflects an underlying change in the hitter’s ability. This whole line of inquiry depends upon pitchers appraising the opponent’s talent faster than we do, but that seems a reasonable assumption given the greater tools and resources at the disposal of a major-league baseball team.
The problem is that there is always pesky variation in the way, obscuring our ability to notice true changes in fastball percentage. Sometimes a hitter will see a few extra fastballs because they face a series of fastball-happy pitchers all in a row. Sometimes an increase in fastball percentage is just caused by luck. Either way, this kind of random variation has to be handled.
To do so, we can model how many fastballs a hitter should see, given the pitcher they face and the count. Removing the effect of a pitcher is as simple as factoring in their fastball percentage. The count is a little more complicated: As a rule, hitters tend to see more fastballs as the number of balls increases, and fewer as the number of strikes increases.
The result of this is that a hitter’s fastball percentage can vary because he is going deeper into counts or facing more wild pitchers, an effect we must subtract to get to how each hitter is perceived.
After considering the effects of the count and the opposing pitcher, we can come up with an expected number of fastballs faced by each batter, and then subtract this from the actual number of fastballs seen. The remainder constitutes primarily the effect of the batter. To turn this information into a breakout predictor, we can calculate this number for the two halves of each baseball season (April through June, and July through September).
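The expected-versus-actual bookkeeping can be sketched roughly as follows. This is a minimal illustration and not the model used for the article: the additive count adjustment, the function name `fastball_residuals`, and all the sample numbers are assumptions made up for the example.

```python
from collections import defaultdict

def fastball_residuals(pitches, pitcher_fb_rate, count_adjust):
    """Return actual minus expected fastballs for each batter.

    pitches: iterable of (batter, pitcher, count, is_fastball) tuples.
    pitcher_fb_rate: pitcher -> overall fastball rate between 0 and 1.
    count_adjust: count such as "2-0" -> additive shift in fastball rate.
    """
    actual = defaultdict(int)
    expected = defaultdict(float)
    for batter, pitcher, count, is_fastball in pitches:
        # Assumed additive model, clamped so it stays a valid probability.
        p = pitcher_fb_rate[pitcher] + count_adjust.get(count, 0.0)
        p = min(1.0, max(0.0, p))
        expected[batter] += p
        actual[batter] += int(is_fastball)
    return {b: actual[b] - expected[b] for b in actual}

# Invented sample data: two batters, two pitchers, a handful of pitches.
pitches = [
    ("A", "P1", "0-0", True),
    ("A", "P1", "0-2", False),
    ("A", "P2", "3-0", True),
    ("B", "P1", "0-0", False),
]
pitcher_fb_rate = {"P1": 0.6, "P2": 0.5}
count_adjust = {"0-2": -0.1, "3-0": 0.3}  # fewer with strikes, more with balls

residuals = fastball_residuals(pitches, pitcher_fb_rate, count_adjust)
```

A positive residual means the batter saw more fastballs than his opponents and counts would suggest; a negative one means pitchers were avoiding the fastball against him.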
Now, according to the information presented, a hitter who sees more fastballs later in the season ought to be on the downswing. Pitchers have decided to challenge these players more aggressively, in the way that they usually approach players with less power. Conversely, a hitter who sees fewer fastballs later in the season than in the first half might be in line for a power surge: The opposition is treating him like he developed enhanced pop.
Using data from 2011-2013, I looked at how hitters performed in the following year (that is, 2012-2014) as a function of their change in fastball rate across half-seasons. In plainer terms, I calculated the degree of fastball avoidance for each batter in the first half of each season, and then in the second half, correlating the difference between halves with their ability in the next year. I required that hitters see at least 300 fastballs in each half-season to qualify for inclusion, to make sure that there was enough signal to get reasonably accurate estimates of fastball percentage for each hitter.
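As a rough sketch of that half-season comparison, the snippet below applies the 300-fastball qualifier and takes the second-half figure minus the first-half figure. Expressing the residual as a rate per pitch seen is my assumption for the illustration, as are the function name and the sample numbers.

```python
def fastball_frequency_difference(first_half, second_half, min_fastballs=300):
    """Second-half minus first-half adjusted fastball rate per batter.

    Each half maps batter -> (fastballs_seen, pitches_seen, expected_fastballs).
    Batters must see at least min_fastballs fastballs in BOTH halves.
    """
    result = {}
    for batter in first_half.keys() & second_half.keys():
        fb1, n1, exp1 = first_half[batter]
        fb2, n2, exp2 = second_half[batter]
        if fb1 < min_fastballs or fb2 < min_fastballs:
            continue  # too few fastballs for a stable estimate
        # Assumed definition: residual fastballs per pitch seen.
        rate1 = (fb1 - exp1) / n1
        rate2 = (fb2 - exp2) / n2
        result[batter] = rate2 - rate1
    return result

# Invented sample data; batter "B" misses the cutoff in the first half.
first_half = {"A": (400, 700, 380.0), "B": (250, 500, 260.0)}
second_half = {"A": (350, 700, 385.0), "B": (320, 600, 300.0)}

ffd = fastball_frequency_difference(first_half, second_half)
```

Under this sign convention a negative value means the hitter saw fewer fastballs than expected late in the year, which is the pattern the article associates with a coming power surge.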
I created a set of models, using the change in fastball frequency as a predictor variable. As usual, I employed PECOTA as a baseline. It doesn’t do us any good, projections-wise, if fastball percentage just produces the same forecast as PECOTA. It has to improve upon that forecast for it to be actionable intelligence. So I considered it a win if and only if fastball frequency differences explained additional variance when put into a regression with PECOTA’s forecast.
Surprisingly, between-half fastball frequency difference (abbreviated FFD) seems to predict the next year’s performance alone (R²=.08), albeit not with superlative accuracy. Happily, the same statistic is also able to improve upon PECOTA’s forecast to a significant degree, upping the R² value from .28 to .30. This regression then shows that the concept works: Differences between halves in terms of fastball frequency can foretell next year’s performance, to a degree that significantly exceeds our current knowledge (embodied by PECOTA).
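The "does it add anything beyond PECOTA" test amounts to comparing R² for two nested regressions. Here is a minimal sketch of that comparison using ordinary least squares written out by hand. The data are synthetic, the coefficients are invented, and the helper names are hypothetical, so the numbers it produces have nothing to do with the .28 and .30 reported above.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(predictors, y):
    """R^2 of a least-squares fit of y on the predictor columns plus an intercept."""
    rows = [[1.0] + list(vals) for vals in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-ins for a projection, FFD, and next-year performance.
rng = random.Random(0)
n = 300
pecota = [rng.gauss(0.260, 0.020) for _ in range(n)]
ffd = [rng.gauss(0.0, 0.03) for _ in range(n)]
actual = [p - 0.2 * d + rng.gauss(0.0, 0.025) for p, d in zip(pecota, ffd)]

r2_base = r_squared([pecota], actual)       # projection alone
r2_both = r_squared([pecota, ffd], actual)  # projection plus FFD
```

Adding a predictor can never lower in-sample R², so the gain from `r2_base` to `r2_both` only matters if it is statistically significant or survives out-of-sample testing.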
The best model appeared when I also added age into the mix. I decided to include age after perusing the data a little bit and determining that fastball frequency changes, especially increases, tended to be more significant the older a hitter was. Sometimes, abrupt increases in fastball frequency even seemed to portend the end of careers, as with Vlad Guerrero in 2011 and with Paul Konerko and Derek Jeter last year. As implied by this anecdotal evidence, age was a significant contributor to accuracy in the combined regression (p=5.19e-13).
To verify that these results were not driven by outliers or small sample size, I also used a fancier regression approach, utilizing a Support Vector Machine with 5-fold cross-validation (#GoryMath). The SVM-based method is likely to be slightly more accurate. In this more complex formulation, FFD still reduced the Mean Squared Error to a significant degree when included alongside PECOTA: in terms of TAv, it dropped the absolute error from 67 points of TAv to 60, a roughly 10 percent improvement on PECOTA’s forecast.
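For readers curious about the cross-validation part, here is a minimal sketch of 5-fold validation comparing held-out error with and without FFD. To stay dependency-free it swaps the Support Vector Machine for a plain one-variable least-squares correction to the baseline, so it illustrates the validation procedure only and not the SVM itself. The data are synthetic and every name in it is hypothetical.

```python
import random

def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(x, y):
    """Least-squares intercept and slope for y ~ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def cv_mse(baseline, ffd, actual, k=5, seed=0):
    """Return (mse_baseline_only, mse_with_ffd), both measured on held-out folds."""
    err_base, err_ffd = [], []
    for held_out in kfold_indices(len(actual), k, seed):
        held = set(held_out)
        train = [i for i in range(len(actual)) if i not in held]
        # Fit the FFD correction to the baseline's misses, training folds only.
        a, b = fit_line([ffd[i] for i in train],
                        [actual[i] - baseline[i] for i in train])
        for i in held_out:
            err_base.append((actual[i] - baseline[i]) ** 2)
            err_ffd.append((actual[i] - (baseline[i] + a + b * ffd[i])) ** 2)
    return sum(err_base) / len(err_base), sum(err_ffd) / len(err_ffd)

# Synthetic data in which FFD genuinely carries information (by construction).
rng = random.Random(1)
n = 200
ffd = [rng.gauss(0.0, 0.03) for _ in range(n)]
baseline = [rng.gauss(0.260, 0.020) for _ in range(n)]
actual = [p - 0.5 * d + rng.gauss(0.0, 0.01) for p, d in zip(baseline, ffd)]

folds = kfold_indices(n)
mse_base, mse_ffd = cv_mse(baseline, ffd, actual)
```

Because every prediction is scored on data the correction never saw, an improvement here cannot come from simply fitting noise, which is the point of the check described above.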
An important caveat is that fastball frequency information proved less informative the more PAs or pitches a hitter saw. That dovetails with my article from last week, which showed that PECOTA is more accurate for hitters with more career PAs (because we have more information about them). The usefulness of fastball frequency declines precipitously as PECOTA gets a better and better handle on a hitter, so that the 7 point improvement (.067->.060) in TAv prediction becomes only a 2 point improvement if one applies a 100 PA threshold to the following year’s data (.029->.027). Even so, fastball frequency remains useful, albeit only slightly, out to PA thresholds that limit the data to only everyday regular-type players (400 PAs).
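In relative terms the two improvements quoted above work out as follows; this is just the arithmetic on the error figures already given, with a hypothetical helper name.

```python
def relative_improvement(before, after):
    """Fractional reduction in error going from `before` to `after`."""
    return (before - after) / before

all_hitters = relative_improvement(0.067, 0.060)     # about 0.104
with_threshold = relative_improvement(0.029, 0.027)  # about 0.069
```

So the gain shrinks from roughly 10 percent to roughly 7 percent of PECOTA’s error once the 100 PA threshold is applied.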
I have shown that we can leverage information about how each hitter is approached within halves of a single year to improve upon forecasts in the next year. Fastball frequency normally varies according to the pop of the batter, so that when it changes, it may be indicating a change in the underlying skill level of the same batter. Combined with the previous breakout indicator I developed (zone distance), there is a strong indication that forecasting systems can be improved with the use of PITCHf/x data from the opposing pitchers’ perspective.
 Consider as fastballs the following pitches: Four-seamers, cut fastballs, split-finger fastballs, and sinkers.
 To put the remaining variation in perspective, fastball percentage reaches a correlation between halves of r~.65, near the magical stabilization point of .7.
 Curiously, however, there was not a significant interaction term between age and FFD. I think that this has to do with the relationship being a little more complex than linear regression can handle, because more complex models performed significantly better when Age and FFD were combined than when they were separated.