May 12, 2014
Predicting Batter Breakouts with PITCHf/x
In my last article, I looked at the career path of Albert Pujols from the perspective of PITCHf/x. Given the extreme fluctuations in Pujols’ skill over the last five years, I suspected that he would be a good test case to understand how batters are handled differently as their skills change. I found that pitchers approached Prince Albert more and more aggressively as his skills fell off, throwing him pitches closer to the center of the zone.
Even before Pujols’ results began to decline, pitchers were attacking his strike zone ever more audaciously. Consider this graph, which looks at the trend in Pujols’ zone distance in 2009 (left of the blue line) and 2010 (right), years in which he posted TAvs of .373 and .357.
Throughout both years, pitchers were pitching Pujols closer to the center of the zone. In other words, as Pujols was at his most dangerous, pitchers were paradoxically more willing to come after him in the zone. One interpretation of this graph might be that pitchers and teams had noticed an underlying flaw in Pujols’ plate discipline, the same flaw which later caused his historic drop-off. In the broader context of understanding baseball, the above graph prompts the following question: Can one predict a hitter’s performance based on the trends in how pitchers approach that hitter?
Ben Lindbergh and Sam Miller raised this very question on a recent episode of Effectively Wild. I took their discussion as a challenge and set out to address the relationship between changes in zone distance and changes in batter ability.
The Year of Chris Davis
For now, I’m going to limit myself to looking for breakouts rather than breakdowns (unexpected decreases in performance). Even though Pujols’ breakdown prompted this piece, breakdowns turn out to be more difficult to examine and predict, largely because they often involve the added complication of injury. I’ll return to the subject of breakdowns in a future article.
The graph below shows the kind of pattern we are looking for, one in which pitchers gradually learn not to provoke a hitter and begin shading ever farther away from the zone’s center.
As it turns out, this graph belongs to Chris Davis, circa 2012. Sometime around midseason, there appears to have been a dramatic change in the behavior of opposing pitchers, such that they decided to stop throwing him strikes.
Using Davis as a template, I looked at the slope of the linear regression line of each hitter’s zone distance profile over the course of the 2012 season (min. 1500 pitches). This method is a little crude, but also simple: the higher the slope, the more pitchers threw away from the zone as the season progressed. Here are the hitters with the most significant zone distance shifts for the year 2012, and how they performed in 2013 (min. 500 PAs).
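For readers who want to try the slope calculation themselves, here is a minimal Python sketch. It is not the code behind the numbers in this article; it assumes each hitter’s season is available as a chronologically ordered list with one zone-distance value per pitch, and the function names and the `hitters` dictionary are hypothetical.

```python
from statistics import mean

MIN_PITCHES = 1500  # qualification threshold quoted in the article


def zone_distance_slope(distances, min_pitches=MIN_PITCHES):
    """Least-squares slope of zone distance against pitch order.

    `distances` is an assumed input: one distance-from-zone-center value
    per pitch, in the order the pitches were seen. Returns None when the
    hitter saw fewer than `min_pitches` pitches.
    """
    n = len(distances)
    if n < min_pitches or n < 2:
        return None
    x_bar = (n - 1) / 2  # mean of the pitch indices 0..n-1
    y_bar = mean(distances)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(distances))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    return sxy / sxx


def rank_by_slope(hitters, min_pitches=MIN_PITCHES):
    """Sort qualified hitters from largest to smallest slope.

    `hitters` maps a hitter's name to his list of per-pitch distances.
    """
    slopes = {name: zone_distance_slope(d, min_pitches) for name, d in hitters.items()}
    qualified = [(name, s) for name, s in slopes.items() if s is not None]
    return sorted(qualified, key=lambda pair: pair[1], reverse=True)
```

A positive slope means pitchers drifted away from the center of the zone as the season went on. Regressing on pitch order rather than calendar date is a simplification made for this sketch.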
It appears that changes in zone distance do a good job of predicting breakouts. Eight of the top 10 saw their 2013 performances exceed PECOTA’s projections, by margins ranging from .079 (the aforementioned Mr. Davis) to .003 (Giancarlo Stanton). MVP Andrew McCutchen makes an appearance with a solid (+.031) increase in TAv. Chris Davis is undoubtedly the great success of the method, given that his 2013 could well be regarded as the definition of a breakout season. Recall that Davis’ breakout was a huge surprise, as he had previously been regarded as a typical Quad-A player.
Only two players on this list underperformed their projections: David Freese and Buster Posey. Both suffered injuries during the 2013 season that were severe enough to cost them games. Freese dealt with a lingering lower-back injury throughout the season, at one point landing on the 15-day disabled list. Posey’s injury was less severe (a fracture in his ring finger) but may have affected his hitting as well.
Even counting these two cases as “misses,” the predictions based on changes in zone distance are remarkably accurate overall. Collectively, the above 10 players outperformed their PECOTA projections by an average of .0237 in TAv, or roughly the difference between Paul Goldschmidt and Todd Frazier this year. That’s no small margin. The probability that a randomly picked group of 10 players would overperform their PECOTA projections by at least that amount is something like .005, suggesting that changes in zone distance are a statistically significant predictor of breakouts (although excluding Davis increases that probability to ~.05). Across all hitters, not just the top 10, there’s a statistically significant relationship between change in zone distance and difference from the PECOTA forecast, with or without Chris Davis included.
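One way to estimate a probability like that is by simulation: repeatedly draw 10 players at random and count how often their average over-performance matches or beats the observed figure. The Python sketch below illustrates that idea under stated assumptions; it is not necessarily how the figures above were computed, and `diffs` (one actual-minus-projected TAv value per qualifying player) and the function name are hypothetical.

```python
import random


def random_group_probability(diffs, observed_mean, group_size=10,
                             trials=100_000, seed=0):
    """Share of random groups whose mean over-performance is at least
    `observed_mean`.

    `diffs` is an assumed input: actual TAv minus PECOTA-projected TAv,
    one value per qualifying player. Groups are drawn without replacement.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    hits = 0
    for _ in range(trials):
        group = rng.sample(diffs, group_size)
        if sum(group) / group_size >= observed_mean:
            hits += 1
    return hits / trials
```

Calling `random_group_probability(diffs, 0.0237)` with the full list of qualifying players would give an estimate comparable in spirit to the one quoted above. Dropping one player’s value from both the group and the pool shows how much a single outlier moves the result.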
So far, the breakout candidates this year are outperforming their projections by about .012 points of TAv (or .0079 for just the top 10), which is down substantially from last year, but in line with the non-Chris Davis breakouts of 2013. Generally, TAv numbers are going to be a lot more volatile in ~100 at-bats, so consider these over-performance numbers provisional for now.
The successes so far this year include David Ortiz, who continues his highly successful rebellion against Father Time, as well as Victor Martinez and Chase Utley, who appear to be joining Ortiz in that struggle. Indeed, the mean age of the group is an ancient 32, which is worth further investigation; maybe radical changes in zone distance mostly occur for older hitters. Among non-geriatrics, Anthony Rendon has been raking and would have been predicted to do so by this method, while Starling Marte seems not to have improved as much as expected.
Raul Ibanez can safely be counted as a huge failure of the method, since he’s been absolutely putrid so far this season and shows no signs of improvement. Since the season started, he’s also seen a dramatic decrease in zone distance, partially reversing the increase from last year. He’s probably not as bad as his current TAv, since his BABIP stands at an absurdly unlucky .172. With that said, given the sub-Mendoza batting average, Ibanez is either going to improve substantially or fail to get many more plate appearances.
That fact highlights a survivor bias issue: generally, good players are going to see more pitches and get more playing time than bad players, so I’m missing players who were so bad as to be demoted. This bias could artificially inflate the accuracy of the breakout predictions. However, if I go back to the 2013 data and instead allow players with any (>0) PAs in 2013, increases in zone distance are still significantly associated with over-performance of PECOTA projections, so survivor bias is probably not sufficient to explain the predictive power of zone distance changes.
There are at least two direct ways in which teams may gather and employ information to which we don’t have access. The first is via the scouting report, which involves lots of careful observation of a hitter: tendencies, quirks, holes in their swing, and so on. That kind of assiduous study could certainly reveal weaknesses in a hitter before the general public catches on (or the results catch up). A well-trained scout could notice that Chris Davis is suddenly showing an improved mentality at the plate and relay that to the pitcher, resulting in the pitcher avoiding the zone marginally more.
The second way this change could happen is in-game. Besides scouts, both the pitcher and catcher are meticulously watching the hitter for any sign of unexpected weakness or strength. Imagine the following scenario: a pitcher takes aim at the outside edge of the plate and misses towards the middle, and the hitter punishes the mistake with a towering homer. The pitcher or catcher will take note and be a little less likely to challenge the hitter inside the zone in the following at-bat.
Most likely, adjustments are made at both the scout and player levels. Both sources probably contribute to the observed tendency of hitters with zone distance increases to break out in the following year. Anecdotally, the mirror-image idea, that hitters with zone distance decreases are in line for decreases in performance, also appears to be true. However, these breakdowns often involve the extra complication of injuries, and so require a more nuanced approach for prediction.
Both breakouts and breakdowns presumably involve some additional adjustment by the batter, as well, which I will examine in the future. For example, even if pitchers were hesitant to throw Chris Davis a strike in 2013, Davis had to capitalize on this by curtailing his swing rate on those out-of-zone pitches. Presumably, some cases of missed breakouts (or breakdowns) are the result of batters failing to make adjustments to their approaches to suit the new pitch mixtures they were receiving.
That was notably the case with Albert Pujols, who took cuts at pitches farther out of the zone on average even as he was seeing pitches closer to the zone. So there’s a whole other side to this issue to address, because even as the pitcher is matching his strategy to the batter, the batter is countering with a new set of tactics of his own. Considering the matchup as a head-to-head battle of wits and execution, it’s not surprising that pitchers would become aware of a hitter’s changes in ability long before the hitter’s outcomes improve enough to be noticed.