Pitch sequencing has long lurked as a sort of terra incognita in sabermetric analysis. It’s something that all baseball folks agree is important, but it’s proved mostly impenetrable to strictly quantitative approaches. There’s an intuitive sense that sequencing must be one of the crucial determinants of pitcher success, and although we can seemingly identify a good sequence when we see one, any attempt to apply a universal criterion of good sequencing across all pitches (or pitchers) is much more challenging. The rest of this article will be devoted to applying just such a criterion, and determining whether it is of any practical utility in understanding pitching generally.
There are at least two schools of thought about pitch sequencing. On the one hand, there seems to be an appreciation for sequences that mix up locations, speeds, and breaks in unpredictable ways, on the grounds that those kinds of sequences ought to be the most challenging for a hitter. On the other hand, Mitchel Lichtman (aka MGL) has argued forcefully on the basis of game theory that the ideal sequencing would be something like weighted randomness (weighted, that is, by the quality of each pitch). MGL’s argument says that if a pitcher tried too hard to mix things up, for instance by purposefully not throwing two of the same pitch in a row, he would end up tipping the next pitch to the batter, resulting in a powerful disadvantage.
I believe that it’s instructive to consider an analogy to rock-paper-scissors. Any player of rock-paper-scissors knows that it’s best not to establish any predictable tendencies. Should a player begin to always follow his “rocks” with “papers,” it becomes a trivial exercise to counter with “scissors.” This line of argument represents the weighted randomness school of MGL. On the other hand, very sophisticated players of rock-paper-scissors can predict patterns in novices or observe them in experts; countering these subtle tendencies may involve nonrandom usage of the various symbols (i.e. rock/paper/scissors). By analogy, this latter strategy would represent the more classical school of sequencing.
It’s important to note that these two schools are not necessarily contradictory, even though they appear to be at first glance. A pitcher may be carefully crafting his sequences in a single game or against a certain batter or team—in short, on a small scale. However, these individual sequences may all even out at the scope of a full season, yielding MGL’s pattern of randomness. To that point, an important caveat for the following is that I will look at sequencing at the level of a whole season (in order to maximize sample size). But if there are interesting patterns at the scale of individual games or months or at-bats, they’ll have to wait for further analysis.
I’m going to analyze sequencing by looking only at pitch types in the context of the entropy framework I developed in my last article. Importantly, and as before, I’ll be looking only at pitch type and not considering how pitchers may vary location in their pitch sequences. In my previous post, I examined how pitchers can use different pitch types to establish entropy (a measure of uncertainty), ultimately increasing strikeouts. As before, I’ll use data graciously supplied to me by Pitch Info.
For this analysis, I’ll use another quantity borrowed from information theory, mutual information, which is closely allied with entropy. Mutual information measures how strongly the choice of one pitch depends on another. If pitch two in a sequence has a certain amount of entropy (= uncertainty), then mutual information captures how much of that entropy is lost by knowing the preceding pitch. From this, we can see a straightforward connection between entropy and mutual information: however well a pitcher does at confusing the batter with uncertainty, that uncertainty is reduced to the extent that the pitcher non-randomly chooses the next pitch in a sequence based on the previous pitches.
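For readers who prefer symbols, the standard textbook definition can be written as follows. Here X stands for the preceding pitch type and Y for the next one; the labels X, Y, and p are notation introduced for illustration and are not taken from the analysis itself.

```latex
H(Y) = -\sum_{y} p(y)\,\log_2 p(y)

I(X;Y) = H(Y) - H(Y \mid X) = \sum_{x,y} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)}
```

The middle expression matches the verbal description: the entropy of the next pitch, minus whatever entropy remains once the preceding pitch is known.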
Mutual information ranges from 0 (complete statistical independence) to the entropy of the second event (the second event is entirely determined by what happened before). When sequences are nonrandom, mutual information is high, and a batter who knows the first pitches in a series can guess the next one. On the other hand, when mutual information is low, the identity of the preceding pitches is not terribly useful for guessing the next pitch.
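To make those two extremes concrete, here is a minimal sketch that computes mutual information for pairs of consecutive pitches. The function names, the pitch labels, and the toy data are all hypothetical; this illustrates the general quantity and is not the code behind the numbers in this article.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy, in bits, of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def mutual_information(pairs):
    """I(X; Y) in bits for a list of (previous_pitch, next_pitch) pairs."""
    previous = [x for x, _ in pairs]
    following = [y for _, y in pairs]
    # Identity used here: I(X; Y) = H(X) + H(Y) - H(X, Y)
    return entropy(previous) + entropy(following) - entropy(pairs)


# Made-up labels: FA = four-seam fastball, SL = slider.
# Every combination is equally likely, so the first pitch says nothing.
independent = [("FA", "FA"), ("FA", "SL"), ("SL", "FA"), ("SL", "SL")]
# The second pitch is always the opposite of the first.
determined = [("FA", "SL"), ("SL", "FA")] * 2

low = mutual_information(independent)
high = mutual_information(determined)
second_pitch_entropy = entropy([y for _, y in determined])
```

In the first toy sample the mutual information sits at the floor of zero; in the second it reaches the ceiling, the full entropy of the second pitch.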
I’ll begin by calculating the mutual information for the first three pitches of each at-bat for all the starting pitchers in the league. Let’s recall our two schools of thought with regard to sequencing: If mutual information is near zero, sequences are effectively random; whereas if mutual information is high (for instance in the range of pitch entropies I established earlier, ~1 bit), sequences are structured so that pitches are predictable. But as with so many phenomena, the answer appears to be somewhere in the middle.
No pitcher exhibits zero mutual information, but conversely, no pitcher is giving up enough information to establish stereotypical sequences, either. The mean information given up in the process of sequencing is about .23 bits, but the range is quite substantial, from .15 all the way to .35 bits, a considerable amount of information to give the opposing batter when you consider the effect of uncertainty on strikeouts.
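One way a per-pitcher figure like this could be computed is sketched below. It assumes that the quantity of interest is the information the first two pitches carry about the third, and that the data arrive as a mapping from pitcher name to a list of at-bats, each a list of pitch-type labels. Both the definition and the data layout are assumptions made for illustration, and the pitchers and sequences are invented.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy, in bits, of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def third_pitch_information(at_bats):
    """Bits of third-pitch entropy removed by knowing pitches one and two.

    At-bats shorter than three pitches are skipped.
    """
    triples = [tuple(ab[:3]) for ab in at_bats if len(ab) >= 3]
    prefixes = [t[:2] for t in triples]
    thirds = [t[2] for t in triples]
    # I(third; prefix) = H(third) + H(prefix) - H(prefix, third)
    return entropy(thirds) + entropy(prefixes) - entropy(triples)


# Hypothetical league data (FA = four-seam fastball, SL = slider).
league = {
    # Third pitch always repeats the second: fully predictable.
    "Pitcher A": [["FA", "SL", "SL"], ["FA", "FA", "FA"],
                  ["SL", "FA", "FA"], ["SL", "SL", "SL"], ["FA"]],
    # Third pitch is a coin flip regardless of the first two.
    "Pitcher B": [["FA", "FA", "FA"], ["FA", "FA", "SL"],
                  ["SL", "SL", "FA"], ["SL", "SL", "SL"]],
}

info_by_pitcher = {name: third_pitch_information(seqs)
                   for name, seqs in league.items()}
```

A caution on estimates of this kind: computed directly from observed frequencies, mutual information tends to be biased upward when the number of at-bats is small, so comparisons between samples of very different sizes deserve some care.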
The trouble with this global view is that it ignores the way context can shape a pitcher’s pattern of pitch usage. If pitch usage is constrained by the count, for instance, and pitchers regularly find themselves behind in the count, there could be the false appearance of sequences that are really driven by context. Indeed, looking at the distribution of entropy by ball-strike count, I find that pitchers become more and more predictable the further they get behind in the count. Each line here represents one pitcher and his particular entropy in different counts.
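As a rough sketch of that breakdown, the snippet below groups one pitcher’s pitches by ball-strike count and computes the entropy of pitch types within each count. The record layout of (count, pitch type) tuples and the sample values are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(items):
    """Shannon entropy, in bits, of the empirical distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def entropy_by_count(pitches):
    """Map each ball-strike count to the entropy of pitch types thrown in it."""
    by_count = defaultdict(list)
    for count, pitch_type in pitches:
        by_count[count].append(pitch_type)
    return {count: entropy(types) for count, types in by_count.items()}


# Made-up pitches for one hypothetical pitcher.
# FA = four-seam, SI = sinker, SL = slider, CU = curve, CH = changeup.
pitches = [
    ("0-2", "FA"), ("0-2", "SL"), ("0-2", "CU"), ("0-2", "CH"),
    ("3-0", "FA"), ("3-0", "FA"), ("3-0", "FA"), ("3-0", "SI"),
]

entropy_per_count = entropy_by_count(pitches)
```

In this invented sample the pitcher is far less predictable ahead in the count at 0-2 than behind at 3-0, which is the shape of the pattern described above.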
A way around this might be to ask whether pitchers still sequence when they are achieving their best outcomes and not constrained by the absolute necessity of throwing a strike. If at this point pitchers are adhering to sequences, it suggests that all other things being equal, pitchers prefer to mix up their pitches in more predictable, but perhaps also more effective, ways. I took the at-bats which led to strikeouts, and I looked at the mutual information in this more successful sample.
Each dot here represents one pitcher and the level of mutual information he shows overall (left) and in sequences that ended in strikeouts (right). Interestingly, pitchers seem to have sequenced considerably more in the at-bats that led to strikeouts, and this was true for all pitchers, junkballers and strikeout artists alike. Consider Yu Darvish, the preeminent strikeout artist in baseball. On at-bats that led to strikeouts, his third-pitch entropy was a very strong 1.84 bits. But as it turns out, by knowing that he had thrown fastball-sinker as his first two pitches, the batter could guess accurately that the next pitch would be a sinker as well, because sinker frequency after that two-pitch sequence was four times higher than his normal rate of usage. In other words, the non-randomness of Darvish’s sequences spikes in the particular situations in which he did his best work.
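The kind of comparison in the Darvish example, a pitch’s frequency after a given two-pitch opening versus its normal usage, can be sketched as a simple ratio. The function name `usage_lift` is hypothetical, “normal rate” is assumed here to mean the pitch’s share of all pitches in the sample, and the at-bats are invented rather than Darvish’s actual sequences.

```python
def usage_lift(at_bats, prefix, pitch):
    """Ratio of a pitch's third-pitch frequency after `prefix` to its overall share.

    A value of 4.0 would mean the pitch is four times as common after the
    two-pitch opening as it is in general.
    """
    all_pitches = [p for ab in at_bats for p in ab]
    base_rate = all_pitches.count(pitch) / len(all_pitches)
    thirds = [ab[2] for ab in at_bats
              if len(ab) >= 3 and tuple(ab[:2]) == tuple(prefix)]
    if not thirds or base_rate == 0:
        raise ValueError("no matching at-bats, or pitch never thrown")
    conditional_rate = thirds.count(pitch) / len(thirds)
    return conditional_rate / base_rate


# Made-up at-bats (FA = four-seam fastball, SI = sinker, SL = slider).
at_bats = [
    ["FA", "SI", "SI"], ["FA", "SI", "SI"],
    ["FA", "SL", "FA"], ["FA", "SL", "FA"],
    ["SL", "FA", "SL"], ["SL", "FA", "SL"],
]

lift = usage_lift(at_bats, prefix=("FA", "SI"), pitch="SI")
```

In this toy sample the sinker makes up 4 of 18 pitches overall but every third pitch after a fastball-sinker opening, so the ratio comes out well above one.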
Sequencing and Success
It would be nice to connect all of this sequencing business to something concrete and meaningful, some statistic that summarized pitching success like FIP. The most straightforward way to do that is via linear regression. Mutual information does show a rather significant relationship with FIP: that is, more mutual information correlates with a higher, worse FIP. We can visualize that relationship in this graph.
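For anyone who wants to try a regression of this sort on their own data, a minimal ordinary least-squares sketch follows. The five data points are invented to show the mechanics and are not the values behind the graph; a significance test is deliberately left out.

```python
def least_squares(xs, ys):
    """Return (slope, intercept, r) for the ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation coefficient
    return slope, intercept, r


# Made-up (mutual information in bits, FIP) values for five hypothetical pitchers.
mutual_info = [0.15, 0.20, 0.23, 0.28, 0.35]
fip = [3.1, 3.6, 3.9, 4.0, 4.6]

slope, intercept, r = least_squares(mutual_info, fip)
```

A positive slope here corresponds to the direction described above: more mutual information going with a higher, worse FIP.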
Sometimes a significant linear regression, despite its name, reflects a relationship that isn’t really linear. I think that’s the case here, in the following way: Mutual information isn’t that important for most pitchers who stay in a fairly broad range in the middle, but for the tails of the distribution, mutual information can be either very bad (at the high end) or very good (towards the low end). Indeed, removing the 15 percent tails of the distribution entirely eliminates the significance of the correlation.
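A tail-trimming check along these lines can be sketched as follows. It assumes that “15 percent tails” means dropping the lowest and highest 15 percent of pitchers ranked by mutual information, which is one possible reading, and it compares correlation coefficients rather than running a formal significance test. The data are invented so that the extremes carry the trend.

```python
def pearson_r(points):
    """Pearson correlation of a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / (sxx * syy) ** 0.5


def trim_tails(points, fraction=0.15):
    """Drop the lowest and highest `fraction` of points, ranked by x."""
    ordered = sorted(points, key=lambda p: p[0])
    k = round(len(ordered) * fraction)
    return ordered[k:len(ordered) - k]


# Made-up (mutual information, FIP) points for 20 hypothetical pitchers.
low_tail = [(0.15, 2.8), (0.16, 2.9), (0.17, 3.0)]
# Middle of the pack: FIP bounces around with no real trend.
middle = [(0.20 + 0.005 * i, 3.7 if i % 2 == 0 else 4.1) for i in range(14)]
high_tail = [(0.33, 4.9), (0.34, 5.0), (0.35, 5.1)]
points = low_tail + middle + high_tail

full_r = pearson_r(points)
trimmed = trim_tails(points)
trimmed_r = pearson_r(trimmed)
```

With these toy numbers the correlation is strong across all 20 pitchers and nearly vanishes once the three lowest and three highest are removed.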
Who are the pitchers at the extremes of the distribution, the ones whose success appears to be correlated with their sequencing? No definite profile presents itself, but there are clues. A few notable names towards the low (good) end of the sequencing spectrum include Hisashi Iwakuma, Justin Verlander, and the two pitchers who stuck out as exceptions in my previous entropy article: Cliff Lee and Madison Bumgarner. Neither Lee nor Bumgarner seemed to have either the velocity or the entropy to be good pitchers, despite their superior skills. I posited last time that one way they might be exceeding their expected performance is by harnessing some other aspects of quality pitching, such as deception, command, or sequencing, so it’s satisfying to see them come up as exceptional sequencers. I must note, however, that this sequencing skill is not necessarily the reason why they outperform their stuff; both also possess outstanding command of the strike zone. Generally, this group of pitchers tends to be a bit on the old side: the five pitchers with the least mutual information averaged 30.6 years of age.
This stands in contrast to the pitchers at the opposite end of the spectrum, who were giving up the most mutual information in their sequences. The bottom five guys averaged 27.25 years old and included some youngsters with varying levels of success: Randall Delgado, Henderson Alvarez, and Clay Buchholz. Further down the list, on the other hand, there were some older folks on their way out of major league careers, such as Barry Zito and Ryan Dempster. So while it’s tempting to draw a neat and tidy relationship between age and sequencing skill, the situation is probably a little more complex than that.
Sequencing in Context
It’s tough to imagine Joey Votto at the plate taking the logarithms of the frequencies of four-seam fastballs in order to predict the next pitch. But then again, maybe he doesn’t have to; maybe it’s the job of some poor scout in the Reds’ front office to pick out pitch sequence tendencies, and Votto merely knows which two pitches to look for such that the next pitch is easily predictable. Maybe, too, Votto gets a hunch once in a while, an intuition or a feeling, that directs him to be aware of the possibility of a slider more than another pitch.
That constant struggle to predict the pitcher’s next move forces the pitcher to vary his repertoire as much as he is able, given the count and the quality of his pitches. Yet, perhaps because of those constraints, no pitcher approaches the theoretical optimum of perfectly random sequencing. Returning again to the dichotomous schools of sequencing, I think my results provide something for both sides to appreciate. On the one hand, lower mutual information correlates with better pitchers (though causation is much more difficult to pin down). On the other, even when the outcome is favorable to the pitcher, pitchers aggressively structure their sequences. The results are therefore somewhat inconclusive.
There remains much to be investigated. As mentioned above, this view on sequencing ignores one of the most crucial sources of uncertainty in pitching, namely location. Understanding location in the context of entropy is very tricky, but necessary, and I’ll look to tackle that in the future. In addition, there’s an entire other party in calling pitches to whom I’ve given little mention: the catcher. Which catchers sequence well, and which poorly? Finally, it would be desirable to connect individual sequences of pitches to good or bad results, looking for situations in which pitchers establish predictable sequences and are then punished. So look upon this article as a first foray into a previously unexplored area of sabermetrics, one in which much remains to be discovered precisely because so little is known.