July 22, 2008
Removing the Selection Bias
What's the difference between a mantra and a metric? By definition, a mantra refers to a repeated word or phrase. In baseball this is nowhere more evident than in the pitching coach's timeless advice to his charges, such as "throw strikes" or "get ahead." A metric refers to a measurement facilitating the quantification of a given characteristic. As we discussed last week, when mantras and metrics are combined, or when a mantra attempts to replicate itself in metric form, the result will likely be convoluted. The major factor is that the contents of the mantra are not necessarily automatic functions of the players to whom they relate. Additionally, while some may sound legitimate, they often lack a uniform definition and can clash with actual in-game strategy.
"Throw strikes," for instance, could mean several different things. It could refer to getting swings and misses, or perhaps called strikes, or it could even refer to pitches thrown in the zone regardless of the outcome. I would tend to think the last of these is the most likely intended definition of this mantra, which would counteract sensible strategy; if you constantly throw pitches in the zone, hitters will gain confidence that they will not necessarily see anything outside of it. Knowing they are not too likely to see pitches out of the zone increases their ability to home in on those thrown in the zone.
Which brings us back to A3P, or "Attack in Three Pitches," a metric developed at the University of Missouri. The statistic measures the percentage of plate appearances in which the pitcher either records an out within three pitches, by strikeout or by an out in play, or holds an 0-2 or 1-2 count after the third pitch. Our discussion last week revolved around the inconsistency of the number itself, which led to convoluted results. It combines two unrelated "skills" and, in doing so, means relatively nothing. While getting ahead may be a controllable skill, recording a high percentage of outs in play within three pitches hinges upon the defense. Via DIPS, we know that the likelihood that balls in play get turned into hits can fluctuate. What that means for A3P is that some pitchers with high A3P percentages may simply have been luckier than others in having their balls in play turned into outs.
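To make the definition concrete, here is a minimal sketch of how A3P might be tallied from plate-appearance records. The `PlateAppearance` record, its field names, and the result labels are assumptions invented for illustration; they are not the University of Missouri's implementation or any official data format.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class PlateAppearance:
    """Hypothetical record of one plate appearance (illustrative only)."""
    pitches: int                                  # total pitches thrown
    result: str                                   # e.g. "strikeout", "out_in_play", "hit", "walk"
    count_after_three: Optional[Tuple[int, int]]  # (balls, strikes) after pitch three, None if it ended sooner


AHEAD_COUNTS = {(0, 2), (1, 2)}


def counts_toward_a3p(pa: PlateAppearance) -> bool:
    # Condition 1: an out within three pitches, by strikeout or by an out in play.
    if pa.pitches <= 3:
        return pa.result in {"strikeout", "out_in_play"}
    # Condition 2: the plate appearance went on, but the count after pitch three was 0-2 or 1-2.
    return pa.count_after_three in AHEAD_COUNTS


def a3p_percentage(pas: Iterable[PlateAppearance]) -> float:
    pas = list(pas)
    if not pas:
        raise ValueError("need at least one plate appearance")
    return sum(counts_toward_a3p(pa) for pa in pas) / len(pas)


# Invented sample, not real data.
sample = [
    PlateAppearance(3, "strikeout", None),      # three-pitch strikeout: counts
    PlateAppearance(2, "out_in_play", None),    # groundout on pitch two: counts
    PlateAppearance(5, "walk", (1, 2)),         # ahead 1-2 after three pitches: counts
    PlateAppearance(1, "hit", None),            # first-pitch single: does not count
    PlateAppearance(6, "out_in_play", (2, 1)),  # behind 2-1 after three pitches: does not count
]
```

Under these assumptions, three of the five invented plate appearances qualify, so `a3p_percentage(sample)` comes to 0.6. Note that the groundout on pitch two counts while the first-pitch single does not, even though the pitcher "attacked" in both; the defense decided which one helped his A3P.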
Running correlations between A3P percentage and several statistics showed that this mantra attempting to become a metric shared quite strong relationships with stats like opponents' OPS, batting average, slugging percentage, on-base percentage, FIP, ERA, BB/9, and K/BB. In fact, the only studied numbers it didn't correlate well with were HR/9, K/9, and the ability of ERA to outperform FIP. Regardless, these relationships meant relatively nothing, because the stipulation of outs in play created a big selection bias. By only including outs in play we would be predicting success upon success; of course those with a higher percentage of outs within three pitches will tend to produce lower figures for their opponents' OPS or ERA. On top of that, since A3P measures two-strike counts through three pitches or plate appearances resulting in three-pitch outs, anything involving walks could be expected to produce very strong negative correlations.
I received a wide array of e-mails following the initial look at this theory, all of which tended to agree, and most of which offered the simple solution: replace outs in play with balls in play and re-run the tests. This would test the approach without the selection bias of predicting success upon success. If all balls in play within three pitches (along with 0-2 or 1-2 counts through three pitches and three-pitch strikeouts) produced similar correlations, then perhaps something of merit would exist with regard to the attack mentality. If, however, the correlations dropped into insignificance, we would know that the previously established strong relationships were built upon the inherent bias of outs in play instead of balls in play. Here are the new correlations, with numbers like BB/9 and K/BB removed. Those will of course exhibit strong relationships, considering that a higher percentage of plate appearances ending within three pitches limits the number of plate appearances even capable of resulting in a walk. On top of that, reaching two strikes through three pitches means a pitcher would have to lose a batter for a walk to result.
Metric     A3P Correlation
ERA        -.273
AVG         .036
OBP        -.373
SLG        -.058
OPS        -.212
K/9        -.022
HR/9       -.132
ERA-FIP     .040
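For readers who want to try the revised test themselves, the following is a rough sketch of the two pieces involved: a qualifying rule that credits any ball in play within three pitches, hit or out, and a plain Pearson correlation to compare per-pitcher percentages against another statistic. The record layout, the field names, and the function names are hypothetical, and I am assuming Pearson's r here; whether home runs are flagged as balls in play is left to whoever builds the data.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional, Sequence, Tuple


@dataclass
class PlateAppearance:
    """Hypothetical record of one plate appearance (illustrative only)."""
    pitches: int                                  # total pitches thrown
    struck_out: bool                              # True if the batter struck out
    ball_in_play: bool                            # True for any ball put in play, hit or out
    count_after_three: Optional[Tuple[int, int]]  # (balls, strikes) after pitch three, None if it ended sooner


def is_attack_revised(pa: PlateAppearance) -> bool:
    # Within three pitches: a strikeout or ANY ball in play, regardless of outcome.
    if pa.pitches <= 3:
        return pa.struck_out or pa.ball_in_play
    # Otherwise: the pitcher must have been ahead 0-2 or 1-2 after pitch three.
    return pa.count_after_three in {(0, 2), (1, 2)}


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    spread_x = sqrt(sum(d * d for d in dx))
    spread_y = sqrt(sum(d * d for d in dy))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a sample has no variance")
    return sum(a * b for a, b in zip(dx, dy)) / (spread_x * spread_y)
```

In use, one would compute each pitcher's revised percentage from his plate appearances, collect those into one list, collect the matching ERA (or OBP, SLG, and so on) into another, and pass both lists to `pearson_r`. A value near zero is what "dropped into insignificance" looks like in practice.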
Other than OBP, which just as easily could have been removed alongside K/BB and BB/9, the correlations for the most part did drop into insignificance. This tells us that the relationship between A3P percentage and these established metrics was a mirage, smoke and mirrors, largely due to the selection bias. In other words, the conclusion from last week still stands: this mantra may help younger pitchers develop confidence in attacking hitters or working ahead, but at the major league level, it really does not go hand in hand with anything that would make us rely on it as a clear-cut indicator of success.