June 12, 2014
DIPS, Random Variation, and the Matt Cain Quandary
When Matt Cain signed his $127.5 million contract extension in April 2012, Colin Wyers memorably tweeted: “If your response to the Matt Cain extension involves xFIP I'll be by later to pour coffee on your keyboard.” As someone who was still relatively new to the field of sabermetrics, seeing that was a watershed moment in that it signaled a substantial retreat from our previous collective understanding that DIPS statistics were unambiguously superior to ERA as measurements of pitching ability.
In the two years since Wyers’ excoriation of those who still judged Cain by his peripherals despite his history of outperforming them, I’ve noticed that the best analysts in our midst have continued to trend toward nuance in their discussions of DIPS theory, looking for qualities and characteristics that explain the outliers both in the stats and from personal observations. This ever-increasing blend of sabermetrics and scouting is unambiguously good for bettering our understanding of baseball.
Yet I wonder if, in our collective quest to bridge the gaps between traditional baseball wisdom and sabermetric logic, we are failing to see the forest for the trees. Specifically, I fear we are becoming far too quick to identify outlier pitchers as exceptions to DIPS norms rather than understanding them to be manifestations of typical population variance. Some basic probability can show us that it is a lot harder to demonstrate results incompatible with DIPS theory than we seem to think.
The Starting Assumptions
In recognition of the intuitive and empirical logic in different types of pitchers having different relationships with their DIPS numbers, I divide the population of MLB pitchers into three hypothetical groups with the following characteristics and proportions:
· The Normals: Pitchers whose true-talent levels are more or less accurately described by their DIPS numbers. In any given season, each has a 50-percent chance of having an ERA below his DIPS numbers and a 50-percent chance of having an ERA above his DIPS numbers. I’d guess that 60 percent is a conservative estimate for the proportion of MLB pitchers who fall into this category.
· The Overachievers: Pitchers whose true abilities to prevent runs are underestimated by their DIPS numbers. In any given season, I assume that each has a 75-percent chance of having an ERA below his DIPS numbers and a 25-percent chance of having an ERA above his DIPS numbers. My intuitive guess is that about 15 percent of MLB pitchers are Overachievers.
· The Underachievers: Pitchers whose run-prevention skills are overestimated by their DIPS numbers. In any given season, I assume that each has a 25-percent chance of having an ERA below his DIPS numbers and a 75-percent chance of having an ERA above his DIPS numbers. I’d estimate that around 25 percent of MLB pitchers fit this description. (The asymmetry in my theoretical proportions of Over- and Underachievers reflects the fact that, in baseball, it is a lot easier to be significantly below average than it is to be significantly above average.)
Categorizing pitchers this way is an oversimplification, and these numbers are nothing more than educated guesses, but for the purpose of some exploratory calculations I think it’s a fair description of the population of MLB arms.
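To make the starting assumptions concrete, here is a minimal Python sketch that encodes the three groups and computes the population-wide chance of beating DIPS in a single season. The names `GROUPS` and `chance_of_beating_dips` are illustrative choices of mine, not part of any established tool, and the numbers are simply the educated guesses listed above.

```python
# Assumed encoding of the three hypothetical groups described above.
# Each entry: name -> (share of MLB pitchers, chance of beating DIPS in a season).
GROUPS = {
    "Normal": (0.60, 0.50),
    "Overachiever": (0.15, 0.75),
    "Underachiever": (0.25, 0.25),
}


def chance_of_beating_dips(groups=GROUPS):
    """Chance that a randomly chosen pitcher posts an ERA below his DIPS numbers."""
    return sum(share * p_beat for share, p_beat in groups.values())


# Sanity check that the assumed shares cover the whole population.
total_share = sum(share for share, _ in GROUPS.values())
print(f"shares sum to {total_share:.2f}")
print(f"P(beat DIPS in a season) = {chance_of_beating_dips():.3f}")
```

Under these guesses the population as a whole is slightly more likely to trail its DIPS numbers than to beat them, because Underachievers are assumed to outnumber Overachievers.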
When Does Overachieving Signify DIPS-Beating Skill?
Imagine a young pitcher named Wendy from my beloved hypothetical E Street League. Wendy gets the call for the Opening Day roster and outperforms her DIPS numbers in her first season. What are the odds that she is an Overachiever? The probability of her beating her DIPS stats if she is an Overachiever is 75 percent. Multiply that by the 15-percent chance of her being an Overachiever and divide by the general population’s 47.5-percent chance of beating DIPS, and the odds that she’s an Overachiever are 23.7 percent. So after one year of observed overachieving, the odds of a pitcher being a true Overachiever are less than one in four.
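Written out, that calculation is an application of Bayes' rule. The symbols below are shorthand introduced here for illustration: N, O, and U stand for the three groups, and B for the event of beating DIPS in a season.

```latex
\begin{align*}
P(B) &= P(B \mid N)\,P(N) + P(B \mid O)\,P(O) + P(B \mid U)\,P(U) \\
     &= (0.50)(0.60) + (0.75)(0.15) + (0.25)(0.25) = 0.475, \\[4pt]
P(O \mid B) &= \frac{P(B \mid O)\,P(O)}{P(B)}
             = \frac{(0.75)(0.15)}{0.475} \approx 0.237.
\end{align*}
```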
Now say Wendy keeps it up for a second year. What could we infer about her true nature then? Overachievers have a 56.3-percent chance of beating their DIPS numbers two years in a row, compared to 25-percent and 6.3-percent odds for Normals and Underachievers, respectively. Yet because Overachievers are but a minority of the population of MLB pitchers, Wendy’s odds of being one of them are just 33.8 percent, barely over one in three.
So how long would it take before we could confidently describe Wendy as an Overachiever? Here’s a look at how the probability increases over time:
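The year-by-year probabilities can be recomputed from the stated assumptions. The following is a minimal Python sketch, not the original calculation: it assumes seasons are independent of one another, and the function name `posterior_after_streak` is a hypothetical label chosen for illustration.

```python
# Assumed encoding of the article's three groups:
# name -> (share of pitchers, chance of beating DIPS in one season).
GROUPS = {
    "Normal": (0.60, 0.50),
    "Overachiever": (0.15, 0.75),
    "Underachiever": (0.25, 0.25),
}


def posterior_after_streak(years, groups=GROUPS):
    """Group probabilities after `years` straight seasons of beating DIPS.

    Assumes each season is an independent draw with the group's fixed chance.
    """
    weights = {
        name: share * p_beat ** years
        for name, (share, p_beat) in groups.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


for years in range(1, 11):
    p_over = posterior_after_streak(years)["Overachiever"]
    print(f"{years:2d} straight seasons: P(Overachiever) = {p_over:.1%}")
```

The first two rows should agree, up to rounding, with the one-year and two-year figures quoted above for Wendy.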
When Does Underachieving Signify Lack of Skill?
If Janey failed to improve relative to her DIPS numbers the next season, her odds of being an Underachiever would rise to 46.9 percent—still less than the probability that she is a Normal. Only after Year 3 would Janey’s chances of being an Underachiever exceed 50 percent, and it would take 10 seasons to reject the null hypothesis that she is a Normal. Here’s a look at how the odds would change over time:
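The same arithmetic can be run one season at a time for a pitcher who keeps trailing her DIPS numbers. This is a sketch under assumptions of my own: seasons are treated as independent, and "rejecting the null hypothesis" is read as the probability of being a Normal falling below 5 percent, which is one interpretation that reproduces the 10-season figure rather than a confirmed description of the original method. All names here are illustrative.

```python
# Assumed priors and per-season chances of an ERA ABOVE DIPS for each group.
PRIOR = {"Normal": 0.60, "Overachiever": 0.15, "Underachiever": 0.25}
P_TRAIL = {"Normal": 0.50, "Overachiever": 0.25, "Underachiever": 0.75}


def update(belief, likelihood):
    """One Bayes update: reweight each group by its likelihood, then renormalize."""
    unnormalized = {group: belief[group] * likelihood[group] for group in belief}
    total = sum(unnormalized.values())
    return {group: weight / total for group, weight in unnormalized.items()}


belief = dict(PRIOR)
history = []  # history[k] holds the belief after k + 1 straight trailing seasons
for season in range(1, 11):
    belief = update(belief, P_TRAIL)
    history.append(belief)
    print(f"Year {season:2d}: "
          f"Underachiever {belief['Underachiever']:.1%}, "
          f"Normal {belief['Normal']:.1%}")

# First season at which each threshold is crossed (the 5% cutoff is an assumption).
first_majority = next(k + 1 for k, b in enumerate(history) if b["Underachiever"] > 0.5)
first_reject = next(k + 1 for k, b in enumerate(history) if b["Normal"] < 0.05)
print(f"Underachiever passes 50% in year {first_majority}")
print(f"Normal drops below 5% in year {first_reject}")
```

Updating season by season gives the same answer as raising each group's chance to the power of the streak length; the stepwise form just makes it easier to see where each threshold is crossed.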
Complications and Caveats
More importantly, there is more to the art of categorizing pitchers than seeing where their ERAs and DIPS numbers end up. For example, if you watch an otherwise-great pitcher and notice that he throws more than his share of mistake pitches, you could expect to see him underperform his DIPS numbers. I suspect that these kinds of observations are sometimes the tail wagging the dog as analysts look to explain the ERA-DIPS disparities they are already observing, but they still matter.
But the imprecision of this model doesn’t undermine the basic point: DIPS theory needn’t be uniform in its empirical manifestations for us to conclude that it is true. Even over several seasons, random variation can still have a substantial impact, so a given pitcher’s apparent nonconformity doesn’t necessarily mean he’s an exception to the rule.
The Bigger Picture
“There are things that are generally publicly held as sabermetric doctrine—in some cases, crucial underlying assumptions—that are demonstrably false,” Russell Carleton wrote upon returning to the public world after consulting stints for MLB teams. These words are humbling. There is a lesson in them to be open-minded about ideas both old and new and to constantly think critically not just about the old traditions we believe to be outdated but also about the studies we perform and the conclusions we draw ourselves. (From my heretofore more-limited experience working on the inside, I would wholeheartedly agree with his advice.)
But it doesn’t mean that we need be deferential by default, especially when the contrary evidence is anecdotal. Some (if not most) of the greatest advances in sabermetric thought have come from looking at the game through a wide-angle lens and not giving a damn about how the happenings of the game felt to the players and coaches and fans. No formula will ever be able to encompass everything that happens in a baseball game (let alone the years of preparation that go into the construction of every roster before a team takes the field), and no sabermetrician should ever think of a model as not needing to be improved or a conclusion as too sacrosanct to test again. But a smart sabermetrician knows not only when to acknowledge that the facts contradict him or her but also when not to be dissuaded by insufficient evidence.
Now think back to Matt Cain: So convinced is the sabermetric community of his exceptional DIPS-beating skill (despite lacking a clear explanation for how he does it, to my knowledge) that one of the most prominent analysts in the field didn’t think his xFIP was even worth considering. Yet according to this model, the odds that Cain was an Overachiever after beating his DIPS numbers five years in a row (as his streak was at the time of Wyers’ tweet) were just 65.2 percent. So while a Cain skeptic might have had coffee on his or her keyboard, these numbers say he or she would have had better than one-in-three odds of being right.
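One way to sanity-check the five-year figure is to simulate the assumed population directly and count who is left among the pitchers with a five-season streak. The sketch below is a rough Monte Carlo check under the same simplifying assumptions (fixed group shares, independent seasons); the function name, sample size, and seed are arbitrary illustrative choices.

```python
import random

# Assumed encoding: (group name, share of pitchers, chance of beating DIPS in a season).
GROUPS = [
    ("Normal", 0.60, 0.50),
    ("Overachiever", 0.15, 0.75),
    ("Underachiever", 0.25, 0.25),
]


def simulate_streak_share(streak=5, n_pitchers=200_000, seed=2014):
    """Estimate the Overachiever share among pitchers who beat DIPS `streak` years running."""
    rng = random.Random(seed)
    names = [name for name, _, _ in GROUPS]
    shares = [share for _, share, _ in GROUPS]
    p_beat = {name: p for name, _, p in GROUPS}

    # Assign every simulated pitcher to a group according to the assumed shares.
    assigned = rng.choices(names, weights=shares, k=n_pitchers)

    survivors = 0
    overachievers = 0
    for group in assigned:
        # A pitcher survives only by beating DIPS in every season of the streak.
        if all(rng.random() < p_beat[group] for _ in range(streak)):
            survivors += 1
            if group == "Overachiever":
                overachievers += 1
    return overachievers / survivors


# Closed-form value from Bayes' rule, for comparison with the simulation.
exact = 0.15 * 0.75**5 / (0.15 * 0.75**5 + 0.60 * 0.50**5 + 0.25 * 0.25**5)
print(f"exact: {exact:.3f}, simulated: {simulate_streak_share():.3f}")
```

The simulated share will wobble around the exact value from run to run if the seed is changed, which is itself a small reminder of how much noise remains even in a sample of this size.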
I don’t believe the odds that Cain’s DIPS-beating skill is an illusion are really as high as one in three, and a more complicated model for assessing the probability that a pitcher is an Overachiever would likely bear that out. But even if the odds of his being a Normal are only half of that, then the certitude with which we as a community have asserted that Cain is special is undeserved. And that speaks to the fact that we have far more confidence in the anecdotally empirical and the subjectively perceived than they warrant in the face of mathematical logic.
This is the last time I’ll be able to discuss baseball publicly for the indefinite future, so after one final plug for my senior thesis (there’s even an abridged version now!) I want to share my favorite quote about the game, from Dick Cramer: “Baseball is a soap opera that lends itself to probabilistic thinking.” A good analyst will never dismiss the value of a scout or assume that a model that describes some aspect of the game cannot be improved upon. But if our quest to explain baseball takes us so far into the rabbit hole that we mistake ordinary random variation for causal trends, that’s not nuance—that’s overfitting to the data.