I’m certainly not the first person, and maybe not even the first person whom you’ve read today, to point out that the Cubs are having an incredible season. As of the moment this sentence is being written, their third-order winning percentage is an insane 0.750, and they sit in first place on both the batting and overall WARP leaderboards (and in fourth on the pitching one). As Rob Arthur and Ben Lindbergh pointed out at FiveThirtyEight last week, their pitching staff’s BABIP allowed is historically low. They are also among the best all-time at outperforming their DRA, the best pitching skills estimator currently available.
It was even more extreme a few days ago, but as of Friday evening the Cubs’ RA9-DRA was -0.95, almost a full run difference over nine innings. That’s the 12th-biggest difference in what you might call the “DRA era,” which begins in the early 1950s. Also of note, both their DRA and RA9 are lower than those of any team above them on that list.
On the other side of the coin, and closer to my own heart/fandom, the 2016 Twins have thus far been historically *under*performing their DRA. In reality, the Twins are allowing 5.72 runs per nine innings; DRA, on the other hand, has them pegged at 4.38, an underperformance of 1.34 runs per nine. Again, dating to 1953 (the first year for which DRA is available), only one team has ever had an RA9 that far above its DRA: the 2007 Rays.
Seeing this huge separation between actual and expected performance led me to a question that, to the best of my knowledge, hasn’t really been definitively answered (or at least hadn’t when I began this piece)—what factors, if any, can predict a large DRA/RA9 gap (in either direction)?
First, however, there’s an important difference between RA9 and DRA that needs to be addressed. DRA is a park- and league-adjusted statistic that attempts to show what a pitcher deserved to allow, while RA9 is both raw and subject to the “run-charging” rules of baseball. Using team and season alone as categorical variables in a linear model can explain, in r-squared terms, about a quarter of the difference. Unfortunately, RA9- isn’t a readily available stat anywhere I could find, so here we are.
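For anyone curious what “team and season as categorical variables in a linear model” looks like in practice, here’s a minimal sketch, in Python rather than the R I actually used, with invented team-season rows standing in for the real 1953-2016 data:

```python
import numpy as np

# Invented team-season rows; the real data set runs 1953-2016.
teams = ["CHC"] * 4 + ["MIN"] * 4 + ["TBR"] * 4
seasons = [2013, 2014, 2015, 2016] * 3
gap = np.array([-0.2, -0.4, -0.1, -0.95,      # RA9 minus DRA, made up
                 0.5,  0.2,  0.8,  1.34,
                 0.1, -0.1,  0.3,  0.0])

def one_hot(labels):
    """One column per distinct category, 1.0 where the row matches it."""
    cats = sorted(set(labels))
    return np.array([[1.0 if l == c else 0.0 for c in cats] for l in labels])

# Design matrix: intercept + team dummies + season dummies
A = np.column_stack([np.ones(len(gap)), one_hot(teams), one_hot(seasons)])
coefs, *_ = np.linalg.lstsq(A, gap, rcond=None)
resid = gap - A @ coefs
r2 = 1 - (resid @ resid) / np.sum((gap - gap.mean()) ** 2)
print(round(r2, 3))
```

This is just the one-hot-encoding pattern, not my actual model; R’s formula interface does the dummy coding for you when you hand it factors.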
Although I may have had some inklings about what would prove worthwhile, I started as broadly as I could with regard to which pitching stats I checked for correlation. This proved extremely useful, as something that ended up being (relatively) highly correlated with the RA9-DRA difference was something I’d completely forgotten to consider until I was well underway here. Using the “lm” function in R, I simply found the r-squared value between my RA9-DRA values and my stat of choice/comparison. I checked out just about everything I thought could possibly be relevant: BABIP, strikeout rate, walk rate, home run rate, double-play percentage, batted-ball type profile, workload split between starters and relievers, DRA split between starters and relievers, and what percentage of total pitches were strikes. I also dove into the PITCHf/x data, and included fastball velocity, fastball movement (both axes), fastball usage, curveball movement, and curveball usage.
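The screening step above amounts to very little code. A Python stand-in for the R “lm” call, with made-up numbers in place of the real team-season data:

```python
import numpy as np

def r_squared(x, y):
    """r-squared of a simple linear regression of y on x,
    which equals the squared Pearson correlation of x and y."""
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2

# Hypothetical inputs: a candidate stat (say, team BABIP) and the
# matching RA9-DRA gaps; the real inputs were one row per team-season.
babip = np.array([0.295, 0.310, 0.288, 0.302, 0.315, 0.280])
gap = np.array([-0.40, 0.55, -0.60, 0.10, 0.90, -0.75])

print(round(r_squared(babip, gap), 3))
```

Repeat that for every candidate stat and you have the whole screen.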
For the most part, I found nothing. However, there can (hopefully) be interesting results even in what looks like nothing.
Using the complete 1953-2016 data set, I was forced to set my threshold for “correlation worth mentioning” far below where I ever had it in my previous life as a chemist just to come up with anything to talk about. In the case of batted-ball types, r2 values for GB%, FB%, LD%, and Pop% came out to 0.044, 0.006, 0.001, and 0.0005, respectively: nothing at all. Percentage of innings thrown by starters had an r2 of 0.03. Walk rate was 0.02, and strikeout rate was even worse. Of all the PITCHf/x data I included, only fastball usage had an r2 above 0.01, and even then it was only 0.07.
The biggest r2 for any stat I looked at over the whole data set was BABIP, which even then only came to 0.167. Home run rate was at 0.123, and DP% was at 0.097. I strongly considered stopping there and telling my editor that I hadn’t found anything worth mentioning.
Fortunately for all of us (especially me), Rob and Ben’s piece at FiveThirtyEight, wherein they discuss factors that affect a team’s ERA-FIP difference, came out right about the time I was considering giving up. There are multiple reasons this was useful, but I’ll start with LOB%.
As they mention, LOB% (left-on-base percentage) can be used as a proxy for the effects of sequencing. It’s only available going back to 2002, so I subsetted my data to only include 2002-present. This allowed me to look at LOB% as well as Hard%, Medium%, and Soft% from Inside Edge. Although the quality-of-contact stats didn’t show any meaningful correlation, I found (just as Rob and Ben did for ERA-FIP) that LOB% was reasonably well correlated with the RA9-DRA difference, especially relative to all the other factors I’d examined thus far—the r2 was 0.381.
The next reason I’m grateful to Rob and Ben is that subsetting the data prompted me to check whether the r2 values I found for the complete data set matched those found for the recent years only. For the most part, the answer was yes, with one enormous exception: groundball percentage. This went from an r2 of 0.044 in the full set—basically no correlation—to 0.206 in the subset! Now, 0.206 by itself is nothing to write home about, but that’s an enormous jump. It really stands out if you look at it via scatter plot.
I don’t really understand why the correlation in recent years wouldn’t match that of earlier years, but it seems to be real. I also checked the correlation between GB% and RA9 and DRA individually and could not find a matching jump, for whatever that might be worth. I’ll leave it to someone more knowledgeable on the topic to potentially find an explanation (if one is even warranted).
Lastly, I used Total Runs Saved (TRS) from Rob and Ben’s Statcast-based model to see if I could match their ERA-FIP explanatory power in the RA9-DRA relationship, and, unsurprisingly, I could. A linear model of the RA9-DRA difference using just TRS and LOB% correlates with the actual difference with an r2 of 0.769. Adding in GB% and pitcher SRAA gets it up to 0.854, and that’s about the best I could do. I’m a bit skeptical of these results, of course; it’s intuitively hard to trust just one partial season’s worth of data to reveal anything too deep, but at the very least it’s noteworthy. Unfortunately, since the TRS model is Statcast-based, I couldn’t see whether the correlation held up over multiple years.
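The multi-factor version works the same way as the single-stat screen, just with several columns at once. A sketch of the idea, again in Python, with random stand-ins playing the roles of the real TRS, LOB%, GB%, and SRAA inputs:

```python
import numpy as np

def multiple_r_squared(X, y):
    """R-squared of an ordinary least-squares fit of y on the columns of X."""
    A = np.column_stack([np.ones(len(y)), X])        # add an intercept
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coefs
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

# Random stand-ins: 30 "team-seasons" with four predictors playing the
# roles of TRS, LOB%, GB%, and SRAA; coefficients are invented.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([0.5, 0.8, 0.2, 0.1]) + rng.normal(scale=0.3, size=30)

print(round(multiple_r_squared(X, y), 3))
```

In R this is just lm(gap ~ trs + lob + gb + sraa) followed by summary(); the sketch above only spells out what that call computes.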
So, I’m honestly not sure where that leaves this. As the sample got more modern, more of the difference was explainable, but the sample size also got progressively smaller, lowering my confidence. The Statcast-based model seems very promising, but it’ll be years before any definitive answer is possible there. I think the preponderance of evidence does show, though, that at least some of the difference between RA9 and DRA can be explained statistically.