March 9, 2017
DRA 2017: The Convergence
Two years ago, I wrote the first DRA essay, focusing on the challenge of modeling descriptive versus predictive player performance. At the time, my prognosis for threading that needle was rather grim:
What is it, exactly, that you want to know? For example:
(1) Do you care primarily about a pitcher’s past performance?
(2) Are you more worried about how many runs the pitcher will allow going forward?
(3) Or do you want to know how truly talented the pitcher is, divorced from his results this year or next?
The reader’s likely response is: “I’d like one metric that excels at all three!” Sadly, when it comes to composite pitcher metrics, this might not be possible.
The article reviewed a variety of metrics, from plain RA9 to Fielding Independent Pitching (FIP) to Skill-Interactive Earned Run Average (SIERA), to show that all of them made sacrifices that committed them to one direction or the other.
DRA itself has straddled both sides of this fence. In its first year (2015), we elected to focus on descriptive performance, and designed DRA to be the best descriptive metric of what had previously happened, short of RA9 itself.
Last year, we began to question the value of prioritizing descriptive performance, and switched to focusing on future performance instead. Again, though, this was presented in terms of a choice: decide which direction you care about, and resign yourself to it.
As always, we prefer to measure success objectively. To do that, we use a Spearman correlation (where 0 indicates no relationship and 1 a perfect one), weighted by innings pitched, for 2010 to the present, to compare metrics. When you compare FIP to last season’s DRA formula, you get the following:
On this chart, Descriptive is the correlation between the metric and the player’s runs allowed per nine innings (RA9) that same year. Reliability is the consistency with which the metric rates the same player in one year and then the next. Finally, Predictive measures the extent to which the metric corresponds to next year’s RA9.
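As a rough sketch of how an innings-weighted rank correlation like this can be computed (the exact weighting scheme BP uses is not spelled out here, so treat the details as illustrative): rank both series, then take a weighted Pearson correlation on the ranks.

```python
def ranks(values):
    """Return 1-based ranks, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank across the tie group
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def weighted_spearman(x, y, w):
    """Spearman correlation of x and y, weighting each observation by w
    (here, innings pitched): a weighted Pearson correlation on the ranks."""
    rx, ry = ranks(x), ranks(y)
    total = sum(w)
    mx = sum(wi * a for wi, a in zip(w, rx)) / total
    my = sum(wi * b for wi, b in zip(w, ry)) / total
    cov = sum(wi * (a - mx) * (b - my) for wi, a, b in zip(w, rx, ry)) / total
    vx = sum(wi * (a - mx) ** 2 for wi, a in zip(w, rx)) / total
    vy = sum(wi * (b - my) ** 2 for wi, b in zip(w, ry)) / total
    return cov / (vx * vy) ** 0.5
```

A perfectly monotone relationship returns 1.0 regardless of the weights; the weights matter only when the rankings disagree, so that heavily-used pitchers count for more in the comparison.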
DRA 2016 went all-in on reliability, viewing a consistent description of a player’s skills as the primary virtue of a component-based metric. In other words, we placed a priority on the same player getting assigned the same DRA for his skills one year as the next. This left FIP as the pitching estimator with the best handle on descriptive performance, but given a choice between two emphases, we thought descriptive performance was the inferior one. Furthermore, focusing on reliability gave us the ability to solve the challenge of DIPS, and better assess a pitcher’s true skill with respect to Batting Average on Balls in Play (BABIP).
But what if you no longer had to make this compromise? What if you could, truly, do a bit of everything: have a metric that accurately describes what a pitcher did while also reliably forecasting the skills that pitcher would bring with them to the future? If you didn’t have to choose between them, wouldn’t you want your measure of pitcher value to deliver both?
Of course you would. Thus, we are pleased to say that with the 2017 update to DRA, you can almost have it all. Again using seasons 2010 to the present, here are the weighted Spearman correlations for our metrics, this time including the updates to DRA:
Going forward, DRA has basically the same (actually slightly better) reliability and predictive qualities as before. But we’ve now managed to make DRA estimates every bit as descriptive as FIP, while preserving the other features that made DRA uniquely valuable. It has taken two years, but we’ve managed to solve a problem that we had written off as unsolvable.
How did we do this? Primarily by incorporating pitch classifications from PitchInfo into many of the DRA models. We no longer grade pitchers solely on the fact of an event happening, controlling only for externalities like platoon and stadium. Now, our models actively incorporate intrinsic pitcher information about the actual pitches that were thrown. Called strike probability, recently unveiled in connection with our pitch tunnels work, is now an explicit input in most models. Many models now also consider the type of pitch thrown (sinker? changeup? knuckleball?), the velocity of the pitch, the horizontal and vertical angles on the pitch, and the amount of vertical drop as the pitch approaches the plate.
Most of these characteristics are also classified (in a manner of speaking) by MLB’s PITCHf/x system, although we (not surprisingly) prefer the adjustments and re-classifications made by PitchInfo. Not all events benefit from these types of inputs, but for those that do (like home runs and other balls in play) the amount of additional information is enormously useful, and substantially responsible for the no-cost improvement in descriptive power shown above.
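To make the shape of these new inputs concrete, here is a hypothetical per-pitch feature record mirroring the covariates described above. The field names are our illustration, not the actual DRA model specification, which is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class PitchFeatures:
    """Hypothetical per-pitch inputs of the kind described above;
    the actual DRA covariates and encodings are not published here."""
    pitch_type: str            # e.g. "SI", "CH", "KN" per PitchInfo classification
    velocity: float            # pitch velocity, mph
    horizontal_angle: float    # horizontal approach angle, degrees
    vertical_angle: float      # vertical approach angle, degrees
    vertical_drop: float       # vertical drop approaching the plate, inches
    called_strike_prob: float  # called strike probability, 0 to 1
```

Event models that benefit from pitch-level detail (home runs, other balls in play) would consume records like this alongside the usual contextual adjustments for platoon, stadium, and so on.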
This year’s rollout reflects other tweaks as well. We’ve incorporated MLB Gameday’s fielding coordinates on balls in play to improve accuracy. We’ve also parallelized the 23 models inside DRA so that they can be run over the course of an hour, rather than five hours—meaning you can see updated values by breakfast each day instead of mid-afternoon. Finally, after discussion with Neil Weinberg, we’ve tweaked the formula for DRA-minus to make it more straightforward. By using a similar method to that of ERA-minus and FIP-minus, we think DRA-minus, which allows you to compare players across seasons, will be easier to understand and use.
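The ERA-minus/FIP-minus convention referenced above rescales a runs-allowed rate so that 100 is league average and lower is better; a minimal sketch of that rescaling (the precise league-average and park adjustments BP applies are not detailed here):

```python
def minus_stat(player_value, league_average):
    """Rescale a runs-allowed-style rate to the ERA-minus / FIP-minus
    convention: 100 = league average, lower = better."""
    return 100.0 * player_value / league_average
```

For example, a pitcher with a 3.20 rate in a 4.00 league scores an 80, i.e., 20 percent better than average, which is what makes cross-season comparisons straightforward.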
The Effects of the Changes
What effect does this have on the numbers themselves? Let’s start with DRA, and with the pitchers who now look better, compared to where they were last year:
None of these are earth-shattering, but these pitchers benefit notably when DRA focuses more on their stuff than on their outputs. Jimmy Nelson’s 2016 performance has been upgraded from abysmal to merely rather bad. Noah Syndergaard has gotten even more frightening. Chris Tillman is upgraded to average, and Jeff Samardzija becomes above average. Jake Arrieta, who was DRA’s whipping boy at the start of last year, jumps back up into the realm of quite good, although like other Cubs pitchers his results are still flattered a bit by the quality of the defense behind him.
In turn, let’s look at pitchers who took a hit:
The quality of these pitchers’ stuff belied their results. CC Sabathia, who had a large gap between his DRA and RA9 last year, has now been downgraded close to the runs actually charged to him. Josh Tomlin takes a major hit as well, although he still checks in as much better than the runs charged to him. Particularly satisfying is the decline for Michael Pineda, whose outlier status last year provided sport for certain MLB Network hosts during sabermetric TV appearances. That said, DRA remains of the opinion that Pineda’s stuff is much better than his results. The Yankees’ coaching staff agrees, and we’ll just have to see if he can prove us all right, finally.
A refreshed version of DRA means that we have also refreshed the DRA Runs table, a curated subset of statistics that quickly summarizes what we think will be of most interest to you. In addition to a pitcher’s team, DRA, and innings pitched, we also provide (1) his runs above average on “not in play” (NIP) events (walks, strikeouts, HBP), (2) his runs above average on “hit” events (singles through home runs), and (3) his runs above average on “out” events. Together, these tell you the general areas where a pitcher is succeeding or getting roughed up, compared to an average pitcher facing the same opponents in the same stadiums.
The best pitchers tend to do particularly well in NIP runs; others specialize in limiting hard contact, which is reflected in hit runs; and still others specialize in minimizing BABIP, which is reflected in out runs. These categories appear under the headings NIP_Runs, HIT_Runs, and OUT_Runs, respectively. In all of them, negative numbers favor the pitcher (good) and positive numbers hurt the pitcher (bad).
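Reading the table, then, is a matter of scanning for negative entries. A small sketch, using made-up rows (the names and values are hypothetical, not actual DRA Runs data):

```python
# Hypothetical DRA Runs rows; negative values help the pitcher.
rows = [
    {"name": "Pitcher A", "NIP_Runs": -8.5, "HIT_Runs": -3.1, "OUT_Runs": 1.2},
    {"name": "Pitcher B", "NIP_Runs": 4.0, "HIT_Runs": -6.2, "OUT_Runs": -2.7},
]

def strengths(row):
    """Return the categories where this pitcher beats the average pitcher
    (i.e., where his runs-above-average figure is negative)."""
    return [k for k in ("NIP_Runs", "HIT_Runs", "OUT_Runs") if row[k] < 0]
```

Here Pitcher A succeeds on strikeouts and walks and on suppressing hard contact, while Pitcher B compensates for shaky NIP performance by limiting hits and converting balls in play into outs.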
Lastly, let’s take a quick look at the effect of these updates on DRA-minus. Since its purpose is to allow comparison across seasons, we’ll give you a short list of the updated “best seasons” since 1951, which is DRA’s current earliest season. In light of one of the names on this list, we’ll just provide this without further comment:
Why should you care?
DRA’s reliability from year to year demonstrates that it is built on a solid foundation. It achieves state-of-the-art results despite including certain baseball events (such as balls in play and home runs) that other estimators either refuse to consider or take only at face value. Balls in play do not simply cancel each other out; rather, a pitcher’s ability to control them is directly related to his success, and a quality assessment of pitcher skill should take them into account.
Some have expressed concerns about DRA’s methodological complexity. In some respects, those criticisms are fair. However, I would offer a few points in response. First, there are many baseball statistics with poorly understood calculations (e.g., “earned” runs) which fans of all experience levels rely upon anyway. Much of our perception about “complicated” stats is based on our strong bias toward what we already know and therefore prefer. Second, the correlation data we give you provides independent verification of DRA’s accuracy and can be replicated by anyone who downloads the exact same data from our site. This allows you to have confidence in DRA’s methods without having to reverse engineer them for yourself.
Finally, I strongly believe that the last generation of sabermetric analysis, to its credit, managed to wring pretty much everything there was to be found inside plain algebra and basic linear regression. If we want further accuracy, that is going to require more complexity. You may decide that complexity is ultimately not for you, but for those who want more understanding and better analysis, increased complexity is inevitable.
The Path Forward
At this point, we don’t anticipate further changes to DRA this season. DRA does not presently incorporate exit velocity, although it’s not clear that would help anything, as there are still a lot of batted balls escaping detection. Furthermore, DRA now equals or exceeds the performance of other component pitcher metrics in the public domain, which limits our appetite for further tinkering. DRA of course remains the rate foundation for pitcher wins above replacement player (PWARP) here at Baseball Prospectus.
Nonetheless, if you think you have a good suggestion for how we can make it better, we are always all ears. Likewise, if you have any questions about these or any other changes, we hope you’ll let us know in the comments below, on Twitter, or by any of the other means we are reachable. We appreciate your continued interest and especially your financial support of our research.
Special thanks to the BP Stats team for their review and feedback.
Again, the reason we use FIP in all of these comparisons is not to pick on FIP, but because if your proposed metric does not beat FIP in any of these three categories, you are probably just wasting people’s time.