Two years ago, I wrote the first DRA essay, focusing on the challenge of modeling descriptive versus predictive player performance. At the time, my prognosis for threading that needle was rather grim:

What is it, exactly, that you want to know? For example:

(1) Do you care primarily about a pitcher’s past performance?

(2) Are you more worried about how many runs the pitcher will allow going forward?

(3) Or do you want to know how truly talented the pitcher is, divorced from his results this year or next?

The reader’s likely response is: “I’d like one metric that excels at all three!” Sadly, when it comes to composite pitcher metrics, this might not be possible.

The article reviewed a variety of metrics, from plain RA9 to Fielding Independent Pitching (FIP) to SIERA (Skill-Interactive Earned Run Average), to show that all of them made sacrifices that committed them to one direction or the other.

DRA itself has tried to ride alternate sides of this fence. In its first year (2015), we elected to focus on descriptive performance, and designed DRA to be the best descriptive metric of what had previously happened short of RA9 itself.

Last year, we began to question the value of prioritizing descriptive performance, and switched to focusing on future performance instead. Again, though, this was presented in terms of a choice: decide which direction you care about, and resign yourself to it.

As always, we prefer to measure success objectively. To do that, we compare metrics using a Spearman correlation[1], weighted by innings pitched, across seasons from 2010 to the present. (A Spearman correlation can range from -1 to 1; every value reported here falls between 0 and 1.) When you compare FIP to last season’s DRA formula, you get the following:

| Metric | Descriptive (same year RA9) | Reliability (next year’s measurement) | Predictive (next year's RA9) |
|---|---|---|---|
| FIP[2] | 0.68 | 0.38 | 0.31 |
| DRA 2016 | 0.53 | 0.50 | 0.33 |

On this chart, Descriptive is the correlation between the metric and the player’s runs allowed per nine innings (RA9) that same year. Reliability is the consistency with which the metric rates the same player in one year and then the next. Finally, Predictive measures the extent to which the metric corresponds to next year’s RA9.
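For readers who want to try this kind of comparison themselves, here is a minimal Python sketch of an innings-weighted rank correlation. It is a simplification and an assumption on our part: it ranks the values in the ordinary way and then takes a weighted Pearson correlation of the ranks, which is not necessarily the exact algorithm inside the wCorr package cited in the footnote. The function names and the sample numbers are invented for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def weighted_pearson(x, y, w):
    """Pearson correlation in which each pair (x[i], y[i]) carries weight w[i]."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / sqrt(var_x * var_y)


def weighted_spearman(x, y, w):
    """Simplified weighted Spearman: ordinary ranks, then a weighted Pearson."""
    return weighted_pearson(average_ranks(x), average_ranks(y), w)


# Made-up illustration data (not from the article): a metric, same-year RA9, and IP.
metric = [3.10, 4.25, 3.80, 5.00, 2.90]
ra9 = [3.40, 4.10, 4.60, 5.30, 3.00]
innings = [200.0, 150.0, 60.0, 120.0, 180.0]
descriptive = weighted_spearman(metric, ra9, innings)
```

Swapping the second argument for next year's value of the same metric would give a Reliability-style number, and swapping it for next year's RA9 would give a Predictive-style number, assuming the pitchers are matched across seasons.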

DRA 2016 went all-in on reliability, viewing a consistent description of a player’s skills as the primary virtue of a component-based metric. In other words, we placed a priority on the same player getting assigned the same DRA for his skills one year as the next. This left FIP as the pitching estimator with the best handle on descriptive performance, but given a choice between two emphases, we thought descriptive performance was the inferior one. Furthermore, focusing on reliability gave us the ability to solve the challenge of DIPS, and better assess a pitcher’s true skill with respect to Batting Average on Balls in Play (BABIP).

But what if you no longer had to make this compromise? What if you could, truly, do a bit of everything: have a metric that accurately describes what a pitcher did while also reliably forecasting the skills that pitcher would bring with them to the future? If you didn’t have to choose between them, wouldn’t you want your measure of pitcher value to deliver both?

Of course you would. Thus, we are pleased to say that with the 2017 update to DRA, you can almost have it all. Again using seasons 2010 to the present, here are the weighted Spearman correlations for our metrics, this time including the updates to DRA:

| Metric | Descriptive (same year RA9) | Reliability (next year of metric) | Predictive (next year's RA9) |
|---|---|---|---|
| FIP | 0.68 | 0.38 | 0.31 |
| DRA 2016 | 0.53 | 0.50 | 0.33 |
| DRA 2017 | 0.68 | 0.51 | 0.34 |

Going forward, DRA has basically the same (actually slightly better) reliability and predictive qualities as before. But we’ve now managed to make DRA estimates every bit as descriptive as FIP, while preserving the other features that made DRA uniquely valuable. It has taken two years, but we’ve managed to solve a problem that we had written off as unsolvable.

How did we do this? Primarily by incorporating pitch classifications from PitchInfo into many of the DRA models. We no longer grade pitchers solely on the fact of an event happening, controlling only for externalities like platoon and stadium. Now, our models actively incorporate intrinsic pitcher information about the actual pitches that were thrown. Called strike probability, recently unveiled in connection with our pitch tunnels work, is now an explicit input in most models. Many models now also consider the type of pitch thrown (sinker? changeup? knuckleball?), the velocity of the pitch, the horizontal and vertical angles on the pitch, and the amount of vertical drop as the pitch approaches the plate.

Most of these characteristics are also classified (in a manner of speaking) by MLB’s PITCHf/x system, although we (not surprisingly) prefer the adjustments and re-classifications made by PitchInfo. Not all events benefit from these types of inputs, but for those that do (like home runs and other balls in play) the additional information is enormously useful, and substantially responsible for the no-cost improvement in descriptive power shown above.

This year’s rollout reflects other tweaks as well. We’ve incorporated MLB Gameday’s fielding coordinates on balls in play to improve accuracy. We’ve also parallelized the 23 models inside DRA so that they can be run over the course of an hour, rather than five hours—meaning you can see updated values by breakfast each day instead of mid-afternoon. Finally, after discussion with Neil Weinberg, we’ve tweaked the formula for DRA-minus to make it more straightforward. By using a similar method to that of ERA-minus and FIP-minus, we think DRA-minus, which allows you to compare players across seasons, will be easier to understand and use.
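To make the idea of a "minus" stat concrete, here is a minimal sketch of the generic indexing that such metrics share: the pitcher's rate divided by the league-average rate, scaled so that 100 is average and lower is better. This is an assumption for illustration, not the exact DRA-minus formula, which this article does not spell out; the function name and inputs are hypothetical, and any park or league adjustment is deliberately left out.

```python
def minus_index(pitcher_rate, league_rate):
    """Scale a runs-per-nine rate so 100 is league average and lower is better."""
    if league_rate <= 0:
        raise ValueError("league_rate must be positive")
    return 100.0 * pitcher_rate / league_rate


# Hypothetical inputs: a 2.25 rate in a season where the league average is 4.50.
print(round(minus_index(2.25, 4.50)))  # 50
```

Because the denominator is that season's league average, the same raw rate produces a lower (better) index in a high-scoring season than in a low-scoring one, which is what makes cross-season comparison possible.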

The Effects of the Changes

What effect does this have on the numbers themselves? Let’s start with DRA, and with the pitchers who now look better, compared to where they were last year:

| Fullname | fld_team | DRA_2016 | DRA_2017 | RA9 | IP | Diff |
|---|---|---|---|---|---|---|
| Jimmy Nelson | MIL | 7.12 | 5.64 | 5.42 | 179.33 | -1.48 |
| Noah Syndergaard | NYN | 3.39 | 2.67 | 2.99 | 183.67 | -0.71 |
| Chris Tillman | BAL | 4.75 | 4.06 | 3.82 | 172.00 | -0.69 |
| Jason Hammel | CHN | 5.44 | 4.78 | 4.16 | 166.67 | -0.66 |
| Jeff Samardzija | SFN | 4.38 | 3.73 | 3.90 | 203.33 | -0.66 |
| Jake Arrieta | CHN | 3.93 | 3.38 | 3.28 | 197.33 | -0.55 |

None of these are earth-shattering, but these pitchers benefit notably when DRA focuses more on their stuff than their outputs. Jimmy Nelson’s 2016 performance has been upgraded from abysmal to merely rather bad. Noah Syndergaard has gotten even more frightening. Chris Tillman is upgraded to average, and Jeff Samardzija becomes above average. Jake Arrieta, who was DRA’s whipping boy at the start of last year, jumps back up into the realm of quite good, although like other Cubs pitchers his results are still a bit inflated by the quality of the defense behind him.

In turn, let’s look at pitchers who took a hit:

| Fullname | fld_team | DRA_2016 | DRA_2017 | RA9 | IP | Diff |
|---|---|---|---|---|---|---|
| CC Sabathia | NYA | 3.16 | 4.08 | 4.16 | 179.67 | 0.91 |
| Josh Tomlin | CLE | 3.53 | 4.40 | 5.02 | 174.00 | 0.87 |
| Cole Hamels | TEX | 2.84 | 3.48 | 3.72 | 200.67 | 0.65 |
| Kenta Maeda | LAN | 2.66 | 3.26 | 3.69 | 175.67 | 0.59 |
| Michael Pineda | NYA | 2.95 | 3.49 | 5.02 | 175.67 | 0.54 |
| Gio Gonzalez | WAS | 3.50 | 3.97 | 4.97 | 177.33 | 0.48 |
| Collin McHugh | HOU | 3.89 | 4.36 | 4.48 | 184.67 | 0.47 |

The quality of these pitchers’ stuff belied their results. CC Sabathia, who had a large gap between his DRA and RA9 last year, has now been downgraded close to his actually-charged runs. Josh Tomlin takes a major hit as well, although he still checks in as much better than the runs charged to him. Particularly satisfying is the decline for Michael Pineda, whose outlier status last year provided sport for certain MLB Network hosts during sabermetric TV appearances. That said, DRA remains of the opinion that Pineda’s stuff is much better than his results. The Yankees' coaching staff agrees with this, and we’ll just have to see if he can prove us all right, finally.
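If you want to check the direction of these moves yourself, the short Python sketch below recomputes the change in DRA for the thirteen pitchers listed in the two tables above and asks whether each pitcher's DRA ended up closer to his RA9. The values are copied from the tables; the variable and function names are our own. Note that differences recomputed from the rounded figures can be off by 0.01 from the published Diff column, presumably (this is an assumption) because that column was calculated before rounding.

```python
# (name, DRA_2016, DRA_2017, RA9), copied from the two tables above.
ROWS = [
    ("Jimmy Nelson", 7.12, 5.64, 5.42),
    ("Noah Syndergaard", 3.39, 2.67, 2.99),
    ("Chris Tillman", 4.75, 4.06, 3.82),
    ("Jason Hammel", 5.44, 4.78, 4.16),
    ("Jeff Samardzija", 4.38, 3.73, 3.90),
    ("Jake Arrieta", 3.93, 3.38, 3.28),
    ("CC Sabathia", 3.16, 4.08, 4.16),
    ("Josh Tomlin", 3.53, 4.40, 5.02),
    ("Cole Hamels", 2.84, 3.48, 3.72),
    ("Kenta Maeda", 2.66, 3.26, 3.69),
    ("Michael Pineda", 2.95, 3.49, 5.02),
    ("Gio Gonzalez", 3.50, 3.97, 4.97),
    ("Collin McHugh", 3.89, 4.36, 4.48),
]


def summarize(rows):
    """For each pitcher, compute the DRA change and the gap to RA9 before and after."""
    summary = []
    for name, dra_old, dra_new, ra9 in rows:
        summary.append({
            "name": name,
            "diff": round(dra_new - dra_old, 2),
            "gap_old": round(abs(dra_old - ra9), 2),
            "gap_new": round(abs(dra_new - ra9), 2),
        })
    return summary


summary = summarize(ROWS)
moved_closer = [row["name"] for row in summary if row["gap_new"] < row["gap_old"]]
print(f"{len(moved_closer)} of {len(summary)} moved closer to RA9")
```

For these thirteen pitchers the gap shrinks in every case, which is what you would expect from a sample chosen for large changes after an increase in descriptive power; it says nothing about pitchers who are not listed.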

A refreshed version of DRA means that we have also refreshed the DRA Runs table, a subset of other stats that quickly summarizes what we think will be of most interest to you. In addition to a pitcher’s team, DRA, and innings pitched, we also are providing you with (1) their runs above average in “not in play” (NIP) events (walks, strikeouts, HBP), (2) their runs above average in “hit” events (singles through home runs), as well as (3) their runs above average in “out” events. In sequence, these will tell you the general areas where a pitcher is either succeeding or getting roughed up, as compared to an average pitcher with the same opponents and stadiums.

The best pitchers tend to do particularly well in NIP runs; others (also) specialize in limiting hard contact, which is reflected in hit runs, and still others specialize in minimizing BABIP, which is reflected in out runs. These are reflected in the headings NIP_Runs, HIT_Runs, and OUT_Runs respectively. In all of these categories, negative numbers are favorable to the pitcher (good) and positive numbers are hurting the pitcher (bad).
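Because the sign convention is easy to get backwards, here is a small, hypothetical Python helper that turns the three component values into plain-language labels. The function name and the sample numbers are invented for illustration; only the column names and the rule that negative is good come from the description above.

```python
def describe_components(nip_runs, hit_runs, out_runs):
    """Label each runs-above-average component; negative values favor the pitcher."""
    components = {
        "NIP_Runs": nip_runs,
        "HIT_Runs": hit_runs,
        "OUT_Runs": out_runs,
    }
    labels = {}
    for name, runs in components.items():
        if runs < 0:
            labels[name] = "better than average"
        elif runs > 0:
            labels[name] = "worse than average"
        else:
            labels[name] = "average"
    return labels


# Invented example: strong on walks and strikeouts, hurt by hits, neutral on outs.
print(describe_components(nip_runs=-12.5, hit_runs=4.0, out_runs=0.0))
```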

Lastly, let’s take a quick look at the effect of these updates on DRA-minus. Since its purpose is to allow comparison across seasons, we’ll give you a short list of the updated “best seasons” since 1951, which is DRA’s current earliest season. In light of one of the names on this list, we’ll just provide this without further comment:

| fullname | fld_team | year | DRA_minus | DRA_final | IP |
|---|---|---|---|---|---|
| Greg Maddux | ATL | 1995 | 32 | 1.48 | 209.7 |
| Pedro Martinez | BOS | 2000 | 34 | 1.69 | 217 |
| Pedro Martinez | BOS | 1999 | 35 | 1.71 | 213.3 |
| Randy Johnson | SEA | 1995 | 37 | 1.70 | 214.3 |
| Randy Johnson | ARI | 2004 | 38 | 1.74 | 245.7 |
| Curt Schilling | ARI | 2002 | 40 | 1.78 | 259.3 |
| Randy Johnson | ARI | 2001 | 42 | 1.94 | 249.7 |
| Erik Bedard | BAL | 2007 | 43 | 1.95 | 182 |
| Greg Maddux | ATL | 1994 | 43 | 2.03 | 202 |
| Pedro Martinez | MON | 1997 | 44 | 1.95 | 241.3 |
| Ben Sheets | MIL | 2004 | 45 | 2.08 | 237 |
| Randy Johnson | SEA | 1994 | 46 | 2.15 | 172 |
| Randy Johnson | ARI | 1999 | 46 | 2.28 | 271.7 |
| Jose Fernandez | MIA | 2016 | 46 | 2.00 | 182.3 |
| Greg Maddux | ATL | 1996 | 47 | 2.26 | 245 |

Why should you care?

DRA’s reliability from year to year demonstrates that it is built on a solid foundation. It achieves state-of-the-art results despite including certain baseball events (such as balls in play and home runs) that other estimators either refuse to consider or take only at face value. Balls in play do not simply cancel each other out; rather, a pitcher’s ability to control them is directly related to his success, and a quality assessment of pitcher skill should take them into account.

Some have expressed concerns about DRA’s methodological complexity. In some respects, those criticisms are fair. However, I would offer a few points in response. First, there are many baseball statistics with poorly-understood calculations (e.g., “earned” runs) which fans of all experience levels rely upon anyway. Much of our perception about “complicated” stats is based on our strong bias toward what we already know and therefore prefer. Second, the correlation data we give you provides independent verification of DRA’s accuracy and can be replicated by anyone who downloads the exact same data from our site. This allows you to have confidence in DRA’s methods without having to reverse engineer them for yourself.

Finally, I strongly believe that the last generation of sabermetric analysis, to its credit, managed to wring pretty much everything there was to be found inside plain algebra and basic linear regression. If we want further accuracy, that is going to require more complexity. You may decide that complexity is ultimately not for you, but for those who want more understanding and better analysis, increased complexity is inevitable.

The Path Forward

At this point, we don’t anticipate further changes to DRA this season. DRA does not presently incorporate exit velocity, although it’s not clear that would help anything, as there are still a lot of batted balls escaping detection. Furthermore, DRA now equals or exceeds the performance of other component pitcher metrics in the public domain, which limits our appetite for further tinkering. DRA of course remains the rate foundation for pitcher wins above replacement player (PWARP) here at Baseball Prospectus.

Nonetheless, if you think you have a good suggestion for how we can make it better, we are always all ears. Likewise, if you have any questions about these or any other changes, we hope you’ll let us know in the comments below, on Twitter, or by any other means through which we can be reached. We appreciate your continued interest and especially your financial support of our research.

Special thanks to the BP Stats team for their review and feedback.



[1] Ahmad Emad & Paul Bailey (2016). wCorr: Weighted Correlations. R package version 1.8.0. https://CRAN.R-project.org/package=wCorr.

[2] Again, the reason we use FIP in all of these comparisons is not to pick on FIP, but because if your proposed metric does not beat FIP in any of these three categories, you are probably just wasting people’s time.

Grasul
3/09
I'm not sure if it's related, but if I look at Forecast in the Fantasy section now, pitchers' forecast HR/9 is up a ton across the board.
mcquown
3/09
Currently re-running PECOTA, and for the time being Depth Charts and PFM have been turned off. I will cut a new spreadsheet today, with the impact of the DRA changes included.
ErikBFlom
3/09
Do the prior evaluations of pitchers (tiers, three-year predicted value) incorporate this?
bretsayre
3/09
There will likely be a few pitchers who stand a slight reevaluation, but as Jonathan mentions in the article, the majority of these fall into the "doesn't really change much" bucket. If, as we continue to review it, there end up being enough pitchers with interesting ticks up or down, we'll write some fantasy content specifically about it.
buckb2
3/09
Just to be clear, are you saying that all of the pitcher projections for 2017 are being updated? If so, seems like you buried the lede!
mcquown
3/09
Absolutely. The projections are based on DRA components, and those have all changed. It wouldn't make any sense to have projections based on components we've improved upon. This is a big change all around, since all the past data is impacted (DRA and related stats, from which WARP is computed), and projections are based on a model which utilizes much of the past data.
kvamlnk
3/09
This seems to suggest that DRA is now preferred over cFIP as a predictor. Is that true??
bachlaw
3/10
Great question! cFIP remains the most reliable estimator from year to year (.6); it takes a hit on descriptive power (.51) and is still competitive on next year's RA9 (.34). These numbers are using the same seasons and method described in the article.
newsense
3/09
When you showed pitchers whose DRAs changed significantly, they all got closer to their RA/9. Were there no pitchers with large changes in DRA that moved farther from RA/9? It seems too good to be true if that didn't happen occasionally.
bachlaw
3/10
My sense was that there were definitely some of both; on balance the trend of moving toward the RA9 is what we would expect with an increase in descriptive power.
dethwurm
3/09
If DRA goes back to 1951 but incorporates PitchInfo material, how are you handling pitchers from before the PITCHf/x era? I apologize if this is answered elsewhere but I clicked through the linked material and couldn't find anything about it. I guess I find it curious that apparently all of the top 15 pitching performances ever have happened since 1994. Perhaps it suggests that whatever proxy or adjustments are being used instead of the actual data strongly favor good pitching performances in hitting-friendly eras?
bachlaw
3/10
No, I should have been more clear about this in the article. The DRA adjustments described in the article only apply to seasons 2008 and onward. Previous seasons use essentially the previous formula. So, less descriptive power, still very reliable and predictive, as compared to other metrics. Thanks for letting me clarify this.
pblabs2
3/10
I have the same question as labrat21. Is DRA your preferred predictor to cFIP? And will any of the DRA work, which is superb, result in reanalyzing the cFIP computation, or is that invariant to the DRA work?
Junts1
3/10
DRA was already the basis of PECOTA, as a big popup on the PECOTA card warns you since that change was made.
pblabs2
3/10
Sure, but that's not my question. The first DRA article says that cFIP has an "edge in predicting future scoring and pitching talent," and wonders whether DRA might provide an even bigger edge. The new article compares DRA to various other things, including FIP, for predictive accuracy, but doesn't compare it to cFIP. You can use DRA or anything else you like for PECOTA; the question is whether it's more predictive than cFIP, and unless I'm missing something you haven't answered that question.
bachlaw
3/10
pblabs2: I think I answered this above. cFIP at this point is a subset of DRA using only the models for home runs, strikeouts, walks, and hit batsmen. So it does still have a reliability advantage over DRA.
BleedingBlue88
3/23
But does this reliability advantage mean that cFIP is a better predictive stat overall? In your 7/22/16 article "Challenging the Citadel of DIPS" you wrote: "cFIP maintains an advantage over DRA in year-to-year reliability, smoking all competitors, including xFIP and SIERA in that regard. This arguably means that cFIP remains useful, at least for that purpose, but it is noteworthy that this increased reliability does not come with added ability to predict future pitcher runs allowed." So if cFIP does not do a better job of predicting future pitcher runs allowed, why is it still the better tool for predictive purposes? Maybe I just don't understand what cFIP's "increased reliability" means in this context. If cFIP is better than DRA for the purpose of reliability, what is reliability? Hope that question made sense.
mattgold
3/10
Why is it that all of the top pitcher-seasons since 1951 are from 1994 and afterward? No showing from Koufax or Gibson? Does this tell us more about the pitchers or about the yardstick?
mcquown
3/10
This is definitely an avenue for further research, but is it really unexpected that the rate stats of past-era pitchers would have suffered, given what we know now about the impacts of starting on 3 days rest and going deeper into games?
bachlaw
3/13
I think it tells us how much better pitching has become. The top seasons actually coincide with some of baseball's highest-scoring seasons, which makes sense: pitchers who achieve epic results while the average pitcher is allowing many more runs by definition are the best pitchers in the game. Koufax does have a season in the top 30 or so, but let's face it: Koufax's numbers would be a lot more impressive in the 2000 season than they were in an era of depressed hitting. Minus metrics define greatness by how much better you are than your peers in a given season, so people from Koufax's era have a hard time measuring up. I suspect that the best pitchers, by virtue of throwing fewer innings, are also throwing better and harder overall, but that doesn't need to be true to make the point.
marctacoma
3/10
Jonathan et al., Just eyeballing the DRA and pWARP leaderboards, it would appear that, in general, many of the top pitcher seasons are now slightly less valuable. That is, in the last iteration, RJ and Schilling's 2001 seasons were worth 11 or 12 WARP, and now they're like 10.5 and 9.2 or so. Is this delta redistributed to the position players? That is, in an earlier version, RJ/Schilling's DRA was lower, and their catchers were higher as some portion of their success was attributed to the good framing of Damian Miller. In the next iteration, their WARP increased significantly, and I assume (but don't know) that Miller's WARP would drop as a result. This version lands somewhere in between, but for a guy like Arrieta, does his increased WARP come at the expense of the Cubs defenders/catcher? Have any WARPs for Mets position players changed as a result of Thor's WARP increasing? Or are they pretty independent, and FRAA needn't line up with the defensive numbers implied in DRA?
bachlaw
3/13
Hi, it is a zero-sum game, so wins that were taken away from a pitcher would be getting redistributed elsewhere. At this point we haven't studied where they went, but off the top of my head I would certainly expect them to be redistributed into some other area of run prevention.
MateoM
4/16
How are the changes in measuring pitch velo impacting DRA?
bachlaw
5/22
Hi Matt, Sorry I did not notice this earlier. Fortunately, our pitch data all goes through Harry Pavlidis first, so it is fairly seamless, once the corrections get made.
lmalone2424
5/09
Jonathan, Awesome article. I am a little confused after reading the comments though, and came up with a simple question that could help everyone out. If I want to look at one statistic at the all-star break, that will be the best predictor of how that pitcher will perform over the second half of the season, what statistic should I use? DRA, cFIP or SIERA? Thank you.
lmalone2424
5/21
John, An answer here would be huge for the kid. I'd really appreciate it.
bachlaw
5/22
Boy that is tricky. I honestly would look at all 3 and hope to get similar signals. If one of them disagreed significantly, I would look into why that is and perhaps use that to make your final decision.