
We’ve reached awards season, with the Cy Young—designated for the best pitcher in each league—due to be awarded this coming week.

In the National League, the named finalists are two Cubs (Kyle Hendricks, Jon Lester) and one National (Max Scherzer). Here is how they compare on various measures of pitcher quality:

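Name           | ERA  | DRA  | PWARP | IP    | BABIP | Team BABIP
Kyle Hendricks | 2.13 | 3.34 | 4.4   | 190   | .250  | .255
Jon Lester     | 2.44 | 3.10 | 5.3   | 202.7 | .256  | .255
Max Scherzer   | 2.96 | 3.01 | 6.2   | 228.3 | .255  | .288

PWARP = Pitcher Wins Above Replacement Player; IP = innings pitched, with partial innings shown as decimals; Team BABIP = BABIP allowed by the pitcher’s team as a whole.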

With pitcher wins largely discredited, pitcher ERA has become the most common measurement cited by baseball writers in casting Cy Young votes. True to form, Hendricks had the best ERA among qualified NL pitchers this year, with Lester coming in second-best. The problem is that ERA is a flawed statistic, and this year’s Cy Young finalists make those flaws clearer than ever.

Many of ERA’s problems are not news. For one thing, ERA is biased toward groundball pitchers,[1] who put more balls on the ground and therefore give their fielders more chances to make errors. Those errors conveniently exempt the pitcher from responsibility for runs that would not have scored without the misplay. Perhaps for this reason, ERA is no better than, and maybe even slightly worse than, plain old RA9 (all runs allowed per nine innings, earned or unearned) when it comes to measuring pitcher ability.[2] Most concerning for our purposes, ERA is biased toward pitchers who have better defenses,[3] which results in those pitchers getting all the credit for outs primarily generated by defense, not pitching.
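To make the ERA/RA9 distinction concrete, here is a minimal Python sketch. Both stats are runs per nine innings; they differ only in whether unearned runs count. The `PitchingLine` class and its numbers are hypothetical, and innings are stored as outs recorded to sidestep the scorebook's .1/.2 notation for partial innings.

```python
from dataclasses import dataclass


@dataclass
class PitchingLine:
    """Hypothetical season line, stored in outs to avoid .1/.2 inning notation."""
    outs: int          # outs recorded (3 per inning)
    runs: int          # every run charged to the pitcher
    earned_runs: int   # runs the official scorer ruled earned

    @property
    def innings(self) -> float:
        return self.outs / 3

    def era(self) -> float:
        # Earned runs per nine innings: unearned runs are excluded
        return 9 * self.earned_runs / self.innings

    def ra9(self) -> float:
        # All runs per nine innings, earned or not
        return 9 * self.runs / self.innings


# Hypothetical: 90 innings, 20 runs allowed, 2 of them unearned because of errors
line = PitchingLine(outs=270, runs=20, earned_runs=18)
print(f"ERA {line.era():.2f}, RA9 {line.ra9():.2f}")  # ERA 1.80, RA9 2.00
```

Every unearned run widens the gap between the two numbers, which is why a pitcher who induces more error-prone plays can look better by ERA than by RA9.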

That’s a particular concern this year, since the Cubs have one of the best defenses in modern baseball history: no team since 1980 has held opposing batters to a lower batting average on balls in play (BABIP) than the Cubs’ .255. The bottom line is that while ERA may be useful for summarizing what happened in an individual game, it is an inferior way to evaluate a pitcher’s season overall. To control for all these confounders, we need to turn to something else.
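BABIP itself is a simple ratio: hits on balls the fielders could play, divided by those balls in play. A minimal Python sketch using the standard formula, (H - HR) / (AB - K - HR + SF); the totals in the example are hypothetical, not any team's actual 2016 line.

```python
def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF).

    Home runs and strikeouts never give the fielders a chance, so they are
    removed; sacrifice flies are balls in play that do not count as at-bats,
    so they are added back to the denominator.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play


# Hypothetical opponent totals chosen for round numbers
print(f"{babip(hits=1350, home_runs=150, at_bats=5500, strikeouts=1400, sac_flies=50):.3f}")  # 0.300
```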

Last year, we created Deserved Run Average (DRA), which makes a number of adjustments designed to tease out a pitcher’s true contribution to his team’s success. DRA adjusts for parks, quality of opponent, catcher framing, and temperature, among other things. Most importantly for our purposes, DRA controls for the quality of the defense behind each pitcher. As of this year, DRA runs separate models for putouts at each position of interest, both for single outs (all positions) and for groundball double plays (positions two through six). That amounts to 14 models dedicated solely to evaluating pitchers in the context of their respective defenders, and that’s before we even get to things like home runs, walks, or hit batsmen.
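The model count is straightforward bookkeeping: nine single-out models plus five double-play models. A short Python sketch of that tally, using standard scorekeeping position numbers; the model labels are hypothetical, and this is not how DRA itself is coded.

```python
# Standard scorekeeping position numbers
POSITIONS = {
    1: "P", 2: "C", 3: "1B", 4: "2B", 5: "3B",
    6: "SS", 7: "LF", 8: "CF", 9: "RF",
}

# One single-out model for every position (1 through 9)
single_out_models = [f"out_{POSITIONS[p]}" for p in range(1, 10)]

# One groundball double-play model for positions 2 through 6
double_play_models = [f"gidp_{POSITIONS[p]}" for p in range(2, 7)]

outs_models = single_out_models + double_play_models
print(len(single_out_models), len(double_play_models), len(outs_models))  # 9 5 14
```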

The legitimacy of DRA as a comparative tool was tested and discussed earlier this year. From 2010 through 2015, DRA was more consistent from year to year, and better predicted the following year’s runs allowed, than any other pitching metric. Particularly relevant to our analysis here, that same article showed that DRA’s outs models were four times more reliable than BABIP as a measure of pitcher skill on balls in play. No single statistic should govern everything, but if ERA and DRA substantially disagree on a player,[4] chances are that DRA is giving you the more accurate picture.

So, with DRA both explained and in hand, let’s return to our chart. Scherzer’s DRA and ERA are almost identical, indicating that his success is no mirage. The DRA-ERA gaps for Hendricks and Lester, however, are more striking. DRA says that Hendricks would be giving up more than an extra run per nine innings on an average defensive team. Lester’s gap is not so dramatic, but like Hendricks, his ERA might very well have a “3” in front of it rather than a “2” if he were pitching for some other team. Pitchers whose ERAs begin with 3 do not win Cy Young awards in this day and age, at least not when other qualified pitchers are posting better numbers, which they certainly were this year.
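The DRA-ERA gaps discussed here can be read straight off the table of finalists. A short Python sketch of the arithmetic, using only the ERA and DRA figures from that table; the helper name `dra_era_gap` is illustrative.

```python
# ERA and DRA for the three NL finalists, from the table of finalists
finalists = {
    "Kyle Hendricks": {"era": 2.13, "dra": 3.34},
    "Jon Lester": {"era": 2.44, "dra": 3.10},
    "Max Scherzer": {"era": 2.96, "dra": 3.01},
}


def dra_era_gap(era: float, dra: float) -> float:
    """Runs per nine innings by which DRA exceeds ERA.

    A positive gap means DRA thinks the pitcher deserved more runs than his
    ERA charged him, for example because his defense saved runs for him.
    """
    return round(dra - era, 2)


# List pitchers from largest gap to smallest
for name, line in sorted(finalists.items(), key=lambda kv: -dra_era_gap(**kv[1])):
    print(f"{name:<15} {dra_era_gap(**line):+.2f} runs per 9")
```

Run as written, it lists Hendricks first at +1.21, then Lester at +0.66 and Scherzer at +0.05; Hendricks's figure is the "more than an extra run per nine innings" described above.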

The last time a pitcher won a Cy Young with such a large differential between his ERA and his DRA was … last year, actually, when Jake Arrieta did so with an ERA of 1.77 but a DRA of 2.89. Starting to see a pattern here? Arrieta rode a strong second-half narrative in 2015, getting credit along the way for being a wizard on balls in play. A 2.89 DRA is still terrific pitching, but the Cubs’ defense, particularly once Addison Russell became the full-time shortstop, contributed substantially to the difference between the runs Arrieta “earned” in 2015 and those he actually “deserved.”

Before last year, at least in the NL, one has to go back almost 20 years, to 1998, to find the last time a Cy Young winner (Tom Glavine) had such a large gap between his ERA and DRA. This year Kyle Hendricks and Jon Lester both have flashier pitching lines than 1998 Tom Glavine, but the paths to their artificially low ERAs have a lot in common: put plenty of balls in play, and let an excellent defense do the rest.

One could certainly argue that defensive penalties like this are unfair, and that pitchers should not be penalized for pitching to the strengths of their team. A pitcher’s foremost job is to get outs, and if the best way to get outs is to get the ball on the ground, then that’s probably where a pitcher should be generating as many plays as possible. This may even have been the Cubs’ strategy this season, since they didn’t have a single starter reach even a strikeout per inning: an unusual fact for a championship rotation. Furthermore, tools like Statcast have certainly verified that generating weak contact is a skill, and Cubs pitchers have probably made at least some of their own luck in that respect.

There are two good responses to these arguments, at least in the context of the Cy Young award. The first is that plenty of pitchers who pitch in front of good defenses nonetheless post good DRA numbers, with Tom Glavine’s own teammates being excellent examples. In front of the same good Braves defense, Greg Maddux (2.22 ERA, 2.08 DRA) and John Smoltz (2.90 ERA, 2.03 DRA) both turned in very good performances that did not benefit unduly from Braves defenders. DRA does give credit to pitchers who either substantially outperform their team’s overall BABIP allowed (as Maddux did in 1998) or rely on other means to keep run-scoring down (as did Smoltz, whose .300 BABIP, if anything, actually hurt him).

The problem with Cubs pitchers is that their overall team BABIP was a ridiculously low .255; working from this baseline, Hendricks posted a slightly better BABIP of .250, and Lester basically made par at .256. If a pitcher is uniquely responsible for his team’s success on balls in play, then it stands to reason (statistically and otherwise) that he should be substantially outperforming the rest of the staff in the BABIP department. When that is not happening, DRA properly credits the defense, not the pitching, for those results.
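The "points" comparison is just the difference, in thousandths, between team and pitcher BABIP. A Python sketch using the table's figures; note that each team figure includes that pitcher's own balls in play, so this is a rough yardstick rather than a true pitcher-versus-rest-of-staff split. The helper name is illustrative.

```python
# (pitcher BABIP, team BABIP) for each finalist, from the table of finalists
babips = {
    "Kyle Hendricks": (0.250, 0.255),
    "Jon Lester": (0.256, 0.255),
    "Max Scherzer": (0.255, 0.288),
}


def points_better_than_team(pitcher: float, team: float) -> int:
    """BABIP 'points' are thousandths; positive means the pitcher beat his team's rate."""
    return round((team - pitcher) * 1000)


for name, (pitcher, team) in babips.items():
    print(f"{name:<15} {points_better_than_team(pitcher, team):+d} points vs. team")
```

This yields +5 for Hendricks, -1 for Lester, and +33 for Scherzer: only Scherzer is substantially outperforming the team baseline.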

The second response is that the “E” in ERA means that only runs fairly charged to the pitcher are meant to be considered. If we are going to credit a pitcher when his fielders screw up, it is only fair to debit that same pitcher when his fielders make him look better than he really is. From a Statcast standpoint, while both Hendricks and Lester are above average at minimizing exit velocity, Jason Hammel, John Lackey, and most of the Cubs’ bullpen were much worse, and yet the Cubs’ defense was still turning an unusually high number of balls in play into outs.

DRA was not computationally feasible 20 years ago, when Tom Glavine was being evaluated, but it is available now, and there is no reason to ignore the clear message it is sending about Cubs starters, and the fact that ERA just isn’t a fair measure of the Cubs' rotation. Measured by DRA, and certainly by strikeout rate, Jose Fernandez was the best starting pitcher in the National League. Since Fernandez unfortunately was not chosen as a finalist, the NL Cy Young probably should go to Scherzer. He deserves his 2.96 ERA, and unlike Hendricks and Lester, he accomplished it largely on his own: his .255 BABIP is more than 30 points lower than the Nationals' team average of .288.

Both Lester and Hendricks put an above-average number of balls on the ground this year, which creates a further ERA bias in their favor, and once again points to Scherzer as the more deserving candidate. Finally, Scherzer has many more innings pitched and many more strikeouts than either Hendricks or Lester. By any reasonably informed measure, Scherzer was the best pitcher of the three NL finalists.

Kyle Hendricks and Jon Lester are terrific pitchers who played key roles in ending the Cubs' 108-year championship drought. But the Cy Young is an individual award, and neither player was the best pitcher in the National League this year.


[1] This was tested by computing the gap between each pitcher’s RA9 and his ERA, then running a weighted Spearman correlation, controlling for innings pitched, between that differential and the pitcher’s groundball percentage. The resulting correlation of +.16 (on 2016 data) confirms ERA’s bias toward groundball pitchers.

[2] Using pitchers who pitched in back-to-back seasons from 2010 through 2016, a year-to-year Spearman correlation was run to compare the predictive power of ERA versus RA9, weighted by the average of the innings pitched in both seasons. ERA registered .284; RA9 registered .286.

[3] Using 2016 data, a weighted Spearman correlation, controlling for innings pitched, was performed between pitchers’ ERAs and their respective teams’ defensive efficiency. The result of -.1 confirms the seemingly obvious: pitchers with better defenses behind them have fewer runs cross the plate and get better ERAs as a result.

[4] DRA is actually indexed to RA9, not ERA, but that distinction makes no material difference for the purposes of this discussion.
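The footnotes above rely on weighted Spearman correlations but do not spell out the weighting scheme. One common approach, assumed here rather than taken from the original analysis, is to rank each variable (averaging ranks for ties) and then compute an innings-weighted Pearson correlation of those ranks. A minimal, dependency-free Python sketch:

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def weighted_spearman(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Weighted Pearson correlation of the (unweighted) ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    total = sum(w)
    mx = sum(wi * a for wi, a in zip(w, rx)) / total
    my = sum(wi * b for wi, b in zip(w, ry)) / total
    cov = sum(wi * (a - mx) * (b - my) for wi, a, b in zip(w, rx, ry))
    vx = sum(wi * (a - mx) ** 2 for wi, a in zip(w, rx))
    vy = sum(wi * (b - my) ** 2 for wi, b in zip(w, ry))
    return cov / math.sqrt(vx * vy)


# Equal weights reduce to the ordinary Spearman coefficient
print(weighted_spearman([1, 2, 3, 4], [1, 3, 2, 4], [1, 1, 1, 1]))  # 0.8
```

For footnote [1], `x` would be groundball rate, `y` the RA9/ERA differential, and `w` innings pitched; the toy inputs in the usage line are hypothetical.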

Thank you for reading

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.

marshaja
11/14
I see end of year awards as rewarding results and DRA more of a predictive tool. Hendricks taking advantage of his good idea absolutely provided value. If a hitter goes crazy with a high BABIP, they may have provided MVP value without necessarily being predictive. All that said, Scherzer is still a worthy choice given his innings advantage, higher strikeout rate and ERA in the same general ballpark.
bachlaw
11/14
You could certainly take that position. Hitters tend to be much more responsible for their BABIP than pitchers are. However, I don't think this is a "descriptive" versus "predictive" issue, and we need to be careful with those terms. A pitcher either probably prevented a certain number of runs or he did not. DRA tells you the most likely contribution of a player, based on the seasonal data. DRA says that neither Hendricks nor Lester contributed as much as Scherzer, either by rate or volume.
tomshipley75
11/14
Nowhere do you mention that Hendricks averaged an exit velocity of 87.2, which was equal to Clayton Kershaw's this season. He induced weak contact, which made it easier for the defenders behind him. Also, you cite that ground ball pitchers' ERAs benefit from more errors committed, but then turn around and say one reason we should not believe Hendricks's ERA is because he had a historically great defense behind him. I don't know, this article seems more a defense of DRA than an objective view of who's deserving of the Cy Young this year.
kbannon77
11/14
"DRA says that Hendricks would be giving up more than an extra run per inning on an average defensive team." An more than an extra run per NINE innings, right? Or am I crazy?
bachlaw
11/14
You are absolutely correct. I'll get that changed.
WoodyS
11/14
Congratulations on doing amazingly interesting, analytical research. Your article was fun to read, and it certainly makes a lot of sense. However, I worry that your conclusions aren't yet fully justified by the evidence. For example, while it's true that the legitimacy of DRA has been investigated, the investigation was done by the inventor rather than by a neutral researcher. Similarly, in your footnotes you talk about having "confirmed" particular hypotheses, but in reality all you've done is to provide supporting evidence. Finally, while you make a strong case that DRA is an improvement over ERA in many regards, I agree with Marshaja that such a result doesn't necessarily imply that DRA is superior to ERA for the purposes of Cy Young consideration. My guess is that DRA will keep evolving and eventually will become a commonly used metric, but right now I'm not sure that your results justify your conclusions.
bachlaw
11/14
Woody, I have no problem with people holding their own opinion as to what statistics they find useful and which they do not. But in terms of what makes a metric valid, I will disagree with you. The validity of a metric is not a voting process or a popularity contest. And yes, the investigations were published and conducted by me, but the metrics used for comparison are (1) clearly established in the statistical community as valid, (2) clearly described, and (3) can be replicated by anyone who downloads the same data off BP. They are objective. Other metrics either perform better or they do not. I would be delighted to have others run the same tests, but no one has publicly done so, probably in part because they have no reason to dispute the findings. As for the footnotes, I am a bit confused by that comment as well. The quasi-population of baseball statistics over a season is what it is, and the correlations between events are what they are. The results here are what we would expect, which is a good thing. I could certainly combine more seasons and run the same tests, but have done that in the past and found nothing different. Thanks for reading and commenting.
lichtman
11/14
Great stuff. Agree 100%. This is a no-brainer. Not that I don't like DRA (I do, other than the fact that it's a black box to anyone other than a high-level statistician), but the exact same conclusion could be reached by merely looking at team BABIP, as you did, or team UZR (or DRS). Team UZR (actually a regressed version of it) directly comes off a pitcher's RA9 or ERA to establish his role in preventing runs. As you said, the "prediction" versus "what actually happened" is not at all relevant to this discussion. ERA minus defense, or DRA, tells us EXACTLY what happened. That it also happens to be an excellent predictor is only by accident (any stat that captures "responsibility" (talent) will tend to be a good predictor, especially when context changes, like when a pitcher moves to another team or park or the defense or catcher changes). Anyway, good and correct reasoning and presentation. By all rights, Scherzer was a better, probably much better, pitcher this season than Hendricks or Lester because of the reasons you correctly articulate. Also, the notion that a pitcher "takes advantage of good defense" is 98% nonsense. The extent to which a pitcher might deliberately allow more BIP with a good defense behind him, or fewer with a bad defense, is de minimis. In addition, ALL pitchers would (and should) do that. If one were to do a study where a pitcher moves from a team with bad to good (or vice versa) defense, I'm pretty certain that the % of BIP he allows would not change by very much. It's simply not possible (or prudent) for it to change much. Yes, the pitching approach should change a little (based on quality of defense), but not by much. It took the voters over 50 years to discount wins in the CYA. I'm guessing it will take another 50 years to "factor out" defense and pitch framing. And they still don't fully understand park factors (or know which ones to use).
BTW, in citing raw ERA for these players, shouldn't you at least have mentioned or adjusted for park effects? I know that's included in DRA, but still you should have noted that even absent a significant difference in team defense between WAS and CHC, the Cubs' home stadium is a large hitters' park and WAS is a moderate pitchers' park. Using my park factors, that creates a difference of .13 runs per 9 after park adjusting (in favor of the Cubs pitchers). For those commenters who simply can't wrap their heads around why team defense has nothing to do with pitching, consider the worst pitcher in baseball, pitching in a park where any BIP is considered an out. Should he win the CYA with an ERA of nearly zero? What if he's smart and doesn't try to strike anyone out or walk anyone? (As I said, all pitchers would do that; you don't have to be "smart".)
bachlaw
1/08
Hi MGL, Thanks for the comments and these suggestions. You are right: when switching back to raw metrics, park effects are still worth keeping in mind.
oldbopper
11/15
Statistical proof that pitchers have the ability to induce weak contact should never have been necessary, and the concept of three true outcomes has always seemed fallacious. Watching Mariano Rivera break bat after bat was more than enough "proof" to anybody with two eyes that he was inducing weak contact on a very high percentage basis.