One of my favorite essays on any topic is Nate Silver’s “Is Barry Bonds Better Than Babe Ruth?” from Baseball Between the Numbers. Rather than rehashing the same tired arguments about how much harder it was to hit a home run in Ruth’s time or how much better the competition against Bonds was almost a century later, Silver uses a variety of metrics to demonstrate how the two players would actually perform on a level playing field.
The two players are very close until Ruth’s pitching is taken into account. While his hitting is examined in great detail, his pitching is only valued via NRA. NRA has nothing to do with rifles; it actually stands for Normalized Runs Allowed, which basically takes Ruth’s run prevention skills from the Teens and tells us what they would look like in a more typical “all-time” Major League season, adjusting for his park and league along the way. An NRA looks like an ERA in a normalized environment. An average pitcher will have an NRA of about 4.50. While I don’t find fault with any of Nate’s conclusions, NRA does have some inherent issues that could make it a faulty metric when we want to see the quality of a pitcher’s work. It adjusts the runs to more recent times, but doesn’t take the player’s peripheral stats into account. In other words, if a player had incredibly good luck piled on top of good defense, NRA would like him a lot more than it should. All of this got me thinking about Babe Ruth and how he would really fare against not just a level playing field, but also the players of today.
In order to know how Ruth would fare we have to first become acquainted with Davenport Translations, or DTs. If you’ve ever walked into a sports bar, you probably heard some drunk guy shouting about how steroids are ruining the “statistical integrity” of the game. Minding your own business, you knew with certainty that the game never had the type of statistical integrity the fellow was shouting about. A home run in 1920 was different than one in 1960, which in turn is different from one hit today. DTs account for just that. They convert the numbers for all players in all time periods to tell us how they would perform in a neutral environment.
There is a DT for ERA, but it is essentially very similar to NRA: it doesn’t account for the pitcher’s peripheral numbers. Fortunately, those peripheral numbers are also included in the DTs. K/9, BB/9, and HR/9 are the central stats that the pitcher has control over. They are also the stats we need to look at to find out how Ruth would have pitched today.
We’ll concern ourselves with the years 1915 through 1919 when Ruth did the vast majority of his pitching. He had 3.70 K/9, 3.22 BB/9, and 0.09 HR/9. Anyone who’s looked at a box score of a game that Mike Pelfrey has pitched knows a lot about these numbers. Having a similar number of walks and strikeouts is bad, but on days when the pitcher in question doesn’t allow any home runs, they look like some kind of all right.
Ruth didn’t allow homers because of the era he pitched in, but a look at the Davenport Translations of these numbers from Ruth’s DT page tells us that he wasn’t as bad of a pitcher as the Ks and BBs seem to suggest. His average DT rates for the five years we are concerned with were as follows: 5.56 K/9, 3.46 BB/9, and 0.99 HR/9.
Of course, the homers shoot up because the neutral environment reflects how much power has gone up in the years since Ruth pitched, and strikeouts come up as well. Ruth pitched at a time when pitchers didn’t get high K rates. One reason is that they were expected to throw hundreds and hundreds of innings. For example, the Babe threw 317 in 1917. We couldn’t expect him to keep up a high K rate any more than we might expect Mariano Rivera to keep up his if he had to throw 6 innings every time he came out.
The DT rates exist on a level playing field. It is a sort of imaginary “all-time” season where the average pitcher has rates of 6.00 K/9, 3.00 BB/9, 1.00 HR/9, and a 4.50 ERA. We can use these numbers to compare whichever pitcher we’re looking at. In this case, Ruth has walk and K rates that aren’t bad but still aren’t as good as an average pitcher’s rate. Although the average DT rates represent those of a typical environment, I’m not sure where we can find a real season exactly like it. Nevertheless, rates in the last ten years of Major League Baseball aren’t entirely dissimilar: 6.40 K/9, 2.91 BB/9, 1.04 HR/9 and a 4.17 ERA (I am only focusing on starting pitchers who logged at least 150 innings here since their job is more similar to Ruth’s). Clearly, the numbers aren’t that far off, except maybe the ERA, which doesn’t concern us since we’ll figure out our own from the peripheral stats.
The first thing we need to do is convert Ruth’s numbers from the neutral setting to the contemporary setting, i.e. the last ten years. For the mathematically interested: the ideal way to do this is through the use of standard deviation. But in order to do that we need a season very similar to the neutral setting, and I don’t have one on hand. That said, the neutral stats are similar enough to the actual stats that the quick and dirty method of “splitting the difference” will suffice and give us numbers that aren’t far off. For example, we saw that the average K rate changed from 6.00 in the neutral DT environment to 6.40 in the last ten years. We can assume that hitters strike out a bit more often these days (see Howard, Ryan), but pitchers are also probably more talented and Ruth’s level of talent would remain the same. So we’ll split the difference and bump up his K rate by 0.20, half of the full 0.40.
Ruth gets an altered line of 5.76 K/9, 3.41 BB/9, and 1.01 HR/9. These numbers aren’t sexy and the Babe certainly wouldn’t be confused with a Cy Young candidate, but pitchers like Jeff Suppan and Miguel Batista have made careers out of lines like this, and it is certainly better than replacement level.
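For readers who want to check the arithmetic, the "splitting the difference" step can be sketched in a few lines of Python. This is only an illustration: the rates are the ones quoted in the text, while the function and variable names are made up for the example and are not part of the Davenport Translations.

```python
# Sketch of the "split the difference" conversion described in the text.
# The rates are quoted from the article; all names here are hypothetical.

NEUTRAL_AVG = {"K/9": 6.00, "BB/9": 3.00, "HR/9": 1.00}  # DT "all-time" season
RECENT_AVG = {"K/9": 6.40, "BB/9": 2.91, "HR/9": 1.04}   # 150+ IP starters, last ten years
RUTH_DT = {"K/9": 5.56, "BB/9": 3.46, "HR/9": 0.99}      # Ruth's average DT rates, 1915-1919


def split_the_difference(player, neutral, recent):
    """Shift each rate by half of the change in the league average."""
    return {
        stat: player[stat] + (recent[stat] - neutral[stat]) / 2
        for stat in player
    }


adjusted = split_the_difference(RUTH_DT, NEUTRAL_AVG, RECENT_AVG)
for stat, value in adjusted.items():
    print(f"{stat}: {value:.3f}")
```

The walk rate works out to 3.415 before rounding, which is the 3.41 BB/9 quoted above.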
Let’s take an even closer look to see what kind of ERA the Sultan of Stock Pitching would have in the years 1999 to 2008. Our typical ERA is 4.17 so we’ll start there. Last week, I went over the value of preventing home runs. Ruth’s home run rate is not only estimated without the benefit of groundball rates, but it’s also so close to average that it’s not going to make much difference, in terms of accuracy, to apply it here. However, in the spirit of being thorough while assessing a pitcher’s ERA, we’ll apply his home run rate and find that he comes down to 4.13.
The methodology I used to find how much a good HR/9 lowers an ERA may also be used to find how much it is affected by Ks and walks. Using the pitchers in the last ten years that we are comparing Ruth against, each K/9 below or above average tended to increase or decrease an ERA, respectively, by 0.17. For instance, if Ruth had 5.40 K/9, exactly one less than the average of 6.40, we’d raise his ERA by 0.17. As it stands, he has 0.64 K/9 fewer than the typical starting pitcher, so his ERA is raised by 0.11 to 4.24.
Each BB/9 above average will tend to increase an ERA by 0.29. Ruth would walk 0.50 more hitters per nine innings than average according to our conversion, so he gets a final lift of 0.15. Ruth’s ERA stands at 4.39. The Bambino might have been what we like to call an innings eater: he eats up innings and vomits runs, but at least he makes it through enough of them to give the bullpen a rest.
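The strikeout and walk adjustments above can be reproduced with a short Python sketch. It starts from the 4.13 figure that follows the home run adjustment, because the per-HR/9 coefficient is not given in this piece, and it rounds each adjustment to two decimals the way the text does. The variable names are hypothetical, and `Decimal` is used only so that the 0.145 walk adjustment rounds up to 0.15 rather than falling victim to binary floating point.

```python
from decimal import Decimal, ROUND_HALF_UP

# Figures quoted in the article; the names are hypothetical.
ERA_AFTER_HR = Decimal("4.13")   # 4.17 baseline after the home run adjustment
RUNS_PER_K9 = Decimal("0.17")    # ERA change per 1.00 K/9 away from average
RUNS_PER_BB9 = Decimal("0.29")   # ERA change per 1.00 BB/9 away from average
AVG_K9, AVG_BB9 = Decimal("6.40"), Decimal("2.91")
RUTH_K9, RUTH_BB9 = Decimal("5.76"), Decimal("3.41")


def to_hundredths(value):
    """Round to two decimals, with halves rounding up."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Fewer strikeouts than average raises the ERA; more walks raises it too.
k_penalty = to_hundredths(RUNS_PER_K9 * (AVG_K9 - RUTH_K9))
bb_penalty = to_hundredths(RUNS_PER_BB9 * (RUTH_BB9 - AVG_BB9))
era = ERA_AFTER_HR + k_penalty + bb_penalty

print(f"K adjustment:  +{k_penalty}")
print(f"BB adjustment: +{bb_penalty}")
print(f"Estimated ERA: {era}")
```

Rounding at each step gives the 4.39 quoted above; carrying the unrounded products through instead gives about 4.38, so the last digit is sensitive to when the rounding happens.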
Of course, we looked at all Ruth’s pitching years as a whole. Let’s see how he would have done as a contemporary pitcher in each of the individual seasons he pitched:
Year  K/9   BB/9  HR/9  ERA
1915  6.41  3.50  1.20  4.56
1916  7.10  3.34  0.74  3.73
1917  5.91  3.17  1.03  4.30
1918  4.01  3.16  1.17  4.82
1919  3.56  4.33  1.04  5.05
Looking at him on a year-to-year basis, a clear pattern comes into focus. Ruth had a nice year in 1916, but the following years don’t reflect a positive trend. Even with a healthy conversion rate, his strikeout rate dips into the Carlos Silva danger zone. Meanwhile, the walks and home runs don’t do him any favors either.
Earlier, I mentioned Ruth’s NRA and DT ERA and how they convert his actual ERA while ignoring peripheral stats. Here, we find that the above ERAs, based on his translated peripherals, are higher than those two metrics suggested, despite the lower baseline ERA in the last ten years. The fact that Ruth’s NRA and DT ERA come out better than his peripherals support indicates that his actual ERA was better than it should have been in the years he pitched. In other words, his pitching was somewhat overrated even in his own time because of good luck and defense.
Ruth famously quit pitching so he could hit every day, but his would not be a memorable pitching career whether or not it had continued. If he pitched like this in the twenty-first century, he’d have a job, but by his fifth year, he’d be borderline and would probably be looking at the end of a shorter career than he might have liked. In his day, he would have had a longer pitching career that stretched into the twenties, but he wouldn’t be a very notable figure in sports history. Good thing he could hit a little bit too.
This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.
I mean Babe Ruth was in the top 10 in wins 3 times, ERA 3 times, WHIP 3 times, hits per nine 4 times, Ks per nine 2 times, IP twice, complete games 3 times. No matter how you translate them, does this look like Jeff Suppan, Miguel Batista, or Carlos Silva?
My comments:
Wow. After hitting a home run last week with his delightful Brett Cecil article, Brian completely lost my interest around the mid point of this article. I couldn't finish it.
Content: C - way too much about so little. Apparently, Brian came up with a well calculated conclusion about Ruth's performance as a pitcher. I didn't have the patience to get there.
Writing B - Brian writes clearly, but lacked his usual spark of personality.
Granted, you did the best with a very tough area to translate, my only problem is the conclusion drawn. That you have convinced many posters here that Ruth wasn't that good of a pitcher is a little troubling to me, because I'm not sure that is correct.
Of course it's not really fair. Take a young Walter Johnson to the present day and give him a modern ballplayer's upbringing and he could be the same Cy Young-winning legend. Or perhaps he'd never get the feel for a change-up and wash out of Rookie ball. It's impossible to say. But that's what this article was trying to do -- take a stab at what a pitcher with Babe Ruth's (pitching) talents might look like today. It's not necessarily terribly meaningful, but I thought it a fun exercise.
What we know is that he was one of the most dominant pitchers of his day. How does his dominance compare to Koufax's, or Santana's? That's what translations are for.
Nice job.
Easy thumbs up... but I'm running out of thumbs.
I will say this though - I couldn't write even one structurally sound or interesting piece - so I give mega-kudos to all of the contestants. There have been some very good works by the remaining writers in previous weeks, thank you all for some good reads.
Paragraphs 2 through 9 dragged on and on; there has to be a more concise way to say that. Especially since, as you nearly said explicitly, you simply can't translate Ruth's K/9 or HR/9 to a translated modern equivalent and make comparisons to modern pitchers with those rates.
Modern pitchers are effective by minimizing baserunners and HR, because crooked numbers lurk in every PA. Dead-ball pitchers had different primary worries, and were successful in different ways. Was Ruth declining in his ability to do the things that made pitchers successful in 1917? It would have been nice to learn the answer to that.
Also, I must disagree with the idea that the discussion should have been significantly condensed. It probably could have done without the little rehash of what DTs are for -- we're at BP, we know that -- but that's nitpicking. When you're doing your own translation, as Brian is here, you need to be really explicit about what you're doing and why. He managed to do it thoroughly without getting caught up in the minutiae. Superb job.
However, given the way his conclusion flies in the face of conventional wisdom, it would be stronger if he included some translations of others from Ruth's pitching era, be it good pitchers, like Stan Coveleski and Hippo Vaughn, or inarguable greats, like Walter Johnson and Pete Alexander.
It would be interesting to see if Johnson graded out as a number 3 starter. If he did, then I think, though the methodology is mathematically sound, further tweaking to compensate for the strikingly different era is warranted.
Still, this was overall a fine piece of writing, as Brian does seem to have kicked his writing up a few notches these past 2 weeks.
I don't think it is OK, given that the Babe finished 2, 1, 3, 4 in H/9 in his four full seasons of pitching. He's much better than teammates at preventing hits in '16-'18 (15-20% better than the rest of the team, and none of the other starters were bad pitchers). So, the more I look at it, the more I have to question that it's not at least addressed that his record is built by avoiding hits, whether by luck or skill. I mean, how else does a pitcher with average HR, BB and K rates get so much better results than the other pitchers on the staff?
I don't understand the shift from league averages (via DT) to a sample that only includes 150+ innings hurlers...that's definitely not kosher. Maybe there's no difference between this 150+ IP sample and league average, but you don't say that.
And it has to be worth a word on the HR translation being a bit silly because we're talking about 8 homers in 1200 innings. And 0 HR in 1916 translates into 0.72 HR/9 somehow...I understand the idea, but a few caveats seem in order...obviously these translations break down at certain extremes, and HR rates from the deadball era are one of them. For example, if he'd allowed 1 HR that year, he would have had a MUCH higher translated ERA for you, right?
The writing is clear and concise, though more slangy than I like it. I certainly never doubt what you're trying to say, which is a solid plus.
I don't think you've proved your point, and I get the feeling that the stuff you've left out was left out because it doesn't fit your conclusion...and that's no good.
Could the Babe limit hits of opposing batters in his era? Maybe. His hit translations come up to around 8 H/9. That could just as likely be due to luck and defense. In either event, it is unlikely that the ability would convert to today's game so I needed to omit it in order to eliminate luck and defense from the equation, which, of course, was part of the point of the article.
As far as using starting pitchers in my comparison, the DTs make him into a starter and convert his innings. Furthermore, his original stats as a starter were converted. That is to say that relief pitchers have different data attached to their rate stats because of coming in in the middle of an inning and so forth. Consistently comparing him to SPs seemed like the way to go.
"For example, if he'd allowed 1 HR that year, he would have had a MUCH higher translated ERA for you, right?" That is, perhaps, a better question for Davenport, but for my part, it seemed appropriate that they translated close to league average. I assume that there is a lot of reversion to the mean going on there. I would have loved some ground ball data to use, but I couldn't even find anecdotal evidence of whether he was killing worms. You do the best you can with what God or the record-keepers of the time give you.
I don't get your response on H/9. I can't see how it would be explained by team defense given his huge margin of outperformance of his teammates. But my main point - maybe the only point that's really fair, given the time constraints you had - is that you can't just ignore this thing that happened.
On the translation - thanks for clarifying...I had assumed that the DT was done using straight league averages.
On the HR/9, you could do the DT with 1 HR vs his actual 0 HR in 1916, just as an exercise to measure the sensitivity. Since the league average is so low, each HR has a disproportionate effect on his translated stats and translated ERA. It really does call into question the whole thing, in a 'why bother' kind of way.
Interesting topic, provocative analysis, clearly written. I don't think you proved the point, but I liked the piece. Thumbs up.
First, I have concerns about how well DTs work when stretched into an era SO different from the one we are all concerned about. For example, HR/9 can be converted to what it would be, relative to the era, in a "neutral" year, but what we haven't converted for is the relative importance of the stat. I mean, you can figure out how to convert HR/9, but you need to take account somehow of the fact that it was a much less meaningful or important stat in 1917 than it is today, perhaps comparable to triples/9 innings for pitchers today, something that is not really tracked and probably pitcher-to-pitcher differences are virtually random (maybe why Babe was so close to average on his converted numbers).
Second, the entire discussion focuses on rate stats, rather than counting stats, and thus robs the Babe of credit for pitching so many innings (which was mentioned in passing). Surely a pitcher with the same rate stats is contributing a lot more to his team if he maintains that pace over 1/4 of his team's innings or more (per 1917) as compared to a typical-for-today more like 1/6 or 1/7 of his team's innings.
Ruth's 1916 numbers translated to 7.2 K/9 and 2.2 K/BB, with a 26% hit rate (BABIP) and 74% strand rate. His 1917 numbers translated to 5.9 K/9 and 1.9 K/BB with a 26% hit rate and 73% strand rate. Just Walter Johnson's lines -- from a peer in the exact same seasons -- showed the gap between Ruth and a true all-time great. Johnson's 1916 numbers translated to 9.1 K/9 and 6.7 K/BB, but with a 28% hit rate and ugly 65% strand rate, and his 1917 numbers translated to 8.7 K/9 and 5.1 K/BB with a relatively "normal" 29% hit rate and 71% strand rate.
As your article suggested, Ruth was the beneficiary of good luck/defense in addition to being a good-but-not-great pitcher.
Year - Innings - H/9 - team H/9 ex-Ruth - % below team ex-Ruth
1915 - 217.2 - 6.9 - 7.6 - 10%
1916 - 323.2 - 6.4 - 8.2 - 22%
1917 - 326.1 - 6.7 - 7.8 - 14%
1918 - 166.1 - 6.8 - 7.6 - 11% (1918 was a 124-game season)
4 sea - 1034 - 6.66 - 7.82 - 15%
In 1919, he started 15 games, and simply didn't have it - he gave up 10 hits/9 with the highest BB/9 and lowest K/9 rates of his career. Maybe because he got 500+ ABs. Or maybe Ruth had peaked as a pitcher in 1916 - his K/9 and K/BB rates were falling fast from that peak. But it's generally attributed to him not dedicating himself to pitching - he got PH ABs in 1916-1917, but didn't appear in the field until 1918, when he started 19 games on the mound and played in 72 in the field (and got his second ring in three seasons).
Anyway, I would suggest that the burden of proof is on those of you who say that Ruth's low H/9 rates are luck and defense. You both seem to have it as an article of faith.
And THANK you (speaking as a casual baseball stat-head) for explaining NRA and DT in detail rather than passing through and assuming we knew what the hell you were talking about.
Other than that, the article rambled a little. Good stuff anyway.
As a sim-baseball player, I find Ruth to be over-valued as a pitcher, generally. Definitely better options to choose from in his peers, as you point out.
Question for you, however, about one of the details of the translations from the deadball era in regards to home runs allowed. Since HR/9 is a key component of your analysis, and correctly so, understanding how the DT handles HRs seems to be critical, including park factors. Do the DTs take into account the park effects for the pitcher's HR rate? If so, how?
More specifically, in his best season (1916), Ruth allowed zero home runs over 323 innings, with theoretically half these games at Fenway. Yet the DT for that season has him projected for 22 HRs over 274 innings. Why?
At any rate, the fact that I want to know more about the details of the analysis means I'm giving you another thumbs up.
While I am not privy to the exact DT calculations, my understanding is that they take season, era, ballpark, and pitcher into account and then do a good deal of regressing to the mean. In other words, 22 HRs is about as well as you could expect Ruth to do if you played that season over again in a different era.
PECOTA might take a first year professional player who didn't allow any HRs in A-ball and project a poor HR rate for the majors in the following year. It is using the same types of principles (not counting ground ball rate).