Recently, Mariano Rivera revealed that 2013 would be his final season. It wasn’t unexpected news, in that Rivera is 43 years old and coming off a serious injury that caused him to consider retirement in 2012. But the report, however predictable, hit many fans hard. Not only is Rivera respected and beloved both inside and outside of New York (a relative rarity for a big, bad Yankee), but he’s shown so little erosion in his skills that it’s possible to picture him throwing his cutter until he turns 50. Most players go through a decline phase, which gives us time to get used to the idea that it’s about to be over. Rivera really hasn’t, except in the sense that he’s less durable than he once was.
Rivera’s announcement inspired many written responses, one of which was an email to me from a reader named David Greene. “Rivera’s true ranking among pitchers all-time,” the subject line said.
I can't get my arms around the idea that 60 (or so) starting pitchers in the history of baseball are "greater" than Rivera, as career WARP stats would say. … So maybe the real answer to my question is how many relievers relative to starters ought to be included in any all-time team of 25 or 30 players? Is that a question for analysis or only for opinion?
How good is Rivera, really? And is it possible to compare him to baseball’s best starters?
Last season, Aroldis Chapman was probably the best reliever in baseball. (“Craig Kimbrel” is also an acceptable answer.) PWARP put him at 2.6 wins, which made him the 27th-most-valuable pitcher in baseball, by that metric. He was so good as a reliever, in fact, that he nearly placed out of the bullpen, briefly becoming a candidate to start this season.
In 1996, Mariano Rivera was worth almost twice what Chapman was last year. That season, AL pitchers allowed 1.21 home runs per nine innings, the highest rate ever. Rivera allowed one home run in 107 2/3 innings, the lowest rate of any AL pitcher in the DH era. He posted the highest strikeout rate and the lowest FIP of his career, and he pitched almost 30 more innings than he has in any season since. Then he added 14 scoreless frames in the postseason. If you count those October innings, Rivera’s 1996 was the most valuable season ever by a pitcher who didn’t make a single start.*
*If you don’t count them, it was the second-best such season, behind Dick Radatz’ 1964. That year, Radatz, who’d converted to the bullpen after experiencing a sore arm as a starter, pitched 157 innings, all in relief, with a 2.29 ERA and 10.4 strikeouts per nine innings (more than a K/9 higher than any other pitcher who pitched at least 150 innings in ’64). Radatz’ 1962-1964 campaigns were three of the six most valuable relief seasons ever, worth a combined 13.2 wins. But like most successful relievers, he burned out after a few years. Supposedly, an attempt to add a sinker in 1965 irreparably altered his mechanics, hobbling his fastball. It’s always something. Well, almost always. We’re getting to that.
Rivera hasn’t come close to equaling the value of his 1996 season since. Here are the zero-start pitchers with the highest WARPs in each of the past 16 seasons:
1997: Danny Patterson
1998: Robb Nen
1999: Keith Foulke
2000: Gabe White
2001: Kyle Farnsworth
2002: Eric Gagne
2003: Eric Gagne
2004: B.J. Ryan
2005: B.J. Ryan
2006: Takashi Saito
2007: Carlos Marmol
2008: Brad Lidge
2009: Jonathan Papelbon
2010: Carlos Marmol
2011: David Robertson
2012: Aroldis Chapman
According to WARP, Rivera wasn’t the most valuable reliever in baseball in any season from 1997-2012 (although he tied Lidge to the first decimal place in 2008). Granted, FRA (and, consequently, WARP) can be a bit myopic when it comes to pitchers who reliably allow low BABIPs, as Rivera does. And WARP treats a scoreless ninth the same as a scoreless first, so it gives Rivera no extra credit for his high-leverage outings. So Rivera may have been, and probably was, better than the next guy in at least a couple of those seasons, October aside. But there was no uncertainty in 1996: Rivera was leaps and bounds better than the next guy. His brilliance since has been less about being the absolute best in any one season than it has about never, ever being bad.
That’s not really Rivera’s fault. After the 1996 season, Yankees closer John Wetteland became a free agent and signed with Texas, and Joe Torre decided that he’d rather Rivera record saves than outs. Rivera never pitched more than 80 2/3 innings (in the regular season) again. In retrospect, 1996 is especially tantalizing: imagine how different bullpens might look today had Rivera remained in the 100-inning relief ace role and inspired copycats rather than staying behind the bullpen door in non-save situations. And imagine what he might have been worth to his team: he would have been immensely valuable as long as he sustained his success. But that’s the thing: we don’t know whether he would have lasted this long had he not been coddled as closer. In the ’96 Annual, we said Rivera looked “way too skinny to be durable,” and we probably weren’t the only ones who thought he’d have trouble repeating that season’s performance.
*Another interesting fact about the 1997 season, when Rivera became closer: that was also the year when he discovered his cutter. Which means that his best season—just about the best season—came without the cutter, the signature, semi-mystical pitch that gets the bulk of the credit for his success. If only Mariano Rivera hadn’t been cursed with a cutter!
Pure speculation: if he’d never come up with the cutter, Rivera might have had higher strikeout rates and been just as good or better for at least a few years. But no way would he still be pitching, let alone counted on to close, at age 43. His control could have compensated for some velocity loss, but the strain of throwing thousands of pitches harder than he had to with the cutter would have taken its toll.
Given the sample sizes involved in a typical relief season, you’d think Rivera would have had at least one year when a bunch of bloopers fell in, or a few extra balls flew out of the park, and his superficial stats suffered. But he hasn’t, except (sort of) for 2007, when he had a .322 BABIP and finished with the worst full-season ERA of his career: 3.15. (Jeff Reardon, Troy Percival, and Randy Myers, who rank seventh, eighth, and ninth, respectively, on the career saves leaderboard, all have career ERAs higher than Rivera’s ERA in his worst season.) Stephen Loftus took a look at Rivera’s consistency last week at Beyond the Box Score and found that no matter what metric he used to assess it, Rivera’s performance was less variable than anyone else’s. Great relievers come and go, but only Rivera endures.*
*That said, Rivera's retirement isn't a reason for Yankees fans to despair, given how sparingly he's been used in recent seasons. While it might be impossible to replace Rivera with a reliever as good, it's not that hard to find someone who can keep the fallout from his absence to a minimum, as Rafael Soriano did last season. Earlier this month, The Star-Ledger's Steve Politi wrote that "it is hard to comprehend a bigger challenge in sports history" than replacing Rivera, which seems a bit extreme, especially with David Robertson already on the roster. If the best reliever in baseball is a two-to-three win player, how big can the dropoff from him to, say, the 15th-best reliever be? Replacing Robinson Cano would be a much bigger challenge.
Rivera’s career overlaps almost perfectly with the 18-edition run of the BP Annual. It’s fascinating, now, to look back and see what we said before it became clear that Rivera was an outlier, not just a guy on a good run. Here are a few excerpts from his player comments, starting with Baseball Prospectus 1998—which, remember, was published after Rivera had followed the best relief season ever with another excellent one.
1998: “I think he needs a better second pitch and more work. … Without one, I think he’ll decline further this year.”
1999: “Something very bad is happening here. … I think Rivera is experiencing a loss in effectiveness, one that is going to start showing up on the scoreboard this season.”
2000: “…Can a pitcher survive on one pitch if that pitch is perhaps the best in baseball? I am still skeptical…”
2001: “The career paths of many top closers have included a high, relatively brief peak not unlike Rivera’s last five seasons. … At this point, Mariano Rivera doesn’t look much different than Gregg Olson after 1993 or John Wetteland after 1998, and no one is offering those names up for immortality.”
It wasn’t until 2004, after Rivera had been in the bullpen for eight full seasons, that we felt comfortable enough to say “expect more of the same.” And by then, Rivera was 34, old enough for us to start worrying in subsequent comments about when his body would break down. When we look back, it seems like a given that the Yankees could count on 70 or so dominant regular-season innings from Rivera, year in and year out, and that the cutter would never wear thin. But consistency is kind of like clutch: some players might have it, and you might think you can tell which ones, but it’s tough to know for sure until a lot of time has elapsed. And at that point, it’s too late to act on the information.
Predictable or not, Rivera’s consistency has been worth quite a bit: he’s the only pitcher since 1950 with at least 30 career PWARP and fewer than 269 career starts. (Sorry, Goose; you don’t quite cut it.) But 57 starters, from Roger Clemens to Camilo Pascual, rank ahead of Rivera on that list. It’s not hard to see why: including October, Rivera has pitched a total of 1360 2/3 innings in his 18 seasons. Even for a latter-day ace like Justin Verlander, that’s six seasons’ worth of work. You’d have to give Rivera a ton of extra credit for the timing of his appearances to make up for that difference in workload.
Maybe it’s because he didn’t have the durability or the arsenal to handle a heavier workload; maybe it’s because the era in which he pitched (and the managers he played for) artificially imposed a cap on how much he could contribute. Regardless of the reasons, Rivera’s career value can’t compare to that of even a very good starter. Nor can his peak value: even in 1996, when Rivera was the best a reliever could be, his PWARP placed behind those of several starters. But David asked a different question: Does Rivera (or any reliever) deserve a spot on an all-time team?
Let’s say this team would look roughly like most teams do today, with a five-man rotation and a bullpen seven or eight strong. And let’s say we’d select the five best starters in history—Cy Young, Walter Johnson, Roger Clemens, Randy Johnson, and Greg Maddux, maybe—to reprise their roles. How would we want to stock the rest of the staff, knowing we need the rest of the pitchers to go only an inning or two at a time? Would we want Nolan Ryan and Tom Seaver setting up for Pedro Martinez, with Lefty Grove and Steve Carlton coming in to attack tough lefties? Or would there be room for some real relievers?
To answer this question, we’d need to know how well all those starters would have pitched in short bursts. And we can’t know that, not really; we never saw them do it, so it’s impossible to say with certainty how they’d respond. But we can take what happens to the typical pitcher making the transition to relief and apply it to the potential members of our all-time team bullpen.
In a 2006 two–part series on pitchers who moved from the bullpen to the rotation (and vice versa), Nate Silver laid out this rule of thumb for estimating post-conversion performance:
The typical pitcher will have an ERA about 25% higher when pitching in a starting role than when pitching in relief. That is, if you take a given reliever with a 3.00 ERA, your best guess, all else being equal, is that his ERA as a starter would be 3.75.
If you take a starter whom you know nothing else about, you can expect him to knock off about 25% of his ERA when he pitches in relief.
Rivera has a 2.21 career ERA, pitching in a hitter’s park in a good offensive era, in a difficult division and against the DH. Factor in his 11 earned runs in 141 postseason innings (against even tougher competition), and his career ERA falls to 2.06. If any starting pitcher could match that, it would be the one with the best park- and era-adjusted ERA other than Rivera’s (min. 1000 innings): Pedro Martinez, who, fortunately for us, happens to have pitched at essentially the same time.
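The blended figure comes from pooling earned runs and innings, not from averaging the two ERAs. Here is a minimal sketch of that arithmetic, assuming the 1360 2/3 total innings cited earlier and backing the regular-season earned runs out of the rounded 2.21 ERA (an approximation, since the article does not give the raw total). The function and variable names are illustrative only.

```python
def era(earned_runs: float, innings: float) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings


# Figures quoted in the article.
TOTAL_INNINGS = 1360 + 2 / 3   # regular season plus postseason
POST_INNINGS = 141.0
POST_EARNED_RUNS = 11
REG_ERA = 2.21                 # already rounded to two decimals

# Derived here, not quoted in the article.
reg_innings = TOTAL_INNINGS - POST_INNINGS
# Approximate: back-calculated from a rounded ERA, so not a whole number.
reg_earned_runs = REG_ERA * reg_innings / 9.0

postseason_era = era(POST_EARNED_RUNS, POST_INNINGS)
combined_era = era(reg_earned_runs + POST_EARNED_RUNS,
                   reg_innings + POST_INNINGS)

print(f"postseason ERA: {postseason_era:.2f}")
print(f"combined ERA:   {combined_era:.2f}")
```

Because the input ERA is itself rounded, the result should land within about a hundredth of a run of the 2.06 quoted above, not necessarily on it exactly.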
Pedro’s career ERA is 2.93. Cut a quarter from that, and you get 2.20—essentially identical to Rivera’s career mark in the regular season. But plug in both pitchers’ postseason innings, and the fact that Martinez pitched over half his innings in the NL, and Rivera retains a clear lead. Using Nate’s rule of thumb, then, we wouldn’t expect any starter in history to match Rivera’s career ERA in relief under identical conditions. And in that case, Rivera—and probably Rivera alone, among relievers—deserves to make the all-time team. Other consistently low-ERA relievers like Hoyt Wilhelm, Trevor Hoffman, and Dan Quisenberry don’t come close to competing with the best of the “converted” all-time-team starters. (Billy Wagner has the best case, among non-Rivera relievers, but he barely topped 900 innings.)
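The conversion used above is simple enough to sketch in a few lines. One wrinkle is worth making explicit: the two halves of the quoted rule are not exact inverses. Adding 25 percent to a reliever's ERA is undone by dividing by 1.25, which removes 20 percent, not 25. The sketch below shows both readings. The function names are hypothetical, and the 25 percent figure is the only input taken from the quoted rule.

```python
STARTER_PREMIUM = 0.25  # from the quoted rule of thumb


def reliever_to_starter(relief_era: float, premium: float = STARTER_PREMIUM) -> float:
    """Estimate a reliever's ERA as a starter: about 25% higher."""
    return relief_era * (1.0 + premium)


def starter_to_reliever_flat(starter_era: float, premium: float = STARTER_PREMIUM) -> float:
    """Reading used in the article: knock a flat 25% off the starter's ERA."""
    return starter_era * (1.0 - premium)


def starter_to_reliever_inverse(starter_era: float, premium: float = STARTER_PREMIUM) -> float:
    """Stricter reading: exactly undo the 25% starter premium."""
    return starter_era / (1.0 + premium)


pedro_career_era = 2.93  # figure quoted in the article

print(f"3.00 reliever as a starter: {reliever_to_starter(3.00):.2f}")
print(f"Pedro in relief (flat 25% off): {starter_to_reliever_flat(pedro_career_era):.2f}")
print(f"Pedro in relief (exact inverse): {starter_to_reliever_inverse(pedro_career_era):.2f}")
```

The flat reading reproduces the 2.20 used in the comparison. The inverse reading gives a somewhat higher estimate for Martinez, which would only widen Rivera's lead.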
Of the many starters with higher career WARPs than Rivera, it’s likely that some of them would be better in the bullpen than he was, but it’s hard to know how many, or which ones. Nate’s findings suggest that the ones with the best out pitches would have an edge, but it’s tough to top Rivera’s cutter.
We can try to test this empirically. With the help of Ryan Lind and Andrew Koo, I identified every starter from 1982-2012 who made a regular-season or postseason relief appearance in a season in which he pitched at least 150 innings, started at least 90 percent of his games, and had a FRA+ of at least 120. This limited the sample to the best of the best: a handful of starters each season. Last year, only five pitchers would have satisfied all the standards: Verlander, Stephen Strasburg, Max Scherzer, Chris Sale, and Gio Gonzalez.
I ended up with a group of 19 pitchers, two of whom qualified in more than one season. The list reads like a who’s who of starting pitchers of the last three decades: Randy Johnson four times; Roger Clemens twice; Pedro Martinez, Bret Saberhagen, and Tim Lincecum during Cy Young seasons; Curt Schilling, Nolan Ryan, CC Sabathia, Roy Oswalt, Mike Mussina, and more. With standards so high, the result was a very small sample: 29 appearances and 63 innings, or roughly as many as Rivera pitched in 2011 alone.
In those relief outings, the aces walked 4.3 batters per nine, struck out 12.4, and posted a combined 2.00 ERA. Essentially, they were almost exactly as effective as, well, Mariano Rivera. Keep in mind that we’re comparing starters at the height of their powers to Rivera’s entire career, and that those aces who did pitch out of the bullpen were likely the ones who were expected to take to it well. Considering that and the small sample, if my all-time team needs three outs, I’m still signaling for Rivera.
On the first episode of ESPN’s Behind the Dish podcast, Joe Sheehan told Keith Law, “If you had a choice between—just to pick a teammate—Andy Pettitte’s career or Mariano Rivera’s career, I think you take Andy Pettitte’s career.” The value stats agree; WARP gives Pettitte almost a 20-win edge. But when the two teammates hit the Hall of Fame ballot, perhaps simultaneously, Rivera will almost certainly be inducted immediately, while Pettitte’s candidacy could linger until his years of eligibility are up. Both pitchers, despite the different ways that they’re used, have essentially the same job—to get batters out—and Pettitte has retired many more of them. It’s not necessarily fair that Rivera will waltz into the Hall while Pettitte watches and waits, but no one will object. Rivera, after all, is the one with the case for the all-time team.
So, back to David’s question. How many relievers ought to be included in any all-time team of 25 or 30 players? Just one, I think—and it’s exactly the one you’d expect. Enjoy watching him while you can.
Thanks to Andrew Koo, Ryan Lind, and Colin Wyers for research assistance.
The only (lesser) comp I can think of is Joakim Soria.
Would it make sense to multiply pitchers' WARP by the leverage value for the situation they inherit?
"And WARP treats a scoreless ninth the same as a scoreless first, so it gives Rivera no extra credit for his high-leverage outings. "
Well, that would suggest that WARP alone is a flawed way to make the "value" evaluation. Rivera's leverage adjusted WARP will compare favorably to starters who pitch, on average, in dramatically lower leverage situations.
It also gives the player credit/blame for something beyond his control. If the Yankees adopted a bizarre bullpen setup where Mariano Rivera always pitched the 6th inning if the game was close, he still would have to get 3 outs. Presumably, he would strike guys out, induce weak contact and keep his low BABIP, and suppress HR rates. How much "less great" would he be only because of his manager's odd decisions?
Not sure how to insert a link, sorry.
Rivera has pitched 1219.7 regular-season innings; Pettitte has pitched 3130.7. That's an enormous difference. Pettitte has pitched over 2 1/2 Mariano Rivera careers! Shouldn't that matter more than the fact that Rivera's innings mostly came toward the end of the game?
Here, everyone did the same job as in a 'traditional' 4-3 victory (*), but you're assigning less value to Rivera because his inning came first. Suddenly, Pettitte has a much higher 'leverage' despite throwing the same pitches and getting the same results. That's the disconnect.
Further, ask yourself if Rivera is more or less important than the 'setup' guy in this hypothetical game.
Now, there are real differences in pitching the 9th. The offense can alter their strategy to 'play to the score'. The offense can deploy pinch-hitters and pinch-runners aggressively. They can make substitutions with minimal concern for the resulting defensive implications. But I don't think this adds up to "twice as important" when the runs are added up at the end of the game.
(*): Actually, Rivera would have been forced to pitch to the top of the lineup, so he'd have a harder-than-average task. In a close game, this is an argument for having Rivera pitch the 8th if the #1 or #2 hitter is leading off, and letting the 'setup' guy handle the 9th against weaker hitters. Nobody does this, but it would help.
If this has already been established please point me to the evidence.
I thought that coming into the seventh inning with one out and men on second and third would make it a much higher-leverage inning, along with many other situations that are far more likely to see a lead overturned than starting the ninth inning with a clean slate, which has a known success rate of 95%.
Am I wrong about that?
We can discuss the relative value of leverage, which is underexplored, but to rely on a stat that implicitly assumes the value to be 0 seems to be severely flawed, again, unless you can prove pitching the 3rd inning of any random game facing the 7-8-9 hitters has exactly the same value as facing the final 3 hitters of a game when the opposing manager has every incentive to use the three most favorable, available, matchups against you, every time.
By not assigning value to leverage, the designers of WARP are essentially saying leverage averages out. I doubt this is actually the case, but I can imagine calculating a pitcher's total Leveraged Innings Pitched Score would be extremely difficult to do.
I imagine over the course of a career, it would add something worthwhile to the analysis without requiring too much additional processing.
I suspect for starters, the influence of leverage pretty much averages out. However since relievers enter games under a wider range of situations and pitch fewer innings, I suspect the variance is greater.
"Until someone can conclusively argue there is no difference between high and low leverage innings in value..."
I don't think anyone should have to prove the negative; burden of proof lies on the assertion that a player should be given extra credit due to situational context outside of their control.
In other words, you have to adjust for the context in which the player is placed, otherwise it's not a fair comparison. We do this all the time with park factors.
In fact, I think you'd need to identify how much better Mariano would have been compared to a pitcher in that contextual situation.
Rivera was probably a slightly better closer than Smoltz, so if you don't mind extrapolating from one data point, you'd guess that Rivera would translate to a slightly better starter.
Of course, this whole analysis (mine as well as the one above) is predicated on the assumption that "starter" and "reliever" are the same job. To me, that's like saying "outfielder" and "shortstop" are the same job because they both come to the plate and hit the ball. A 2.50 ERA from a starter is different from a 2.50 ERA from a reliever in the same way that an .850 OPS from a 1B is different from an .850 OPS from a SS. For position players, we make adjustments to their WARP based on their position. It seems that we haven't found an intelligent way to make a similar adjustment for pitchers, though.
A starter pitches three times as many innings as a closer, and "leverage" can't possibly make up for that because starters face more leverage too. Roger Clemens once said the two most important keys to winning were shutting down opponents in the first inning and shutting them down right after your team scores. Getting ahead early sets the tone for the rest of the game, so the first inning is just as important as the ninth inning. Same for the inning after scoring, as momentum is a key to victory. Then there are those games where the other team's most dangerous hitters come up in the seventh or eighth inning as opposed to the ninth. There's a BP book with a chapter that points out that this inning, rather than the ninth, is the better place for your best reliever. That's three leverage situations more likely to go to a starter than a closer, so starters not only face three times the innings, but three times the leverage.
Starters can't throw as hard because they need to last six innings, and they need a third pitch because getting the same hitter out three times a day is more than three times harder than getting him out once. Closers go one inning so they can go all out and usually only need two pitches. Saying the ninth inning is more important than the other eight innings is like saying home runs in the ninth count more than homers in other innings.
We are talking about a much different era in baseball history, although by ’64 most teams had at least one pitcher to close out games. The great thing about Radatz was his control and his pitching motion, which made it difficult for hitters to pick up his fastball.
This is where closers are different. By definition they are being brought into a game in a situation where ANY fractional WAR they collect (either positive or negative) contributes directly to a win or loss.
This is why Rivera was so devastating. He is/was a magnificent pitcher AND those very valuable innings were being focused directly where they counted.