Analysis Begins with Questions

One of the great things about baseball analysis is that you can take a relatively simple question—like “who is the best relief pitcher in baseball?”—and break it down into dozens of sub-questions. Put another way, we have to answer questions about methodology before we can answer the substantive question at hand.

An ideal quantitative methodology for determining the best relief pitcher in baseball would rely on information known to be predictive of future success, including peripheral stats like strikeouts and walks. But it’s also true that, because even peripheral stats can bounce around a fair bit over a limited number of innings, we want to know something about outcomes. Outcomes give us a measure of performance that doesn’t entirely overlap with peripherals. By using win/run expectancy to value relievers, we are not simply duplicating the information provided by peripheral stats while adding noise to the mixture. On the contrary, we are valuing relievers on a different axis: the crucible of high leverage.

But there’s another, and I believe overlooked, reason to focus on outcomes. To see why, we have to start with a very particular sort of answer to the question posed at the outset of this column (“who is the best relief pitcher in baseball?”). So let’s borrow some methodology from an article I wrote back in March to look at some of the best relievers not yet slapped with the closer tag. We’ll take WXRL and SIERA for all pitchers who have pitched solely in relief. For each pitcher, we’ll calculate how many standard deviations they are away from the mean in each category. Then we’ll add them together. For example, a pitcher who was one standard deviation better than the mean in both SIERA and WXRL would get a score of two.
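The scoring just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the exact procedure behind the list: the reliever names and stat values are invented, the population standard deviation is an arbitrary choice, and SIERA is sign-flipped on the assumption that a lower SIERA is what counts as better than the mean.

```python
from statistics import mean, pstdev

# Hypothetical sample rows: (name, WXRL, SIERA). Values are invented for illustration.
relievers = [
    ("Reliever A", 4.0, 2.0),
    ("Reliever B", 2.0, 3.0),
    ("Reliever C", 0.0, 4.0),
]

def z_scores(values, higher_is_better=True):
    """Standard deviations from the mean, signed so that positive means better."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    sign = 1.0 if higher_is_better else -1.0
    return [sign * (v - mu) / sigma for v in values]

def combined_scores(rows):
    """Add the WXRL and SIERA z-scores for each pitcher and sort best-first."""
    wxrl_z = z_scores([r[1] for r in rows], higher_is_better=True)
    siera_z = z_scores([r[2] for r in rows], higher_is_better=False)  # lower SIERA is better
    scored = [(r[0], w + s) for r, w, s in zip(rows, wxrl_z, siera_z)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranking = combined_scores(relievers)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

Because the two z-scores are simply added, a pitcher who is one standard deviation better than the mean in both categories ends up with a score of two, matching the example in the text.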

Do that, take the top 10 scores for the current season, and you get the following list:

Brian Wilson
Heath Bell
Carlos Marmol
Joakim Soria
Hong-Chih Kuo
Matt Thornton
Joaquin Benoit
Daniel Bard
Mike Adams
Luke Gregerson

This is a decent list. The kind of list you can live with for a while, thinking to yourself that there are some well-established guys (Bell, Marmol, Soria) as well as some targets of man-crushes (Thornton, Benoit, Adams). But then you notice the glaring omission. See it yet?

One Size Fits Almost All

The problem with this list, and with any list like it created in the last 10 years really, is that you could just throw it out. Who needs a list like this when the answer to the question “who is the best relief pitcher in baseball?” is so simple? You could just give the same answer—Mariano Rivera—every year and be right in a very real sense every year. It’s not that Rivera fares especially poorly by this metric. He actually comes in 13th out of nearly 200 qualified pitchers. It’s that by not putting him closer to the top, the list fails to pass the smell test—it appears wrong on its face. Instead of a definitive ranking, we get a neat-o list with little meaningful difference between the individual rankings. That’s disappointing.

Part of the reason Rivera doesn’t rank higher is that his SIERA is not especially fantastic this year. At 3.01, Rivera clocks in at 31st among relievers with at least 20 innings. By more traditional metrics, of course, Rivera is having a superlative year: 1.06 ERA, 36/7 K/BB in 42 1/3 IP. That isn’t enough for SIERA, however. His relatively low K rate hurts him, and he gets no credit for the exceptional job he’s done at preventing home runs (just the one this year, thank you very much). He’s also having a career year in terms of preventing hits: just 23 all season. So what gives?

Let’s start with the hit rates. Rivera has been consistently excellent at preventing hits. His career BABIP is just .263, and he has allowed fewer than seven hits per nine innings. To illustrate the point, let’s compare Rivera’s seasonal hit rates to the distribution of hit rates for all relievers from 2007 through this year.

Now, I suppose it’s possible that if we could clone Rivera and run his career over again, we’d end up with a spread closer to the mean for all relievers. But I didn’t single out Rivera’s hit rates because they were low; I singled them out because all of his other pitching attributes have been so dominant for so long. That means our selection bias sensors can calm down for just a second, and we can marvel at what is a pretty darn impressive ability to limit hits.

Of course, any pitcher who strikes out enough batters will have a similar sort of distribution, since hits per nine innings is influenced by the total amount of contact made by opposing batters. To illustrate this point, let’s compare Rivera’s chart above to a similar chart for another highly successful reliever: Billy Wagner.

Wagner’s career .265 BABIP is just barely behind Rivera’s mark, and his average H/9 of 6.0 is actually lower than Rivera’s, as reflected in the chart above. What we can learn from these pitchers is that the shorthand that pitchers don’t control outcomes on balls in play breaks down at the extreme represented by elite relief pitchers. These pitchers—who basically only pitch with maximum effort in higher than average leverage situations—can in fact show the ability to limit hits more than starting pitchers. This is true even above and beyond the ~.005 split in BABIP between relievers and starters. That means estimators that model reliever performance on the assumption that hit rates revert heavily back to the mean may overlook this particular skill. But it also means Rivera isn’t unique—in other words, that can’t be what keeps him from the top of the chart above, since plenty of relievers exhibit the same ability.

Now we should ask whether there is any other skill that Rivera has that we might be overlooking. Sure, he has posted above average ground ball rates his entire career, but that is definitely reflected in his SIERA (which includes ground ball rates as well as various ground ball interaction terms). What about home runs? Rivera has allowed just 61 home runs in his career—one that has spanned 1,132 innings. Since 1996, he hasn’t allowed more than seven home runs in a single season, and from 1996-2009, he has allowed three home runs or fewer in eight different seasons.
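For readers who want to check the per-nine-inning rates behind these figures, here is a small worked sketch. It uses only numbers quoted in this column (23 hits and a 36/7 K/BB in 42 1/3 innings this season; 61 home runs in 1,132 career innings); the function name `per_nine` is just a label chosen for this illustration.

```python
def per_nine(count, innings):
    """Scale a counting stat to a nine-inning rate."""
    return 9.0 * count / innings

# "42 1/3 IP" means 42 full innings plus one out, i.e. 42 + 1/3 innings.
season_innings = 42 + 1 / 3

season_h9 = per_nine(23, season_innings)   # hits per nine innings this season
career_hr9 = per_nine(61, 1132)            # career home runs per nine innings
season_k_bb = 36 / 7                       # strikeout-to-walk ratio this season

print(f"Season H/9:   {season_h9:.2f}")
print(f"Career HR/9:  {career_hr9:.2f}")
print(f"Season K/BB:  {season_k_bb:.2f}")
```

Under these inputs the season hit rate comes out a little under five per nine innings and the career home run rate a little under one-half per nine.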

In fact, if we look at the distribution of Rivera’s seasonal HR/9 rates versus all relievers from 2007 through this season, we notice just how exceptional Rivera has been at limiting home runs.

Now if we compare that with a similar chart for Wagner, we’ll see a big difference.

Now, some of that is due to the differences in ground ball rates Rivera and Wagner have put up over the course of their careers. But whatever your favorite ERA predictor/post-dictor/third-world dictator may be, most assume in one way or another that limiting fly balls—not home runs—is the relevant skill. And while that appears to be true for someone like Wagner, who has posted rates of home runs per fly ball that hover around league average, it is decidedly not the case with a rate environment extremophile like Rivera.

Question of the Day

It appears as though our best estimators do a good job for 99 percent of all players, but break down at extremes. A pitcher like Rivera, who is extreme in almost every way possible, simply doesn’t rate properly if you use the same metrics used to measure other guys. The alternative view is that Rivera really isn’t quite as good as he’s made out to be. Is anyone willing to make that argument?

I think Rivera just proves that sabermetrics still has more to learn. It's cute when we can use SIERA to predict that a very average pitcher will regress and sabermetrics was developed by looking at the whole, so our knowledge is going to be greater about the more usual cases.

Having watched most of Rivera's career, there's no way that the stats we use can reflect his abilities. Even most of the hits he gives up aren't usual. Maybe you should have SIERA take into account all of the infield hits Rivera gives up on weak contact or the dozens of bloop hits on broken bat swings. They seem like line drives in the box score, but they won't score runs at the same rates as usual hits.
SIERA may not apply very well to Rivera, but more advanced stats do tell us why he is able to limit hits so well. He has incredible command of his pitches. Reference Dave Allen's article:
I hope (and expect) that no matter how excellent our understanding of sabermetrics becomes, the occasional outlier-one who defies quantitative description and can only be fully appreciated qualitatively-will still come along. While statisticians are adept at describing mere mortals, we should need poets to tell the tales of the true immortals.
Love Marooners great and true statement. Players like Mariano Rivera are true baseball immortals. I don't need sabermetrics to explain Albert Pujols- #'s explain the greatness of say, Mark Teixeira, but Albert Pujols is in a dimension of his own. As incredible as some of Mariano's #'s are, the only way to understand his true dominance is by watching him and marveling. Numbers are there to put into context the rest of the guys.
The analysis of HR in particular fits with Mo's unique talent, the ability to throw a pitch that's impossible to hit for a HR (the cutter, which breaks radically away from the HR-hitting part of the bat)- even when you know it's coming.
With most other elite pitchers, hitters still have a couple ways of being successful - 1) by guessing what they'll throw and "sitting" on a pitch, or 2) by having the pitcher make a mistake and lay one over the center of the plate.
Mariano virtually eliminates 1) with the cutter, and he also makes ridiculously few mistakes.
(I don't know if there is a stat for pitching mistakes, but maybe there should be - pitches "centered up" in say, the middle 3 inches of the plate, about waist high. I saw a graphic of Mariano's pitches, and there is an uncanny vacuum there, like a black hole.)
Rivera has such a specific skill set and approach, he's going to defy SIERA. Attack with the cutter, put it on their fists, snap some bats. Over his career he's drawn an incredible amount of weak contact at a pretty high rate. Couple years with over 24.5% popup rate is pretty remarkable. With a better middle infield behind him, he may have had a .240 BABIP as bloops and weak grounders were turned from "just past a diving Jeter" to outs.

His is an approach that couldn't possibly work as a starter, and I don't think it's an approach that could work for other relievers, like, say Carlos Marmol, who wipes hitters out with his slider on 2-strike counts, but often uses his FB to get ahead. If Marmol had to rely only on his slider, as Rivera does his cutter, we'd have a different Marmol.

Regardless of SIERA, there's no diminishing what Rivera has accomplished. He has over 100 more WAR in relief than Lee Smith, who is second in all-time WAR accumulated in relief. Is Rivera not quite as good as he's made out to be? Nope. He's better.
oops. I meant RAR, re Mariano Rivera over Lee Smith.
For a guy like Mo who relies on one pitch and has such a small sample size of innings pitched, but a substantial number of pitches thrown, I think going into the pitch f/x would be immensely helpful. A lot of what makes him dominant is the way he locates the cutter.
I agree that pitchF/X is helpful in analyzing Rivera. I would direct your attention to the link shared by Mike Fast above.
This is certainly an interesting article, but I have one question: why are all of the statistics provided on a per nine-inning basis? If we want to remove the effect of strikeouts on Mo's ability to prevent hits, why not look at BABIP? If we want to remove the effect of ground balls on HR rate, why not look at HR/FB?

I don't mean to come across as combative - you know these things better than I do (and I'm assuming it was some sort of data availability issue) - but absent an explanation it seems a bit sloppy.
I am sorry you found the analysis sloppy. I work hard to create engaging and interesting stories, and it sounds like I haven't done a good job at that for you today. I hope to reassure you that any sloppiness you perceive is not for a lack of effort.

However, I do want to make some points about your particular criticisms. I will assume that you are wondering why I didn't create density charts for HR/FB and BABIP, since I did reference each in my article. The data are certainly available to make analogous charts for those stats. I chose not to do so for two related but distinct reasons.

The first is that those stats have been fetishized as measures of pitcher "luck" in a way that I generally, and this column in particular, hope to undermine. That is to say, I take the point of this story to be about how, despite our best efforts to correct for certain trends in populations, we often cannot accurately postdict performance for all players in a season (and that this is particularly true for players at the extremes). If I were to use HR/FB and BABIP, we'd essentially be doing the sort of thing I am disclaiming.

The second reason is that I think "chart fatigue" is real, so I try to limit the number of data dumps in my articles. That means editing with an eye toward visualizations that tell the story I am relating. HR/9 and H/9 do that well, because each measure ties up several skills. Some of those skills lots of pitchers have, some of them only a few pitchers have, and one or two of them are possessed by crazy outliers like Rivera. To drive this point home, I wanted to choose metrics for my visualization that captured all of those skills, and HR/9 and H/9 both do that.

Thanks for taking the time to read and comment.
Thank you for taking the time to read and comment on my comment.
This was not a sloppy article, great job. Never looked at the comments on this site before, going to start to now. It's refreshing hearing insight from other readers. Not used to it.
Enjoyed this article, but I believe this statement is an error: "Rivera has allowed just 61 home runs in his career—one that has spanned 1,132 innings. Since 1996, he hasn’t allowed more than seven home runs in a single season, and in five separate seasons he has allowed either zero or one home run."

It seemed too good to be true, so I had to look it up. He's only had 1 such season since 1996, that being this one. If you're looking at the same statistics table I am (from his B-R page) then you get his IBB column. The numbers are always in the low single digits, so one can easily mistake them for HR allowed if you lose track for a second and don't look at the column headings. And indeed, he has five separate seasons of either zero or one intentional walks since 1996.
I forgot to add that that IBB column was just 2 columns away from the HR column, which is why you can easily make that mistake. (Another call for the ability to edit posts)
Thank you for catching this error, which I regret.
"A pitcher like Rivera, who is extreme in almost every way possible . . ."

I can only assume that Extreme Sports Punk Number One would strongly approve of this article.
I see him pitch every day. I don't need stats to see that he is good. (Poe's law)
"The alternative view is that Rivera really isn’t quite as good as he’s made out to be."

I would argue the opposite. He's spent his career in the AL East (with Jeter playing behind him) in this current era of mega-offense and currently resides in the homer-friendliest ballpark in MLB. Oh, and he's pitched about two full seasons' worth of playoff games (often against some of the game's best lineups) where his stats DWARF his regular season stats.

btw, his one HR allowed this year is the shortest one Jason Kubel's hit (341 ft).
I'm probably going to get flamed for this, but when I see people fall all over themselves praising Mariano Rivera, I always wonder how the closers I watched play when I was a kid would have fared if bullpen usage was handled in their day the way it is now.

I can think of three offhand who I'm pretty sure could have posted comparable career numbers - specifically Goose Gossage, Rollie Fingers & Bruce Sutter. They (like all the closers back then) routinely pitched two or even three innings for a save, and seldom were called into a game unless the opposition was threatening to score. Imagine them each being called in to get three outs with no one on base.

My point is that if you say that Rivera is on a par with guys like that, I'll go along with you. But I often hear Rivera held up as the greatest relief pitcher ever, standing head and shoulders above guys like those I just named, and I'm not so sure about that.
What are the units that we are to understand as "Frequency"? That is "how many whats per unit what" is being depicted on the vertical axes? I cannot understand your charts.

It's a kernel density plot, so the sum of the area under the curve equals one. That means, for example, that as a rough estimate you could look at the percentage of the area that is to the left of a certain value and get an idea of how likely it was to be no higher than that value.
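To make the "area under the curve equals one" idea concrete, here is a rough sketch of a Gaussian kernel density estimate in plain Python. Everything in it is an assumption for illustration: the HR/9 values are invented, the Gaussian kernel and the rule-of-thumb bandwidth are common defaults rather than whatever produced the charts in the article, and the function names are hypothetical. One consequence worth noting is that the vertical axis of such a plot is a density, measured per unit of the horizontal variable, not a count of games or seasons.

```python
import math
from statistics import stdev

def rule_of_thumb_bandwidth(samples):
    """Normal-reference rule of thumb: 1.06 * sd * n^(-1/5). One common default."""
    return 1.06 * stdev(samples) * len(samples) ** (-1 / 5)

def kde_density(x, samples, h):
    """Height of the density curve at x: an average of Gaussian bumps, one per sample."""
    norm = len(samples) * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm

def kde_cdf(x, samples, h):
    """Area under the density curve to the left of x (between 0 and 1)."""
    total = sum(0.5 * (1 + math.erf((x - s) / (h * math.sqrt(2)))) for s in samples)
    return total / len(samples)

# Hypothetical HR/9 values, invented for illustration only.
hr9_seasons = [0.3, 0.5, 0.8, 0.9, 1.0, 1.2, 1.5]
h = rule_of_thumb_bandwidth(hr9_seasons)

# Share of the curve's area lying at or below an HR/9 of 0.6.
share_below = kde_cdf(0.6, hr9_seasons, h)
print(f"Estimated share at or below 0.6 HR/9: {share_below:.2f}")
```

Read this way, the area to the left of a value is the rough estimate described above of how likely the rate was to be no higher than that value.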
Don't mean to be obtuse, but I still don't get it. So if you integrate your frequency function you get one. Are you saying that your frequency has no units?

And your other axis, "HR/9" and "H/9"; is this showing a distribution of individual games, or seasons, or what? I can understand that when one plots ALL other pitchers that one would get a smooth looking distribution even if the HR/9 or H/9 were seasons, because the number of player-seasons would be quite large. But when one is considering only one pitcher and for only a portion of his career I can only assume that the distribution would be decidedly unsmooth unless the plot was of individual games.

So you can see I am still confused about what this is showing, other than say, Mo gave up less HR/9 than the rest of the relievers he was being compared to, but I already kinda knew that.

I looked up kernel density plot (on Wikipedia) since I had never heard of that before, and the explanation there didn't really shed light on what I was looking at in your article.

Would it be accurate (or helpful) to think that the "Frequency" axis was counting games that fit the variable on the horizontal axis, like a histogram?

Thanks very much.