It’s a good time to be a free-agent relief pitcher. So far this offseason, Billy Wagner and B.J. Ryan turned their closer roles into $43 and $47 million, respectively. The Yankees appear to have inked Kyle Farnsworth for over $17 million. To boot, the Cubs have signed Bobby Howry and Scott Eyre for a combined $23 million, indicating they either know something the rest of us don’t about the seventh and eighth innings or there’s some weird reenactment of “Brewster’s Millions” going on in Chicago. Only $7 million more to go before the Cubs have spent it all and gotten no tangible assets in return.

Putting aside the dollar values on these contracts for a moment, it’s important to consider just how consistent and predictable reliever performances are. There are a multitude of factors that routinely influence reliever performance more than that of starting pitchers or batters; the primary ones are small sample size and the prevailing usage patterns of modern bullpens. The sample size issue is obvious–most relievers top out around 60 or 70 innings, roughly 1/3 of a typical starting pitcher’s innings–but the way modern bullpens are managed (bringing in relievers in the middle of innings, for example) often means that a reliever’s performance, as measured by ERA, is as much a reflection of those pitching before and after him as of his own contributions. Whereas starters often get to work into and out of their own jams, relievers don’t have that luxury.

The second problem is more easily corrected than the first. We can use Fair Run Average (FRA), a BP stat that removes the problems of appropriately placing responsibility for inherited or bequeathed runners. As a first pass, just to see how bad the small sample size is, let’s see how consistent a variety of pitching statistics are for both starters and relievers. To do so, we’ll only use significant consecutive seasons, in this case defined as a minimum of 150 innings in consecutive seasons for starters and 50 innings for relievers.

              WXRL /
GROUP   FRA   SNLVAR    RA      ARP     SO%     BB%     HR%
SP     0.146   0.113   0.132    N/A    0.654   0.482   0.245
RP     0.040   0.094   0.039   0.052   0.585   0.353   0.092

(The second column “WXRL / SNLVAR” is SNLVAR for SP and WXRL for RP.) What we have here are the coefficients of determination (r-squared) for each metric, on a scale of 0-to-1 with 1 indicating a perfect correlation and 0 indicating total randomness. For example, 14.6% of the variation in FRA for starters is explained by the FRA in the previous season; for relievers, it’s 4.0%. As expected, relievers see significantly lower correlations across the board, most likely due to their smaller sample sizes.
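For readers who want to reproduce this kind of year-to-year check, here is a minimal sketch of the calculation. It assumes r-squared here means the squared Pearson correlation between a pitcher’s stat in one season and the same stat in the next. The data layout, the function names and the sample numbers are all made up for illustration and are not the data behind the table.

```python
from statistics import mean

def r_squared(xs, ys):
    """Squared Pearson correlation between two paired series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)

def consecutive_pairs(seasons, min_innings):
    """Pair each qualifying season with the same pitcher's next season.

    seasons: dict mapping (pitcher, year) -> (innings, stat).
    Both seasons must meet the innings minimum to count.
    """
    prev, curr = [], []
    for (pitcher, year), (innings, stat) in seasons.items():
        nxt = seasons.get((pitcher, year + 1))
        if nxt is not None and innings >= min_innings and nxt[0] >= min_innings:
            prev.append(stat)
            curr.append(nxt[1])
    return prev, curr

# Hypothetical reliever seasons (innings, FRA); purely illustrative.
seasons = {
    ("A", 2003): (62.0, 3.10),
    ("A", 2004): (58.0, 4.20),
    ("A", 2005): (45.0, 2.00),  # under 50 innings, so 2004->2005 is dropped
    ("B", 2003): (70.0, 2.50),
    ("B", 2004): (66.0, 2.90),
    ("C", 2004): (55.0, 5.00),
    ("C", 2005): (61.0, 3.60),
}

prev, curr = consecutive_pairs(seasons, min_innings=50)
print(len(prev), "pairs, r-squared =", round(r_squared(prev, curr), 3))
```

Swapping `min_innings=50` for 150 gives the starter version of the same filter.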

Of course, teams aren’t limited to looking at only the previous season when determining how they think relievers are going to perform in the future and, thus, how much to pay them. Eyre’s FRAs the last three years have been 3.82, 3.98 and 1.01… Wait, maybe that’s not the best example. Farnsworth checks in with 3.37, 5.52 and 2.01. Okay, not so great. Howry has been good the last two years, but missed nearly all of 2003. Only Wagner and Ryan have been anywhere near consistently dominant the last three years, posting FRAs of 1.61, 2.76, 1.83 and 3.17, 1.10, 2.44, respectively. But notice that even their last three seasons show dramatic changes. There’s a large difference between 2.76 and 1.61 or 3.17 and 1.10, anywhere between around eight and 15 runs depending on workload.
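The run totals behind that last comparison are simple to work out. The sketch below assumes FRA is scaled per nine innings like any other run average, and it uses 60 and 70 innings as stand-in workloads; the function name is ours.

```python
def run_gap(fra_a, fra_b, innings):
    """Runs separating two run averages over a given workload.

    Assumes the averages are expressed per nine innings.
    """
    return abs(fra_a - fra_b) * innings / 9.0

# Wagner (2.76 vs. 1.61) and Ryan (3.17 vs. 1.10) at two assumed workloads.
for innings in (60, 70):
    wagner = run_gap(2.76, 1.61, innings)
    ryan = run_gap(3.17, 1.10, innings)
    print(innings, "IP:", round(wagner, 1), "runs (Wagner),", round(ryan, 1), "runs (Ryan)")
```

At those workloads the Wagner gap comes to a little under eight to nine runs and the Ryan gap to roughly 14 to 16.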

Instead, let’s see how well a pitcher’s three previous seasons project a fourth, again broken up by starters and relievers.

              WXRL /
GROUP   FRA   SNLVAR    RA      ARP     SO%     BB%     HR%
SP     0.192   0.177   0.179    N/A    0.683   0.541   0.287
RP     0.064   0.104   0.068   0.054   0.631   0.450   0.161

While any prediction system will tell you that using the previous three seasons is going to be more accurate than using just a single year, the difference isn’t as great as perhaps we would hope. Reliever FRA and RA are still nearly three times as random as those of starting pitchers.

To get an idea of what that means in a practical sense, take a look at the distribution of the difference between each pitcher’s actual FRA and the value predicted from his previous three seasons. Remember that FRA has already accounted for bullpen support, so the change in values isn’t a result of a pitcher suddenly having better or worse pitchers around him.

[Chart 1: distribution of actual minus predicted FRA, starters vs. relievers]

The large pink spike in the middle is the starters, while the wider, blue curve is the relievers. While there is considerable overlap between the two series, the relievers are significantly more dispersed, particularly when it comes to pitchers whose FRA suddenly jumps by around two runs. The standard deviation for the starters is 0.81; for relievers it’s 1.46. This means two key things: First, relievers are nearly twice as unpredictable as starting pitchers. Second, in any given season, over a third of relievers will have an FRA more than 1.46 runs more or less than their previous three-year average.
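As a rough cross-check on those two conclusions, the sketch below treats each group’s misses as a bell curve centered on zero with the standard deviations quoted above. That normality is our simplifying assumption, not something the chart guarantees; real distributions with fatter tails would put a larger share of pitchers beyond any given threshold.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def share_beyond(threshold, sd):
    """Share of a zero-centered normal population missing by more than
    `threshold` runs in either direction."""
    z = threshold / sd
    return 2.0 * (1.0 - normal_cdf(z))

SD_STARTERS = 0.81   # from the text
SD_RELIEVERS = 1.46  # from the text

print("spread ratio:", round(SD_RELIEVERS / SD_STARTERS, 2))
print("relievers off by 1.46+:", round(share_beyond(1.46, SD_RELIEVERS), 3))
print("starters off by 1.46+:", round(share_beyond(1.46, SD_STARTERS), 3))
```

Under this approximation the reliever share lands a little below one-third, while the same 1.46-run miss is far rarer for starters.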

Even when separating starters and relievers into three groups, the elite relievers do not suddenly become surer bets. Relievers who post low FRA numbers–in this case, 2.50 or below–in any three-year stretch have a .008 r-squared to their next season and the difference between their previous three-year stretch and the next season has a standard deviation of 1.15. Similar starters have a consistency of .100 and a standard deviation of 0.67. Once again, relievers show a distribution of performance just less than twice that of starters, but while relievers were about a third as consistent as starters (as measured by r-squared), the elite relievers show almost no consistent ability to remain as such. For every Mariano Rivera in 1999–posting a 1.37 FRA after a weighted 1.34 the previous three seasons–there’s a Mariano Rivera in 2000: a 3.23 after 1.32. For every Keith Foulke in 2004, there’s Keith Foulke in 2005 (who doesn’t even make this study because he didn’t pitch enough innings).

So what does this mean for teams like the Cubs, Yankees, Blue Jays and Mets? Of the five relievers they signed, it’s likely that two of them will post an FRA a run-and-a-half or more from their established levels. Some of this variance is the natural change in player performance; after all, starting pitchers, while more consistent than relievers, are certainly no models of consistency. But even when comparing three-year groups of relief performance–attempting to remove the small-sample-size issue–relievers never approach the consistency of starting pitchers. Over the next three years, it’s likely that two of them will post a total FRA more than a run off of their established levels over the past three seasons.
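The “two of five” figure can be sanity-checked with a simple binomial model. The sketch assumes each of the five relievers independently has a one-in-three chance of missing his established level by a run-and-a-half or more. Both the independence and the exact one-third are our simplifications of the numbers above.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k hits in n independent tries."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(k, n, p):
    """Probability of k or more hits in n independent tries."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

N_RELIEVERS = 5   # Wagner, Ryan, Farnsworth, Howry, Eyre
P_BIG_MISS = 1 / 3  # assumed chance of a 1.46+ run swing

print("expected big misses:", round(N_RELIEVERS * P_BIG_MISS, 2))
print("P(at least one):", round(prob_at_least(1, N_RELIEVERS, P_BIG_MISS), 3))
print("P(at least two):", round(prob_at_least(2, N_RELIEVERS, P_BIG_MISS), 3))
```

The expected count works out to between one and two relievers, and under these assumptions the odds of at least two big misses are slightly better than even.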

There are a multitude of reasons why those signings are a mistake–overpaying for such a small number of innings pitched and the availability of respectable relief arms at the league minimum via other sources, to name two–but the main reason is that relief pitchers are significantly less consistent than even volatile starting pitchers. Toronto GM J.P. Ricciardi has publicly justified his decision to sign Ryan by comparing him to Trevor Hoffman, but it’s just as likely the Jays are now the proud owners of Troy Percival (1995-97: 1.36; 1998-2000: 4.33), Jose Mesa (1995-97: 2.69; 1998-2000: 5.45), or Rod Beck (1992-94: 1.68; 1995-97: 3.89).

By the way, Hoffman’s FRA from 1997-99 was 1.65; it jumped to 3.12 from 2000-02.