“Give me a lever long enough and a fulcrum on which to place it, and I shall move the world”

There seems to be one baseball topic on which the “old school” and the “new school” agree: bullpen management. Frequently, former players (those who haven’t played in 20 or more years) or color commentators talk about the demise of the fireman and the rise of the closer, and bemoan the fact that you no longer see the likes of a Rich Gossage or Dan Quisenberry coming into the game at a critical juncture in the seventh inning, and only occasionally in the eighth. Similarly, the sabermetric community has shown mathematically (see Keith Woolner’s piece in Baseball Between the Numbers) that a manager willing to break from the current mold could garner a few more wins per year by bringing in his “closer” in crucial seventh- and eighth-inning situations.

Bullpen management, however, goes well beyond simple closer/fireman usage. It’s the gamesmanship of knowing what the other manager has in his lineup and on his bench. It’s about resource management and getting pitchers the right amount of work. If we solely focus on a manager’s tactical decisions throughout the course of a game and a season, most would rank the handling of the bullpen as most important compared to things like lineup construction, managing the running game, or deciding whether or not to bunt in a given situation.

How can we measure a manager’s effectiveness at bullpen management? Simply taking an overall statistic like WXRL is a start, but that statistic also reflects the overall talent level of the pitcher. It draws no distinction between the manager’s decision and the performance of the pitcher. What we are trying to get at is the value that the manager brings to the table given the talent that he has in the bullpen. Jeremy Greenhouse (a BP Idol entrant) did some work aligning the Win Probability Added of a reliever with the average Leverage Index that he faced in the game. There’s one thing that Greenhouse’s analysis misses, and that’s the ability to exploit the lefty-righty matchup. Some managers, like Tony La Russa, are very good (or obsessive) at exploiting the lefty-righty matchup, but do they go too far and leave themselves vulnerable by depleting their available resources later in the game, or even the next day?

In an ideal world, a manager would like to have his best pitcher pitch in the highest-leverage situations. One step deeper, he would like his best pitcher against right-handed batters to face the highest-leverage situations where a right-handed batter is up. To that end, we will reward the manager who did the best job of aligning his best pitchers to the highest-leverage situations with the award named for Archimedes, the Greek mathematician who first rigorously explained the principles of the lever.

The Approach

First, to define a pitcher’s effectiveness, I calculated his wOBA against (or, for those sabr-ing at home, you can use your favorite statistic, such as EqA or even OPS) for both left-handed and right-handed batters. Second, for each relief situation, we assign a Leverage Index, which is based on the inning, the outs, the runners, and the score differential. For each plate appearance, we multiply the Leverage Index by the pitcher’s wOBA against batters of that handedness, and we sum these products. Dividing by the sum of the Leverage Index over all relief plate appearances gives us an Effective wOBA.
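As a rough illustration of that leverage-weighted average, here is a minimal Python sketch. The `ReliefPA` record, the pitcher names, and every number in it are made up for the example; they are assumptions, not data from the study.

```python
from typing import NamedTuple


class ReliefPA(NamedTuple):
    """One relief plate appearance (hypothetical record layout)."""
    pitcher: str
    batter_hand: str   # "L" or "R"
    leverage: float    # Leverage Index of the situation


def effective_woba(pas, woba_against):
    """Leverage-weighted average of each pitcher's platoon wOBA against.

    woba_against maps (pitcher, batter_hand) -> wOBA allowed.
    """
    total_leverage = sum(pa.leverage for pa in pas)
    weighted_sum = sum(
        pa.leverage * woba_against[(pa.pitcher, pa.batter_hand)] for pa in pas
    )
    return weighted_sum / total_leverage


# Made-up numbers, purely for illustration.
pas = [
    ReliefPA("closer", "R", 2.0),
    ReliefPA("mopup", "L", 0.5),
    ReliefPA("closer", "L", 1.5),
]
woba_against = {
    ("closer", "R"): 0.280,
    ("closer", "L"): 0.300,
    ("mopup", "R"): 0.340,
    ("mopup", "L"): 0.360,
}

print(round(effective_woba(pas, woba_against), 4))
```

Because the closer takes the two high-leverage plate appearances, the result sits much closer to his platoon numbers than to the mop-up man's.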

Next, we do the same exercise, but instead of using the given pitcher’s wOBA, we use the team’s overall wOBA against same-handed batters. This serves as a baseline for a manager who essentially drew names out of a hat to decide who would be the next reliever. If we subtract the Effective wOBA from this “random” wOBA, we get a statistic that we will call BMAR, for Bullpen Management Above Random. Essentially, a BMAR of 15 says that by putting the best pitchers in the highest-leverage situations, the manager made the Effective wOBA of the opponents’ hitters 15 points worse than if he had chosen his relievers at random.
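A minimal Python sketch of the BMAR subtraction might look like the following. The tuple layout, the helper names, and all of the numbers are assumptions made for illustration, and the baseline here is taken to be the relief corps' wOBA against each batter hand.

```python
def leverage_weighted(pas, woba_for):
    """Leverage-weighted average of woba_for(pitcher, hand) over all PAs.

    pas is a list of (pitcher, batter_hand, leverage_index) tuples.
    """
    total_leverage = sum(li for _, _, li in pas)
    return sum(li * woba_for(p, hand) for p, hand, li in pas) / total_leverage


def bmar(pas, pitcher_woba, team_woba):
    """Bullpen Management Above Random, in points of wOBA (0.001 = 1 point).

    pitcher_woba maps (pitcher, hand) -> wOBA allowed by that pitcher.
    team_woba maps hand -> wOBA allowed by the bullpen as a whole (assumed).
    """
    effective = leverage_weighted(pas, lambda p, hand: pitcher_woba[(p, hand)])
    baseline = leverage_weighted(pas, lambda p, hand: team_woba[hand])
    return (baseline - effective) * 1000


# Made-up numbers, purely for illustration.
pas = [("closer", "R", 2.0), ("mopup", "L", 0.5), ("closer", "L", 1.5)]
pitcher_woba = {
    ("closer", "R"): 0.280,
    ("closer", "L"): 0.300,
    ("mopup", "L"): 0.360,
}
team_woba = {"R": 0.320, "L": 0.330}

print(round(bmar(pas, pitcher_woba, team_woba), 1))
```

A positive value means the pitchers actually used in high-leverage spots were better than the bullpen average; a negative value means the opposite.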

The leaders in BMAR in 2009 in each league were:

  American League           National League
Manager       BMAR         Manager       BMAR
Gardenhire    20.8         Macha         17.9
Scioscia      13.9         Bochy         16.8
Maddon        13.8         Black         13.2
Francona      13.7         La Russa      12.8
AL Average    10.3         NL Average     6.6 

The more astute readers may have realized that the BMAR metric isn’t completely fair in the following two ways:

  • The greater the spread in performance across a team’s relief corps, the greater the opportunity for bullpen management to have an impact.

  • The more times a team is in high-leverage situations (tighter games or more extra innings), the greater the potential BMAR.

To adjust for this, we will develop an upper bound on bullpen management (UBBM), a measure of how much potential BMAR there could be. But how do we do this?

First, we sort the situations against right-handed batters by Leverage Index, then assign the best pitcher against right-handers to the highest-leverage situation. We move to the next highest-leverage situation and keep assigning our best pitcher against right-handers until we have reached the number of right-handed batters that pitcher actually faced. Once we have exhausted a pitcher’s situations, we move to the next-best pitcher, and so on, and then repeat the process for left-handed batters. This gives us the potential maximum improvement in Effective wOBA that could be achieved, assuming that each pitcher faces exactly the same number of righties and lefties that he actually did. Obviously, this upper bound could likely never be achieved, as it may require numerous and sometimes illegal pitching changes (e.g., a pitcher coming out of a ballgame, then coming back in later) or forcing a pitcher to pitch in a situation when he was on the DL, but it can serve as a useful benchmark. If we then look at the ratio BMAR/UBBM (expressed as a percentage), it gives us an idea of which managers made the most of their potential. At the end of the day, it didn’t change the standings very much in 2009:

        American League                       National League
Manager       BMAR    BMAR/UBBM       Manager       BMAR   BMAR/UBBM
Gardenhire    20.8       36.8%        Macha         17.9     30.2%
Girardi       11.5       23.5%        Bochy         16.8     27.7%
Scioscia      13.9       23.3%        La Russa      12.8     19.6%
Wakamatsu     13.0       21.2%        Tracy          9.4     17.2%
AL Average    10.3       17.9%        NL Average     6.6     11.3%
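The upper-bound assignment described above can be sketched in Python as follows. This is only one reading of the procedure: it handles each batter hand separately, hands the highest-leverage situations to the pitcher with the lowest wOBA against that hand, and caps each pitcher at the number of batters of that hand he actually faced. The data layout and all numbers are assumptions for illustration.

```python
from collections import Counter


def upper_bound_effective_woba(pas, pitcher_woba):
    """Best achievable Effective wOBA under the greedy reassignment.

    pas is a list of (pitcher, batter_hand, leverage_index) tuples.
    pitcher_woba maps (pitcher, hand) -> wOBA allowed.
    """
    total_leverage = sum(li for _, _, li in pas)
    weighted_sum = 0.0
    for hand in ("L", "R"):
        # Situations against this hand, highest leverage first.
        leverages = sorted((li for _, h, li in pas if h == hand), reverse=True)
        # How many batters of this hand each pitcher actually faced.
        faced = Counter(p for p, h, _ in pas if h == hand)
        # Pitchers ordered from best (lowest wOBA allowed) to worst.
        order = sorted(faced, key=lambda p: pitcher_woba[(p, hand)])
        # One slot per batter faced, best pitchers filling the top slots.
        slots = [p for p in order for _ in range(faced[p])]
        weighted_sum += sum(
            li * pitcher_woba[(p, hand)] for li, p in zip(leverages, slots)
        )
    return weighted_sum / total_leverage


def ubbm(pas, pitcher_woba, team_woba):
    """Upper bound on BMAR, in points of wOBA (0.001 = 1 point)."""
    total_leverage = sum(li for _, _, li in pas)
    baseline = sum(li * team_woba[h] for _, h, li in pas) / total_leverage
    return (baseline - upper_bound_effective_woba(pas, pitcher_woba)) * 1000


# Made-up numbers: the mop-up man drew the big spot against a righty.
pas = [("mopup", "R", 2.0), ("closer", "R", 0.5), ("closer", "L", 1.0)]
pitcher_woba = {("closer", "R"): 0.280, ("mopup", "R"): 0.340, ("closer", "L"): 0.300}
team_woba = {"R": 0.310, "L": 0.300}

print(round(ubbm(pas, pitcher_woba, team_woba), 1))
```

Dividing an actual BMAR by this value gives the BMAR/UBBM percentage used in the tables.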

If we look at the managers at the bottom of these rankings (let’s call them the Sisyphus Awards, continuing our allusion to the Greeks and someone who definitely did not use a lever), we witness what we will call the “Lidge Effect”:

Manager       BMAR    BMAR/UBBM
C. Manuel     -17.0     -41.0% 
Piniella       -2.9      -5.8%
Russell        -0.1      -0.3% 
Gaston          2.4       4.5% 

A closer whose performance suddenly falls off a cliff in a given year (Brad Lidge in Philadelphia, Matt Capps in Pittsburgh in 2009) will drag his manager’s BMAR down, as the closer is continually brought out in high-leverage situations despite his poor performance.

The Award Winners

Here are the Archimedes Award winners (based on BMAR) over the last five years in both leagues:

Year  American League      BMAR     National League  BMAR
2009  Ron Gardenhire       20.8     Ken Macha        17.9
2008  Gardenhire/Hillman   22.4     Lou Piniella     17.6
2007  Ozzie Guillen        24.7     Bob Melvin       19.0
2006  Mike Scioscia        20.8     Bruce Bochy      18.5
2005  Joe Torre            30.7     Frank Robinson   21.5

The table below lists the managers from 2005-2009 (limited to those who managed at least four of the five years), ranked by best BMAR. Ron Gardenhire of Minnesota takes the award for best bullpen manager by a wide margin, never finishing below eighth in the majors in any year, and having the best BMAR in each of the last two years.

Manager          BMAR   BMAR/UBBM
Ron Gardenhire   20.0     34.5%        
Mike Scioscia    17.3     32.0%
Terry Francona   16.0     25.1%
Joe Torre        14.2     24.8%
Bruce Bochy      13.6     23.3%
Ozzie Guillen    13.0     20.3%
Jim Tracy        12.5     22.6%       

Fantastic article.
Wow. Absolutely fascinating read. Great work.
Interesting article. It's not clear, but it appears that you used one year of platoon wOBA to calculate these metrics. Is that correct?

I wonder if an attempt to determine a true talent wOBA would change things at all. I'm concerned that managers who have more pitchers who pitch to their true talent wOBA in a given year will tend to do better regardless of how well the pitchers are used.

You specifically mention that the Lidge effect will kill a manager, but how confident are we that the managers who score well aren't benefitting from having consistently great closers? The last table could be expanded to include the name of the closer and we'd have:


Is that top 5 the best managers or the managers with the best closers?

I like the thought process. I'm just not sure we've actually gotten that far from the quality of the team's closer despite the attempt.
I think that you have a point there. It may not just be a quality of closer, but just the consistency of the closer from year to year, i.e., Nathan IS probably one of the most consistent closers year-in and year-out. Though Girardi wasn't particularly high on the list in 2008, despite Rivera's good year.

I think there is a point, though: given that current bullpen management is fairly similar across teams, there isn't a ton of separation at this time beyond simply having a good, consistent closer year-in and year-out.

One thing that may be interesting is to see how the list changes if we, say, develop a BMAR that excludes the times where there is an obvious save situation.
Interesting stuff. Can you post the complete list? My man Leyland isn't in either the best or worst set, but I'd be interested to see where he ranks.
As you note, one of the issues with bullpen management is "resource management and getting pitchers the right amount of work." I'm wondering if your system doesn't punish managers of teams in a position to win often at the end of the game. There might be times, for example, when Gardenhire would love to use Nathan, but he can't because he pitched the previous two nights and is essentially unavailable.
I think it also helps a great deal to have a dominant guy--perhaps even your best pitcher, who is not a closer, and can be used to good effect in the fireman role. While Washington does not make either list, for example, I expect that he got a lot smarter when Feliz showed up as he was used exclusively in high leverage situations, whatever inning they developed in, and to very good effect.
Any theory on the difference between the AL Average and the NL Average? Is that difference stable from year to year?

Perhaps there are more teams with high leverage games in the AL? Or maybe the pitcher's slot forces some suboptimal matchups?
Richard -

The difference is mostly caused by how bad Lidge was. If you take him out of the mix, the numbers are much closer. In 2008, the difference was the same, but in 2007, the advantage was to the NL.
Seems the stat depends a lot on the closer and less so on the manager. BMARxSS would be a good way to probe deeper.

Also, does your system account for missed opportunities for a "tough save" or a save that was longer than three outs?
Technically, I could search on those "tough save" situations but I wouldn't want to eliminate them because issues like that are crucial to bullpen management.

With BMARxSS, what I would want to do is to eliminate the situation where 98% of the time EVERY manager brings in his closer, and that the only differences are solely on the quality of the closer and not the manager's decision, i.e., try and remove the automatic pilot decisions, but leave the tougher decisions in there.
Yeah, definitely don't eliminate tough saves since that reflects bullpen management skills.

Perhaps another way to look at it is to compare Run Expectancies and Win Expectancies before and after a reliever is brought into a game.

Also, does your system account for the first pitching change... i.e. _when_ a manager decides to go to his bullpen for the first time?
The point is that Win Expectancy/Run Expectancy is just as much a measurement of pitcher performance as of bullpen management. What we're trying to do is to tease out the bullpen management aspect of it, separate from the pitcher's underlying performance.

And to your other question, yes I'm looking at all relief plate appearances which brings up some questions that I'll bring up in a follow-up article.
This is why I subscribe to BP. More of this and less "sportswriters" explaining why they are not voting for Tim Raines, please.
I'm wondering the extent to which Cito Gaston's last-place showing was the result of BJ Ryan's melt-down.
Don't be hard on Cito, Greg. It wasn't last place, it was fourth to last place.

With that said, B.J. Ryan isn't likely the problem as he only had 20 IP and really didn't have that high of a leverage. Fact is Casey Janssen had almost as bad of stats, had roughly the same leverage (actually even worse in that Ryan had a few high leverage situations, but a ton of low leverage, while Janssen didn't have that many low leverage situations), and pitched twice as many innings.
Tim - effective bullpen management is maybe the next vanguard of statistical optimization. Well played, and more please.
I'm interested in consistency in the managerial quality from year to year, that is, the metric's reliability over time for the same manager. I would think that this skill on the part of the manager is relatively stable, so you should see some managers rise to the top continually. How much of a spread is there from year to year for top and low performers, and how much movement is there in the rankings?
Now that I read my comment, I see I'm asking about two different things (reliability of the metric and consistency of demonstrating the skill) that are probably pretty hard to tease out from each other. I echo the other replies, though, a very interesting analysis.
Over the 5 years, Charlie Manuel has had his ups and downs, but overall has been off for the 5-year span. Reason I say this is that in 2008 (and Lidge's amazing year), Manuel was near the top.

One can make a claim that a lot of Gardenhire being at the top has been the consistency of Nathan. The manager that impressed me most year in and year out has been Jim Tracy. 3 different ballclubs and always above average.
Tim, I would argue that a manager who continues to stick with a faltering closer (Capps and Lidge were your examples) should be penalized. Sticking with a guy because he is the anointed one is just stubborn.
When the families go back and I get a little time to do some work, I'm going to look at a metric, BMARxSS (Bullpen Management Above Random excluding Save Situations).

I'm thinking kind of the final puzzle on Wheel Of Fortune. Remember when they used to ask for four consonants and a vowel? Well, pretty soon everyone said R,T,N,L and E. It ended up being kind of boring so pretty soon they gave everybody that.

For the time being, it may be interesting to "assume" that the standard play (at this time) is to put in your best closer in the standard ninth inning save situation. I'd be interested in looking at how the BMAR statistic changes if we look outside of that, so that a manager isn't overly rewarded for just having a consistent closer. Would be interested if something changes.
What about looking at stats other than WOBA? You say the same calculation can be done, shouldn't we be using some average of different stats to account for possible variance? No single stat is comprehensive.
Well that's not really true at all. Some stats are really better than others - EqA is simply better than OPS, for instance. Averaging EqA and OPS together gives you less information than simply using EqA alone.

Throwing wOBA into the mix muddies things a bit - wOBA really isn't a stat at all, it's simply a way of converting linear weights (or really any scheme of figuring a player's run contribution to his team) into a rate stat. So the accuracy of wOBA is entirely contingent upon the run estimator used. (You could say the same thing about EqA, actually - by itself EqA is not accurate or inaccurate; it is simply as accurate as the underlying run estimator. In actual practice nobody computed EqA with anything other than EqR, so it really doesn't come up.)

Presuming that Tim used a decent set of linear weights for the calculation of wOBA (I have a hunch as to what he used, and if I'm right they're perfectly fine) there's no reason to think that looking at a different run estimator will add anything to the analysis at hand.
To add to that, my main use of wOBA (or whatever you use here) is to get some relative measure of each pitcher's effectiveness as compared to the others. My guess would be that using wOBA versus EqA or OPS would likely not make a big difference in the results.

And as for Colin's assumption, you are correct on the weights I used for wOBA.
How about showing us the complete league standings?
"Make it so, number 2."

Anyways, check out a follow-up article in the next week or two that addresses many of the comments here and talks about some other little issues, plus I'll include the complete league standings for 2009.
I like articles that make me think.

I would want to mark a manager lower who brings out his 5th best reliever to pitch a tied ninth because the closer only pitches in save situations.

I do think you are capturing managers who probably think right/left matchups are very important when using their setup men in the 7th and 8th, but then ignore matchups in the 9th when the closer is pitching.

1. Are you using end of the year stats to judge in-season decisions? It might not be fair to analyze a bullpen move in May based on the pitcher's performance in August and September.

2. Are you using each single season's worth of stats? One season, especially for relievers, especially looking at splits, makes for small sample sizes.

To try to solve both of these, you could try an in-season projection, looking at the reliever's 'true talent' estimate at that date (that's what the manager knows then) based on what the pitcher had done so far that season plus diminishing weights on the previous three seasons.

Brian, yes, I'm using EOY stats; however, I had thought of doing some true-talent level as you suggested (something like a rolling 12-month performance). This may also adjust for some of the Lidge Effect given that, at least for the 1st half of the season, it isn't completely idiotic to be using Lidge as your closer.
I don't understand this statement:

"If we do the same exercise, but instead of using the given pitcher’s wOBA, we use the overall team’s wOBA against the same-handed batter. This serves as a baseline if the manager essentially drew names out of a hat based on who would be the next reliever."

Doesn't the team's overall wOBA already include the managers' manipulation of handedness matchups? How can you get a baseline from that? Does it ever come up CC Sabathia if he draws names from a hat?

Also, I think the sample size is too small when it comes to opposing managers' bench options. If your bullpen is right handed and the other teams in your division are full of left handed bench guys, you're going to seem to be at a disadvantage when it comes to manipulating matchups.
It will almost never be CC Sabathia, because I'm only focusing on the relief corps not the overall.

What I'm trying to do is compare what a manager is doing versus a "monkey manager." If the monkey manager managed a season 1,000,000 times, then on average in all situations (regardless of leverage), the pitcher in that situation would be the equivalent of an average pitcher.

Ideally the better manager in higher leverage situations will have a pitcher with a lower wOBA against than the team average and in lower leverage situations he would have a pitcher who has a higher wOBA against than the team average.
So a manager could game the stat simply by being incompetent in the low leverage situations? Or look better because his best two or three relievers are righty or lefty killers?

There seems to be too many other factors that prevent you from reliably determining causation.
I think you may be missing the point a little bit. Essentially, whatever you do in low-leverage situations is almost meaningless, so any plate appearances that you have your best pitchers pitch in those situations are wasted, since they won't be pitching in high-leverage situations.

Also, why would anyone game the system for this stat?
My overall givens are the following:
1) The goal of a manager is to win ballgames,
2) To win the most ballgames, a manager tries (or at least should try) to have his best pitchers pitch in the most crucial situations (which are defined by the Leverage Index) as frequently as possible.

The article points out that we can start to calculate how well a manager does this with BMAR.

On your second point, if his two or three best relievers are righty/lefty killers (AND he uses them in crucial situations), his BMAR will look good. Now, to temper that, I have used UBBM to normalize it a little because this manager has his advantages, so BMAR/UBBM is typically the better normalized measure, but it usually makes little difference in the standings.