In the article on the Archimedes Awards, we developed the metric BMAR (Bullpen Management Above Random) to quantify a key aspect of bullpen management: assigning the best pitcher to the highest-leverage situations. It helped to isolate some of what we were looking for, especially when we normalized by "the best" that a manager could do with the UBBM (Upper Bound Bullpen Management) metric. The problem was that managers with consistent closers still seemed to rise to the top of the list. For gosh sakes, in 2008, Trey Hillman tied Ron Gardenhire for the highest BMAR, mostly on the back of Joakim Soria.

Removing Vanna White from the Equation

To address this situation, let me propose a Wheel of Fortune analogy. For those who aren’t familiar with the basic concept of the game show… well, I just don’t know what to say, but it’s essentially a less morbid, glitzier version of the children’s game Hangman. After a few puzzles to separate the wheat from the chaff, the winner gets a bonus puzzle, where they are given an opportunity to show some of the letters and then guess the final puzzle.

When the final puzzle first appeared, the contestant was given the opportunity to name five consonants and one vowel. Relatively soon, anyone who had a basic understanding of the English language realized that the best letters to guess were R, S, T, L, N, and E, leading to an anticlimactic decision. To create a little more drama on the show, the game producers made the puzzles harder and gave the contestant the above letters for free, while letting him ask for three more consonants and one more vowel.

What does this have to do with measuring bullpen management? Since almost every manager will put his closer in the standard ninth-inning save situation (one- to three-run lead in the ninth), I wanted to separate out the portion of a manager’s BMAR that he earns simply because he happens to have a consistent closer. Hence a second alphabet-soup statistic called BMARxSS (for Bullpen Management Above Random excluding Save Situations). Essentially, by eliminating the ninth-inning save situation from consideration, we remove the portion of a manager’s bullpen management for which he is rewarded just for doing what every other manager is doing.
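
A minimal sketch of the exclusion rule, assuming each reliever plate appearance is recorded with its inning and the pitching team's lead. The field names `inning` and `lead` and both function names are hypothetical, and the sample rows are made up purely for illustration. The sketch only covers the filter described above, not the BMAR calculation itself.

```python
def is_standard_save_situation(inning: int, lead: int) -> bool:
    """True for the 'standard' save situation: ninth inning, lead of one to three runs."""
    return inning == 9 and 1 <= lead <= 3


def exclude_save_situations(appearances):
    """Keep only the plate appearances that would count toward BMARxSS."""
    return [
        pa for pa in appearances
        if not is_standard_save_situation(pa["inning"], pa["lead"])
    ]


# Made-up plate appearances, for illustration only
sample = [
    {"inning": 7, "lead": 1},   # kept: not the ninth
    {"inning": 9, "lead": 2},   # dropped: standard save situation
    {"inning": 9, "lead": 5},   # kept: lead is too large
    {"inning": 8, "lead": -1},  # kept: trailing in the eighth
]
kept = exclude_save_situations(sample)
print(len(kept))  # 3
```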


The table below shows the 30 teams sorted by BMARxSS in 2009 and what their BMAR score and rank are:

Team (Manager)          BMARxSS       BMAR (Rk)  
1. TBA (Maddon)           15.9        13.8(5)
2. ANA (Scioscia)         12.7        13.9(4)
3. SFN (Bochy)            11.0        16.8(3)   
4. CHA (Guillen)          7.5          7.7(18)    
5. MIN (Gardenhire)       7.2         20.8(1)
6. TEX (Washington)       7.0         10.6(13)
7. BAL (Trembley)         6.0         11.6(11)
8. MIL (Macha)            5.9         17.9(8)
9. FLO (Gonzalez)         5.9          2.5(26)
10. WAS (Acta, Riggleman) 5.6          5.5(23)
11. HOU (Cooper, Clark)   5.4         13.2(7)
12. CLE (Wedge)           5.4          7.7(19)
13. SDN (Black)           4.3         12.8(9)
14. ARI (Melvin, Hinch)   4.2          8.2(17)
15. DET (Leyland)         4.1          5.4(24)
16. PIT (Russell)         3.6         -0.1(28)
17. SLN (LaRussa)         3.6         12.3(10)
18. NYA (Girardi)         3.5         11.5(12)
19. KCA (Hillman)         2.7          9.1(16)
20. TOR (Gaston)          2.2          2.4(27)
21. CIN (Baker)           2.0          5.6(21)
22. OAK (Geren)           1.4          5.6(22)
23. BOS (Francona)        1.1         13.7(6)
24. SEA (Wakamatsu)       0.9         13.0(8)
25. ATL (Cox)             0.3         10.8(14)
26. PHI (Manuel)          0.1        -17.0(30)
27. LAN (Torre)          -1.1          7.5(20)
28. CHN (Piniella)       -2.6         -2.9(29)
29. COL (Hurdle, Tracy)  -2.9          9.4(15)
30. NYN (Manuel)         -3.3          2.9(25)
    Average               4.0          8.4

Looking at BMARxSS as opposed to just BMAR switches some of the order around, but except in a handful of cases, there aren’t too many big jumps. But what about some of these changes?

We notice that BMARxSS rates Ozzie Guillen a top-five manager, but there is very little difference between his BMARxSS and his BMAR. This is somewhat surprising, because Bobby Jenks was an adequate closer, and not a disaster like, say, Brad Lidge or Matt Capps in 2009. The explanation is that Matt Thornton outshone Jenks (even against righties), and BMAR is penalizing Guillen for using Thornton (average Leverage Index of 1.503) in lower-leverage situations than Jenks (average Leverage Index of 1.938).
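
To make the gap concrete, here is a small sketch that subtracts BMARxSS from BMAR for three rows copied from the table above. It assumes the two figures are additive, so that the difference can be read as the portion coming from ninth-inning save situations; the function name is hypothetical.

```python
# (BMARxSS, BMAR) pairs copied from the 2009 table
TEAMS = {
    "CHA (Guillen)": (7.5, 7.7),
    "MIN (Gardenhire)": (7.2, 20.8),
    "PHI (Manuel)": (0.1, -17.0),
}


def save_situation_portion(bmar_xss: float, bmar: float) -> float:
    """Points of BMAR attributable to save situations, assuming additivity."""
    return round(bmar - bmar_xss, 1)


portions = {team: save_situation_portion(*vals) for team, vals in TEAMS.items()}
for team, points in portions.items():
    print(f"{team}: {points:+.1f}")
```

Under that assumption, Guillen gets almost nothing from save situations (+0.2), Gardenhire gets most of his total from them (+13.6), and Manuel's save situations cost him heavily (-17.1).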

One further point to notice is that, on average, the improvement in effective wOBA (i.e., wOBA weighted by the leverage of the situations) across all relief situations is 8.4 points, of which 4.0 points come from non-save situations and 4.4 points come from save situations, meaning that roughly half of the benefit comes from save situations alone. It is interesting to note that the average team has its relievers face 2,075 batters in non-save situations, and only 182 batters in save situations over the course of the season. This suggests that the emergence of the closer (as opposed to the fireman) may not be so directly tied to just the save statistic.
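
The arithmetic behind that comparison can be sketched as follows, using only the league-average figures quoted above (the variable names are mine):

```python
avg_bmar = 8.4       # average improvement, all relief situations
avg_bmar_xss = 4.0   # average improvement, non-save situations only
save_points = avg_bmar - avg_bmar_xss  # 4.4 points from save situations

non_save_batters = 2075  # batters faced per team in non-save situations
save_batters = 182       # batters faced per team in save situations

save_share_of_benefit = save_points / avg_bmar
save_share_of_batters = save_batters / (non_save_batters + save_batters)

print(f"Share of benefit from save situations: {save_share_of_benefit:.1%}")
print(f"Share of batters faced in save situations: {save_share_of_batters:.1%}")
```

In other words, about 52 percent of the measured benefit comes from about 8 percent of the batters faced.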

The Grady Little Factor

As any Red Sox fan (circa 2003) can tell you, another significant factor of bullpen management is the decision of when to pull your starter. So, how do we try to measure a manager’s impact on this aspect of the game? For 2009, I looked at all the batters faced in the seventh inning or later by the starter. Then, using a methodology similar to the one we described in the first article, I examined the quality of the starting pitcher (measured by effective wOBA) against that batter’s handedness, compared that to what was typically available in the bullpen, and then aggregated.

In the table below, the Predict column (the Starter Prediction) shows how much better (or worse) the starter was expected to be compared to the bullpen, measured in points of effective wOBA. The Result column shows what actually happened, again in points of effective wOBA relative to the bullpen baseline. The GLF column is the Grady Little Factor, the difference between the Result and the Starter Prediction, which estimates either the manager’s luck or his "skill," if you believe in the manager’s skill.

So, to give an example of what we are talking about, let’s take LaRussa, the first on the list. There were 418 plate appearances where his starter was pitching in the seventh inning or later. Comparing the specific starters against the specific handedness of batters in those situations, we would expect the batters’ effective wOBA to be 24 points lower against the starters than it would have been had LaRussa put in his bullpen. In actuality, the batters hit 71 points worse than they would have against the bullpen, for a GLF of 47 (71 minus 24).
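
As a quick sketch, the GLF calculation is just a subtraction. The function name is hypothetical, and the three rows are copied from the table below. A few rows in the table differ by one point from a straight subtraction, presumably because the published columns are rounded.

```python
def grady_little_factor(predict: int, result: int) -> int:
    """GLF: actual result minus the Starter Prediction, in points of effective wOBA."""
    return result - predict


# (Predict, Result) pairs copied from the 2009 table
rows = {
    "SLN (LaRussa)": (24, 71),
    "PHI (Manuel)": (-14, 54),
    "WAS (Acta, Riggleman)": (4, -76),
}

glf = {team: grady_little_factor(*vals) for team, vals in rows.items()}
for team, value in glf.items():
    print(f"{team}: {value:+d}")
```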

Two comments need to be made on the Starter Prediction. In this measurement, we are using the starter’s effective wOBA over the course of the entire year against all batters. On one hand, we may overstate the starter’s abilities, since we are focusing on the end of the game, when he is fatigued. On the other hand, when a pitcher is pitching in the seventh inning and beyond, it is likely that he is having one of his better games (i.e., survivorship bias). It seems that the latter prevails a bit more, as the average GLF is slightly positive. This means that, on average, a starting pitcher who is still in the game in the seventh inning and beyond pitches a few points better than his overall average would suggest.

Team (Manager)            PA       Predict    Result      GLF       
1. SLN (LaRussa)          418         24        71        47
2. HOU (Cooper, Clark)    215          8        62        54
3. PHI (Manuel)           344        -14        54        68
4. SFN (Bochy)            431         22        45        23    
5. DET (Leyland)          361         13        42        29
6. BOS (Francona)         318          8        38        30
7. PIT (Russell)          305         -5        33        38
8. ATL (Cox)              366         21        32        11
9. COL (Hurdle, Tracy)    431         11        22        12
10. CLE (Wedge)           325         -3        21        24
11. ANA (Scioscia)        387          0        19        18
12. MIL (Macha)           196        -35        18        53
13. CIN (Baker)           371        -25        14        39
14. TEX (Washington)      312        -17        13        30
15. ARI (Melvin, Hinch)   358          3        10         7
16. TOR (Gaston)          421         -4        -2         2
17. TBA (Maddon)          390         -3        -2         1
18. KCA (Hillman)         336         30        -2       -32
19. NYN (Manuel)          273        -13       -10         3
20. BAL (Trembley)        217         -4       -11        -7
21. OAK (Geren)           184        -37       -11        26
22. SEA (Wakamatsu)       365         14       -12       -26
23. CHA (Guillen)         397          3       -18       -22
24. NYA (Girardi)         341          1       -21       -23
25. FLO (Gonzalez)        259         11       -29       -40
26. SDN (Black)           212        -13       -32       -19
27. CHN (Piniella)        342          9       -33       -42
28. MIN (Gardenhire)      373        -19       -55       -36
29. LAN (Torre)           226        -13       -55       -42
30. WAS (Acta, Riggleman) 225          4       -76       -80
    Average               323         -1         4         5

An interesting case in this table is Charlie Manuel, a manager who has been known to "go with his gut." In 2009, it seemed to pay off with regard to keeping his starters in, as he led all managers in GLF. In 2008, however, he was on the opposite end of the spectrum. Once again, he was negative in the Starter Predict column (i.e., he stuck with his starters even when the bullpen projected better). However, in those situations, the starters performed much worse, resulting in Manuel being last in the league in GLF. With regard to stretching his starters, his hunches didn’t play out in 2008. Similarly, Tony LaRussa was very high in GLF in 2009 but had a -25 in 2008. Looking at all the managers who were at the helm in both 2008 and 2009, the correlation in GLF was essentially zero.

In our next and final article on bullpen management, we will examine one other issue that BMAR doesn’t capture: the allocation of a pitcher’s batters faced between righties and lefties. I’m specifically thinking of the extreme case of Trey Hillman’s use of Jimmy Gobble in 2008. Gobble’s slash line against righties was .382/.517/.676, while it was .200/.257/.323 against lefties, yet he faced 89 righties and 70 lefties. Also, we will put all of this together to develop an overall picture of each manager’s bullpen management over the last few years.

Interesting. The thought about the closer being a given has entered my mind before, but I never put it in concrete terms. Makes a lot of sense nowadays (sadly) to judge bullpen management in this way. I find it quite interesting that Joe Torre does so poorly.
Actually, I think that letter selection is wrong. I think it's based on the frequency of letters that contestants pick, not what the right set would be. I think the actual order is more like:


Stripping all the vowels but E out gives you ETNSHR as what the free letters should be. In an admitted statistically insignificant sampling of the show, people seem to overselect L.

Oh, wait. This is a baseball site, isn't it? Sorry....
Since you asked...

H's position on the list is due almost entirely to the frequency of 'the' and various pronouns (he, she, this, that, who, which, what, where, they...) in English prose. Since those words are never the solution to the puzzle, you need to demote H.

C is less common than D and L, but it is disproportionately the initial letter of words. Since knowing the first letter makes the puzzle vastly easier to solve, C should be selected more often than its raw frequency would suggest.
Small point, but in the first chart there's no #2 and two #8's...does this mean the much-maligned Bochy's actually #2? Hope he uses Romo and Runzler (and Waldis Joaquin?) in high leverage situations this year...