Matt Williams has had a week. He was lambasted by fans and analysts alike on Twitter for his managing of the back end of the Nationals' bullpen after it imploded against the Cardinals. In the wake of a tormenting loss that arguably could have been an easy win, Williams felt the need to defend his strategic approach:

It was suggested to Williams that Janssen could have pitched a theoretical 10th if the Nats went to extra innings and took the lead.

"He could," Williams allowed. "Sammy Solis could as well. But Sammy's our long guy. So what we tried to do last night was stay off Casey, because he had such a heavy workload the night before. Again, it gets down to the result of the game, and the result of the game was a three-run homer. But Casey also worked through the first two outs of that game and had the previous hitter two strikes and just couldn't put him away. So you look at the result, and the result says 'Oh you should do something different,' but you don't use your closer in that regard because he needs to close that game out."

Essentially, Williams gives each pitcher a role and that role sticks unless something absurd happens. What Williams' week has really done, though, is bring into the spotlight the fact that we don't have a good feel for evaluating managers, not even when it comes to a seemingly simple background question like "how many managerial decisions does it take to add up to a noticeable effect?"

First, a little background on the aforementioned lambasting:

The public outcry got to the point where Williams' boss, Mike Rizzo, felt he had to step in and defend his beleaguered manager. Rizzo's most resonant point was that Williams has to balance information about workloads, health, future schedule, and much more when determining who to bring in for any given situation. This point is particularly salient, and it's one that we've discussed in this space before. To get a sense of all that goes into the decision, here's a taste of the factors adding branches to a manager's decision tree:

  • What is the situation? (score, inning, game importance, etc.)

  • Who is due up for the opponent?

  • Are there likely to be higher- or lower-leverage situations after this one?

  • Should I bring in a left-handed or right-handed pitcher?

    • When might a LOOGY be best deployed?

    • How many left-handed and right-handed pitchers do I have?

  • Is the opposing team a good low-ball or high-ball hitting team?

  • When was the last time each of my relievers threw?

    • How many days in a row have they thrown?

    • How many pitches did they throw over their most recent outings?

  • How do my relievers feel?

    • Did anyone have difficulty getting loose recently?

    • Is anyone coming off an injury?

    • Does anyone typically take longer to warm up?

    • Did anyone have poor command while throwing in the bullpen?

  • What's coming up on the schedule that I should be aware of?

  • How have my relievers been performing?

  • Am I going to need multiple relievers this inning?

  • Does this situation align with someone's role (e.g. setup man, closer)?

    • If so, will making an exception have motivational or psychological fallout?

That's a lot of things to consider. By no means is it a complete list, but simply a place to start when trying to think like an MLB manager. Keep in mind that not only does the manager need to go through all of those factors and pick the "right" guy, he needs to figure all of those things out far enough in advance for his relievers to properly warm up, far enough in advance, in fact, that he hasn't already blown his desired reliever in a less optimal setting earlier in the game, or even earlier in a series.

This isn't to defend Matt Williams and his usage of his bullpen during an important series. Rather, it's to provide context for the difficulty in making the decision. The challenge scales with the importance of these late-game decisions. In a larger sense, Williams' actions are emblematic of issues that analysts have often had with managers who make seemingly poor decisions in critical moments.

All of this public outcry garnered a response from long-time analyst Mitchel Lichtman, who sent a series of tweets trying to put Williams' decision in perspective:

Dave Studeman would request clarification about whether Lichtman's math included the leverage of each situation, to which Lichtman responded "pretty much." I could be misinterpreting Lichtman's response here, but it seems to me that he's arguing that Williams' poor bullpen management is costing his team a minuscule amount when all is said and done. Lichtman would seem to double down, though, when Sean McNally argued that this thought process ignored the impacts of leverage or situational context.

In this case, however, leverage isn't the only component of context that needs to be accounted for. By regressing all the decisions a manager must make to the mean over dozens of games and hundreds of decisions, we're missing the key fact that an MLB game concludes with a binary result.

Managers make thousands of decisions in a day. Most, like which pair of shoes to wear or whether to have water or iced tea with lunch, are inconsequential. Some, like whether Bryce Harper should bat second or fourth in the lineup, are of greater importance, but generally impact the outcome of a single game very little. A few decisions, though, like refusing to use your best reliever in the ninth inning of a tied game on the road, can have a tremendous one-game impact.

This concept is why some calculations of WAR give relievers a leverage bonus. Relief-pitcher value is tied closely to the average leverage of a player's appearances, with elite relievers being worth more than mop-up guys, even if they both end the season with the same stats. Or being worth less: A mop-up man with a 5.00 ERA isn't doing much harm; a closer with the same ERA is setting fire to his team's season.

Similarly, a manager who improperly has his team bunt in the fifth inning is doing a lot less damage to his team's chances of winning than a manager who routinely mismanages his bullpen late in close ballgames. In totality, all these decisions have small individual impacts on runs scored and allowed. In that way, Lichtman's point is completely fair. When a number of decisions are aggregated over hundreds of opportunities in a variety of contexts, each decision isn't that important. The real, contextual impact—did this move actually help us win or lose this specific game?—of a decision can't be ignored, though.

Perhaps the solution is to pursue a mixed-model approach, not dissimilar to the type used by BP's stats team in building DRA. In that case, the team worked to account for the wide variety of factors that impact a pitcher's performance, like ability to hold baserunners, game-time temperature, and so on. This gives a full view of pitcher performance, rather than a more simplified metric that makes assumptions about BABIP or catcher framing.

Of course, there are huge limitations to considering the "real" value of a player or manager. First and foremost, we can't predict the future, so we don't know what would have happened if a manager made a different decision or a reliever pitched differently in a particular situation. We can only speculate, which comes with its own hazards.

That in itself makes a regressed approach more attractive, because we can look at actual results and determine a mathematically accurate value for any given event or decision a manager might make. This approach, used for most everything else those of us here at Baseball Prospectus analyze, makes a ton of sense in nearly all situations.

Let's use some more specific examples to help drive home the difference between my proposed approach and Lichtman's claims. Lichtman is arguing that the difference between the optimal pitcher and the second-, third-, or fourth-best pitcher in any given situation is so small that the impact of a bad decision is minimal. In the case of Matt Williams, we might say that the difference between Drew Storen and Casey Janssen is 0.1 runs per inning, so it will take 10 outings for Janssen to give up a run that might contribute to a loss that Storen wouldn't have allowed. If we assume 10 runs equals a win, then it would be 100 outings before Williams cost his team a theoretical win. If the MLB standings were decided by best run differential over 1,300 innings, then Lichtman's approach would be completely valid.
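The arithmetic in that paragraph can be written out as a minimal sketch. The 0.1 runs-per-inning gap and the 10-runs-per-win conversion are the illustrative figures from the text, not measured values, and the function names are hypothetical. Each outing is assumed to last one inning.

```python
# Illustrative figures taken from the paragraph above; neither is a measured value.
RUN_GAP_PER_INNING = 0.1  # assumed gap between Storen and Janssen, in runs per inning
RUNS_PER_WIN = 10         # rule-of-thumb conversion assumed in the text


def outings_per_extra_run(run_gap_per_inning: float) -> float:
    """One-inning outings before the weaker pitcher is expected to allow one extra run."""
    return 1 / run_gap_per_inning


def outings_per_lost_win(run_gap_per_inning: float, runs_per_win: float) -> float:
    """One-inning outings before those extra runs are expected to add up to one win."""
    return outings_per_extra_run(run_gap_per_inning) * runs_per_win


print(round(outings_per_extra_run(RUN_GAP_PER_INNING)))
print(round(outings_per_lost_win(RUN_GAP_PER_INNING, RUNS_PER_WIN)))
```

This is the context-neutral view: it spreads the cost evenly over every outing and says nothing about which game any particular extra run lands in.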

The reality, however, is that the MLB season is broken into 162 games, each of which must produce a binary outcome. This phenomenon isn't limited to baseball. In fact, 15 years ago there was a perfect example of how the smallest factors could have a tremendous real-world impact: the U.S. presidential election. Many states, including Florida, employ a "winner take all" policy with regard to electoral college votes. If a candidate wins a state with a popular-vote edge of 60-40, s/he gets all 29 of Florida's electoral college votes. A blunder lowers the popular-vote margin to 59-41? S/he still gets all 29 votes. It would take many blunders to fall enough in the voting to lose the electoral college votes.

But what if the popular vote is much closer? What if a candidate has a 50.5–49.5 edge? S/he still walks away with all 29 votes, but if the candidate makes the same blunder as before, the one that moved the popular vote needle just one percentage point, the consequences would be disastrous. All 29 electoral college votes would now go to the opponent.

In aggregate, across many elections and millions of popular votes, these shifts in results matter little. In swing states, like Florida in 2000, small ripples can have immense ramifications.
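The winner-take-all analogy can be sketched in a few lines. The 29 electoral votes and the one-point blunder come from the text. The function name and the simple "more than 50 percent" rule are assumptions for illustration, and an exact 50-50 tie is treated as a loss for simplicity.

```python
FLORIDA_ELECTORAL_VOTES = 29  # figure used in the text
BLUNDER = 1.0                 # percentage points lost to one blunder (illustrative)


def electoral_votes(vote_share: float, votes_at_stake: int = FLORIDA_ELECTORAL_VOTES) -> int:
    """Winner-take-all: every vote above a 50 percent share, none otherwise."""
    return votes_at_stake if vote_share > 50.0 else 0


for share in (60.0, 50.5):
    before = electoral_votes(share)
    after = electoral_votes(share - BLUNDER)
    print(f"{share:.1f}% -> {share - BLUNDER:.1f}%: {before} -> {after} electoral votes")
```

The same one-point shift changes nothing in the comfortable case and everything in the close one, which is the point of the comparison.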

Few aspects of a baseball game result in more second-guessing of managers than late-inning bullpen management. One major reason is the information asymmetry between managers and fans/analysts. Another is that, as illustrated above, there's no tried-and-true method for evaluating the moves a manager makes.

In terms of win probability, Lichtman is 100 percent correct in saying that any one decision has a small mathematical impact on the outcome of a game. In terms of game outcomes, though, a manager's decision when sitting on the razor's edge between winning and losing can have a large impact on whether that game is a win or a loss.

For major-league managers there is one simple truth. Wins are a finite resource, and a manager's job is to collect as many wins as he possibly can. After nine innings on any given night, a winner is declared, and that team's manager gets to add a win to his club's total. Remember that if the difference between a Janssen and a Storen is 0.1 runs per inning, then we can expect that Janssen would only give up an extra run compared to Storen in one of 10 innings. If those innings come when the Nationals are up 10-5, then it doesn't matter when that extra run comes. If that extra run comes in an inning when the game is on the line, however, that decision could have a very real impact on the outcome.
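A deliberately simplified sketch of that last point follows. It assumes the reliever pitches the final half-inning and no more runs are scored afterward, so the game state depends only on the lead he inherits and the runs he allows. The function name and the scenarios are hypothetical.

```python
def result(lead_entering: int, runs_allowed: int) -> str:
    """Game state after the final half-inning, from the pitching team's point of view."""
    margin = lead_entering - runs_allowed
    if margin > 0:
        return "win"
    if margin == 0:
        return "tied"
    return "loss"


# Compare a clean inning with an inning containing the one extra run.
for lead in (5, 1, 0):  # up 10-5, up by one, tied
    clean = result(lead, 0)
    extra_run = result(lead, 1)
    print(f"lead {lead}: clean inning -> {clean}, one extra run -> {extra_run}")
```

With a five-run lead the extra run leaves the result untouched. With a one-run lead or a tie, the same run changes the state of the game.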

Regressed wins are useful in most situations, but I think it's worth considering that assessing the performance of managers isn't one of them: The context of their decisions is nearly everything. A bad manager can cost his team real, actual, printed-in-the-newspaper wins. That shouldn't be ignored in favor of a broader, context-neutral look at their performance.

Special thanks to Neil Weinberg and R.J. Anderson for their research and writing assistance.

Part of the issue is that when MGL refers to a "win" he's referring to a marginal win, I believe. Folks are pretty familiar with wins and losses, but have less idea of the value of a marginal win. It's why some folks think a manager can be worth 10 wins or some nonsense, when even the best are probably more like 3-4.

Each of those 2%s adds up over the course of a year, though. If he is routinely mismanaging then it's entirely possible that Williams approaches the bottom of the scale. 2% of a marginal win over 50 games is a full win and this is only looking at one aspect of his purview. Of course it's easy to look at the comments of Williams and have a laugh, especially with expanded rosters, which should lead to the ability to use specialists more.
I've been wondering if Jonathan Papelbon extracted an agreement from the Nationals, formal or informal, that he wouldn't be asked to do anything but close games in the "traditional" way. Papelbon had a veto of a trade to the Nationals, and his contract with the Phillies had an option that vested only if he closed a certain number of games. Mike Rizzo guaranteed that option for 2016, with no strings attached that I've read about. But I wonder if there are strings that we don't know about, whether contractually or just in terms of an understanding between the team and Papelbon.

This isn't to say that Matt Williams wouldn't be inclined to use the bullpen the way he does even in the absence of such an agreement. But it has been remarkable to see him stretch the other relievers the way he has and use Papelbon so infrequently that I wonder if his hands are tied somehow.
It must not be forgotten that a sword is hanging over Matt Williams' head for his removal of Jordan Zimmermann in last year's playoff game with the Giants and bringing in Storen, who had been the goat the previous year. This, plus the inexplicable use of Janssen after he had stunk up the joint the night before, is difficult to ignore. In-game managerial decisions are easy to second-guess, ya think, but Williams seems to zig when he should zag and vice versa with an uncanny amount of bad luck. Or is it a lack of understanding of all the variables mentioned above, i.e. incompetence?
Williams implies that Papelbon is not going to pitch two innings, no matter what! This is a must-win game and he says that! This is a time for a player to suck it up, and while there is no guarantee that Papelbon would be successful, it would seem to be a better choice than a used-up and ineffective Janssen.
Credible information, certainly, but ultimately extreme overanalysis.

Manage to win today. Right now, with your best players. Because tomorrow, it might rain.
Or your closer's arm might fall off because you've been managing to win for the last five nights. Note that in a domed stadium, that is ... more likely than rain.

Extreme underanalysis. Keep the good stuff coming, Jeff.
The genesis of the article is a manager that doesn't use his relievers ENOUGH, rather than too much. By sticking stubbornly to roles instead of just using the best guy available (and pitching four straight days means you probably *aren't* available), the team is losing games.
You listed 20 potential things for the manager to consider. It is hard to believe anyone can think that hard and make a timely decision, at least not without a team of statheads and calculators in the dugout. Maybe he thinks of five of them. Bottom line, though, he is losing critical games in September with his best reliever on the bench night after night. He will finish .500 but by golly he managed perfectly; it is not his fault. Somehow it is fitting he is manager of Washington.
Don't see why any of this is a surprise. Every team out there has decided to go with a bullpen full of 1-IP max guys. The inevitable result is gonna be that by the end of a season where we have expected too much micromanaging throughout; there aren't gonna be as good options. Storen and Papelbon are both on a pace for not-much-left-in-tank for the postseason. Janssen is exactly the reliever you'd expect to eat up as many appearances as possible in Sept.

And since this game is in September (in large part BECAUSE of the 5 months of previous micromanagement that eliminates options) - not May - expect ALL of the 'feedback' to be geared towards encouraging even more of the 1-IP max micromanaging in future seasons.
"In terms of win probability, Lichtman is 100 percent correct in saying that any one decision has a small mathematical impact on the outcome of a game. In terms of game outcomes, though, a manager's decision when sitting on the razor's edge between winning and losing can have a large impact on whether that game is a win or a loss."

Jeff, you are completely 100% wrong on this (and you should know it).

The whole point of one decision being .01 or .02 better in terms of WE than another is that it does NOT make much difference in the course of a game. That completely contradicts your last sentence, which makes no sense whatsoever. It can NOT and does NOT have a large impact, unless by "can have an impact" you mean that the team might lose the game say 20% of the time after said decision. But the answer to that is, "Had the manager made the correct decision, the team would lose the game 19 or 18%." That is the entire point of understanding the one-game impact of a decision.

Yes, those mistakes add up and I am by no means suggesting that we shouldn't care about manager mistakes if we are a fan of their team because they have little impact on one game.

That would be like me saying that you shouldn't care about your child not wearing his helmet while bicycling because it only increases his chances of getting seriously injured by 1 in a 1000 each time he rides.

Anyone who thinks that my comments mean that you shouldn't care about manager decisions has apparently not listened to a thing I have said over the last 20 years, or they just want to write a story (I am not suggesting that you are doing that).
I think what Jeff is trying to say is that a particular decision - such as Janssen vs. Papelbon - may occur at the tipping point between winning and losing. No team loses a particular game 18%, 19%, or 20% of the time; the team wins or loses that game 0% or 100% of the time. And - after accounting for every other event that has occurred in said game - a bad managerial decision in the last inning may push a game from a win to a loss.
"And - after accounting for every other event that has occurred in said game - a bad managerial decision in the last inning may push a game from a win to a loss."

There is absolutely no way to know that, that's why the only useful heuristic is the 19/20% or the "it cost 1% in WE."

It's not like, "fielder makes an error that cost the game." If a manager decides to yank the starter for a reliever in the 9th (or alternatively, leaves him in) and the pitcher loses (or wins) the game, there is NO WAY TO KNOW the culpability of the manager based on the outcome of one game. THAT IS WHY we must use the, "the wrong decision cost 1% of a win" framework.

And if you use your heuristic, the times when the manager made the correct decision will result in a loss sometimes as well. And when he makes the bad decision, the result will sometimes be a win. There is NO way to know from the outcome whether the decision is good or bad. Again, that is why the ONLY framework that has any practical value whatsoever is the one where you estimate the "cost" or "gain" of a decision in marginal wins per decision, which is on the order of around 1%.

You and Jeff are arguing about angels dancing on the head of a pin. Actually I have no idea what you are arguing. None whatsoever.

And for the record, Jeff uses the term "regressed wins" completely speciously.
A couple of thoughts...

Generally speaking I agree with much of what you said in the comments here, and I want to make sure that nobody sees this article as a "takedown" or anything of that sort. My intention was for it to clarify something that I think is very important, that I probably didn't articulate well enough in the article itself (noting that only after your comments did I come up with appropriate language, so that's of course the reason for that). You made an important point:

"That would be like me saying that you shouldn't care about your child not wearing his helmet while bicycling because it only increases his chances of getting seriously injured by 1 in a 1000 each time he rides."

I think this is critical because the general reaction to your initial tweets was "MGL says we shouldn't care about bad decisions by managers. What?!?!" which of course wasn't your intent. You're simply pointing out that, generally speaking, any one decision impacts the win probability very little. This is, as I said in the article, 100% accurate.

I also agree with your points that A. we have no way of knowing if the manager made the right decision in any given instance (because we can't predict the future or any number of alternate futures), and that B. there will be times where the manager does make the 'correct' decision, and loses anyway. To wit, we don't know that Storen or Papelbon wouldn't have given up runs, as Janssen did, in that particular instance.

The crux of this, for me, is that the timing of these decisions should not be ignored. While the impact of any one individual decision is minimal, the timing of it can amplify that impact. There's a difference between bringing in your 4th best reliever for a given spot when down by a run vs. when the game is tied. The manager's decision to bring in a reliever who is not best suited to succeed in a given situation might only have a small impact, but the timing of that decision is entirely within the manager's control. Minimizing the high-leverage mistakes could be the difference between winning and losing.

With perfect hindsight we can go back and critique a manager for the inLI of his relievers. How often did he make the wrong decision? Bringing in a decent or poor reliever in high leverage situations? That was a big part of the methodology here:

and here:

Anyway, the purpose here isn't to propose a solution, or even an "I'm right, MGL is wrong" proclamation. The purpose was to simply encourage greater discussion of this one aspect of managerial performance, because it is easily critiqued but not easily understood. I hope that this article was able to accomplish that, and I truly believe that the 'solution', if there is one right solution, is somewhere in our back-and-forth here.
I'm not sure how minimal one loss per 50 to 200 bad decisions really is. If a bad field manager makes two mistakes per game, that's about 320, or somewhere between 1.6 and 6.4 losses for the season. That's certainly enough to affect the pennant race, and if it were a player it would be anywhere from a terrible one to an unheard-of, historically terrible one.

That's assuming that the estimated effect is reasonable. At the extreme, a manager who replaced Mike Trout with some 4A guy every day all year could cost his team 10 losses on only 162 bad decisions. This is a ridiculous example, but where was the line drawn? What was accounted for?
We're all guessing of course, but I don't think any manager makes 2 bad decisions a game. Then again, we really would have to establish the base line and what constitutes a "bad decision." For example there would have to be a threshold, say, a decision that cost .5%.

In my opinion, the average manager can add 1-2 wins a season with optimal decisions. The bad manager, relative to the average one, maybe 2-3. That is a very wild-assed, but educated, estimate.

Also if you take my word for it that a bad decision costs .5% to 2%, you cannot say that 320 cost 1.6 to 6.4 losses. That 50 to 200 is a range from the most to least egregious decisions. Actually, that's not true. Obviously the limit to the least egregious is 0, but we can define any bad decision as one that cost at least .5%. I believe the cap is around 2% and that happens very infrequently.

So, over 320 bad decisions, the average decision across that .5 to 2% probably cost around 1% which would mean a total of 3.2 wins for a manager who makes 320 of them, which, as I said, is not a realistic number, I don't think.
You're absolutely right about "bad decision" being a very loose term at this point. Living in the Washington area at this particular moment, it is easy to imagine a manager regularly making two bad decisions per game, but I could be wrong about Williams's seeming howlers, or I could be underestimating by half, who can say. (Rendon, an excellent hitter who never bunts, bunting with Harper coming up in the 9th in this huge game?? This should count for five goofs all by itself.)

I don't follow the math here. 320 x 0.5% = 1.6, that's all I was doing. I may be misinterpreting what you mean by those percentages, and/or by the 50-to-200.
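For reference, the arithmetic traded back and forth in this thread can be laid out in a short sketch. The per-decision costs (0.5% to 2% of a win, averaging roughly 1%) and the 320-decision season are the commenters' own rough estimates, not measured values, and the function names are hypothetical.

```python
def expected_wins_lost(bad_decisions: int, cost_per_decision: float) -> float:
    """Expected wins lost if each bad decision costs a fixed fraction of a win."""
    return bad_decisions * cost_per_decision


def decisions_per_loss(cost_per_decision: float) -> float:
    """Bad decisions needed, on average, to add up to one full lost win."""
    return 1 / cost_per_decision


BAD_DECISIONS = 320  # "two mistakes per game", per the comment above

# 0.5%, 1% and 2% of a win per bad decision, as estimated in the thread.
for cost in (0.005, 0.01, 0.02):
    lost = expected_wins_lost(BAD_DECISIONS, cost)
    per_loss = decisions_per_loss(cost)
    print(f"cost {cost:.1%}: {lost:.1f} wins lost, one loss per {per_loss:.0f} decisions")
```

Under these assumptions the 1.6 and 6.4 figures are the two ends of the range, reached only if every one of the 320 decisions sat at the same extreme, while the 3.2 figure assumes the average cost of about 1%.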