September 4, 2015
On Manager Analysis
Matt Williams has had a week. He was lambasted by fans and analysts alike on Twitter for his managing of the back end of the Nationals' bullpen after it imploded against the Cardinals. In the wake of a tormenting loss that arguably could have been an easy win, Williams felt the need to defend his strategic approach:
It was suggested to Williams that Janssen could have pitched a theoretical 10th if the Nats went to extra innings and took the lead.
Essentially, Williams gives each pitcher a role and that role sticks unless something absurd happens. What Williams' week has really done, though, is bring into the spotlight the fact that we don't have a good feel for evaluating managers, not even when it comes to a seemingly simple background question like "how many managerial decisions does it take to add up to a noticeable effect?"
First, a little background on the aforementioned lambasting:
The public outcry got to the point where Williams' boss, Mike Rizzo, felt he had to step in and defend his beleaguered manager. Rizzo's most resonant point was that Williams has to balance information about workloads, health, future schedule, and much more when determining who to bring in for any given situation. This point is particularly salient, and it's one that we've discussed in this space before. To get a sense of all that goes into the decision, here's a taste of the factors adding branches to a manager's decision tree:
That's a lot of things to consider. By no means is it a complete list, but simply a place to start when trying to think like an MLB manager. Keep in mind that not only does the manager need to go through all of those factors and pick the "right" guy, he needs to figure all of those things out far enough in advance for his relievers to properly warm up, far enough in advance, in fact, that he hasn't already blown his desired reliever in a less optimal setting earlier in the game, or even earlier in a series.
This isn't to defend Matt Williams and his usage of his bullpen during an important series. Rather, it's to provide context for the difficulty in making the decision. The difficulty of these late-game decisions is matched by their importance. In a larger sense, Williams' actions are emblematic of issues that analysts have often had with managers who make seemingly poor decisions in critical moments.
All of this public outcry garnered a response from long-time analyst Mitchel Lichtman, who sent a series of tweets trying to put Williams' decision in perspective:
Dave Studeman would request clarification about whether Lichtman's math included the leverage of each situation, to which Lichtman responded "pretty much." I could be misinterpreting Lichtman's response here, but it seems to me that he's arguing that Williams' poor bullpen management is costing his team a minuscule amount when all is said and done. Lichtman would seem to double down, though, when Sean McNally argued that this thought process ignored the impacts of leverage or situational context.
In this case, however, leverage isn't the only component of context that needs to be accounted for. By regressing all the decisions a manager must make to the mean over dozens of games and hundreds of decisions, we're missing the key fact that an MLB game concludes with a binary result.
Managers make thousands of decisions in a day. Most, like which pair of shoes to wear or whether to have water or iced tea with lunch, are inconsequential. Some, like whether Bryce Harper should bat second or fourth in the lineup, are of greater importance, but generally impact the outcome of a single game very little. A few decisions, though, like refusing to use your best reliever in the ninth inning of a tied game on the road, can have a tremendous one-game impact.
This concept is why some calculations of WAR give relievers a leverage bonus. Relief-pitcher value is tied closely to the average leverage of a player's appearances, with elite relievers being worth more than mop-up guys, even if they both end the season with the same stats. Or being worth less: A mop-up man with a 5.00 ERA isn't doing much harm; a closer with the same ERA is setting fire to his team's season.
Similarly, a manager who improperly has his team bunt in the fifth inning is doing a lot less damage to his team's chances of winning than a manager who routinely mismanages his bullpen late in close ballgames. In totality, all these decisions have small individual impacts on runs scored and allowed. In that way, Lichtman's point is completely fair. When a number of decisions are aggregated over hundreds of opportunities in a variety of contexts, each decision isn't that important. The real, contextual impact—did this move actually help us win or lose this specific game?—of a decision can't be ignored, though.
Perhaps the solution is to pursue a mixed-model approach, not dissimilar to the type used by BP's stats team in building DRA. In that case, the team worked to account for the wide variety of factors that impact a pitcher's performance, like ability to hold baserunners, game-time temperature, and so on. This gives a full view of pitcher performance, rather than a more simplified metric that makes assumptions about BABIP or catcher framing.
Of course, there are huge limitations to considering the "real" value of a player or manager. First and foremost, we can't predict the future, so we don't know what would have happened if a manager made a different decision or a reliever pitched differently in a particular situation. We can only speculate, which comes with its own hazards.
That in itself makes a regressed approach more attractive, because we can look at actual results and determine a mathematically accurate value for any given event or decision a manager might make. This approach, used for most everything else those of us here at Baseball Prospectus analyze, makes a ton of sense in nearly all situations.
Let's use some more specific examples to help drive home the difference between my proposed approach and Lichtman's claims. Lichtman is arguing that the difference between the optimal pitcher and the second-, third-, or fourth-best pitcher in any given situation is so small that the impact of a bad decision is minimal. In the case of Matt Williams, we might say that the difference between Drew Storen and Casey Janssen is 0.1 runs per inning, so it will take 10 outings for Janssen to give up a run that might contribute to a loss that Storen wouldn't have allowed. If we assume 10 runs equals a win, then it would be 100 outings before Williams cost his team a theoretical win. If the MLB standings were decided by best run differential over 1,300 innings, then Lichtman's approach would be completely valid.
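The arithmetic behind that example can be written out as a short sketch. The 0.1 runs-per-inning gap and the 10-runs-per-win figure are the illustrative assumptions used above, not measured values, and the function names are made up for this illustration.

```python
from fractions import Fraction

# Illustrative figures from the example in the text (assumptions, not measurements).
RUN_GAP_PER_INNING = Fraction(1, 10)  # hypothetical Janssen-minus-Storen gap, runs per inning
RUNS_PER_WIN = 10                     # rule-of-thumb conversion assumed in the text


def outings_per_extra_run(run_gap: Fraction) -> Fraction:
    """One-inning outings needed before the gap adds up to one full run."""
    return 1 / run_gap


def outings_per_lost_win(run_gap: Fraction, runs_per_win: int) -> Fraction:
    """One-inning outings needed before the gap adds up to one full win."""
    return runs_per_win / run_gap


print(outings_per_extra_run(RUN_GAP_PER_INNING))               # 10
print(outings_per_lost_win(RUN_GAP_PER_INNING, RUNS_PER_WIN))  # 100
```

Exact fractions are used only to keep the division free of floating-point rounding; the calculation itself is nothing more than the two divisions described in the paragraph.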
The reality, however, is that the MLB season is broken into 162 games, each of which must produce a binary outcome. This phenomenon isn't limited to baseball. In fact, 15 years ago there was a perfect example of how the smallest factors could have a tremendous real-world impact: the U.S. presidential election. Many states, including Florida, employ a "winner take all" policy with regard to electoral college votes. If a candidate wins a state with a popular-vote edge of 60-40, s/he gets all 29 of Florida's electoral college votes. A blunder lowers the popular-vote margin to 59-41? S/he still gets all 29 votes. It would take many blunders to fall enough in the voting to lose the electoral college votes.
But what if the popular vote is much closer? What if a candidate has a 50.5–49.5 edge? S/he still walks away with all 29 votes, but if the candidate makes the same blunder as before, the one that moved the popular vote needle just one percentage point, the consequences would be disastrous. All 29 electoral college votes would now go to the opponent.
In aggregate, across many elections and millions of popular votes, these shifts in results matter little. In swing states, like Florida in 2000, small ripples can have immense ramifications.
Few aspects of a baseball game result in more second guessing of managers than late-inning bullpen management. One major reason is that there's information asymmetry between managers and fans/analysts. Another is that, as we illustrated, there's no tried-and-true method for evaluating the moves a manager makes.
In terms of win probability, Lichtman is 100 percent correct in saying that any one decision has a small mathematical impact on the outcome of a game. In terms of game outcomes, though, a manager's decision when sitting on the razor's edge between winning and losing can have a large impact on whether that game is a win or a loss.
For major-league managers there is one simple truth. Wins are a finite resource, and a manager's job is to collect as many wins as he possibly can. After nine innings on any given night, a winner is declared, and that team's manager gets to add a win to his club's total. Remember that if the difference between a Janssen and a Storen is 0.1 runs per inning, then we can expect that Janssen would only give up an extra run compared to Storen in one of 10 innings. If those innings come when the Nationals are up 10-5, then it doesn't matter when that extra run comes. If that extra run comes in an inning when the game is on the line, however, that decision could have a very real impact on the outcome.
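To make the contrast concrete, here is a minimal sketch under a deliberately simplified assumption: only the one inning in question remains, and the pitching team "holds" the game only if it is still ahead afterward. The function names and the sample leads are hypothetical and chosen purely for illustration.

```python
def holds_lead(lead_entering_inning: int, runs_allowed: int) -> bool:
    """True if the team is still ahead after the inning.

    Simplified assumption: this is the last inning, so being ahead
    afterward means the win is in hand. A tie counts as not held.
    """
    return lead_entering_inning - runs_allowed > 0


def extra_run_costs_lead(lead_entering_inning: int, baseline_runs_allowed: int = 0) -> bool:
    """Does one additional run allowed turn a held lead into one that is not held?"""
    held = holds_lead(lead_entering_inning, baseline_runs_allowed)
    held_with_extra = holds_lead(lead_entering_inning, baseline_runs_allowed + 1)
    return held and not held_with_extra


# The same extra run, dropped into three different hypothetical leads.
for lead in (5, 3, 1):
    print(lead, extra_run_costs_lead(lead))
# 5 False
# 3 False
# 1 True
```

In run-differential terms the extra run costs the same in all three cases; in terms of the game's outcome, it matters only in the one-run game.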
Regressed wins are useful in most situations, but I think it's worth considering that assessing the performance of managers isn't one of them: The context of their decisions is nearly everything. A bad manager can cost his team real, actual, printed-in-the-newspaper wins. That shouldn't be ignored in favor of a broader, context-neutral look at their performance.
Special thanks to Neil Weinberg and R.J. Anderson for their research and writing assistance.