Pitching Backward: Designing A Bullpen Usage Critique

Last week here at Pitching Backward we took a look at two managers, one who excelled at managing his bullpen and one who really struggled. The data is indisputable, and the analysis sound. There’s certainly a subjective component to it, but comparing the RE24 and inLI ranks for a handful of relievers on any given team is a pretty simple exercise.

The problem with everything I wrote last week is that it’s almost entirely wrong. Not wrong in principle, but wrong in practice. It’s easy for us to sit here and say, in hindsight, that a manager didn’t deploy his bullpen resources effectively, but we’re looking at averages, drastically oversimplifying the decisions that must be made against the unique battle terrain of each day. While we talk often about the relatively simple decision of when to pull a starting pitcher, we have far less certainty over the subsequent decision: Who should the manager go to?

Last week I argued that Fredi Gonzalez should have used Anthony Varvaro in higher-leverage situations. I never brought up the factors that might lead to Gonzalez using David Carpenter instead. To get a sense of all that goes into the decision, here’s a taste of the factors adding branches to a manager’s decision tree:

· What is the situation? (score, inning, game importance, etc.)

· Who is due up for the opponent?

· Are there likely to be higher- or lower-leverage situations after this one?

· Should I bring in a left-handed or right-handed pitcher?

o When might a LOOGY be best deployed?

o How many left-handed and right-handed pitchers do I have?

· Is the opposing team a good low-ball or high-ball hitting team?

· When was the last time each of my relievers threw?

o How many days in a row have they thrown?

o How many pitches did they throw over their most recent outings?

· How do my relievers feel?

o Did anyone have difficulty getting loose recently?

o Is anyone coming off an injury?

o Does anyone typically take longer to warm up?

o Did anyone have poor command while throwing in the bullpen?

· What’s coming up on the schedule that I should be aware of?

· How have my relievers been performing?

· Am I going to need multiple relievers this inning?

· Does this situation align with someone’s role (e.g., setup man, closer, etc.)?

o If so, will making an exception have motivational or psychological fallout?

That’s a lot of things to consider. By no means is it a complete list either, but simply a place to start when trying to think like an MLB manager. Keep in mind that not only does your favorite team’s manager need to go through all of those factors and pick the “right” guy, he needs to figure all of those things out far enough in advance for his relievers to properly warm up—far enough in advance, in fact, that he hasn’t already blown his desired reliever in a less optimal setting earlier in the game, or even earlier in a series.

Let’s dissect some of these in more detail, as we aim to get inside the heads of the managers whose decisions we were so quick to criticize one week ago:

What is the situation?
The most straightforward input for the manager is the game situation. Being down 8-1 in the fifth is vastly different from being up 4-3 in the seventh. We all know the difference between a long man and a setup guy; their usage is largely dictated by the situation at hand when they come in to pitch. This input lines up fairly nicely with leverage index, especially inLI, the leverage of the situation when the reliever enters the game.

Who is due up for the opponent? Should I bring in a lefty or a righty?
These two components go together, as one helps to inform the other. Being able to select the reliever who comes in allows the manager to use the platoon advantage, so identifying the batters a reliever is likely to face is important. Unfortunately, leverage index does not know the difference between Miguel Cabrera and Mark Ellis coming to the plate as the go-ahead run, an important component of the decision.

Not all high-leverage scenarios are created equal. A manager might opt to use a lesser reliever in a high-leverage situation in the seventh inning, knowing that he might need one of his better relievers later in the game to face the opponents’ best hitter (or prioritizing the answer to one of the other questions on the list above).

When was the last time _____ pitched? Has he pitched on consecutive days? How many pitches did he throw those days?
This is one of the more complex components of the decision-making. Hell, there’s even an entire Twitter account dedicated to how many days in a row a closer has thrown. A manager needs to keep in mind how many consecutive days his relievers have thrown, the number of pitches they threw on those days, etc. Beyond that, there are even more nuanced fatigue factors (how many times did he warm up in those games; how many times has he pitched in the month, in the season). On top of all that, the manager needs to keep tabs on more subjective things, like how his relievers feel on a given day, or whether a player who says he is ready to go habitually overstates his freshness.

It’s difficult to automate this kind of decision-making; it has much more to do with feel and knowing your pitchers than just about anything else. Since we’re not actually in the clubhouse or manager’s office for every conversation, we will probably never be able to accurately model this part of the decision-making process. We get glimpses of it here and there when a manager mentions that someone wasn’t available for some reason, but that’s about it.

What does the upcoming schedule look like?
This is another straightforward component. If a team has an off-day following a game, the manager might be more inclined to use a guy that he otherwise would consider saving. A manager might also consider saving one of his relievers if he knows that he might need him in an upcoming series, and he might also weigh who is starting the next day. This might not be optimal for winning the current game, but we’d be remiss to ignore it as a factor.

Performance
This factor is without a doubt the elephant in the room when it comes to selecting the optimal reliever from the bullpen. After all, this was the crux of the argument I made in my last article. In using RE24 I hoped to quickly and easily approximate reliever performance. I mentioned the caveats to this approach last week, but they’re worth restating. Last week we looked at full-season RE24 in retrospect, something managers don’t have the ability to do. We also largely ignored how relievers accumulated that RE24, so there were discrepancies in innings pitched, situations pitched in, and opponents faced that we didn’t really get into in any detail.

That raises the question: How might a manager use recent performance as an indicator? A good manager is constantly reassessing his bullpen, often in granular, dynamic ways that he wouldn’t use in setting his lineup or starting rotation. It’s not uncommon for a reliever’s role on the staff to change multiple times over the course of the season, a clear indication that managers are always evaluating the performance of their relievers. Think of this as the Ken Giles or Zach Britton Effect. At the start of the season neither guy was seen as an elite reliever: Giles started the year in Double-A, and Britton was the Orioles’ long man, throwing two or more innings in five of his first six outings. Each ended the season as one of the best high-leverage pitchers in baseball. If their managers hadn’t made the necessary adjustments, even on the basis of arguably small samples, these two would have been toiling away in low-leverage situations through the summer, assuming they earned MLB roles at all.

Perplexed, I posed a simple question to a colleague, Bryan Cole. I wanted to know how realistic it is for a manager to use recent performance to ‘predict’ a reliever’s next performance. Bryan built out a series of scatter plots that quickly illustrate how difficult it would be to say with confidence that recent performance was especially significant. We selected three relievers (one elite, one middle-of-the-road, and one poor) to quickly take a look at how recent performance predicts the results of a pitcher’s next outing.

We plotted the average RE24 posted by each reliever over his previous 15 outings (a completely arbitrary number, but one that seems reasonable for what a manager might consider both “recent” and “substantial”) against his RE24 in his next outing. Here are the scatter plots for those pitchers, with a description to follow.

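[Scatter plots, one per pitcher: Andrew Miller, Jeurys Familia, Justin Grimm. Each plots average RE24 over the previous 15 outings against RE24 in the next outing.]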

Every manager or team in baseball is going to have some form of short-term statistical measure that could help inform reliever selections. That said, RE24 is an ideal stat for us to use as a proxy for performance. Because it is calculated using run expectancy for the base states encountered by the pitcher, it gives a rough approximation of how a pitcher performed in a given outing. It can almost be considered a statistical account of what happened in those last 15 outings.
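For readers who want to see the bookkeeping, here is a minimal sketch of how a single outing’s RE24 might be tallied from the pitcher’s side: on each play he is credited with the drop in run expectancy and debited for any runs that score. The run-expectancy numbers below are made-up placeholders rather than a real table, and the names `RUN_EXPECTANCY`, `Play`, `run_exp`, and `outing_re24` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Placeholder run-expectancy values keyed by (bases, outs).
# These are illustrative assumptions, NOT values from a published table.
RUN_EXPECTANCY = {
    ("---", 0): 0.50, ("---", 1): 0.25, ("---", 2): 0.10,
    ("1--", 0): 0.85, ("1--", 1): 0.50, ("1--", 2): 0.20,
}


@dataclass
class Play:
    start: tuple  # (bases, outs) before the play
    end: tuple    # (bases, outs) after the play; outs == 3 ends the inning
    runs: int     # runs that scored on the play


def run_exp(state):
    """Expected runs for the rest of the inning; zero once the inning is over."""
    bases, outs = state
    return 0.0 if outs >= 3 else RUN_EXPECTANCY[(bases, outs)]


def outing_re24(plays):
    """Pitcher-side RE24: positive is good for the pitcher."""
    return sum(run_exp(p.start) - run_exp(p.end) - p.runs for p in plays)


# A hypothetical clean inning: three outs in a row with the bases empty.
clean_inning = [
    Play(("---", 0), ("---", 1), 0),
    Play(("---", 1), ("---", 2), 0),
    Play(("---", 2), ("---", 3), 0),
]
print(round(outing_re24(clean_inning), 2))
```

With these placeholder values, a clean inning from a bases-empty, no-out start is worth exactly the run expectancy the pitcher erased, while a solo home run costs him a full run because the base-out state is unchanged.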

Unfortunately for MLB managers, the plots above show that there is little to no correlation between a pitcher’s performance in his most recent 15 outings and his next outing. It’s irrational for us to think that a manager should be able to quickly know with any precision when a reliever should move up or down the leverage pole, at least statistically. To a large degree, this becomes a test of a manager’s scouting acumen, another factor we’ll never be able to model.
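The comparison described above can be sketched in a few lines. This is not the code Bryan used; it is a rough reconstruction that assumes you already have one pitcher’s per-outing RE24 values in chronological order. The 15-outing window comes from the text, while the function names `trailing_pairs` and `pearson_r` and the sample numbers are hypothetical.

```python
from math import sqrt


def trailing_pairs(re24_by_outing, window=15):
    """Pair the mean RE24 of the previous `window` outings with the next outing's RE24."""
    pairs = []
    for i in range(window, len(re24_by_outing)):
        prior = re24_by_outing[i - window:i]
        pairs.append((sum(prior) / window, re24_by_outing[i]))
    return pairs


def pearson_r(pairs):
    """Pearson correlation of (x, y) pairs; None if it is undefined."""
    n = len(pairs)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # no variance on one axis, so correlation is undefined
    return sxy / sqrt(sxx * syy)


# Made-up per-outing RE24 values, for illustration only (not real pitcher data).
sample = [0.4, -0.2, 0.5, 0.1, -1.3, 0.6, 0.3, -0.4, 0.5, 0.2,
          0.0, 0.7, -0.9, 0.4, 0.3, 0.5, -0.6, 0.2, 0.8, -0.1]
pairs = trailing_pairs(sample, window=15)
print(len(pairs), pearson_r(pairs))
```

Each point in a scatter plot like the ones above corresponds to one `(trailing average, next outing)` pair. A correlation near zero across those pairs is what “little to no correlation” looks like numerically.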

All of this brings us back to the post from last week where I focused on the critical thinking and decision-making abilities of Fredi Gonzalez and Robin Ventura. I lauded Robin Ventura for his ability to recognize the limits of his veteran pitchers and for using less experienced pitchers in key roles. I also criticized Fredi Gonzalez for his abhorrent bullpen usage. But really, honestly, bluntly, we can’t be so confident that the numbers are telling us any perfect truth. The data suggests that Ventura recognized his highly paid veteran relievers were faltering, and that he smartly recognized the potential of his young, talented arms to take over that high-leverage work. And it suggests that Fredi Gonzalez erred by—or at least paid the price for—using David Carpenter in higher-leverage situations than Anthony Varvaro to the bitter end.

But given what we’ve looked at here, it doesn’t seem as realistic as it did last week that Ventura would be able to accurately make those types of decisions over the course of a season. It might not even be realistic for us to expect Gonzalez to have reacted to Varvaro’s performance by moving him into a more prominent role in the bullpen. Reliever volatility year-to-year is a huge issue that teams must deal with when trading for or signing bullpen arms. If it’s unrealistic for teams to be able to accurately project performance of a reliever after a full year of performance, then why should we believe that a manager is capable of making those same judgments after just 15 outings?

In many ways Ventura did an excellent job managing his bullpen while Fredi Gonzalez, well, not so much. As for designing a model to truly assess (or, even better, predict) which moves to make? That’s the challenge. One day we might have a sophisticated model that can quickly and easily identify the optimal reliever to bring in for any given situation in real time. Until then, it’s important we understand and embrace the nuance of these decisions, and keep our knives from getting too sharp.

ClownHypothesis

11/14

Possibly silly suggestion: given that RE24 is a counting stat, if there's any substantial variance in the length of the pitcher's outings, wouldn't that distort the results of the model? Wouldn't it be better to use the last 15 IP (or 20 IP, etc.) if it's a linear model, so that you're using something closer to a rate stat? Not that I think this is going to change the findings at all, the specification just struck me as odd.

walrus0909

It's a fair suggestion, but (at least on those three relievers) it doesn't change the findings at all.

Reply to walrus0909

Unsurprising. Thanks for checking.

ErikBFlom

11/16

Because it is so individualized across relievers and managers, what are the prospects for meaningful data sets that compare managers? After all, managers are managing different relief staffs. Manager movement is not so rapid and reliever turnover can be pretty rapid. Getting meaningful overlaps for calibration would seem to be a problem.

So how does this analysis apply for comparing managers, as opposed to finding optimal use strategies?

BSLJeffLong

11/17

I think the closest we might get to that is my post from last week (http://www.baseballprospectus.com/article.php?articleid=25011) though admittedly this week's column shows the difficulty in that as well. I think that if you replicated my research from last week but did it for multiple years for the same managers, you'd get a decent picture of their "Bullpen Management True Talent Level" or something like that.

I also have a few ideas around quantifying this, but given what we discussed above, I'm not sold on that being such a great idea (unless year-to-year volatility in 'pen management is largely nonexistent).

Jeff Long
