
*sigh*

Fine.

Zach Britton is a problem. Not the actual human, Zach Britton. He’s emerged as one of the finest relievers in the game over the past few years and his 2016 has been nothing short of marvelous. On that, everyone agrees.

The problem is that Mr. Britton has been so good pitching in the ninth inning for the Orioles that he’s been mentioned as a potential AL Cy Young candidate (or perhaps even the MVP). And mentioned as a person who should not win the AL Cy Young. He’ll likely finish the year with 70-something innings to his credit, and in a field of a lot of very-good-but-not-amazing seasons from AL starting pitchers, maybe he can split the field and emerge victorious? But can a reliever ever really be as valuable as a good 200-inning starter?

MLB teams seem to be tilting a bit more toward recognizing elite reliever value. We’ve recently seen teams parting with large prospect-laden packages to land Aroldis Chapman (twice), Craig Kimbrel (twice), Andrew Miller, and Ken Giles. Clearly, MLB teams believe that elite relievers are worth a high price. But the Sabermetric orthodoxy has consistently pooh-poohed this sort of trading for relievers. Too inconsistent! Too small a sample size! Eventually, it comes down to “Look at those WAR values!”

What if Marvin Gaye was right? What if WAR is not the answer?

In some sense, the problem here isn’t with WAR itself, but that WAR is doing exactly what it says on the label and Britton’s case is showing a crack in its logic. WAR was created to do two things. It sought to put all players, both pitchers and hitters, on a similar scale, and it sought to do so in a way that was as context-neutral as possible. We didn’t want to give players credit for things that were beyond their direct control. RBIs tell me who you hit behind. Pitcher wins tell me about how good your team’s offense was. Saves tell me when your manager told you to pitch. So, we stripped out all that context and created a stat that exists in a de-realized universe where everyone plays on something like a “league-average” team. It’s great for creating a common baseline, but does it work all the way around?

The fundamental question driving WAR has always been “If Smith had simply disappeared before the season, how much value would he have left behind?”

For hitters and starting pitchers, the mechanics of how this is done statistically closely mirror the mechanics of what actually happens in real life. If the Orioles’ right fielder (Mark Trumbo) goes down with an injury, the Orioles would plug some bench or Triple-A or waiver-wire guy into the lineup (probably Joey Rickard in the Orioles’ case, but WAR assumes a composite of all “replacement” right fielders as a common baseline for everyone). For a starting pitcher, if one of them pulls up lame, a replacement will be summoned from the minors or the waiver wire to take his place. There’s a nice, clean one-to-one transaction. It allows for the fact that the team would still get something out of that roster spot, but not as much as if the regular guy were there.

It’s not that easy for relievers. If Zach Britton were to go on the disabled list, some replacement level player would take his roster spot, but how would the Orioles actually replace Zach Britton’s innings? They could ask Brad Brach to close, but who takes over Brach’s innings? Maybe it’s Brach pitching two innings? Maybe it’s some combination of pitchers who absorbs the inning? Relievers are different.

The other thing about hitters and starting pitchers is that their ability to get themselves into impactful situations within a game is not directly related to their talent. Hitters come to bat when their time comes in the order. Starters pitch when the calendar says so, and even at that, every game starts at the same point, with a tie game in the top of the first. Again, relievers are a little different story. Managers get to pick when relievers enter the game, and not surprisingly, they tend to save the good ones for the really juicy situations. Imagine if Mike Scioscia could say “Our backup catcher is due up right now, but it’s second and third with two outs, and I really need those two runs, so I’m going to bring Mike Trout up to hit right now, even though it’s not his turn.” It would make total sense to do, if he was allowed to do it. But alas, the key situation is staffed by whoever happens to be next in line.

We know that relievers don’t just pitch at random times. Managers set up their bullpens so that the good pitchers pitch in certain contexts (i.e., when the game is on the line, something known as a “high leverage” situation.) Remember though that WAR specifically seeks to strip context out of everything. The easiest way to explain this is to look at pitching league leaders first sorted by WAR, then sorted by Win Probability Added (WPA). The WAR leaderboard is dominated by starters. WPA tells a different story, with a healthy mix of relievers (mostly closers) and starters. The reason is that WPA specifically includes the context of the situation, and context makes all the difference.

Suppose that the visiting team has staked its starter to a 1-0 lead as he takes the hill for the bottom of the first, and he retires the side 1-2-3. Now, suppose that the visiting team’s closer is brought into a one-run game in the ninth inning, and again sets down the side 1-2-3. Which one of them did more to help his team win? WAR sees the two innings as equivalent. An out is an out is an out. You could make the case that a team needs to record 27, and deep down, they’re all worth the same.

WPA sees that while the perfect first inning from the starter did bring his team somewhat closer to winning the game, there was still a lot of baseball to be played after that point. The perfect ninth from the closer (insofar as we can give all the credit to the pitcher) meant the difference between winning and losing that day for his team. (And that is… the entire point). The fact that everything is on the line in that inning… that’s leverage right there. WPA cares about leverage. WAR doesn’t.

I would frame it differently. The fact that the starting pitcher was placed into that first inning situation is simply a function of what day of the week it is. It doesn’t reflect at all on how good or bad he is. It was just his turn. For the closer, the fact that he was placed into the ninth inning to defend a one-run lead is a direct statement on how good a pitcher he is. (Or at the very least, how good a pitcher his manager thinks he is, compared to the other six guys out there. We’ll talk about that.) Since we can assume that leverage and quality are actually related for closers (though not as much for starters), it seems silly to yoke them both to the same value measure that ignores the context in which they are pitching.

I don’t think that the solution to the problem is to simply go with WPA as our indicator of quality either, but I think that we can do better and in a way that more accurately values how relievers are used and how that affects a game and a team. And we can use that knowledge to build a better replacement level (and a better WAR) for relievers, one that fulfills the mission of modeling how much a reliever—by virtue of his own performance, and not the fickle winds of the circumstances he just happens to find himself in—has been worth.

And maybe we’ll see if Zach Britton actually deserves a Cy Young vote.

Warning! Gory Mathematical Details Ahead!

Let’s return to the case of Mr. Britton and ask the question of what would happen if Britton were to simply vanish. What would happen to those innings that Britton was supposed to pitch?

Britton has two jobs

Let’s first start by acknowledging something about reliever usage. Yes, Zach Britton is the closer for the Orioles, meaning that he will pitch mostly in the ninth inning with the Orioles up by one, two, or three runs. That’s what his job description says, and yet… on April 15th of this year, Britton entered a game against Texas where he “closed” out an 11-5 win. On May 26th against Houston, he pitched the eighth inning of a 4-2 loss. On August 15th, again against the Astros, he pitched the ninth inning of a 15-8 loss. These aren’t exactly high-leverage situations.

Britton, like most relievers, actually has two jobs. He has those high-leverage gigs that everyone oohs and ahhs over, but like any reliever, he also gets called on to sponge a few innings. There are always games which just aren’t that interesting, but where the manager needs to put someone out there to throw the pitches. Those assignments might go to the designated sponge guy, but they might go to guys who need to get some work in, or in the event that it’s been a bullpen blowout day, just happen to be available. Sometimes that’s Britton. It will surprise no one to hear that for the three appearances mentioned above, they all came after Britton hadn’t been used in a few days. He just needed some work.

This is important because the best-known versions of WAR tend to solve the leverage problem by adding a leverage adjustment to a reliever’s WAR. The leverage adjustment is generally the halfway point between 1 (which represents perfectly average leverage) and the average leverage that the reliever faced as he entered each of his games. That’s a good start, but it also means that every game in which Britton entered with almost no leverage to speak of (and he has had a dozen or so games in 2016 in which the leverage was below .50) mathematically neutralizes a game in which he came in with an actual save on the line.

If Britton pitches in a situation with a leverage of 2.0 (which the Orioles care very much about) and a situation with negligible leverage (which the Orioles don’t), the two average to 1.0. Britton effectively gets credit for just pitching in two “average” situations, when in fact that’s not really what happened. One of those games was very much on the line and his performance might have meant the difference between winning and losing. The other game was mostly a foregone conclusion.
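To make the averaging problem concrete, here is a minimal sketch of the leverage adjustment as described above: the halfway point between 1 and the reliever's mean entry leverage. The function name is my own, and treating "negligible" leverage as exactly 0.0 is an assumption for illustration.

```python
def leverage_adjustment(entry_leverages):
    """Halfway point between 1.0 (average leverage) and the
    reliever's mean leverage at entry, as described in the text."""
    mean_leverage = sum(entry_leverages) / len(entry_leverages)
    return (1.0 + mean_leverage) / 2.0


# One real save chance (leverage 2.0) plus one blowout inning
# (assumed here to be leverage 0.0) washes out to "average."
save_and_blowout = leverage_adjustment([2.0, 0.0])

# The save chance on its own would earn a 1.5 multiplier.
save_only = leverage_adjustment([2.0])

print(save_and_blowout, save_only)
```

The blowout inning does not just add nothing; it drags the multiplier on the save inning down with it.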

On the days when Britton pitched a garbage inning, if he had not been available, then his replacement would have been some other “need to pitch” guy or the guy who was going to get swapped out to Triple-A tomorrow anyway. Or a position player! Essentially, his replacement would have been an actual replacement level reliever. It seems that for this particular part of his job, the correct baseline to compare him to would be replacement level.

***

Defining garbage

It’s hard to say exactly where garbage time begins and ends. We know where the ends of the distribution are, but it’s hard to pick out where in the middle to draw a dividing line. For a closer, it might be easy to say “not a save situation,” although if we are to have any hope of expanding this conversation to all relievers, we’ll need more than that. Let’s see if we can come to a slightly more empirical definition. I looked at where all games from 2011-2015 stood in terms of their score, relative to the pitching team (i.e., the pitching team is winning by 1 or losing by 3) at the start of the ninth inning (both top and bottom). I also looked at who was pitching in those situations, and the average end-of-season DRA for those pitchers. Here’s what that looks like in graphical form.

Not surprisingly, we see that managers have their best pitchers in with a 1, 2, or 3 run lead. But as we reach the outer edges, where does it stop making a difference? I compared (using a simple t-test) the distribution of pitchers who pitched with a two-run deficit and those who pitched with a three-run deficit and found that there was a significant difference between the two. This suggests that managers send out significantly better pitchers to handle a two-run deficit (4.16 DRA on average) than a three-run deficit (4.34) in the ninth inning. I moved the test to look at a three- versus four-run deficit and still found a difference. It wasn’t until I got to four versus five that the finding disappeared. It seems that after we get to a four-run deficit, the manager doesn’t really much care anymore. I did the same to look at leads and found that managers top out at a six-run lead and that they will send out roughly the same pitcher as they would for a seven-run lead.

This actually makes sense, as mathematically, managers should devote more resources to protecting leads than they should to chasing deficits. The two most important runs in a baseball game are the run that ties the game and the run that unties the game, and when a team has the lead and is pitching, they are by definition in danger of giving up both.

We could do the same calculations with the eighth inning or even with different configurations of innings and outs and runners, but for now, we’ll simply say that any appearance for any reliever in the ninth inning with either a deficit of four or more runs or a lead of six or more runs is a garbage time appearance. (Of note, because someone will ask, I looked to see whether closers pitched better or worse in garbage time, as defined above, than they did in save situations. The answer was neither. No differences appeared between the two situations.)
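The working definition above is simple enough to write down as a rule. This is a sketch only; the function name and the sign convention for the score (pitching team's runs minus the opponent's at the start of the inning) are my own choices.

```python
def is_garbage_time(inning, score_diff):
    """Apply the ninth-inning definition from the text.

    score_diff is the pitching team's runs minus the opponent's
    runs at the start of the inning (negative means trailing).
    """
    if inning != 9:
        # The text only defines garbage time for the ninth inning.
        return False
    return score_diff <= -4 or score_diff >= 6


print(is_garbage_time(9, 1))    # one-run lead: not garbage time
print(is_garbage_time(9, -4))   # four-run deficit: garbage time
print(is_garbage_time(9, 6))    # six-run lead: garbage time
```

Extending this to the eighth inning or to base-out states would need the extra calculations mentioned above, so the sketch deliberately returns False outside the ninth.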

In terms of WAR calculations, it’s not entirely clear what to do with these garbage appearances. One possibility is to throw them out, since frankly, teams don’t much care what happens during these innings, but then we don’t throw out batting stats from blowouts either, or pitching stats when a team stakes a starter to an eight-run lead. But the reality is that these “we just need a warm body” assignments don’t move the needle that much. The other thing that we want to guard against is Britton (or any other reliever) being criminally under-used by circumstance. For example, if he plays for a team that gets completely annihilated all the time and spends a lot of time pitching garbage time because that’s all there is. Or perhaps his manager just doesn’t have any idea of the precious jewel that he has in his bullpen.

Since Britton is the closer, he’ll probably spend less of his time pitching in garbage time than a lesser reliever. I used a logistic regression using data from 2011-2015 to see if situations which could be classified as “garbage time” were more likely to be staffed with higher DRA pitchers. The answer was yes, and through the resulting equation, we can normalize how often a guy like Britton should have been pitching in garbage time, based on his overall quality.

(The equation, for the curious, was .343 * DRA – 3.642. For the initiated, this equation will have the form of a natural log of the odds ratio. Remember to convert it to probability. This will give you the answer to “Given his overall quality, we’d expect X percent of Britton’s relief appearances to be in garbage time.”)
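For anyone who wants to do that conversion by hand, here is a short sketch. The coefficients are the ones quoted above; the function name and the sample DRA values are mine, chosen only to show the shape of the curve.

```python
import math


def expected_garbage_share(dra):
    """Convert the article's log-odds equation into a probability."""
    log_odds = 0.343 * dra - 3.642
    # Standard inverse-logit: p = 1 / (1 + e^(-log_odds))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Illustrative DRA values only: better pitchers (lower DRA) are
# expected to spend a smaller share of their outings in garbage time.
for dra in (2.0, 3.0, 4.0, 5.0):
    print(f"DRA {dra:.2f}: {expected_garbage_share(dra):.1%}")
```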

We can then take that percentage of his appearances and compare his output to a replacement level baseline, but also weight them commensurate with the leverage value that such situations usually carry. Which is basically zero.

***

Does Zach Britton have a super power?

Now on to the appearances that we actually care about. What would happen if Zach Britton weren’t available, but there was a save situation? Buck Showalter would likely know in advance of the game that Britton couldn’t go, so he would probably ask someone like Brad Brach to be ready for the ninth inning. Of course, that could end up leaving the eighth unstaffed, so he’d have to shuffle some other guys around. Suddenly, each inning looks a little bit worse, but you have to go with what you have available.

Usually, we just assume that Brad Brach would pitch as well in the ninth inning as he does in the eighth inning. And here, we need to talk about “closer mentality.” Generally, Sabermetric orthodoxy has held that talk of “closer mentality” is all a bunch of pseudo-psychological hooey, although I’m not sure that we really have the evidence to back that up. I think we’re just used to calling anything that involves feelings “pseudo-psychological hooey.” (As a real psychologist, I beg to differ.)

Closing a game is a big deal. We can mathematically prove that there are situations in innings other than the ninth that aren’t technically save situations but still have higher leverage values. But the ninth is a little different, even with a three-run lead because if you screw it up, your team loses. In the eighth inning, even trying to protect a one-run lead against the middle of the order, there’s always that fallback that even if you blow it, your team still has the ninth inning to bail you out. It’s an added burden to go out there into the ninth inning with all of that on your shoulders.

It’s entirely possible that there are degrees of “closer mentality.” Some guys might only feel a slight tingle of anxiety. Some might want to run off the mound and demand their teddy bear. It’s entirely possible that the “closer mentality” exists in the way that is generally discussed, but that 90 percent of all relievers have it, so it doesn’t make much difference. Then again, what if only 10 percent of relievers have it? It’s possible that a team’s closer is the only one on the staff who can bear the pressure. Maybe even he can’t.

Now this has real implications for our discussion of Britton’s value. If Britton’s performance in the closer role suffers from anxiety (yikes, that’s him at half speed?) it’s already largely baked into his performance. But do we see changes in reliever performance when someone other than the closer is thrust into the role? Maybe the regular closer was on the DL or had pitched a couple of days in a row. It happens once in a while.

Again, I used data from 2011-2015. I removed all of the pitchers who led their teams in saves over the course of the year. I coded each situation that remained as either being a save situation or not. I only included pitchers who faced at least 250 PA over the course of the season and who were pitching in relief that day. This gives us a sample of regular relievers and we can take a look at whether that yes/no variable of whether it was a save situation makes a difference in their performance. As I commonly do, I used the log-odds ratio method to control for what we generally expect from the pitcher we’re studying and the batters that he is facing. We expect the eighth inning guy to be a little bit worse than the regular closer (otherwise, he’d be the closer), but this method allows us to control for that going in. Is there an extra penalty that he pays because this is a save situation?
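For readers who haven’t met the log-odds ratio method, here is a minimal sketch of one common formulation of the control variable. The exact calculation isn’t spelled out here, so the three-rate setup (pitcher, batter, league), the function names, and the sample rates are all assumptions for illustration, not the model that produced the results.

```python
import math


def logit(p):
    """Natural log of the odds for a probability p."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def expected_rate(pitcher_rate, batter_rate, league_rate):
    """Expected per-PA rate of an event for this matchup: add the
    pitcher's and batter's log-odds, subtract the league's."""
    combined = logit(pitcher_rate) + logit(batter_rate) - logit(league_rate)
    return inv_logit(combined)


# Hypothetical home run rates per PA, for illustration only.
baseline = expected_rate(pitcher_rate=0.020, batter_rate=0.035, league_rate=0.028)
print(round(baseline, 4))
```

In a setup like this, the matchup's expected log-odds would go into a regression as the control, alongside the yes/no save-situation flag. Whatever the flag explains on top of the control is the extra penalty we are looking for.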

The answer is… yes. I found that non-closers pitching in save situations give up marginally more singles (p = .092), but significantly more home runs, and at the expense of outs on balls in play. Strikeouts and walks were unaffected, but when batters made contact, it was more likely to go for a hit. This isn’t failsafe proof of closer mentality, but it does suggest that maybe there’s something to look further into. Maybe it will vary from pitcher to pitcher. Maybe there are eighth inning guys who can handle the pressure. But this tells us that when we’re talking about “relievers in general”, it’s not correct to expect that if the eighth inning guy is pressed into closing detail that he would perform just as well as he does in the eighth.

Still, a somewhat hobbled eighth-inning guy is (hopefully) going to be better than a replacement-level reliever. But when we set our replacement level for a closer, we need to factor that in.

***

What actually happens when the closer isn’t available?

We’ve grown so accustomed to the idea that relievers now have certain innings that are theirs, and that these inning assignments are part of a hierarchy, that we assume that in the event of closer unavailability, the eighth-inning guy will move up a notch, and the seventh-inning guy will cover him. Maybe not. There’s nothing that requires a manager to do that. He could have the eighth-inning guy go two innings. He could play matchups in the eighth, but he might also lose the ability to play matchups in the seventh like he wants to, because he needs his LOOGY to step into setup detail. Maybe he develops a slower hook, figuring that his tired starter going an extra inning might be a better idea than some of the relievers that he has out there.

I looked for evidence of some of these strategies, again using data from 2011-2015. I took games in which a team eventually faced a ninth-inning save situation (so only games where a save eventually became available were included), and looked to see whether the team’s leading save-getter for that year pitched or not. If he didn’t, it’s a pretty good indicator that he wasn’t available that day. Looking at how those games unfolded based on whether the closer was available (as evidenced by the fact that he did pitch) or wasn’t reveals some interesting differences.

For one, we see that in games where the closer wasn’t available, the starter actually does go a little longer. Again, these are all games where a ninth-inning save situation eventually developed. In some cases, the team’s closer pitched, in some he did not, which we are taking as a proxy for whether he was available. Here are the averages. All of these differences are statistically significant.

| | Closer was available | Closer was not available |
|---|---|---|
| Starter batters faced | 25.2 | 26.1 |
| Starter final pitch count | 94.8 | 97.1 |
| Starter number of outs recorded | 18.5 | 19.8 |
| Team total pitchers used | 4.17 | 3.66 |
| Percentage of PA where pitching team had a platoon advantage (innings 7-9) | 51.4% | 52.4% |

This suggests that if the closer isn’t available, managers actually manage a bit differently. The marginal inning (innings?) that the closer covers would have been replaced by the manager trying to squeeze his resources a little harder. He rides his starter, on average, for about an out more. He also uses fewer pitchers, suggesting that if he has three “good” relievers including his closer, he is more likely to use the two who are available a bit longer, rather than go with a three-man usage pattern and having the “next best” guy fill in the hole. We also see some (slight) evidence that he tries to compensate for the absence of his closer by trying to gain the platoon advantage a little more.

While a center fielder who is unavailable is simply replaced by a replacement-level fourth outfielder, the marginal ninth inning that a closer pitches in a tight game, if he were gone, is actually covered by a combination of a starter asked to go one batter further, along with the good relievers also being asked to go a little further than they usually do.

Here’s the average DRA—again in games where a save opportunity was eventually available—for the pitcher who was on the hill at the start of an inning, again, split by whether the closer was “available” or not:

| | Closer was available | Closer was not available |
|---|---|---|
| 7th inning | 3.94 | 4.14 |
| 8th inning | 3.51 | 3.93 |
| 9th inning | 3.27 | 3.74 |

So we see that, in terms of available talent, a team is essentially giving up about .4 runs of DRA in the late innings when it doesn’t have its closer available. Given what we saw above with the “setup man pretending to be the closer” penalty, it would probably be even more than that. Add to that the expected value that comes from asking a starter to go one more batter, and you’ve got a mixture that approximates what a realistic replacement level would look like for a save outing.
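The “about .4 runs” figure can be checked directly against the inning-by-inning averages. This is just arithmetic on the numbers reported above; the variable names are mine.

```python
# Average DRA of the pitcher on the hill at the start of each inning,
# as reported above: (closer available, closer not available).
dra_by_inning = {
    "7th": (3.94, 4.14),
    "8th": (3.51, 3.93),
    "9th": (3.27, 3.74),
}

# Gap in DRA when the closer is not available, per inning.
gaps = {
    inning: round(unavailable - available, 2)
    for inning, (available, unavailable) in dra_by_inning.items()
}

for inning, gap in gaps.items():
    print(f"{inning} inning: +{gap:.2f} DRA without the closer")
```

The gap is smallest in the seventh and widest in the ninth, which fits the picture of the closer’s absence being absorbed across the whole back end of the game rather than by one stand-in.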

But I think it’s worth noting that even this stew is probably a higher bar than “replacement level.” Managers don’t throw the replacement level guy in when it’s a close game. They throw the better relievers. Even a somewhat “hobbled” eighth-inning guy or one of the “good relievers” is going to be better than the truly dreadful ones that hang out at the end of the bullpen bench. That means that if we’re setting our replacement level correctly for what really happens, a closer actually accrues less value than he would against the general replacement baseline.

It’s so much easier when it’s the center fielder, and he’s just replaced in the lineup by the fourth outfielder, isn’t it?

***

The Tony Cingrani/Dellin Betances problem

Now, how to handle the leverage issue in a way that accounts for the very real relationship between quality of the pitcher and his ability to impact a game through the situations into which he is selectively placed. This has its own problem. While we’ve been cooing over Zach Britton, we need to deal with Tony Cingrani. Cingrani isn’t having a bad season on the surface. He’s picked up more than a dozen saves and has posted a 3.48 ERA, but has done so with a BB/9 rate over 5 and a K/9 rate south of 7. He ranked 500th (exactly!) in DRA in all of baseball as of the end of Sunday’s games. But he’s also the “closer” in a Reds bullpen that has been… what’s a nice word for “awful”?

When the Reds have a high-leverage situation, they turn to Tony Cingrani. On the other hand, until the trades of Aroldis Chapman and Andrew Miller, Dellin Betances was stuck in the seventh inning with the Yankees. We want to create a measure that does not reward (or punish) our reliever for who his teammates are. All that stuff that I said about the quality of the reliever determining when he will pitch? Well, that should really be “the quality of the reliever relative to the other guys in the bullpen.”

In a perfectly average world, the best 30 relievers in baseball would be closers. (Yes, please spare me the rhapsodic comments about how they should all be high-leverage firemen. When that actually starts happening, we’ll talk.) The next best 30 would have the “eighth-inning role.” The fact that Tony Cingrani is a closer and is put into high-leverage situations and that Dellin Betances was often introduced while everyone was singing that song about peanuts and crackerjack is a sure sign that the world is not an average place. Or a fair one.

If we accept that relievers should get some credit for doing their work in higher-leverage circumstances, but we also want to avoid giving them credit for who their teammates are, we need a workaround. So, let’s run a regression. Here’s a graph from 2015 showing the average leverage at entry for a reliever in a non-garbage situation, compared to his DRA.

The fit isn’t super-tight. In fact, when I used data from 2011-2015 to create a regression equation, the correlation was a modest -0.38. That’s not horrible, mind you, but not amazing. Frankly, I thought it would be stronger. Still, we can at least use the equation, which was -0.062 * DRA + 1.228. For every one full run of DRA you are better than someone else, the average leverage upon entering the game that we expect you to be placed in goes up about six-hundredths of a point. That’s… not much. In fact, it suggests that most pitchers will generally face roughly the same leverage, on average.
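To see how flat that line is, here is a quick sketch that plugs a few DRA values into the regression equation. The coefficients come from the text; the function name and the sample DRA values are illustrative assumptions.

```python
def expected_entry_leverage(dra):
    """Expected average leverage at entry in non-garbage
    situations, using the regression equation from the text."""
    return -0.062 * dra + 1.228


# Illustrative range from an elite reliever to a poor one.
elite = expected_entry_leverage(2.0)
poor = expected_entry_leverage(5.0)

print(f"DRA 2.00 -> expected leverage {elite:.3f}")
print(f"DRA 5.00 -> expected leverage {poor:.3f}")
print(f"Spread across three runs of DRA: {elite - poor:.3f}")
```

Three full runs of DRA buy less than two-tenths of a point of expected leverage.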

That sounds crazy on the surface, but I think it’s a real finding. I think that the culprit here is the age-old complaint that managers don’t necessarily structure bullpen roles to be in line with our understanding of leverage. For example, the “closer” role has long been defined according to the save rule which uses a lead of three as "save worthy." A lead of one run, though, is very different, from a leverage perspective, than a lead of three runs. In fact, a three-run save situation actually registers as a below-average leverage situation, while a one-run save is a well-above-average situation. (Much the same can be said about eighth-inning situations.) The closer might be the best pitcher on the team, but the imprecision of the closer role means that his average leverage will depend heavily on whether his team delivers a lot of one-run or three-run leads for him to protect.

If we assume that whether the closer is asked to get a one-run or a three-run save is largely random (and certainly not in his control), then we shouldn’t be surprised that our prediction for what sort of average leverage he would expect on an “average” team is actually pretty mid-range.

***

Some back of the envelope math

So, let’s see if we can get a handle on Zach Britton’s case for the Cy Young, given this new framework. A lot of this methodology should be considered as a rough draft, and it’s certainly not ready for primetime, but it can show us the basic contours of the case.

As of the time I write this (after the conclusion of Monday’s games), according to Baseball Reference, Mr. Britton’s body of work has been 20 runs better than average and 26 runs better than replacement. He currently has a DRA of 2.17. (We all believe he’s good, just not sub-1 ERA good…)

First, we estimate the share of his work that we expect to come in garbage time situations, situations that would normally be staffed by what we now consider a “replacement level” pitcher if Britton did not fill them. Earlier, we generated the equation 0.343 * DRA – 3.642. That gives us a logged-odds ratio of -2.897, which we can convert to a probability of 5.2 percent. We assume that 5.2 percent of Mr. Britton’s work was done in garbage time. So, 1.36 of those 26 better-than-replacement runs that Mr. Britton has gathered were gathered in garbage time. And because they would have been mostly in trivial leverage situations, we will weight them accordingly. Which is to say we will largely ignore them.

The rest of Mr. Britton’s work has been in situations that aren’t garbage. We saw earlier that his replacements in these situations, had he disappeared, would have been partly a tired starter (we’ll have to assume he’s league average), and a couple of the better other relievers in the bullpen, pitching a little past what they’re used to. We also know that it’s likely that the day’s fill-in closer might suffer some sort of penalty for pretending to be a closer. He might get the job done anyway, but we can’t just assume that he’ll be good old Brad Brach. Now, these relievers are better than the average pitcher in baseball to begin with. In fact, we saw above that in games where a save situation eventually becomes available (generally these are close games), the types of relievers that staff even the seventh inning tend to have DRAs around or below 4.00. The league-average DRA tends to be around 4.6. Even if we account for extra fatigue and the faux closer penalty, it’s likely that the worst that we can say about the people who will be stepping in to fill Britton’s plate appearances (and the statement that would create the greatest amount of value for Mr. Britton) is that they will be around league average.

Mr. Britton is rated as 20 runs above average (and in the situations that we’re talking about, he would have likely been replaced by relievers who were essentially league average). We dock 5.2 percent of those (just call it 19) as medals that he picked up in meaningless garbage time. We also have a way to estimate how much leverage he would have faced on an average team, -0.062 * DRA + 1.228. Plugging in Mr. Britton’s DRA, we get 1.09. Mr. Britton’s 19 runs above average times an expected average leverage of 1.09 yields that he is 20.8 runs better than the people who would have replaced him.
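Here is the whole back-of-the-envelope calculation in one place, as a rough sketch. The inputs and both equations are the ones given above; the variable names are mine, and the ten-runs-per-win conversion at the end is a common rule of thumb that I am assuming, not something derived here.

```python
import math

# Inputs quoted in the text.
DRA = 2.17
RUNS_ABOVE_AVERAGE = 20.0
RUNS_ABOVE_REPLACEMENT = 26.0
RUNS_PER_WIN = 10.0  # assumption: common rule of thumb

# Step 1: expected share of work in garbage time (logistic equation).
log_odds = 0.343 * DRA - 3.642
garbage_share = 1.0 / (1.0 + math.exp(-log_odds))

# Step 2: runs picked up in garbage time, which get roughly zero weight.
garbage_runs = RUNS_ABOVE_REPLACEMENT * garbage_share

# Step 3: the remaining work, measured against a league-average stand-in.
meaningful_runs = RUNS_ABOVE_AVERAGE * (1.0 - garbage_share)

# Step 4: weight by the leverage expected on an average team.
expected_leverage = -0.062 * DRA + 1.228
adjusted_runs = meaningful_runs * expected_leverage

print(f"garbage share:     {garbage_share:.1%}")
print(f"garbage runs:      {garbage_runs:.2f}")
print(f"expected leverage: {expected_leverage:.2f}")
print(f"adjusted runs:     {adjusted_runs:.1f}")
print(f"approx. wins:      {adjusted_runs / RUNS_PER_WIN:.1f}")
```

Carrying full precision through gives about 20.7 runs; rounding the intermediate figure to 19 runs first, as in the text, gives 20.8. Either way the answer lands at roughly two wins.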

Zach Britton, as good as he has been, when you really look to see who would have been replacing him and you strip away some of the circumstances that he fell into, has been about two wins or so better than the people whom the evidence suggests would replace him.

Two.

Fiddle with some of my assumptions all you want, but you’re not going to get him anywhere near the 5-6 WAR players atop the current leaderboard among AL pitchers. Zach Britton is having a really good season, for a closer. And it’s not to say that his season has been worthless, but just keep in perspective how much worth a closer can actually bring to a team.

On Context and Award Voting…

There are a lot of nuggets in here. We entered with the idea that maybe WAR wasn’t giving closers enough credit for the high-leverage situations that they face, but found that relievers spend part of their time pitching in situations where there is no leverage, and while closers do pitch in high-leverage situations (sometimes), the people who replace them when they aren’t available are actually more likely to be close to league average, rather than the replacement-level chaps that WAR assumes. When we model reality a little better, we find that replacement level isn’t a great baseline for relievers.

On top of that, while it is true that there are closers who constantly pitch in some very hair-raising situations (and make their way out of it), the situations that they are most commonly put into (leads of one-to-three runs) are actually rather different from each other. Protecting a one-run lead really is a big deal, and the leverage index shows it. Protecting a three-run lead is actually less important—leverage-wise—than an average situation in a game. So, even among the non-garbage time situations that our closer participates in, there are some that are important and some that are not so important. While those one-run nail-biters have the most emotional salience, the job of “closer” isn’t (in the aggregate) that insanely pressure-packed.

We also have to face up to the idea that closers have no control over what sort of save situations they will be handed. If they just happen to land on a team that gives them a lot of one-run saves to work through, then they will come away with a fantastic WPA for their efforts (if successful). If they get a bunch of three-run saves, they could put up the exact same stat line in those outings, but come away with much less in the WPA column. We shouldn’t credit or penalize a guy for what his team does around him, and suddenly WPA seems a bit unfair as a measure of a pitcher’s value.

Here we face down a philosophical question. We do know that closers get the role by being good at their jobs as relievers. But we also know that modern usage dictates that they will be dropped into the ninth inning, whether it’s a one-run or three-run lead. We should give them some credit for being good enough to land the closer’s gig, but not for what happens once they secure it. I’d argue that my method accounts for this duality, but in doing so, acknowledges that the closer having a fantastic WPA year might be doing so because of good fortune.

There will always be a question of whether players should get credit for the context in which they did their heroic deeds, particularly when they had no hand in creating the context. For those who prefer to rate players based only on what they control, the evidence suggests that even those shiny WPA numbers can be a mirage.

Ignoring WAR and making a Cy Young case for Zach Britton on the basis of WPA or leverage doesn’t hold water. Sure, he’s pitched great. He’s also benefitted from playing on a winning team that fed him lots of small leads. If he had done the exact same thing on a team that didn’t have a lot of leads to protect, he’d still be a good pitcher, but he wouldn’t have a lot of WPA bragging points. Imagine if he had been the same pitcher, but been condemned to live out his season as a member of a last-place team. It’s kinda like looking at RBI totals, which can vary greatly for the same hitter depending on who happens to hit in front of him. Britton has been good, but he’s been helped along by some good luck. If you’re the sort of person who wants to strip out luck, Britton just isn’t as shiny as we thought. In fact, I doubt that any closer could actually make a context-neutral case for Cy Young consideration outside of some wildly improbable scenarios.

Now, it’s important to remember that context-neutral stats like WAR are abstractions. A useful abstraction, but an abstraction. While in WARland we assume a disembodied generic baseline team for our player to play on, real teams are much more complex than that. A closer can have the same impact on a team’s chances of winning that a starting pitcher can, but the things that enable him to do that—playing on a winning team, one that just happens to like to play close games—are not things that are of his doing. So, a good chunk of a closer’s value is context-dependent. He’ll be more valuable to a good team. That does, however, bring into focus the relatively high prices that have been paid for closers lately, not surprisingly by teams who believed themselves to be contenders at the time of the trade. It’s entirely possible that Zach Britton would be actually worth a mere win upgrade to one team and three to another. That’s not a contradiction. It’s just a statement on how much the context can matter in baseball.

Kinanik
9/01
What if the manager could re-arrange his lineup to give hitters higher-leverage situations? Once or twice a game, the manager does get to move Mike Trout to the RISP AB. How, if at all, would that change the way we assess hitters? (Assuming the same total numbers of PA).
yibberat
9/01
I like the observation that definition of 'replacement level' really does differ WITHIN the bullpen itself. And so does 'leverage' itself. That perhaps 'relief pitcher' is not one position but two (or maybe three) different ones. And when you think about that, these are not just scenarios that occur within the context of a single game where it is a managerial decision about how to utilize a given roster.

I would imagine that there is some predictability as to how much of the seasonal innings load is going to be 'garbage innings', how many of those 'garbage innings' are sequential (where multi-inning relief and/or a starter going deeper than '100 pitch' - can give the rest of the bullpen a day of rest for the next game), how many 'high leverage relief' innings are sequential (eg extra innings games or even 4th time thru order by SP) vs 'short' (traditional closer role). I remember the HoF arguments around Jack Morris' 'complete game' achievements. Maybe this isn't just some old-school v saber conflict but a player who is filling two or three or four 'roster slots' with different replacement levels that game.

Right now, it appears to me that every team in MLB is constructing their pitching roster from only two points - the start of game (via SP's who are then cutoff at 100 pitches) and backwards from the 9th inning (via a single RP hierarchy with closer at the top). Where the transition zone is simply viewed as a useless residual rather than an opportunity to be taken advantage of via roster construction
dethwurm
9/01
Wow there's a lot of great stuff here!

A couple questions - maybe you're considering for future work:
1) How much is "optimal" reliever usage worth then? Like if the manager were to spread his ~60 appearances across all of the highest-leverage situations like people call for ("the fireman"), so 9th inning tied or up 1 or 2, 8th inning tied or up 1, 7th inning tied or up 1, etc., rather than "save situations"?

Apologies in advance for kind of butchering your math, but just to do a very rough estimate, if a Britton-esque closer had an average LI of 1.5 instead of the 1.1 you find per 'expectation' he'd be worth more like 3 WAR, and with average LI of 2 he'd be worth 4, which is Ryan Braun territory. (assuming similar amount of "garbage time" usage in all cases) It's a big assumption that he'd maintain this performance with such different, intense usage of course, but it looks to me like the "closer role" - i.e. exclusively usage pattern, not performance - might be costing a team with a good closer as much as 1-2 wins/year, which is actually kind of a stunning amount - way more than e.g. batting order seems to matter. Especially for a team like Baltimore that's fighting for a Wild Card spot. Is that really possible?

2) How might one go about valuing quality-for-quantity trades like the Giles swap?

What I mean is, say Velasquez and Appel were both likely to provide mediocre (somewhat below-average) starter innings, say combined 300 IP and 2 WAR. Say Giles were actually a top closer, which this analysis says is also worth 2 WAR. (Ignore contracts and the other players involved.) Is it really essentially a swap of 300 IP for 70 IP?

I guess what I'm thinking is that in this case (i.e. an offseason trade, and of course assuming no other moves) Giles would really be replacing a replacement-level reliever, rather than having his innings covered by the next-best pitchers, since at the end of the day those 70 IP will come from that roster spot, rather than increased IP for the already-present pitchers (whether they should is another question entirely). So there's sort of another "replacement level" for the reliever - a lower one - based on presence rather than usage. So valuing the reliever relative to a starter (or, god forbid, a position player - viva long benches!) would have to be conducted using a different baseline than for awards voting. So a Britton/hypothetical-Giles closer might be worth somewhat more than 2 WAR in a transactional context, even if he's only worth 2 WAR in a usage context. Does that make sense?

Though it now occurs to me that he'd really be replacing like 50 IP of replacement-level RP, plus 10 IP of the next-worse guy, and 5 IP of the next-next-worse guy, and 5 IP next-next-next-worse guy, etc...

Relievers are weird, man.
bline24
9/01
A fascinating article.

I particularly admire your quixotic mission to isolate value from context. The halo effect is rampant not just in sports (it’s probably even worse in the NFL), but in business, law, medicine, politics... basically any endeavor where people are rated or compared and success is determined at least in part by factors out of the individual’s control.
MikePemulis
9/01
When you say 'save situation' in regards to non-closers and closer mentality, do you mean that in its traditional definition or only in situations when they entered in the ninth?
benrosenberg02
9/02
This was a great piece. Very nice job.