Extra Innings: More Baseball Between the Numbers, edited by Steven Goldman, is the sequel to Baseball Prospectus’s 2006 landmark Baseball Between the Numbers, a book that gave many their first taste of state-of-the-art sabermetric thinking in the years after Bill James and Moneyball. BP now returns with a follow-up that delves into new areas of the game, such as how to evaluate managers and general managers, the true effects of performance-enhancing drugs, how prospects are recruited and developed in Latin America, and more. The book is now available for ordering and immediate shipping from Amazon and Barnes & Noble ahead of its official release date of April 3, 2012. Today, we present the second of two excerpts from the book.

“Alcohol is like love,” he said. “The first kiss is magic, the second is intimate, the third is routine. After that you take the girl’s clothes off.” —Raymond Chandler, The Long Goodbye

At first, it truly was magical. Our first kiss was Dennis Eckersley back in 1987. Eckersley, viewed as a washed-up starter in the twilight of his career, ended up with the one manager creative enough to bring his career back to life, Tony La Russa. La Russa, no doubt with an assist from pitching coach Dave Duncan, tailored a role just for Eckersley, one that we now recognize as that of the modern closer: the pitcher would appear only at the end of the game, only to protect a lead, and almost never for more than an inning at a time. Eckersley credited the move with revitalizing his career: 

“It was a hell of an idea, and I was the lucky recipient,” says The Eck. “I was 32. Starting was getting to be difficult. I couldn’t go six or seven innings, wade through all those left-handers anymore. But just pitching one inning, my fastball came back. I was throwing like I was 25 again. One inning suited me very well. I never would have lasted if I had to pitch two or three innings all the time. Plus, I would have had my head knocked off.”

The results were bewitching. Eckersley ended up with a Cy Young and MVP award in the same season, and every team in baseball decided they had to have a pitcher just like him. Like a virus, the fever spread, the limited role designed for Eckersley evolving to include other pitchers.

Now it’s routine. La Russa, not content to simply have a designated ninth-inning guy, added pitchers devoted solely to retiring left-handed hitters late in the game. Most managers have followed La Russa in form if not in creativity; managers enter a game with a set plan for how they want to use their bullpen, and some are unwilling to deviate from it. Yankees manager Joe Girardi is notable for his inflexibility. After an eighth-inning meltdown by Rafael Soriano, Girardi told the press why Soriano was allowed to pitch long enough to squander a four-run lead: “Soriano’s our eighth-inning guy,” Girardi said. “And by no means is four runs a game in the bag, as we just saw.” Baseball Prospectus’s Steven Goldman responded:

[Girardi’s] is a nice thought, except that a manager can’t go through life worrying about protecting four-run leads; in 2010 and 2011, when the home team carried a four-run lead into the top of the eighth, it won roughly 98 percent of the time. Girardi also argued that he had to use Soriano there because he would have been second-guessed if he hadn’t. “If a guy gets on or a couple guys get on, and I have to get Soriano up, then I’m asked the question, ‘Why didn’t you just have him to start the inning?’” This seems to suggest that only your eighth-inning guy can pitch the eighth inning, all 162 of them, because the consequences of using a non-eighth-inning guy in the eighth-inning spot are too frightening to contemplate. Someone might yell at you. Fans. Owners. Mom.

Similarly, Girardi had to use his eighth-inning guy because had he not, he might have had to use his closer: “If we get through the eighth without giving up a run, then I don’t have to get up [Mariano Rivera,] my 41-year-old closer who, I think, is quite important to us in the course of the year.” Again, by this reasoning, no lead is so safe that you don’t have to take all possible precautions to ensure that your closer does not ever have to pitch.

Yet, even had the Yankees given up a run in that eighth inning, the game wouldn’t truly have been in jeopardy, it just would have been in jeopardy according to the saves rule, which is a different matter. The manager of the Yankees does not dictate when to use Mariano Rivera, but the arbitrarily defined “save situation” does. He is powerless before it. Even had he deemed it wiser to skip Rivera that day so that he might be available for some future clash with the Red Sox, he would have had to use him.

The closer, once an invention based upon creativity, is now either an excuse or a mandate to avoid creativity in how a manager applies his relief pitchers in the search for team wins. How did we end up where we are now? And is it truly benefiting the game of baseball?

Relief pitchers were so rare in the early days of baseball that it didn’t occur to those keeping the records to track pitchers’ performance both as starters and relievers. We can, however, estimate the split of a pitcher’s playing time in each role, which gives us a useful starting point for analysis. While the historical record is unclear for individual pitchers, it is relatively simple to estimate the number of innings pitched by all starters for a season, as well as the number of relief appearances per game. Using these league-wide estimates, as well as a pitcher’s games and games started, lets us estimate how much of a pitcher’s work came from starting versus relief pitching.

For the modern era we have play-by-play data and can calculate these things precisely, but I have chosen to continue presenting the estimates so as to make it easier to identify trends without worrying if a shift is caused by changes in what happened versus changes in how the numbers are tabulated. See Figure 3-2.1 for the percentage of innings thrown by starting pitchers.

It is not until about 1908 that we see a decline, settling at 90 percent for a few years and then drifting almost inexorably downward, so that in the modern game less than 70 percent of all innings are thrown by starting pitchers.

What is fascinating about the downward slope is that it is unimpeded by almost anything that would affect pitcher usage. The second deadball era doesn’t seem to arrest the decline at all; the 1960s actually saw a slight drop in starter innings, while shaving nearly a full run per game in comparison to the 1950s. As a whole, the correlation between the percentage of innings thrown by starting pitchers and the runs scored per game is only 0.4. If we eliminate the evolving game of the late 1800s and early 1900s and start with 1920 (the first year starters threw less than 90 percent of all innings) we get a correlation of only 0.16.

The arrow of correlation is counterintuitive in that it suggests the more runs scored per game, the more innings thrown by starting pitchers. At the very least this causes us to reconsider the commonly held belief that pitchers don’t throw as deep into games because they have to face tougher lineups than pitchers of old used to. It has often been said that replacing slap-hitting shortstops with Cal Ripken types means fewer spots in the lineup to pitch around, but even replacing the pitcher with the designated hitter—in essence a second first baseman—doesn’t seem to affect the magnitude of the downward slope.

What accounts for the change in pitcher usage? We can neatly divide the outcomes of a plate appearance into two groups—balls in play, which require action by the defense, and the so-called three true outcomes of walks, strikeouts, and home runs. Figure 3-2.2 gives us a look at the rate of balls in play as a percentage of plate appearances over time.

The two graphs are strikingly similar. The BIP rate and starter IP rate have a correlation of .88. What this suggests is that pitching has gotten harder over the years because more and more of the burden has shifted to the pitcher alone, with less and less reliance on the defense. This has created an increased need for relief pitchers. It took some time, however, for this to lead to the rise of dedicated relief specialists. Figure 3-2.3 shows the percentage of relief innings thrown by pitchers who never started a game over time.

We actually see that baseball started with “dedicated” relievers, but that can be misleading—there weren’t many relief innings to go around, and so teams pressed position players into pitching on the rare occasions where a starter couldn’t finish a game. Using pitchers as relief pitchers seems to start in the 1890s, and by 1910 or so teams relied on pitchers nearly exclusively for relief pitching appearances. Teams still hadn’t moved to using pitchers whose primary job was to pitch in relief, however; the majority of relief appearances went to starting pitchers, or at the very least, swingmen who could be counted upon to work both roles. This began to change around 1936, when teams began a gradual transition toward pitchers who specialize in relief.

Once you have dedicated relief pitchers, you’re going to notice that some of them are better than others. And you’re going to try and use your better pitchers in tight games at the expense of your lesser pitchers. This is where we see the first manifestations of what we’d now call a “closer,” but which at the time were often called “firemen,” relief pitchers who are supposed to come in with the game on the line and finish it off. By the end of the 1960s, we see most (if not all) teams having a relief ace. If we define a team’s closer as the pitcher with the most saves for his team that year, we can look for historical trends in closer usage. We’ll look at two measures: how many IP a closer pitches per appearance, and how many appearances a closer makes per team game (see Figure 3-2.4).

From 1920 through to 1960, the percentage of games where a closer makes an appearance rises dramatically from 8 percent to 33 percent. After, we see a much subtler rise up to an average of 38 percent for the past decade. The frequency with which teams used their relief ace has been relatively stable since 1960 or so. But right around 1988 we see a dramatic change in how many innings a team’s relief ace pitches each appearance. Up to that point you have a pretty stable equilibrium around 1 and 1.2 innings per outing. After a five-year decline, though, you hit a new equilibrium at just over an inning pitched per game, one that’s even more stable than the old equilibrium.

We don’t just see this change among relief aces. Looking at the percentage of innings thrown by relievers with at least one, one and a half (half-innings are the result of averaging, whereas in actuality pitchers only throw in multiples of one-third), and two innings per outing over time shows us how drastically total bullpen usage has changed (see Figure 3-2.5).

We see the same late 1980s, early ’90s inflection point for the dramatic change in closer utilization. Before that point, nearly every relief pitcher threw at least an inning per outing; as of 2010 only half of all relief innings were thrown by pitchers who averaged an inning or more per outing. Pitchers who average at least an inning and a half of work per outing have gone from representing between 40 and 60 percent of innings pitched to representing less than 10 percent. True long relievers—pitchers who threw two or more innings per outing—experienced an extinction-level event akin to that which met the dinosaurs.

We can quibble a bit about the exact moment that comet struck—maybe it was 1988, maybe 1989—but it came soon after Eckersley’s first season in Oakland. In terms of impact on the game, the creation of the modern closer by La Russa seems as influential as Babe Ruth’s home run prowess ending the primacy of the bunt and stolen base.

Having identified where the change began, it falls to us to assess whether the change itself has been a positive development. We can’t answer that question directly, unfortunately; baseball is a zero-sum game, and if all teams change strategies, then in the end the average team doesn’t benefit at all from the shift. Still, a change in strategy of this magnitude should have one noticeable impact: by putting a team’s best pitchers in late to finish close games, we should expect all teams to be better at holding leads in such situations. After all, there is no strategy out there that has allowed managers to get their best hitters to face the other team’s closer a disproportionate amount of the time.

To find this evidence, let’s focus on situations resembling the archetypal save, with one team leading by one to three runs at the start of the ninth inning or later. (These won’t all be save situations; sometimes a pitcher other than the closer will be called upon to start the inning, but most of them will be.)

In the 1950s, a team in such a situation would win its game 90 percent of the time; in the 2000s, a team would win such a game 91 percent of the time. Assuming 44 such chances a season (the average for the past decade), that means modern teams will win an additional game every two to three seasons due to changes in relief pitcher usage. There is a slight countervailing impact from increased run scoring, but with a correlation of just –0.28 between runs per game and these win rates, such an effect shouldn’t be expected to significantly alter these conclusions. In short, baseball has contorted its roster and raised a small class of pitchers up to be multi-millionaires for a very small benefit.
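The arithmetic behind that estimate can be sketched in a few lines. This is only an illustration built on the rounded figures quoted above (90 percent, 91 percent, and 44 chances a season); the function name is a hypothetical label, not something from the book.

```python
def seasons_per_extra_win(old_rate, new_rate, chances_per_season):
    """Seasons needed to gain one extra win from a higher hold rate.

    Assumes the win rates and the number of chances stay constant
    from season to season, which is a simplification.
    """
    extra_wins_per_season = (new_rate - old_rate) * chances_per_season
    return 1.0 / extra_wins_per_season


# Rounded figures from the text: 1950s vs. 2000s, 44 chances per season.
extra_wins = (0.91 - 0.90) * 44
seasons = seasons_per_extra_win(0.90, 0.91, 44)
print(f"{extra_wins:.2f} extra wins per season")
print(f"one extra win every {seasons:.1f} seasons")
```

With these inputs the gain is a bit under half a win per season, which is where the "every two to three seasons" figure comes from.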

In an additional bit of irony, the rise of pitchers designed to pitch in these sorts of situations has coincided with a decline in these sorts of chances. The primary driver seems to be the rise in offense, not the change in pitcher usage. There is a .81 correlation between the rate of potential ninth-inning saves and the seasonal average for runs per game. From 1950 through 2011, 29.1 percent of games resulted in a potential ninth-inning save chance, while from 1988 through 2011, only 27.9 percent did. That decline in possible save chances, at the least, provides a countervailing effect to the ability of ace relievers to come in and close a game.

The paucity of ninth-inning save chances points out another flaw in saving your best reliever for that inning. If you only have 44 ninth-inning save chances a season, but your best reliever can pitch 60 or 70 innings in one-inning stints, you end up having more than a few wasted innings from your closer. For the moment, let’s define a close game as one where the fielding team leads by two or fewer, is tied, or trails by one run. From 1988 through 2011, at the point when the closer first enters the game, he finds himself in a close game only 59 percent of the time. Twenty-one percent of all games pitched by a team’s closer happen when the run differential is four runs or greater. This is because managers have to “find work” for their team’s supposedly most valuable reliever, and thus must resort to putting him into a game that’s essentially already decided just so he can get his innings in. (In fairness, earlier managers were little better, with 60 percent of appearances in close games and 17 percent in blowouts of four or more runs.)

Fans in the stands would be surprised to hear this, of course; if the closer wasn’t achieving something special, would he need a special entrance song? Would he send chills up our spine when he delivered his first pitch? The cold, raw numbers feel inadequate to explain how it feels to watch a dominant closer. You can hear the familiar refrain already: “Get your head out of your spreadsheets and watch a ballgame sometime.” Yet, as it turns out, spreadsheets are in fact capable of recognizing the heightened excitement that occurs when a closer enters the game. In order to capture this feeling, sabermetricians have often turned to what Dave Studeman has called “the story stat,” win probability added (WPA). I’ll let him explain:

Here’s the basic idea. An average team, at any point in a game, has a certain likelihood of winning the game. For instance, if you’re leading by two runs in the ninth inning, your chances of winning the game are much greater than if you’re leading by three runs in the first inning. With each change in the score, inning, number of outs, base situation or even pitch, there is a change in the average team’s probability of winning the game.

. . . Bottom of the ninth, score tied, runner on first, no one out. The home team has a 71% chance of winning according to the Win Expectancy Finder (in this situation, the home team won 1,878 of 2,631 games between 1979 and 1990). Let’s say the batter bunts the runner to second. Good idea, right? Well, after a successful bunt, with a runner on second and one out, the Win Probability actually decreases slightly to 70% (home team won 1,300 of 1,848 games), according to the WE Finder. The bunter hasn’t really helped or hurt his team; his bunt was a neutral event.

. . . To really have fun with this system, you can take it one step further and track something [called] “Win Probability Added” (WPA). Once again, the concept is simple. Let’s say our batter in the bottom of the ninth hits a single to put runners on first and third with no outs. This increases the Win Probability from 71% to 87%, for a gain of 16%. So, in a WPA system you credit the batter +.16 and debit the pitcher/fielder –.16. If you add up every positive and negative event from the beginning to the end of a game, you wind up with a total for the winning team of .5, and a total for the losing team of –.5. And the player with the most points will have contributed the most to his team’s win.
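Studeman's numbers can be reproduced directly. The sketch below derives win expectancy from the game counts he cites and computes WPA as the change between two states; the function names are hypothetical, and the 71% and 87% figures for the single are taken as given from the quote rather than recomputed.

```python
def win_expectancy(wins, games):
    """Share of historical games won from a given game state."""
    return wins / games


def wpa(we_before, we_after):
    """Win probability added for the batter; the pitcher gets the negative."""
    return we_after - we_before


# Bottom of the ninth, tied: runner on first with no outs,
# then runner on second with one out after the bunt.
before_bunt = win_expectancy(1878, 2631)
after_bunt = win_expectancy(1300, 1848)
bunt_wpa = wpa(before_bunt, after_bunt)

# The single that puts runners on first and third with no outs.
single_wpa = wpa(0.71, 0.87)

print(f"bunt:   {before_bunt:.3f} -> {after_bunt:.3f}, WPA {bunt_wpa:+.3f}")
print(f"single: WPA {single_wpa:+.2f}")
```

The bunt comes out about one percentage point negative, which is why Studeman calls it a neutral event, while the single is worth +.16.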

Related to win expectancy is the concept of “leverage,” which is simply a measure of the possible change in win expectancy given the context. For our purposes, we will scale the leverage index so that the average plate appearance has a value of one; a situation with a leverage index of two would then carry twice the average change in win expectancy.

Examining all events from 1950 through 2011, we find the average plate appearance in the ninth inning and later has a leverage index of 1.33, compared to .96 for the first eight innings. In a model based upon win expectancy and leverage index, those late-game situations are worth 37 percent more than events earlier in the game.

Contrast this to a more traditional model of how events contribute to team wins and losses—the Pythagorean theorem, which has been revised countless times but takes the basic form of

RS^2 / (RS^2 + RA^2)

where RS is runs scored and RA is runs allowed, and the result is an estimated win percentage. The Pythagorean model doesn’t care about the order of events. It doesn’t matter if a run is allowed in the first inning or the ninth; the formula treats them exactly the same.
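As a minimal sketch, the basic form of the formula looks like this in code. The exponent of 2 is the basic version given above; the run totals are invented for illustration and do not describe any real team.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimated win percentage from season run totals.

    Only the totals matter: the inning in which a run was scored
    or allowed never enters the calculation.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical team: 750 runs scored, 700 allowed over 162 games.
pct = pythagorean_win_pct(750, 700)
print(f"expected win pct: {pct:.3f}")
print(f"expected wins:    {pct * 162:.1f}")
```

Because the function takes only season totals, two teams with identical totals get the same estimate no matter how their runs were spread across innings. That is the property the leverage model disputes.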

How can we tell if the leverage model of pitcher evaluation is better than our Pythagorean model? What we can do is come up with a prediction based upon the ideas behind the leverage model, and test it at the team level. One thing we find, if we do a little digging, is that relief pitchers tend to pitch in slightly higher leverage spots than starting pitchers. The greatest concentration of leverage occurs in the ninth inning or later, with the average ninth-inning leverage from 1988 to 2011 at 1.33. Extra innings have even more leverage. (We’ll look at the reasons for this in a little bit.) In the language of leverage, what this means is that each batter faced by a pitcher in the ninth inning is more important in deciding the outcome of a ballgame than each batter faced by a starting pitcher.

If true, this suggests that we could beat the Pythagorean theorem at estimating team wins by putting a greater emphasis on a team’s pitching performance in the ninth inning. To see if this is true, we can break Pythagorean wins down into two components: a team’s expected win percentage given only the performance of its pitchers through the first eight innings, compared to its record after. We can use these two variables to predict both a team’s Pythagorean and actual win percentages. We can then compare them to see how close the two models are, and if the Pythagorean method is underweighting a team’s pitching performance in the ninth inning.

What we see instead is incredible consistency between the two models; the difference between the weight for relief pitching in the Pythagorean model and the observed wins model is only .03. In other words, there is little practical difference in the amount of emphasis on relief pitcher performance when predicting actual wins versus Pythagorean wins—the Pythagorean model is a much more realistic model of the impacts of pitching performance than the leveraged model. A .03 change means that for a team with a ninth-inning-and-later performance of half the league run average, you would expect it to win roughly one more game than predicted by the Pythagorean model per season. (Teams pitching that well occur less than one percent of the time.) In a more realistic scenario, a team that has an RA in the ninth and later that’s 75 percent of league average (teams pitching that well or better occur about 16 percent of the time) wins one more game than predicted by the Pythagorean model every two seasons.

What the win expectancy model is truly capturing is not how much a play contributes to team wins, but how well an event predicts the outcome of the game itself. There is, of course, going to be some substantial overlap between the two, as things that lead to wins also tend to be good predictors of wins. What complicates things is that at the end of the game, the music stops and everyone has to find a chair—the winning team is at one and the losing team at zero. This is what’s known as an “absorbing state”; once you enter it, it’s impossible to leave. Late-game events are more predictive in terms of win expectancy due to their proximity to the end of the game.

To this end, WPA is truly the story stat. It captures very well how exciting a game is close-and-late. A blown save is tremendously upsetting emotionally, because it takes what was very nearly a sure win and turns it into a sure loss. WPA captures this change very well. But what it does not capture nearly as well is the fact that, indeed, the closer enters the game when it is already very nearly a sure win.

Consider the toughest save spot a closer would see to start the ninth inning—the pitcher comes in with three outs left in the game and a one-run lead. In order for his team to win, all he has to do is pitch one scoreless inning. The reality is most innings in MLB are scoreless; from 1988 through 2011, 72 percent of all innings had zero runs scored. Because we’re already dealing with a high probability of success, it’s difficult to improve on this rate; the average pitcher coming into a ninth-inning save chance allowed no runs only 75 percent of the time.

Emotionally, the final inning is an absorbing state as well; the pitcher on the mound when a team wins or loses the game tends to bask in the reflected glory of the triumph or wallow in the agony of the defeat. However, in reality all that matters is the final score. If the starter pitches a scoreless fifth, that’s just as meaningful to deciding the outcome of a one-run game as it is if the closer pitches a scoreless ninth. Win expectancy may tell a better story than the Pythagorean analysis, but it tells us less about the relative contributions of closers versus starting pitchers to team wins and losses.

If the change in reliever usage hasn’t altered how effective teams are pitching late in games, it has changed how managers handle their tactical choices, and by doing so has affected the way we watch the game. The shift to relievers pitching fewer innings per appearance did nothing to arrest the decline of innings pitched by starting pitchers. The result of this change has been more pitching changes per game; in the 1980s there were 3.4 relief appearances per game, while in the 2000s there were 5.6 relief appearances per game. This has meant less space on the roster for position players (see Figure 3-2.6).

After a sharp jump up to the levels of the late 1890s, we see a gradual rise until the late 1980s, where again we see a dramatic increase. Teams are increasingly using more pitchers to fill their roster spots.

A manager’s chief strategic weapon is no longer the position player, but the relief pitcher. While specialization came naturally to position players, it had to be created for relievers. A manager can probably tell which player is his pinch-hitter and which is his pinch-runner just by looking at him, but can’t tell which pitcher is which without some sort of guidance. Thus, managers have created increasingly narrow pitching roles to help them make those decisions: one pitcher is your closer, one your setup man, one your seventh-inning guy, one guy goes after tough lefties.

This increasing parade of relievers may not make it any easier to hold leads late in games, but it does in fact make the “late” in games more accurate. Looking at all seasons from 1950 through 2011, each reliever used per game adds an additional 10 minutes to the length of the game. This holds even after you control for increased run scoring (which is not a significant predictor of game length once you control for the number of relievers used). And changes in reliever usage account for over 70 percent of the variance in game length over that time period.

What this means is that, from 1950 through the present, we’ve added more than half an hour to the length of a ballgame. If this addition meant more play, it might be worth it. But for the most part, it’s an addition of seeing managers coming out of the dugout with an arm in the air, warm-up pitches from the mound, and catchers jawing with their starters to give the fresh arm in the pen a little more time to loosen up. Seeking ephemeral advantages, managers have instead colluded to add 30 minutes of tedium to our national pastime.
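The 10-minutes-per-reliever figure makes a rough estimate easy. The sketch below applies it to the relief appearance counts quoted earlier for the 1980s (3.4 per game) and the 2000s (5.6 per game). It assumes the effect is linear and the same in every era, which is a simplification made here and not a claim from the analysis; the function names are hypothetical.

```python
MINUTES_PER_RELIEVER = 10  # per-reliever estimate quoted in the text


def added_minutes(appearances_before, appearances_after,
                  minutes_per_reliever=MINUTES_PER_RELIEVER):
    """Extra game length implied by a change in relief appearances per game."""
    return (appearances_after - appearances_before) * minutes_per_reliever


def appearances_for(minutes, minutes_per_reliever=MINUTES_PER_RELIEVER):
    """Additional relief appearances per game that would add `minutes`."""
    return minutes / minutes_per_reliever


# 1980s to 2000s, using the per-decade averages quoted earlier.
print(f"{added_minutes(3.4, 5.6):.0f} minutes added")
# How many more relief appearances a half-hour increase would imply.
print(f"{appearances_for(30):.1f} more relief appearances per game")
```

Under that assumption, the shift between the 1980s and the 2000s alone accounts for about 22 minutes, and a half-hour increase corresponds to roughly three more relief appearances per game.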

If history teaches us anything, it’s that nothing lasts forever. Someday, some enterprising manager will decide to eschew the staid traditions of the closer for something new—after all, this is how the notion of the closer got its start, and so it’s how it will meet its inevitable end. Just don’t expect it to happen anytime soon.  

Good story, Colin. I for one have hated the march of relief "specialists." When I first started watching baseball in the late '60s and understanding it in the '70s, relievers regularly pitched multiple innings and "closers" didn't exist. Look at the Reds: Clay Carroll, Pedro Borbon, Tom Hall... they all pitched no matter the inning or situation. Being a Pirate fan it was always very frustrating.

Is the game any better now? No, just longer.
Very nice article.
Hey Colin, this is nicely put:

"What the win expectancy model is truly capturing is not how much a play contributes to team wins, but how well an event predicts the outcome of the game itself."

A fine distinction, but one that's worth chewing on.
Thanks! That means a lot to me, coming from you.
You're welcome. I appreciate the lengthy quote, particularly coming from you. :)

Just to detract from the warm feelings a little, while I don't disagree at all that WPA overrates relievers vs. starting pitchers, I do think there's value in using WPA/LI to assess reliever usage patterns.
I am having a real hard time trying to find some old Book Blog threads about WPA/LI that I want to reread before trying to formulate a response to this. I remember Guy (I think) saying some rather insightful things but I'm fuzzy on the details.
My recollection is that Guy doesn't buy the fundamental premise of WPA--assessing impact in linear "real time." I'm pretty sure you and he see eye-to-eye in that regard.
Let's assume for the sake of this discussion that we all like WPA as an "in context" measure of batting or pitching.

Can you explain, without resorting to math, what WPA/LI is supposed to represent?
Great article.

It has always been my belief (not informed by stats, tho), that it was dangerous to constantly flip from one reliever to the next.

I'd rather stick with a relief pitcher pitching well for an additional inning than call on someone else- I already know the first pitcher has good stuff today, but don't know if the same could be said for the second (or the third or the fourth).

I'd love to see a team ditch the "rigid role" mentality and assemble a bullpen of 4 good pitchers who could all go 2 innings at a time.
Great read.
"This has meant less space on the roster for position players." Yeah, but so what?

A less-than-comprehensive, but still detailed, perusal of run producers in the 1930s, compared to in the last ten years, seems to show that in both eras, the dominant RAR/WAR production was coming from about six to ten players per team, and once you get deep into the bench, you're talking about unproductive players. More accurately, you were talking about unproductive players in the 1930s, when the bench did go 8 or 9 players deep. Only two or three of them would be productive, a number not unlike what's seen in today's game. So what's the point of "space on the roster for position players" who don't contribute much by being there?
I bet if you looked at pitching staffs, you'd see much the same thing - the majority of value comes from six or seven pitchers per team, maybe even fewer (depending on where you set the bar). A dedicated pinch runner, for instance, may seem like a rather limited use of a roster spot. But I don't know that a pitcher who can only face left-handed batters in the last three innings of a ballgame is any more useful.
"A dedicated pinch runner, for instance, may seem like a rather limited use of a roster spot. But I don't know that a pitcher who can only face left-handed batters in the last three innings of a ballgame is any more useful." One would think not. However, one would be wrong.

Analysis of the optimum use of the last few roster spots is a subject that would take a whole column or three to explore. However, insights can be gained by doing another now-and-then analysis. Specifically, consider this question: How many hitters on a 1930-1931 team contributed more value than the SECOND most valuable LOOGY (defined for these purposes as a left-handed reliever averaging less than one inning per appearance) on a 2010-2011 team? The answer may be surprising.

Looking at NL LOOGYs only (the DH complicates things in the AL, so let's keep it apples to apples) for 2010 and 2011, the best LOOGY on a team averaged about 8.2 runs above replacement, while the second if any (at least one team didn't have two LOOGYs in a season) averaged 1.6 RAR, with very large scatter in both numbers. Perusing the 1930-1931 NL stats reveals that on average, about 11.6 guys had value as batters (measured by total RAR and therefore giving credit for defense to some lousy hitters) of 2 or greater. Of those 11.6, almost invariably one, and sometimes as many as three, would be a pitcher(! -- some of those old guys could hit...), and one or two more would be "regulars" or at least frequently-used reserves who were acquired mid-season. It would appear that at any given time, roughly 8 or 9 position players would be contributing more value on average to a team back then than that second LOOGY does now. That's all. Even the starters at fielding positions didn't necessarily contribute more value than the second LOOGY does today, let alone the last guy off the bench.

I don't claim this methodology to be perfect, or even particularly good. However, it does raise a serious question as to whether the second LOOGY is the fifth wheel you seem to believe him to be, and if he's not a fifth wheel, you certainly can't blame a manager for using him.
Who are you counting as a LOOGY, and how are you defining RAR?
As I said: a LOOGY is "defined for these purposes as a left-handed reliever averaging less than one inning per appearance." The "second LOOGY" is then the LOOGY on a staff who accumulates the second most value as measured by RAR. I will be the first to agree that that isn't the best possible definition (for one thing, it discounts the ill effects of a really bad LOOGY who stays in the pen all year, pitching badly, so that a late-season call-up gets more value without being there for most of the season), but it's simple, easy to use, and not all that misleading. Literal LOOGYs, left-handed pitchers who get exactly one out per appearance, are rare. Second left-handers who sit at the end of the pen and cause short-pen advocates to grumble about them aren't. More nuanced definitions also run into a problem that I'll come back to.
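To make that definition concrete, here is a rough sketch of how one could pick out the "second LOOGY" on a staff. The `Pitcher` fields, the sample staff, and treating "reliever" as "zero games started" are all my assumptions for illustration; only the "left-handed, under one inning per appearance, ranked by RAR" part comes from the definition above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pitcher:
    name: str
    throws: str        # "L" or "R"
    innings: float     # innings pitched as a true decimal (52.333 for 52 1/3)
    appearances: int
    starts: int
    rar: float         # runs above replacement, from whatever source you use


def is_loogy(p: Pitcher) -> bool:
    """Left-handed reliever averaging less than one inning per appearance.

    Assumption: "reliever" is approximated here as zero games started.
    """
    if p.throws != "L" or p.starts != 0 or p.appearances == 0:
        return False
    return p.innings / p.appearances < 1.0


def second_loogy(staff: List[Pitcher]) -> Optional[Pitcher]:
    """Return the LOOGY with the second-highest RAR, or None if fewer than two."""
    loogys = sorted((p for p in staff if is_loogy(p)), key=lambda p: p.rar, reverse=True)
    return loogys[1] if len(loogys) >= 2 else None


# Hypothetical staff; names and numbers are invented, not real 2010-2011 data.
staff = [
    Pitcher("Lefty A", "L", 52.0, 70, 0, 8.0),     # LOOGY
    Pitcher("Lefty B", "L", 30.0, 45, 0, 1.5),     # LOOGY
    Pitcher("Lefty C", "L", 180.0, 30, 30, 25.0),  # starter, excluded
    Pitcher("Righty D", "R", 60.0, 65, 0, 6.0),    # right-handed, excluded
    Pitcher("Lefty E", "L", 70.0, 50, 0, 4.0),     # 1.4 IP per appearance, excluded
]

pick = second_loogy(staff)
print(pick.name if pick else "no second LOOGY")
```

Note that a staff with fewer than two qualifying pitchers returns `None`, which matches the "second if any" caveat in the earlier comparison.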

For RAR I just used the definitions (and statistics). Again, one might do better, but they're easy to use for something like this. Regardless of what offensive metric one uses, the point is clear: back in the Good Old Days of long starts and short bullpens, there was a drastic dropoff in the quality of offensive players after the top six or eight per team, just as there is today. This does not recommend a short bullpen to me.

People tend to believe much about pre-WWII baseball that is not necessarily so, and the belief in the 4-man "iron horse" rotation is a good example. VERY few starting pitchers back then really approached starting in 1/4 of their team's games, and there were numerous pitchers on each staff who got occasional spot starts frequently enough for the whole thing to look more like a modern 5-man rotation than one might think. This fact complicates direct reliever-to-reliever comparisons. Coming up with a set of definitions that works well in both eras is therefore not easy.
Is it possible that the value of a usable bench or back end of the bullpen comes primarily from giving the high-value guys enough rest to keep their value high? I have little sense of how to start measuring such a value, though.
Fantastic stuff Colin. I hope a lot of MLB FOP read this.
I believe Bill James (maybe Tom Tango, too) has concluded that the extra pitchers do help. Some. There's just plenty of guys who, matched up with the platoon advantage and putting everything they got into a dozen or two pitches, can get hitters out with reasonable effectiveness. Whereas there are just not that many Matt Stairs. Once the knees go, you can't hit well anymore either. So hitting wise, you either hit well enough to play, or you can't hit. No corresponding way to squeeze some type of use out of marginal hitters.
Very interesting. Lots to think about - and some changes in MLB to look forward to, I expect fairly soon.
Please, explain why this comment was dinged.
Colin - great stuff.

Like some of the folks that have commented above I have always been highly skeptical of the structured bullpen and closer model so it's nice to see that the numbers actually back that up. Your point on the length of games is also well taken and another major peeve of mine.

It seems to me that having to rely on 3-4 (or even 5) pitchers a game is simply an algorithm for finding the one guy who is having a bad enough night to cost you the game.

I (sadly) agree with you that change will come slowly in this area. However, I'm encouraged by the fact that this year's SI Baseball Preview specifically mentioned the Rays' getting > 1000 innings out of their SP as a big advantage.
Heck, I just feel like dinging the name 'Hoot Stromboli'.
Sorry, I tried to pick a name that is fun. Hoot Evers was a player on my favorite team (Detroit Tigers) and Stromboli is a romantic island in Italy - and my wife is Italian - and I like the sound of "strom-bow-lee". I can't change my name for another 8 months (approx.). Why do you ding my name? I try my best to write only worthwhile comments.
I wonder, too, if a deeper bullpen better serves an organization in terms of player development and money saved. Most teams want their young hitters with potential to play every day in the minors, rather than stagnate on the big league bench while accruing service time. This likely means a team would use its older "Quad-A" players for its bench, or spend money on someone like Johnny Damon or Hideki Matsui. But assuming those latter players would even accept such a role, would they remain effective getting only occasional ABs? Meanwhile, it seems far easier to identify pitchers that have only so much upside to them, hence, you don't mind sitting them in your bullpen as the sixth, seventh, or even eighth option. They're often young, cheap, and their skills are less likely to erode from lack of consistent use. Finally, if a regular is injured during the season, and there's no Trout/Harper waiting in the wings, would you rather replace them with a Quad-A player that has been playing every day or one that has been stagnating on your bench?
Some good points. About your final one: is there proof that stagnating on a bench makes you less effective when pushed into full-time duty than a Quad-A player playing every day in a lesser league?
Wasn't me who dinged it. But looking at your comment, that had to be it.
I have one quibble; a blown save does not turn a game into a sure loss.
"The BIP rate and starter IP rate have a correlation of .88. What this suggests is that pitching has gotten harder over the years because more and more of the burden has shifted to the pitcher alone, with less and less reliance on the defense."

There seems to be an awfully big gap between the data and the conclusion and it seems to be logically inconsistent as well.

Maybe the BIP rate was increasing because the pitchers were not as high-quality as previous pitchers? That is just one possible explanation for why BIP was increasing, as opposed to just saying pitching has gotten harder.

If the BIP rate is increasing, then doesn't that mean the pitchers are carrying less of the burden to retire opposing hitters since hitters are putting more balls in play (and therefore requiring the fielder to get the out)?

To say that the pitcher is carrying more of the burden for getting hitters out doesn't make much sense to me anyway because it's not like the fielders were up on the mound helping the pitcher try to strike hitters out.
Where does Colin say the BIP rate is increasing? His graph shows that it is decreasing.
I'm not sure you've really captured the effect of modern reliever usage by simply comparing the overall rate of 9th inning leads saved by era. James once said (something like) baseball innovations are consistently mocked, then universally adopted within a few years. When you compare rates of holding 9th inning leads between the 50s and the 90s, you're largely comparing groups of teams all using the same strategy. For most of the 50s everyone used similar strategies, and for most of the 90s everyone had adopted something like the La Russa construct. It seems likely that teams using similar strategies are going to have similar success rates.

What someone needs to look at are the transition phases, where you have something like teams using the La Russa strategy playing teams still using early Gossage-style multi-inning strategies.

It would be very interesting and bizarre if this nearly universal and more than century-old trend toward pitcher specialization had little or no benefit. It would be almost unbelievable.