755, .406, 56. Each of those numbers probably triggers an image in your mind’s eye. The timelessness of baseball’s statistics is what makes baseball such an appealing sport to so many people, and what keeps us interested long after the heroes of our youth have retired.
Of the major American team sports, baseball is the one that most resembles the game our grandfathers played when they were children. Football? The forward pass had barely been invented. Basketball? There was no such thing as a dunk, a three pointer, or even a jump shot.
Baseball has remained pretty much unchanged. Sure, the players today are bigger, stronger, and faster than those of the past, but the same is true of every other sport. Despite the changes in the players, the balance of baseball is still roughly what it has been for the 90 years since Babe Ruth started hitting home runs and the dead-ball era gave way to the modern game.
There have been fluctuations: the high-scoring 1930s, the pitcher’s era of the 1960s, the “PED era” of the last decade. Despite the ebbs and flows, scoring has generally remained close to 4.5 runs per game and league-wide batting averages have hovered around .260.
While the end result has stayed relatively stable, the way pitchers are deployed has changed dramatically. Pitcher usage has evolved to counteract the ever-improving hitter. Once a pitcher might have been able to coast through the bottom of the lineup, conserving energy for the heart of the order with little fear of giving up a home run to the #8 hitter. Today he must pitch carefully and near full effort throughout the lineup.
As a result of the sustained higher effort and increasing strikeout rates (with strikeouts always requiring at least three pitches), pitchers have completed fewer starts ever since the 1870s. As discussed in Baseball Between the Numbers and elsewhere, the rate of complete games has dropped from around 90% in 1893, when the pitching mound was moved back to its current distance, to under 4% today.
A second result is that the rate of decisions by the starting pitcher has also declined steadily with time. Since pitchers do not go as deep into games as they once did, their won-loss records are more susceptible to things beyond their control such as relievers blowing an inherited lead or teammates mounting a late innings rally. The earlier a pitcher exits a game, the less likely he is to earn the decision.
Countless articles have been written about this in recent years, as analysts wonder if anyone will ever again reach 300 victories in a career. While the fate of the 300 game winner is well worth considering, another byproduct of the decline in decisions is the lack of correlation between pitcher wins and pitcher quality, particularly over a single season.
The idea that wins are a poor way of measuring how good a pitcher is should not be novel to any reader of Baseball Prospectus. Indeed, most of us probably determined it was stupid as soon as we understood the rules, with thoughts along the lines of: “You mean that if a pitcher gives up one run in nine innings, but his teammates don’t score he is considered to have pitched poorly?”
Any number of people have come up with methods of measuring pitcher performance that correlate far better from year to year than actual wins and losses (such as SNLVAR, VORP, and QERA). Despite these advances, the won-loss record of a pitcher remains a critical tool for gaining recognition through the standard channels: All-Star Games, Cy Young Awards, and even the Hall of Fame. Why are wins and losses still so important in deciding who the top pitchers are? Blame your grandfather.
At the end of the dead ball era in 1919, the top pitchers earned decisions in approximately 90% of the games they started. As the game has evolved and pitchers have been relieved earlier in games, that number has steadily declined, currently residing around 70%. Note that in order to consider only the top starting pitchers, I am restricting this study to pitchers who started at least 25 games and made 2 or fewer relief appearances.
When pitchers stayed in games longer and earned decisions at such a high rate, it was more likely that their won-loss record would correlate to their own performance, since the uncertainty that comes from turning a game over to the bullpen was removed. Run support obviously plays a large part in earning wins and losses, but wins could reasonably be construed as being a meaningful measure of pitcher performance in the 1920s, particularly when considering the lack of advanced metrics.
Contrary to popular belief, the number of games started has not declined dramatically with time. It is true that today’s pitchers average fewer starts than their predecessors in the 1960s. However, the 1960s actually represented an increase in the average number of starts by top pitchers since the 1920s.
As a result of the evolution of pitcher usage, the average number of decisions by a top starting pitcher today is at its lowest value in history, excluding the strike-shortened years of 1981, 1994, and 1995. The average starter in 1919 earned approximately 30 decisions while the average starter in 2008 earned just 22 decisions.
The upshot is that it is now much harder for elite pitchers to separate themselves from the pack. To illustrate this, consider the following table, which shows the expected won-loss records (eW-L) for pitchers whose statistics indicate they should win 80%, 70%, 60%, and 50% of their decisions (eW%) in 20-year intervals since the end of the dead-ball era.
        1920   1940   1960   1980   2000
eW%     eW-L   eW-L   eW-L   eW-L   eW-L
80%     24-6   22-6   22-5   20-5   18-5
70%     21-9   20-8   19-8   18-7   16-7
60%     18-12  17-11  16-11  15-10  14-9
50%     15-15  14-14  14-13  13-12  12-11
The separation between the best pitcher and an average pitcher was about nine wins in 1920, but has been reduced to only six wins today. Since there is less of a spread between the elite pitchers and everyone else, it is easier for a lesser pitcher to get lucky and surpass the elite pitcher in wins.
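The arithmetic behind that table can be sketched in a few lines of Python. This is an illustration, not the author's own calculation. The decisions-per-season figures are simply read off the table (wins plus losses in each column), and the round-half-up rule is an assumption that happens to reproduce every cell.

```python
# Decisions per season, read off the table above (W + L in each column).
# These are not an independent data source.
DECISIONS = {1920: 30, 1940: 28, 1960: 27, 1980: 25, 2000: 23}


def expected_record(win_pct: int, decisions: int) -> tuple[int, int]:
    """Return (wins, losses) for an integer win percentage, rounding half up.

    Integer arithmetic avoids floating-point surprises at exact halves.
    """
    wins = (win_pct * decisions + 50) // 100
    return wins, decisions - wins


def ace_gap(decisions: int) -> int:
    """Wins separating an 80% pitcher from a 50% pitcher."""
    return expected_record(80, decisions)[0] - expected_record(50, decisions)[0]


for year, dec in DECISIONS.items():
    row = ["%d-%d" % expected_record(pct, dec) for pct in (80, 70, 60, 50)]
    print(year, row, "gap:", ace_gap(dec))
```

The gap shrinks purely because the same win percentages are applied to fewer decisions; nothing about the pitchers' underlying quality changes in this sketch.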
Baseball Prospectus has a metric called “luck” which measures the difference between a pitcher’s expected wins and losses (definition: “Expected win record for the pitcher, based on how often pitchers with the same innings pitched and runs allowed earned a win or loss historically”) and his actual wins and losses. Luck scores in a given season typically range from about -10 to +10, with most pitchers near 0.
I collected the luck scores for all pitchers in 1960, 1970, 1980, 1990, and 2000 (luck scores only go back to 1954). The standard deviation in luck for pitchers with 25 or more starts was 4.2. A luck score of about 4, roughly one standard deviation in the positive direction, means a pitcher wins about 2 more games and loses about 2 fewer than expected based on his statistics. Thus a pitcher whose expected record was 12-11 could be expected to go 14-9 about 13.6% of the time, 16-7 about 2.1% of the time, and 18-5 about 0.1% of the time.
Talent in the major leagues is not distributed evenly. There are many more average players than stars, and far more mediocre players than average players. In theory, the talent can be approximated by an exponential function; however, when considering only the better starting pitchers (again defined as those with 25 or more starts from the 1960, 1970, 1980, 1990, and 2000 seasons), the distribution approximates a bell curve when talent is measured by expected win percentage.
There are very few aces and many more pitchers in each progressively lower talent level. Presumably there are far more pitchers capable of winning less than 50% of their games…they just aren’t allowed to make 25 starts in a season. The larger the pool of players, the more likely that someone will outperform his expected won-loss record by a substantial amount. Given enough players in the pool, it is reasonable to have someone outperform his expected won-loss record by three or more standard deviations.
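The pool-size effect can be made concrete with a short calculation. The sketch assumes each pitcher's luck is an independent draw from a normal distribution, and the pool sizes are hypothetical values chosen only for illustration.

```python
import math


def tail_probability(sd: float) -> float:
    """P(Z > sd) for a standard normal variable."""
    return 0.5 * math.erfc(sd / math.sqrt(2.0))


def prob_at_least_one(p: float, pool_size: int) -> float:
    """Chance that at least one of pool_size independent pitchers beats odds p."""
    return 1.0 - (1.0 - p) ** pool_size


p3 = tail_probability(3.0)  # one pitcher finishing 3+ SD above expectation
for pool_size in (50, 100, 500, 1000):  # hypothetical pool sizes
    print(pool_size, round(prob_at_least_one(p3, pool_size), 3))
```

An event that is very rare for any single pitcher becomes steadily more likely somewhere in the pool as the number of pitcher-seasons grows.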
With the much larger number of players expected to win 50% or 60% of their games than 70%, it may not be unusual for a slightly above average pitcher to get lucky and lead his league in wins, beating out better pitchers with worse luck.
I created a model to see how often this is likely to occur. For simplicity, I assumed that talent is distributed as in the histogram above and has been since 1920. Thus, 0.9% of pitchers were expected to win 70-75% of their games, 4.5% were expected to win 65-70%, 8.1% were expected to win 60-65% of their games, 17.8% were expected to win 55-60% of their games, and 28% were expected to win 50-55% of their games in each year (1920, 1940…2000). I assumed a constant luck standard deviation of 4.2 and generated 5000 random seasons to see how often the league leader in wins came from each talent category. The percentages are below:
         1920   1940   1960   1980   2000
ExpWpct  mostW  mostW  mostW  mostW  mostW
70-75%   39.7   40.4   40.4   41.4   41.5
65-70%   55.9   55.4   55.1   54.0   54.0
60-65%    4.3    4.1    4.5    4.6    4.5
55-60%    0.1    0.1    0.0    0.0    0.0
50-55%    0.0    0.0    0.0    0.0    0.0
Surprisingly, the decrease in the number of decisions from 1920 to 2000 did not make much difference in the fraction of the time that the league leader in wins came from each talent group. In each era, the wins leader came from one of the top two groups about 95-96% of the time. However, these results reiterate that the league leader in wins will quite often not be the best pitcher; 59% of the time the wins leader will not be among the best 1% in expected won-loss record, and about once every 20 years or so, he will not be among the best 5.4% of starters.
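A simulation along these lines can be sketched as follows. This is a rough reconstruction under stated assumptions, not the author's actual model. The pool size, the single 40-50% bin for the unlisted remainder of the histogram, the uniform spread of talent within each bin, and the even split of luck between wins and losses are all assumptions, so the output should not be expected to match the table above.

```python
import random

# Talent bins from the article: (low eW%, high eW%, share of pitchers).
BINS = [
    (0.70, 0.75, 0.009),
    (0.65, 0.70, 0.045),
    (0.60, 0.65, 0.081),
    (0.55, 0.60, 0.178),
    (0.50, 0.55, 0.280),
]
# ASSUMPTION: the article does not list the rest of the histogram, so the
# remaining share is lumped into one 40-50% bin.
BINS.append((0.40, 0.50, 1.0 - sum(share for _, _, share in BINS)))

WIN_SD = 4.2 / 2  # ASSUMPTION: luck SD of 4.2 splits evenly into wins and losses


def simulate_leaders(decisions, pool_size, seasons, seed=0):
    """Return, per talent bin, the fraction of seasons in which it held the wins leader."""
    rng = random.Random(seed)
    weights = [share for _, _, share in BINS]
    counts = [0] * len(BINS)
    for _season in range(seasons):
        best_wins, best_bin = float("-inf"), 0
        for _pitcher in range(pool_size):
            idx = rng.choices(range(len(BINS)), weights=weights)[0]
            lo, hi, _share = BINS[idx]
            # Expected wins from talent, plus normally distributed luck.
            wins = rng.uniform(lo, hi) * decisions + rng.gauss(0.0, WIN_SD)
            if wins > best_wins:
                best_wins, best_bin = wins, idx
        counts[best_bin] += 1
    return [count / seasons for count in counts]


if __name__ == "__main__":
    # pool_size=40 is a hypothetical number of qualifying starters per league.
    for decisions in (30, 23):  # roughly the 1920 and 2000 decision totals
        shares = simulate_leaders(decisions, pool_size=40, seasons=1000)
        print(decisions, [round(100 * s, 1) for s in shares])
```

Wins are kept as continuous values rather than rounded to whole games, which sidesteps ties for the league lead; a fuller model would need a tie-breaking rule.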
So what have we learned? As the game has evolved, starting pitchers are going less deep into games and earning fewer decisions as a result. This compresses the range of expected won-loss records, making it harder for today’s top pitchers to distinguish themselves from the merely above average. Despite this, the variation in wins due to things beyond the pitcher’s control is large enough that the league leader in wins is no less likely today to be the best pitcher than he was in the past. Wins definitely aren’t the best way to judge a pitcher, but they probably aren’t any worse now than they were when your grandfather was a boy.
I also don't really buy that the "lack of advanced metrics" in our grandfathers' day does much to justify the invention and use of W-L as a pitcher stat. It doesn't take an advanced metric (or a lot of conceptual reasoning power) to see that a stat that concerns itself only with runs allowed during a pitcher's involvement in the game (e.g. ERA) is an enormous advance over a stat that also concerns itself with runs scored and runs allowed while a different pitcher is involved in the game. W-L record was a stupid idea when it was invented, no less so than it is today.
With that said, I liked this article....and like everyone else, I especially appreciated the graphs.
I wish Matthew had made this point in his article. Explaining why W/L wasn't a totally moronic idea from the start is an important part of the story.
Two quibbles.. when you spend the first section of the article saying what "everyone/anyone" etc. I wonder why you're joining the crowd. That first part could've been tightened up a lot. Also the sentence "Since pitchers do not go as deep into games as they once did, their won-loss records are more susceptible to things beyond their control such as relievers blowing an inherited lead or teammates mounting a late innings rally." was a bit unclear... I assume you're saying that W-L does not reflect accurately on a pitcher because after a pitcher leaves a game, the bullpen could blow what was a win or their teammates could score enough runs to turn it into a no decision.
Easy thumbs up.
Swartz's best article was when he hammered a specific topic (PECOTA) into the ground. Swartz usually analyzes things from so many angles though that sometimes the focus of the article itself gets lost. I don't think one can ever accuse Swartz of being sloppy though.
Knight repeatedly stays on topic and hammers that topic into the ground, but sometimes he loses focus within the subject matter itself so the writing, assumptions or resulting analysis get a bit sloppy or, as in the case of the TTO Entry, erroneous.
Different strengths, and both of them definitely have strengths. Overall, Swartz has done better in this competition but Knight might almost be there.
Beyond this point, which needed further exploring, I have to concur with Kevin. I don't think I learned a dar-gone thing, and I doubt many BP readers did either.
Writing B Matthew's essay trotted along very nicely until, "While the fate of the 300 game winner is well worth considering . . .". Is it worth considering whether this is worth considering?
" . . . another byproduct of the decline in decisions. . ." Did you really want to turn baseball stats into something that brings to mind industrial waste?
After that, the narrative was bogged down in charts without a clear direction of where this was going.
Year  25-29  30-32  33-36  37+   Total
1920  0.81   0.81   0.88   0.75  3.25
1940  0.69   0.75   0.69   0.13  2.25
1960  0.88   0.44   1.19   0.13  2.63
1980  1.00   0.96   1.04   0.31  3.31
2000  0.73   1.13   0.87   0.00  2.73
Only one player in any of these years had 40 starts in a season (Pete Alexander, 1920). The real transition in games started came at the end of the dead-ball era. Prior to 1920 the league leader almost always had 40+ starts.
I didn't make any effort to correct for the increased length of season since 1960. I'd guess that this sneaks a few more guys onto the bottom of the list for the 1980 and 2000 samples which cancels out any increases at the top. But that's just a hunch. If anyone with a more historical perspective has thoughts, feel free to chime in.
I can't easily figure out how many days were used to play the 154 game schedules versus 162 game schedules, so I don't know if that would have any effect. My recollection is that in the days of train travel there were quite a few more double headers and off (travel) days which may have made it harder for top pitchers to exceed 35-37 starts.
However, I am confused. Do numbers with the decimals in the body of this chart represent the average number of pitchers each team has in those no. of starts slots? I don't see what win ranges you are referring to here.
If that chart is what I presume it to be, that's interesting. As I recall, in 1980, many teams still had four-man rotations or had four main starters and guys who would pitch on that fifth day when they had five games in a row. Since then the number of starters per team with 33 or more starts has only reduced from 1.35 to 0.87 - reduced by only half of a pitcher.
My impression was that pitchers in a five man rotation generally max out at 32 starts with just the ace often squeezing in a 33rd or 34th. While back in the four man 60s and 70s, the guys in the rotation all year had 35 or 36 starts. With less than 1.5 pitchers showing up in the 33+ slots for those years, I am wondering how that is possible. Were pitchers getting hurt more? Traded more? I certainly don't think pitchers were returned to the minors more frequently. I think we are at the peak of that behavior now.
I also wonder what was going on in 1940 that we had even slightly fewer pitchers with 33+ starts than we have today. You made a good point about train travel and double headers. That must have weighed more heavily on a pitcher's ability to make a large number of starts than we imagined.
Brian, thanks for the clarification about how starts were allocated.
It is surprising to see so many pitchers in 1960, 1980, and 2000 with over 30 Wins or 30 Expected Wins. I'm having trouble wrapping my head around that, when there was only one pitcher in the last, what, 70 years to actually win that many.
That said, it felt like a well written piece with graphs with meaning; it still might get a thumbs up. I haven't decided.
If I ignore the title, I walk away with very interesting information.