755, .406, 56. Each of those numbers probably triggers an image in your mind’s eye. The timelessness of baseball’s statistics is what makes baseball such an appealing sport to so many people, and what keeps us interested long after the heroes of our youth have retired.

Of the major American team sports, baseball is the one that most resembles the game our grandfathers played when they were children. Football? The forward pass had barely been invented. Basketball? There was no such thing as a dunk, a three pointer, or even a jump shot.

Baseball has remained pretty much unchanged. Sure, the players today are bigger, stronger, and faster than those of the past, but the same is true of every other sport. Despite these changes in the players, the balance of baseball has stayed roughly the same for 90 years, ever since Babe Ruth started hitting home runs and the dead-ball era gave way to the modern game.

There have been fluctuations: the high-scoring 1930s, the pitchers’ era of the 1960s, the “PED era” of the last decade. Despite the ebbs and flows, scoring has generally remained close to 4.5 runs per game, and league-wide batting averages have hovered around .260.

While the end result has stayed relatively stable, the way pitchers are deployed has changed dramatically. Pitcher usage has evolved to counteract the ever-improving hitter. Once a pitcher might have been able to coast through the bottom of the lineup, conserving energy for the heart of the order with little fear of giving up a home run to the #8 hitter. Today he must pitch carefully, and at near-full effort, throughout the lineup.

As a result of the sustained higher effort and rising strikeout rates (a strikeout always requires at least three pitches), the share of starts that pitchers complete has been declining since the 1870s. As discussed in Baseball Between the Numbers and elsewhere, the rate of complete games has dropped from around 90% in 1893, when the pitching mound was moved back to its current distance, to under 4% today.

A second result is that the rate of decisions earned by the starting pitcher has also declined steadily over time. Since pitchers do not go as deep into games as they once did, their won-loss records are more susceptible to things beyond their control, such as relievers blowing an inherited lead or teammates mounting a late-innings rally. The earlier a pitcher exits a game, the less likely he is to earn the decision.

Countless articles have been written about this in recent years, as analysts wonder if anyone will ever again reach 300 victories in a career. While the fate of the 300 game winner is well worth considering, another byproduct of the decline in decisions is the lack of correlation between pitcher wins and pitcher quality, particularly over a single season.

The idea that wins are a poor way of measuring how good a pitcher is should not be novel to any reader of Baseball Prospectus. Indeed, most of us probably determined it was stupid as soon as we understood the rules, with thoughts along the lines of: “You mean that if a pitcher gives up one run in nine innings, but his teammates don’t score, he is considered to have pitched poorly?”

Any number of people have come up with methods of measuring pitcher performance that correlate far better from year to year than actual wins and losses (such as SNLVAR, VORP, and QERA). Despite these advances, the won-loss record of a pitcher remains a critical tool for gaining recognition through the standard channels: All-Star Games, Cy Young Awards, and even the Hall of Fame. Why are wins and losses still so important in deciding who the top pitchers are? Blame your grandfather.

At the end of the dead ball era in 1919, the top pitchers earned decisions in approximately 90% of the games they started. As the game has evolved and pitchers have been relieved earlier in games, that number has steadily declined, currently residing around 70%. Note that in order to consider only the top starting pitchers, I am restricting this study to pitchers who started at least 25 games and made 2 or fewer relief appearances.



When pitchers stayed in games longer and earned decisions at such a high rate, it was more likely that their won-loss record would correlate to their own performance, since the uncertainty that comes from turning a game over to the bullpen was removed. Run support obviously plays a large part in earning wins and losses, but wins could reasonably be construed as being a meaningful measure of pitcher performance in the 1920s, particularly when considering the lack of advanced metrics.

Contrary to popular belief, the number of games started has not declined dramatically with time. It is true that today’s pitchers average fewer starts than their predecessors in the 1960s. However, the 1960s actually represented an increase in the average number of starts by top pitchers since the 1920s.


As a result of the evolution of pitcher usage, the average number of decisions by a top starting pitcher today is its lowest value in history, excluding the strike-shortened years of 1981, 1994, and 1995. The average starter in 1919 earned approximately 30 decisions while the average starter in 2008 earned just 22 decisions.


The upshot is that it is now much harder for elite pitchers to separate themselves from the pack. To illustrate this, consider the following table, which shows the expected won-loss records (eW-L) for pitchers whose statistics indicate they should win 80%, 70%, 60%, and 50% of their decisions (eW%) in 20-year intervals since the end of the dead-ball era.

          1920   1940   1960   1980   2000
eW%       eW-L   eW-L   eW-L   eW-L   eW-L
80%      24- 6  22- 6  22- 5  20- 5  18- 5
70%      21- 9  20- 8  19- 8  18- 7  16- 7
60%      18-12  17-11  16-11  15-10  14- 9
50%      15-15  14-14  14-13  13-12  12-11
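The table above can be reproduced with a short sketch. The per-era average decision counts used here (30 in 1920 declining to 23 in 2000) are my reading of what the table implies, so treat them as assumptions rather than the author’s exact inputs:

```python
# Sketch: build an expected won-loss record from an expected win
# percentage and an era's average number of decisions. The decision
# counts below are assumptions inferred from the table itself.
DECISIONS = {1920: 30, 1940: 28, 1960: 27, 1980: 25, 2000: 23}

def expected_record(decisions, ew_pct):
    """Round expected wins half-up; losses are the remainder."""
    wins = int(decisions * ew_pct + 0.5)
    return wins, decisions - wins

for pct in (80, 70, 60, 50):
    cells = ["%2d-%2d" % expected_record(d, pct / 100)
             for d in DECISIONS.values()]
    print("%d%%  %s" % (pct, "  ".join(cells)))
```

Run as-is, this reproduces the grid: the nine-win gap between the best and an average pitcher in 1920 (24-6 vs. 15-15), shrinking to six wins in 2000 (18-5 vs. 12-11), falls straight out of the shrinking decision counts.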

The separation between the best pitcher and an average pitcher was about nine wins in 1920, but has been reduced to only six wins today. Since there is less of a spread between the elite pitchers and everyone else, it is easier for a lesser pitcher to get lucky and surpass the elite pitcher in wins.

Baseball Prospectus has a metric called “luck” which measures the difference between a pitcher’s expected wins and losses (definition: “Expected win record for the pitcher, based on how often pitchers with the same innings pitched and runs allowed earned a win or loss historically”) and his actual wins and losses. Luck scores in a given season typically range from about -10 to +10, with most pitchers near 0.

I collected the luck scores for all pitchers in 1960, 1970, 1980, 1990, and 2000 (luck scores only go back to 1954). The standard deviation in luck for pitchers with 25 or more starts was 4.2. A pitcher one standard deviation in the positive direction (a luck score of about +4) would win roughly 2 more games and lose 2 fewer than his statistics suggest. Thus a pitcher whose expected record was 12-11 could be expected to go 14-9 about 13.6% of the time, 16-7 about 2.1% of the time, and 18-5 about 0.1% of the time.
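Those percentages are just the bands of a normal distribution (13.6% of pitchers land between one and two standard deviations above expectation, 2.1% between two and three, and 0.1% beyond three). A minimal check:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probability of landing in each one-sided band of luck.
# Per the article, 1 sd of luck (~4.2) is roughly +2 wins / -2 losses.
one_to_two   = norm_cdf(2) - norm_cdf(1)   # ~0.136, e.g. 14-9 from 12-11
two_to_three = norm_cdf(3) - norm_cdf(2)   # ~0.021, e.g. 16-7
beyond_three = 1.0 - norm_cdf(3)           # ~0.001, e.g. 18-5
```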

Talent in the major leagues is not distributed evenly. There are many more average players than stars, and far more mediocre players than average players. In theory, the talent distribution can be approximated by an exponential function; however, when considering only the better starting pitchers (again defined as those with 25 or more starts in the 1960, 1970, 1980, 1990, and 2000 seasons), the distribution approximates a bell curve when talent is measured by expected win percentage.


There are very few aces and many more pitchers in each progressively lower talent level. Presumably there are far more pitchers capable of winning less than 50% of their games…they just aren’t allowed to make 25 starts in a season. The larger the pool of players, the more likely that someone will outperform his expected won-loss record by a substantial amount. Given enough players in the pool, it is reasonable to have someone outperform his expected won-loss record by three or more standard deviations.
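The “large pool” point can be put to a quick number. A 3-standard-deviation lucky season has probability 1 − Φ(3) ≈ 0.135% per pitcher-season; over a hypothetical pool of, say, 200 qualifying pitcher-seasons (my figure for illustration, not the article’s), at least one such fluke becomes quite plausible:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_fluke = 1.0 - norm_cdf(3.0)   # ~0.00135 chance per pitcher-season
pool = 200                      # hypothetical pool of pitcher-seasons
p_at_least_one = 1.0 - (1.0 - p_fluke) ** pool
# With a big enough pool, even a 3-sigma fluke becomes likely (~1 in 4 here).
```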

With the much larger number of players expected to win 50% or 60% of their games than 70%, it may not be unusual for a slightly above average pitcher to get lucky and lead his league in wins, beating out better pitchers with worse luck.

I created a model to see how often this is likely to occur. For simplicity, I assumed that talent is distributed as in the histogram above and has been since 1920. Thus, 0.9% of pitchers were expected to win 70-75% of their games, 4.5% were expected to win 65-70%, 8.1% were expected to win 60-65% of their games, 17.8% were expected to win 55-60% of their games, and 28% were expected to win 50-55% of their games in each year (1920, 1940…2000). I assumed a constant luck standard deviation of 4.2 and generated 5000 random seasons to see how often the league leader in wins came from each talent category. The percentages are below:

          1920   1940   1960   1980   2000
ExpWpct  mostW  mostW  mostW  mostW  mostW
 70-75%   39.7   40.4   40.4   41.4   41.5
 65-70%   55.9   55.4   55.1   54.0   54.0
 60-65%    4.3    4.1    4.5    4.6    4.5
 55-60%    0.1    0.1    0.0    0.0    0.0
 50-55%    0.0    0.0    0.0    0.0    0.0
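A simulation of the kind described above can be sketched as follows. The pool size, the uniform draw of expected win percentage within each band, the sub-.500 remainder band, and the luck-to-wins conversion are all my assumptions, not the author’s exact model, so the printed percentages will not match the table precisely:

```python
import random

random.seed(17)

# Talent bands (expected win-pct range -> share of qualifying starters),
# taken from the article's histogram. The sub-.500 remainder and the
# pool size are my assumptions.
BANDS = [((0.70, 0.75), 0.009),
         ((0.65, 0.70), 0.045),
         ((0.60, 0.65), 0.081),
         ((0.55, 0.60), 0.178),
         ((0.50, 0.55), 0.280),
         ((0.45, 0.50), 0.407)]
POOL = 40           # assumed qualifying starters per league-season
DECISIONS = 23      # per the article, a 2000-era top starter
WIN_SD = 2.1        # luck sd of 4.2 translates to roughly +/-2.1 wins

def simulate(n_seasons):
    """Count how often the wins leader comes from each talent band."""
    bands = [b for b, _ in BANDS]
    weights = [w for _, w in BANDS]
    leads = {b: 0 for b in bands}
    for _ in range(n_seasons):
        best_band, best_wins = None, float("-inf")
        for _ in range(POOL):
            band = random.choices(bands, weights=weights)[0]
            ew_pct = random.uniform(*band)
            wins = ew_pct * DECISIONS + random.gauss(0.0, WIN_SD)
            if wins > best_wins:
                best_band, best_wins = band, wins
        leads[best_band] += 1
    return leads

leads = simulate(2000)
for band, count in leads.items():
    print(band, round(100.0 * count / 2000, 1))
```

Even under these simplified assumptions, the qualitative result holds: the wins leader usually comes from the top talent bands, while a sub-.500 pitcher essentially never leads the league.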

Surprisingly, the decrease in the number of decisions from 1920 to 2000 did not make much difference in the fraction of the time that the league leader in wins came from each talent group. In each era, the wins leader came from one of the top two groups about 95-96% of the time. However, these results reiterate that the league leader in wins will quite often not be the best pitcher; 59% of the time the wins leader will not be among the best 1% in expected won-loss record, and about once every 20 years or so, he will not be among the best 5.4% of starters.

So what have we learned? As the game has evolved, starting pitchers are going less deep into games and earning fewer decisions as a result. This compresses the range of expected won-loss records, making it harder for today’s top pitchers to distinguish themselves from the merely above average. Despite this, the variation in wins due to things beyond the pitcher’s control is large enough that the league leader in wins is no less likely today to be the best pitcher than he was in the past. Wins definitely aren’t the best way to judge a pitcher, but they probably aren’t any worse now than they were when your grandfather was a boy.

Thank you for reading

I really like the use of graphs. As with Matt's article last week, the graphs and dense writing make it seem longer than it actually is, but unlike Matt, Matthew stays focused on one topic and pounds it mercilessly until we get to the conclusion. This is really solid work and the kind of thing that he's shown he can do week in and week out. Knight's not the best pure writer, but he's solid enough to hold his own with people that are. He's not that stathead that Kniker is or the writer that Swartz is, but he might be the best "middle ground" in the competition.
One issue I had with this piece was structural. I would have liked to be clued into what specifically I would be reading about prior to the tenth paragraph, if not sooner. The sheer number of short paragraphs throughout is a structural problem too, as they made for very choppy reading. These are problems that an editor would normally solve, but shouldn't have to at the professional level. This got to an interesting minor point (in the final analysis, pitcher wins still aren't particularly useful, so the main thing we learned is that they aren't more useless than they used to be, just as useless) but it was a bit of a chore getting there.
I was intrigued by Matthew's decision to go to a big-picture topic, but like Steven, I found that there tended to be an awful lot of throat-clearing, especially when that presented us with a few 'freshmanisms' (a term I borrowed from one college prof, defined as sweeping assertions that aren't really true) early on. Once he got to his main talking points, he did effective work, but there'd be a lot of trimming from the early paras.
I don't know how I feel about this one. I liked it, and I was informed by it, and like Will, I thought the graphs were very effective, but in the end, this becomes the opposite of some other pieces this week. This piece has execution, but problems with the concept. Very few (and probably zero BP readers) think W-L records are indicative of anything, so why do the research to prove it?
One minor point: the fact that starting pitchers are now involved in a lower percentage of decisions than they were 80 years ago - correctly identified by the author as a factor that weakens the correlation between W-L record and pitcher performance - is at least partially offset by the increase in strikeouts, which means that the runs that score (or do not score) while the starter is still in the game are more likely to be due to his own efforts than those of his defense (or the baseball gods.) This may help explain why wins are not significantly less useful now than they were back then.

I also don't really buy that the "lack of advanced metrics" in our grandfathers' day does much to justify the invention and use of W-L as a pitcher stat. It doesn't take an advanced metric (or a lot of conceptual reasoning power) to see that a stat that concerns itself only with runs allowed during a pitcher's involvement in the game (e.g. ERA) is an enormous advance over a stat that also concerns itself with runs scored and runs allowed while a different pitcher is involved in the game. W-L record was a stupid idea when it was invented, no less so than it is today.

With that said, I liked this article....and like everyone else, I especially appreciated the graphs.
I disagree, jepson. One side effect of the "coast until you really need to get outs" approach that was possible in the early days is that pitchers very probably really did "pitch to the score", being willing to allow a few meaningless runs in a game they were winning handily, and bearing down extra-hard in close games. That made ERA less useful as a measure of ability (or even performance) in those days, and W/L at least partially a reflection of how well pitchers adapted to the situation.

I wish Matthew had made this point in his article. Explaining why W/L wasn't a totally moronic idea from the start is an important part of the story.
In those days, if you had an ERA over 4.00 you were a bad pitcher. I doubt there was less room to "pitch to the score".
It took a long while to get going... but once it did, I loved it. It may seem like a trivial kind of topic, but the idea that elite pitchers are among the league leaders in wins no more often than they were 80 years ago is quite interesting. I like how he showed the gap narrowing between elite pitchers and average pitchers, and if I were to trim out a chunk of the first half, I'd spend more time analyzing that portion.

Two quibbles... when you spend the first section of the article saying what "everyone/anyone" etc. already knows, I wonder why you're joining the crowd. That first part could've been tightened up a lot. Also the sentence "Since pitchers do not go as deep into games as they once did, their won-loss records are more susceptible to things beyond their control such as relievers blowing an inherited lead or teammates mounting a late innings rally." was a bit unclear... I assume you're saying that W-L does not reflect accurately on a pitcher because after a pitcher leaves a game, the bullpen could blow what was a win or his teammates could score enough runs to turn it into a no-decision.

Easy thumbs up.
Oh and for the record, I think Swartz is a good stathead too... but Will's got a good point.

Swartz's best article was when he hammered a specific topic (PECOTA) into the ground. Swartz usually analyzes things from so many angles though that sometimes the focus of the article itself gets lost. I don't think one can ever accuse Swartz of being sloppy though.

Knight repeatedly stays on topic and hammers that topic into the ground, but sometimes he loses focus within the subject matter itself so the writing, assumptions or resulting analysis get a bit sloppy or, as in the case of the TTO Entry, erroneous.

Different strengths, and both of them definitely have strengths. Overall, Swartz has done better in this competition but Knight might almost be there.
Content C - It was very surprising to discover the average number of starts among pitchers with 25 or more starts was at its peak in the 60s and early 70s, but has remained stable throughout the rest of time. However, it is a fuzzy stat. Weren't there more pitchers with 35 or more and 40 or more starts prior to the 60s - at least, on a per team basis? Did they have more pitchers with 25-30 starts to bring the average down? In the early 60s, the schedule increased by 8 games. Is that compensated for in any way in these measures? Could that have accounted for the increase in average number of starts among the "top" at that time?

Beyond this point, which needed further exploring, I have to concur with Kevin. I don't think I learned a dar-gone thing, and I doubt many BP readers did either.

Writing B Matthew's essay trotted along very nicely until, "While the fate of the 300 game winner is well worth considering . . .". Is it worth considering whether this is worth considering?

" . . . another byproduct of the decline in decisions. . ." Did you really want to turn baseball stats into something that brings to mind industrial waste?

After that, the narrative was bogged down in charts without a clear direction of where this was going.
Seems you and I have been pretty close in previous weeks and I like your grading scale idea. I guess we differ a bit on this article though... Look forward to seeing your feedback on my hitlist tomorrow.
hotstatrat, those are good points about the distribution of starts over the years. Here are number of starts per team for pitchers in various win ranges (removing the relief appearances restriction that I used in the article):

Year 25-29 30-32 33-36 37+ Total
1920 0.81 0.81 0.88 0.75 3.25
1940 0.69 0.75 0.69 0.13 2.25
1960 0.88 0.44 1.19 0.13 2.63
1980 1.00 0.96 1.04 0.31 3.31
2000 0.73 1.13 0.87 0.00 2.73

Only one player in any of these years had 40 starts in a season (Pete Alexander, 1920). The real transition in games started came at the end of the dead-ball era. Prior to 1920 the league leader almost always had 40+ starts.

I didn't make any effort to correct for the increased length of season since 1960. I'd guess that this sneaks a few more guys onto the bottom of the list for the 1980 and 2000 samples which cancels out any increases at the top. But that's just a hunch. If anyone with a more historical perspective has thoughts, feel free to chime in.

I can't easily figure out how many days were used to play the 154 game schedules versus 162 game schedules, so I don't know if that would have any effect. My recollection is that in the days of train travel there were quite a few more double headers and off (travel) days which may have made it harder for top pitchers to exceed 35-37 starts.
Thanks very much, Matthew.

However, I am confused. Do numbers with the decimals in the body of this chart represent the average number of pitchers each team has in those no. of starts slots? I don't see what win ranges you are referring to here.

If that chart is what I presume it to be, that's interesting. As I recall, in 1980 many teams still had four-man rotations, or had four main starters and guys who would pitch on that fifth day when they had five games in a row. Since then the number of starters per team with 33 or more starts has only dropped from 1.35 to 0.87 - a reduction of only half a pitcher.

My impression was that pitchers in a five man rotation generally max out at 32 starts with just the ace often squeezing in a 33rd or 34th. While back in the four man 60s and 70s, the guys in the rotation all year had 35 or 36 starts. With less than 1.5 pitchers showing up in the 33+ slots for those years, I am wondering how that is possible. Were pitchers getting hurt more? Traded more? I certainly don't think pitchers were returned to the minors more frequently. I think we are at the peak of that behavior now.

I also wonder what was going on in 1940 that we had even slightly fewer pitchers with 33+ starts than we have today. You made a good point about train travel and double headers. That must have weighed more heavily on a pitcher's ability to make a large number of starts than we imagined.
#1 starters would get 36-40, sometimes even 41 or 42 starts...they would go every 4 days when possible, not 4 games. 180 days/4 = 45. The starts would drop off quickly for many teams, with the #3 and #4 getting 20 some starts and also relieving. Today there's a more fixed rotation, so while 1 and 2 get fewer, 3, 4 and 5 get more.
You are reading the table correctly. The formatting looked fine in the comments box but it came out ugly when it posted. The top row is the year, then win ranges. Subsequent rows are the year, then number of players per team in each win range. So in 1920 there were 0.81 players per team with 25-29 wins, 0.81 with 30-32 wins, etc.

Brian, thanks for the clarification about how starts were allocated.
Thanks, guys, for the clarifications.

It is surprising to see so many pitchers in 1960, 1980, and 2000 with over 30 Wins or 30 Expected Wins. I'm having trouble wrapping my head around that, when there was only one pitcher in the last, what, 70 years to actually win that many.
Oops. I just realized I said "wins" instead of "starts" in that last post. Definitely a mistake! That should read "...25-29 STARTS" (etc.)
The topic of historical pitcher usage is one of my favorite, and has a goldmine of possible material waiting to be dug into. This article said some things I was intrigued by, but didn't quite go where I hoped. This one is very close to thumbs up, but I will reserve judgement until I see the rest.
I think Christina hit it on the head concerning the plethora of 'Freshmanisms'. Maybe he did get somewhere by the 10th paragraph, but, by then, I was just reading words and comprehending little. Upon returning to the article, I did like the graph work, but it still didn't seem enough.
After reading this, I had to check Mr. Knight's initial entry - this was the back-of-the-envelope stats guy! This was a back-of-the-envelope statistical study - right up his alley. It was a solid enough article, but I found there to be some disconnect between the stuff about trends with starters' starts and decisions over time, and the expected decision distribution stuff at the end. It came off as a disjointed statistics exercise without the necessary corresponding analysis to make it coherent.
I thought the introduction was weak, as it took 10 paragraphs for the writer to introduce the topic and describe what he was trying to do. The charts look good but the topic failed to hold my attention.
Matthew is the only author left in the competition for whom I have never voted. That's not going to change this week, though I do think this is the strongest of his articles so far.
While I thought that the data was interesting, I never really felt like this had direction. In the end, I wasn't really satisfied by the conclusion.

That said, it felt like a well written piece with graphs with meaning; it still might get a thumbs up. I haven't decided.
After a few more reads, I think that the title is my biggest problem. Why should I blame my grandfather? I keep expecting to read something about the history or reasoning behind the W-L stats that reveals some flawed thinking of baseball's statistical founders.

If I ignore the title, I walk away with very interesting information.
The charts are informative, and I find the use of simulation to understand what the data might look like very helpful, especially in a field like baseball where there is so much variance and so many of us tend to regard today's figures as somehow "true".