Sometimes, baseball research happens because you go out looking for something and you find it. Other times, it happens because you go off looking for something else and you trip over something far more interesting. This is the latter. While looking through historic team records for another project I was working on, I came across an interesting puzzle—there were far fewer teams exactly at .500 than I would have expected. I thought maybe it was a wacky feature of the sample set I was using, but I expanded my search to nearly 50 years of Major League Baseball, and the same puzzle was still staring me in the face. So I was left with three questions: Was what I was seeing really there? Why was it happening? And what did it mean?

One of the best parts of working at Baseball Prospectus is the ability to pester the staff email list with really bizarre questions. Some people use this power to ask questions to which they don’t know the answer. Those people are probably much better liked by the other staffers than I am. I, instead, ask questions to which I already know the answer and request that people make wild guesses without doing any research first. I do this because sometimes when I’m looking at data, it helps to get an unbiased perspective on what someone might expect the data to look like. But to get that, you need to ask people who haven’t seen the data, because once you’ve been staring at the data for too long, you expect the data to look like the data.

So here’s the question I posed to the staff list:

What's the most common win percentage, out to three significant digits, for teams who have played exactly 162 games in a season?

Exactly 162 games doesn’t really affect much of anything except that it limits the number of possible responses—you know that everything has to be in increments of roughly .006. Please, before you go any further, think about it yourself—go on gut, don’t look up anything. Okay? Got it? Good, let’s proceed.
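To make that spacing concrete, here is a minimal sketch (plain arithmetic, no baseball data assumed) enumerating every win percentage a 162-game team can post:

```python
# Every possible win percentage for a team playing a fixed number of games,
# rounded to three digits as in the question above.
def win_pcts(games=162):
    return [round(wins / games, 3) for wins in range(games + 1)]

# One extra win moves a 162-game team by 1/162, roughly .006.
step = 1 / 162
```

So the candidate answers nearest to even are .494, .500, and .506, with nothing in between.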

Now, I suspect that if I could ask the question in such a way as to get an immediate response—that is, without giving you time to think that boy, isn’t it odd that he’s asking this question—the most common answer would be .500. It’s a pretty reasonable guess, since .500 is both the mean and the median for team win percentage. Of course, I can’t ask the question without asking the question, and the very fact that I’m asking the question lets on that I’m looking for a somewhat less obvious answer.

Interestingly, the answers I got from the staff at first were uniformly low—Harry Pavlidis guessed .488 right out of the gate, which is actually the fourth-most-common win percentage for teams who have played exactly 162 games. And there’s actually a compelling logic to the idea that it’s easier to be below .500 than above it. It’s even true, to an extent—the worst team record in the sample I tested, .250, is further from .500 than the highest record of .716.

It’s a good theory. But the most common team record for teams with 162 games played is actually above .500:

[Table: the eleven most common win percentages for teams with exactly 162 games played]

(I went out to 11 entries because there was a 10th-place tie.)

The three most common win percentages are all above .500, and six of the top 11 are above .500. An even record is only the seventh-most-common record, and it is tied with two other records for that spot.

It’s a bit bizarre, really, or at least it surprised me—in a normal distribution, the mean, median, and mode should all be the same. The distribution for team win percentages certainly doesn’t seem skewed—the mean and median line up—but something seems decidedly abnormal here. It might help to look at a histogram showing all teams from 1962 (the first year of the 162-game schedule) through 2012:

[Histogram: distribution of team win percentages, 1962 through 2012]

The normal distribution is a pretty good fit, but it isn’t perfect. As anticipated, the tail looks slightly different for bad teams as compared to good. But shockingly, the biggest point of disagreement isn’t the tails but right in the middle—there’s an odd little dip right at .500. What our histogram suggests is what’s known as a bimodal distribution—instead of a bell curve with a single peak, what we seem to have is a combination of two overlapping bell curves with two peaks.
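The two-overlapping-bell-curves picture can be sketched directly. The component means and spread below are made-up illustrative numbers, not values fitted to the article's data; the point is only that an equal mixture of two normal curves shows a dip at the midpoint once the peaks sit more than two standard deviations apart:

```python
import math

def normal_pdf(x, mu, sigma):
    # Density of a normal distribution with mean mu and st. dev. sigma.
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, mu_lo=0.48, mu_hi=0.52, sigma=0.015):
    # Equal-weight mixture of a "below-.500" and an "above-.500" component.
    return 0.5 * normal_pdf(x, mu_lo, sigma) + 0.5 * normal_pdf(x, mu_hi, sigma)
```

With these invented parameters, the density at .500 sits below the density near either component mean, which is exactly the dip the histogram shows.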

Histograms, of course, are subject to small sample sizes and the largely arbitrary decision on the number of bins to use (I used 29 bins for that histogram, for no other reason than the program gretl suggested it). We have 1,338 teams in the sample, which isn’t small but isn’t so large that we can necessarily trust the eyeball test. What we need is a statistical test of the number of modes in a distribution. We can’t necessarily tell how many modes a distribution has—it can be hard to tell at what point overfitting begins to occur. But we can test whether a distribution is unimodal (a single peak, like the standard bell curve) or not.

I turned to Hartigan’s dip test of unimodality, which as it turns out has nothing at all to do with how much oil is in your car. What the dip test measures is the largest difference between the observed distribution and the best-fitting unimodal distribution (or at least, the unimodal distribution that makes that largest difference as small as possible). That distance can be compared to a Monte Carlo simulation that sees how frequently a distance that large would occur by chance, given the sample size, if the data really were drawn from a unimodal distribution. The diptest package in GNU R reports a distance of 0.0164, which according to its simulations would occur at random less than two percent of the time (five percent is the generally accepted standard for statistical significance). In other words, the statistical test seems to back up what we see in the histogram.
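For readers without R handy, here is a rough Monte Carlo sketch of the same idea in Python. To keep it self-contained it uses a simplified statistic: the largest gap between the empirical CDF and a moment-matched normal CDF (a Lilliefors/KS-style measure, not Hartigan's actual dip statistic), so the numbers will not match the diptest output, but the logic of "how often would pure noise produce a gap this big?" is the same:

```python
import math
import random

def normal_cdf(x, mu, sigma):
    # Normal CDF via the error function (standard library only).
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def max_cdf_gap(data):
    # Largest gap between the empirical CDF and the CDF of the normal
    # distribution with the sample's own mean and standard deviation.
    xs = sorted(data)
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return max(abs((i + 1) / n - normal_cdf(x, mu, sigma))
               for i, x in enumerate(xs))

def mc_p_value(data, trials=200, seed=1):
    # Fraction of same-sized samples drawn from a true normal that show
    # a gap at least as large as the one we observed.
    rng = random.Random(seed)
    gap = max_cdf_gap(data)
    n = len(data)
    hits = sum(
        max_cdf_gap([rng.gauss(0.0, 1.0) for _ in range(n)]) >= gap
        for _ in range(trials)
    )
    return hits / trials
```

A clearly bimodal sample (say, an equal mixture of normals centered at -2 and +2) comes back with a p-value near zero, while a genuinely normal sample does not.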

Can I explain it? Maybe. I can come up with some possible explanations, at least. One explanation is that teams are, rather than being purely random coins, self-aware. Most of the rewards—that is, playoff spots—are for teams above .500, so the incentive is either to finish above .500 and contend, or to play for the long term by selling off veteran parts for younger players.

Another possible explanation is that there are structural imbalances in baseball that lead to the results we see. Because of long-term deals and club control of young players, roster turnover is limited. Unlike basketball or (to a much lesser extent) football, a single high draft pick cannot remake the fortunes of an entire team overnight. And teams that have structural advantages (like the ability to carry a high payroll, or a well-run farm system) tend to carry those advantages over a period of several years, sometimes a decade or more. Similarly, teams that are poorly run or poorly funded tend to be bad over the long run.

Another explanation is that it has to do with the differences between leagues—we know that recently, the AL has been the stronger league, for instance. It’s possible that what shows up so clearly in interleague play has an effect on the overall distribution of win percentages.

Now, it’s possible that one explanation is right, that some combination of the above is right, or that some other explanation (or explanations) I haven’t even considered plays a significant role (either by itself or in concert with one, some, or all of the possible explanations I have offered here). It’ll probably take some more work to figure this out.

But the next question is, what does this mean in a practical sense? What it suggests is that our current view of how teams behave, and how talent is distributed in major league baseball, is flawed. Now, as I have pointed out in the past, flawed models can still be useful. Treating MLB teams as though they come from a normal distribution can still be useful. I want to emphasize this because I’m going to talk about a lot of people’s favorite whipping boy next, and I want to make it clear that one can improve upon something without invalidating it entirely.

Now, then, the whipping boy. Sabermetricians and fellow travelers love to talk about regression to the mean. It’s a somewhat more subtle and nuanced concept than I think most writers (even writers from a sabermetric background) manage to convey. You can overstate the impact of regression to the mean—it’s a probability, not an iron law, and it deals with populations more so than individuals. I like to say that groups will tend to regress to the mean over time, while individuals can do any damned thing they like (with some damned things, of course, being more likely than others). But ignoring overenthusiasm for the concept, you can pretty much divide people who analyze baseball into three camps:

  1. People who recognize and use the concept,
  2. People who mock it, and
  3. People who honor the concept more in the breach than in the observance (otherwise known as people who mention the concept and immediately follow it up by saying “but,” or people who believe in regression for everyone except for the special snowflakes they favor).

None of what I say here should indicate that I favor the second and third camp over the first; I very much do not. I nearly sold off my Mark Prior jersey for a Chris Sale one when Rick Hahn talked about small sample size.

But the standard model for regression to the mean assumes a normal distribution. If baseball teams aren’t normally distributed, in most cases it will still probably do pretty well. But there are going to be edge cases where it does not work so well. It implies that most teams above .500 should perhaps not be expected to regress towards .500 quite as much as we would otherwise suspect. But by the same token, some above-.500 teams should be expected to regress to something below .500, so performing even worse than the normal model would suggest. And most paradoxically, a .500 team should be expected to regress away from their current record! (The question then becomes, regress towards what?)
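The "regress away from .500" claim can be checked with a toy simulation. Everything here is invented for illustration; the two talent clusters at .480 and .520 are an assumption, not an estimate from real standings. But it shows the mechanism: condition on a team finishing exactly 81-81, and its true talent is still probably not .500.

```python
import random

def talents_of_500_teams(n_teams=20000, seed=7):
    # Toy model (made-up parameters): true talent comes from an equal
    # mixture of a "bad" cluster (.480) and a "good" cluster (.520),
    # and a season is 162 independent coin flips at that true talent.
    rng = random.Random(seed)
    at_500 = []
    for _ in range(n_teams):
        center = 0.52 if rng.random() < 0.5 else 0.48
        talent = min(max(rng.gauss(center, 0.03), 0.0), 1.0)
        wins = sum(rng.random() < talent for _ in range(162))
        if wins == 81:  # team finished at exactly .500
            at_500.append(talent)
    return at_500
```

In this toy world, most of the teams that land on exactly 81-81 have a true talent noticeably above or below .500, so their expected future record is away from even; you just cannot tell in which direction by looking at the standings.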

It also tells us that there are things about how talent is distributed among MLB teams that we don’t yet understand. Instead of baseball being a bunch of nearly-average teams with some good teams and bad teams at the margins, it seems as though baseball may instead be a collection of good teams and bad teams, with something of a gulf between them. That would seem to have impacts on evaluating roster construction, trades and free-agent signings, the structure of the amateur draft (and the acquisition of foreign talent) and our expectations of a team’s future performance. The sabermetric study of how individual players relate to things like runs and wins has thus far outpaced the study of how talent is distributed among teams, and it seems as though that’s been an oversight on our part.

Thank you for reading.

Colin, interesting article. My hypothesis would fall into the self-aware category, but stated slightly differently. As July 31st approaches, teams determine if they are contenders or not. Non-contenders do things that may hurt current chances in order to help future seasons (trade players for prospects, call up minor-leaguers to get experience, etc.) while the contenders "go for the flag."

It would be interesting to see whether, if we looked at records on July 31 (or a little before), there is a Normal distribution, and whether it is the games after the trade deadline where the bimodal nature begins to emerge.

Or does it happen even before the season starts, where teams have already assumed they are likely not going to make the playoffs and rebuild?
It obviously happens to a degree even pre-season. It's not like teams like the Twins or Astros are playing to maximise their wins in 2013.
Yeah, I don't think you can entirely make the "self-aware coin" thing an in-season phenomenon, as teams are obviously aware to some degree even in the offseason (although with a higher amount of uncertainty, of course).
Contenders also sometimes do things at the trading deadline that might improve their chances in the short term (that season) but also increase their chances of becoming a bad team (not merely a .500 team) in the longer term. For example, if you make a playoff push by trading prospects for veterans, in a year or two the veterans have left via free agency or succumbed to the diminishments of age, but you no longer have any prospects to take their place.
Colin, as usual a very interesting piece. I suspect there are many factors at play. However, the structural imbalance (payroll disparity) is probably a big one.
Agreed. You would not expect a unimodal distribution when teams have different resources. Each team has some expectation value of wins coming into the season, and you might imagine a Gaussian centered on their PECOTA projection. The league's distribution would (after enough seasons to settle the noise) look like a sum of these peaks.

Because the payrolls are not random, patterns emerge over time-- if you guessed the Yankees' record in 2025, you'd probably pick a number over .500. A hypothetical league with 15 teams owned by the Steinbrenners and 15 teams owned by Jeffrey Loria would be sharply bimodal. Also, horrifying.

It might be interesting to bin the data by payroll, possibly as a ratio to the league average.

In 2012, the 5 highest payrolls won 95 (NYY), 81 (PHI), 69 (BOS), 89 (LAA), 88 (DET). (Avg: 84.4)

In 2012, the 5 lowest payrolls won 79 (PIT), 72 (KCR), 55 (HOU), 94 (OAK), and 76 (SDP). (Avg: 75.2)
It would not give a huge sample, but you could test this. Divide the sample up into two groups, 1962 to 1992, say, when payrolls were all pretty close, and 1993 to 2012, when they weren't.

Even this isn't perfect; looking at the data, the payroll disparity really didn't come into existence until the very late 90s, but 1998-2012 seemed too small a sample; maybe it is the right one?

The Royals had a top 5 payroll in 1993 and a top 7 payroll in 1994. In 1996, the highest payroll (Baltimore) was only 3.5 times the lowest (Les Expos) and less than 2.5 times every other team's payroll. A 2.5-3.5 times gap seems to be consistent with the limited salary data available from the 60s and early 70s. The first time the top payroll was more than 5 times the bottom payroll was 1997, and then only because the Pirates sold off everything. Ditto 1998, but add the Expos to the Pirates. By 1999 (with expansion fees in their owners' pockets?) it was back down to around 3 times.

The first over $100M payrolls were the Yanks, Red Sox, and Dodgers in 2001 ($112M, $110M, and $109M, respectively). Even then, the highest payroll was less than 5 times the lowest.

Anyway, because it is a more recent phenomenon, and matches up with the latest rounds of expansion, which really diluted talent, I am skeptical that payroll disparity has much to do with this. I could be wrong!
Well, if you consider Rottenberg's Invariance Principle, I'm not sure that payroll disparity is the best way to look at it. Under the reserve clause system, payrolls were artificially suppressed, but players still seemed to go disproportionately to large market teams.
My hypothesis is also self awareness. I believe it's a goal of some teams to finish above .500. If a team has a chance to finish above .500, they may win a disproportionate number of late-season games versus other out-of-contention teams who are playing out the string with call-ups. It would be interesting to see how the 82-85 win teams did in the final days of seasons.
Interestingly, .500 teams are probably less likely to be self aware. Or at least less likely to act like they're self-aware.

A .300 team is very self aware.
A .700 team is very self aware.

A .400 team is probably pretty self-aware but might be slightly deluded (thinking they can contend when they can't; I'd guess not many .400 teams think they can't contend but actually can).

A .600 team is probably pretty self-aware but might be slightly deluded (thinking they're really a .600 team when they're not)

A .500 team ... what do they think? They probably don't do much either way at the deadline. Maybe they "go for it" but they're certainly not going to tear apart a .500 team except in rare cases (White Sox/White Flag).
I'm not sure if this would make a huge difference, but it could be interesting in terms of insight. Do you notice this type of distribution when you look at Pythagorean W-L as well? I'm wondering if things look more Normal or if you still see the same phenomenon.
I'm pretty sure Bill James did a study on this topic about 20-25 years ago.
Seems like the reason stated at the outset of the article for guessing below .500 -- that the worst teams are farther below .500 than the best teams are above it -- is actually a reason to guess that the most common percentage, and the median record, is above .500. The best record won't cancel out the worst record, requiring a team at .506 or .517 to balance out the wins and losses.
To nitpick: it's impossible to play exactly 162 games and have a record of .250 because 162 is not evenly divisible by 4. The 1962 Mets, obviously the .250 team discussed in the essay, played 160 games, not 162. More importantly, it seems to me that expansion years should be omitted from the study because the new teams will have records far below .500 while the existing teams will reap the benefits of the extra wins.