June 28, 2013
Manufactured Runs: The Mystery of the Missing .500 Teams
Sometimes, baseball research happens because you go out looking for something and you find it. Other times, it happens because you go off looking for something else and you trip over something far more interesting. This is the latter.
While looking through historic team records for another project I was working on, I came across an interesting puzzle: there were far fewer teams exactly at .500 than I would have expected. I thought maybe it was a wacky feature of the sample set I was using, but I expanded my search to nearly 50 years of Major League Baseball, and the same puzzle was still staring me in the face. So I was left with three questions: Was what I was seeing really there? Why was it happening? And what did it mean?
One of the best parts of working at Baseball Prospectus is the ability to pester the staff email list with really bizarre questions. Some people use this power to ask questions where they don’t know the answer. Those people are probably much more well-liked than I am by the other staffers. I, instead, ask questions to which I already know the answer and request that people make wild guesses without doing any research first. I do this because sometimes when I’m looking at data, it helps me to get an unbiased perspective of what someone might expect the data to look like. But to get that, you need to ask people who haven’t seen the data, because once you’ve been staring at the data for too long you expect the data to look like the data. So here’s the question I posed to the staff list:
Exactly 162 games doesn’t really affect much of anything except that it limits the number of possible responses: you know that everything has to be in increments of roughly .006. Please, before you go any further, think about it yourself. Go on gut; don’t look up anything. Okay? Got it? Good, let’s proceed.
Now, I suspect that if I could ask the question in such a way as to get an immediate response (that is, without giving you time to think, “Boy, isn’t it odd that he’s asking this question?”), the most common answer would be .500. It’s a pretty reasonable guess, since .500 is both the mean and the median for team win percentage. Of course, I can’t ask the question without asking the question, and the very fact that I’m asking the question lets on that I’m looking for a somewhat less obvious answer.
Interestingly, the answers I got from the staff at first were uniformly low: Harry Pavlidis guessed .488 right out of the gate, which is actually the fourth-most-common win percentage for teams who have played exactly 162 games. And there’s actually a compelling logic to the idea that it’s easier to be below .500 than above it. It’s even true, to an extent: the worst team record in the sample I tested, .250, is further from .500 than the highest record of .716. It’s a good theory. But the most common team record for teams with 162 games played is actually above .500:
(I went out to 11 entries because there was a 10th-place tie.) The three most common win percentages are all above .500, and six of the top 11 are above .500. An even record is only the seventh-most-common record, and it is tied with two other records for that spot.
It’s a bit bizarre, really, or at least it surprised me: in a normal distribution, the mean, median, and mode should all be the same. The distribution for team win percentages certainly doesn’t seem skewed (the mean and median line up), but something seems decidedly abnormal here. It might help to look at a histogram showing all teams from 1962 (the first year of the 162-game schedule) through 2012:
The normal distribution is a pretty good fit, but it isn’t perfect. As anticipated, the tail looks slightly different for bad teams as compared to good. But shockingly, the biggest point of disagreement isn’t the tails but right in the middle: there’s an odd little dip right at .500. What our histogram suggests is what’s known as a bimodal distribution: instead of a bell curve with a single peak, what we seem to have is a combination of two overlapping bell curves with two peaks.
Histograms, of course, are subject to small sample sizes and the largely arbitrary decision on the number of bins to use (I used 29 bins for that histogram, for no other reason than the program gretl suggested it). We have 1,338 teams in the sample, which isn’t small but isn’t so large that we can necessarily trust in the eyeball test.
What we need is a statistical test of the number of modes in a distribution. We can’t necessarily tell how many modes a distribution has, since it can be hard to tell at what point overfitting begins to occur. But we can test whether a distribution is unimodal (a single peak, like the standard bell curve) or not. I turned to Hartigan’s dip test of unimodality, which as it turns out has nothing at all to do with how much oil is in your car.
What the dip test measures is the largest difference between the observed distribution of the data and the unimodal distribution that best fits that data (that is, the one that makes that largest difference as small as possible). That distance can be compared to a Monte Carlo simulation that sees how frequently a distance that large would occur at random, given the sample size, if the data were sampled from a unimodal reference distribution (the uniform distribution, in the standard version of the test). The diptest package in GNU R reports a distance of 0.0164, which according to its simulations would occur at random less than two percent of the time (five percent is the generally accepted standard for statistical significance). In other words, the statistical test seems to back up what we see in the histogram.
Can I explain it? Maybe.
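The Monte Carlo logic behind that kind of test can be sketched in a few lines of Python. To be clear about what this is not: it does not compute Hartigan’s dip statistic, and it is not what the diptest package does internally. As a simpler stand-in, it measures the largest gap between the sample’s empirical distribution and a normal curve fitted to it (a Lilliefors-style distance), then counts how often samples that really are normal produce a gap at least that large. The function names, the simulation count, and the two-humped sample are all assumptions made for illustration; the input is synthetic, not the team records discussed here.

```python
import random
from statistics import NormalDist, fmean, pstdev


def max_gap_from_fitted_normal(values):
    """Largest vertical gap between the empirical CDF and a normal CDF
    fitted to the same values (a stand-in statistic, NOT the dip)."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(fmean(xs), pstdev(xs))
    gap = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from i/n up to (i+1)/n at x; check both sides.
        gap = max(gap, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return gap


def monte_carlo_p_value(values, sims=200, seed=0):
    """Share of simulated normal samples whose gap is at least the observed gap."""
    rng = random.Random(seed)
    observed = max_gap_from_fitted_normal(values)
    n = len(values)
    hits = 0
    for _ in range(sims):
        # The gap is unchanged by shifting or rescaling the data, so drawing
        # from a standard normal covers every possible fitted normal.
        fake = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if max_gap_from_fitted_normal(fake) >= observed:
            hits += 1
    return observed, hits / sims


# Hypothetical two-humped sample: half the "teams" centered on .450, half on .550.
sample_rng = random.Random(1)
two_humps = [sample_rng.gauss(0.45, 0.02) for _ in range(200)]
two_humps += [sample_rng.gauss(0.55, 0.02) for _ in range(200)]

gap, p_value = monte_carlo_p_value(two_humps)
print(f"gap = {gap:.4f}, simulated p-value = {p_value:.3f}")
```

The reported p-value is only as fine-grained as the number of simulations allows, so a small `sims` value is a speed trade-off rather than a recommendation.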
I can come up with some possible explanations, at least. One explanation is that teams are, rather than being purely random coins, self-aware. Most of the rewards (that is, playoff spots) are for teams above .500, so the incentive is either to finish above .500 and contend, or to play for the long term by selling off veteran parts for younger players.
Another possible explanation is that there are structural imbalances in baseball that lead to the results we see. Because of long-term deals and club control of young players, roster turnover is limited. Unlike basketball or (to a much lesser extent) football, a single high draft pick cannot remake the fortunes of an entire team overnight. And teams that have structural advantages (like the ability to carry a high payroll, or a well-run farm system) tend to carry those advantages over a period of several years, sometimes a decade or more. Similarly, teams that are poorly run or poorly funded tend to be bad over the long run.
Another explanation is that it has to do with the differences between leagues: we know that recently, the AL has been the stronger league, for instance. It’s possible that what shows up so clearly in interleague play has an effect on the overall distribution of win percentages.
Now, it’s possible that one explanation is right, that some combination of the above is right, or that some other explanation (or explanations) I haven’t even considered plays a significant role (either by itself or in concert with one, some, or all of the possible explanations I have offered here). It’ll probably take some more work to figure this out.
But the next question is, what does this mean in a practical sense? What it suggests is that our current view of how teams behave, and how talent is distributed in major league baseball, is flawed. Now, as I have pointed out in the past, flawed models can still be useful. Treating MLB teams as though they come from a normal distribution can still be useful.
I want to emphasize this because I’m going to talk about a lot of people’s favorite whipping boy next, and I want to make it clear that one can improve upon something without invalidating it entirely. Now, then, the whipping boy. Sabermetricians and fellow travelers love to talk about regression to the mean. It’s a somewhat more subtle and nuanced concept than I think most writers (even writers from a sabermetric background) manage to convey. You can overstate the impact of regression to the mean—it’s a probability, not an iron law, and it deals with populations more so than individuals. I like to say that groups will tend to regress to the mean over time, while individuals can do any damned thing they like (with some damned things, of course, being more likely than others). But ignoring overenthusiasm for the concept, you can pretty much divide people who analyze baseball into three camps:
None of what I say here should indicate that I favor the second and third camp over the first; I very much do not. I nearly sold off my Mark Prior jersey for a Chris Sale one when Rick Hahn talked about small sample size.
But the standard model for regression to the mean assumes a normal distribution. If baseball teams aren’t normally distributed, in most cases it will still probably do pretty well. But there are going to be edge cases where it does not work so well. A bimodal distribution implies that most teams above .500 should perhaps not be expected to regress towards .500 quite as much as we would otherwise suspect. But by the same token, some above-.500 teams should be expected to regress to something below .500, performing even worse than the normal model would suggest. And most paradoxically, a .500 team should be expected to regress away from their current record! (The question then becomes, regress towards what?)
It also tells us that there are things about how talent is distributed among MLB teams that we don’t yet understand. Instead of baseball being a bunch of nearly average teams with some good teams and bad teams at the margins, it seems as though baseball may instead be a collection of good teams and bad teams, with something of a gulf between them. That would seem to have impacts on evaluating roster construction, trades and free-agent signings, the structure of the amateur draft (and the acquisition of foreign talent) and our expectations of a team’s future performance. The sabermetric study of how individual players relate to things like runs and wins has thus far outpaced the study of how talent is distributed among teams, and it seems as though that’s been an oversight on our part.
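To make the regression point a little more concrete, here is a toy calculation in Python. It compares how far an 88-74 team would be pulled back toward .500 under two invented pictures of league talent: a single bell curve, and an even mix of a “bad” group and a “good” group. Every number below (the centers of .460 and .540, the spreads, the grid of candidate talent levels) is an assumption chosen so that both pictures share the same average and the same overall spread; none of it is estimated from the team records in this article.

```python
from statistics import NormalDist

GAMES = 162
# Candidate true-talent win percentages, .200 through .800 in steps of .001.
GRID = [i / 1000 for i in range(200, 801)]

# Assumed talent distributions (hypothetical parameters, same mean and variance).
ONE_BELL = NormalDist(0.500, 0.05)
BAD_GROUP = NormalDist(0.460, 0.03)
GOOD_GROUP = NormalDist(0.540, 0.03)


def single_bell(p):
    """Prior density for one league-wide bell curve of talent."""
    return ONE_BELL.pdf(p)


def two_humps(p):
    """Prior density for an even mix of a bad group and a good group."""
    return 0.5 * BAD_GROUP.pdf(p) + 0.5 * GOOD_GROUP.pdf(p)


def expected_talent(wins, prior, games=GAMES):
    """Average true win percentage consistent with the record, under a prior.

    Each candidate talent level is weighted by the prior and by how likely
    that talent level is to produce exactly this record (binomial likelihood).
    """
    weights = [prior(p) * p**wins * (1 - p) ** (games - wins) for p in GRID]
    total = sum(weights)
    return sum(p * w for p, w in zip(GRID, weights)) / total


observed = 88 / GAMES
normal_est = expected_talent(88, single_bell)
mixture_est = expected_talent(88, two_humps)
print(f"observed {observed:.3f}")
print(f"single bell curve estimate {normal_est:.3f}")
print(f"two-group estimate {mixture_est:.3f}")
```

Under these assumptions the two-group picture pulls the 88-win team back less than the single bell curve does, because the team most plausibly belongs to the good group and is regressed toward that group’s center rather than toward .500. By symmetry, the same calculation puts an 81-81 team’s average expectation at exactly .500 in this toy model, so it does not by itself reproduce every effect described above.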
Colin Wyers is an author of Baseball Prospectus.
14 comments have been left for this article.

Sharky (12101)
Colin, as usual a very interesting piece. I suspect there are many factors at play. However, the structural imbalance (payroll disparity) is probably a big one.
Jun 28, 2013 07:29 AM

jdeich (50647)
Agreed. You would not expect a unimodal distribution when teams have different resources. Each team has some expectation value of wins coming into the season, and you might imagine a Gaussian centered on their PECOTA projection. The league's distribution would (after enough seasons to settle the noise) look like a sum of these peaks.
Jun 28, 2013 08:31 AM

Shaun P. (676)
It would not give a huge sample, but you could test this. Divide the sample up into two groups, 1962 to 1992, say, when payrolls were all pretty close, and 1993 to 2012, when they weren't.
Jun 28, 2013 08:36 AM

tiger337 (5175)
My hypothesis is also self-awareness. I believe it's a goal of some teams to finish above .500. If a team has a chance to finish above .500, they may win a disproportionate number of late-season games versus other out-of-contention teams who are playing out the string with call-ups. It would be interesting to see how the 82-85-win teams did in the final days of seasons.
Jun 28, 2013 07:41 AM

jfranco77 (64578)
Interestingly, .500 teams are probably less likely to be self-aware. Or at least less likely to act like they're self-aware.
Jun 28, 2013 10:29 AM

Tim Kniker (42100)
I'm not sure if this would make a huge difference, but it could be interesting in terms of insight. Do you notice this type of distribution when you look at Pythag W-L as well? I'm wondering if things look more Normal or if you still see the same phenomenon.
Jun 28, 2013 11:03 AM

anderson721 (18704)
I'm pretty sure Bill James did a study on this topic about 20-25 years ago.
Jun 28, 2013 11:49 AM

John Collins (110)
Seems like the reason stated at the outset of the article for guessing below .500 (that the worst teams are farther below .500 than the best teams are above it) is actually a reason to guess that the most common percentage, and the median record, is above .500. The best record won't cancel out the worst record, requiring a team at .506 or .517 to balance out the wins and losses.
Jul 01, 2013 17:11 PM

AWBenkert (10169)
To nitpick: it's impossible to play exactly 162 games and have a record of .250 because 162 is not evenly divisible by 4. The 1962 Mets, obviously the .250 team discussed in the essay, played 160 games, not 162. More importantly, it seems to me that expansion years should be omitted from the study because the new teams will have records far below .500 while the existing teams will reap the benefits of the extra wins.
Jan 06, 2014 17:40 PM

Colin, interesting article. My hypothesis would fall into the self-aware category, but stated slightly differently. As July 31st approaches, teams determine if they are contenders or not. Non-contenders do things that may hurt current chances for the sake of future seasons (trade players for prospects, call up minor leaguers to get experience, etc.) while the contenders "go for the flag."
It would be interesting to see, if we looked at records on July 31 (or a little before), whether there is a Normal distribution and whether it is in the games after the trade deadline that the bimodal nature begins to emerge.
Or does it happen even before the season starts, where teams have already assumed they are likely not going to make the playoffs and rebuild?
It obviously happens to a degree even preseason. It's not like teams like the Twins or Astros are playing to maximise their wins in 2013.
Yeah, I don't think you can entirely make the "self-aware coin" thing an in-season phenomenon, as teams are obviously aware to some degree even in the offseason (although with a higher amount of uncertainty, of course).
Contenders also sometimes do things at the trading deadline that might improve their chances in the short term (that season) but also increase their chances of becoming a bad team (not merely a .500 team) in the longer term. For example, if you make a playoff push by trading prospects for veterans, in a year or two the veterans have left via free agency or succumbed to the diminishments of age, but you no longer have any prospects to take their place.