A couple of weeks ago, I wrote about the distribution of team wins, and the discovery that the distribution may in fact be bimodal, not normal as one might expect.
One of the predictions that came from this theory was that teams right at .500 would, counterintuitively, tend to regress away from the mean. So one thing we can do is actually check to see if the real world behaves the way we expect it to. I took all teams from 1969 on with even numbers of games and split them into “halves” of even-numbered games. I use scare quotes for halves since, in order to boost the sample size, I split in increments of two and kept any pair where both “halves” were within 20 games of each other. Then I looked at teams that were exactly .500 in the “before” sample (716 teams total) and saw what they did afterward:
Again, we see a pronounced bimodal pattern in the data. What’s interesting is that we don’t even see it coming out to .500 in the aggregate; the average is .497, close enough to .500 that we could chalk it up to a sampling issue, but the median is .489. While 323 teams have records greater than .500, 359 teams have records under .500 (with 34 teams exactly at that mark). Looking at the most common records (after prorating to wins in a 162-game schedule, to control for the uneven number of games in the “after” samples):
| Win pct. | Num |
| --- | --- |
| 0.525 | 55 |
| 0.475 | 46 |
| 0.488 | 44 |
| 0.512 | 42 |
| 0.500 | 34 |
| 0.444 | 24 |
| 0.537 | 23 |
| 0.568 | 21 |
| 0.451 | 21 |
| 0.463 | 20 |
| 0.420 | 20 |
Which looks rather similar to the chart in the last article.
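For readers who want to reproduce the sampling, here is a minimal Python sketch of the split-halves procedure described earlier, under one reading of it: cut each season at every even game number and keep the cut if the two pieces differ in length by no more than 20 games. The function names, the win/loss list format and the `max_gap` parameter are illustrative assumptions, not the code actually used for this article.

```python
def split_halves(results, max_gap=20):
    """Return (before, after) pairs for one team-season.

    results: list of 1 (win) / 0 (loss) in game order, even length.
    Assumed reading of the method: cut after every even-numbered game and
    keep the pair only if the two pieces are within max_gap games in length.
    """
    n = len(results)
    if n % 2:
        raise ValueError("expected an even number of games")
    pairs = []
    for cut in range(2, n, 2):
        before, after = results[:cut], results[cut:]
        if abs(len(before) - len(after)) <= max_gap:
            pairs.append((before, after))
    return pairs


def is_500(sample):
    """True if the sample has exactly as many wins as losses."""
    return 2 * sum(sample) == len(sample)


def prorated_wins(sample, schedule=162):
    """Scale a sample's win percentage to wins over a full schedule."""
    return round(schedule * sum(sample) / len(sample))
```

Under this reading, a 162-game season yields ten cut points (after games 72 through 90), and a team that wins 42 of 80 “after” games prorates to 85 wins, or .525.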
So what we have is a weird little case of teams actually fleeing from the mean, rather than regressing toward it. We have a theory as to why this might be, if you’ll recall what we said last time. There’s very little glory in finishing right at .500. It’s hard to make the playoffs at that record, and if you do you’re at a disadvantage compared to the other playoff teams in both seeding and talent. The incentives are lined up for teams to either finish above .500 and contend, or retool and finish below .500.
So I took my split-halves sample and looked at all teams from 1985 on (1985 being the first year I have salary data for all 30 teams), not just teams at .500 in the “before” sample. And I looked at five variables:
 A team’s actual win percentage in the before sample,
 A team’s third-order win percentage in the before sample (not the whole season),
 How many games back of the division leader a team was as of the last day of the before sample,
 A team’s TV market size, as defined by this Nate Silver study, and
 A team’s payroll for that season, divided by the average team’s payroll that year (which I termed the “payroll index”).
And I looked at how well they predict rest-of-season win percentage, using an ordinary least squares regression:
| Variable | Coefficient | Standard err. | p-value |
| --- | --- | --- | --- |
| Constant | 0.2242 | 0.0121069 | 1.08E-74 |
| Win Percent | 0.127259 | 0.0295882 | 1.72E-05 |
| Third-Order WPCT | 0.385595 | 0.0245993 | 2.09E-54 |
| Games Back | -0.000741 | 0.00025036 | 0.0031 |
| TV Market | 9.39E-10 | 2.78E-10 | 0.0007 |
| Salary Index | 0.0159546 | 0.00331814 | 1.56E-06 |
“Constant,” also known as the intercept, is the predicted value if all the input values are zero. The p-value is a test of statistical significance; the common rule of thumb (bear in mind that’s all it is, though) is that values above .05 are not significant. All of our values are statistically significant. This could be the result of overfitting, so we can check three model selection criteria: the Bayesian information criterion, the Akaike information criterion, and the Hannan–Quinn information criterion. None of them improves when the games back, TV market size and salary index terms are omitted. That suggests that although the differences between a regression equation omitting them and one including them are small (the adjusted R-squared goes from 0.27 to 0.28, and the standard error goes from .0722 to .0717), the improvement is not a product of overfitting.
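To make the overfitting check concrete, here is a rough Python sketch of how the three criteria trade fit against the number of terms, using the standard Gaussian log-likelihood for a least-squares fit. The sample size `N` is made up for illustration (the article does not report it), the helper names are hypothetical, and counting only the regression coefficients in `k` is one common convention among several.

```python
import math


def gaussian_loglik(ssr, n):
    """Maximized Gaussian log-likelihood of an OLS fit with residual sum of squares ssr."""
    return -0.5 * n * (math.log(2 * math.pi) + math.log(ssr / n) + 1)


def info_criteria(ssr, n, k):
    """AIC, BIC and Hannan-Quinn for a model with k coefficients; lower is better."""
    ll = gaussian_loglik(ssr, n)
    return {
        "AIC": 2 * k - 2 * ll,
        "BIC": k * math.log(n) - 2 * ll,
        "HQC": 2 * k * math.log(math.log(n)) - 2 * ll,
    }


def ssr_from_se(se, n, k):
    """Recover the residual sum of squares from the regression standard error."""
    return se ** 2 * (n - k)


N = 5000  # HYPOTHETICAL sample size, not taken from the article

# Standard errors .0722 (three terms) and .0717 (six terms) are from the text.
reduced = info_criteria(ssr_from_se(0.0722, N, 3), N, 3)
full = info_criteria(ssr_from_se(0.0717, N, 6), N, 6)

for name in reduced:
    print(f"{name}: reduced={reduced[name]:.1f}  full={full[name]:.1f}")
```

With this made-up sample size, all three criteria come out lower (better) for the six-term model, but that outcome depends on the assumed `N`; with a much smaller sample the heavier BIC penalty could tip the other way.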
What’s the practical use of all of this? A one-standard-deviation change in games back results in a change of .005 in predicted rest-of-season win percentage, a one-SD change in TV market size means a change of .004, and a one-SD change in salary index is worth .006. Because TV market size and salary are substantially correlated at .61, the observed results are likely to be somewhat more pronounced than this suggests.
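As a worked illustration, the regression can be applied to a single team like this. It is only a sketch: the coefficient values come from the regression table, but the function and argument names are mine, the TV market coefficient is read as 9.39E-10, the games-back coefficient is taken as negative to match the text (teams closer to the lead do better), and the market size of 5,000,000 is a made-up input in assumed units of people.

```python
# Coefficients from the regression table. Assumptions: the TV market
# coefficient is read as 9.39E-10, and the games-back coefficient is
# taken as negative so that trailing by more games lowers the prediction.
COEF = {
    "constant": 0.2242,
    "win_pct": 0.127259,
    "third_order_wpct": 0.385595,
    "games_back": -0.000741,
    "tv_market": 9.39e-10,
    "salary_index": 0.0159546,
}


def predict_rest_of_season(win_pct, third_order_wpct, games_back,
                           tv_market, salary_index):
    """Predicted rest-of-season win percentage from the five inputs."""
    return (COEF["constant"]
            + COEF["win_pct"] * win_pct
            + COEF["third_order_wpct"] * third_order_wpct
            + COEF["games_back"] * games_back
            + COEF["tv_market"] * tv_market
            + COEF["salary_index"] * salary_index)


# Two hypothetical .500 teams, identical except for games back.
close = predict_rest_of_season(0.500, 0.500, 2, 5_000_000, 1.0)
far = predict_rest_of_season(0.500, 0.500, 10, 5_000_000, 1.0)
print(f"2 GB: {close:.4f}  10 GB: {far:.4f}  gap: {close - far:.4f}")
```

The eight-game difference in the standings is worth roughly .006 of win percentage here, which is about one win over a full season.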
What this seems to tell us is that there is a small but real targeting effect in rest-of-season wins, akin to the notion of a “self-aware coin.” Teams that are closer to the division leaders are going to perform better in the after sample than teams further behind, given the same expected performance otherwise. Moreover, “large-market teams” (broadly speaking) are going to do better than small-market teams, all else being equal. Here, there’s a bit of a mystery as to what the root cause could be: it could be that such teams have more resources to invest in improving the team midseason, it could be that they have greater financial incentives to do so, or it could be that they are protecting a larger previous investment in the club by doubling down. Or it could be some combination of some or all of these causes. (It could even be that a high-payroll .500 team is more likely to be underperforming its true potential, while a low-payroll .500 team is overperforming its own.)
In terms of what we do around here, this means that our playoff odds report might be underestimating the rest-of-season performance of teams close to the division leader (although it suggests that we’re underestimating the rest-of-season performance of division leaders as well, so it’s possible that the net result is no significant change in playoff odds). It also means that our assessments, both now and in the preseason, are slightly underrating large-market clubs and overrating small-market clubs. (The Dodgers’ recent acquisition of Ricky Nolasco and his salary from the Marlins in exchange for some magic beans is an illustration of the sort of thing our naïve model is likely not capturing.) It’s something we’ll look at including in our simulations in the future.
In the larger picture, this is a reminder that MLB teams are not simply random number generators, nor are the players on them. They’re run by and composed of real people who respond to incentives, and they can change what they’re doing in response to results. This doesn’t invalidate the use of tools that treat teams and players as random-number generators, mind you; they can and often do produce useful results. But it does suggest that there are other approaches to analyzing baseball that can produce new and surprising conclusions, ones that can deepen our understanding of the game and the people playing it.
Your description of the specification search implies that you automatically use a third-order variable when you forecast. Is that true, or does it just appear that way?