
March 24, 2005

2005--Setting the Stage

Randomness in Team Standings Predictions

by Keith Woolner


Spring training is in full bloom, and besides the furtive fantasy league planning going on in offices throughout the country (should the GDP's seasonal adjustments take this into account?), it is also the time for predictions to be made about the upcoming season. Most baseball coverage includes player or team projections of some kind, and Baseball Prospectus is no exception.

However, I've often wondered just how accurate even the best predictions can be. We've written about predicting player performance before, so I am going to focus on predictions at the team level, namely the order of finish within a division.

An interesting aspect of the question is whether even perfect predictions of team quality can result in reliable predictions of standings. Is a 162-game season long enough for competitive teams to differentiate themselves conclusively?

One way to approach this is to use a past season's actual results to create a "perfect" predictor of a team's ability to win a game over a span of time, and simulate a season's worth of games to see if you recreate the actual observed standings. For example, knowing that a team won 55% of its games in a season (89 wins out of 162 games), simulate a season in which they have a 55% chance of winning each game. This is the probability that maximizes the chance of observing 89 wins in the simulated season.
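
That coin-flip setup is easy to sketch in code. The snippet below is a minimal illustration of the idea (the function name and seed are my own choices, not anything from the article): play out a season as 162 independent games won with fixed probability .550, and repeat.

```python
import random

def simulate_season(p_win, rng, n_games=162):
    """One simulated season: each game is an independent coin flip won with probability p_win."""
    return sum(1 for _ in range(n_games) if rng.random() < p_win)

rng = random.Random(42)
# A team that won 55% of its games (89-73): resimulate 1000 seasons.
wins = [simulate_season(0.55, rng) for _ in range(1000)]
avg = sum(wins) / len(wins)
spread = max(wins) - min(wins)
```

Over many simulated seasons the average settles near 0.55 * 162, roughly 89 wins, but any single season can land well above or below that, which is the whole point of the exercise.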

Repeating this process for each team in turn creates a full season of predictions, each tuned to maximize the chance that the simulated outcome will match the known outcome. This represents the best that a perfect estimate of each team's ability to win games over the course of a season can do, with randomness over 162 games being the only source of discrepancy in this artificially constrained experiment. And this is exactly the approach I took to the question.

I simulated 1000 seasons. For each team, I used their actual 2004 winning percentage as the probability of winning a game, and used a normal distribution to approximate the distribution of win totals for each team over 162 games.

A side note for the statistically minded: the number of wins in a season actually follows a binomial distribution, but for a large enough number of trials the binomial is approximately normal. The usual rule of thumb is that [# trials] * [probability] and [# trials] * (1 - [probability]) both have to be greater than 5 for the approximation to be "good enough." In baseball terms, this means we need to be able to expect at least 5 wins and 5 losses over whatever number of games we're looking at. With 162 games (or trials) in a season, both conditions are easily met, whether we're talking about the 2001 Mariners or the 2003 Tigers.

For example, Atlanta won 96 games and lost 66, for a .593 winning percentage. Using a 59.3% chance of winning each game over 162 games, I approximated the chance of winning a certain number of games by a normal distribution with a mean of 96 wins, and a standard deviation of SQRT(162*59.3%*(1-59.3%)) = 6.25 wins.
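
The Atlanta arithmetic, together with the rule-of-thumb check from the side note above, works out like this (a worked sketch of the article's numbers, nothing more):

```python
import math

n = 162                           # games in a season
p = 96 / 162                      # Atlanta's 2004 winning percentage (.593)

mean = n * p                      # expected wins under the binomial: 96
sd = math.sqrt(n * p * (1 - p))   # standard deviation of wins: about 6.25

# Rule-of-thumb check for the normal approximation:
# expected wins (n*p) and expected losses (n*(1-p)) must both exceed 5.
ok = n * p > 5 and n * (1 - p) > 5
```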

This approach does not guarantee that all teams end up with the proper aggregate number of wins and losses, since it treats all teams as independent, but it is still a useful model. Over the course of all the simulated seasons, each team's win total will fluctuate, sometimes higher than expected, sometimes lower, but should, in the long run, average out to their "true" level of ability, which is fixed at their observed 2004 level in this model. Here are the aggregate results for each team.

ActW = Actual Wins in 2004
AvgW = Average number of wins across 1000 simulated seasons
MinW = Minimum number of wins in any simulated season
MaxW = Maximum number of wins in any simulated season
StdDev = Standard Deviation in the number of wins per season


TEAM  ActW      AvgW  MinW     MaxW   StdDev
MIN     92      92.3    72      111     6.43
CHA     83      82.8    62      104     6.31
CLE     80      79.6    60      98      6.25
DET     72      72.1    51      92      6.26
KCA     58      57.8    38      77      6.13

NYA     101     100.9   80      120     6.20
BOS     98      97.5    80      114     6.04
BAL     78      78.1    56      102     6.24
TBA     70      70.6    43      88      6.19
TOR     67      67.4    43      87      6.23

ANA     92      91.8    71      116     6.45
OAK     91      90.8    70      114     6.26
TEX     89      89.0    71      109     6.09
SEA     63      63.2    44      82      6.07

SLN     105     105.0   84      124     5.96
HOU     92      92.1    69      115     6.28
CHN     89      89.0    69      106     6.44
CIN     76      76.4    55      96      6.52
PIT     72      72.7    51      95      6.23
MIL     67      67.5    45      88      6.44

ATL     96      96.4    75      118     6.49
PHI     86      85.9    65      107     6.35
FLO     83      83.1    63      106     6.12
NYN     71      70.7    50      89      6.22
MON     67      66.9    45      86      6.45

LAN     93      93.4    71      113     6.33
SFN     91      91.2    72      111     6.55
SDN     87      86.6    65      108     6.57
COL     68      68.2    46      91      6.20
ARI     51      51.1    33      73      6.28
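
A single row of a table like the one above can be regenerated with a short simulation. The sketch below follows the article's method (independent normal draws per team per season); the function name, seed, and use of raw floats rather than rounded win totals are my own choices.

```python
import math
import random
import statistics

def simulate_team(act_wins, n_games=162, n_seasons=1000, seed=7):
    """Draw n_seasons win totals for one team from the normal approximation
    to the binomial, centered on the team's actual win total."""
    rng = random.Random(seed)
    p = act_wins / n_games
    sd = math.sqrt(n_games * p * (1 - p))
    sims = [rng.gauss(act_wins, sd) for _ in range(n_seasons)]
    return {
        "AvgW": statistics.mean(sims),
        "MinW": min(sims),
        "MaxW": max(sims),
        "StdDev": statistics.stdev(sims),
    }

row = simulate_team(92)   # e.g. Minnesota's 92 actual wins in 2004
```

The average should come out close to the team's actual win total, with a standard deviation near the binomial value of roughly 6.3 wins, matching the pattern in the table.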

The individual teams seem to be showing the expected amount of variation in simulated wins. The next step is to determine the division standings based on these simulated results. Recall that the model parameters have been chosen so as to be the most likely to reproduce the actual 2004 results. For each simulated season, I looked at the standings in each division to see if the simulation resulted in the actual standings observed in 2004.

There are other measures we could have used besides "completely accurate prediction of division standings": we could have counted the number of teams correctly placed, or computed the average error in each team's predicted win total, each of which has its merits. But the most basic question one might ask is "did the prediction match the eventual outcome?" Since the order of finish is typically of more interest than the size of the gaps between teams (a first-place team that wins by 20 games gains little or no advantage over one that wins by 2 games), we'll use just that to evaluate the results.

Match = Percentage of simulated seasons in which the simulated standings matched the actual 2004 standings



Division    Match

AL East     25.0%
AL Central  27.8%
AL West     20.8%

NL East     23.5%
NL Central  18.0%
NL West     26.9%


All six divisions' standings predicted correctly in the same season: 0.1% (1 out of 1000)

The NL Central had the lowest match rate. This is to be expected, because it has the most teams (six) of any division. The number of possible orders of finish, 720, is six times larger than in a five-team division, which has just 120 possible orderings.

Given the combinatorial realities, it is somewhat surprising that the AL West, with just four teams and 24 possible orderings, did not match more often. However, another factor is at work here. The AL West had three teams finish within 3 games of one another (the Angels with 92 wins, the Athletics with 91, and the Rangers with 89). Teams that are close in ability, say two teams expected to win 85 and 86 games respectively, will obviously be more likely to confound the prediction. It is the combined impact of the number of teams and how well separated they are in ability that determines how much randomness over a season affects the likelihood of a matching prediction.

The other five-team divisions average out to about a 25% match rate. While this is about 30 times better than a completely random guess at the standings (which has a 1-in-120 chance, or about 0.8%, of being correct), it's still much lower than the casual fan might expect.
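
The match rates above can be estimated with the same machinery: draw every team's win total, sort the division, and count how often the order matches 2004. Here is a sketch for the 2004 AL Central, using the win totals from the table above (the function name, seed, and structure are mine, not the article's):

```python
import math
import random

def match_rate(actual_wins, n_games=162, n_seasons=1000, seed=1):
    """Fraction of simulated seasons whose order of finish matches the actual one."""
    rng = random.Random(seed)
    k = len(actual_wins)
    true_order = sorted(range(k), key=lambda i: -actual_wins[i])
    matches = 0
    for _ in range(n_seasons):
        sims = []
        for w in actual_wins:
            p = w / n_games
            sims.append(rng.gauss(w, math.sqrt(n_games * p * (1 - p))))
        if sorted(range(k), key=lambda i: -sims[i]) == true_order:
            matches += 1
    return matches / n_seasons

# 2004 AL Central: MIN 92, CHA 83, CLE 80, DET 72, KCA 58
rate = match_rate([92, 83, 80, 72, 58])
```

With these inputs the estimate should land in the neighborhood of the article's 27.8% figure for the AL Central, give or take simulation noise.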

The moral of the story is that even 162 games is still a fairly small sample when trying to separate competitive teams from one another. Teams usually fall in a narrow band of ability -- between 60 and 100 wins, rather than the full theoretical range of 0 to 162 wins. Even assuming perfect assessments of a team's ability to win games over the course of a season, as we did in this model, the actual differences between teams are often too small to overcome the noise of a single season's worth of games.

So as we head into the 2005 season, keep in mind that among everyone putting out projections now, there is surely someone who will be crowing at the end of the year about the accuracy of their predictions (and, yes, we certainly hope it's Baseball Prospectus doing the crowing). A certain amount of analysis helps inform good predictions, but beyond a certain level, whether the results favor one well-formed prediction or another is largely the luck of the draw. The most useful part of a prediction may not be the numbers themselves, but the quality of the thought process that generates them. Great predictions are byproducts of carefully considered analysis. Read *how* the predictions were generated, and what insights and assumptions they rest on, rather than just *what* the predictions are, and be a better-informed fan because of it.

Keith Woolner is an author of Baseball Prospectus. 
