May 19, 2005
"You show the Orioles with a 60% chance of winning the division, and the Yankees not even at 10%. How can this be?"
Yes, I've gotten multiple e-mails just like that so far this season, people wondering how I can justify such long odds in the Orioles' favor so early in the season. You can look up the methodology in articles found here and here from last season. Even though we didn't start running the postseason odds reports until August last year, I haven't changed the program in any way: The settings are determined solely by the way a team has played so far this season, without any knowledge whatsoever of which players are on the team. It doesn't know or care that Yankee pitchers are all a run or more behind their recently established levels of performance, and it doesn't know if Brian Roberts will really hit for a .390 EqA over the entire season. It only knows what the team has done to date.
Given that it has no idea whether a team can be described as overperforming or underperforming, you have to wonder just how good a job it can do with the simulations. To answer that, I wrote a version of the program that would calculate the adjusted standings and expected post-season odds for any date in history, with hearty thanks once more directed to the incomparable work of Retrosheet. The differences between the historical runs and the current season are:
For timing, I had several options, but chose to go with the simple "days since Opening Day". As of Monday, May 16, we were 44 days removed from the first game of the season, back on April 3; I ran the adjusted standings and playoff odds using 45 days. In 2005, teams had played about 36 games apiece in those first 44 days; historically, teams have averaged about 35 games, ranging from a low of zero (mid-season replacement teams in some 19th century leagues) to a high of 47 for several teams in 1944, when wartime restrictions led to a certain compression of the schedule.
Over all of baseball history, from the start of the National Association in 1871 through last year, nine teams have managed to get a 90% title rate through this point of the season. The strongest lock ever was the 1880 Chicago White Stockings, who won 96.75% of the simulations for 1880. They were 30-3 after 45 days, had a 10.5-game lead on the second-place Providence Grays, and were only playing an 84-game schedule, which goes a long way toward explaining why they were such a lock. They won the league by 15 games, with a 67-17 record; it would take 130 wins to match that percentage over a 162-game schedule.
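The 130-win figure is simple proportional arithmetic, and a minimal sketch makes it easy to check. The function name `wins_to_match` is just an illustrative label, not anything from the simulation program:

```python
from math import ceil

def wins_to_match(wins, games, target_games=162):
    """Smallest whole number of wins over target_games that matches
    or beats the winning percentage of a wins/games record."""
    pct = wins / games
    return ceil(pct * target_games)

# 1880 Chicago White Stockings: 67-17 over an 84-game schedule
print(f"{67 / 84:.3f}")        # 0.798
print(wins_to_match(67, 84))   # 130
```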
If you don't think of the 19th century as "real" baseball that should count for this kind of analysis, then the best team ever by these standards is easily within our memories. The 2001 Mariners, who won a record 116 games, were 29-9 at the simulation date and won 95.7% of the simulations. After them, we've got the 1902 Pirates, who kept all their players while so many were jumping to the second-year American League, winning 94.5% of the time. The 1990 Reds, famously thought of as a weak team who upset the vaunted A's in the World Series, were a 93% lock to win the NL West at this early time. There's the 1995 Indians, 91.9%, who won 100 games despite playing in a strike-shortened 144-game season. Perhaps the most famous start to a season belongs to the 1984 Tigers, whose 29-5 record established a 91.7% chance of winning the pennant. The 1905 Giants rode Christy Mathewson to a 91.6% title shot. The 1939 Yankees (91.5%) were rated as the best baseball team of all time in Rob Neyer and Eddie Epstein's Baseball Dynasties. The 1977 Dodgers got off to a 29-9 start and had a 90.9% chance of winning their division.
All nine of the 90%-plus teams went on to win the league they were playing in, by an average of 16 games. They really were locks.
Let's summarize the rest of the results with a table:
Percentage   Total   Champs   Expected   Binomial
90+              9        9        8.4       .48
80-90           28       25       23.9       .20
70-80           25       17       18.5       .32
60-70           47       28       30.4       .28
50-60           69       28       37.6       .014
40-50           85       39       37.9       .36
30-40          143       45       49.6       .24
20-30          198       51       48.6       .31
10-20          315       56       45.9       .049
5-10           326       28       23.2       .13
1-5            518       11       13.8       .27
0-1            680        2        1.8       .28

In the table, percentage is the range for title winners; the current Orioles, with their 61% chance of winning, would have been lumped into the 60-70 range. There were a "Total" of 47 teams who fell in this range; 28 of them were "Champs" in their division, while we would have "Expected" 30.4 of them to be champs if the rates given were perfectly accurate. "Binomial" gives an indication of how different the actual number of champs is from the expected number, given the probabilities and number of teams involved; values closer to zero are less likely to be just random scatter.
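For readers who want to try this kind of check themselves, here is a rough sketch of a one-sided binomial tail test. It is not the program used for the table: it assumes every team in a range shares the same title probability (Expected divided by Total), whereas the real calculation presumably used each team's own odds, so the numbers it prints will be in the same neighborhood as the Binomial column without matching it exactly. The function name `binomial_tail` is made up for this example.

```python
from math import comb

def binomial_tail(n, k, p):
    """Chance of a count at least as far from the mean as k, on k's
    side of the mean, for n independent trials with success rate p.
    ASSUMPTION: one shared p per range, a simplification."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    if k <= n * p:
        return sum(pmf[:k + 1])   # k or fewer champs
    return sum(pmf[k:])           # k or more champs

# (range, Total, Champs, Expected) for the two rows furthest from expectation
rows = [("50-60", 69, 28, 37.6), ("10-20", 315, 56, 45.9)]
for label, total, champs, expected in rows:
    p = expected / total          # average title chance within the range
    print(label, round(binomial_tail(total, champs, p), 3))
```

A small value means the gap between actual and expected champs would rarely arise by chance alone if the stated odds were right.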
If we were to look at this table from a 99th-percentile standpoint, then the entire distribution would be considered on target, since no Binomial value falls below .01; that is a comforting result. At the less restrictive 95th percentile, where values below .05 draw attention, we'd consider the possibility that teams in the 50-60 range don't win quite as often as they should, while teams in the 10-20 range win a little more often. Considering the limited nature of the input data, the simulator does a remarkably good job of evaluating a team's future, pegging the end-of-season record to within seven games, on average. Overall, it does indeed give you an excellent estimate of each team's title chances.
So, yes, when you see the system saying that the Orioles have a 60% chance of winning their division, you can trust that the odds are sound. However, past results are no guarantee of future success; if you try to exploit a difference between these odds and what you can get in Vegas, you're on your own.
Top 10 teams with the highest championship rates who didn't win:
Lowest chances for teams that eventually won:
Maybe the reason it took the Red Sox 86 years to win a championship was that it took that long for the city of Boston to balance the karmic scales after beating the odds in 1914, 1915, and 1916.
Teams that most over-performed their expected end of season wins from the simulation: