
September 23, 2004

# Playoff Odds Report, Redux

## Reworking the System

When I rewrote the Playoff Probabilities routine, it is fair to say that I didn't know what I was getting myself into. I thought I was running a fun little toy we had produced before, one which hardly anyone would notice or take very seriously.

It turns out that I was delving into a feature that was widely read, very popular and taken very seriously. My article on the system's methodology generated more response than anything else I've written in ages.

I received many suggested improvements for the routine, some of which have been incorporated into the current system, some of which should be in there by next season (I can't see any reason NOT to run it from Opening Day next year, for everyone who asked that it start earlier in the season), and some of which I considered, looked into and rejected. Some of the changes I've made are immaterial to the calculation itself--a better algorithm for figuring out who won and tied for division championships and wild cards, for instance, is good to have, but doesn't affect the estimates themselves. Among all the suggestions, though, no one noticed what turned out to be the biggest flaw in my original setup.

The most important piece of this Monte Carlo simulation is the winning percentage assigned to each team; everything follows from those. How I set those percentages has evolved in four distinct stages.

Originally, I was using the third-order winning percentage from the adjusted standings report as my estimator. The "W3" is essentially the Pythagorean won-lost percentage, modified by using the estimated runs scored and allowed instead of the actual totals, and further modified by accounting for the average ability of the team's opponents. Several people challenged me on the use of that as an estimator, arguing that the actual record is better; implicit in this is the idea that teams have a reason for exceeding their Pythagorean record, that it is not simply luck.
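The classic Pythagorean percentage underlying all of this can be sketched in a few lines (this uses the traditional exponent of 2; the actual W3 also substitutes estimated runs and adjusts for opponents, which is not shown here):

```python
def pythagorean_pct(runs_scored: float, runs_allowed: float,
                    exponent: float = 2.0) -> float:
    """Classic Pythagorean expected winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

# A team scoring 5 runs per game while allowing 4 projects as roughly a .610 team:
pct = pythagorean_pct(5.0, 4.0)
```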

Unfortunately, I can't easily make a full study of W3 as an estimator; the data is not readily accessible to me. I do have the data available for the regular Pythagorean record ("W1"), and I think--but don't know for certain--that this serves as a reasonable proxy, so consider yourself warned. I tested whether actual record or Pythagorean record is a better predictor of future record.

I used the Retrosheet game logs to get the records for every team that has played 150 games or more in a season, not quite 1900 teams going into 2004. At intervals of 10 games, I pulled out the team's record to that point in the season, along with how many runs they had scored and allowed. That allowed me to set up some simple regression tests between current actual record and current Pythagorean record as the predictors, and rest of season record (not final record!) as the predictand.

Here are the results.

| G   | Pyth-Act | R2 (Actual) | A (Actual) | B (Actual) | R2 (Pyth) | A (Pyth) | B (Pyth) |
|-----|----------|-------------|------------|------------|-----------|----------|----------|
| 10  | 1076-797 | .121 | .176 | .412 | .155 | .231 | .385 |
| 20  | 1060-823 | .195 | .293 | .353 | .227 | .355 | .322 |
| 30  | 1027-862 | .247 | .385 | .307 | .278 | .452 | .273 |
| 40  | 1047-845 | .298 | .470 | .265 | .333 | .540 | .230 |
| 50  | 1028-863 | .338 | .550 | .225 | .369 | .613 | .193 |
| 60  | 1028-866 | .373 | .607 | .196 | .402 | .673 | .163 |
| 70  | 1005-888 | .380 | .637 | .181 | .405 | .703 | .148 |
| 80  | 1007-886 | .374 | .673 | .164 | .403 | .743 | .128 |
| 90  | 1020-872 | .388 | .710 | .144 | .410 | .772 | .113 |
| 100 | 980-913  | .394 | .759 | .121 | .408 | .813 | .094 |
| 110 | 988-906  | .383 | .800 | .101 | .397 | .858 | .071 |
| 120 | 971-924  | .357 | .822 | .089 | .366 | .877 | .061 |
| 130 | 965-930  | .310 | .853 | .074 | .320 | .912 | .044 |
| 140 | 931-963  | .233 | .844 | .078 | .237 | .897 | .052 |

The second column indicates how many times the Pythagorean record did better than the actual record at predicting the future record. After 10 games played, the Pythagorean was the better estimator in 1076 cases, and the actual record was better 797 times; a clear win for Pythagoras (who would have loved baseball, by the way). The Pythagorean record turns out to almost always be a better predictor than the actual record, but its advantage steadily declines with every game played, until actual record becomes a better predictor after 140 games. (The differing totals reflect the cases where actual and Pythagorean records were identical, almost always a .500 team with R=RA.)

Before conceding the point, take a look at the regression equations for the actual and Pythagorean records. The Pythagorean record always has a better r-squared value (that's the R2 column) than the actual record, even at the 140-game mark. As with the straight binary test, the advantage declines with increasing games played. I think it would be reasonable to conclude that anyone interested in handicapping the playoffs should be using Pythagorean record rather than actual record, though perhaps actual record should be the choice for the last two weeks or so of the season.

There is another item to take home from the regression listings, and that is in the A and B components (in a regression, y=Ax+B). If record to date was the likely record for the remainder of the season, as I assumed in my initial report, then A would be equal to 1.0 and B would be equal to zero. I totally neglected regression to the mean, and nobody noticed (or at least, no one told me they'd noticed); the most likely rest-of-season record for a team playing .600 ball after 100 games is not .600, but something like .576. As you can see, as games go up, the A component gets closer to 1 and the B component closer to 0; but the A component for the Pythagorean is always higher than that of the actual. The Pythagorean record is a more conservative estimator than actual record; some of the regression to the mean needed for the actual record is built in to the Pythagorean.
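Using the actual-record coefficients from the 100-game row of the table (A = .759, B = .121), the .600 example works out like this (a sketch; the function name is mine):

```python
def regressed_estimate(current_pct: float, a: float, b: float) -> float:
    """Apply the regression equation y = A*x + B to get the most likely
    rest-of-season winning percentage from the record to date."""
    return a * current_pct + b

# A .600 team after 100 games, using the actual-record fit from the table:
est = regressed_estimate(0.600, a=0.759, b=0.121)   # ≈ .576, not .600
```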

I don't think the difference between the two makes it worthwhile to switch between actual and Pythagorean records during the season; Pythagorean record is clearly superior for most of the season, and is not clearly inferior at any point, so I am going to retain the Pythagorean values as the primary estimator. However, the big change here from the original model is that I'll use the regression equations to get a regressed-to-mean W3, not W3 itself, as the primary estimate.

A second major change was the realization, which dawned slowly and only after several people tried repeatedly to convince me of it, that even these regressed values were estimates, not a hard fact about the team's future performance. Of course I knew that, but my initial take was that the Monte Carlo simulation itself would supply sufficient variation around the estimate. After more correspondence with people who actually use Monte Carlo simulations as a regular part of their professional career, and more reading on my part, I no longer believe that. There is a real need to recognize, up front, that while I am calling Boston a .550 team, they may in fact be a .650 team, or a .450 team; they may even be a .999 team that has gotten incredibly unlucky, although the odds against that are staggering. The simulation will work better if instead of using the same estimate for Boston's winning percentage in every run, I let the estimate vary.

How much it should vary is answered, in part, by returning to those regression equations. The standard errors for the estimates were never zero; they varied between .075 and .130, depending on how many games had been played. As a very crude (but simple and easily programmed) solution, I added a random number between -.100 and +.100 (.100 being roughly the standard error) to the team's winning percentage on every iteration, and it made a big difference. Everything pushed a little farther away from the endpoints, zero and one, and a little closer to .5; the certainties were not nearly so certain anymore. But this was, like I said, a crude solution, and after a little more mathematical effort I've replaced it with a system that replicates a Gaussian (normal) distribution around the primary estimate, with a standard deviation of .100; getting the SD to vary with games played is next on the to-do list for this routine, but it isn't there quite yet.
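The swap from the crude uniform spread to the normal distribution can be sketched as follows (function names are mine; the production routine presumably differs in detail):

```python
import random

SD = 0.100  # roughly the standard error from the regression fits

def crude_draw(base_pct: float) -> float:
    """Original crude approach: uniform noise in [-.100, +.100]."""
    return base_pct + random.uniform(-SD, SD)

def gaussian_draw(base_pct: float) -> float:
    """Current approach: Gaussian (normal) spread around the base estimate."""
    return random.gauss(base_pct, SD)
```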

So let me summarize: suppose Boston's W3 after 120 games was .560. In the first version of the playoff odds, I would have used .560 as Boston's estimated winning percentage for the rest of the season. In the current version, I take the .560 and correct for regression to the mean, and get .549 as their new base estimate. Before every one of the million iterations of the season, I sample a normal distribution around .549 to get their estimated record for that iteration, capping them between .250 and .750. I have, essentially, done a Monte Carlo simulation to replicate the whole range of outcomes from the regression equations to get inputs for the Monte Carlo simulation of the rest of the season.
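Putting the Boston example together, one iteration's input looks something like this (a sketch, not the production code; the coefficients here are the 120-game actual-record fit from the table, which reproduces the .549 in the text):

```python
import random

def iteration_pct(w3: float, a: float, b: float, sd: float = 0.100,
                  lo: float = 0.250, hi: float = 0.750) -> float:
    """One iteration's winning-percentage input: regress W3 to the mean,
    then sample a normal distribution around it, capped to [lo, hi]."""
    base = a * w3 + b               # regressed-to-mean estimate (.560 -> ~.549)
    draw = random.gauss(base, sd)   # per-iteration variation
    return min(hi, max(lo, draw))   # cap between .250 and .750

# Boston after 120 games, W3 = .560; each iteration of the season gets a fresh draw:
samples = [iteration_pct(0.560, a=0.822, b=0.089) for _ in range(5)]
```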

My crude estimator was on the right track, but still underestimated the impact of spreading the initial values, since I've replaced a system whose entire range was +/- .100 with one whose standard deviation is .100, meaning that only about 68% of the points, not all of them, will fall within +/- .100. I'll admit that that sounds high, but that is what the data tells us. Let's go back to July 1, and look at how the different versions of the model would have assessed teams' chances of making the playoffs:

Playoff Probabilities, July 1, 2004

| Team      | Original | Regress | Crude | Gauss |
|-----------|----------|---------|-------|-------|
| Yankees   | 98.4 | 97.5 | 89.2 | 78.1 |
| Red Sox   | 50.1 | 47.4 | 45.4 | 43.6 |
| Twins     | 5.2  | 11.0 | 18.9 | 22.7 |
| White Sox | 91.6 | 84.0 | 68.3 | 58.6 |
| Indians   | 3.0  | 4.9  | 13.0 | 17.8 |
| Tigers    | 5.4  | 8.1  | 16.0 | 20.7 |
| A's       | 52.1 | 52.5 | 49.4 | 46.4 |
| Angels    | 18.1 | 21.4 | 28.3 | 30.1 |
| Rangers   | 75.1 | 71.2 | 62.1 | 56.3 |
| Braves    | 9.3  | 11.0 | 16.8 | 19.2 |
| Marlins   | 28.9 | 31.6 | 30.5 | 31.6 |
| Phillies  | 52.5 | 48.9 | 42.5 | 38.8 |
| Mets      | 12.3 | 13.1 | 18.5 | 22.2 |
| Cardinals | 91.7 | 87.4 | 72.4 | 61.3 |
| Cubs      | 57.5 | 50.6 | 44.2 | 39.7 |
| Astros    | 22.5 | 22.9 | 25.6 | 27.1 |
| Dodgers   | 17.7 | 20.1 | 25.7 | 29.6 |
| Giants    | 54.4 | 55.7 | 50.4 | 46.0 |