Let’s check on one of many people’s favorite products, the Postseason Odds report. This report starts with today’s standings and then uses random numbers to play out the rest of the season a million times. The only data it uses about a team is based on how well the team has played this year: how many runs they have scored and allowed, how many runs they should have scored and allowed based on their other statistics, and whom they’ve played. It knows nothing about the individual players on those teams, such as whether someone is playing over his head or struggling early, and consequently makes no player-based assumptions about how the team should perform for the rest of the season. It does assume that all teams will regress toward the mean, that is, toward .500, as the season goes on, and the size of that adjustment declines over time. Right now, a team playing .600 ball is counted as only about a .540 team for purposes of the simulation; if they’re still a .600 team after 100 games, they will be counted as a .580 team.
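The report’s exact regression formula isn’t given here, so what follows is only a hedged illustration. Assume a simple linear shrinkage in which the simulation uses target + w × (observed - target), with a weight w between 0 and 1 that grows as the season goes on. Under that assumption, the two examples just quoted imply w = 0.4 right now and w = 0.8 after 100 games. The function names `regress_to_target` and `implied_weight` are hypothetical.

```python
def regress_to_target(observed_wp: float, weight: float, target: float = 0.500) -> float:
    """Shrink an observed winning percentage toward a target (hypothetical linear form).

    weight = 1.0 trusts the observed record fully; weight = 0.0 ignores it entirely.
    """
    return target + weight * (observed_wp - target)


def implied_weight(observed_wp: float, sim_wp: float, target: float = 0.500) -> float:
    """Back-solve the weight that turns observed_wp into sim_wp under the same linear form."""
    return (sim_wp - target) / (observed_wp - target)


# The two examples from the text: a .600 team counted as .540 now, and as .580 after 100 games.
early = implied_weight(0.600, 0.540)
late = implied_weight(0.600, 0.580)
print(f"implied weight now: {early:.2f}, after 100 games: {late:.2f}")
print(f".600 team simulated as {regress_to_target(0.600, early):.3f}")
```

The linear form is chosen for readability; the real report may shrink differently, but any scheme with a weight that rises toward 1 over the season behaves the same way qualitatively.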

As a result, last year’s postseason odds produced some strange results, like jumping on the Orioles’ hot start to give them a 75% chance of making the playoffs, odds that many readers disputed at the time. I’ve tried to temper that by providing a second report this year, which you can reach from the regular report. This second one works exactly like the main report, except that it uses the projected standings Nate Silver generated with PECOTA, rather than .500, as its baseline for the regression to the mean. Last year, PECOTA projected the Orioles to be a .484 team, with the Yankees (.588) and Red Sox (.610) substantially better.

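Forty games into last season, on May 20, the regular report looked like this (W3Pct is the third-order winning percentage, Sim WP is the winning percentage used in the simulation, and Chance is the percentage chance of making the playoffs):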

           Record    Actual WP   W3Pct    Sim WP   Chance
Orioles     26-14      .650       .635     .570     74.3
Red Sox     23-17      .575       .540     .528     31.3
Yankees     21-20      .512       .551     .534     22.0

If I add the PECOTA adjustments to the mix, the simulation winning percentages change. Since the Orioles were projected below .500, their simulation winning percentage is knocked down a little further, because the adjustment drags them toward .484 instead of .500. Since the Red Sox and Yankees were projected well over .500, their simulation percentages rise substantially, because their targets for the adjustment are so much higher than .500. Running the numbers, the Orioles go from .570 to .563, the Red Sox from .528 to .577, and the Yankees from .534 to .572. Those changes have dramatic effects on the odds:

           Old Odds    New Odds
Orioles     74.3         58.6
Red Sox     31.3         50.2
Yankees     22.0         31.0
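The shift in simulation winning percentages can be roughly reproduced with arithmetic. Assume (an inference from the numbers, not something the report documents) that both versions shrink a team’s performance toward their target with the same linear weight w, so sim = target + w × (performance - target). Switching the target from .500 to the PECOTA projection then moves each team by (1 - w) × (PECOTA - .500). Back-solving from the Orioles gives w ≈ 0.56, and that one weight lands within about .001 of all three PECOTA-adjusted figures. The helper `retarget` is hypothetical.

```python
# Figures from the text (May 20 of last season, about 40 games in).
teams = {
    # name: (regular sim WP, PECOTA projected WP, reported PECOTA-adjusted sim WP)
    "Orioles": (0.570, 0.484, 0.563),
    "Red Sox": (0.528, 0.610, 0.577),
    "Yankees": (0.534, 0.588, 0.572),
}

LEAGUE_AVG = 0.500


def retarget(sim_wp: float, new_target: float, weight: float, old_target: float = LEAGUE_AVG) -> float:
    """Move a linearly shrunk WP from one regression target to another.

    Assumes sim = target + weight * (perf - target). The performance term cancels,
    so only the change in target matters: shift = (1 - weight) * (new - old target).
    """
    return sim_wp + (1.0 - weight) * (new_target - old_target)


# Back-solve the weight from the Orioles (assumption: one weight fits every team).
old, proj, new = teams["Orioles"]
weight = 1.0 - (new - old) / (proj - LEAGUE_AVG)
print(f"implied weight: {weight:.4f}")

for name, (old, proj, reported) in teams.items():
    estimate = retarget(old, proj, weight)
    print(f"{name:8s} regular {old:.3f} -> estimated {estimate:.3f} (reported {reported:.3f})")
```

The small leftover gaps are consistent with the published figures being rounded to three decimals.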

That’s a much more even race. It still favors the Orioles, but look at the details. The new simulation winning percentages put the Orioles at near parity with the Yankees and Red Sox (in fact slightly below both), yet they start with leads of three games over the Sox and 5.5 over the Yanks. The system tends to be very bullish on the team with the lead, especially when that lead reaches five games or more, or when a trailing team has more than one team ahead of it (the Yankees’ problem in the case above: their playoff odds are diminished by the need to pass more than one team to make the playoffs).
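To see why a lead matters so much even between evenly matched teams, here is a deliberately simplified Monte Carlo stand-in, not the report’s actual simulation. Assumptions, all hypothetical: each remaining game is an independent coin flip at the team’s simulation winning percentage, head-to-head scheduling and the wild card are ignored, ties for first are split, and 10,000 runs replace the report’s million for speed. Because it counts division titles only, its percentages sum to 100 and will not match the playoff chances in the tables.

```python
import random


def division_title_odds(teams, n_sims=10_000, season_games=162, seed=1):
    """Estimate each team's percentage chance of finishing first (simplified model).

    teams maps name -> (wins, losses, sim_wp). Every remaining game is an
    independent Bernoulli trial at sim_wp; ties for first are split evenly.
    """
    rng = random.Random(seed)
    titles = {name: 0.0 for name in teams}
    for _ in range(n_sims):
        final = {}
        for name, (wins, losses, wp) in teams.items():
            remaining = season_games - wins - losses
            final[name] = wins + sum(rng.random() < wp for _ in range(remaining))
        best = max(final.values())
        leaders = [n for n, w in final.items() if w == best]
        for n in leaders:
            titles[n] += 1.0 / len(leaders)
    return {name: 100.0 * t / n_sims for name, t in titles.items()}


# May 20 standings with the PECOTA-adjusted simulation WPs quoted in the text.
standings = {
    "Orioles": (26, 14, 0.563),
    "Red Sox": (23, 17, 0.577),
    "Yankees": (21, 20, 0.572),
}
for name, pct in division_title_odds(standings).items():
    print(f"{name:8s} {pct:5.1f}% division title")
```

Rerunning it with all three teams set to the same winning percentage is a quick way to isolate the effect of the standings: the team with the lead comes out ahead, and the team with two clubs to pass comes out last.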

I’ve been asked whether you can read any meaning into the difference between the playoff chances on the two reports, and I think the answer is “no.” On the one hand, a team whose PECOTA projection is much better or worse than its current performance will see an appropriate change: its odds will improve if its PECOTA projection is better than reality and decline if it is worse. However, take a case like the Royals. They are doing about as badly (.358 W3Pct) as their PECOTA projection (.377), yet their odds are dramatically worse in the PECOTA-adjusted report (0.01 percent, versus 0.60 percent in the regular one). The regular version mitigates their incompetence with the regression to the mean, while the PECOTA-adjusted version simply confirms it. All you can really say about the difference is that PECOTA thought they were better or worse than .500.
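The last point can be made precise under a hedged assumption (not documented for the report): that both versions shrink a team’s measured performance p linearly toward their target with the same weight w, where 0 ≤ w ≤ 1 and t is the PECOTA projection. The symbols are illustrative.

```latex
\begin{aligned}
\text{regular:}\quad & \hat{p}_{\text{reg}} = 0.500 + w\,(p - 0.500) \\
\text{PECOTA-adjusted:}\quad & \hat{p}_{\text{PEC}} = t + w\,(p - t) \\
\text{difference:}\quad & \hat{p}_{\text{PEC}} - \hat{p}_{\text{reg}} = (1 - w)\,(t - 0.500)
\end{aligned}
```

The team’s actual performance p cancels out, so the gap between the two simulation winning percentages depends only on how far PECOTA’s projection sits from .500. For the Royals (t = .377), the PECOTA-adjusted figure is lower by 0.123(1 - w) no matter how they have played so far.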

As to the question, “Is that really the right way to do it?” the answer is, “I don’t know.” I don’t have preseason PECOTA projections for every team in history, although I am working on something that should be roughly similar (in all likelihood less accurate, but much faster and easier to apply across broad stretches of history), and perhaps then I can put this on a firmer factual basis. Just eyeballing it, the projection may still be tied too tightly to what the team has done to date, at least in the specific case of last year’s Orioles. It does still inject an objective assessment of team strength into the model, and it allows the model to react to changes in team personnel. (Clemens signs with Houston? If you figure he’s worth seven wins over his replacement, that changes Houston’s estimate from .500, their figure in this year’s PECOTA estimates, to .543; start using that number for them instead.) These kinds of changes haven’t been made yet, but expect to see them implemented during the season.
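The Houston figure is just the added wins spread over a 162-game schedule: 7 / 162 ≈ .043, added to .500. A minimal sketch of that arithmetic follows; the helper name `adjusted_target` is hypothetical, and like the example in the text it credits the full seven wins to the season-long projection.

```python
GAMES_PER_SEASON = 162


def adjusted_target(projected_wp: float, added_wins: float, games: int = GAMES_PER_SEASON) -> float:
    """Shift a projected winning percentage by wins added over a full season."""
    return projected_wp + added_wins / games


# Houston's PECOTA estimate of .500, plus seven wins over replacement.
houston = adjusted_target(0.500, 7)
print(f"{houston:.3f}")  # 0.543
```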
