Let this be a lesson to everyone to use good coding practices, even if the code is just for yourself, and no one else is going to see it. A year later you won’t be able to figure out what you did, and there are so many different ways you can screw it up. That, in short, is what’s happened with the postseason odds.
The first mistake was that the program did not read in simple, raw data, but instead a team rating that first had to be calculated by hand. How could anything go wrong? The second problem was that nothing in the input file was labeled, so there was lots of room for more error. Two big problems resulted. The first was that I used DERA, instead of NRA, for the ratings of the various pitchers. DERA has had the team defense stripped away from it, a fact that didn’t matter to the Phillies (+2 fielding rating, basically zero), but made huge differences for the Dodgers (best in the playoffs at +56) and Red Sox (worst at -40). That is a third-of-a-run per game difference between the Dodgers and Phillies that was totally analyzed out of existence. The second was a very poor choice for rating team performance against left- and right-handed pitchers.
I wound up stripping the program almost back to the studs. The input file is now nicely labeled and calls for specific, simple information, like “team EQA”. All of the calculation is now done inside the program, so there is no chance of misremembering what was supposed to go in and how it was supposed to be combined. The old program started from the team’s expected winning percentage behind each of their different starters, and matched different starters against each other to get the likelihood of which team would win. The new version goes back to runs, something like this:
Game 3, LA vs Philadelphia, expecting Kuroda (for the Dodgers) and Lee to pitch. The Phillies had a team EQA of .276; in a 4.5 rpg environment that works out to 5.22 runs (.276 divided by .260, raised to the 2.5 power, times 4.50 = 5.22). Home game, so raise by 4% to get 5.43. They’re going against a RHP, and they had a .779 OPS against RHP, and .781 overall. Run scoring changes with the ratio of OPS, squared, but we can only count on the starter to be in the game for about six innings (and frequently less). So we’ll have six innings with a run rate of 5.43 * (779/781)^2, and three innings where we’ll use the 5.43 rate, so now we have them at 5.41. Their opponent, Kuroda, carries a 4.82 NRA but, once again, he’s only in the game for six innings. The other three go to the Dodger bullpen, which we’ve rated - by taking the average NRA of the five relievers most likely to be used - at 2.88. The total Dodger team rating with Kuroda becomes 4.17 (six innings at 4.82 and three at 2.88, averaged over nine). So we take the Philly run total of 5.41, multiply by 4.17/4.50, to get an estimate of 5.01 runs.
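For readers who want to follow the arithmetic, here is a minimal Python sketch of the Phillies calculation above. It is not the actual program; the function name, parameter names, and defaults are hypothetical, and only the numbers come from the worked example. Because it carries full precision instead of rounding at each step, it lands at about 5.02 rather than 5.01.

```python
def expected_runs(team_eqa, ops_vs_hand, ops_overall,
                  opp_starter_nra, opp_bullpen_nra, home=False,
                  league_rpg=4.50, league_eqa=0.260,
                  starter_innings=6, game_innings=9, home_bonus=0.04):
    """Estimate runs scored in one game (hypothetical helper, not the real code)."""
    # Offense from EQA: (EQA / league EQA) ^ 2.5, scaled to the run environment.
    runs = (team_eqa / league_eqa) ** 2.5 * league_rpg
    # Home teams get a 4% bump.
    if home:
        runs *= 1 + home_bonus
    # Share of the game the opposing starter is expected to pitch.
    starter_share = starter_innings / game_innings
    # Platoon adjustment: OPS ratio squared, applied only to the starter's innings.
    platoon = (ops_vs_hand / ops_overall) ** 2
    runs *= starter_share * platoon + (1 - starter_share)
    # Opposing pitching: innings-weighted blend of starter and bullpen NRA.
    opp_rating = (starter_share * opp_starter_nra
                  + (1 - starter_share) * opp_bullpen_nra)
    return runs * opp_rating / league_rpg


# Phillies at home against Kuroda (RHP) and the Dodger bullpen.
phillies_runs = expected_runs(0.276, 0.779, 0.781, 4.82, 2.88, home=True)
print(round(phillies_runs, 2))
```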
If we do the same math for the Dodgers, we end up with an estimate of 3.89 runs. The win probability for the Phillies is just the Pythagorean percentage from 5.01 runs scored and 3.89 allowed - or .624.
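As a rough check on that last step, here is a small Python sketch of the Pythagorean percentage. The function name is hypothetical, and the exponent of 2 is an assumption: the text does not state it, but it is the value that reproduces .624 from these two run totals.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean winning percentage; exponent=2 is assumed, not confirmed."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# Phillies: 5.01 runs scored, 3.89 allowed.
print(round(pythagorean_win_pct(5.01, 3.89), 3))  # 0.624
```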
Repeat that for every game played, figure in the changes to the teams’ rotations since the division series ended, and the new odds set looks like this:
So the best team in the AL is playing the 3rd-best team in the AL, and is almost a 3/1 favorite?? Boy, that sounds high to me. Have you guys backtracked this to see if it actually does explain past results accurately?
While I'll admit to a personal expectation that the Angels have better than a puncher's chance, I do wonder if this isn't because Sabathia's being plugged in as a three-game starter. Add in the decision to slot Saunders second...
Help me. I still don't understand the general thinking that having Sabathia start 3 times is some kind of huge advantage. It's not like C.C. "owns" the Angels or has any kind of history of success against them. On the contrary, several Halos (most notably Kendrick and Izturis) have done extraordinarily well facing him. Throw in small sample size solid performances from Abreu, Napoli, Rivera, Figgins (who's actually taken him deep lol), and Hunter, and I don't get what all the fuss is about. Baseball Prospectus is supposed to analyze the numbers, yet for some reason this part of the analysis is left out.
Really, 8.3% before a game's been played? The Yankees are better than the Angels, sure, but THAT much better? Are the odds off for similar reasons that their third-order win percentage is so much lower than their actual win percentage?
8.3% is the odds of the Angels winning the World Series. If you look closely, you'll notice two columns. The first is the ALCS odds, the second is the WS odds.
I throw this out here in the comments as a long-time subscriber: What are BP's plans to invest more in the site's design and data presentation?
I love the chats and roundtables and comments, but...
Thinking about BR's tremendous feature where you can create your own splits, or FanGraphs' live win probability charts, I wish BP's site were more user-friendly.
That said, I love the writing and research, so keep up the good work.
In order for the Yankees to have that high of a probability of beating the Angels, we'd have to assume that they are being treated as having a winning percentage something like 0.234 better.
So if the Angels are being treated as a .500/81 win team, the Yankees would be a .734/119 win team. They're good, but they ain't that good (and the Angels aren't that bad).
Granted, home field advantage and platoon advantages/pitching matchups could contribute to a greater disparity than the raw numbers would indicate, but I have a tough time seeing them causing THAT much of a disparity.
FWIW, I do my own odds and I get the Yankees at around 58-60%, dependent on whether Sabathia makes two or three starts.
Is the location/home team advantage of each game factored in as well?
And yeah, commenting/labelling code is a happy thing. I started programming again 3 years ago after a 10-15 year break and I'm kicking myself as I'm now revisiting and revising code I had written. I basically have to spend an hour staring at the screen, almost meditating, to try to figure out what it was I might've been doing.
Even with the fixes, the Angels had only a 36.7% chance of beating the Red Sox. That's better than the 30% originally slotted to them prior to the fixes, but it's still hard to believe.
We keep hearing that the post season is a 'crapshoot' but these odds seem to discourage that belief. The odds show the Angels as a severe underdog. Is the NL East really that much better? Wow.
Even with the revisions, the odds are ridiculous. You gave the Twins, an 87 win team in a weak division, with two players who helped them win even that many on the DL (Morneau and Slowey), about the same odds of upsetting the Yankees (best team in baseball) as the Angels (3rd best) had of "upsetting" the Red Sox (2nd best).
And then somehow, the Twins have almost the same odds of winning the WS as the Angels and Rockies combined?
Even now, the Angels have only an 8% chance of winning the WS, despite a 26% chance of getting there (which is itself absurdly low). So the Halos have less than a 1/3 chance of winning the WS, despite having homefield advantage against a much weaker league?
Great job Clay, the masses will NEVER think to accuse you of rigging the machine to further put forth the notion that BP hates all things Phillies.