Image credit: USA Today Sports

“It’s tough to make predictions, especially about the future.” —Yogi Berra

Everybody makes projections at the beginning of the season. We do, FanGraphs does, ESPN does, Sports Illustrated does, and so do many others. Maybe you, too. It's kind of a silly exercise, in that it has no bearing on what happens in the coming season. A projection of which teams will win the six divisions doesn't convey information the way an analysis of a pitcher's repertoire, a hitter's platoon differential, or a manager's bullpen strategy does. But they're popular, so we do them.

They have a short shelf life, though. Generally speaking, people don’t look at projections much once they’re made. For player projections, that’s a mistake. Projections from a good model, like our PECOTA, are a better predictor of both hitters’ and pitchers’ rest-of-year performance than their actual statistics until pretty late in the season. (Click on the links, or re-read the sentence, if you’re not familiar with this concept; it’s counterintuitive as all get-out.)

However, projections of team win-loss records … well, who really cares about them once they’re made? We know that the Twins and Diamondbacks did surprisingly well, and the Mets and Giants did surprisingly poorly. We don’t need preseason predictions to quantify that.

But rather than ignore them, let’s look back on preseason projections. Who did well? Who didn’t? Not that it means much of anything, unless you’re gambling on baseball, but just the same, it’s something.

Fortunately, the heaviest lifting has already been done for us. For each of the last three seasons, Darius Austin of Banished to the Pen—a baseball blog started by fans of the Effectively Wild podcast—has evaluated preseason projections of teams’ records. He uses seven sources:

  • Baseball Prospectus’ PECOTA
  • FanGraphs’ projections based on a mix of the Steamer and ZiPS systems
  • Clay Davenport’s projection system
  • Banished to the Pen writers
  • Guests on Effectively Wild
  • A composite of the above five
  • Adjustments to PECOTA based on a poll of Effectively Wild listeners

Now, a big caveat: BP, FanGraphs, and Davenport all hew to the discipline of having wins and losses match up. That is, our projections for the 30 teams will sum to 2,430 wins and 2,430 losses (subject to rounding). The other systems don’t, and are often subject to doses of optimism. Austin addressed this issue when he listed each of the projected win totals last March.
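The 2,430 figure is simple arithmetic: each of the 30 teams plays 162 games, and every game produces exactly one win and one loss, so league-wide wins equal half the total team-games. Here is a minimal Python sketch of that consistency check applied to a slate of projected win totals. The function name `wins_balance` and the one-win rounding tolerance are assumptions for illustration, not anything BP, FanGraphs, or Davenport publishes.

```python
TEAMS = 30
GAMES_PER_TEAM = 162

# Each game involves two teams and yields one win and one loss,
# so total league wins = (teams * games per team) / 2.
LEAGUE_WINS = TEAMS * GAMES_PER_TEAM // 2  # 2,430


def wins_balance(projected_wins, tolerance=1.0):
    """Hypothetical check: do projected win totals sum to the league total,
    within an assumed rounding tolerance? Returns (is_balanced, surplus)."""
    if len(projected_wins) != TEAMS:
        raise ValueError(f"expected {TEAMS} teams, got {len(projected_wins)}")
    surplus = sum(projected_wins) - LEAGUE_WINS
    return abs(surplus) <= tolerance, surplus


# Hypothetical slates, not real projections:
print(wins_balance([81] * 30))  # every team at .500: (True, 0)
print(wins_balance([82] * 30))  # one win too optimistic per team: (False, 30)
```

A slate that is a single win too optimistic for every team ends up 30 wins over the league total, which is why unconstrained fan and writer projections can drift so far from 2,430.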

To analyze the systems, Austin calculated both mean absolute error (MAE), the average of the absolute differences between projected and actual wins, and root mean squared error (RMSE), the square root of the average of the squared differences. The key difference between the two measures is that RMSE, by squaring each difference, penalizes a system more heavily for large errors (thanks, Giants) than MAE does.
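To make the distinction concrete, here is a minimal Python sketch of both measures. The four-team league and its win totals are made up for illustration; they are not any system's actual 2017 projections. Both hypothetical systems miss by four wins in total, so their MAE is identical, but the one that concentrates its whole miss on a single team gets the higher RMSE.

```python
import math


def mae(projected, actual):
    """Mean absolute error: the average of |projected - actual| across teams."""
    if len(projected) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(actual)


def rmse(projected, actual):
    """Root mean squared error: square each miss, average them, take the root."""
    if len(projected) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(projected, actual)) / len(actual))


# Hypothetical four-team league; not real 2017 data.
actual = [81, 81, 81, 81]
steady = [82, 80, 82, 80]        # misses every team by 1 win
one_big_miss = [81, 81, 81, 85]  # perfect on three teams, off by 4 on one

print(mae(steady, actual), rmse(steady, actual))              # 1.0 1.0
print(mae(one_big_miss, actual), rmse(one_big_miss, actual))  # 1.0 2.0
```

Squaring turns the single four-win miss into 16, which dominates the average; that is the sense in which RMSE punishes a big whiff like the Giants more than MAE does.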

Austin does a nice job of explaining all of this and illustrating the results, in sortable tables no less, in his post. If you don't feel like heading over there, though, here's the conclusion: The top three prediction systems by MAE were FanGraphs, PECOTA, and the composite, in that order. The top three by RMSE were PECOTA, FanGraphs, and Davenport, in that order.

Now, we're not going to take too much of a victory lap, because PECOTA didn't have a good year in 2016. But we made some adjustments, and for 2017, at least, they seem to have served us well. We led the pack in RMSE, which is generally considered the better measure of error.

We’ll take that, particularly given the grief we took when PECOTA came out, primarily for:

  • Projecting 80 wins for a Twins team that lost 103 games in 2016
  • Predicting that the Dodgers would have eight more wins than the defending champion Cubs (the Cubs!)
  • Illustrating our anti-Missouri bias by projecting 76 wins for the Cardinals and 71 wins for the Royals

Of course, the Twins won 85, the Dodgers won 12 more than the Cubs, and while the Cardinals and Royals won more than we projected (83 and 80 wins, respectively), they were each third in their divisions.

Everyone on staff at BP was invited to make preseason predictions as well, and 62 writers, statisticians, and editors answered the bell. We didn't have to project win totals, only each division's order of finish. That kept things pretty straightforward.

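I scored our picks by absolute error: for each team, the number of places between where we predicted it would finish and where it actually finished, summed across all 30 teams. For example, here's how I did. I'll list teams in the order I had them before the season: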

  • AL West: Houston (off by 0), Seattle (1), Texas (1), Los Angeles (2), Oakland (0)
  • AL Central: Cleveland (0), Detroit (3), Kansas City (0), Minnesota (2), Chicago (1)
  • AL East: Boston (0), Toronto (2), New York (1), Tampa Bay (1), Baltimore (0)
  • NL West: Los Angeles (0), San Francisco (3), Arizona (1), Colorado (1), San Diego (1)
  • NL Central: Chicago (0), St. Louis (1), Pittsburgh (1), Milwaukee (2), Cincinnati (0)
  • NL East: Washington (0), New York (2), Philadelphia (2), Miami (2), Atlanta (2)

That’s a total of 32, which was pretty middling.
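The scoring is easy to reproduce. Below is a minimal Python sketch, assuming each ballot is stored as a best-to-worst list per division; the names `PREDICTED`, `ACTUAL`, `division_error`, and `ballot_error` are illustrative, not part of any BP tool. `ACTUAL` holds the 2017 final standings; Seattle and Texas both went 78-84 and are ordered the way the per-team errors above imply.

```python
# Ballot from the list above, best to worst within each division.
PREDICTED = {
    "AL West": ["Houston", "Seattle", "Texas", "Los Angeles", "Oakland"],
    "AL Central": ["Cleveland", "Detroit", "Kansas City", "Minnesota", "Chicago"],
    "AL East": ["Boston", "Toronto", "New York", "Tampa Bay", "Baltimore"],
    "NL West": ["Los Angeles", "San Francisco", "Arizona", "Colorado", "San Diego"],
    "NL Central": ["Chicago", "St. Louis", "Pittsburgh", "Milwaukee", "Cincinnati"],
    "NL East": ["Washington", "New York", "Philadelphia", "Miami", "Atlanta"],
}

# 2017 final order of finish in each division.
ACTUAL = {
    "AL West": ["Houston", "Los Angeles", "Seattle", "Texas", "Oakland"],
    "AL Central": ["Cleveland", "Minnesota", "Kansas City", "Chicago", "Detroit"],
    "AL East": ["Boston", "New York", "Tampa Bay", "Toronto", "Baltimore"],
    "NL West": ["Los Angeles", "Arizona", "Colorado", "San Diego", "San Francisco"],
    "NL Central": ["Chicago", "Milwaukee", "St. Louis", "Pittsburgh", "Cincinnati"],
    "NL East": ["Washington", "Miami", "Atlanta", "New York", "Philadelphia"],
}


def division_error(predicted, actual):
    """Sum over teams of |predicted place - actual place| within one division."""
    actual_place = {team: i for i, team in enumerate(actual)}
    return sum(abs(i - actual_place[team]) for i, team in enumerate(predicted))


def ballot_error(predicted, actual):
    """Total absolute error across every division on a ballot."""
    return sum(division_error(predicted[d], actual[d]) for d in predicted)


for division in PREDICTED:
    print(division, division_error(PREDICTED[division], ACTUAL[division]))
print("Total", ballot_error(PREDICTED, ACTUAL))  # Total 32
```

Dividing the total by 30 would turn it into a true per-team mean (32 / 30, or roughly 1.07 places). Because every staffer picked all 30 teams, ranking by the total and ranking by the mean produce the same order.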

Here’s how everyone on staff did, ranked by smallest error to largest error:

Total error | Staff
26 | Mark Barry, Matthew Trueblood
28 | Demetrius Bell, Rob McQuown, Meg Rowley, Scott Simon
30 | Tim Finnegan, Larry Brozdowski, Aaron Gleeman, Kendall Guillemette, Scooter Hotz, Andrew Mearns, Erich Rothmann, Darin Watson, Rian Watt
32 | Martin Alonso, David Brown, Mike Gianella, Stacey Gotsulias, David Lesky, Rob Mains, Eric Garcia McKinley, Kate Morrison, Bret Sayre, Ryan Schultz, Greg Wellemeyer, Collin Whitchurch, Austin Yamada, Kazuto Yamazaki
34 | Emma Baccellieri, Brett Cowett, Zach Crizer, Patrick Dubuque, Jeff Euston, Derek Florko, Nathan Graham, Nate Greabe, Bryan Grosnick, Aidan Kearns, Jeff Long, Tommy Meyers, Eric Roseberry, Seth Victor
36 | Mark Anderson, Alex Chamberlain, Tim Collins, Brian Duricy, Michael Engel, Ken Funck, Joshua Howsam, Stephen Reichert, Hunter Samuels, Clinton Scoles, Jared Terry, Jared Wyllys
38 | Colin Anderle, Craig Goldstein, Matt Sussman, Ashley Varela
40 | Craig Brown, Jake Devereaux
44 | Nicholas Zettel

Mark Barry got all six division champions right and correctly identified the Rockies and Yankees as Wild Cards, but picked Cleveland as the World Series winner. Matthew Trueblood also got all six division champions right and had the Diamondbacks as a Wild Card, but predicted a Cubs championship.

Other tidbits:

  • Bell, Mains, Sayre, Yamazaki, and Zettel had Houston as world champions.
  • Nobody picked Aaron Judge or Cody Bellinger to win Rookie of the Year, but three staffers had Bellinger in their top three, and 17 selected Judge in their top three.
  • Cy Young Award winners Corey Kluber and Max Scherzer were in the top three for 34 and 31 BP staffers, respectively. Only Sussman and Trueblood selected Scherzer as the National League winner. Brown, Gleeman, Hotz, Meyers, and Schultz tabbed Kluber as the American League winner.
  • While 12 staffers had Jose Altuve in the top three for AL MVP, only Rothmann, Simon, and Zettel identified him as the winner. Only Engel and Meyers had Giancarlo Stanton on their MVP ballots, and only in third place.
mattrnm
11/20
Looks like that Yogi Berra quote was not an actual Yogi Berra quote. https://quoteinvestigator.com/2013/10/20/no-predict/
Rob Mains
11/20
Yeah, looking up the provenance of that was pretty amusing. There were four or five variations, all attributed to him, as well as the suggestion that he never said it. I suspect that the latter is true of much of what he allegedly said.