March 30, 2014
Beware of Bias in Predicted Team Win Totals
Sam Miller and I recently interviewed 30 Baseball Prospectus 2014 authors as part of the Effectively Wild season preview podcast series. At the end of each episode, we asked our guest to predict the 2014 win total for the team we’d just talked about. Listener Jeffrey A. Friedman sent us the following unsolicited submission about bias in these predicted win totals, which we decided to publish with his permission. Beware of bias in your own predictions! —Ben Lindbergh
I enjoyed listening to Effectively Wild’s preseason team previews, and I like that you asked guests to predict team wins numerically. This provides us with an opportunity to examine forecasting patterns, and in particular, to analyze how and why your guests were consistently optimistic about their clubs. On average, your guests predicted that their teams would win 84 games this year. Only four of your 30 guests predicted that their teams would underperform their PECOTA projections. You can view all 30 predicted win totals here.
I find this interesting, because while we expect fans to overrate their teams, it’s not clear why your well-informed guests would do this so consistently as well. They spent 20 minutes discussing their team’s strengths and weaknesses in detail and showed little evidence of bias in doing so. But when the time came to make their predictions, they were consistently overoptimistic. Regardless of how much you believe in PECOTA as a baseline, it’s just not possible for the average team to win 84 games.
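The arithmetic behind that last point can be sketched in a few lines. This is only an illustration, and it assumes every team plays a full 162-game schedule with no ties or unplayed games; the variable names are my own.

```python
# Sketch: why 30 teams cannot average 84 wins.
# Assumption: a full 162-game schedule, no ties, no unplayed games.
TEAMS = 30
GAMES_PER_TEAM = 162

# Each game involves two teams and produces exactly one win.
total_wins_available = TEAMS * GAMES_PER_TEAM // 2
league_average_wins = total_wins_available / TEAMS

predicted_average = 84  # average of the guests' predictions, per the text
surplus_wins = predicted_average * TEAMS - total_wins_available

print(total_wins_available)  # 2430
print(league_average_wins)   # 81.0
print(surplus_wins)          # 90
```

In other words, an 84-win average implies about 90 more wins than the schedule can supply, so the predictions cannot all come true at once.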
Why were your guests so optimistic? Though we only have 30 data points to work with, a few patterns appear.
For example, your guests were much more likely to overrate teams coming off winning seasons. If a team finished below .500 in 2013, your guests were only slightly overoptimistic about it for the coming season, predicting that it would be only 0.9 wins better than its PECOTA projection, on average. By contrast, your guests predicted that teams that finished above .500 in 2013 would beat PECOTA by an average of 4.8 wins this year.
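A split like this takes only a few lines to compute. The sketch below uses made-up rows, not the actual podcast predictions or PECOTA projections, and the function name and tuple layout are hypothetical. It also assumes that a team at exactly .500 is grouped with the sub-.500 teams.

```python
from statistics import mean

# HYPOTHETICAL rows for illustration only:
# (team, wins_2013, pecota_wins, guest_wins).
# These are NOT the real predictions or projections.
SAMPLE = [
    ("Team A", 96, 88, 93),
    ("Team B", 90, 85, 89),
    ("Team C", 74, 77, 78),
    ("Team D", 66, 72, 73),
]

def optimism_by_prior_record(rows, games=162):
    """Average (guest prediction - PECOTA) for teams above vs. not above .500."""
    groups = {"above_500": [], "below_500": []}
    for _team, wins_2013, pecota, guest in rows:
        # Assumption: a team at exactly .500 falls in the "below" group.
        key = "above_500" if wins_2013 > games / 2 else "below_500"
        groups[key].append(guest - pecota)
    # An empty group yields None rather than raising an error.
    return {key: mean(diffs) if diffs else None for key, diffs in groups.items()}

result = optimism_by_prior_record(SAMPLE)
print(result)
```

With the real 30 rows in place of `SAMPLE`, the two averages would correspond to the 4.8 and 0.9 figures quoted above.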
This could be because it's easier to envision better teams overachieving, and thus easier to engage in wishful thinking about those teams. Another explanation I find more plausible is that people find it difficult to predict regression from one season to the next. Things like injury risk, aging, and BABIP are the kinds of abstract, probabilistic factors that PECOTA handles much better than individuals do. I suspect that many of your guests were simply not giving these factors enough weight.
Consistent with this idea, only eight of your 30 guests predicted that their teams would win fewer games than they did in 2013. Moreover, of the four guests who thought their teams would underperform PECOTA this year, two (Adam Sobsey/Blue Jays and Mike Curto/Mariners) were specifically concerned with what they saw as above-average injury risks, and a third (Ken Funck/White Sox) thought that his team might hold a fire sale. So when your guests had firm reasons to expect that key players would be taken out of commission, they did work this into their forecasts. I suspect your guests (like many analysts and fans) were simply much less sensitive to the kinds of randomly occurring, low-probability injuries and regressions that are widespread and important, yet much harder to envision because they are unpredictable.
Another interesting pattern that emerged was what psychologists might call a “priming effect,” which I noticed whenever you or your guests mentioned a team’s PECOTA projection around the time you asked them to make a forecast. When the PECOTA projection was mentioned, guests predicted that their teams would outperform it by an average of 1.4 wins, but when PECOTA wasn’t mentioned, guests predicted their teams would beat it by an average of 3.9 wins. Since these instances of priming weren’t randomly assigned, we can’t draw firm conclusions here. But it is interesting to note that priming correlates with much more reasonable projections.
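One simple way to check whether a gap between two groups of forecasts could plausibly be chance is a permutation test. To be clear, this is a swapped-in technique for illustration, not the test used in this analysis, and the two lists below are invented numbers rather than the guests' actual forecasts. The function name and parameters are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)  # seeded so the result is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        # Reassign group labels at random and recompute the gap.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Share of random relabelings with a gap at least as large as observed.
    return hits / n_shuffles

# HYPOTHETICAL (guest prediction - PECOTA) values, for illustration only.
primed = [1, 2, 0, 3, 1]
unprimed = [4, 5, 3, 6, 2, 4]
print(permutation_p_value(primed, unprimed))
```

A small result means that random relabeling rarely produces a gap as large as the observed one. Like any such test, it says nothing about why the groups differ, which matters here because the priming was not randomly assigned.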
Two final notes. First, I checked whether your guests were more likely to overrate teams that PECOTA projected to be better on offense versus defense. Interestingly, the coefficients on these two variables were nearly identical. (I had expected that your guests would find it easier to envision a club overachieving if it had a great offense.) I think this lends support to the idea that your guests are fairly objective and unbiased on the whole, but that they find it difficult to foresee random regression across the board.
Last, I suspect that there is another bias going on here that I did not know how to test. A lot of your discussions about each team revolved around new players coming up to the big leagues, young players improving, new coaches making an impact, or any number of “little things” that make teams better from one year to the next. The problem from a forecasting standpoint is that almost all teams benefit from marginal improvements like these. In order to factor these things into your forecast, you’d really need to say that your team is making more marginal improvements than the average team. But since your guests know most about the teams that they’re discussing, they are not necessarily in a great position to say that. Thus, they may have a tendency to give their teams too much of a boost based on the “little things” that they’re not in a position to observe elsewhere.
The optimism is actually slightly greater than the 84-win average suggests. Adam got off the hook without making a numerical prediction about Toronto, but he said the Blue Jays would come in last, so I gave them an estimated win total of 75, which is what PECOTA says it will take to come in last in the AL East. If you take this synthetic prediction out, your guests predict that their teams will outperform PECOTA's projections by an average of 3.2 wins.
I used the PECOTA projections released on February 4, just before the preseason podcasts started.
An additional factor supporting this argument is that your guests’ overoptimism was essentially uncorrelated (correlation = -0.01) with a team’s projected wins for this year. So there is a difference between saying that your guests overrate good teams and saying that they overrate teams that were good last year. The latter is what we find in these data, which is consistent with the idea that your guests are not expecting enough regression.
A two-tailed t-test finds this pattern statistically significant at roughly the 90 percent confidence level.