A look at how to avoid allowing biases to influence your projections.
As soon as the baseball season comes to its inevitable and saddening end, baseball, as it does each year, will enter the offseason. For the fantasy baseball community, this means we will be entering ranking and projection season. After following “our players” and players of interest all season, we are now asked to take an all-encompassing look at the league’s baseball players. The result of doing projections periodically, as opposed to continuously, is that we are likely to invite certain biases into our processes, which can negatively impact our results. We will take a look at why we do periodic projections, the biases that come with such a process, how these biases manifest themselves, and some ways to hopefully de-bias our process.
The devil’s advocate in me asks, “If periodic projections cause certain problems, why not do continuous projections?” The short answer is that doing continuous projections is not feasible or desirable for most of us. A computer program could certainly perform continuous projections, but we, as mere people (note: people are awesome), do not have the ability to continuously adjust our valuations on such a large scale. Sure, each time we watch, read about, or hear about a player, our impression of said player will be altered or reinforced, consciously or subconsciously, but that is not what I am getting at. Rather, what I mean is that we cannot watch every play by every player, and we cannot fully analyze all of what we see or all of the available data. The result of all this humanness is that we can really only fully update our projections on a league-wide basis at decision times: the offseason, for auctions and drafts, and, to some extent, the trade deadline. While we constantly update our valuations for the players we follow, my assumption is that very few people follow every player, and those who do probably do not do so diligently enough to keep each player’s projection continuously up to date.
Why predicting player breakouts is more important than minimizing error.
Last week, the sabermetric community had—well, not an argument, because the participants were generally professional and cordial to one another, but a debate about what we might expect over the rest of the season from a player who is currently enjoying a hot (or cold) streak. It all started with researcher Mitchel Lichtman (better known by his initials, MGL) posting two articles, one on hitters and one on pitchers, that made the case that we should trust the projection systems rather than expect a player’s recent performance to continue. Remember Charlie Blackmon, who was the best player in baseball for three weeks and was smart enough to make those weeks the first three weeks of the 2014 season? He’s a good example. He had never been anything special, nor was he projected for greatness this year. And in retrospect, his hot streak to start the season looks a lot like a small-sample fluke.
After we released the PECOTA Top 100 prospects list last week, a few commenters remarked on PECOTA’s apparent catcher leanings. Eleven of them appeared on the list, some higher than nationally beloved prospects. How dare PECOTA! In comparison, Jason Parks’ top 101 featured eight catchers, suggesting a small discrepancy in the position distribution of PECOTA’s rankings.
Have we been underrating big-market, high-payroll teams?
A couple of weeks ago, I wrote about the distribution of team wins, and the discovery that the distribution may in fact be bimodal, not normal as one might expect.
One of the predictions that came from this theory was that teams right at .500 would, counterintuitively, tend to regress away from the mean. So one thing we can do is check whether the real world behaves the way we expect it to. I took all teams from 1969 on that played an even number of games and split each season into “halves,” each containing an even number of games. I use scare quotes for halves because, in order to boost the sample size, I split at increments of two games and kept any pair where both “halves” were within 20 games of each other. Then I looked at teams that were exactly .500 in the “before” sample (716 teams total) and saw what they did afterward.
The teams that have outhit and outpitched their projections, or fallen the farthest short.
We’re approaching the halfway point of the season, though we’re still over a month away from the nominal start of the second half. And that means we’re also approaching the point at which we stop thinking about how we thought the season would play out (except for our probably accidentally accurate predictions, which we treasure forever). According to Colin Wyers, in-season team records become more reliable than pre-season projections around Game 103. Most of us don’t have a particular point of the season at which we entirely abandon pre-season projections—nor should we—but every day we trust what we’ve seen so far a little more and what we expected to see a little less. And eventually, we look back and wonder why we didn’t see certain things coming.
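The idea of trusting the record a little more each day can be illustrated with a simple weighted blend. This is only a sketch under a loud assumption: the weight on the observed record is taken to be `games / (games + crossover)`, a form chosen here so that the record and the projection carry equal weight at the Game 103 crossover mentioned above. It is not Colin Wyers's method, and the function name `blended_win_pct` is hypothetical.

```python
def blended_win_pct(observed_pct: float, projected_pct: float,
                    games_played: int, crossover: int = 103) -> float:
    """Blend an in-season winning percentage with a pre-season projection.

    Assumed weighting (not from the source article): the observed record
    gets weight g / (g + crossover), so the two inputs are weighted equally
    when games_played == crossover.
    """
    if games_played < 0:
        raise ValueError("games_played must be non-negative")
    weight_observed = games_played / (games_played + crossover)
    return weight_observed * observed_pct + (1 - weight_observed) * projected_pct


# Illustrative numbers only: a team projected at .500 that is playing .600 ball.
estimate_at_start = blended_win_pct(0.600, 0.500, games_played=0)
estimate_at_crossover = blended_win_pct(0.600, 0.500, games_played=103)
```

Under this assumed weighting, the estimate starts at the projection and drifts toward the observed record as games accumulate, without ever discarding the projection entirely.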
PECOTA has had plenty of successes. The projected team TAvs for the Rangers and Brewers, for example, have been correct to the point, and the projected team ERAs for the Mets and Diamondbacks have been less than 0.02 points off. But while PECOTA deserves a pat on its back for its accurate predictions, there’s much more to say about the surprises. This article is about the lineups and pitching staffs that have defied our expectations so far.
If everyone on the Astros played to their 90th-percentile projections, and everyone on the Angels played to their 10th-percentile projections, which would win more games?
Last year around this time I had plans to compare the Astros’ teamwide PECOTA projections to those of a variety of lower-level squads: the best Triple-A roster, the best Double-A roster, an All-Star High-A team, etc. I didn’t get to it, and then the season started, and I still didn’t get to it, because the Astros started off hot and it would have been weird to have run that piece about a team that was 22-23 in mid-May. I was sort of glad I didn’t run it, because the longer I lived with the idea the more it started to feel mean.
So this year, I have a similar idea, and I’m rushing it out before the guilt kicks in. Again I’m going to be exploring just how bad the worst team in baseball is. Or just how good the worst team in baseball is. That’s the point of it, after all. It’s not to prove that the Astros are as bad as, say, a team of High-A All-Stars. It’s to see if the Astros are as bad as a team of High-A All-Stars, and if they’re significantly better (as I suspect they would have been), then we’ve learned a little something about baseball.
Asking questions about PECOTA's projections, and explaining what the system thinks is in store for Bryce Harper.
When the PECOTA spreadsheet appears, one of the first things people do is pick out the players projected to make the greatest gains or suffer the largest declines. Then the questions start: Why does PECOTA like/dislike so-and-so so much? Is there a problem with the projections? Or is the system just picking up on something I’m not seeing?
Behind the scenes, the BP staff goes through the same thought process. Before we publish the projections, we approach PECOTA’s output with a skeptical eye, on the lookout for anything that could be a bug. But even after we’re satisfied with the spreadsheet and release it to our subscribers, PECOTA retains the capacity to surprise.
PECOTA's projected award winners, bounceback candidates, and bêtes noires.
If your holiday was anything like most of mine, you’ll want a couple of Tylenol and some Gatorade this morning because you’re feeling the effects of PECOTA Day. Now that we’ve slept it off, it’s time to take a look at some of the highlights of the data as they project the 2013 season.
Team win totals can be found here if you want to use the projection system to forecast the playoff races eight months before the Division Series. But individual performances are easier to assess because they’re not compounding (or more accurately, just adding together) error with the projections.