Can the uncertainty in a player's projections be projected?
There are two important aspects of prediction. The first is accuracy—that is, how close a prediction comes to the actual, observed result. The second is uncertainty—how sure a forecaster is about his or her projection. These are fundamental forecasting concepts, and they apply equally to predictions of the weather, the stock market, or the outcome of tomorrow’s ballgame. At present, only one of these facets gets much attention in the world of baseball projections, and that is accuracy. Accuracy is measured by the absolute error: how far, on average, a forecast falls from the observed result. Projectionists work primarily to minimize this number.
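As a concrete sketch of the accuracy metric just described, here is a minimal Python example computing the mean absolute error of a set of OPS projections. The player names and all of the numbers are invented for illustration:

```python
# Hypothetical OPS projections and actual results (all numbers invented).
projections = {"Player A": 0.850, "Player B": 0.720, "Player C": 0.790}
actuals = {"Player A": 0.912, "Player B": 0.698, "Player C": 0.801}

def mean_absolute_error(projected, actual):
    """Average of |projection - result| across all players."""
    errors = [abs(projected[p] - actual[p]) for p in projected]
    return sum(errors) / len(errors)

mae = mean_absolute_error(projections, actuals)
print(f"Mean absolute error: {mae:.3f} points of OPS")
```

A system with a lower MAE is more accurate in this sense; note that two systems with identical MAEs can still differ in how well they identify which individual players are the risky ones.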
The under-examined facet of prediction that we will address in this article is the uncertainty. Whereas we know that predictions tend to be accurate to within a hundred or so points of OPS, we would also like to know whether we are more or less likely to be wrong on certain players. The uncertainty is often treated as a second-order concern because it is usually more difficult to estimate. However, as we show, it is possible to predict ahead of time which players’ forecasts are more uncertain than others. This concept is important because certain teams may prefer high versus low-risk players—a team with high win expectations (90+ wins) might prefer to reduce risk, whereas a middle-of-the-road team (80-85 wins) would presumably seek risk in order to “get lucky” and reach the postseason.
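The claim that some players’ forecasts are predictably more uncertain than others can be made concrete by comparing the spread of projection residuals across player groups. The sketch below is a toy illustration with invented numbers and invented group labels, not any projection system’s actual method:

```python
import statistics

# Hypothetical projection residuals (actual OPS minus projected OPS) for two
# invented groups of players. The average miss is similar for both groups,
# but the spread of the misses -- the uncertainty -- is not.
residuals = {
    "young players": [0.110, -0.090, 0.140, -0.120, 0.080],
    "veteran players": [0.030, -0.020, 0.040, -0.035, 0.025],
}

spreads = {group: statistics.stdev(errs) for group, errs in residuals.items()}
for group, spread in spreads.items():
    print(f"{group}: residual spread = {spread:.3f} OPS")
```

If a grouping like this produces consistently different spreads, a forecaster can attach wider error bars to one group before the season starts, which is exactly the kind of ahead-of-time uncertainty prediction the article has in mind.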
A look at how to avoid allowing biases to influence your projections.
As soon as the baseball season comes to its inevitable and saddening end, baseball, as it does each year, will enter the offseason. For the fantasy baseball community, this means we will be entering ranking and projection season. After following “our players” and players of interest all season, we are now asked to take an all-encompassing look at the league’s baseball players. The result of doing projections periodically, as opposed to continuously, is that we are likely to invite certain biases into our processes, which can negatively impact our results. We will take a look at why we do periodic projections, the biases that come with such a process, how these biases manifest themselves, and some ways to hopefully de-bias our process.
The devil’s advocate in me asks, “If periodic projections cause certain problems, why not do continuous projections?” The short answer is that continuous projections are neither feasible nor desirable for most of us. A computer program could certainly perform continuous projections, but we—as mere people (note: people are awesome)—do not have the ability to continuously adjust our valuations on such a large scale. Sure, each time we watch, read about, or hear about a player, our impression of that player is altered or reinforced, consciously or subconsciously, but that is not what I am getting at. Rather, I mean that we cannot watch every player play every one of his plays, and we cannot fully analyze everything we see or all of the available data. The result of all this humanness is that we can really only fully update our projections on a league-wide basis come decision time: the offseason for auctions and drafts and, to some extent, the trade deadline. While we constantly update our valuations for the players we follow, my assumption is that very few people follow every player, and those who do probably do not do so diligently enough to properly update each player’s projection continuously.
Why predicting player breakouts is more important than minimizing error.
Last week, the sabermetric community had—well, not an argument, because the participants were generally professional and cordial to one another, but a debate about what we might expect over the rest of the season from a player who is currently enjoying a hot (or cold) streak. It all started with researcher Mitchel Lichtman (better known by his initials, MGL) posting two articles, one on hitters and one on pitchers, that made the case that we should trust the projection systems rather than expect a player’s recent performance to continue. Remember Charlie Blackmon, who was the best player in baseball for three weeks and was smart enough to make those weeks the first three weeks of the 2014 season? He’s a good example. He had never been anything special, nor was he projected for greatness this year. And in retrospect, his hot streak to start the season looks a lot like a small-sample fluke.
After we released the PECOTA Top 100 prospects list last week, a few commenters remarked on PECOTA’s apparent catcher leanings. Eleven catchers appeared on the list, some higher than nationally beloved prospects. How dare PECOTA! By comparison, Jason Parks’ top 101 featured eight catchers, suggesting a small discrepancy in the position distribution of PECOTA’s rankings.
Have we been underrating big-market, high-payroll teams?
A couple of weeks ago, I wrote about the distribution of team wins, and the discovery that the distribution may in fact be bimodal, not normal as one might expect.
One of the predictions that came from this theory was that teams right at .500 would, counterintuitively, tend to regress away from the mean. So one thing we can do is check whether the real world behaves the way we expect it to. I took all teams from 1969 on that played an even number of games and split their seasons into “halves” with even numbers of games. I use scare quotes for “halves” because, to boost the sample size, I split in increments of two games and kept any pair in which the two “halves” were within 20 games of each other. Then I looked at teams that were exactly .500 in the “before” sample—716 teams in all—and saw what they did afterward:
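The logic of this check can be sketched with a small simulation. Everything below is an assumption for illustration—the talent values, half length, and trial count are invented—but it shows the signature being tested: if true talent is bimodal, teams sitting at exactly .500 at the break should scatter away from .500 in the second half more than a one-clump world would predict:

```python
import random

random.seed(0)

# Invented parameters: under an assumed bimodal true-talent distribution,
# do teams at exactly .500 in the first "half" spread out more in the
# second half than they would if everyone were truly average?
HALF = 80  # games per "half"; even, so a team can sit exactly at .500

def season(talent):
    """Simulate one team's (first-half wins, second-half wins)."""
    first = sum(random.random() < talent for _ in range(HALF))
    second = sum(random.random() < talent for _ in range(HALF))
    return first, second

def second_half_spread(talents, trials=20000):
    """Std. dev. of second-half wins among teams that went exactly .500."""
    seconds = []
    for _ in range(trials):
        first, second = season(random.choice(talents))
        if first == HALF // 2:  # exactly .500 in the "before" sample
            seconds.append(second)
    mean = sum(seconds) / len(seconds)
    return (sum((s - mean) ** 2 for s in seconds) / len(seconds)) ** 0.5

bimodal = second_half_spread([0.450, 0.550])  # two talent clumps (assumption)
unimodal = second_half_spread([0.500])        # everyone is truly average
print(f"bimodal spread: {bimodal:.2f} wins; unimodal spread: {unimodal:.2f} wins")
```

In this toy world, the .500 first-half teams in the bimodal setup spread out noticeably more in the second half than those in the unimodal setup, which is the pattern the real-data check above goes looking for.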
The teams that have outhit and outpitched their projections, or fallen the farthest short.
We’re approaching the halfway point of the season, though we’re still over a month away from the nominal start of the second half. And that means we’re also approaching the point at which we stop thinking about how we thought the season would play out (except for our probably accidentally accurate predictions, which we treasure forever). According to Colin Wyers, in-season team records become more reliable than pre-season projections around Game 103. Most of us don’t have a particular point of the season at which we entirely abandon pre-season projections—nor should we—but every day we trust what we’ve seen so far a little more and what we expected to see a little less. And eventually, we look back and wonder why we didn’t see certain things coming.
PECOTA has had plenty of successes. The projected team TAvs for the Rangers and Brewers, for example, have been nearly spot-on, and the projected team ERAs for the Mets and Diamondbacks have been less than 0.02 points off. But while PECOTA deserves a pat on the back for its accurate predictions, there’s much more to say about the surprises. This article is about the lineups and pitching staffs that have defied our expectations so far.
If everyone on the Astros played to their 90th-percentile projections, and everyone on the Angels played to their 10th-percentile projections, which would win more games?
Last year around this time I had plans to compare the Astros’ teamwide PECOTA projections to those of a variety of lower-level squads: the best Triple-A roster, the best Double-A roster, an All-Star High-A team, etc. I didn’t get to it, and then the season started, and I still didn’t get to it, because the Astros started off hot and it would have been weird to have run that piece about a team that was 22-23 in mid-May. I was sort of glad I didn’t run it, because the longer I lived with the idea the more it started to feel mean.
So this year, I have a similar idea, and I’m rushing it out before the guilt kicks in. Again I’m going to be exploring just how bad the worst team in baseball is. Or just how good the worst team in baseball is. That’s the point of it, after all. It’s not to prove that the Astros are as bad as, say, a team of High-A All-Stars. It’s to see if the Astros are as bad as a team of High-A All-Stars, and if they’re significantly better (as I suspect they would have been), then we’ve learned a little something about baseball.