March 28, 2013
PECOTA Percentiles Are Here
PECOTA percentiles are now available to subscribers.
Those of you new to BP, or to PECOTA, might wonder why we publish percentiles in addition to the weighted-mean projections for players, which we’ve already released. The answer is that forecasting is an inexact science; the future is not exactly what you'd call certain. The percentiles allow us to put a range of outcomes around a single-point forecast, to illustrate how uncertain the forecast is and which outcomes are most likely.
The percentiles, then, represent the spread of outcomes we would expect if a player went through the 2013 season thousands upon thousands of times. Imagine a bell curve, with the 50th percentile at the very peak. Twenty percent of the time, a player's results should fall between his 40th and 60th percentiles. Likewise, 60 percent of the time a player should perform at his 60th percentile or worse, while 40 percent of the time he should play better.
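To make the percentile arithmetic concrete, here is a small Python sketch. Purely for illustration, it assumes a player's season TAv follows a normal (bell-shaped) distribution with a made-up mean of .270 and standard deviation of .025; PECOTA's actual outcome distributions aren't described here and need not be normal. The sketch simulates 100,000 hypothetical seasons, reads off the percentile cut points, and checks the shares described above.

```python
import random
import statistics

def simulate_seasons(mean_tav, sd_tav, n_sims, seed=2013):
    """Draw n_sims hypothetical season TAv outcomes from a normal curve (an assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(mean_tav, sd_tav) for _ in range(n_sims)]

def percentile_cutoffs(outcomes):
    """Return the 1st..99th percentile cut points; index i - 1 holds the i-th percentile."""
    return statistics.quantiles(outcomes, n=100)

# Hypothetical player: mean and spread are invented for this example.
outcomes = simulate_seasons(mean_tav=0.270, sd_tav=0.025, n_sims=100_000)
cuts = percentile_cutoffs(outcomes)
p40, p50, p60 = cuts[39], cuts[49], cuts[59]

# Share of simulated seasons landing in the 40th-60th band, and at or below the 60th.
share_40_to_60 = sum(p40 <= x <= p60 for x in outcomes) / len(outcomes)
share_at_or_below_60 = sum(x <= p60 for x in outcomes) / len(outcomes)

print(f"50th percentile TAv:         {p50:.3f}")
print(f"share between 40th and 60th: {share_40_to_60:.3f}")
print(f"share at or below 60th:      {share_at_or_below_60:.3f}")
```

Because the cut points come from the simulated seasons themselves, the two shares land very close to 0.20 and 0.60 regardless of the mean and spread you plug in.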
As an example, let’s take a look at Giancarlo Stanton’s percentile forecasts.
Our best estimate is that Stanton will be about a five-win player, with a .314 TAv. If he plays to his 90th-percentile projection, though, he could post a .345 TAv and be about as valuable as NL MVP Buster Posey was last season. And if he disappoints to the tune of his 10th-percentile projection—well, he’d still be a pretty useful player. Giancarlo Stanton is really good at baseball.
You can find the percentiles in the “2013 Forecast” section of the player cards (not the box at the top, with the basic projections—scroll down, or select the “PECOTA ONLY” tab, and you’ll see it).
A few more notes might be helpful here. The basic inputs to the percentiles are:
Percentiles for batters cover offense only (not fielding or baserunning, except as a function of playing time and opportunities). The percentiles key off the primary rate stat for each type of player: TAv for hitters and ERA for pitchers. The component stats are meant to illustrate a plausible set of numbers that could lead to that level of production for that player. This means that a hitter’s home-run percentiles, for instance, show the home-run total consistent with that percentile’s TAv, assuming similar changes in the rest of his batting line; they do not represent the chance of hitting that many home runs. Many different batting lines can lead to any one TAv.
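One simple way to picture "similar changes in the other stats" is a single scale factor applied to every counting stat in the line. The sketch below uses a hypothetical linear rate stat with invented weights as a stand-in for TAv (it is not the TAv formula) and a hypothetical batting line. Because the stand-in stat is linear in the counting stats, multiplying them all by target/baseline hits the target rate exactly. PECOTA's actual method for building component lines isn't spelled out here, so treat this only as an illustration of the idea.

```python
from dataclasses import dataclass, replace

# Hypothetical per-event weights, invented for illustration; this is NOT the TAv formula.
WEIGHTS = {"single": 0.9, "double": 1.25, "triple": 1.6, "hr": 2.0, "bb": 0.7}

@dataclass(frozen=True)
class BattingLine:
    pa: int
    single: float
    double: float
    triple: float
    hr: float
    bb: float

def rate_stat(line: BattingLine) -> float:
    """Hypothetical stand-in rate stat: weighted events per plate appearance."""
    return sum(w * getattr(line, event) for event, w in WEIGHTS.items()) / line.pa

def line_for_target(baseline: BattingLine, target_rate: float) -> BattingLine:
    """Scale every weighted counting stat by one common factor so the line hits target_rate."""
    k = target_rate / rate_stat(baseline)
    # Playing time (PA) is held fixed; only the weighted events are scaled.
    return replace(baseline, **{event: getattr(baseline, event) * k for event in WEIGHTS})

# Hypothetical 50th-percentile line, and a target rate 10 percent above it.
median_line = BattingLine(pa=600, single=90, double=30, triple=3, hr=35, bb=70)
target = 1.10 * rate_stat(median_line)
upside_line = line_for_target(median_line, target)
print(f"HR consistent with the target rate: {upside_line.hr:.1f}")
```

The home-run figure that comes out is the total consistent with the target rate, not a probability of hitting that many, which mirrors the distinction drawn above.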
We’ve tested the percentiles against historical data, and we can report that they behave as you’d expect: 80 percent of batters fall between their 10th- and 90th-percentile forecasts for TAv, for instance.
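A calibration check like this one can be sketched in a few lines: for each player, test whether the actual result landed between his 10th- and 90th-percentile forecasts, then take the share of players for whom it did. The data below is synthetic (hypothetical normal forecasts, with outcomes drawn from those same distributions), not historical results, so it shows only the mechanics of the check. With well-calibrated forecasts, the share should come out near 0.80.

```python
import random
from statistics import NormalDist

def coverage(records):
    """Share of (p10, p90, actual) records whose actual value falls inside [p10, p90]."""
    inside = sum(1 for p10, p90, actual in records if p10 <= actual <= p90)
    return inside / len(records)

def synthetic_records(n, seed=11):
    """Build hypothetical forecasts and outcomes; NOT real player data."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        mean = rng.uniform(0.230, 0.310)  # hypothetical projected TAv
        sd = rng.uniform(0.015, 0.035)    # hypothetical forecast uncertainty
        dist = NormalDist(mean, sd)
        p10, p90 = dist.inv_cdf(0.10), dist.inv_cdf(0.90)
        actual = rng.gauss(mean, sd)      # outcome drawn from the forecast distribution
        records.append((p10, p90, actual))
    return records

share = coverage(synthetic_records(50_000))
print(f"share inside 10th-90th band: {share:.3f}")
```

Run against real forecasts and outcomes, a share well below 0.80 would suggest the bands are too narrow, and one well above would suggest they are too wide.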
In the past, we forecast pitchers’ ERA and RA with a linear fit based on their expected batting-against stats. There’s a lot of variance in ERA that isn’t captured by batting-against stats, though, particularly performance with men on base. We’ve back-tested against historical data and added some extra variation to ERA and RA to account for this.
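The idea of adding extra variation can be sketched by treating the ERA forecast as normal and adding an independent variance term estimated from back-test residuals. Everything here is an assumption for illustration: the synthetic back-test data, a single "opponents’ TAv" input standing in for batting-against stats, the normal shape, and the independence that lets the variances add. It is not PECOTA's actual pitcher model. The point it shows is that the median forecast stays put while the band around it widens.

```python
import random
from statistics import NormalDist, mean, stdev

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

def era_quantiles(center, sd_components, pcts=(10, 50, 90)):
    """Quantiles of a normal ERA forecast whose variance is a sum of independent parts."""
    total_sd = sum(sd ** 2 for sd in sd_components) ** 0.5
    dist = NormalDist(center, total_sd)
    # Keys are quantiles of the ERA distribution; a LOW ERA is the good outcome for a pitcher.
    return {p: dist.inv_cdf(p / 100) for p in pcts}

# Synthetic back-test data (hypothetical): opponents' TAv allowed and the resulting ERA.
# The gauss noise stands in for things batting-against stats miss, like work with men on base.
rng = random.Random(7)
opp_tav = [rng.uniform(0.230, 0.290) for _ in range(2000)]
era = [-5.0 + 33.0 * x + rng.gauss(0, 0.6) for x in opp_tav]

intercept, slope = fit_line(opp_tav, era)
residual_sd = stdev([y - (intercept + slope * x) for x, y in zip(opp_tav, era)])

projected_opp_tav, projected_opp_tav_sd = 0.255, 0.012  # hypothetical pitcher forecast
center = intercept + slope * projected_opp_tav
sd_from_batting_against = abs(slope) * projected_opp_tav_sd

batting_against_only = era_quantiles(center, [sd_from_batting_against])
with_extra_variation = era_quantiles(center, [sd_from_batting_against, residual_sd])
for p in (10, 50, 90):
    print(f"{p}th ERA quantile: {batting_against_only[p]:.2f} -> {with_extra_variation[p]:.2f}")
```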
We’ve also integrated the percentiles more closely with the depth charts, for players who appear in those—we’re pulling things like lineup slots to help calculate RBIs, for instance. The percentiles take quite a while to run, though, so don’t expect them to stay in sync with the depth charts, which can be updated as often as several times in one day.