Let’s talk percentiles.
It’s probably the most famous thing about PECOTA: the fact that we provide a range of forecasts instead of just a single point estimate. Earlier this week, I talked about the accuracy of the weighted-mean forecasts. But what about the percentiles?
First, some notes about the percentiles. They are derived from the overall unit of production (TAv for hitters, ERA for pitchers), not from the underlying components. This matters because a hitter who hits more home runs than we expect (I hesitate to call it luck; he may have been underestimated, or he may have found a way to improve his talent) isn’t necessarily going to improve his rate of hitting singles by the same amount, or at all.
What this means is that you can’t look at a single stat (say, hits or strikeouts) and think that’s the range of expectations PECOTA has for that skill. The percentiles are supposed to reflect what we know about the distribution of a player’s skill, but they are in essence the average batting line we should expect from that player if he puts up that level of performance in that season. There are a lot of different shapes that performance could take, however, and that means there’s more variance in any single component than is reflected in the percentiles. So the correct test of the percentiles is the overall level of performance, not the underlying components.
The other thing to note is that the observed performance of any individual player depends on his playing time: the less playing time a player has, the more variance we expect in his overall performance. Things tend to even out over time (although a tendency is not the same thing as a guarantee), so the spread of observed performance narrows as playing time goes up. If a player is projected for a full season’s worth of playing time and ends up playing only 50 games or so, the percentiles are going to be too tight. That’s not a bug; it’s impossible to make one set of percentiles that works across every amount of playing time.
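To see why one set of percentiles can’t fit every playing-time level, here is a minimal sketch that treats on-base percentage as a simple binomial rate. That is an assumption: TAv is not a pure binomial rate, and the sketch ignores uncertainty about the player’s true talent. The spread of an observed rate from chance alone shrinks with the square root of plate appearances. The helper name `observed_rate_band` and the 650 PA (roughly a full season) and 200 PA (roughly 50 games) figures are illustrative choices, not PECOTA internals.

```python
from math import sqrt
from statistics import NormalDist


def observed_rate_band(true_rate, pa, lo=0.10, hi=0.90):
    """10th-90th percentile band of an observed rate from binomial noise alone.

    Hypothetical helper: uses the normal approximation to the binomial,
    so it ignores any uncertainty about the player's true talent.
    """
    sd = sqrt(true_rate * (1 - true_rate) / pa)
    dist = NormalDist(true_rate, sd)
    return dist.inv_cdf(lo), dist.inv_cdf(hi)


full = observed_rate_band(0.340, 650)   # roughly a full season
short = observed_rate_band(0.340, 200)  # roughly 50 games
full_width = full[1] - full[0]
short_width = short[1] - short[0]
# The width scales with 1 / sqrt(PA), so the short band is sqrt(650 / 200) times as wide.
print(f"{full_width:.3f} {short_width:.3f}")
```

A band sized for 650 PA is therefore about 1.8 times too narrow for a player who only gets about 200.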
Let’s start off with the hitters. Looking at only players with at least 300 PA, here’s how the distribution of players looks:

           DIFF20   DIFF40   DIFF60   DIFF80
Overall     23.9%    34.9%    49.2%    63.5%
Up          17.6%    24.7%    30.8%    36.7%
Down         6.4%    10.3%    18.4%    26.8%
Going from left to right: DIFF20 is the percentage of players who finished between their 40th and 60th percentiles, DIFF40 between their 30th and 70th, DIFF60 between their 20th and 80th, and DIFF80 between their 10th and 90th. The Up row counts players above their 50th percentile; the Down row counts players below it. Adding Up and Down gives you the Overall percentage.
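Here is a minimal sketch of how a table like this can be computed. It assumes each player’s forecast is stored as a hypothetical mapping from percentile (10, 20, ..., 90) to a forecast TAv or ERA, paired with his actual result; `diff_table` and the data layout are made up for illustration and are not PECOTA’s code. The `lower_is_better` flag flips the sign for a stat like ERA so that “up” always means better than the median forecast. A result landing exactly on the median counts as down, an arbitrary tie-break.

```python
BANDS = {"DIFF20": (40, 60), "DIFF40": (30, 70),
         "DIFF60": (20, 80), "DIFF80": (10, 90)}


def diff_table(players, lower_is_better=False):
    """Share of players whose actual result falls inside each central band.

    `players` is a list of (percentiles, actual) pairs, where `percentiles`
    maps 10, 20, ..., 90 to a forecast value (hypothetical layout).
    """
    # Negate stats like ERA so that a bigger number always means "better".
    sign = -1.0 if lower_is_better else 1.0
    n = len(players)
    table = {}
    for name, (lo, hi) in BANDS.items():
        up = down = 0
        for pct, actual in players:
            a, b = sorted((sign * pct[lo], sign * pct[hi]))
            x = sign * actual
            if a <= x <= b:
                if x > sign * pct[50]:
                    up += 1
                else:
                    down += 1  # ties at the median count as "down"
        table[name] = {"Overall": (up + down) / n, "Up": up / n, "Down": down / n}
    return table


# Four made-up hitters sharing one TAv percentile line.
line = {10: .240, 20: .250, 30: .255, 40: .260, 50: .265,
        60: .270, 70: .275, 80: .280, 90: .290}
hitters = [(line, .268), (line, .252), (line, .300), (line, .262)]
for name, row in diff_table(hitters).items():
    print(name, row)
```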
Ideally, DIFF20 would equal 20 percent, DIFF40 would equal 40 percent, and so on. We don’t quite see that, though. It may be a bit more helpful to look at a histogram:
The first thing that should stick out is that the 50th-to-60th-percentile band holds more players than any other, by a large margin. Why? Fundamentally, players who perform above their expectations are more likely to get playing time than players who perform below them. This shouldn’t surprise us; it’s the reason PECOTA has weighted-mean forecasts, which explicitly take this into account. (It is also probably why DIFF20 exceeds 20 percent.)
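The playing-time effect is easy to reproduce in a toy simulation. In this sketch every player’s surprise (actual minus projected TAv) is drawn symmetrically around zero, so with no filter about half would finish above their median. The playing-time rule (better-than-expected hitters earn more plate appearances), the spreads, and the PA numbers are invented assumptions; only the 300 PA cutoff comes from the article. The function name `share_beating_median` is hypothetical.

```python
import random


def share_beating_median(n_players=20000, min_pa=300, seed=1):
    """Fraction of players clearing `min_pa` who beat their median forecast.

    Toy model with made-up parameters: surprise ~ Normal(0, .020) in TAv,
    and plate appearances rise with surprise (plus noise).
    """
    rng = random.Random(seed)
    kept = beat = 0
    for _ in range(n_players):
        surprise = rng.gauss(0.0, 0.020)  # actual minus projected TAv
        pa = 350 + 6000 * surprise + rng.gauss(0.0, 150)  # hypothetical PT rule
        if pa >= min_pa:
            kept += 1
            if surprise > 0:
                beat += 1
    return beat / kept


print(share_beating_median())             # with the 300 PA cutoff
print(share_beating_median(min_pa=-1e9))  # no cutoff: every player is kept
```

With these made-up parameters, the filtered share lands well above one half while the unfiltered share sits near one half: the cutoff alone tilts the sample toward players who beat their forecasts.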
But there’s also more variation in observed performance than the percentiles expect. Let’s consider where variation from our projections comes from. The first point I want to make is that forecasting is not mathamancy; there’s no such thing as a perfect forecast, except in hindsight. PECOTA uses a two-stage process:
1. As described earlier this week, we generate a baseline forecast based on a player’s past performance, and
2. We adjust for how we expect a player to age, using baseline “forecasts” for comparable players to create a custom aging curve, what Nate Silver would refer to as the “career path adjustment.”
Both of those estimates carry some uncertainty. The third source of variation is plain randomness. We model this variance using the observed variation in the comps’ performance.
Not all forecasts have the same expected variance, though; it seems that some players have more variance in their baseline forecasts than their comparables do. This is a relatively simple fix. The uncertainty in a forecast is largely a function of how much data you have on a player. (It’s also somewhat a function of a player’s skill set, among other things.) When we build a player’s baseline forecast, we can compare its uncertainty to the uncertainty of the comps’ forecasts and work out how much additional variance we need to add to the percentiles.
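A minimal sketch of that adjustment, under two simplifying assumptions that are mine rather than PECOTA’s: outcomes are normally distributed, and the independent sources of uncertainty add as variances. `comp_var` stands in for the spread implied by the comps, and any baseline-forecast variance the player carries beyond his comps’ is added on top. A player whose baseline is more certain than his comps’ is left unchanged rather than narrowed, which is also a choice made only for the sketch. The function name and all inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

LEVELS = (10, 20, 30, 40, 50, 60, 70, 80, 90)


def widened_percentiles(forecast, comp_var, player_fcst_var, comps_fcst_var):
    """Percentile line with extra variance for a less certain baseline forecast.

    Assumes normally distributed outcomes and independent variance sources.
    """
    # Only add variance; a better-measured player is not made narrower here.
    extra_var = max(0.0, player_fcst_var - comps_fcst_var)
    dist = NormalDist(forecast, sqrt(comp_var + extra_var))
    return {p: dist.inv_cdf(p / 100) for p in LEVELS}


veteran = widened_percentiles(0.270, 0.020**2, 0.010**2, 0.010**2)
rookie = widened_percentiles(0.270, 0.020**2, 0.015**2, 0.010**2)
# The rookie's 10th-90th range is wider because his baseline is less certain.
print(veteran[90] - veteran[10], rookie[90] - rookie[10])
```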
We’ve also been treating the uncertainty of a forecast as symmetrical, but there is apparently more uncertainty on the downside than on the upside. This is something we can build into the model as well.
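One simple way to build in that asymmetry is a two-piece normal: keep the median at the forecast, but use a wider spread below it than above it. This is a swapped-in illustration, not the method PECOTA uses, and the spreads are invented. For a hitter, the downside is a lower TAv, so `sd_down` governs the percentiles under the 50th. The function name is hypothetical.

```python
from statistics import NormalDist

LEVELS = (10, 20, 30, 40, 50, 60, 70, 80, 90)


def two_piece_percentiles(forecast, sd_down, sd_up):
    """Percentiles of a two-piece normal whose median is the forecast.

    Each half carries 50% of the probability; the lower half uses sd_down
    and the upper half uses sd_up.
    """
    std = NormalDist()  # standard normal, used only for its z-scores
    out = {}
    for p in LEVELS:
        z = std.inv_cdf(p / 100)
        out[p] = forecast + z * (sd_down if z < 0 else sd_up)
    return out


line = two_piece_percentiles(0.270, sd_down=0.025, sd_up=0.018)
# The 10th percentile sits farther below the median than the 90th sits above it.
print(line[50] - line[10], line[90] - line[50])
```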
Now let’s take a look at our pitchers, minimum of 70 IP:
           DIFF20   DIFF40   DIFF60   DIFF80
Overall     18.0%    29.0%    37.3%    50.5%
Up          13.6%    19.4%    22.4%    29.4%
Down         4.4%     9.6%    15.0%    21.1%
I should clarify “up” and “down” in this context: up is an ERA below the forecast, and down is an ERA above the forecast.
What we see is similar to the hitters, but much more pronounced. Let’s examine it from a slightly different angle and use FIP as a stand-in for ERA:

           DIFF20   DIFF40   DIFF60   DIFF80
Overall     27.9%    42.7%    53.4%    65.4%
Up          23.3%    29.9%    35.5%    38.8%
Down         4.6%    12.7%    18.0%    26.6%
That’s a lot closer to what we saw with the hitters (and of course, everything I said about the hitters applies equally here).
What it comes down to, I suppose, is how you define performance for a pitcher. There are three elements to preventing or allowing runs:
1. The pitcher’s ability to affect the batter-pitcher matchup directly (walks, strikeouts, home runs),
2. The ability of a pitcher and his defense to prevent hits on balls in play, and
3. The sequence in which these events occur.
I’ve talked in the past about how those figure into a player’s value. Suffice it to say that the ranges on the PECOTA percentiles focus largely on the first element, which is where most of the variation in pitcher skill occurs and thus the area most relevant to forecasting.
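For reference, FIP is built only from that first element, which is why it tracks the percentiles more closely than ERA does. The standard formula weights home runs, walks plus hit batsmen, and strikeouts per inning, then adds a league constant that puts FIP on the ERA scale. The constant varies by season; the 3.10 used below is only a typical placeholder, and the stat line is invented. Some versions also subtract intentional walks, which this sketch does not.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching on the ERA scale.

    `constant` is season-specific; 3.10 is only a typical placeholder.
    `ip` is true innings (200 1/3 innings is 200.333..., not 200.1).
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical season: 200 IP, 20 HR, 50 BB, 5 HBP, 180 K.
print(f"{fip(hr=20, bb=50, hbp=5, k=180, ip=200):.3f}")
```

Nothing in the formula depends on balls in play or on sequencing, so the second and third elements drop out entirely.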
So, lemme ask: what do you find most useful about the percentiles? Would you rather they reflect the extent to which we know pitchers have skill at preventing runs? Or would you rather they reflect the considerable noise in measuring a pitcher’s performance (really, the performance of a pitcher and his teammates at preventing runs)? Drop me a line in the comments and let me know.
Or you could talk to me about that (or anything else related to PECOTA, or baseball stats in general) in a few hours, when I chat live starting at 1 ET as the finale of PECOTA week. And again, this is the beginning, not the end, of a long conversation about PECOTA. Thanks for being a part of it.