Let’s talk percentiles.

It’s probably the most famous thing about PECOTA: we provide a range of forecasts instead of just a single point estimate. Earlier this week, I talked about the accuracy of the weighted mean forecasts. But what about the percentiles?

First, some notes about the percentiles. They are derived from the overall unit of production (TAv for hitters, ERA for pitchers), not the underlying components. This is important, because a hitter who hits more home runs than we expect (I hesitate to call it luck; he may have been underestimated, or he may have found a way to improve his talent) isn’t necessarily going to improve his rate of hitting singles by the same amount, or at all.

What this means is that you can’t look at a single stat (say, hits or strikeouts) and think that’s the range of expectations PECOTA has for that skill. The percentiles are supposed to reflect what we know about the distribution of a player’s skill, but they are in essence the average batting line we should expect from that player if he puts up that level of performance in that season. There are a lot of different shapes that performance could take, however, and that means there’s more variance in any single component than is reflected in the percentiles. So the correct test of the percentiles is the overall level of performance, not the underlying components.

The other thing to note is that the observed performance of any individual player is a function of his playing time: the less playing time a player has, the more variance we expect in his overall performance. Things have a tendency to even out over time (although a tendency is not the same thing as a guarantee), and so the spread of observed performance goes down as playing time goes up. If a player is projected for a full season’s worth of playing time and only ends up playing 50 games or so, the percentiles are going to be too tight for him. That’s not a bug; it’s impossible to make one set of percentiles that functions across any amount of playing time.
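To put a rough number on how the spread shrinks with playing time, here is a minimal sketch. It assumes each plate appearance is an independent trial with a fixed success rate (a plain binomial model), which is an illustration only and not how PECOTA itself models variance. The function name and the .330 rate are made up for the example.

```python
import math


def observed_rate_sd(true_rate: float, plate_appearances: int) -> float:
    """Standard deviation of an observed rate under a simple binomial model.

    Assumption: every plate appearance is an independent trial with the
    same success probability. Real performance is messier than this.
    """
    return math.sqrt(true_rate * (1.0 - true_rate) / plate_appearances)


if __name__ == "__main__":
    # Hypothetical hitter with a true .330 rate at three playing-time levels.
    for pa in (150, 300, 600):
        print(pa, round(observed_rate_sd(0.330, pa), 4))
```

Under this model the spread scales with one over the square root of playing time, so a player who gets a quarter of his projected plate appearances should show about twice the spread of outcomes.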

Let’s start off with the hitters. Looking at only players with at least 300 PA, here’s how the distribution of players looks:

Going from left to right: DIFF20 refers to the percentage of players who finished between their 40th and 60th percentiles, and so on through DIFF80, which represents the percentage of players between their 10th and 90th percentiles. The second row represents those players above the 50th percentile; the third row represents players below the 50th percentile. Adding up plus down gives you the overall percentage.
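For anyone who wants to run the same kind of check, the DIFF bookkeeping can be sketched as follows. This assumes you have already worked out which percentile of his own forecast each player actually landed on; the function name, the input format, and the sample values are hypothetical, and how ties at exactly the 50th percentile are counted is my own choice for the example.

```python
def diff_coverage(realized_percentiles, width):
    """Share of players who landed within a band centered on the 50th percentile.

    realized_percentiles: for each player, the percentile (0-100) of his own
        forecast distribution that his actual season corresponds to.
    width: total band width, e.g. 20 for DIFF20 (40th to 60th percentile).

    Returns (up, down, total) as fractions of all players. A player exactly
    at the 50th percentile is counted as "up" here (an arbitrary choice).
    """
    half = width / 2
    n = len(realized_percentiles)
    up = sum(1 for p in realized_percentiles if 50 <= p <= 50 + half)
    down = sum(1 for p in realized_percentiles if 50 - half <= p < 50)
    return up / n, down / n, (up + down) / n


if __name__ == "__main__":
    # Made-up realized percentiles for ten players, for illustration only.
    sample = [5, 45, 55, 58, 62, 95, 41, 50, 30, 75]
    for width in (20, 40, 60, 80):
        print(f"DIFF{width}", diff_coverage(sample, width))
```

If the percentiles were perfectly calibrated and there were no selection on playing time, the total for each width would come out close to the width itself, split about evenly between up and down.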

What we should want to see is DIFF20 equal to 20 percent, etc. We don’t quite see it, though. It may be a bit more helpful to look at a histogram:

Histogram of percentiles.

The first thing that sticks out should be the fact that most players are in the 50th to 60th percentiles, by a large margin. Why? Fundamentally, players who perform above their expectations are more likely to get playing time than players who perform below their expectations. This isn’t something that should surprise us; it is why we have the weighted mean forecasts for PECOTA, which explicitly take this fact into account. (This is also probably the explanation for why DIFF20 exceeds 20 percent.)

But there’s also more variation in observed performance than the percentiles expect. Let’s consider the reasons we see variation from what our projections expect. The first point I want to make is that forecasting is not mathamancy; there’s no such thing as a perfect forecast, except in hindsight. PECOTA utilizes a two-stage process:

  1. As described earlier this week, we generate a baseline forecast based on a player’s past performance, and
  2. We adjust for our expectation of how a player will age, using baseline “forecasts” for comparable players to create a custom aging curve, which is what Nate Silver would refer to as the “career path adjustment.”

Both of those estimates are subject to a measure of uncertainty. The third source of variation is simply randomness. We use the observed variation of the performance of the comps to model this variance.

Not all forecasts have the same expected variance, though. It seems that some players have more variance in their baseline forecasts than their comparables do. This is a relatively simple fix, because the uncertainty in a forecast is largely a function of the amount of data you have on a player. (It’s also something of a function of a player’s skill set, among other things.) When we build a player’s baseline forecast, we can compare the uncertainty in the forecast to the uncertainty of the comps’ forecasts and figure out how much additional variance we need to add to the percentiles.
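As a rough illustration of what adding variance to the percentiles could look like, here is a minimal sketch. It assumes the sources of uncertainty are independent (so their variances add) and that the final distribution is normal. Neither assumption comes from PECOTA, and every name and number below is hypothetical.

```python
import math
from statistics import NormalDist


def widened_percentiles(mean, comp_sd, baseline_sd, comp_baseline_sd,
                        levels=(10, 20, 30, 40, 50, 60, 70, 80, 90)):
    """Percentiles after adding the player's excess baseline uncertainty.

    mean: the player's central forecast (e.g. TAv).
    comp_sd: spread of outcomes observed among the comparables.
    baseline_sd: uncertainty in this player's own baseline forecast.
    comp_baseline_sd: typical uncertainty in the comps' baseline forecasts.

    Assumptions: independent error sources (variances add) and a normal
    distribution. Both are simplifications made for this example.
    """
    # Only add variance when the player is less certain than his comps.
    extra_var = max(baseline_sd ** 2 - comp_baseline_sd ** 2, 0.0)
    total_sd = math.sqrt(comp_sd ** 2 + extra_var)
    dist = NormalDist(mean, total_sd)
    return {q: dist.inv_cdf(q / 100.0) for q in levels}


if __name__ == "__main__":
    # A hitter with little data (wide baseline) versus well-established comps.
    for q, value in widened_percentiles(0.270, 0.020, 0.018, 0.010).items():
        print(q, round(value, 3))
```

The point of the sketch is only the direction of the adjustment: a player with less data than his comps ends up with a wider band around the same central forecast.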

We’ve also been treating the uncertainty of a forecast as symmetrical, when apparently there’s more uncertainty on the downside than the upside. This is something we can build into our model as well.
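One crude way to picture asymmetric uncertainty is to use a different spread on each side of the median. The sketch below does that by scaling a standard normal quantile with a downside or upside standard deviation. It is an illustration under my own assumptions, not the model we would actually ship, and the function name and numbers are invented.

```python
from statistics import NormalDist


def asymmetric_percentile(median, sd_down, sd_up, q):
    """Percentile q (0-100, exclusive) with separate downside and upside spreads.

    Assumption: half of the outcomes fall on each side of the median, with
    each half shaped like half of a normal curve. For a hitter stat such as
    TAv, "down" means below the median.
    """
    z = NormalDist().inv_cdf(q / 100.0)  # standard normal quantile
    sd = sd_down if z < 0 else sd_up
    return median + z * sd


if __name__ == "__main__":
    # Hypothetical hitter: more room to fall than to rise.
    for q in (10, 30, 50, 70, 90):
        print(q, round(asymmetric_percentile(0.270, 0.030, 0.020, q), 3))
```

With a larger downside spread, the 10th percentile sits farther below the median than the 90th percentile sits above it, which is the pattern the data suggests.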

Now let’s take a look at our pitchers, minimum of 70 IP:

I should clarify “down” and “up” in this context: up is an ERA below the forecast, down is an ERA above the forecast.

What we see is something similar to the hitters, but much more pronounced. Let’s examine it from a slightly different angle, and look at FIP as a stand-in for ERA:

That’s a lot closer to what we saw with the hitters (and of course, everything I said about those applies equally here).

What it comes down to, I suppose, is how you define performance for a pitcher. There are three elements to preventing or allowing runs:

  • The pitcher’s ability to affect the batter-pitcher matchup directly (walks, strikeouts, home runs),
  • The ability of a pitcher and his defense to prevent hits on balls in play, and
  • The sequence in which these events occur.

I’ve talked in the past about how those figure into a player’s value. Suffice it to say that the range of the PECOTA percentiles is largely focused on the first element (the one where most of the variation in pitcher skill occurs, and thus the area most relevant to forecasting).
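For reference, FIP is built entirely from that first element. Here is a minimal sketch of the standard formula. The additive constant is set each season so that league FIP matches league ERA, so the 3.10 default below is only a placeholder assumption, and the function name and sample line are made up.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from the events a pitcher controls directly.

    hr, bb, hbp, k: home runs, walks, hit batters, and strikeouts allowed.
    ip: innings pitched as a true decimal (66 and two-thirds innings is
        66.667, not the box-score shorthand 66.2).
    constant: league- and season-specific; 3.10 is a placeholder assumption.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


if __name__ == "__main__":
    # Hypothetical pitcher line, for illustration only.
    print(round(fip(hr=20, bb=50, hbp=5, k=180, ip=200.0), 3))
```

Notice that hits on balls in play and the order of events never enter the calculation, which is why FIP tracks the percentiles more closely than ERA does.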

So, lemme ask: what do you find most useful in the percentiles? Would you rather they reflect the extent to which we know pitchers have skill in preventing runs? Or would you rather the percentiles reflect the rather considerable noise in measuring a pitcher’s performance (really, the performance of a pitcher and his teammates at preventing runs)? Drop me a line in the comments and let me know.

Or you could talk to me about that (or anything else related to PECOTA, or baseball stats in general) in a few hours, when I chat live starting at 1 ET, as the finale of PECOTA week. And again, this is the beginning, not the end, of a long conversation about PECOTA. Thanks for being a part of it.