
Let’s talk percentiles.

It’s probably the most famous thing about PECOTA-the fact that we provide a range of forecasts instead of just a single point estimate. Earlier this week, I talked about the accuracy of the weighted mean forecasts. But what about the percentiles?

First, some notes about the percentiles. They are derived from the overall unit of production (TAv for hitters, ERA for pitchers), not from the underlying components. This is important, because a hitter who hits more home runs than we expect (I hesitate to call it luck-he may have been underestimated, or he may have found a way to improve his talent) isn’t necessarily going to improve his rate of hitting singles by the same amount, or at all.

What this means is that you can’t look at a single stat (say, hits or strikeouts) and think that’s the range of expectations PECOTA has for that skill. The percentiles are supposed to reflect what we know about the distribution of a player’s skill, but they are in essence the average batting line we should expect from that player if he puts up that level of performance in that season. There are a lot of different shapes that performance could take, however, and that means there’s more variance in any single component than is reflected in the percentiles. So the correct test of the percentiles is the overall level of performance, not the underlying components.

The other thing to note is that the observed performance of any individual player is a function of his playing time-the less playing time a player has, the more variance we expect in his overall performance. Things have a tendency to even out over time (although a tendency is not the same thing as a guarantee), and so the spread of observed performance goes down as playing time goes up. If a player is projected for a full season’s worth of playing time, and only ends up playing 50 games or so, the percentiles are going to be too tight. That’s not a bug-it’s impossible to make one set of percentiles that functions across any amount of playing time.
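
As a rough illustration of that point (a toy binomial model of a generic rate stat, not how PECOTA actually models TAv), the spread of what we observe shrinks roughly with the square root of playing time:

```python
import math

def observed_sd(true_rate, pa):
    """Spread of an observed rate stat around a player's true rate,
    under a simple binomial model: sd = sqrt(p * (1 - p) / PA)."""
    return math.sqrt(true_rate * (1.0 - true_rate) / pa)

# A hypothetical .270 true-talent rate: quadrupling the playing time
# roughly halves the spread of the observed result.
for pa in (150, 300, 600):
    print(pa, round(observed_sd(0.270, pa), 3))
# 150 -> 0.036, 300 -> 0.026, 600 -> 0.018
```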

Let’s start off with the hitters. Looking at only players with at least 300 PA, here’s how the distribution of players looks:


         DIFF20   DIFF40   DIFF60   DIFF80
Overall   23.9%    34.9%    49.2%    63.5%
Up        17.6%    24.7%    30.8%    36.7%
Down       6.4%    10.3%    18.4%    26.8%

Going from left to right: DIFF20 refers to the percentage of players who finished between their 40th and 60th percentile forecasts, through to DIFF80, which represents the percentage of players between their 10th and 90th percentiles. The second row counts the players who landed above their 50th percentile; the third row counts those who landed below it. Adding Up and Down gives you the Overall percentage.
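
Here is a minimal sketch of how those summary figures can be tallied, assuming we already have, for each qualifying player, the percentile of his own PECOTA distribution that his actual TAv landed on (the function name, input format, and the handling of a result at exactly the 50th percentile are my choices, not PECOTA's):

```python
from collections import Counter

def calibration_summary(realized_percentiles):
    """realized_percentiles: one value per qualifying player (0-100),
    giving where his actual TAv landed within his own forecast range."""
    n = len(realized_percentiles)
    diffs = {}
    for width in (20, 40, 60, 80):
        lo, hi = 50 - width / 2, 50 + width / 2
        up = sum(50 < p <= hi for p in realized_percentiles)
        down = sum(lo <= p <= 50 for p in realized_percentiles)
        diffs[f"DIFF{width}"] = {"Overall": (up + down) / n,
                                 "Up": up / n, "Down": down / n}
    # Decile histogram: a well-calibrated system puts ~10% in each bucket.
    counts = Counter(min(int(p // 10), 9) for p in realized_percentiles)
    histogram = {f"{10*k}-{10*(k+1)}": counts[k] / n for k in range(10)}
    return diffs, histogram
```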

What we should want to see is DIFF20 equal to 20 percent, etc. We don’t quite see it, though. It may be a bit more helpful to look at a histogram:

Histogram of percentiles.

The first thing that sticks out should be the fact that most players are in the 50th to 60th percentiles, by a large margin. Why? Fundamentally, players who perform above their expectations are more likely to get playing time than players who perform below their expectations. This isn’t something that should surprise us-this is why we have the weighted means forecasts for PECOTA, which explicitly takes this fact into account. (This is also probably the explanation for why DIFF20 exceeds 20 percent.)
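
To see why a playing-time cutoff alone pushes the surviving players above their median forecasts, here is a toy simulation (all numbers made up; this is not PECOTA's machinery): players whose first half goes badly enough lose their jobs and never reach the cutoff, and the players who remain skew high.

```python
import random

random.seed(1)

N = 20_000
kept = above_median = 0

for _ in range(N):
    # Season-level "luck" in each half, in arbitrary units around an
    # unbiased projection of zero.
    first_half = random.gauss(0.0, 1.0)
    second_half = random.gauss(0.0, 1.0)

    # Players who struggle badly early lose playing time and never
    # reach the (hypothetical) 300 PA cutoff, so they drop out.
    if first_half < -1.0:
        continue

    kept += 1
    if (first_half + second_half) / 2.0 > 0.0:  # above the 50th percentile
        above_median += 1

print(f"kept {kept} of {N}; {above_median / kept:.1%} above the median")
# About 58% of the surviving players beat their median forecast, even
# though the forecasts themselves were unbiased.
```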

But there’s also more variation in observed performance than what the percentiles expect. Let’s consider the reasons we see variation from what our projections expect. The first point I want to make is that forecasting is not mathamancy;  there’s no such thing as a perfect forecast, except in hindsight. PECOTA utilizes a two-stage process:

  1. As described earlier this week, we generate a baseline forecast based on a player’s past performance, and
  2. We adjust for our expectation of how a player will age, using baseline “forecasts” for comparable players to create a custom aging curve-what Nate Silver would refer to as the “career path adjustment.”

Both of those estimates are subject to a measure of uncertainty. The third source of variation is simply randomness. We use the observed variation of the performance of the comps to model this variance.
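
A sketch of how those three sources could combine into percentile lines, assuming (purely for illustration) that the errors are independent and roughly normal; the variance numbers below are invented, not PECOTA's:

```python
from statistics import NormalDist

def percentile_lines(mean_tav, var_baseline, var_aging, var_random,
                     levels=(10, 20, 30, 40, 50, 60, 70, 80, 90)):
    """Independent error sources add in variance; each percentile line
    is then mean + z * total_sd under a normal assumption."""
    total_sd = (var_baseline + var_aging + var_random) ** 0.5
    z = NormalDist()
    return {p: round(mean_tav + z.inv_cdf(p / 100) * total_sd, 3)
            for p in levels}

# Hypothetical player: .280 mean TAv, with most of the variance coming
# from plain randomness rather than the baseline or aging estimates.
print(percentile_lines(0.280, 0.0002, 0.0001, 0.0006))
```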

Not all forecasts have the same expected variance, though-it seems as though some players have more variance in their baseline forecasts than their comparables do. This is a relatively simple fix-the uncertainty in a forecast is largely a function of the amount of data you have on a player. (It’s also something of a function of a player’s skill set, among other things.) When we build a player’s baseline forecast, we can compare the uncertainty in the forecast to the uncertainty of the comps’ forecasts and figure out how much additional variance we need to add to the percentiles.
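
One possible reading of that fix, sketched out (this is my interpretation, not necessarily how it will be implemented): compare the player's own baseline uncertainty to his comps' and widen the percentiles by the surplus.

```python
def extra_variance(player_baseline_var, comp_baseline_vars):
    """If the player's baseline forecast is more uncertain than his comps'
    baselines were, add the difference to the percentile spread (floored
    at zero so the spread is never narrowed)."""
    avg_comp_var = sum(comp_baseline_vars) / len(comp_baseline_vars)
    return max(player_baseline_var - avg_comp_var, 0.0)
```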

We’ve also been treating the uncertainty of a forecast as symmetrical-apparently there’s more uncertainty on the downside than the upside. This is something we can build into our model as well.
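
One way to build in that asymmetry, as a sketch: generate the percentile lines from a split-normal-style quantile function that uses a wider spread below the median than above it (the specific spreads here are invented):

```python
from statistics import NormalDist

def asymmetric_percentiles(median, sd_down, sd_up,
                           levels=(10, 30, 50, 70, 90)):
    """Use a wider sd for percentiles below the median than above it,
    so the downside lines sit farther from the median than the upside."""
    z = NormalDist()
    return {p: round(median + z.inv_cdf(p / 100) *
                     (sd_down if p < 50 else sd_up), 3)
            for p in levels}

# Hypothetical hitter: more room on the downside than the upside,
# e.g. a 10th percentile of ~.229 but a 90th of only ~.312 around .280.
print(asymmetric_percentiles(0.280, 0.040, 0.025))
```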

Now let’s take a look at our pitchers, minimum of 70 IP:

 

         DIFF20   DIFF40   DIFF60   DIFF80
Overall   18.0%    29.0%    37.3%    50.5%
Up        13.6%    19.4%    22.4%    29.4%
Down       4.4%     9.6%    15.0%    21.1%

I should clarify “down” and “up” in this context-up is an ERA below the forecast, down is an ERA above the forecast.

What we see is something similar to the hitters, but much more pronounced. Let’s examine it from a slightly different angle, and look at FIP as a stand-in for ERA:


         DIFF20   DIFF40   DIFF60   DIFF80
Overall   27.9%    42.7%    53.4%    65.4%
Up        23.3%    29.9%    35.5%    38.8%
Down       4.6%    12.7%    18.0%    26.6%

That’s a lot closer to what we saw with the hitters (and of course, everything I said about those applies equally here).
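
For reference, FIP is built only from the defense-independent components. Here is a sketch of the standard formula (the league constant varies by season; 3.10 is illustrative, and this is not necessarily the exact variant used for the table above):

```python
def fip(hr, bb, hbp, k, ip, league_constant=3.10):
    """Fielding Independent Pitching: home runs, walks, hit batsmen, and
    strikeouts per inning, scaled so that league FIP matches league ERA."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + league_constant
```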

What it comes down to, I suppose, is how you define performance for a pitcher. There are three elements to preventing or allowing runs:

  • The pitcher’s ability to affect the batter-pitcher matchup directly (walks, strikeouts, home runs),
  • The ability of a pitcher and his defense to prevent hits on balls in play, and
  • The sequence in which these events occur.

I’ve talked in the past about how those figure into a player’s value. Suffice it to say that the range of the PECOTA percentiles is largely focused on the first element (the one where most of the variation in pitcher skill occurs, and thus the area most relevant to forecasting).

So, lemme ask: what do you find most useful about the percentiles? Would you rather they reflect the extent to which we know pitchers have skill in preventing runs? Or would you rather the percentiles reflect the rather considerable noise in measuring a pitcher’s performance (really, the performance of a pitcher and his teammates at preventing runs)? Drop me a line in the comments and let me know.

Or you could talk to me about that-or anything else related to PECOTA, or baseball stats in general-in a few hours, when I chat live starting at 1 ET, as the finale of PECOTA week. And again-this is the beginning, not the end, of a long conversation about PECOTA. Thanks for being a part of it.

ferret
10/01
Thank you very much for soliciting your subscribers' input. To answer one of your questions, I prefer that the pitching percentiles reflect the hurlers' skill in preventing runs.

Additionally, as this exchange with us continues and develops could you keep us aware of the schedule you are working with as I hope/anticipate your projections (and all Pecota related data) will be available much earlier next year.

Thanks again.
Mountainhawk
10/01
That histogram doesn't look too horrible to me. If you subject it to a Chi-squared test for uniformity, does it pass?
BillJohnson
10/01
The thing is, if the prediction system is optimal, the histogram shouldn't be "uniform," it should take the shape of a Gaussian. Actually, apart from the bottom two bins, it fits that shape quite well, although it's hard to be sure because of the small number of bins. (You might repeat this analysis with twice as many bins that are half as wide, i.e., 5-percentiles-wide bins. A Gaussian shape would be more obvious from that.) The system's main problem is inability to deal with the Cardinals^h^h^h players that have unexpected, complete meltdowns. It'll be interesting to see whether the incorporation of health into PECOTA addresses that. I am dubious -- including past health doesn't necessarily have predictive value for future health-related collapse -- but I'm looking forward to seeing it tried.

TangoTiger1
10/01
Bill: no, it has to be uniform. If PECOTA is saying that something is going to be between the 70th and 80th percentile, then we'd expect 10% of those something to occur between the 70th and 80th percentiles.

You might be thinking of say between the 1.0 and 1.5 standard deviations or something (scale of SD not percentile), and in that case, you would be correct.
Mountainhawk
10/01
I hadn't noticed they left off 0-10 and 90-100; the histogram below does look much worse. Judging by the graph below, PECOTA just doesn't have enough variability in their model.

It almost looks like PECOTA might be picking up process variance ok (that is, the variability caused by the fact that not every player will hit their true expectation every year) but may not be adequately accounting for parameter variance (that is, the variance caused by the fact that your estimate of each player's expectation is likely wrong as well). Obviously, that's not anything I can say for sure, but I've seen percentile graphs like that before, and many times it was from not capturing parameter variance enough.
TangoTiger1
10/01
Right, but not totally. If you compare relievers to starters, you will see that the ranges are similar. And, the variance around the true expectation of starters should be much smaller than that of relievers. You see that in some cases (Felix, CC), but most of the time, that's not the case.

So, it's two failures: a failure on the true estimate, and a failure on the performance.
Mountainhawk
10/01
I was focusing on the hitter stats, but absolutely correct on the relievers vs starters. Both process variance (since the IP you are projecting the ERA for is lower) and parameter variance (since the amount of data you have to make the projection is generally less) ought to be significantly higher for relievers.

I mean, I can see a rookie starter maybe having a higher variance than Mariano Rivera, but in most cases, the relievers should be much more variable around the mean.
BillJohnson
10/01
You're misunderstanding what this distribution is *of*. It isn't what percentile *accuracy* the projections achieve (you are correct that that, by definition, must be uniform across the bins), it's how the projections relate to what the *players* do, and there is no requirement for that to be uniform.

If the projection system was perfect, i.e., had a connection to a wormhole in space that allowed perfect knowledge of what the player was going to do in the following year, every player would be at exactly his 50th percentile and the distribution would look like a delta function. If it overestimated EVERY hitter's performance such that they all performed at their 25th percentile, and underestimated pitcher's performances by the corresponding amount, you'd have two delta functions, one at 25 and the other wherever the pitchers would fall. (Incidentally, it is NOT required that that delta function would be at 75%. Things are more complicated than that.)

Put differently, in the actual distribution, most of the guys whose projections match their performance at roughly the 50% level would be in the top quintile (or quartile or whatever) of prediction accuracy, and by definition, exactly one fifth (for quintile; one fourth for quartile, and so on) of the projections would fall into that bin. The ones whose projections were way, way off -- in EITHER direction -- would be in the bottom quintile. But that isn't the binning that this histogram is showing. This one is only about *player* performance relative to prediction, not *prediction* performance. Clear?
TangoTiger1
10/01
I don't think I am following you. Suppose you have this player:
mean forecast: .330
90th percentile: .370
10th percentile: .290

Actual performance: .375

This player would count in the 90-100 bin.

Are we agreed so far?

BillJohnson
10/01
That is correct, and let us extend things further, to a "league" of five players, whose projections and performances are as follows:

Player 2: mean forecast=.330, 90th percentile=.370, 10th percentile=.200, true performance=.260

Player 3: mean forecast=.330, 90th percentile=.400 (note that there is no requirement for PECOTA's 90th-percentile projections all to differ from the mean by the same delta-tAv, far from it), 10th percentile=.260, true performance=.265

Player 4: mean forecast=.330, 90th percentile=.400, 10th percentile=.260, true performance=.405

Player 5: mean forecast=.330, 90th percentile=.400, 10th percentile=.260, true performance=.350

Then players 1 and 4 would be binned in the 90-100 percentile bin in terms of how they did relative to PECOTA projections, i.e., they grossly overperformed what PECOTA expected; players 2 and 3 would be similarly in the 0-10 bin; and player 5 would be somewhere around his own 60th-percentile performance -- the exact percentile he achieved would be dependent upon more details of the PECOTA projections, but it would be somewhere above the 50th but well below the 90th.

HOWEVER: In terms of how well **PECOTA** performed, player 5 would be in (in fact, would *be*, exactly) the top quintile (because PECOTA nailed his performance compared to how it did on the others), player 1 would be in the next quintile (because PECOTA missed him by .045, which is worse than player 5 but better than all the others), player 3 would be the middle quintile (missed him by .065), player 2 the fourth (missed by .070), and player 4 the bottom (missed by .070). THIS HAS NOTHING TO DO WITH HOW THE *PLAYERS* PERFORMED, which is the subject of the histogram that Colin displayed. It has to do with how *PECOTA* performed. Incidentally, in this particular example, it would show that PECOTA had a chronic tendency to underestimate how these guys hit.

Does THAT clear it up?
BillJohnson
10/01
Sorry, "under*predict* how these guys hit" in that last paragraph, not "underestimate". 3 would have been underpredicted, 2 overpredicted. The total extent of the underprediction would have been greater than the extent of the overprediction. This is entirely normal and acceptable in the real world, btw; PECOTA can't predict what the umpires' (random?...) strike zones are going to be like in the coming year, whether the ball -- or players -- will be more juiced than expected, and so on.
TangoTiger1
10/01
"HOWEVER: In terms of how well **PECOTA** performed"

But this is not the subject of this article or that histogram. That histogram is about tracking the information in your paragraph here:

"Then players 1 and 4 would be binned in the 90-100 percentile bin in terms of how they did relative to PECOTA projections, i.e., they grossly overperformed what PECOTA expected; players 2 and 3 would be similarly in the 0-10 bin; and player 5 would be somewhere around his own 60th-percentile performance -- the exact percentile he achieved would be dependent upon more details of the PECOTA projections, but it would be somewhere above the 50th but well below the 90th."

The counts would be:
n, percentile
2, 90-100
1, 50-60
2, 0-10

That's what the histogram would show from your example.
BillJohnson
10/01
That's correct.

So what does that tell us about the actual histogram, and in turn, what that histogram says about PECOTA? The answer is that the histogram shows PECOTA to do very well with most players, i.e., the top quintile of the PECOTA performance is populated with the guys whose *player* performance was right around their 52nd percentile or so. The next quintile will be ones whose performances were somewhere between their own 45th and 50th, or 55th and 60th, percentiles -- don't take those numbers too literally, they're a SWAG, but probably about right. And so on, with, as it turns out, the lowest quintile occupied by the guys for whom PECOTA missed a collapse.

And that is what is interesting about the histogram. If PECOTA works right, the results *should* cluster around 50th-percentile predictions, and indeed, they do. The width of the nearly-Gaussian distribution centered on that "Schwerpunkt" -- German has a better description for this than the English "centroid" -- is a measure of how imperfect the PECOTA predictions are. If the predictions were perfect, the Gaussian would be arbitrarily narrow (a delta function). If they were purely random, the Gaussian would be infinitely wide. Laying aside the collapses, the message is that PECOTA works pretty well.

Why is that distribution Gaussian? Well, it isn't necessarily *really* Gaussian, but a Gaussian shape is what you expect if players' under- or overperformance is a matter of luck -- getting screwed or helped on BABIP, etc. It is also consistent with the hypothesis that the set {players for whom the information used to form the predictions is exactly correct} is larger than the set {players for whom the information is completely bogus}, with the obvious gradations of correctness in between those two extremes. In other words, these guys do their homework -- but if they did it even better, the distribution would be narrower, except for the "predictions" given to players who collapse.
TangoTiger1
10/01
"And that is what is interesting about the histogram. If PECOTA works right, the results *should* cluster around 50th-percentile predictions, and indeed, they do."

No. For it to work right, the percentiles should remain the same, BUT the estimate of the percentile levels should be much narrower.

For example, PECOTA would give this:
Pujols:
10th .290
50th .330
90th .370

(Or whatever).

IDEALLY, the best forecasting system would give something like this:
10th .310
50th .330
90th .350

That is, the estimate of each level is as tight to the 50th as possible.

However, the histogram *must* show 10% of players (of whatever the population it's based on) in each 10 percentile grouping.

***

You seem to be saying that we should keep it like this:
10th .290
50th .330
90th .370

And then be happy that 95% of the data falls between the 10th and 90th points.

Well, from that standpoint, why not set the percentile ranges so wide to ensure that 95% of the data falls between the 45th and 55th points?

***

I think you are conflating the issue of accuracy with the issue of bias. The histograms here speak only to the issue of bias. They say nothing about the accuracy of the mean forecasts. They only say something about the "accuracy" of setting appropriate ranges.
mickeyg13
10/01
What Tango said.

BillJohnson, you do recognize that PECOTA itself comes with a list of percentiles, and that is what we are evaluating, right? I want to make sure there's no confusion on that. So PECOTA is telling us that it thinks something will happen 10% of the time, so we want to check to see if that does happen 10% of the time.

Now, if we were just using the mean PECOTA forecast, then yes that could very well be Gaussian and would be tighter if PECOTA were better. But that is not what we are talking about here. We are not evaluating how well PECOTA projects mean performance, we are evaluating the very percentiles (really deciles) that PECOTA is giving us.
BillJohnson
10/01
Now we're getting somewhere, TT.

For the projection system to "work right" is one thing; for it to be "useful" is another, quite different thing, and harder to achieve. What you're saying is that an optimally "useful" PECOTA would have the gap between any player's 10th and 90th percentile projections be as small as possible, AND players' performances against the projection system would continue to be clustered right around the 50th-percentile level or a bit higher (for reasons Colin describes). This is entirely fair and to the point.

If PECOTA had perfect knowledge of the forward-looking capabilities of every player -- not what they *will* do in the coming year, but what their skills will *allow* them to do -- then the gaps between 10th and 90th percentile predictions would be much narrower than they are. They still would be non-zero, because there is a random component to how players perform (e.g. fluctuations in BABIP) as well as sensitivity to things the players can't control (e.g. strike zones). For exactly the same reason, there would continue to be players who, even laying aside injury-driven collapse, fail to meet their 10th-percentile, or manage to exceed their 90th-percentile, projections.

What you are asking for, quite reasonably as a paying customer, is a PECOTA where both of these conditions are met: the prediction bands are narrower than they currently are (i.e., the algorithm is well informed), and the actual performances fall in as narrow a distribution around the 50% predictions as luck will allow (i.e., the algorithm "works right"). That's the holy grail of these prediction algorithms -- so perhaps the discussion should turn to how to get there.
Mountainhawk
10/01
No. The bell-shaped curve you are thinking of would come from calculating (Actual TAv - Expected TAv) / (SD of TAv) for each player, then plotting those results in a histogram.

This is plotting percentile results, and by the definition of percentiles, there should be 10% of the players in the 0-10 percentile range, 10% of players in the 10-20 percentile range, etc.
TangoTiger1
10/01
Please guys, don't "minus" Bill's comment. Just because he is wrong in his position, doesn't mean we should not read it. It's a view that many might have shared because they didn't think of it the right way, but it's worth discussing.
TangoTiger1
10/01
Let there be no doubt that Colin is now taking PECOTA by the b-lls. After several years of me shouting from the rooftops and my padded room that the PECOTA percentile forecasts are highly suspect, and providing probability proof, we now have empirical confirmation. Colin shows us how often players who have at least 300 PA had their TAv land in each of the percentile ranges (10-20, 20-30, ... 80-90). In a perfect world, you'd have 10% of all players in each 10% group. In a real world, we'd expect say 8-12% across the board. But, this is not at all what we get. While Colin showed the numbers for the 10-90 group, he did not show the 0-10 and the 90-100. We do know that the total of these two groups is 36.5% (Colin reports that the 10-90 group is 63.5%). So, here is what that chart looks like if we just split the 36.5% evenly in the two extreme groups:

[chart: the percentile histogram with the 0-10 and 90-100 groups included]

(Note to Colin: I definitely think you should update your chart to reflect the 0-10, and 90-100 numbers. I think this makes it far more clear, considering that the area above the 10% line has to equal the area below the 10% line.)

That's for hitters. For pitchers, it's even worse. 50% of the pitchers (min 70 IP) had an ERA outside the 10-90 percentile ranges, whereas we would have expected just 20% total. It's an alarming total. When Felix Hernandez's 90th percentile is 3.20, and he, for two years in a row, achieves an ERA below 2.50, then you know something is dreadfully wrong.

Now, Colin makes a good point that ERA includes sequencing, something we've talked about a lot here in the past few weeks. The equivalent to a hitter's TAv would be a pitcher's peripheral ERA (component ERA, BaseRuns ERA, or what have you). If we do that, we get something similar for pitchers as for hitters. Therefore, if the test is not going to be against ERA, but peripheral ERA, then the PECOTA percentile page should show the header as peripheral ERA.

Nonetheless, a huge issue.

Thanks Colin. You should be proud for doing the right thing.
TangoTiger1
10/01
The chart is here:
flirgendorf
10/01
TangoTiger,
I have to disagree with your assertion that in a perfect world you'd have 10% of all players in each 10% group, because we have selected a group of players who reached 300 PA. As Colin already said:

Fundamentally, players who perform above their expectations are more likely to get playing time than players who perform below their expectations.

So there. Also, the ERA distribution was also explained by the fact that PECOTA was not accounting for uncertainty in defense/etc and was just looking at true ability. I personally would prefer the percentiles to account for luck/defense/etc but you haven't provided proof that the percentiles are not doing what they are supposed to be doing.
Mountainhawk
10/01
If you are going to call them percentiles, you should have 10% of the players fall into 10 point wide ranges of percentiles.
TangoTiger1
10/01
Right. To put it another way: under what conditions should we see 10% of the players exceed their 90th percentile forecasts?

That is, what subpopulation of the 1000 batters in 2010 are we looking at?

And once you look at that subpopulation, do we also see 10% of them going below their 10th percentile forecasts?

I would bet that there is NO subpopulation that you can select where the percentiles come anywhere close to 8%-12% in each 10% bucket.
flirgendorf
10/01
You should expect that if every player received a full season's worth of plate appearances that 10% of all players would fall into each decile. However, because players who underperform their projection to the extent that they are below replacement level may get replaced, they will not reach 300 plate appearances and as a result will not be counted in the study. If the study examined every player who was in his teams' opening day starting lineup, your argument for uniform distribution would be valid. However, the study selected for players whose teams decided to play them for 300 plate appearances, and therefore a bias was introduced and one should no longer expect players to be uniformly distributed in the deciles.
Mountainhawk
10/01
Fine, but unless you think teams are horrible at putting the best players on the field, that should result in an upward sloping graph (fewest players in 0-10% and most in 90-100%) and not the U shape in Tango's graph or the downward slope in the graph in the article.

Variance is a PITA to estimate, and PECOTA and BP wouldn't be the first (or last) to have underestimated just how much noise there really is in the data.
mtr464
10/02
"Fine, but unless you think teams are horrible at putting the best players on the field, that should result in a upward sloping graph (fewest players in 0-10% and most in 90-100%) and not the U shape in TTango's graph or the downward slope in the graph in the article."

But, you are ignoring the fact that players that are proven "starters" will be given much more leeway, even when they are failing to put up replacement level offense (see: Aaron Hill and most of the Cardinals in 2010). So, we should expect a large cluster near the bottom while these players continue to get playing time.
TangoTiger1
10/01
Then you would get a right-skew. Instead of it being:
10, 10, 10, 10, 10, 10, 10, 10, 10, 10
It would be:
4, 5, 6, 7, 8, 10, 12, 14, 16, 18

Or something.

And no way do we see anything like that.

But, the point still stands, if PECOTA is saying: "I expect this player to exceed his 90th percentile 10% of the time", then how are we to evaluate that?

Is it that we are to look at all 1000 batters in MLB, and have no PA minimum? Is it that the claim will only exist if the player is allowed to have 300 PA?

PECOTA is the one making the claim. Therefore, let's see what the conditions are in which we expect the 90th percentile to be exceeded, and let's test it based on that basis.

***

In any case, the #1 problem with the percentiles is that the uncertainty range has to be based almost entirely on the sample size of the player's past performance. And this is not at all what PECOTA has been doing.

Colin himself acknowledges it exactly:
"This is a relatively simple fix—the uncertainty in a forecast is largely a function of the amount of data you have on a player."

crperry13
10/01
I just noticed that if I accidentally click "Submit comment" without any content, I get an assertive error saying "Your message appears to be blank. Stop that."

Love the attitude. Real comment to follow...
crperry13
10/01
This is a general comment on these past few articles.

Look...90% of us are casually interested fans of sabermetrics and advanced statistical analysis. We don't have a horse in the race of which-system-is-better, nor do we take sides in the which-site-has-better-analysts debate.

That said, the nitpickiness, contradiction, and barely-veiled undermining from competitors and detractors on these comment boards are really annoying.

Someone said it well yesterday - aren't you all batting for the same team here? Aren't we all trying to raise usage of advanced baseball analysis? This infighting is just dumb. Stop driving the casual fans away with your annoying bickering about who is better than who.

mickeyg13
10/01
It's not about "who is better than who." Forgive me for putting words in his mouth, but I'd bet that Tango, for instance, would LOVE for PECOTA to absolutely destroy Marcel and for the percentiles to work as we expect them to. He's not worried about some chest-thumping competition to prove the superiority of "his system." He pretty much tells you that "his system" is the worst possible acceptable system and he pleads for you to do better if possible.

If there are flaws in existing techniques, it is a good thing for the entire baseball analysis community for those flaws to be corrected. We all want to see the best possible analysis whether it comes from BP or somebody else. I think the main problems arise when a lack of transparency (perhaps coupled with some marketing hype) gives something the illusion of being the best it could be when it in fact is not. I applaud Colin for doing the dirty work to make the process much more transparent and therefore much more open to improvement.
TangoTiger1
10/01
Ditto.
jrmayne
10/01
Yeah. This.

Talking about the competition, and talking about the flaws in PECOTA is good for everyone. Colin has addressed a number of critical issues. Tango's clearly interested in furthering the field; on his own blog he often touts others' research (and sometimes critiques others' research.) It's not just a "My game is better," thing.

I have little doubt that the majority of readers aren't supremely interested in the fine details - but the ones who are are worth something to the system.

Tango's point in the comments is very well taken; I have always been curious about the failure to have bigger ranges for players with limited histories (though the issue isn't just with players with limited histories, obviously.)

crperry13
10/01
Then maybe it's a phrasing thing, because it doesn't always sound constructive to me. (This is not just directed at TT, either) There's a huge difference between working to correct flaws (which I approve of) and mocking issues that haven't been addressed yet or aggressively criticizing the people behind the work.

For the most part, there's nothing offensive in this. But there's enough finger pointing, accusing, and comparisons going on to make it really annoying to a casual reader.

By no means stop suggesting improvements. Just be aware of how you might come across if you choose your words poorly, that's all.

The "minus"es to my original comment were expected. There's a feature I wouldn't mind seeing go away, since it amounts to an opinion popularity contest.
TangoTiger1
10/01
"But there's enough finger pointing, accusing, and comparisons going on to make it really annoying to a casual reader."

I think you should be more explicit by pointing to actual examples. I will grant you that as a casual reader who might be giving cursory views to comments, it may seem combative. But, once you go deep into it, we're all a happy sabre family.
crperry13
10/01
I'm not making this up, but I really don't want to post quotes, though I could. From this thread. We all appreciate your work and the work of the people who manage BP. Let's not get nasty. :)
TangoTiger1
10/01
"From this thread. "

Well, you should, because how do we get resolution to problems unless we see the problem. And maybe it's not a problem, but a misinterpretation. As it stands, you pointing "something" out means nothing at all, since we (I) have no idea what you are talking about.
crperry13
10/01
Check your email. :)
TangoTiger1
10/01
Can you forward to tom~tangotiger~net (replacing ~ as appropriate). I can't check my Yahoo account from the office.
mtr464
10/02
Tom, I love your work and appreciate your comments here (generally pointing out things that I didn't think of, but can pretend I did when talking with friends), but you can come off as being arrogant. I hate to say it, because I don't want you to change your comments, or style (they are useful to the discussion here). But, for people who just see you as a competitor (or even just a random guy) trying to nitpick, I can understand why people can misunderstand you. I think the issue is you constantly prodding for proof or clarification, which is absolutely a necessary thing and people don't understand that. People see it as being annoying. I appreciate it (even if I don't always agree) and I hope you continue to annoy people here, to better advance our knowledge.
crperry13
10/02
Just to clarify, I learned that Tango's comments above were direct copy-paste from his own blog. In the context of his own blog, I don't think there's anything arrogant or elitist. Thing is - nobody here knew it was copy/paste so it probably sounded a little stronger than something he would have written here from scratch. To clarify further, my initial comments weren't directed at Tango specifically.
BillJohnson
10/01
Agreed. The important thing is for Colin and company to understand what PECOTA is really doing, and what steps need to be taken to make it better.
leites
10/01
Prefer "the percentiles reflect the rather considerable noise in measuring a pitcher’s performance" . . . in others, the real-world projection.
joelefkowitz
10/01
So what's the (seven percent) solution?
erhardt
10/01
Cocaine. It's from Sherlock Holmes.
MHaywood1025
10/01
I first thought of the chapter out of the biography of one Richard Feynman. Pretty sure it had the same name as well.
rowenbell
10/01
I believe you're right -- as I recall, one of the anecdotes in that Feynman chapter involved new data that was different by 7% and they were trying to work out whether that 7% improved or worsened the experimental fit to a theory he was developing -- but, of course, Feynman's choice of title to that chapter was an allusion to Holmes.
dbiester
10/01
I am a casually interested fan of sabremetrics most interested in using the projections to get a better handle on likely outcomes in the upcoming baseball season so I don't get crushed in my fantasy leagues. This year I got crushed in my fantasy leagues, and I blame society.

Why are so many people falling off the charts? I guess what I'd like to get from the projections is a pretty good idea of a likely upcoming season from a player, and then the probability of a windfall or a disaster. Not sure if the percentiles is the best way to do that given that this chart seems to be showing so many actual performances in the outlier columns. This may also contribute to inaccuracies in the star/scrub graphic for each player, I don't know, but if so it is not a good thing.
dalbano
10/01
I have to say getting crushed in fantasy isn't any fun, as I was crushed this year after finishing in the money all 7 years of my league. After having great keepers, and what I thought was a great draft, I was easily an early favorite to start the year. A good chunk of my team wound up having a HORRIBLE first 4 months of the season. I was in last place out of 12 teams the entire time. If the season were extended, to say...January, I think I would have a good shot of winning it. Instead, I will be right in the middle after having an extraordinary final two months.

That being said, in a 40,000 ft view, it is EXTREMELY difficult to forecast baseball statistics for a single season on a large scale(obviously!) The moral of my story is, while we definitely strive for a successful forecasting model while looking at the next immediate season, I believe PECOTA does a great job of identifying performances over 900-1200 at bats and 300-400 innings.
devine
10/01
"This year I got crushed in my fantasy leagues, and I blame society."

This may be the funniest thing I read all morning. Thank you.
flirgendorf
10/01
I would prefer that the percentiles for pitchers reflect the noise; couldn't you just include SIERA in the forecasts for those curious about the "true ability"?
bmarinko
10/01
Maybe I missed something in the article, but are these stats/charts just for 2010, or for some larger time span? If it's just for 2010, do other seasons have distributions that look similar?
ScottBehson
10/01
Forgive me if this is naive, but isn't the trimodal distribution we see in Tango's chart above (and implied in Colin's in the article) perfectly explainable?

Those in the first hump (players who vastly underperformed their projections) are those who missed significant playing time due to unanticipated injury or got sent to the minors.

The second hump are those PECOTA nailed really well.

The third hump are those who got unexpected playing time (non-top-prospects who got a lot of ABs or IP- like Thole or Leake; those who overcame past injuries to play a whole season).

Average these together, and PECOTA comes up roses.

Colin, others, what say you?
ScottBehson
10/01
Put more simply- PECOTA seems to miss very high, very low, or totally nail player performance. Can't this be mostly attributed to unexpectedly low or high amounts of playing time (due to injury or promotion/demotion)?
TangoTiger1
10/01
"Those in the first hump (players who vastly underperfromed their projections) are those who missed significant playing time due to unanticipated injury or got sent to the minors."

I highly doubt it. But even if that's the case, then what in the world does "10th percentile" forecast mean? If you want to say that 50 of the 250 players (or whatever) with at least 300 PA had a very down year, then why are you setting the benchmark so high that 20% of the players reached a level that you said only 10% of the players should reach?

That is, vastly underperforming, or sent to the minors, while still reaching 300 PA is not a phenomenon limited to the year 2010.

You are *starting* with the position that only 10% will reach some baseline (hence the 10th percentile). Then you have to ask: "how much below my mean will that be?". And if the performance level of a group of such players that you thought should have had a .270 TAv did in fact reach only .230, then that's where the 10th percentile forecast should have been made, and not the .240 or .250 level that IS being made, such that 20% (instead of 10%) get below that level.

(All numbers for illustration purposes only.)

It goes back to exactly what I am saying: once you decide on the parameters of your subpopulation, then it's at that point that you test for the 10th and 90th percentile.

As it is, we have no way to test, because we are not being told what subpopulation to test against.
bwilhoite
10/01
Maybe I'm misunderstanding how this is supposed to work, but I don't think the projections ARE saying this is a level of production 10% of the players should reach, but IS saying this is a level that this particular player is projected as having a 10% chance of being worse than and a 90% chance of being better than.

With that, you should see players grouped around the 50th percentile, with gentle downward slopes in each direction. I would also expect to see an upward slope, or even a spike, as you near the 0-20% ranges as those players will begin to receive less playing time as a whole (and will not have time to have their numbers even out - getting the short end of the small numbers stick), and maybe, though I'm not certain here yet, a slight uptick at the 90th percentile for those players that exceed their projections (because no one saw them coming) or that the system has a hard time nailing down (Jose Bautista & Carlos Gonzales from the first group and Ichiro, etc... from the second).

It's not a measure of player percentiles or normalization from the league projections, but of how an individual player is likely to perform. In a perfect projection world every player would be right at the 50th percentile and every other percentile would be empty.
BillJohnson
10/01
Now you're getting there.
Mountainhawk
10/01
No! That's not what percentiles mean. If PECOTA had 100% of players between the 45th and 55th percentiles (or the 49th and 51st percentiles) or whatever, then their prediction system is just as broken as it is when 40% of players are outside the 10-90% confidence interval.
joelefkowitz
10/01
Maybe I'm misunderstanding how this is supposed to work, but I don't think the projections ARE saying this is a level of production 10% of the players should reach, but IS saying this is a level that this particular player is projected as having a 10% chance of being worse than and a 90% chance of being better than.

Ok, so if every single player has a 10% chance of being worse than his 10th percentile, why shouldn't we expect 10% of all players to hit that mark?
batpig
10/01
"but IS saying this is a level that this particular player is projected as having a 10% chance of being worse than and and 90% chance of being better than."

right -- but as a POPULATION, wouldn't you expect 10% of players to have exceeded their 90th percentile projection?

taking your explanation -- if each individual player has a 10% chance of being worse than their 10th percentile forecast, and a 90% chance of being better.... wouldn't you expect that, when you take the whole set of players, 10% of them would have done worse and 90% better?

if you are laying out percentile bands, you would expect a flat histogram. That is how percentiles work!

now, there are some confounding factors in terms of sample size. Even 300 AB is a low enough cut-off that there will be some noise. Plus the selection bias of "bad players lose playing time". But if every projected player were allowed to play a theoretical season of exactly 1000 PA, the numbers should converge at exactly 10% falling within each 10-percentile band, right?
Arrian
10/01
Apologies if this doesn't make sense, but...

Ok, say a player has an AWFUL start. Further assume he's young (or old, but not prime). He may well find his (MLB) season over. Down to the minors or riding pine. Had he kept playing, he might have improved enough that his overall stats ended up being closer to his projection, but the team couldn't/wouldn't wait for him to make the adjustment (or his luck to change). So he ends up with 300 ABs instead of, say, 600.

Or let's consider the player who, due to injury, plays a partial season - and plays spectacularly well. I'm thinking of Robbie Cano's 2006 here. He missed a month or so with a hammy injury. He hit .342/.365/.525. Had he gotten more plate appearances, I would have expected him to come down to earth a bit. But he didn't get them - he was on the DL.

Does any of that help explain how a system (any system, whether it's PECOTA, CHONE, etc) might miss?
lucasjthompson
10/01
"There are a lot of different shapes that performance could take, however, and that means there’s more variance in any single component than is reflected in the percentiles. So the correct test of the percentiles is the overall level of performance, not the underlying components."

There is no way you'd get this by looking at the PECOTA cards. If the percentiles don't mean anything for the broken-out components, don't make it look like they do. Among other things, it just makes the system look bad, when, just for one example, a guy like Mauer, who's essentially hitting his 50th percentile TAv projection right on the nose, has a number of home runs this season that the system _appears_ to say is nigh impossible.

Just my 2 cents. Enjoying the series.
batpig
10/01
thanks for posting this Colin, and for finally providing some transparency for PECOTA. I posted the question about the accuracy of the percentile bands in one of the previous articles, and I'm glad to see it looked at in some detail.

It has long been my suspicion that the percentiles did not account for enough variability in performance (Felix Hernandez being just one prominent example), and this data proves that it is, indeed, a problem. I am in full agreement with TT that the fact that 36.5% of hitters are falling outside the 10/90 percentile bars is an alarming result.
TangoTiger1
10/01
On my blog, someone asked me this:
"what do you make of the large clustering of outcome in the middle decile? "

I responded:
I just took one guy to see what the shape looks like. This is ARod

90o 0.323
80o 0.315

70o 0.309
60o 0.298
50o 0.288

40o 0.282
30o 0.280
20o 0.277
10o 0.273

Look at the gap between 50th and 70th: 21 points. That's way wider than anywhere else.

So, the reason that PECOTA is capturing so many players in the 50-70 range is because it provides such wide latitude at the 50-70 range.

It won't catch much in the 30-40 range, because, well, look, there's almost no gap there.

I don't know if ARod is an example or an exception.

But, given that I've seen funny stuff, like Felix having a WORSE forecast at the 90th level than the 80th level, I think there is a serious programming bug as well.
BillJohnson
10/01
This is a fair criticism, IMO, and I'd like to see an explanation. Consider the following question that your analysis implies: is the difference between a player's 40th and 60th percentile predictions (based on tAv) less than, equal to, or greater than the difference between his 70th and 90th percentile predictions? A sort-of-random check of a reasonable number of players (will give details if interested) reveals that in only about 5% of all cases is the 40-60 gap smaller than the 70-90 gap. In all the others the difference is equal or larger for the 40-60 than for the 70-90, just as it is for A-Rod. Whatever's going on here, it's systematic rather than an exception that you happened to pick.

The reversal of EqERA between King Felix' 80th and 90th percentile predictions is way down in the noise and not necessarily significant all by itself; the reason why one is the 80th and the other 90th has to do with the number of innings pitched, which differs considerably for the two so that VORP/WARP/etc. also differs by more (and in the right direction...) than EqERA implies. It is noteworthy, however, that his real-life performance far exceeds BOTH the 90th-percentile EqERA and the 90th-percentile number of innings pitched. Note also that fellow studs Doc Halladay, Adam Wainwright, Josh Johnson, etc., also exceeded their 90o EqERA by significant margins (although not always the 90o IP). So yeah, it sure looks to me like you've found a real bug here when it comes to exceptional performances.
DrDave
10/03
Probability theory makes it very clear that the (true) width of the percentile bands MUST get wider as you move away from the mean. Any set of percentile forecasts that don't obey this are (forgive the term) nonsense.

"How much wider should the 70-80 band be, compared to the 60-70 band?" is an interesting and difficult analytical question. Whether or not it should be wider is not.
TangoTiger1
10/03
Agreed, it *must* be wider. And PECOTA instead has it narrower.
kantsipr
10/03
Is that statement true in general, or only for a subset of distributions, including the normal distribution? A Poisson distribution demonstrates the behavior you describe for the high end but not necessarily on the low end. And I can certainly define a probability distribution that does not behave as you describe.

Incidentally, I've often wondered whether in modeling pitchers performance, it would make more sense to use a Poisson distribution rather than a Gaussian.
TangoTiger1
10/04
When would the range of the 0th to 10th percentile ever be smaller than the 40th to 50th percentile as an estimate of the true mean?

It's like saying this:
50th: .300
40th: .290
30th: .280
20th: .270
10th: .260
0th: .255

It's not going to happen. This is what it would look like:
50th: .300
40th: .290
30th: .278
20th: .262
10th: .242
0th: .212 (or .000 technically)

All numbers for illustration purposes only.
DrDave
10/04
It's true for any distribution with a single central peak near the mean/median. By definition, the quantiles are closer together where the density (or pdf) is highest, and farther apart where it is lower.

Incidentally, if you're talking about using a Poisson to model the number of runs allowed (or scored), it's better than a Gaussian but still not right. 0 occurs too frequently (IIRC) relative to a Poisson distribution.
TangoTiger1
10/04
The last two links on my site give you the best estimate of run scoring distribution. Keith used this in the BPro annual from a few years ago.
AWBenkert
2/11
A Poisson distribution? Sounds fishy to me!
batpig
10/01
that explanation not only explains the clustering in the middle quintile, it also explains the clustering in the outer deciles (<10% and > 90%).

the spread of percentiles is too narrow, meaning any big miss is going to end up outside those 10%/90% goalposts...
TangoTiger1
10/01
This is Felix Hernandez:

PCT ERA EqERA
90o 3.23 3.31
80o 3.22 3.30
70o 3.27 3.35

60o 3.30 3.39
50o 3.54 3.63

40o 3.57 3.66
30o 3.68 3.77
20o 3.75 3.85
10o 3.87 3.97

Three points:
1. PECOTA is already giving us "EqERA", which is the peripheral or component or luck-free ERA we've been talking about.

2. In addition to that PECOTA is giving regular ERA (which should be much wider because it includes more luck from sequencing events, etc).

3. Look at Felix's forecast at the 80th and 90th levels. Obviously wrong. Look how wide it is at the 50-60 level, and then, how tight it is everywhere else. You are naturally going to capture more players in the 50-60 level if you are putting in estimates that are much wider at those levels.
bmarinko
10/01
TangoTiger - forgive me if you have answered this question somewhere else, I'm not that familiar with your work:
What is your over all opinion of the percentile forecasts? Are they
1) Useful and potentially possible to calculate accurately, once all the bugs are worked out?
or 2) Useful, but probably impossible to calculate
or 3) Not worth doing

Also, this is probably the best series of discussions in a long time. I was on the fence about subscribing next year, but articles like this keep me coming back.
TangoTiger1
10/01
The *idea* behind having uncertainty of your estimated forecast is good. Indeed, we devote several pages in The Book on not only the need, but the method, to calculate the uncertainties. When I publish the Marcels, I include a "reliability" figure, which acts in a similar way.

Colin is accepting the position I've held, and MGL reiterated, and, really, what any stats professor would tell you, and that is that the uncertainty of your estimate is based on the size of your observed sample. What has been frustrating for me is that this is so obvious and commonly accepted that I was getting push back on it (not from Colin). Now, Colin is going to be novel about it, and add more to the uncertainty by looking at the kind of player you have (maybe there's more uncertainty in the mean of old players, or fast players, or whatever). That's good, but more important is to get the basics down, which is what he is going to do.

Now, is it necessary to publish the 10th and 30th and 80th percentiles? Why not just say:
Pujols .330 +/-.030 (where that's one standard deviation)

Why does this help? Because you can then do this for Pujols' PA:
Pujols 610 +/- 70

The way the percentiles are currently laid out, it tries to give you both, but it's not really. As Colin noted, it "infers" all the component stats based on the TAv stat.

Why not do:
Pujols
(K/PA): .07 +/- .02
(BB/PA): .18 +/- .03

And so on.

Wouldn't that convey far more information, while using up the same amount of real estate?

(Note: not all things are symmetrical. You can get away with that on the rate stats, but not on playing time. On that one, and that one alone, I would LIKE to see the percentile forecasts.)

You can also follow the thread at my site, where MGL made a good point.
batpig
10/01
I do think publishing the percentile lines, baseball card style, is a useful tool for the most common end-user -- the guy playing fantasy baseball who wants an easy "snapshot" of the range of expected outcomes for Player X, in a readily understandable format.

Your suggestion of:

Why not do:
Pujols
(K/PA): .07 +/- .02
(BB/PA): .18 +/- .03

is totally valid, but not very digestible as an end product for mass consumption. We need a final stat line for the year!
TangoTiger1
10/01
But the range will be virtually the same for all starting regulars, and all starting pitchers. It's not going to differ by any amount that would be of any help to anyone.

If you are talking about rookies and guys with limited playing time, sure.... but bench players you don't care about, and all the rookies will have such wide ranges as to be useless as well.

Same thing for relievers... they'll all have similar ranges.

So, I see no practical use for a Fantasy player for the ranges.

What you DO want to have the ranges for is playing time. That's where the value is.
sensij
10/02

I strongly disagree here too. Wouldn't the range for guys like Adam Dunn or Ichiro be significantly smaller than the range for a guy like Aubrey Huff? As a fantasy player, you want to make decisions based on risk-reward, and there are times when you might prefer mediocre but reliable performance to going for all or nothing.
BillJohnson
10/03
It doesn't really need to be "virtually the same for all starting regulars," because the players and their comparables don't have "virtually the same" skills and limitations. If there is significant scatter in the way the comparables did in the year following the comparison, for reasons that are apropos to the comparison, then it seems reasonable for the range to be significantly greater than if all the comparables turned out about the same. (Suppose a starting pitcher's best comp is a young Bret Saberhagen, with his weird alternating-years-of-greatness-and-mediocrity thing, for example.) If you want to say that "most" starting regulars should have fairly similar ranges, I'd agree. But not "all" or anything close to it.

I'm not a fantasy guy, so no comment on the practicality question, but for one who just loves the game and strives to understand it, the ranges are nice to see -- if they work.
TangoTiger1
10/03
"the ranges are nice to see -- if they work."

and

"If you want to say that "most" starting regulars should have fairly similar ranges, I'd agree. But not "all" or anything close to it."

In reality, you are right. Insofar as what the data can possibly tell us, then our ESTIMATES will have their ranges virtually all similar (beyond whatever their past number of PA would indicate). Only cases like Ben Sheets or other players with injuries will be exceptions.

Otherwise, I would be shocked if the 90th percentile of every player is not something like mean TAv + 1.15 to 1.20 and the 10th percentile is not TAv -1.25 to -1.30. Something along those lines.

If someone is arguing that you are going to have some players at TAv +1.10 and others at TAv +1.40, then I don't think your expectations are going to be reasonable.

(Again, presuming we are looking at similar past PA for the players in question, and injuries notwithstanding.)

You might get a skew based on age, but again, that would apply across the board to everyone at that age.

Anyway, let's see what Colin will discover with the refreshed PECOTA, let him make his claim, and then just test it.
TangoTiger1
10/03
Note: all numbers for illustration purposes only.
penski
10/02
Along similar lines, I would like to see PECOTA replace the percentiles with a single weighted forecast, followed by three reliability numbers, perhaps on a scale of 1-10 for simplicity of presentation. The numbers would represent the confidence of the key elements of the projection.

1) How reliable is the baseline? A player with a long, consistent MLB performance history has a more reliable baseline than Matt Weiters in April, 2009.

2) How reliable is the similar player pool? A player who has a lot of similar past players to compare would have a more reliable number than Ichiro.

3) How much variance is there in the projections? Based on the similar players available, how much variety in performance has there been? Presumably certain types of players have a smaller range of potential performances than others.

I don't know if it's possible to assign numbers in this fashion, but if so, then it would also be easier to test the accuracy of the system by comparing apples to apples. This may also reveal particular weaknesses/biases of the system.
sensij
10/02

I would think that publishing the numbers with a ± interval is systemically wrong. You say that not all things are symmetrical... I would imagine that very few really are. The assumption of normality that is implied when you start using "standard deviation" is also a big leap. Publishing percentile data shows at least a rough outline of what each player's probability distribution looks like, which then allows factors like "breakout" or "collapse" to be included in the model based on the comps.
tbwhite
10/02
I think part of the problem is that two distinct measures keep getting conflated: a player's talent level or ability, and playing time.

The number one thing I want from a projection system is some sort of assessment of the player's potential or ability. How good is this guy now, and how good can he be in the future ?

Once you have that information, then you can do things like run thousands of simulations of the next season and present a range of outcomes that we may see from that player.

In essence, the first part is rate stats, and the second part is to apply those rate stats to produce actual projections of things the typical fantasy player cares about: counting stats like homers, steals, strikeouts, etc.

My feeling is that PECOTA tries to sort of do both of these things at once which I think is a mistake. Perhaps it's not actually doing both things at once in terms of modeling, but at a minimum the presentation feels that way, and I think it muddies the water and is very confusing.

I would much prefer a very clear and distinct assessment of player ability or potential(without regard to projected playing time). That is information that I can't easily provide or create myself(at least in the comprehensive way that PECOTA can). What I can do on my own is get a handle on playing time from various internet sources. The competitive advantage that we as PECOTA users could enjoy I believe comes from applying our personal knowledge/hunches about personnel situations combined with PECOTA's knowledge of what a player might do if given a chance. But if you only present the data through the prism of how BP projects playing time, it makes it harder for me to get the information I really want from PECOTA.
brownsugar
10/02
Just wanted to say that I have really enjoyed the Reintroducing PECOTA series of articles on several levels, including the clarity about the system that the articles have provided, the discussion that the articles have spawned, and (not least) the amount of effort that has gone into revamping the system to make it better. I think most would agree that 2009 and 2010 were not exactly banner years for the system; these articles have restored my confidence that the 2011 PECOTAs will be what I've come to expect and enjoy. Great work Colin.
BurrRutledge
10/02
This is amazing. And I'm very happy that BP is publishing it for our consumption and analysis. The results are truly astounding.

If the sampling that Colin chose should be representative (300 PAs), then the percentiles are busted. Absotively busted. Or, at the very least, they're not going by the right name.

In this sample, PECOTA actually does a pretty good job at capturing the center - 23.9% of the sample fell within the 40-60th percentiles. Not too shabby at all. But, as Colin points out, and Tom and others expound upon, the percentiles above and below this midpoint are under-predicted, until you get to the way-way outliers.

It's pretty straightforward to see that the overall spread is too tight, and would need to be widened to re-capture the true 0-10% and 90-100% ranges and distribute them into the troughs.

As several commenters have done, it's easy and kind of fun to rationalize why a player might perform under his 10th percentile, or over his 90th. But, at the end of the day, only 10% of the overall population should fall into those categories if they are really percentiles at all (and if this sample is representative).

As I said, this is amazing - to see this information analyzed and published for "review." I almost say "peer review," but that would imply something I'm not willing to accept.

Kudos to Colin, Kevin, and the rest of the team. I look forward to the offseason developments with excitement.
BurrRutledge
10/02
correction... "But, at the end of the day, only 10% of the overall population should fall into *each of* those categories if they are really percentiles at all (and if this sample is representative)."

Again, kudos. Have a great weekend watching the regular season come to a close! Best, Burr
BurrRutledge
10/02
Okay, I came back for a second read through the article. Something I rarely do, but here I am. What was bugging me was my recollection of the 2010 Player Card 10-year forecasts that several readers (including me) called into question. For almost all the players, but particularly for young prospects, the 10th percentile just seemed way too low, and if I remember correctly, there had been a programming update that introduced feedback into the algorithm. Players/prospects were out of baseball very quickly at their 10th (and even higher) percentile of performance.

In the context of this observation from Colin: "... apparently there’s more uncertainty on the downside than the upside. This is something we can build into our model as well;" I wonder if our initial observations were incorrect.

It might be worth revisiting.