Baseball Prospectus home


March 28, 2013

BP Announcements

PECOTA Percentiles Are Here

by Colin Wyers

PECOTA percentiles are now available to subscribers.

Those of you new to BP, or to PECOTA, might wonder why we publish percentiles in addition to the weighted-mean projections for players, which we’ve already released. The answer is that forecasting is an inexact science; the future is not exactly what you'd call certain. The percentiles allow us to put a range of outcomes around a single-point forecast, to illustrate how uncertain the forecast is and which range of outcomes is most likely.

The percentiles, then, represent the spread of outcomes if we were to have a player go through the 2013 season thousands upon thousands of times. Imagine a bell curve, with the 50th percentile at the very peak. Twenty percent of the time, a player's results should fall between the 40th and 60th percentiles. Put another way, 60 percent of the time, a player should perform at his 60th percentile or worse, while 40 percent of the time, he should play better.
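To make the "thousands of seasons" idea concrete, here is a minimal sketch that replays a season many times and reads off the deciles. It is an illustration only, not how PECOTA builds its percentiles: it assumes outcomes are normally distributed around the weighted mean, and the standard deviation of .024 is a hypothetical value picked so the 90th percentile lands near the .345 quoted for Stanton. The function name `simulated_deciles` is likewise invented for this example.

```python
import random
import statistics

def simulated_deciles(mean, sd, n_seasons=100_000, seed=0):
    """Replay a season n_seasons times and return the outcomes and deciles.

    Assumption: outcomes follow a normal (bell) curve. `sd` is a
    hypothetical spread, not a PECOTA parameter.
    """
    rng = random.Random(seed)
    seasons = [rng.gauss(mean, sd) for _ in range(n_seasons)]
    # quantiles(n=10) returns 9 cut points: the 10th, 20th, ..., 90th percentiles
    cuts = statistics.quantiles(seasons, n=10)
    return seasons, dict(zip(range(10, 100, 10), cuts))

# Weighted-mean TAv of .314 comes from the article; sd=0.024 is assumed.
seasons, pct = simulated_deciles(mean=0.314, sd=0.024)

# Share of simulated seasons landing between the 40th and 60th percentiles
share_40_60 = sum(pct[40] <= s <= pct[60] for s in seasons) / len(seasons)
# Share of simulated seasons at the 60th percentile or worse
share_at_or_below_60 = sum(s <= pct[60] for s in seasons) / len(seasons)

print(f"10th: {pct[10]:.3f}  50th: {pct[50]:.3f}  90th: {pct[90]:.3f}")
print(f"between 40th and 60th: {share_40_60:.1%}")
print(f"at or below 60th: {share_at_or_below_60:.1%}")
```

The band between the 40th and 60th percentiles should capture close to 20 percent of the simulated seasons, and about 60 percent should fall at or below the 60th percentile, which is what the paragraph above describes.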

As an example, let’s take a look at Giancarlo Stanton’s percentile forecasts (click to enlarge):

Our best estimate is that Stanton will be about a five-win player, with a .314 TAv. If he plays to his 90th-percentile projection, though, he could post a .345 TAv and be about as valuable as NL MVP Buster Posey was last season. And if he disappoints to the tune of his 10th-percentile projection—well, he’d still be a pretty useful player. Giancarlo Stanton is really good at baseball.

You can find the percentiles in the “2013 Forecast” section of the player cards (not the box at the top, with the basic projections—scroll down, or select the “PECOTA ONLY” tab, and you’ll see it).

A few more notes might be helpful here. The basic inputs to the percentiles are:

  • The reliability of the forecast,
  • The level of talent forecasted, and
  • Expected playing time.

Percentiles for batters cover offense (not fielding or baserunning, except as a function of playing time and opportunities). The percentiles key off the primary rate stat for each type of player, TAv for hitters and ERA for pitchers. The component stats are meant to illustrate a likely set of stats that could lead to that level of production for that player. What this means is that a hitter’s percentiles in home runs, for instance, reflect the home runs that would lead to that TAv, assuming similar changes in the other stats in a hitter’s batting line, not the chance of hitting that many home runs. There are many different batting lines that can lead to any one TAv.

We’ve tested the percentiles against historical data, and we can report that they behave how you’d expect—80 percent of batters fall between their 10th- and 90th-percentile forecasts for TAv, for instance.
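For readers who want to run this kind of check themselves, the sketch below counts how often an actual result lands inside the 10th-to-90th band. It is not the test we ran: the function name `interval_coverage` is made up for the example, and the data are synthetic, drawn from a normal curve with an assumed spread, so the result only shows what a well-calibrated set of percentiles looks like.

```python
import random

def interval_coverage(rows):
    """Share of actual results inside [p10, p90].

    rows: iterable of (p10, p90, actual) tuples, one per player-season.
    """
    rows = list(rows)
    inside = sum(p10 <= actual <= p90 for p10, p90, actual in rows)
    return inside / len(rows)

# Synthetic, well-calibrated forecasts (all values here are assumptions).
rng = random.Random(1)
Z90 = 1.2816   # standard normal 90th-percentile z-score
SD = 0.025     # hypothetical spread of TAv outcomes around the forecast

rows = []
for _ in range(20_000):
    mean = rng.uniform(0.230, 0.320)   # hypothetical forecast TAv
    actual = rng.gauss(mean, SD)       # simulated "real" season
    rows.append((mean - Z90 * SD, mean + Z90 * SD, actual))

coverage = interval_coverage(rows)
print(f"share inside 10th-90th band: {coverage:.1%}")
```

With real data, `rows` would hold each batter's published 10th and 90th percentile TAv alongside his actual TAv. A share well below 80 percent would suggest the bands are too narrow, and a share well above it would suggest they are too wide.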

In the past, we’ve forecasted a linear fit of ERA and RA for pitchers based on expected batting against. There’s a lot of variance in ERA that isn’t captured by batting-against stats, though, particularly performance with men on base. We’ve back-tested against historical data and added some extra variation to ERA and RA to account for this.
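One simple way to picture "adding extra variation" is to compare how far actual ERAs missed their forecasts in a back-test with the spread the model expected, and then widen the spread by the shortfall. The sketch below does that under the assumption that the two sources of variance are independent and therefore add. This is an illustration of the idea, not a description of PECOTA's actual adjustment, and both function names are hypothetical.

```python
import math

def extra_sd_from_backtest(residuals, model_sd):
    """Estimate the additional spread needed to match back-test misses.

    residuals: actual ERA minus forecast ERA, one per pitcher-season.
    model_sd:  spread the model expected from batting-against stats alone.
    Assumes the unexplained variance is independent of the modeled variance.
    """
    observed_var = sum(r * r for r in residuals) / len(residuals)
    extra_var = max(observed_var - model_sd ** 2, 0.0)  # never negative
    return math.sqrt(extra_var)

def widened_sd(model_sd, extra_sd):
    """Combine modeled and extra spread (independent variances add)."""
    return math.sqrt(model_sd ** 2 + extra_sd ** 2)

# Toy numbers, chosen only to make the arithmetic easy to follow.
toy_residuals = [1.0, -1.0, 1.0, -1.0]   # observed variance = 1.0
extra = extra_sd_from_backtest(toy_residuals, model_sd=0.6)
total = widened_sd(0.6, extra)
print(f"extra sd: {extra:.2f}, widened sd: {total:.2f}")
```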

We’ve also integrated the percentiles more closely with the depth charts, for players who appear in those—we’re pulling things like lineup slots to help calculate RBIs, for instance. The percentiles take quite a while to run, though, so don’t expect them to stay in sync with the depth charts, which can be updated as often as several times in one day.

Colin Wyers is an author of Baseball Prospectus. 

Related Content:  PECOTA,  Percentiles,  Forecasting

23 comments have been left for this article.

AJ

Yes! Been eagerly awaiting these.

With 10 year forecasts back on the drawing board, is there an ETA or is it still TBA?

Mar 27, 2013 23:44 PM
rating: 0
 
BP staff member Ben Lindbergh
BP staff

TBA.

Mar 27, 2013 23:46 PM
 
BurrRutledge

Thanks, Colin & team. I know from Joe's earlier posts that a lot of work was done under the hood to validate these, and I appreciate it.

Psyched.

Mar 28, 2013 05:01 AM
rating: 2
 
ttt

This is great... I just wish it said "click to embiggen" instead of "click to enlarge." I don't know why, but I feel like that "word" would have been perfect there...

Mar 28, 2013 07:25 AM
rating: 0
 
TangoTiger

"We’ve tested the percentiles against historical data, and we can report that they behave how you’d expect—80 percent of batters fall between their 10th- and 90th-percentile forecasts for TAv, for instance."

Can you show those test results?

Mar 28, 2013 07:57 AM
rating: 3
 
BP staff member Colin Wyers
BP staff

I thought I just did. Can you clarify what you're looking for?

Mar 28, 2013 08:08 AM
 
TangoTiger

Colin: you provided a conclusion with no evidence. You know how I feel about that.

I'd like to see you redo the work you did here:
http://www.insidethebook.com/ee/index.php/site/comments/pecota_percentiles_finally/

You provide the evidence, the conclusion follows that evidence, and the reader can feel comfortable that your conclusion is valid.

Mar 28, 2013 08:15 AM
rating: 4
 
BP staff member Colin Wyers
BP staff

Maybe we're using the word differently, but I'm pretty sure that is evidence, not conclusion. It's also what I posted the last time I announced the percentiles, so I didn't think it was an especially controversial claim. If you want me to go into more detail about the tests, I'll see what I can do, but this close to the start of the baseball season there are a lot of other calls on my time.

Mar 28, 2013 08:25 AM
 
TangoTiger

I don't really want to get into a semantical debate of evidence and claims.

All I'm asking is for you to provide something that you've provided in the past, like this:

Mar 28, 2013 08:32 AM
rating: 4
 
BP staff member Colin Wyers
BP staff

Those tests were something I did a year ago, so it's going to have to wait until I can get a screwdriver and put the hard drive from the last computer I owned into this case and dig them up.

Mar 28, 2013 08:37 AM
 
adfeit

Can these be looked at for all players somewhere/somehow?

For example, can I download a spreadsheet of all players' 70th percentiles instead of the weighted means?

Mar 28, 2013 08:00 AM
rating: 2
 
BP staff member Colin Wyers
BP staff

Not at this time, but we'll look into it if there's sufficient interest.

Mar 28, 2013 08:13 AM
 
Schere

Glad to see these. Two questions for you - is there a simple variability measure you could give here that would be useful? e.g., 4.9 WARP with a 2WARP St Dev?

I'm sure it's not that simple, since the percentiles don't appear to be perfectly symmetrical...nevertheless, there is surely some number that can show that player A has a more volatile expected WARP than player B. It would be nice to get that, conceptually, without having to look at each player's percentiles.

Mar 28, 2013 08:30 AM
rating: 1
 
BP staff member Colin Wyers
BP staff

They're not perfectly symmetrical, no -- that's intentional. If you're forecasted for a .300 TAv, there's simply more room down than up, so the percentiles are built to reflect that.

I'll look and see if something like this is something we can provide.

Mar 28, 2013 08:34 AM
 
Schere

thanks! Seems like it would be useful.

Mar 28, 2013 08:47 AM
rating: 0
 
adfeit

This is kind of what I was thinking of too - perhaps a standard deviation UP and DOWN?

What I would really love is to see this (or even just a total standard deviation) included next to the PFM $ amounts... What do you think?

Mar 28, 2013 09:47 AM
rating: 0
 
Sean

Colin, where would we see this? I just checked five players: Stanton, Posey, Castro, Votto, and Kemp, and each seems to show a symmetric distribution (comparing TAv for 10-50-90).

Not that I'm bothered by a symmetric distribution—though I do understand the Bayesian reasons you wouldn't have them. With how they look now, I'd definitely use the s.d.'s if you can generate them.

Mar 28, 2013 10:24 AM
rating: 0
 
TangoTiger

I think for all intents and purposes, TAv will be symmetrical (or close enough that you can barely notice it's not).

ERA won't be (can't be). But the square root of ERA should be close to symmetrical. From the few I looked at, however, ERA looks fairly symmetrical.

Mar 28, 2013 11:32 AM
rating: 0
 
Sean

Thanks, Tom. Do you mind explaining why the properties would be different? Would it be because of how TAv is rescaled?

And I do see that there is a slight bit more skewness to ERA, though the scale is also much larger, enhancing very small differences. Either way, if the distributions are barely skewed, I'd still love mean + s.d. as a more compact way to deliver the info in a spreadsheet.

Mar 28, 2013 11:39 AM
rating: 0
 
TangoTiger

TAv tracks OBP closely, and OBP is a binomial metric, so it follows the binomial distribution. If your OBP is .330, we’re going to observe a slight skew given 600 PA, but, really, you won’t notice it.

For ERA, it’s proportionate to the SQUARE of TAv (or similarly, the square of OBP). Just think of Bill James’ Runs Created which is, at its core, OBP x SLG. Which is kinda like saying OBP squared.

So, if you take the square root of ERA, you’ll get something that is proportionate to OBP, and so, you should get that kind of distribution.

Theoretically anyway. It shouldn’t be too hard for someone out there to prove me right (or wrong!).


Mar 28, 2013 12:21 PM
rating: 1
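
To put a number on the "slight skew" described in the comment above, the sketch below uses the standard formula for the skewness of a binomial count, (1 - 2p) / sqrt(n p (1 - p)), which also applies to the rate because skewness does not change with scale. It treats every plate appearance as an independent trial with the same on-base probability, which is a simplifying assumption; the function name is hypothetical.

```python
import math

def binomial_rate_skewness(p, n):
    """Skewness of a binomial rate with success probability p over n trials.

    Uses the closed-form binomial skewness; identical for the count and
    the rate, since dividing by n only rescales the distribution.
    """
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))

# A .330 OBP over 600 PA, the case from the comment above
skew_600 = binomial_rate_skewness(0.330, 600)
print(f"skewness at .330 OBP, 600 PA: {skew_600:.4f}")
```

A perfectly symmetric distribution has a skewness of zero, and this case comes out at about 0.03, which is consistent with a skew too small to notice in a 10-50-90 comparison.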
 
Schere

I guess I was asking about WARP, which is skewed because playing time scales with TAv for most players. But TAv + SD TAv would be nice to have.

Mar 28, 2013 12:33 PM
rating: 0
 
TangoTiger

Right, if you include playing time (and on top of that a baseline), it's going to get funky.

You are asking for a distribution of this equation, basically:

PA x (TAv - baseline)

That baseline is fixed, the TAv follows a binomial-type of distribution, and PA is going to be heavily skewed.

Mar 28, 2013 13:07 PM
rating: 0
 
Schere

yeah, so I'll be happy with SD of TAv

Mar 28, 2013 13:22 PM
rating: 0
 
