
We have a few pitchers now available in the beta testers only cards.  Every pitcher with last name starting with A is up.  Here are our known issues:

  • The Stars & Scrubs Chart is busted.
     
  • WARP is incorrectly 0.0 for all historical lines.
     
  • SNWX and WARP are incorrectly 0.0 for the 2010 weighted mean projection line.

We've fixed some of the things people have noticed with the hitters cards, including:

  • Corrected park effect note on the 10-year projections.

  • Card page title has player name in it.
     
  • Remember, you can get to the BP home page by clicking the logo in the upper left of the header on any page.  We also have a link in the footer, and a breadcrumb under the header that contains links to both the BP home page and the main PECOTA card page (though it doesn't work that way for the beta tester only cards).
     
  • The ten year performance chart for hitters and pitchers now uses cutting-edge smiley technology to reflect an 'out of baseball' value.

I believe we have one more known issue with the hitter cards that we need to fix (beyond some of the more conceptual points you've brought up in the last beta post thread), but it's really late and for the life of me I can't remember what it is.

More is coming soon, and as always, thank you for subscribing, and helping us get these perfected.

Junts1
3/21
The only known issue I can think of is that apparent upward skew of projections due to relatively minimized dropoffs from 50th to 10th percentiles, but that may be what you're referring to as conceptual.
Junts1
3/21
As an addition, I just started looking at the pitchers and the minimal dropoff issue seems to be present here as well. I'm trying to look more at established pitchers than I am at prospects since their projections should be more 'stable' and easy to relate to previous years. In doing so, I see: Bronson Arroyo, a player with a massive collapse rate and nearly no breakout rate, nonetheless has a ton of improvement in his 70th-90th percentile projections (moving from 4.58 EqERA up to 3.95), yet his lower percentiles barely budge (he moves down to only a 4.88 EqERA at his 10th percentile). Bronson Arroyo has a 15% attrition rate, and I think there's almost no reasonable way to argue that his 10th percentile projection should be that good. A very bad season for him is going to be a lot worse than a 4.88 EqERA (you know, like mid-to-high 5s, probably). It's one of those seasons that leads to his attrition. Tony Armas has somewhat more balanced breakout/collapse chances, yet he also has the significant disparity that trends upward. PECOTA seems to refuse to suggest that someone's 10th percentile projection is a drop season or anything similar. It just doesn't want to predict such badness! Another weird one: Matt Albers' 80th percentile projection is actually better across the board than his 90th. The 90th is rated higher and produces more WARP only because it projects him to pitch an unreasonable number of innings due to suggesting he'll get 13 starts: all his rate statistics are worse in the 90th percentile line.
jrmayne
3/21
The 80th/90th percentile problem existed with the hitters, too. See Jake Arrieta for a more compelling example of how the percentiles are busted. --JRM
Junts1
3/21
I guess the Albers projection is more confusing than wrong: it's saying that Albers could be marginally more valuable if he's called on to get a lot of starts, but since starting is harder he'd likely have somewhat worse rate stats than in his best relief seasons, even though the extra innings as a starter would equate to more overall value. That actually makes sense for a pitcher like Albers, given what we know about rate stats worsening when a pitcher starts more often.
tbwhite
3/21
I still have issues with the 10 Year Forecasts. There seems to be a sort of false precision involved. For example, I checked out Phillippe Aumont. He is projected to steadily decline from 2010 to 2014, with his Stuff going from 12 to -13. Suddenly in 2015, at age 26, he posts a Stuff score of -3, which is a big improvement; more interestingly, his ground ball percentage tumbles from 51% (it had always been above 50%) to 33%. The next year his GB rate is at 38% and his Stuff is -1. Essentially, PECOTA is predicting that 5 years from now Aumont will dramatically remake himself as a pitcher, abandoning his ground ball tendencies and becoming much better in the process. That is an astounding and remarkable prediction and quite frankly absurd. PECOTA should be informed by comparables, but not a slave to them. In this case it feels like some quirk of Aumont's comps has created this absurdly detailed projection. Looking at the 10 Year Performance Chart, it seems that his median projection for 2015 is to be out of baseball. So, it makes me wonder what numbers are being used in the 10 Year Forecast Chart. If they are simply Weighted Means then I also have an issue when the Median Forecast says "out of baseball" but the Weighted Mean says "above average pitcher".
Junts1
3/21
This is typical of PECOTA: a player will have rebounds late in his 10 year projections because all the lower-end potential projections for those years become drop scenarios and no longer count toward the weighted mean. What PECOTA is saying is that there's no way Aumont makes it until 2015 unless he hits the better parts of his projections along the way; the potential bad seasons in 2012 and 2013 that could still keep him in the league will no longer do so in 2015.
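The mechanism described here can be sketched in a few lines of Python. This is only an illustration of how a mean taken over surviving scenarios improves as the weaker scenarios drop out; it is not PECOTA's actual method. The scenario ERAs, the `cutoff` parameter, and the `surviving_mean` helper are all hypothetical, and the scenarios are equally weighted for simplicity.

```python
from statistics import mean


def surviving_mean(scenarios, cutoff):
    """Return (mean ERA of surviving scenarios, share of scenarios that survive).

    A scenario "survives" if its ERA is at or below the cutoff; anything worse
    is treated as a drop (out of the league) and excluded from the mean.
    """
    kept = [era for era in scenarios if era <= cutoff]
    if not kept:
        return None, 0.0
    return mean(kept), len(kept) / len(scenarios)


# Hypothetical, equally weighted ERA outcomes for one pitcher.
scenarios = [3.50, 4.00, 4.50, 5.00, 5.50, 6.00]

# Early in the forecast even the bad outcomes keep him employed.
early = surviving_mean(scenarios, cutoff=6.00)

# Later, only the better outcomes are still "in the league".
late = surviving_mean(scenarios, cutoff=4.50)

print(early)  # mean over all six scenarios, everyone survives
print(late)   # mean over the best three, half the scenarios survive
```

Under these made-up numbers the displayed ERA improves from 4.75 to 4.00 while the share of scenarios still in the league falls from 100% to 50%, which is the pattern being argued about in this thread.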
tbwhite
3/21
Thank you, I think you made my point. You're implying there is a selection bias in PECOTA. Because the crappy players flame out, only the late bloomers are left to project, let's say, years 7 to 10 of Aumont's forecast. Take a look at the 4th chart: http://www.baseballprospectus.com/article.php?articleid=2659 What you are saying is that for Aumont by 2015 there are no Pete Peakys or Eddie Earlys left, so by default his projection jumps to a career path like that of Lenny Latebloom. Great, except it is inconsistent and makes no sense. If you project the player to be out of baseball, project him to be out of baseball; don't say "but if by chance he's still there then I guess he has to be pretty good". That's not a prediction, it's a cop-out.
Junts1
3/21
That's totally OK though; it is a legitimate analysis of what Phillippe Aumont will look like -if- he's in the league then; the chance he's not is represented by the attrition/drop rates for that period. Aumont is highly unlikely to still be in baseball at that time and be as bad as his lower-end projections from 2012 and 2013. That's actually a good projection.
tbwhite
3/21
So, heads PECOTA wins and tails I lose. What you are describing is not a projection at all. It would be like projecting each week of the NFL schedule in August, showing that the Lions should go 6-10, and then adding that in Week 18 they still have a chance of winning a wild card game, because you know if they DO go to the playoffs they must be pretty good. If they DO go to the playoffs after you said they would go 6-10, why would I give a crap what you thought about the Wild Card game? Carry your argument to its absurd conclusion. Let's publish 25 year PECOTA projections: what would Aumont's projection look like for his 40s, when the only guys left pitching are Hall of Famers? I guess my point is that I don't want to know what Aumont would look like pitching in his late 30s or 40s, because if he is, he must have been a good pitcher. I want to know IF PECOTA thinks he will be pitching when he's 30 or 40. I want the 10 Year Forecast to reflect that, not the alternate universe where the crappy Aumont PECOTA projects for the next few years is secretly replaced by better quality pitchers with each further year that is projected. This is a selection bias and it should be removed.
tbwhite
3/21
More 10 Year Forecast stuff: this time I'm looking at Tyson Gillies, the OF prospect for the Phillies. PECOTA projects him as a .246 TAv hitter at age 21. For 9 years he shows no improvement, even regresses some, until miraculously at age 30 in 2019 he posts a .264 TAv. Really? What kind of career path is that? I know that not all prospects make it in the show, etc. But how many 21 year olds capable of posting a .246 TAv in the show (and that's the weighted mean I believe, not the 80th or 90th percentile) NEVER improve? I believe the increase in coverage (200 players per team?!?!) and strict adherence to comps are causing some unfortunate and unintended consequences. I still haven't been able to put my finger on what exactly is bugging me, but there is just something completely unrealistic about the 10 Year Forecasts. It's almost as though each year is treated independently of every other. I get it that out of all the guys like Tyson Gillies most of them don't pan out and that might be why his TAv for age 22 and 23 drops from his age 21 TAv. But each year should be predicated on the previous year's performance. If we fast forward to next year, and PECOTA was dead on and due to injuries in the Phillies OF, Gillies actually got 400 PAs and posted a .246 TAv, would he still be projected for a .243 TAv for 2011? I certainly hope not. So, why don't the projections reflect this? It feels like every projected year is being weighed down by all the prospects that didn't make it, which ignores the fact that PECOTA is already saying he DID make it. Just speculating, but I assume comps are done by using ML equivalencies, but is priority given by the level of play? A .246 TAv posted in the majors is very different from a .246 TAv MLE posted in High A. After all I'm sure lots of guys in High A who post a .246 MLE TAv don't make the show, but the guys who did it in the majors already made the show.
Big difference, akin to the way PECOTA supposedly takes draft position into account.
Junts1
3/21
I think you radically over-estimate the impact of comparable players in PECOTA. I strongly recommend you read Nate Silver's articles explaining how the system works going backwards from 2008 to understand how it creates its predictions.
tbwhite
3/21
I don't mean to be rude, but who are you? I'm really not interested in your defense of BP and PECOTA. If my understanding is wrong, I trust that a BP staffer will correct me. Please stop writing as though you are the BP help desk. As I understand it, the essence of PECOTA is finding similar players based on many types of measurements (physical traits, size, age and baseball performance): in essence, clustering players and then projecting their future performance based on the growth patterns exhibited by their cluster peers. Changing the nature of the clusters, by allowing more variation across playing level (ML, AAA, AA, etc.), would probably make a big difference. It's all speculation, but BP is inviting it at this point thanks to their relative lack of transparency (although they have been doing better in this regard in the past week or so).
Junts1
3/21
I equally don't mean to be rude, but you're crazy to expect BP to provide you with an extensive explanation of how PECOTA works in these regards because these things have already been explained at great length. That is why there are article archives. Go read them if you want to understand how the projection system works.
tbwhite
3/21
I don't expect an extensive explanation, you're putting words in my mouth. A simple, "that's not quite right, read this to see how it really works" (with a link to said article) would be fine.
jivas21
3/22
Dave/Clay/others, Thanks for correcting some of the items that were raised in the other beta blog. Hopefully you'll have a chance to look into or address some of the other questions that we raised in the comments on the prior blog, chief among them the apparent greater upward skewness of the 2010 PECOTAs (which Mike reiterated above). I'm not going to have much time to look at the pitcher cards over the next few days - a shame, because I'm excited to be able to do so - but I wanted to add a few comments based upon a cursory glance:
(1) Really tiny point: in the Projected Playing Time box, it indicates where the pitcher appears to slot into his team's rotation. I got a huge chuckle from seeing Danny Haren's position as "Starter-2", because your depth chart has Brandon Webb as the nominal #1 starter. This may just be a single oddball case, but I'm guessing most of your readers don't put much stock into the perceived rotation slot of starting pitchers.
(2) Whither SIERA? It would be great to see SIERA (or another ERA approximator) to provide some additional context to the ERA on each line (to be fair, BABIP *is* included, which certainly helps).
(3) I had meant to raise this in the hitter's blog, but it works here as well: it would help to have an index on the Platoon graph. I'm guessing that the 3 rows provide platoon splits for BA, OBP, and SLG, respectively, but that's not specifically outlined on the graph.
(4) Another thing I forgot to add about the hitter cards - and please advise if you'd prefer these to be included on the comment section of the other blog - but would it be possible to have BABIP included on the hitter cards? Similar to SIERA for pitchers, it would add context to the batter lines - for instance, one might be able to see how much of the projected improvement forecasted for Jay Bruce relates to regression/progression to the mean in his BABIP.
I need to go do real work.
I have to say, though: once you guys get all the bugs worked out here, these new PECOTA cards are going to be phenomenal.
Junts1
3/22
Hey Jivas, I wanted to address two of your requests with a reminder: PECOTA doesn't actually create projections by organically creating OBP/SLG etc. data. It projects EqAs and the like and then converts those into realistic stat lines. Since it does that, both SIERA and BABIP would be derivative stats of derivative stats from a projection. I don't believe this would make it any more helpful: the numbers would be purely manufactured data and I think they would have little practical worth. Further, PECOTA uses neither stat. That means it would not be generating different numbers for SIERA than its predicted ERAs. PECOTA handles luck, bad and good, through regression to the mean and knowing that only a certain percentage of a performance spike can be expected to carry over. PECOTA is trying to predict real skill data. It's not equipped to produce what amounts to data on how good or bad someone's luck will be. And that's all BABIP or SIERA data add to the numbers we already get.
jivas21
3/23
Mike, Thanks for the reminder. In any event, my guess is that people don't use the PECOTA cards strictly to take the forecasts at their face value - otherwise, the spreadsheet would suffice. Rather, they use the information on the cards to develop independent expectations on player performance in the coming year(s). So regardless of whether PECOTA uses BABIP or SIERA (which, obviously, it doesn't use), this is information that will *greatly* facilitate the processing of statistical information contained within the cards in developing these expectations. I'm guessing that a lot of people go to FanGraphs to obtain current and historical statistical lines that include this type of information, and I'm further guessing that BP would prefer that these people stay on the BP platform full-time. Personally, I always start with BP pages (either DT cards or PECOTA cards) when looking up individual player stats, but I often end up moving elsewhere - FanGraphs for BABIP/FIP/etc, ESPN for splits. If my recommendations above are implementable, at least some of this movement could be limited.
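For reference, BABIP can be computed from an ordinary stat line with the standard formula (H - HR) / (AB - K - HR + SF), so it could in principle be derived from any line on a card. A minimal sketch follows; the `babip` helper and the sample inputs are hypothetical and are not taken from any real PECOTA projection.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("stat line has no balls in play")
    return (hits - home_runs) / balls_in_play


# Made-up stat line, for illustration only.
example = babip(hits=150, home_runs=20, at_bats=550, strikeouts=120, sac_flies=5)
print(f"{example:.3f}")
```

Whether a BABIP derived this way from a projected line carries any information beyond the line itself is exactly the point under dispute above; the sketch only shows that the arithmetic is simple.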
dpease
3/22
Hi all. All pitchers have now been published. Rather than reply to each individual comment, Mike Juntunen is right about his observations of the system overall. On the specific topic of long-term projections, PECOTA will ignore dropped seasons altogether, rather than factoring them in. If you see a player whose long-term projections look optimistic, keep in mind that his drop rate determines whether he'll be in the league at all that year. Nate explains it better than I do here: http://www.baseballprospectus.com/article.php?articleid=7189
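Readers who want a single number that accounts for the chance of being out of the league could combine the two figures themselves. The sketch below is one possible way to do that and is not something PECOTA is stated to do: it assumes a dropped season is worth exactly 0 WARP, and the `unconditional_warp` helper and the sample inputs are hypothetical.

```python
def unconditional_warp(conditional_warp, drop_rate):
    """Expected WARP counting dropped seasons as 0 (an assumption).

    conditional_warp: projected WARP given the player is still in the league.
    drop_rate: probability (0 to 1) that the player is out of the league.
    """
    if not 0.0 <= drop_rate <= 1.0:
        raise ValueError("drop_rate must be between 0 and 1")
    return (1.0 - drop_rate) * conditional_warp


# Made-up inputs: a 2.0 WARP season if he is still around, 75% chance he is not.
expected = unconditional_warp(conditional_warp=2.0, drop_rate=0.75)
print(expected)
```

With these made-up inputs the optimistic-looking 2.0 WARP line shrinks to 0.5 once the drop rate is folded in.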
jrmayne
3/23
Dave: Three notes/questions: 1. You say that PECOTA ignores dropped seasons altogether, rather than factoring them in. That appears to be true for the 2009 iteration - see Jay Bruce's 2009 card for an example. But Clay said it isn't true for 2010, right? That's why we're ending up with these very strange looking lines - excellent half-time play for guys like Montero and Heyward. 2. Mike said that the comps weren't such a big factor. That's not true, is it? For most players, comps are the backbone, not a small add. Nate made it repeatedly clear that comps were the backbone. 3. Mike said PECOTA doesn't use BABIP. Nate said PECOTA uses BABIP (and pretty much any other stat that works as a leading indicator; PECOTA looks - or looked, as the case may be - "at the interaction between every statistic and every other statistic.") Am I right, or have I misread Nate or you or Mike? --JRM
jivas21
3/23
Dave: I see that you've rolled out the PECOTA cards in full. Congratulations to yourself, Clay, and everyone else who has worked very hard to reach this point. HOWEVER ... there is still some work to be done on your end to increase confidence that the forecasts in the current year cards are "right", and speaking for myself (and I'm guessing some of the others among the Beta testers) my confidence is nowhere near full until *someone* addresses our questions about if and why the 2010 projections appear to be more highly skewed upwards than prior year projections. Much like the comments that were made on the prior "torch & pitchfork" comment sections, you can't imagine what the perception is on the outside of BP's silence on this issue. Please pardon the all caps here (and the double negative!), but YOU CAN'T NOT ADDRESS THIS QUESTION. It won't magically disappear if you ignore it. _________ Again, congratulations, and thank you for the opportunity to help out.