
And here we are–the release of the 2011 PECOTAs.

While I have your attention, I’d like to say a few words about the production of the PECOTAs this year. I guess I don’t have to tell you that I’m filling some pretty big shoes here–Nate Silver is probably the most famous sabermetrician not named Bill James, and PECOTA is where Nate made his biggest mark in our community. And so I’m building on his work–and work by people like Clay Davenport and Gary Huckabay, too. I am, as they say, standing on the shoulders of giants. (In big shoes, apparently–this is what I get for mixing my metaphors.) So I owe them, and others I’ve probably neglected to mention, my deepest thanks.

But even with all that, I wouldn’t have gotten this far without a lot of help. I couldn’t have accomplished what I’ve done without the help of everyone here at Baseball Prospectus, who have really given me all the support I could ask for. But I want to thank a few people especially–it’s a fine line to walk, as if I list too few I risk upsetting someone unfairly left off, and if I list too many, you won’t finish reading the list. So with that in mind–very special thanks to Rob McQuown, Mike Fast, Ben Lindbergh, and Steven Goldman. Gentlemen, take a bow, and everyone, please, give them a round of applause.

Now, then, on to business.

This is the first release of PECOTA, and as such will continue to undergo revisions through the remainder of the offseason. The program we use to generate the PECOTAs is continually evolving, and when we discover new ways to improve the forecasts, we'll make those changes and pass the updated forecasts on to you. We’ll also be updating periodically to keep up with players who switch teams.

In addition, we have several PECOTA features yet to roll out–first will be the Depth Charts, which combine the PECOTA forecasts with estimates of a player’s role and playing time. Those should be ready a week from now, and will be available in two forms: the team Depth Chart pages, and the Player Forecast Manager (which is receiving some upgrades as well).

After that, we’ll be publishing the PECOTA cards, featuring perks like the percentiles and the ten-year forecasts. We’ll update you more as we get closer to that point.

We’re also revamping the cards to use the new Wins Above Replacement Player model we’ve developed. PECOTA has already been adapted to use the new WARP, so WARP baselines have shifted a bit from what you’re used to seeing. The biggest change is among relief pitchers, who take a major hit. Please keep this in mind as you review these forecasts. We know that many of you are relying on these forecasts for your fantasy teams, and we thought that it was better to get the forecasts out now rather than wait for when the entire site was ready to transition to new WARP.

Now, some of you may be asking, “How good are the PECOTAs this year?” Of course, we won’t know the answer for another eight months or so. But we can come up with an educated guess, if we make the assumption that there’s nothing special about predicting the 2011 season, and that a system that works over previous seasons will work in succeeding seasons.

In the course of producing the PECOTAs, we generate forecasts for every player who played from 1950 through 2010. These aren’t quite the same as the full PECOTAs–they are park-neutral, rather than being adjusted for the home park a player plays in. They are not age-adjusted. And for the most part, they do not reflect minor-league performance–we have major-league data for all of MLB history, but very little minor-league data. Still, they do represent a substantial portion of the PECOTA process.

It is time-prohibitive for us to generate full age curves for all of these historic forecasts, but we did adapt a simplified set of age adjustments for past purposes. These simplified PECOTA forecasts aren’t as accurate as the full PECOTAs, but they give us a chance to view how well PECOTA fares over a large swath of history.

There’s only one other projection system available and therefore able to be pitted against PECOTA for such a large part of baseball history: the Marcels, originally developed by Tom Tango. (The version we’re using in these tests was published by Jeff Sackmann.)

I “re-baselined” each forecast for each season in the test by subtracting the average forecast and adding in the average performance of the players forecasted (weighted by playing time, in both instances). Then I took a look at two tests–one is root mean square error, which, if forecast errors are roughly normally distributed, tells us that about 68% of forecasts were within that margin of error. The other is simply counting which forecast was closer to a player’s actual performance. Looking at offensive stats first:

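| PECOTA OBP RMSE | PECOTA SLG RMSE | Marcel OBP RMSE | Marcel SLG RMSE | PECOTA OBP success rate | PECOTA SLG success rate |
|---|---|---|---|---|---|
| 0.036 | 0.066 | 0.036 | 0.066 | 56% | 54% |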

In terms of RMSE, our simplified PECOTA is in a dead heat with the Marcels. (And again, PECOTA is giving Marcels an edge, as it is forecasting everyone for a neutral park; the Marcels make no park adjustments, but most players do not switch teams or parks between seasons.) In terms of “success rate” (in other words, the percentage of head-to-head projection matchups "won"), PECOTA has a slight edge.
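For readers who want to try this kind of comparison on their own data, the re-baselining step and the two tests described above can be sketched roughly as follows. This is a minimal illustration, not the code used to produce the numbers in this article: the function names are hypothetical, the RMSE here is unweighted, and ties in the head-to-head count are simply dropped, all of which are assumptions.

```python
from math import sqrt


def weighted_mean(values, weights):
    """Average of values, weighted (here) by playing time."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def rebaseline(forecasts, actuals, playing_time):
    """Shift every forecast so the weighted average forecast matches
    the weighted average actual performance for that season."""
    shift = weighted_mean(actuals, playing_time) - weighted_mean(forecasts, playing_time)
    return [f + shift for f in forecasts]


def rmse(forecasts, actuals):
    """Root mean square error (unweighted; an assumption of this sketch)."""
    squared = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return sqrt(sum(squared) / len(squared))


def success_rate(system_a, system_b, actuals):
    """Share of head-to-head matchups in which system A was closer to the
    actual result. Ties are excluded (an assumption of this sketch)."""
    wins_a = wins_b = 0
    for a, b, actual in zip(system_a, system_b, actuals):
        err_a, err_b = abs(a - actual), abs(b - actual)
        if err_a < err_b:
            wins_a += 1
        elif err_b < err_a:
            wins_b += 1
    decided = wins_a + wins_b
    return wins_a / decided if decided else 0.5
```

In use, each system's forecasts for a season would be passed through `rebaseline` first, and the re-baselined values then fed to `rmse` and `success_rate`.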

Now, for pitching:

| PECOTA ERA RMSE | Marcel ERA RMSE | PECOTA ERA success rate |
|---|---|---|
| 1.22 | 1.22 | 48% |

Again, RMSE shows a dead heat. In terms of success rate, the Marcels have a slight edge over the simplified PECOTAs.

Like I said, this sort of testing emphasizes breadth rather than depth–PECOTA has forgone several of its advantages, like park adjustments and minor-league data. And yet it’s still producing accurate forecasts.

Now, just as a reminder–PECOTAs are available only to our subscribers (Premium and Fantasy) who are signed up for a whole year. If you haven’t already subscribed, you can do so here. Doing so doesn’t just get you access to PECOTA, but also to our fantasy tools like the Player Forecast Manager and Team Tracker, as well as exclusive access to some of the best baseball writing out there. If you’re already a subscriber, thank you for your support, and I truly hope that you all enjoy what we do as much as we enjoy doing it. I am continually amazed at how intelligent and knowledgeable our readers are, and I really do think that the people who read BP are among the best baseball fans I’ve ever known.

That’s all I've got–have fun, folks.

NYYanks826
2/07
+1 with a ridiculously large number of zeroes following it.
Chomsky
2/07
I see the PFM isn't quite ready yet - ETA?
benharris
2/07
"In addition, we have several PECOTA features yet to roll out out–first will be the Depth Charts, which combine the PECOTA forecasts with estimates of a player’s role and playing time. Those should be ready a week from now, and will be available in two forms: the team Depth Chart pages, and the Player Forecast Manager (which is receiving some upgrades as well)."
NYYanks826
2/07
Looks like, according to PECOTA:
- Ryan Howard (41) is the only player hitting more than 40 homers this year.
- Joe Mauer (.317) will lead the league in batting average, with only a small handful of players even breaking .300.
- Juan Pierre will lead the league with 50 SB. Interestingly, Derrick Robinson has the next highest predicted steal total with 47.
- There will be a three-way tie with CC Sabathia, King Felix and Dan Haren for the league lead in wins (15). Those same three will lead the league in strikeouts, with King Felix taking the K crown at 204.
- Joe Nathan is going to be the runaway saves leader (40), with K-Rod behind him at 35.
Great stuff guys! This is one of the two most exciting days of the baseball pre-season, with the other being the release of the newest MLB: The Show video game.
ndemause
2/07
Keep in mind that these are weighted means - in real life, there are inevitably going to be outliers who do better because they exceed their forecasts. So saying "PECOTA thinks only one player is going to hit 40 homers" is a bit like saying "PECOTA thinks everyone is only going to roll sevens at the craps table."
deacon14
2/07
Could someone better explain Weighted Means to me? What exactly is the spreadsheet we are looking at? One, will playing time affect Jason Heyward's home run number, or only a variance away from the mean? Two, what are Mike Trout's projections supposed to represent, if he played in the majors this year?
pobothecat
2/07
excellent question. thank you for asking it.
sfbennett1
2/08
From my PECOTA experience: This spreadsheet shows the weighted mean projections for each player - which means that the numbers shown in the spreadsheet take into account ALL possible outcomes for how a player will perform in 2011 and weight them by how likely they are to happen. All data is expressed as if each player plays in the Major League environment of their currently listed team. Please feel free to correct any portions of the above which are incorrect...
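A small worked example may help with the weighted-mean idea and with the earlier point about outliers. The sketch below uses made-up home-run outcomes and probabilities for a single slugger (illustrative numbers only, not taken from PECOTA) and shows that a weighted mean of about 35 homers is still compatible with a real chance of hitting 40 or more.

```python
# Hypothetical home-run outcomes for one slugger and the probability of each
# (illustrative numbers only; these are NOT PECOTA percentiles).
OUTCOMES = [25, 30, 35, 40, 45]
PROBS = [0.10, 0.25, 0.30, 0.25, 0.10]


def weighted_mean(outcomes, probs):
    """Each outcome counted in proportion to how likely it is."""
    return sum(o * p for o, p in zip(outcomes, probs))


def prob_at_least(outcomes, probs, threshold):
    """Total probability of outcomes at or above the threshold."""
    return sum(p for o, p in zip(outcomes, probs) if o >= threshold)


mean_hr = weighted_mean(OUTCOMES, PROBS)    # about 35: the single "forecast" number
p_40 = prob_at_least(OUTCOMES, PROBS, 40)   # about 0.35: chance of 40+ homers
expected_40_hr_hitters = 10 * p_40          # about 3.5 among ten such sluggers
```

So a spreadsheet in which no hitter's weighted mean reaches 40 does not imply that nobody is expected to get there; under these assumed numbers, ten hitters like this one would be expected to produce three or four 40-homer seasons.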
deacon14
2/08
Thank you.
PBSteve
2/07
We just put this up! The world is asleep and you're here! I love you guys!
NYYanks826
2/07
Such is the advantage of working an overnight shift at work tonight :) ...getting a lot of work done, as you can tell.
saint09
2/07
Australia has its advantages, most notably the early evening release of PECOTA.
Tipman
2/07
Just looked at the Mets team quickly, and Kelvim Escobar with a 3 WARP, which is 2nd highest on Mets pitching staff? I know the Mets pitching staff is bad, but didn't think a guy who is going to pitch in the majors would be 2nd best!
rawagman
2/07
Question about playing time estimates. As this release is not adjusted for playing time, I'll have to take the over on Brandon Morrow, among others. Will the post-depth chart release include such adjustments?
smallflowers
2/07
YEAHHHHH!!! This is awesome. (In the next iteration, can you add the league to each line? Easier to sort for AL- or NL-onlys!)
smallflowers
2/07
Ah, and positions! Sort sort sort!
jpjazzman
2/07
Also please add positions for sorting, really a pain without that. Thanks!
kenfunck
2/07
Here's how you can do this yourself in Excel. If all you need is to know the first position listed for the player in the FRAA column (i.e., the primary position at which PECOTA thinks the player will appear), create a column entitled "POS" at the end of the spreadsheet in Column AQ, and then use this formula in this column for the first player listed:
=LEFT(AF2,2)
Then copy it into every player's row.
If, however, you want to see if the player is mentioned as playing that position at all, you'll need to create one column for each of the nine positions, and label them "1B", "2B", etc. Under the column you name "1B", you would enter this formula for the first player listed:
=IF(ISERR(FIND("1B",$AF2,1))=TRUE,"","1B")
The FIND function looks to see if the string "1B" is found somewhere in the FRAA column. The ISERR function looks to see whether there's an error (if it doesn't find string "1B", the result is a #VALUE error -- you could leave those, but it looks kind of ugly). The IF function states what to do if there's an error (i.e., put a blank space in the cell) or if there isn't an error (i.e., put the value "1B" in the cell). This will put the value "1B" in the column for that player if they're listed at 1B in the FRAA column, and an empty field if not.
The "$" before the AF ensures that when you copy this cell, the reference to column AF (i.e., the FRAA column) will stay the same, so you can then copy the formula into your 2B column, edit the formula where it looks up and then stores "1B" to now say "2B", and copy that for all players. Follow the same steps for 2B, 3B, SS, LF, CF, RF and C, and you'll have a column for each position on which you can either sort or filter.
If you want a column that will describe if the player is listed by PECOTA for ANY outfield position, you can use this formula in its own column:
=IF(ISERR(FIND("F",$AF2,1))=TRUE,"","OF")
Hope this helps.
stlpdx
2/08
OR...you could do it for us on a later release?
norraist
2/09
Minor quibble -- seems that the find string for "C" will return a hit for both C and CF, for the same reason that using "F" can return any OF.
kenfunck
2/09
Search for "C " (i.e., c followed by a space). I should have specified that -- sorry.
norraist
2/09
This would also work (saw your reply after I puzzled it out) =IF(ISERR(FIND("C",$AF2,1))=TRUE,"",IF(ISERR(FIND("CF",$AF2,1))=TRUE,"C",""))
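For anyone working outside Excel, the same position lookups can be sketched in Python. This is only an illustration under stated assumptions: the exact layout of the FRAA column is not spelled out in this thread, so the sample strings such as "CF 3, C -1" are hypothetical, as are the function names. Matching whole position tokens rather than substrings avoids the C versus CF problem noted above.

```python
import re

# Whole-token match for position codes, so "C" never matches inside "CF".
POSITION_PATTERN = re.compile(r"\b(1B|2B|3B|SS|LF|CF|RF|C)\b")
OUTFIELD = {"LF", "CF", "RF"}


def positions(fraa_text):
    """All position codes found in the FRAA cell, in the order listed."""
    return POSITION_PATTERN.findall(fraa_text)


def primary_position(fraa_text):
    """First position listed, or an empty string if none is found."""
    found = positions(fraa_text)
    return found[0] if found else ""


def plays(fraa_text, position):
    """True if the player is listed at the given position at all."""
    return position in positions(fraa_text)


def plays_outfield(fraa_text):
    """True if the player is listed at any outfield position."""
    return any(p in OUTFIELD for p in positions(fraa_text))
```

With the assumed cell text "CF 3, C -1", `positions` returns both codes, while `plays("CF 3", "C")` is false because "C" only counts when it stands alone.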
mblthd
2/07
PECOTA's not too pleased with Jason Heyward.
crperry13
2/07
Or the brothers Upton. My fantasy outfield is screwed! :)
billminick
2/07
How exactly do I access the PECOTA projections? This is not user-friendly at all. Please advise. Thanks.
NYYanks826
2/07
Once you open the Excel file, you should see tabs at the bottom for batters and pitchers.
jpjazzman
2/07
I think he means where the physical file is (which is a bit hidden). Click the fantasy tab, then there is the fantasy box about halfway down the page on the right side with the spreadsheet link. Definitely a bit buried for some reason.
ndemause
2/07
Or just click on "have fun, folks" at the end of the article.
tylernu
2/08
Which is the very definition of burying the lede...the link should be much more obvious.
jessehoffins
2/07
My goodness, that's quite the gap between Pujols and the rest of the hitters. Maybe he should offer the rest of them lessons.
jessehoffins
2/07
The expected difference between him and braun in warp is as big as the one between braun and j.d. drew.
rbross
2/07
Awesome. But where is "upside?" Believe it or not, THIS is the reason I subscribe to Baseball Prospectus.
zstine1
2/07
It is an important part for keeper league owners. Please bring it back.
mwright
2/07
Echo this sentiment. I'm sure you guys are swamped today but at some point would like to know if "upside" will be coming back or if there will be any other long-term value metric introduced in its place.
pakdawgie
2/07
I'm assuming that it will take a couple of weeks - they need the long term projections (i.e. the info on the player cards) to calculate this.
sfbennett1
2/07
this was the first thing that I looked for, too! (after looking to see what I had accidentally filtered when I noticed the WARP gap between Pujols and the next best hitter...)
cwyers
2/07
Upside requires the percentile forecasts, which typically aren't ready for the initial PECOTA release. As we get closer to the cards (and I'm not saying we'll wait until the cards are released) we'll get that added to the spreadsheet.
rbross
2/08
Thanks, Colin. Any ballpark ETA?
joelefkowitz
2/07
Oh this is torture. The URL for the spreadsheet includes the words "baseball" and "fantasy" which means it's blocked here at the office--criminal, I know!
leites
2/07
The comps for Carlos Santana are David Wright, Alvin Davis and Carl Yastrzemski (!). Does this mean PECOTA does not think Santana will remain a catcher?
leites
2/07
Also amusing to see that one of the comps for Mitch Moreland is Adrian Gonzalez . . .
zstine1
2/07
and chris iannetta being compared to gary sheffield and brian giles
leites
2/07
And Gordon Beckham's #1 comp is Gary Sheffield. (Why does PECOTA seem to think Sheffield started out as a second baseman? A few years ago he was the top comp for Dustin Pedroia.)
mrabesa
2/07
Well, Sheff did start out as an infielder; 3B and SS, I believe.
GoTribe06
2/07
He is also on Mike Trout's comp list. Anybody who is anybody has a Gary Sheffield comp.
leites
2/07
Hah! Other players this year who have Sheffield as a comp include Bobby Abreu, Kosuke Fukudome, Shin-Soo Choo, Magglio Ordonez (who also has Stan Musial as a comp), Jacob Smolinski (who also has Ron Santo), and Carlos Beltran.
leites
2/07
Although two of Smolinski's three comps are to near-hall-of-famers, he is not listed by KG among the top 20 prospects in the Marlins system . . .
LynchMob
2/07
My favorite is Everett Williams comp'd to Mickey Mantle!
leites
2/07
And Kevin has Austin as a two-star prospect ("An intriguing center fielder, Austin has speed and power potential, but his bat lags behind.")
zstine1
2/07
Also, an AL or NL column, which has been in past spreadsheets, is very useful for AL- or NL-only league fantasy owners.
boness
2/07
I'm getting the error "The requested URL /fantasy/files/PECOTA_20110207.xls was not found on this server."
cwyers
2/07
We're having some server difficulties at the moment, folks - we'll have everything back up and running in a few minutes.
almartin
2/07
I'm also getting the 'requested URL' error. Please advise!
almartin
2/07
I'm a premium subscriber but am getting a 403 Forbidden error. Is this on my end or yours?
jgergeni
2/07
I am having the exact same problem. I'm wondering if it has to do with the use of "fantasy" in the spreadsheet name.
almartin
2/07
It's working now - thanks guys!
doog7642
2/07
No breakout scores above 10% for hitters? Seems odd.
acmcdowell
2/07
Yes, it does seem odd, especially in light of the extremely high breakout values for some pitchers (20+ over 20%, with a mix of injury recoveries, job shifts, and young players).
tbwhite
2/07
Pitchers are more unpredictable, so it makes sense that in general they would be more likely to break out or crash and burn. However, the lack of any hitter over 10% does feel low. I suppose the logical next question is what exactly does "breakout" mean, and has the definition changed at all from previous years ?
doog7642
2/07
I have no math mind whatsoever, but this seems like a big deal. "Breakout" used to mean (if I'm not mistaken) the percentage chance that the player put up a statline that was 20% (I think) more than the baseline projection. If across the board, no hitter is having that happen more than 10% of the time, what does that mean? Does it mean the comps are consistently conservative? Does it mean that regression to the mean is eliminating potential outlier performance projections? I would like to understand this better.
hotstatrat
2/07
The link "have fun folks" isn't working on my computer. My server can't find that web page.
slepican
2/07
Anyone having luck accessing in the last hour? I am getting the 403 Forbidden error.
geefsu
2/07
I can't access the page, either. Is there still a server issue or do I need to look into this further?
cwyers
2/07
Everyone, the server should be fixed now. Sorry for the delay - let us know if you have any more problems. Also, league and position fields have been added to the spreadsheet, so if you were interested in those, go ahead and download it again.
doog7642
2/07
Am I correctly understanding that once fielding is taken into account, PECOTA thinks Jim Edmonds is an outright improvement over Colby Rasmus in CF for the Cards?
kringent
2/07
Thanks, guys. Looking forward to diving in. One quick request: always enjoyed the companion article Nate wrote about what surprised him, what projections looked high or low, etc. I don't know if that's planned, but it would be welcomed, at least by me.
brianholbrook
2/07
Not to pick individual projections apart, but Freddie Freeman as -17 FRAA at 1B, and below replacement level? Really? All the scouting I've heard rates his glove as above average, but PECOTA sees him as Adam Dunn.
Mooser
2/07
So 381.1 WARP for hitters and 620 WARP for pitchers. I know that playing time is not adjusted yet, but that seems too heavily weighted to pitchers. A leaderboard for WARP would show Pujols at #1, then 14 starting pitchers, and then Ryan Braun as the second best position player.
doog7642
2/07
Perhaps there is a relationship between the low WARPs and the low breakout scores for hitters. Could it be that this PECOTA is particularly gun-shy about suggesting significant steps forward for younger players?
brianholbrook
2/07
It sees Jason Heyward as taking a significant step backwards. Buster Posey seems to be treading water in rate terms, but taking an obvious step forward in playing time.
mgolovcsenko
2/07
Down in the weeds, but Fred Lewis is on the Reds nowadays, not TOR. (I was curious how bad a LF the Reds will field this year between Lewis & Gomes.)
LynchMob
2/07
Chris Young (the pitcher) now with Mets
douglasgoodman
2/07
Anyplace in particular we should send names that seem like they should be in the spreadsheet but are not? Happened to notice Stolmy Pimentel was not there -- seemed like he was advanced enough to get a card...
MattBey
2/07
So, Victor Marte is Carlos Zambrano's #2 comp? Let's look at the differences between Victor Marte and Carlos Zambrano.
Zambrano made his major league debut at age 20. Marte made his major league debut at age 28.
Marte has pitched 39.2 career innings in the majors. Zambrano has had 39.2 career temper tantrums in the majors.
Marte has zero professional starts according to B-R. Zambrano has over 300 professional starts according to B-R.
Marte is 6'2" and fat. Zambrano is 6'5" and fattish.
Somehow I don't see the comparison, I really don't. Never mind that Zambrano's been under 4.00 ERA for a decade now and PECOTA still thinks his true level is above it.
MattBey
2/07
And as an aside, Archer has Nick Adenhart as a comparison; does that mean PECOTA thinks Archer has a higher chance of flaming out by age 23? Sad that I even have to ask this question.
crperry13
2/07
Horribly hilarious. I don't know if I should "plus" or "minus".
LynchMob
2/07
I assume the basis for this comp can be explained by this statement from Colin's article last week ... This year, we're encouraging PECOTA to rely more heavily on minor league comps for minor league players.
MattBey
2/07
There's nothing wrong with the comparison between Archer and Adenhart, but does PECOTA think Adenhart had a massive baseball-related injury that caused him to be out of baseball by 23? If so, wouldn't it factor that into Archer's projection, unfairly giving him a higher chance of flaming out?
cwyers
2/08
Adenhart is a special case as a comp, yeah. We present three comps as a rough guide to what the comps look like, but the actual PECOTA projections use a lot more players than that. So it probably has an impact on his MLB%, but not as much as you would think. It's rare that a case like this happens, but not unheard of. In addition to Adenhart, you've got Cory Lidle, Roberto Clemente, Darryl Kile... there are 22 of them total, it looks like. (And I may have constructed my query a bit too narrowly, looking only at players who died in the same year as their last appearance.) Again, I don't think this has a significant impact on anyone's numbers, but I do agree that it doesn't really help PECOTA to include these types of players in anyone's comp lists, for those seasons. I'll write up a fix tonight and have it out for the next round of PECOTAs.
rscharnell
2/07
Has BP totally eliminated VORP for the WARP projections this season or will the VORP still show up on the player cards?
yeamon
2/07
PECOTA does not share my affection for DLR.
mgolovcsenko
2/07
de la Rosa? I caught that, too. Made me think twice about keeping him in my deep NL-only league.
prhood
2/07
When I open the spreadsheet, it's empty. No player data at all.
prhood
2/07
OK, fixed that: my Excel was hiding the tabs at the bottom.
fgreenagel2
2/07
Yeah, where's the VORP?
dconner
2/07
I'm a bit annoyed by the lack of VORP myself, as that's my number one metric. I guess I'll just have to start focusing on WARP and trying to make the mental adjustment myself.
DLegler21
2/07
Yes, please add VORP, it's my primary research tool for my league. I hope it's not being discontinued; that would cause all sorts of rework in my models.
PBSteve
2/07
We will continue to present VORP here at BP.
DLegler21
2/07
Thanks Steve. Can VORP be added to the file?
lucasjthompson
2/07
Defensively, Mauer's projected at -2 at catcher and McCann's a +2 (every catcher seems to be in that narrow range). Mauer's averaged about +5 FRAA a year over his career and McCann about -4. Have you changed the way you calculate this stat? They're both at similar spots in their career, so it seems odd to me. Also, Justin Morneau as Mauer's #1 comp? They're pretty dissimilar hitters (and fielders for that matter).
Richie
2/07
Any way of letting PECOTA know that Morrow and CJ Wilson are now starters? Among other things, then getting revised ERA and WHIP estimates for them?
belowm
2/07
Also no longer with the Blue Jays: Miguel Olivo (SEA) and Mike Napoli (TEX). Not with the Mariners: Russell Branyan (FA) and Guillermo Quiroz (SD).
PelotaDiSoldi
2/07
Wow, PECOTA does not believe in Delmon Young's power. I think his projections were better 4 years ago. Not a big fan of Rasmus, either. Pegging him for .100+ less OPS this year than '10. Other remarkable lines:
- Dan Johnson projected for the same HR total as Longoria and a higher OPS. Wow.
- It appears to be kind to Clay Buchholz, everyone's favorite regression candidate.
- PECOTA loves Haren. He and King Felix are the only two projected for 200+ K.
- Rumors of Beckett and Vazquez' demises appear to have been greatly exaggerated.
Olinkapo
2/07
I think Vazquez is precisely the kind of pitcher where you should give PECOTA a lot less say than for other pitchers. If I recall correctly, PECOTA has no way of knowing that Vaz' velocity dipped significantly last season. The easier park/league will help, but in Rotoland, I am certainly not paying for the 3.79 ERA and ~200 Ks PECOTA is projecting.
LynchMob
2/07
How long before you think PECOTA will incorporate into its algorithm data from HITf/x and PITCHf/x? So that, for example, it will "know" that Vazquez lost some mph last year ...
mikefast
2/07
I'm really looking forward to that day, and being part of making that happen. The obvious problem is that that data doesn't exist prior to 2007, so the sample size is still very small. Every additional year of data helps us solve that problem. The other "problem" is figuring out what things to incorporate from PITCHf/x, and how. Doing that with fastball velocity isn't trivial, but it's probably doable. Doing that with other pitch data is more problematic. If you wanted me to give a ballpark guess, I'd say that I really hope that two years from now, PECOTA includes PITCHf/x information. That's not a guarantee or even a firm estimate or something I've discussed with Colin. It's me putting my finger up into the wind and projecting into the future how well we've done at digesting the PITCHf/x information that we have.
lucasjthompson
2/07
Are you not publishing a context-neutral measure for pitchers anymore? You have tAV for batters, but I assume the ERA and FRA numbers for pitchers include adjustments for park and league, etc. For my money getting the context-neutral numbers is a big deal.
Mikedaddy
2/07
So, Stephen Strasburg is going to throw more innings (122.6) than Mat Latos (104), Ian Kennedy (100.6) and Phil Hughes (121)?
jetheinenkel
2/07
Just noticed that too - PECOTA projects Mat Latos to pitch only 104 IP with 20 GS, but in those 20 games, it projects him to be a top pitcher. Does this mean we should interpret PECOTA as suggesting he has a very high propensity for a season-ending or prolonged injury? All the comparables (Gallardo, Hughes, Elarton) had injuries early in their careers as well, directly after a successful early-career season. In some other articles, there's mention of a new method of analysis regarding a player's likelihood for injury. It would be useful to have a metric here denoting injury propensity so we can view the projections with that in mind.
geofflong
2/07
These have not yet been adjusted for playing time. Strasburg's projections therefore are a "what could have been." This was pointed out in a previous article and in the intro to the projections today, iirc.
smocon
2/07
Dratted content filter at work!!!! Can't wait for my book to arrive next week so that I can finally pin down my prediction for the '11 Brewers.
benrosenberg02
2/07
Can someone point me to a description of 'MLB_PCT'. I thought I understood, then saw Bobby Jenks at 7%, so perhaps not.
knockoutking
2/07
heh glad to see JP Howell is projected to throw 103 games, 121 IP lol 9 k/9, 103 games? sounds like fantasy gold!
shmooville
2/07
The Sort feature that is set up for each column on the Hitters tab is a huge time saver. Please add that for the Pitchers tab as well. Particularly when the other columns start filtering in like Upside, and Vorp etc... Thanks.
jetheinenkel
2/07
If you're working in a recent version of MS Excel, you can set this feature up yourself by selecting the "Sort and Filter" button in the "Home" tab, then selecting "Filter."
knockoutking
2/07
very easy to update/add this to the pitchers side as well (yourself)
cjrhgarmon
2/07
This article is actually kind of depressing. I understand that the PECOTA used for the comparisons is stripped-down, but, if I am reading this correctly, the stripped-down PECOTA is in a dead heat with Marcel with regard to accuracy. My understanding is that Marcel is the most basic projection system out there: basically an age-adjusted three-year average. If you do all of this work and are no more accurate than Marcel, well then what's the point? In my opinion, the real value of the PECOTA forecasts is not the mean projections. The value of the PECOTA is that it is the only projection system I know of that publishes the forecast distribution (e.g., breakout, attrition, etc.).
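To make the "age-adjusted three-year average" description concrete, here is a rough sketch of a Marcel-style projection for a single rate stat. It is a simplification under stated assumptions, not Tom Tango's actual code: the 5/4/3 season weights and the 1,200 plate appearances of league-average regression are the values commonly cited for Marcel and are treated here as assumptions, the age adjustment is left out entirely, and the function name is hypothetical.

```python
def marcel_style_rate(rates, plate_appearances, league_rate,
                      weights=(5, 4, 3), regression_pa=1200):
    """Project a rate stat (e.g. OBP) from up to three past seasons.

    rates and plate_appearances are listed most recent season first.
    The weights and regression_pa defaults are assumptions of this sketch.
    """
    # Weight each season's playing time, favoring recent seasons.
    weighted_pa = [w * pa for w, pa in zip(weights, plate_appearances)]
    numerator = sum(wpa * r for wpa, r in zip(weighted_pa, rates))
    denominator = sum(weighted_pa)
    # Regress toward the league rate by mixing in league-average playing time.
    return (numerator + regression_pa * league_rate) / (denominator + regression_pa)


# A hitter with three straight above-average OBP seasons is projected
# somewhere between his own weighted average and the league rate.
projection = marcel_style_rate([0.400, 0.380, 0.360], [600, 600, 600], 0.330)
```

The point of the sketch is how little goes into it: recent seasons, playing time, and a pull toward league average.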
leites
2/07
Last year Nate Silver wrote a piece in the NY Times making exactly your point: "The key difference in Pecota, the forecasting system that I developed eight years ago to predict the performance of baseball players, was not that it did better than its competition, on average (it did in most years, but only by a tiny bit). Rather, it was that it looked at the uncertainty in the forecast as a feature rather than a bug. For example, it didn’t just tell you how many home runs Derek Jeter would hit on average, but what a best-case scenario looked like and what a worst-case scenario looked like. This not only made the forecasting system more honest, but also provided a lot more information to the reader. People often forget that there are essentially two parts to any forecast: what we can think of as the mean forecast (“our best guess is that Sarah Palin will win by 7 points”) and the confidence interval (“the margin of error on my guess is plus or minus 9 points”)." http://fivethirtyeight.blogs.nytimes.com/2010/09/29/the-uncanny-accuracy-of-polling-averages-part-i-why-you-cant-trust-your-gut/?scp=7&sq=PECOTA&st=cse
markpadden
2/07
The goal should be to improve accuracy. Period. Any smart user is well aware of variance. So no, the goal of PECOTA should not be to explain what variance is; it should be to improve forecast quality (average error).
TADontAsk
2/07
At some point, there is only so much accuracy you can attain. Should they continually work towards a better system? Absolutely. But unless you find a projection system that is 100% accurate - obviously not possible - then a report of the accompanying variance of said projections is just as important, if not more so, than the point projection itself.
Tarakas
2/07
Actually, knowing the degree of variance is fairly handy.
PBSteve
2/07
Today's release is just the beginning. The percentile breakdowns will be coming with the cards.
rscharnell
2/07
Do you have an estimated date for that? I have a few days until my keepers are due in a league and would love to have as much information as possible before making the decision on my last few players. Thanks and great work.
markpadden
2/07
I, too, do not understand how/why Marcel is anywhere in this article. The goal is to be accurate, not just "no worse than the worst projection system out there." If your model is not ready for release yet, then don't release it. People can get/generate Marcel projections easily enough on their own, if they need something for today.
mikefast
2/07
Colin is more of an expert than I am on this, but I can say that Marcel is far from the worst projection system out there. It is consistently among the best. It is one of the simplest, that is true, but simple does not mean bad. The value of PECOTA over Marcel is presumably in a number of other areas, including but not limited to, forecasts of rookies and minor leaguers, forecasts of fielding, forecasting other categories that Marcel does not, depth charts, PFM. Percentiles, comparable players, breakout/collapse percents, etc., could also be on the list, though with some of those things it needs to be proven that they are accurate. (Colin published some on this in the fall.)
markpadden
2/07
Marcel most certainly is not "among the best." This has been proven over and over. CHONE has crushed Marcel over the years, and PECOTA used to until 2009. My question remains: is this (the spreadsheet in this post) some kind of stripped-down pre-PECOTA algorithm, or is it the final algorithm you plan to use for the pre-season 2011 projections?
mikefast
2/07
I emphatically disagree, unless by "crushed" you mean "performed marginally better". The best projection systems, by which I mean CHONE, ZiPS, PECOTA, have historically been in the same neighborhood as Marcel, and outperformance of Marcel by those systems on rate stats (e.g., OPS, ERA) has been small. You should also note here that Colin has removed park adjustments from PECOTA and thrown out rookies in order to compare to Marcel. He mentioned that, but it bears repeating.
markpadden
2/07
Do you have aggregate data on this? Care to publish a study on it? "Crushed" means statistically significantly different than. Chone, for example, has shown itself to be superior to Marcel. But according to you, they (the forecasts) are all the same in avg. quality, and there is no hope of outperforming consistently. Contradicting facts aside, even if that was the case, why would you (BP) build/re-vamp a very complex projection system, when you know the work will not result in any consistent outperformance over a simple past-years' stat average system? Just for kicks? BP really can't have it both ways. The stance needs to be one of the following: (1) We are committed to producing the most accurate baseball projection system possible. We have worked tirelessly on possible methods of attacking this problem and believe ours is the best. We plan to charge money for these projections (alone or as part of an editorial package), and will in return provide objective evidence of our progress -- both in backtested results and in real-time (end of season) performance measurement, vs. various other methods/systems. (2) We have looked at the issue of how best to project player stats and have concluded that there is *no* benefit to doing anything more than averaging the players' past stats (Marcel). We believe that Marcel is and will always be about as good as the very best projection system, so we have formally given up. We will no longer be charging for fantasy projections and will refer all of past customers to a calculator. Obviously, stance #1 is preferable, but #2 would be fine as well. But trying to have both at once is....a problem.
mikefast
2/07
I mentioned a couple links to studies already, below. What I am saying is nothing new. You don't seem to be completely grasping what I am saying, though. I am not saying that Marcel is identical to PECOTA in every way, nor did Colin in his article. If all you want is a rate stat projection for every player who was a regular in the major leagues last year, you will do just fine with Marcel. Don't spend your money to subscribe to BP if all you want is a PECOTA rate-stat projection for that set of established major-league players. PECOTA offers much more than that, plus hopefully it is more accurate than Marcel even on rate stats for established players. But it's not going to be outlandishly better than Marcel on that. Nate Silver said that, Colin said that before today. I'm not making some horrible reveal of the awful truth that BP has been trying to conceal for years. What you say on #1 is the case. BP is committed to producing the most accurate baseball projection system possible, and that includes the depth charts, Player Forecast Manager, percentiles, multi-year projections, projections of rookies and minor league players, etc., that go along with that and are not part of Marcel.
mikefast
2/07
A couple of articles on projection system comparisons: http://www.baseballprospectus.com/article.php?articleid=12102 http://www.baseballprospectus.com/unfiltered/?p=564
ils4O1
2/07
Your WARP score is saying that the best player in baseball will be Albert Pujols. The next 13 are pitchers (including injury comebacks Peavy, Santana and Strasburg)? I don't buy it.
jdtk99
2/07
Colin, what's the new replacement level, ie How many wins for a team with 0 WARP? Thanks.
chabels
2/07
Echoing this request, what is the MLB league batting average, ERA and WHIP assumed to be?
markpadden
2/07
I don't get this (below). Are you saying you are currently using some algorithm that is different from the one you will use to make final forecasts? That would make no sense. "This is the first release of PECOTA, and as such will continue to undergo revisions through the remainder of the offseason. The program we use to generate the PECOTAs is continually evolving, and when we discover new ways to improve the forecasts, we'll make those changes and pass the updated forecasts on to you. We’ll also be updating periodically to keep up with players who switch teams."
nosybrian
2/07
If past is prologue, don't expect major fundamental improvements in the algorithms between now and the start of the season. But as Colin has already indicated, as the season approaches and lineups are set, the park adjustments are instituted, and anomalies in the data are discovered and corrected, the later PECOTAs, including especially those used in the Depth Charts and the PFM, will reflect the latest information. (That's the way it has been done for years.)
brianjamesoak
2/07
I thought Ichiro was meant to have a new and improved forecast for this year. Looks more pessimistic than ever.
brooklyn55
2/07
Small thing, but helps in readability - could you format to drop the leading zeros. ie .345 instead of 0.345 Thnx
chabels
2/07
This is something you can control in Microsoft Excel, and something they cannot.
sfbennett1
2/07
* select columns which contain the leading 0
* right click and select Format Cells
* under Category, select Custom
* in the Type box, erase "General" and type ".000" (NO quotes)
* click OK
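For anyone handling the numbers outside Excel, the same display format can be sketched in Python. This is only an illustration: the helper name `drop_leading_zero` is made up here and is not part of any PECOTA or BP tool.

```python
def drop_leading_zero(value: float, places: int = 3) -> str:
    """Format a rate stat such as 0.345 as '.345' (hypothetical helper)."""
    text = f"{abs(value):.{places}f}"      # e.g. '0.345'
    if text.startswith("0."):
        text = text[1:]                    # strip only the leading zero
    return ("-" if value < 0 else "") + text


print(drop_leading_zero(0.345))  # .345
print(drop_leading_zero(1.0))    # 1.000 (values of 1 or more keep their digits)
```

Like the Excel custom format, this changes only how the number is shown, not the underlying value.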
brooklyn55
2/07
Uh, isn't it an Excel SS we are getting? Thus, easier for all to have it formatted at source ...
patwood0
2/07
I know a lot of studies have been done to evaluate PECOTA and other projection systems when it comes to OPS, but is there any information for the reliability of 5x5 fantasy stat projections? Is a SB projection more likely to be accurate than a batting average projection? How about WHIP vs. RBIs? I guess what I'm really looking for is the average historical delta for each stat category projection among players likely to be drafted in fantasy leagues.
brianjamesoak
2/07
Really great point. Batting average gets lost in an OBP and stolen bases aren't even present. I would like to see comparisons in each stat--not just one.
TheTimmer
2/07
I've just bought a Fantasy subscription for the year, but it won't let me see the Pecota data... is it only for Premium subscribers? I thought I had access last year as a fantasy guy???
TheTimmer
2/07
Scratch that, it's working now...
deacon14
2/07
Two questions on weighted mean. One, are these ballpark adjusted? In other words, would Dave Bush look different if he was on Texas instead of Milwaukee as shown? Two, until we have depth charts, is the best way to look at these to assume that the rate stats won't change based on playing time assumptions?
wyliecoyote
2/07
Including error totals (not just FRAA or whatever) would be immensely helpful for Scoresheet, and related formats.
WilliamWilde
2/07
The comps are one of my favorite parts of this data set. Seems that Frank Robinson is in a ton of comps (Braun, Posada, Hanley, Mike Stanton). Anybody have an idea why this could be?
deacon14
2/07
Are comps based on similar player at that age? In other words, could Stephen Strasburg comp to Dwight Gooden (hot young stud) but so could a 35 year old expected to post a 6-5 record with a 4.71 era?
hessshaun
2/07
Great question. It would be really cool if it compared to the exact season. Like Jamie Moyer, '89. Not a necessity, but it would be cool to get lost in baseball seasons. I know I would go from comp to comp to comp to comp with no real purpose or care really. Just interesting reading.
greenr
2/08
Yes, similar player at that age.
deacon14
2/08
Thank you.
WilliamWilde
2/07
More Frank Robinson comps: -Ethier -Jay Bruce -Luke Scott.!? -Carlos Pena
deacon14
2/07
Do pitcher wins just look at the pitcher's recent number of wins, or does it consider that player's stats with an average offense? With their actual offense? In other words, if Greinke puts up the numbers from the last couple of years, will his wins be higher in these projections (support from Braun and company) or use an average of his Royals days?
crperry13
2/07
Trying to post this without sounding critical...because I admire what you are able to accomplish with PECOTA. But...for the 3rd year running, these projections seem absurdly low and inconsistent. Some of them defy explanation. Example:

Evan Longoria: 82/25/84 263/348/474 (2011 PECOTA)
Evan Longoria: 88/27/101 283/361/507 (3-Year Average)
Mike Stanton: 76/32/87 0.247/0.328/0.496 (2011 PECOTA)

How can a guy with a 3-year track record of superstar stats be projected that low, while a sophomore with 1/2 of a MLB season and minor league translations be projected so well? I have no argument with Stanton's projection, but Longoria's looks ridiculous in comparison. A few more that jumped out at me:

85/23/79 for Zimmerman?
72/23/77 for Hamilton?
76/18/62 for K Morales?
62/17/58 for Weeks?

I understand that PECOTA isn't a "prediction" system, it's a "projection" system, based on the probable result of a mathematical simulation. But this year, it just doesn't pass the smell test at all! I don't know the math, but is it possible PECOTA is forcing a regression to the mean TOO much?
Olinkapo
2/07
Hamilton's projection is in 515 PAs, which partially explains why they look a little low. Also seems pretty fair since he's averaged under 500 PAs the last four seasons, and is heading into his thirties. (Also of note: Hamilton's BABIP of .390(!!) in 2010...) I'm sure PECOTA's high-end projections for Hamilton are quite good, but the weighted means are gently reminding us that Hamilton has missed nearly 30% of his team's games over the last four seasons. Kinda similar issue with Weeks. PECOTA is perhaps a bit generous with Weeks, really: If Weeks does indeed notch 17 dingers and 58 RBIs, each would represent the second best total of his entire career, behind only last season. From 2005-2010, Weeks played in 53% of his team's games.
Olinkapo
2/07
* Division failure on my part. * "From 2005-2010, Weeks played in 53% of his team's games." Wrong. Should read: "From 2005-2010, Weeks played in 65% of his team's games."
crperry13
2/07
I understand your argument, I just don't agree. His career stats indicate that given approximately 500 AB (not PA), he is a 30 home-run hitter. This also ignores the fact that he was good for 156 games just 2 seasons ago, so the injury angle is overplayed. If he's averaged 30/HR per 500 AB/PA over his career in the majors, why should his MEAN be 23? Shouldn't his mean be closer to 30, with a 90th percentile that reflects his talent level over 600/700 AB? 40? 45?
cwyers
2/07
Well, they really don't, actually. Hamilton's career HR/AB prorates out to 26 HR per 500 AB, or 24 prorated out over the 464 at bats listed in the PECOTA spreadsheet. If Hamilton can stay on the field more than that (and it's not clear that he can), then he's probably going to top 23 HR. But given his age, and a little regression to the mean, I don't see what's surprising about projecting a guy to be just one home run under his career rate.
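The prorating arithmetic in this comment can be checked with a short sketch. The only inputs are the figures quoted above (26 HR per 500 AB, 464 projected AB); the function name `prorate` is made up for illustration.

```python
def prorate(count: float, base_ab: int, target_ab: int) -> float:
    """Scale a counting stat from one at-bat total to another at the same rate."""
    return count / base_ab * target_ab


hr_per_500_ab = 26   # career rate quoted in the comment
projected_ab = 464   # at bats listed in the PECOTA spreadsheet

print(round(prorate(hr_per_500_ab, 500, projected_ab)))  # 24
```

That puts the 23 HR projection one home run under the career rate at the same number of at bats, which is the point being made.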
crperry13
2/08
What I'd really like to see, and have no right to demand, is projections ONLY of major league stats, with projected playing time, etc. I know that's coming on the depth charts. It just looks like if I added all of those counting stats (MLB proj. only), they wouldn't even come close to the totals achieved by MLB overall last season. You and Joe are focusing on my speculation about Hamilton, but seem to be missing my larger point. Why do ALL of the numbers look so low across the board? Again, I really do appreciate the work that goes into this. I'm just trying to understand so that I don't unfairly criticize.
jberkon
2/08
A very interesting article - and one that would be really helpful to address a lot of these comments - would be one dealing with the 10 or so players on which PECOTA and other projection systems (pick one: MARCEL, CAIRO, OLIVER, ZIPS, etc.) disagree. Perhaps the 10 on which PECOTA is more bullish and the 10 on which PECOTA is more bearish. And then try to explain why PECOTA differs. Basically, we want to understand why PECOTA believes that Longoria will experience a reasonably large drop-off at the age of 25 (one that is NOT being projected by the other systems), and why, say, Dan Johnson is expected to have such a good year. Is there some particular stat or physical attribute to which PECOTA attributes more weight than the other systems?
mikefast
2/08
The overall set of projected numbers match up pretty closely with 2010 levels of offense. That is down from previous seasons, of course, so that may be why the numbers for hitters look low across the board. In my experience, people also tend to perceive regression to the mean, while statistically correct, as making the numbers look too low for everyone.
tbwhite
2/08
Regression to which mean ? The MLB mean or the player's 3 year mean ? Longoria's 3 year mean slash stats are: .283/.361/.521 PECOTA's weighted mean for Longoria is .263/.348/.474
mikefast
2/08
By regression to the mean, I was referring to regression toward the MLB mean. Everyone (or almost) is going to be projected to do worse than their career-best year. The human tendency is to assume that a career-best year defines a talent level, and we want to see PECOTA project them to repeat that. However, that's not the most likely outcome. Most likely, if a player did really well last year, he got a little lucky, and, conversely, if he did really poorly last year, he got a little unlucky. You can see my specific comments about Longoria a few comments down. (Looks wrong to me, too.) It's great to have the high level of interest in PECOTA that we do, but one of the bad side effects of 200 comments on the thread is that it's easy for ideas to get lost or crossed in cyberspace here.
tbwhite
2/08
I understand the concept, or at least I think I do, but it seems to me that PECOTA should be focused on regression to the player's mean not the MLB mean. If a player has an established level of say a .280 TAv and suddenly posts a .310 one season, then I get it that he won't likely post another .310, he'll probably revert back to his previous level of performance. But if a guy cranks out .280 TAv seasons like clockwork, there is no reason to believe he is going to tend to revert to a .250 TAv simply because that is the MLB average. It seems to me that regressing to the MLB mean implies that no player is really better than another, and that good seasons are merely caused by luck. That's a premise which is obviously false. I apologize if I'm putting words in your mouth, or have twisted things somehow, I'm just trying to get a better handle on how PECOTA works. It would be really helpful if someone could write an article that would describe in a little greater detail how PECOTA works. I know it's a fine line between explaining what's in the black box and giving it away, but I think there is a lot of confusion about just what PECOTA does.
mikefast
2/08
I agree there's a lot of confusion about what PECOTA does. Quite a bit of it has been explained at one time or another in one place or another, some of it online, some of it in print. It might be helpful to see if some of that could be centralized or indexed in one place. Part of the problem is that Colin's really the only one here who fully knows how PECOTA works. Nate and Clay would obviously be able to contribute on that front, but they're not around as much nowadays. Colin can't work on everything everyone thinks might be wrong with PECOTA or explain everything that everyone wants explained about PECOTA. Some high priority stuff he can and will, but other things will take longer. So it falls to some other folks, like me, who know less about all the intricate details of PECOTA, to attend to some of the questions. With that caveat, let me say that regression to the MLB mean does not imply that all players have the same talent level. If that were true, we'd just predict everyone to do league average next year. It's regression to the mean, not collapsing to the mean. Tango's the king of explaining and pushing the implementation of the regression to the mean concept. The way he likes to explain it is that the longer the playing record we have for a given player, the less regression toward the mean that we need to apply. The way that Tango does regression to the mean for Marcel is to add in a half-season worth of league-average stats into the player's last three years of major-league stats. As to the specific amounts of regression to the mean that PECOTA uses for its baseline forecast, I don't know. It's probably not terribly different than Marcel in the amount of regression for established major-league players, but for minor-league players, PECOTA has the advantage of using translated minor-league stats.
nosybrian
2/08
Every player is subject to a regression effect, even the best and the worst. Keep in mind that regression toward the mean works in both directions -- "unluckily bad" performance is usually followed by improved performance the following year; "luckily good" performance is usually followed by worse performance the following year. Take a look at this example from the BP archive: http://www.baseballprospectus.com/article.php?articleid=1897
tbwhite
2/08
I think that study is flawed. Not all guys who hit between .300 and .310 are the same. Some of them are really .300 hitters who are performing as expected, some are .250 hitters having a "career year". Perhaps a very few, are actually superstars having an "off year". Odds are the first group doesn't decline much the next year, and the last group might actually improve a bit, but the middle group is likely to show a huge drop off as they revert to their normal level of performance. Of course that middle group is going to dominate when you measure the delta in batting average in year n+1. After all, the 1st group will likely have a small delta, and the last group is probably too small to impact the average too much. So, you end up with a watered down measure of players coming back to Earth after career years. To generalize their decline to all batters in that .300 to .310 group seems like a mistake. What I would like to see is the same analysis, but instead of grouping players by batting average in year n, group them by the difference of their year n batting average vs their career batting average(obviously limit to players with a meaningful sample size). Then look to see the average difference to career average in year n+1 for those groups. I think you would see that the group with the smallest differential in year n would also tend to have a smaller differential in year n+1 and that the distribution of outperformance vs underperformance would be close to random.
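The alternative grouping proposed here could be sketched roughly as follows. This is an assumption-laden outline, not the method of the linked BP study: the record layout, the function name, and the 10-point bucket width are all hypothetical, and no real player data is included.

```python
from collections import defaultdict
from statistics import mean


def next_year_gap_by_bucket(records, bucket_width=0.010):
    """Group players by (year-n AVG minus career AVG), in buckets of
    `bucket_width`, and return the mean (year-n+1 AVG minus career AVG)
    for each bucket.

    `records` is an iterable of (career_avg, avg_year_n, avg_year_n1)
    tuples, already limited to players with a meaningful sample size.
    Keys of the result are bucket indexes: 3 means roughly +30 points
    over the career average when bucket_width is 0.010.
    """
    buckets = defaultdict(list)
    for career_avg, avg_n, avg_n1 in records:
        index = round((avg_n - career_avg) / bucket_width)
        buckets[index].append(avg_n1 - career_avg)
    return {index: mean(gaps) for index, gaps in sorted(buckets.items())}
```

If the hypothesis in the comment holds, the buckets near zero should show next-year gaps near zero, while the large positive buckets should show gaps that shrink back toward the career average.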
tbwhite
2/08
I understand that it is regression to the mean, and not instantaneous reversion to the mean, but I do believe the implication is that all MLB players have the same talent level. If Votto has a 3 year mean TAv of .325, but posted a .350 TAv in 2010, why should he revert towards the MLB mean TAv for 1B of .280 (I'm just making up a number) in 2011 instead of his own mean TAv of .325? Regressing back towards the MLB mean implies that even his own .325 number over the past 3 seasons doesn't reflect his true level of ability, that the .325 is nothing more than an outlier and isn't repeatable. That is nonsense. Sometimes an absurd example helps illustrate a point. If someone stuck me at SS for the next 3 years for the Astros I might put up a line like .083/.100/.110. If someone wanted to project what my 4th season would look like they would be a damn fool if they projected me to improve just because I should regress to the ML mean and because average ML SS's hit .240/.290/.350. Theoretically, I just don't see the case for regression to the MLB mean (especially for established MLB players). Tango may use it, but it sounds more like it is a short-cut to a result rather than based on a strong theoretical foundation. I base that comment on other comments that Marcel is a very simple system. I'm not familiar with it myself, but if it is in fact a simple projection method (and I don't mean that pejoratively) then regressing to the MLB mean seems like a simple means (no pun intended) to an end.
crperry13
2/08
Side note: If you could post those numbers as an Astros SS for the next 3 years, you're already an improvement on whatever they've had.
mikefast
2/08
The link that Juris posted above your comment is an excellent one to begin learning about what regression to the mean actually is.
tbwhite
2/08
I think my comment still stands. The .350 hitters are by and large hall of famers or near hall of famers. You would get a much better prediction of their batting average in year n+1 by using their career batting averages(most of which are probably around .300) than a league average batting average of .275. If in making forecasts you assume regression to the league mean rather than the player mean, you will overstate the likely decline.
TangoTiger1
2/08
My response is on my blog.
crperry13
2/08
This is along the lines I was thinking too. My other general questions include:
--Does PECOTA take into account why a player missed time? If a guy misses time for a 50-game PED suspension or a freak home-run celebration leg fracture, that's not a "time missed" that will affect his projected production in real life, yet I suspect PECOTA looks at it as if all missed time were created equal. Thus, Morales' ridiculously low projection. Would that there was an injury database to quantify injury types and cross-reference them to PECOTA algorithms.
--Is PECOTA giving TOO much weight to missed time? "Health is a skill" is a cliché, but injuries aren't predictable. Saying, "Hamilton/Weeks/Kinsler missed time for various (different) injuries over the past 3 seasons means the most likely outcome is that he will miss time for another injury in 2011" just doesn't make logical sense. If they had a weak joint and repeatedly suffered the same injury over and over, then fine. But that isn't too common a situation overall, and I don't think it's a very strong thing to base a projection on.
--Is PECOTA regressing TOO far to the mean? Especially applies if you are using an MLB mean as tbwhite suggests, but also true if using a player mean. Longoria (example) historically is far better than his projection. Longoria is also entering his 'statistical prime' and has no history of significant injury. Nobody expects him to post career highs every year, but something is fishy about a projection that sees him only as an above-average 3B.
--What's the deal with airline food and goofy WARP scores this year? I imagine this will be answered, so I'll leave it alone.
blcartwright
2/08
You are confused about what regression is. Including more than one season, with the most recent ones weighted more, is the method of looking at the relevant portions of a player's career. Regression is when we lack information about a player (even when he has played a lot, Colin has done articles on other sites about this). For example, if you have a 20 year old shortstop in AA, you might regress his stats to what all other 20 year old shortstops in AA have done. Or if he's a fat first baseman who hits a lot of homers. How does that group age?
crperry13
2/09
I understand that part, but if players with extensive stats aren't being regressed somehow, and it's only the ones with little information to go on, how does one explain odd projections like Longoria, Zimmerman, Morales, etc? (I can't buy the fact that Morales' projection is due to his broken leg, because it doesn't make sense that that injury will impact future performance so much.) Somehow, elite players are having their stats grabbed by the neck, throttled, then dragged back to the pack. Similarly, some crummy players are being given the ol' leg up to reach higher heights than they could with no assistance. I have no idea WHY this is happening...I'll leave that to smart guys like you and Colin. But my engineering brain would like it explained or corrected for the sake of sanity.
blcartwright
2/10
Colin will have to answer any questions about specific players, or how PECOTA works, as it is his (adopted) baby, but for regression in general, you take the player's historical data and add a fixed amount of average performance of the group he's determined to be a member of. Some players will have a sample of 1500+ weighted PAs. Adding 200 regression PAs means the regression is 11% of the total. For another player who had 200 PAs in his only season so far, 200 regression PAs is 50% of his total.
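The general recipe described here (player data plus a fixed block of group-average plate appearances) can be sketched as below. This is a generic illustration under stated assumptions, not PECOTA's or Marcel's actual code: the function names are made up, and the .325 and .280 rates are the made-up TAv numbers from tbwhite's Votto example earlier in the thread.

```python
def regression_share(player_pa: float, regression_pa: float) -> float:
    """Fraction of the blended sample made up of group-average PAs."""
    return regression_pa / (player_pa + regression_pa)


def regress(player_rate: float, player_pa: float,
            group_rate: float, regression_pa: float) -> float:
    """PA-weighted blend of the player's rate and his group's average rate."""
    total_pa = player_pa + regression_pa
    return (player_rate * player_pa + group_rate * regression_pa) / total_pa


print(f"{regression_share(1500, 200):.1%}")   # 11.8%
print(f"{regression_share(200, 200):.1%}")    # 50.0%

# Hypothetical rates: an established .325 hitter, group average .280.
print(f"{regress(0.325, 1500, 0.280, 200):.3f}")  # 0.320
print(f"{regress(0.325, 200, 0.280, 200):.3f}")   # 0.302
```

With exactly 1500 PAs the share works out to just under 12%, in line with the roughly 11% quoted for a 1500+ PA sample. The established hitter barely moves, while the 200 PA player is pulled about halfway to the group average.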
Olinkapo
2/08
"This also ignores the fact that he was good for 156 games just 2 seasons ago, so the injury angle is overplayed." I strongly disagree, and think you are proving precisely why PECOTA should hedge on Hamilton. Hamilton is going into his age-30 season, and you had to stretch back to his age-27 season to find the one time in his four-year career in which he was able to avoid the disabled list and put something close to a full season of work in. Not putting in full seasons might not be as big a deal if it was simply a matter of minor injuries forcing him out 20-25 games a year. We're instead talking about missing 70+ games in each of two separate seasons. And in one of those (2009), he stunk(.741 OPS). We should be skeptical of any projection system that *doesn't* ding Hamilton's playing time, and thus his HR/RBI/run totals.
brianjamesoak
2/07
The Longoria projection is maybe a little low but not radically different from what he's done. Stanton had a lot more homers than PECOTA is projecting him for and a higher translated batting average so it's not spectacularly optimistic about him either.
jberkon
2/08
The Longoria projection seems very odd. As CRP13 points out, his projection is well off the three year average (in terms of the rate stats). PECOTA has as his comps Miguel Cabrera, David Wright, and Mark Teixeira, all of whom progressed nicely as young players. And PECOTA gives Longoria a 55% chance of improving this year. So why are the numbers so far off? Colin, Ken, Mike, etc., any explanation on this one?
mikefast
2/08
For Longoria, specifically, I agree that his numbers look low. His projected batting average seems about 20 points too low to me. "Seems too low to me" is not necessarily the same as "PECOTA made a mistake here". We'll look into it and see if there's anything wrong.
jberkon
2/08
Thanks so much, Mike, for looking into this, to see if there is a larger issue at play. I can only begin to imagine the difficulty of putting together the PECOTAs and very much appreciate your (and everyone else's) responsiveness.
tradeatape
2/08
PECOTA knows that in 2011 Evan won't find his cap.
belowm
2/07
More misplaced players, this time on the Nats: Adam Kennedy (SEA), Willie Harris (NYM), Justin Maxwell (NYY). The more I look at this spreadsheet, the more I find. Scott Hairston is with NYM, not SDP, and Jerry Hairston signed with WAS. Jody Gerut is with SEA, not SDP. Pedro Feliz is with KCR, not STL. Felipe Lopez is with TBR, not BOS. Etc., etc.
wizstan
2/07
Andres Torres projection is epic fail... using anything pre 2009 in projecting Torres is a failure.
LynchMob
2/07
What is it about Torres that makes you think the current algorithm does not apply to him? Is it something you think could/should be incorporated into the algorithm?
wizstan
2/07
I think that a large break in a career should trigger a "reboot" of data. After several years of a different shape of performance you should stop trying to fit two dissimilar career patterns together.
crperry13
2/07
I don't know about a reboot, but I think giving the most recent season/s the most weight makes sense.
smocon
2/07
I wouldn't be shocked at all to see him drop from his 6 WAR number or whatever it was to below 2. That was a HUGE overperformance. Same goes for Huff, although not to the same extent. And I am somewhat of a Giants fan.
wizstan
2/07
Overperformance relative to what? Watching him play, his speed is real, his defense is real, and everything he hits is hit hard. From the time he became an everyday starter until his appendix began bothering him he was a .290/.370/.520 player. And that looked absolutely legit.
cwyers
2/07
Well, it doesn't seem to actually look that way. If you look at players who radically over perform their career stats, you actually don't gain any forecasting accuracy (in fact the opposite) when you throw out their older numbers. More recent numbers are of course more important in forecasts than older numbers, but not to the extent that you're implying here. Now, is it possible that he overperforms his forecast? Sure. But history gives us a lot of examples of players who out of nowhere put up a fantastic season and then went on to be something less than fantastic. Of course, there are some that did that and continued being fantastic. But there are fewer of them. Torres could do either, but the odds of him being something less than just his 2010 season implies are greater.
wizstan
2/07
Well, if by overperform their career stats you mean overperform what he did in the minors in 2006, well, I think it is ludicrous to regress to that performance. I think dropping all weight given to minor league stats more than three years previous would significantly, and rationally, improve PECOTA; likewise, dropping all major league numbers if there is a break of more than three years between MLB appearances would remove some dubious regressions. Sure, Torres is likely to regress, but not towards his slap-hitting persona of five years ago. Try rerunning PECOTA on him but treating him as if his career started in 2007.
sho044
2/07
If the year was 2007 instead of 2011, I think this same comment could have been found on a Gary Matthews Jr post... that worked out well, didn't it.
wizstan
2/08
Well, except for the fact that Matthews had a long CONTINUOUS history of one level of performance, and then one anomalous season. Torres has had 4 years of hitting like this in the majors or minors, and at no time in the last 4 years has he NOT hit. There is pretty much nothing in common here.
choms57
2/07
I have a premium subscription yet every time I try to download the Pecota spreadsheet it asks me for a username and password, even though I'm logged in. I put my said username and password in and it does not work. Can someone help me??
choms57
2/07
This sucks, I've been a customer for two years and now I can't see the main thing I look forward to!
kenfunck
2/07
If you're having trouble accessing the file and your subscription is up-to-date, please send details to Customer Service, and they will work on fixing this for you: http://www.baseballprospectus.com/contact.php
BarryR
2/07
I know I'm cherry-picking here, but since Jhoulys Chacin is my cherry in a couple of leagues, I just have to pick him. Chacin 4.93/1.51/ 6.7 K/9 -- really? In approximately the same number of IP, PECOTA has Radhames Liz at 4.78/1.48/7.5 Surely there is a mistake somewhere here. Jhoulys is also worse than Boof Bonser and Philip Humber. To put it kindly, this seems a little off to me. Now other than Ubaldo at 4.02/1.39, and two relievers (Street and Betancourt), all Rockies pitchers are projected to be some variation on horrible. Is PECOTA predicting a broken humidor?
beitvash
2/07
I noticed the same thing. Weird that it would project Chacin that negatively.
mibush
2/08
Any idea when your "Team Tracker" will be updated with these projections?
TaylorSanders
2/08
Ryan Braun 2nd highest hitter projection.
matteson72
2/08
Here are your top 10 NL pitchers with the highest breakout rates:
1. Luke Gregerson (31%)
2. Johnny Cueto (29%)
3. Jason Marquis (29%)
4. Chad Gaudin (28%)
5. Huston Street (26%)
6. Pedro Feliciano (26%)
7. Kelvim Escobar (26%)
8. Mike Leake (26%)
9. Hong-Chih Kuo (25%)
10. Edinson Volquez (25%)
I'll go curl up in the corner until the next spreadsheet is released.
sfbennett1
2/08
Cinci's rotation looks pretty good for the future, no?
hessshaun
2/08
Depends on whether or not you trust their company above. I would say that the Reds are about the only members who could potentially be on this list if it was accurate.
doorbot
2/08
Ditto! Christmas is ruined!!!
igjarjuk
2/08
"Breakout Rate is the percent chance that a hitter's EqR/27 or a pitcher's EqERA will improve by at least 20% relative to the weighted average of his EqR/27 in his three previous seasons of performance. High breakout rates are indicative of upside risk." So what's the problem?
greenr
2/08
http://www.baseballprospectus.com/glossary/index.php?mode=viewstat&stat=182
fairacres
2/08
Am I missing something, or are the individual players' batting averages off a bit? Pujols is forecast at 175 hits in 558 at bats -- .3136 on my calculator. PECOTA has him at "0.312". I spot-checked a couple of other players and found similar "errors." Also, maybe instead of "Singles" you could include a "total hits" column? Thanks -- great job, but I have to agree with other posts ---- overall, it seems as if each year, PECOTA's 50% numbers are a little conservative for hitters. Taking some other examples, Adrian Gonzalez's line looks pretty similar to his average over the past 4-5 years; wouldn't he be expected to do a bit better with 81 games in Fenway v. PETCO, and hitting in a better overall lineup? Same could be said for Adam Dunn moving to the Cell.
chabels
2/08
IIRC, this is a function of PECOTA carrying more significant digits internally and rounding only for display. I.e., Pujols is forecast for 174.500 (175) hits in 558.499 (558) AB. The resulting BA is .31244.
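The arithmetic is easy to check. Here is a minimal Python sketch using the unrounded figures quoted in this comment; the `half_up` helper is an assumption for illustration, since the thread does not say which rounding rule PECOTA applies.

```python
# Unrounded values as quoted in the comment above.
hits_raw = 174.5
ab_raw = 558.499


def half_up(x):
    """Round a non-negative number to the nearest integer, halves up (assumed rule)."""
    return int(x + 0.5)


hits_shown = half_up(hits_raw)  # what the spreadsheet would display
ab_shown = half_up(ab_raw)

avg_from_raw = hits_raw / ab_raw        # average computed before rounding
avg_from_shown = hits_shown / ab_shown  # average recomputed from displayed columns

print(hits_shown, ab_shown)
print(f"{avg_from_raw:.3f}")
print(f"{avg_from_shown:.4f}")
```

Dividing the displayed integers gives the calculator figure from the question, while dividing the unrounded values gives the average shown in the spreadsheet.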
fairacres
2/08
I can guarantee you Pujols will NOT have 174.5 hits in 2011 . . . . .
jrmayne
2/08
This is a return to Nate's methodology, which I believe is better than a non-rounding method.
stlpdx
2/08
This feels dangerously similar to the six weeks of 2010 pre-season PECOTA discussions. (Spoiler: it didn't end well).
stlpdx
2/22
2 weeks and many comments later, nothing has changed my mind.
stlpdx
2/22
And yes I'm the guy who posts on his own Facebook status
jivas21
2/08
A couple of thoughts:
(1) Unless I've screwed up, the average "Improve" number is 23%. I believe this should be 50% by construction.
(2) Based on a super-quick review, a couple of Rockies hitters (Fowler, Iannetta) have what appear to be optimistic projections, and as noted above, the Chacin projection is pessimistic. I recommend double-checking the Coors Field park factor.
(3) I'm hoping future versions of the spreadsheet have the SS/Sim metric. This is/was very helpful for Scoresheet players.
MattBey
2/08
(1) Not really. Even if "improve" meant that they performed better than they did the year before (it doesn't), we wouldn't expect this to be 50 percent by construction. For one reason, look at all the minor leaguers you're considering. A lot of filler players are going to move up a level and they're going to get weeded out. That alone isn't enough to lower it to something less than 25%, but it's a factor, you have to think.
jrmayne
2/08
There are some pretty severe problems with the comps for minor league players.

Kila Ka'aihue: Nick Johnson, Joey Votto, Adrian Gonzalez.
Dan Johnson: Jason Giambi, Nick Johnson, Paul Konerko.
Chris Carter: Chris Davis, Joey Votto, Evan Longoria.
(Note that Billy Butler's comps are Conor Jackson, Dan Johnson and Gaby Sanchez.)

So far, a problem. But then:

Rich Poythress: Adrian Gonzalez, Prince Fielder, Kent Hrbek.

Seriously. I am not making that up. Next?

Clint Robinson: Adrian Gonzalez, Joey Votto, Ryan Garko.

Wil Myers is an awfully good prospect, but .256/341/410 next year? Robinson Chirinos (comps Iannetta, Carlos Ruiz, and Todd Helton) projects at 275/360/469. John Bowker? 268/342/457. Jason Dubois has a nice projection, and I would've guessed he was somewhere selling insurance rather than hitting pretty well in Iowa.

Gerald Sands' comps are Prince Fielder, Willie McCovey and Mark McGwire. Jaff Decker has Willie Mays in his comps.

I think there's a systemic problem with the minor league comps. Matt Carpenter's top comp is Chipper Jones. Trent Oeltjen's comps are Carlos Beltran, Vada Pinson, and Roberto Clemente. Thomas Field's top comp is Tim Raines. If you want to tell me these are right, it's going to take a lot of convincing. Jedd Gyorko's top two comps ought not be Eric Chavez and Steve Garvey. There are many more crazy comps out there.

Now, I know there are a lot of comps in the system, but it looks like the system has a severe bias; the system is looking for people who are one hell of a lot better than the player.

--JRM
yadenr
2/08
I noticed as well. One of the first things I do is search for current players with my favorite past players as comps. I was fairly surprised to find that Eric Davis only shows up for Dennis Raben, who also has Willie McCovey(!) and Brandon Wood. Time to get that Raben jersey.
tbwhite
2/08
This brings up an interesting question. I hope BP can address it, although I understand if they can't because it would reveal too much IP. Just what do these top 3 comps mean? My understanding is that PECOTA is based off of the last 3 seasons. But when it looks for comps, does it consider age? I assume having 28yo Mike Schmidt as a comp would be better than having a 38yo Mike Schmidt as a comp. It would be helpful to know just what specific year is being cited as a comp. Also, do the comps always tie back to the age of the player being projected? Do you only look at 25-year-olds when trying to find comps for a player who will be 25 in 2011? Just wondering if perhaps Jaff Decker is being compared to a 41yo Willie Mays. It's about the only way that comp would make any sense.
tbwhite
2/08
Upon further review Decker has a ~.950 OPS thru age 20 in the minors. Mays had a 1.017 OPS in the minors thru age 20. Maybe that's what PECOTA is picking up. It does seem like perhaps batting is over-weighted compared to fielding for minor leaguers. I mean a guy who has to play LF in A ball is very different from the greatest CF of all-time.
BarryR
2/08
Decker spent his entire age 20 season in A ball. Mays spent a month in AAA, hitting .477 (yes, .477), with a 1.323 OPS, turning 20 in May, then spent the rest of his age 20 season in the NL. This is hardly comparable.
nosybrian
2/09
@tb: If memory serves me correctly, the comps are always based on the player's age-cohort, so, for example, a given player's age 28 season's comps are always other players in THEIR age 28 seasons (not their prior or later careers). My inference about this is based on my understanding of how the projections are made -- by looking at the performance of the matching age-cohort of players in the database of all player-seasons from 1950 onward (adjusted in various ways).
nosybrian
2/09
I should add that the comps are thus the "most comparable" of the larger subset of players from the same age cohort (in the 1950-2010 player database) who are most closely matched on a set of criteria that includes information not just on performance (stat lines) but also such characteristics as position, handedness, and physical type (height and weight).
nosybrian
2/09
So the "match" is determined by age, physical characteristics, position PLUS the (adjusted) "baseline performance" from the immediately previous 3 seasons of the given player and other players in the database. That is my understanding of the core method. How is the proximity between a player and his comparables actually calculated? I don't know the formula but the basic method is one of "nearest neighbor analysis." (See "nearest neighbor search" and "nearest neighbor analysis" in Wikipedia. Nate Silver uses a similar approach in some of his election forecasting, to take advantage of information from "neighboring" (most similar) states to help make election forecasts of particular states.)
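To make the general idea concrete, here is a minimal nearest-neighbor sketch in Python. It is not PECOTA's formula, which is not given in this thread. The `PlayerSeason` class, the feature tuple, the scaling constants, the position penalty, and the sample players are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    name: str
    age: int
    position: str
    features: tuple  # (OBP, SLG, height in inches, weight in pounds), hypothetical


def find_comps(target, pool, scales, k=3, position_penalty=1.0):
    """Return the k same-age player-seasons nearest to target."""

    def distance(other):
        # Scale each feature so points of OBP and pounds of weight are comparable.
        squared = sum(
            ((a - b) / s) ** 2
            for a, b, s in zip(target.features, other.features, scales)
        )
        # Assumed flat penalty when positions differ.
        if other.position != target.position:
            squared += position_penalty
        return math.sqrt(squared)

    # Only candidates in the same age cohort are eligible.
    cohort = [p for p in pool if p.age == target.age]
    return sorted(cohort, key=distance)[:k]


# Made-up data: not real players or real stat lines.
target = PlayerSeason("Target", 25, "LF", (0.340, 0.450, 73, 200))
pool = [
    PlayerSeason("Comp A", 25, "LF", (0.345, 0.455, 72, 195)),
    PlayerSeason("Comp B", 25, "CF", (0.300, 0.380, 70, 175)),
    PlayerSeason("Comp C", 31, "LF", (0.340, 0.450, 73, 200)),  # wrong age cohort
    PlayerSeason("Comp D", 25, "LF", (0.360, 0.500, 75, 220)),
]
scales = (0.030, 0.060, 2.0, 15.0)  # assumed spread of each feature

comps = find_comps(target, pool, scales, k=2)
print([p.name for p in comps])
```

Note that "Comp C" matches the target's stat line exactly but is never considered, because it falls outside the age cohort. That mirrors the point above that a player's comps are other players at the same age, not their earlier or later careers.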
mikefast
2/09
Yes, you have explained the basic idea well.
igjarjuk
2/08
I'm not convinced by all this name dropping that there are "severe problems" with the comps. I think you should offer up evidence on PECOTA's terms: what information did PECOTA use to determine the comps? For example, with the Decker-Mays comp, I think you're calling foul because of your knowledge of the extraordinary career that Mays had, but I'm pretty sure that's not how PECOTA is designed to work: I don't think it's using Mays' age 22+ seasons as the driver of the comparable match. Instead, Mays' earliest years are used to make the match. In this case Mays' later, very successful years are just a piece of a larger puzzle, one that in this case happens to contribute to a rosier outlook for Decker than if Mays were not a comp. This does not imply that a Mays-like year is an assured, or even likely, outcome. See the Comparable Player and Comparable Year entries in the glossary for more details.
tbwhite
2/08
Mays' great career is what puts a target on this particular comp, but a LF with an OPS of .950 in the Calif League is not very comparable to a CF with a 1.017 OPS in a month in AAA followed by an .830 OPS in the majors in ~500 PA and, oh yeah, the RoY award. My guess would be that the lack of fielding data is hurting this comp. If you look at BaseballReference.com for Mays, it just says he played OF his rookie year, with no breakdown by where in the OF. Also, to be fair, the quality of Decker's comps is low (I think it was around 62), but I just find it hard to believe that a defensively challenged 20 yo OF who crushes high A ball doesn't have more closely comparable players than a HoF CF. One more question: were the comps that are displayed cherry-picked? Because they seem like they are all ML players. I just can't believe that there isn't some obscure minor leaguer we never heard of who flamed out who is a better comp for Everett Williams than Mickey Mantle. And heck, Callison was pretty good too.
nosybrian
2/09
Good comment. See my further explanation above, which is consistent with your explanation.
tbwhite
2/08
Joey Votto is an interesting case. I would like someone at BP to please explain the following to me: Votto has the highest "Improve" rate at 61%. Yet his forecast calls for just 29 HRs. Digging deeper, I see from Votto's player card that in 2010 he had 648 PAs, with 37 HRs, .324/.424/.600 for a .350 TAv. For 2011, he is projected for 615 PAs, just 33 fewer than 2010. But he loses 8 homers, 26 points of AVG, 36 points of OBP, and 71 points of SLG. His TAv is .317 compared to .350 in 2010, and he has the greatest chance of improvement in 2011 of ALL hitters? Something doesn't add up.
jimnabby
2/08
A lot of the questions in this thread would be cleared up if everyone just ran over to the glossary for a few minutes. "Improvement Rate is the percent chance that a hitter's EqR/27 or a pitcher's EqERA will improve *at all* relative the weighted average of his EqR/27 or EqERA in his three previous seasons of performance. A player who is expected to perform just the same as he has in the past will have an Improvement Rating of 50%." It seems pretty likely that Votto will improve over the average of his last 3 seasons, doesn't it? (Note that I'm not going to argue that this isn't a pretty non-intuitive way to define "improvement." Nor am I going to apologize for the triple negative in that last sentence.)
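A rough Python sketch of how rates like these could be computed from the glossary wording quoted in this thread. Everything numeric here is made up: the season weights, the three prior EqR/27 values, and the list of possible outcomes are hypothetical, and the thread does not say how PECOTA actually weights seasons or builds its outcome distribution.

```python
def weighted_baseline(seasons, weights):
    """Weighted average of prior seasons (oldest first). Weights are assumed."""
    return sum(s * w for s, w in zip(seasons, weights)) / sum(weights)


def improve_rate(outcomes, baseline):
    """Share of outcomes that beat the baseline at all (hitters: higher is better)."""
    return sum(o > baseline for o in outcomes) / len(outcomes)


def breakout_rate(outcomes, baseline, threshold=0.20):
    """Share of outcomes at least 20% better than the baseline."""
    return sum(o >= baseline * (1 + threshold) for o in outcomes) / len(outcomes)


# Hypothetical hitter whose EqR/27 rose each year: 5.0, then 6.0, then 8.0.
prior_seasons = [5.0, 6.0, 8.0]
weights = [1, 2, 3]  # assumed: most recent season counts most
baseline = weighted_baseline(prior_seasons, weights)

# Hypothetical spread of possible outcomes for next season.
outcomes = [5.5, 6.0, 6.5, 6.9, 7.0, 7.2, 7.5, 8.0, 8.5, 9.0]

print(round(baseline, 2))
print(improve_rate(outcomes, baseline))
print(breakout_rate(outcomes, baseline))
```

In this made-up case most outcomes fall below the hitter's latest season (8.0) yet still clear the three-year baseline, so the improve rate comes out well above 50% even though the typical outcome is a step back from last year.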
TheRedsMan
2/08
Exactly. Votto's Improvement rate is so high because he went from solid to very good to MVP over the last 3 years. So even slipping back to the very good category leaves him higher than his 3-year average. That said, it certainly is not intuitive.
tbwhite
2/08
Look at Votto's player card. His average TAv for the past 3 years is .325. His weighted mean forecast for 2011 is .317, and he has the HIGHEST probability of improvement among all batters.
Oleoay
2/08
Out of curiosity, how much did Matt Wieters' rookie season projection change using this version of PECOTA?
cmac314
2/08
Of the 1012 batters listed, just 29 have an Improve % of 50% or greater. Something tells me that when all is said and done, 97% of MLB position players will not have declined in 2011. For comparison's sake, the January 2010 PECOTA listed 483 batters, and 90 of them had an Improve % of 50% or greater. Also in the Jan 2010 edition, there were 292 batters with a "Breakout" % of greater than 10%. This year there are zero. I don't believe the definition has changed. The 2011 leading Breakout batter candidate: Wilkin Ramirez, with his .289 projected OBP. Something's screwy here. Where's Nate Silver?
choms57
2/08
Just want to personally thank Colin, Ken, and Rob for emailing me and helping me figure out my PECOTA spreadsheet issue. BP is hands down the best site ever.
crperry13
2/08
Best post in this thread. Very true.
TADontAsk
2/08
I completely understand the age-related comps, and that when a 20-year-old prospect has a comp of Willie Mays, they're saying that he's most comparable to a 20-year-old Mays up to that point of Mays' career, NOT that he's going to have a Mays-type career. What I DO question is that there should be hundreds of comp possibilities. Thousands for the younger, minor league players. And with that consideration, it seems like the Mays, and the Kalines and the Younts are showing up at a higher frequency than we'd expect them to. Unless it just seems that way because we pause and pay extra attention when seeing "Willie Mays" on someone's comp list.
tbwhite
2/08
There are 16 out of 1012 batters with Yount as a comp, 7 with Willie Mays. Also, I disagree a bit. While the model isn't saying that Decker is definitely going to have a Mays-type career, it does use it as a possible outcome. It does affect the results. That's why I worry about current minor leaguers being compared to old Hall of Famers. The data that exists for Jaff Decker today doesn't exist for Mays when he was 20. We don't know how many games Mays played in CF at age 20. Also, is the height and weight data available season by season? How PECOTA deals or doesn't deal with that type of missing data can have a huge impact on the results. If it isn't penalizing Decker for playing LF in High A ball at age 20, when Mays was playing CF (presumably) in the majors at age 20, then the results aren't going to be worth much.
BarryR
2/08
It doesn't matter what position Mays was playing in 1951. Mays, three months younger than Decker, was playing in the major leagues at age 20 after hitting .477 with a 1.323 OPS in AAA, while Decker was putting up a .950 OPS in the California League. These are not comparable hitters. It would seem to me that the comparables should start with players who did the same thing at the same level, not with players who were regulars in the major leagues.
jrmayne
2/08
That's not it. I've been looking at these comp lists for years with some care, and I'd bet Toyotas to Tonkas there are more HOF or active tremendous players in the comps. Four Mantle comps: Everett Williams, Rymer Liriano, Randall Grichuk, Oswaldo Arcia. Mantle's age-20 season: 311/394/530, OPS+ of 161. In the big leagues. Mickey Mantle should not be one of the top five hundred comps for these guys. Before, if you had Mantle or Kaline or Yount in your comps, you'd have an actual chance - if slight - of being similar to Mantle or Kaline or Yount. Rymer Liriano isn't going to be Mickey Mantle unless he gets bitten by a radioactive spider or something similar. Dave Winfield is not a good comp for Calvin Anderson. Period. There is a problem, and it's a serious one, and while reboots are hard (especially in this case), someone should have caught this before publication. It took me twenty minutes to find the pattern and twenty more to confirm. --JRM, welcoming our new Roberto Clemente, currently known as Jay Austin.
TADontAsk
2/08
That's the thing. You'd expect this if it was listing the top 20 comps, or maybe even top 10. The confusion is that these are the top THREE comps.
DLegler21
2/08
While I agree that it is troubling that great players are showing up in the top 3 comps so often, I think you are taking the term comp too literally. If I understand PECOTA correctly, the comps determine the "shape" of a player's career (rise or fall from current, eventual decline, etc.) more so than the magnitude of the numbers. Comps determine this "shape" of the performance curve, which is then applied to the most recent level of performance (last 3 years) to come up with the projections. Colin and team, is this a correct (if simplified) explanation?
mikefast
2/08
Yes.
jrmayne
2/08
I understand this. But the comps are supposed to be comparable players under the theory that comparable players will age comparably. Uncomparable players will not age comparably. If you think a guy mashing some in High Desert at the age of 22 is going to follow the same route as a guy mashing in the bigs at age 22, I'd like to see some authority for that. This is also a massive change to the comp system that built the PECOTA brand; the comp system was based on actual comparable players, rather than people mountains better than the player being evaluated. If the comps are nutty (and indeed they are) for minor leaguers, the shape is based not on actual comps but on a semi-random assortment of players. Further, if the comps are no good, the percentile rankings lose substantial value. Further further, it seems clear that these errors have made for bad projections for older minor leaguers - Kila, Dubois, Bowker, and others have wildly optimistic projections. If you think I'm wrong, I'll make a bet (cash to charity of winner's choice) and give you 7-5 on the over on all of the aging minor leaguers. --JRM
norraist
2/08
I was just wondering why players who are going to miss either full or half seasons (Strasburg and Johan Santana spring to mind) are listed as pitching so many innings?
mikefast
2/08
With the depth charts will come human input about likely playing time. The weighted means spreadsheet is the output that PECOTA gives without knowledge about who has already suffered an injury that will cause them to miss time next year or whose playing time may change because of change in role.
cwyers
2/08
Because these are the weighted means, not adjusted for playing time. Playing time projections are coming in a week, with the depth charts. The two separate types of forecasts exist because there are two different ways of using the forecasts. I mean, there are a bunch of guys in that spreadsheet whose forecasted MLB totals for next season I can give you without having to do a lick of math - 0s across the board. Forecasting playing time for players we know won't get any playing time is useful... well, whether or not it's useful is up to you, so I shouldn't say that. But what it allows you to do is use PECOTA, if you like, to answer questions like, "What would the Nationals rotation look like if Strasburg was able to pitch this season?" Now, there are many, many use cases of PECOTA that require good MLB playing time estimates. We recognize that. We've announced when we are publishing them.
norraist
2/08
Very fair -- thank you for the quick response!
Hokieball
2/08
So is this the year that there will FINALLY be a uniform player ID that matches everyone between PECOTA, the PFM downloads, and the customizable in-season stat reports? It would be awfully handy :)
cwyers
2/08
We'll be including IDs in the PFM output, and they will be the same ones we're including with PECOTA right now. I'll look into adding player IDs into the sortables.
jlebeck66
2/08
This may be out of the scope of Colin's power, but player ID's for historical stat reports and for the minor league translations would be spiffy too.
mikefast
2/08
Do not question the Colin's powers, or you risk being ground up and fed into his nutrient bath.
Oleoay
2/09
If he was so powerful, why would he need a nutrient bath?
BurrRutledge
2/09
Courtesy of wiki: In the original Doom Patrol series, The Brain was regularly portrayed as a disembodied brain, bobbing inside a sealed dome filled with a nutrient bath, hooked up with numerous machines, including a loudspeaker to convey his voice.
Oleoay
2/09
Shows how long it's been since I read a comic book. The last time I did, it was Optimus Prime's disembodied head hooked up to numerous machines.
Richie
2/08
Nishioka is missing altogether, I believe.
worldtour
2/09
Can someone tell me why PECOTA loves Winston Abreu so much? Am I missing some Tommy John medical news, or am I reading the spreadsheet wrong??
BindleStiff
2/09
I'm not sure if this has been noted already but for those who have issues accessing the spreadsheet using their username/password, please note that the username/password are case-sensitive when accessing the PECOTA spreadsheet. Or at least they are for me (and have been every year). This is different than the main page login, where the username/password are not case-sensitive, so it can be a bit confusing if your username or password contains uppercase letters.
boness
2/10
A problem that I see with the comps is that every one of the comps is a major league player. In the past, if you looked at the comps for a scrub in A ball, it was mainly other scrubs in A ball. What changed?
jberkon
2/10
Agreed - this does seem like a difference. Would love to know why and whether, as jrmayne suggests, this is inflating certain players' projections
drewsylvania
2/10
Is the Adrian Gonzalez projection adjusted for Petco and not Fenway? .880 OPS seems awfully low for Boston.
drewsylvania
2/11
Also, there are a LOT of guys who are on wrong teams. It's as though the offseason of trades never happened.
sldeck
2/14
What about Minnesota's Tsuyoshi Nishioka? He appears to be MIA.
rscharnell
2/18
Any update on when VORP will be added to the spreadsheets?
jrbdmb
3/08
Concur strongly. For fantasy purposes VORP tends to be a much better measure than TAv or WARP, since most fantasy leagues do not take fielding and park adjustment factors into consideration. Please add this back in real soon, thanks.
jrbdmb
3/08
Actually, at this point a better question is when will the next update be released? Waiting for any updates based on early ST, plus of course the return of VORP.