
Baseball Prospectus home
  
  

February 7, 2011

Reintroducing PECOTA

They're Here!

by Colin Wyers

And here we are–the release of the 2011 PECOTAs.

While I have your attention, I’d like to say a few words about the production of the PECOTAs this year. I guess I don’t have to tell you that I’m filling some pretty big shoes here–Nate Silver is probably the most famous sabermetrician not named Bill James, and PECOTA is where Nate made his biggest mark in our community. And so I’m building on his work–and work by people like Clay Davenport and Gary Huckabay, too. I am, as they say, standing on the shoulders of giants. (In big shoes, apparently–this is what I get for mixing my metaphors.) So I owe them, and others I’ve probably neglected to mention, my deepest thanks.

But even with all that, I wouldn’t have gotten this far without a lot of help. I couldn’t have accomplished what I’ve done without the help of everyone here at Baseball Prospectus, who have really given me all the support I could ask for. But I want to thank a few people especially–it’s a fine line to walk, as if I list too few I risk upsetting someone unfairly left off, and if I list too many, you won’t finish reading the list. So with that in mind–very special thanks to Rob McQuown, Mike Fast, Ben Lindbergh, and Steven Goldman. Gentlemen, take a bow, and everyone, please, give them a round of applause.

Now, then, on to business.

This is the first release of PECOTA, and as such will continue to undergo revisions through the remainder of the offseason. The program we use to generate the PECOTAs is continually evolving, and when we discover new ways to improve the forecasts, we'll make those changes and pass the updated forecasts on to you. We’ll also be updating periodically to keep up with players who switch teams.

In addition, we have several PECOTA features yet to roll out–first will be the Depth Charts, which combine the PECOTA forecasts with estimates of a player’s role and playing time. Those should be ready a week from now, and will be available in two forms: the team Depth Chart pages, and the Player Forecast Manager (which is receiving some upgrades as well).

After that, we’ll be publishing the PECOTA cards, featuring perks like the percentiles and the ten-year forecasts. We’ll update you more as we get closer to that point.

We’re also revamping the cards to use the new Wins Above Replacement Player model we’ve developed. PECOTA has already been adapted to use the new WARP, so WARP baselines have shifted a bit from what you’re used to seeing. The biggest change is among relief pitchers, who take a major hit. Please keep this in mind as you review these forecasts. We know that many of you are relying on these forecasts for your fantasy teams, and we thought that it was better to get the forecasts out now rather than wait for when the entire site was ready to transition to new WARP.

Now, some of you may be asking, “How good are the PECOTAs this year?” Of course, we won’t know the answer for another eight months or so. But we can come up with an educated guess, if we make the assumption that there’s nothing special about predicting the 2011 season, and that a system that works over previous seasons will work in succeeding seasons.

In the course of producing the PECOTAs, we generate forecasts for every player who played from 1950 through 2010. These aren’t quite the same as the full PECOTAs–they are park-neutral, rather than being adjusted for the home park a player plays in. They are not age-adjusted. And for the most part, they do not reflect minor-league performance–we have major-league data for all of MLB history, but very little minor-league data. Still, they do represent a substantial portion of the PECOTA process.

It is time-prohibitive for us to generate full age curves for all of these historic forecasts, but we did adapt a simplified set of age adjustments for past purposes. These simplified PECOTA forecasts aren’t as accurate as the full PECOTAs, but they give us a chance to view how well PECOTA fares over a large swath of history.

There’s only one other projection system available and therefore able to be pitted against PECOTA for such a large part of baseball history: the Marcels, originally developed by Tom Tango. (The version we’re using in these tests was published by Jeff Sackmann.)

I “re-baselined” each forecast for each season in the test by subtracting the average forecast and adding in the average performance of the players forecasted (weighted by playing time, in both instances). Then I ran two tests. One is root mean square error; if forecast errors are roughly normally distributed, about 68% of forecasts fall within that margin of error. The other simply counts which forecast was closer to a player’s actual performance. Looking at offensive stats first:

P_OBP_RMSE   P_SLG_RMSE   M_OBP_RMSE   M_SLG_RMSE   P_OBP_SR   P_SLG_SR
0.036        0.066        0.036        0.066        56%        54%

(P = PECOTA, M = Marcel, SR = success rate)

In terms of RMSE, our simplified PECOTA is in a dead heat with the Marcels. (And again, PECOTA is giving Marcels an edge, as it is forecasting everyone for a neutral park; the Marcels make no park adjustments, but most players do not switch teams or parks between seasons.) In terms of “success rate” (in other words, the percentage of head-to-head projection matchups "won"), PECOTA has a slight edge.
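
The two tests described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code used to produce the numbers in this article: the function names are hypothetical, RMSE is computed unweighted, and exact ties are dropped from the success rate, since the article does not say how either was handled.

```python
import math


def rebaseline(forecasts, actuals, weights):
    """Shift forecasts so their weighted mean matches the weighted mean of actuals.

    `weights` stands in for playing time (e.g. plate appearances); this is an
    assumption about what "weighted by playing time" means in practice.
    """
    total = sum(weights)
    mean_forecast = sum(f * w for f, w in zip(forecasts, weights)) / total
    mean_actual = sum(a * w for a, w in zip(actuals, weights)) / total
    return [f - mean_forecast + mean_actual for f in forecasts]


def rmse(forecasts, actuals):
    """Root mean square error between forecasts and actual results (unweighted)."""
    squared = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared) / len(squared))


def success_rate(system_a, system_b, actuals):
    """Share of head-to-head matchups in which system_a lands closer than system_b.

    Exact ties are skipped (an assumption, not something the article specifies).
    """
    wins = losses = 0
    for a, b, actual in zip(system_a, system_b, actuals):
        err_a, err_b = abs(a - actual), abs(b - actual)
        if err_a < err_b:
            wins += 1
        elif err_b < err_a:
            losses += 1
    decided = wins + losses
    return wins / decided if decided else float("nan")
```

Two systems can post the same RMSE while one wins more head-to-head matchups, because RMSE punishes a few large misses heavily and the success rate only counts who was closer.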

Now, for pitching:

P_ERA_RMSE   M_ERA_RMSE   P_ERA_SR
1.22         1.22         48%

Again, RMSE shows a dead heat. In terms of success rate, the Marcels have a slight edge over the simplified PECOTAs.

Like I said, this sort of testing emphasizes breadth rather than depth–PECOTA has forgone several of its advantages, like park adjustments and minor-league data. And yet it’s still producing accurate forecasts.

Now, just as a reminder–PECOTAs are available only to our subscribers (Premium and Fantasy) who are signed up for a whole year. If you haven’t already subscribed, you can do so here. Doing so doesn’t just get you access to PECOTA, but also to our fantasy tools like the Player Forecast Manager and Team Tracker, as well as exclusive access to some of the best baseball writing out there. If you’re already a subscriber, thank you for your support, and I truly hope that you all enjoy what we do as much as we enjoy doing it. I am continually amazed at how intelligent and knowledgeable our readers are, and I really do think that the people who read BP are among the best baseball fans I’ve ever known.

That’s all I've got–have fun, folks.

Colin Wyers is an author of Baseball Prospectus. 

256 comments have been left for this article.

NYYanks826

+1 with a ridiculously large number of zeroes following it.

Feb 07, 2011 01:42 AM
rating: 2
 
Chomsky
(103)

I see the PFM isn't quite ready yet - ETA?

Feb 07, 2011 01:49 AM
rating: -17
 
Benjamin Harris

"In addition, we have several PECOTA features yet to roll out out–first will be the Depth Charts, which combine the PECOTA forecasts with estimates of a player’s role and playing time. Those should be ready a week from now, and will be available in two forms: the team Depth Chart pages, and the Player Forecast Manager (which is receiving some upgrades as well)."

Feb 07, 2011 03:50 AM
rating: 7
 
NYYanks826

Looks like, according to PECOTA:

- Ryan Howard (41) is the only player hitting more than 40 homers this year.

- Joe Mauer (.317) will lead the league in batting average, with only a small handful of players even breaking .300.

- Juan Pierre will lead the league with 50 SB. Interestingly, Derrick Robinson has the next highest predicted steal total with 47.

- There will be a three-way tie with CC Sabathia, King Felix and Dan Haren for the league lead in wins (15). Those same three will lead the league in strikeouts, with King Felix taking the K crown at 204.

- Joe Nathan is going to be the runaway saves leader (40), with K-Rod behind him at 35.

Great stuff guys! This is one of the two most exciting days of the baseball pre-season, with the other being the release of the newest MLB: The Show video game.

Feb 07, 2011 02:03 AM
rating: 2
 
BP staff member Neil deMause
BP staff

Keep in mind that these are weighted means - in real life, there are inevitably going to be outliers who do better because they exceed their forecasts. So saying "PECOTA thinks only one player is going to hit 40 homers" is a bit like saying "PECOTA thinks everyone is only going to roll sevens at the craps table."

Feb 07, 2011 06:06 AM
 
deacon14

Could someone better explain Weighted Means to me? What exactly is the spreadsheet we are looking at? One, will playing time affect Jason Heyward's home run number or only a variance away from the mean. Two, what are Mike Trout's projections supposed to represent, if he played in the majors this year?

Feb 07, 2011 13:01 PM
rating: 4
 
pobothecat

excellent question. thank you for asking it.

Feb 07, 2011 15:01 PM
rating: 0
 
Tony

From my PECOTA experience:

This spreadsheet shows the weighted mean projections for each player - which means that the numbers shown in the spreadsheet take into account ALL possible outcomes for how a player will perform in 2011 and weighs them by how likely they are to happen.

All data is expressed as each player plays in the Major League environment of their currently listed team.


Please feel free to correct any portions of the above which are incorrect...
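
A rough sketch of the weighted-mean idea, in Python. The outcome values and probabilities below are made up for illustration and are not PECOTA output; the function names are hypothetical as well.

```python
def weighted_mean(outcomes):
    """Probability-weighted average of (value, probability) pairs."""
    total_p = sum(p for _, p in outcomes)
    return sum(v * p for v, p in outcomes) / total_p


def chance_of_at_least(outcomes, threshold):
    """Probability that the outcome is at or above `threshold`."""
    total_p = sum(p for _, p in outcomes)
    return sum(p for v, p in outcomes if v >= threshold) / total_p


# Hypothetical home-run outcomes for one hitter: (home runs, probability).
hr_outcomes = [(20, 0.25), (30, 0.50), (42, 0.25)]

mean_hr = weighted_mean(hr_outcomes)               # lands below 40
p_forty = chance_of_at_least(hr_outcomes, 40)      # yet 40+ is far from impossible
```

With these made-up numbers the weighted mean sits near 30 home runs even though the hitter has a one-in-four chance of reaching 40, which is why a list of weighted means shows fewer extreme totals than a real season will.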

Feb 07, 2011 17:10 PM
rating: 1
 
deacon14

Thank you.

Feb 08, 2011 08:44 AM
rating: 0
 
BP staff member Steven Goldman
BP staff

We just put this up! The world is asleep and you're here! I love you guys!

Feb 07, 2011 02:15 AM
 
NYYanks826

Such is the advantage of working an overnight shift at work tonight :)

...getting a lot of work done, as you can tell.

Feb 07, 2011 02:21 AM
rating: 3
 
saint09

Australia has its advantages, most notably early evening release of PECOTA.

Feb 07, 2011 02:21 AM
rating: 3
 
Tipman

Just looked at the Mets team quickly, and Kelvim Escobar with a 3 WARP, which is 2nd highest on Mets pitching staff? I know the Mets pitching staff is bad, but didn't think a guy who is going to pitch in the majors would be 2nd best!

Feb 07, 2011 04:03 AM
rating: 0
 
R.A.Wagman

Question about playing time estimates. As this release is not adjusted for playing time, I'll have to take the over on Brandon Morrow, among others. Will the post-depth chart release include such adjustments?

Feb 07, 2011 04:44 AM
rating: 0
 
smallflowers

YEAHHHHH!!! This is awesome.

(In the next iteration, can you add the league to each line? Easier to sort for AL- or NL-onlys!)

Feb 07, 2011 04:50 AM
rating: 0
 
smallflowers

Ah, and positions! Sort sort sort!

Feb 07, 2011 04:54 AM
rating: 3
 
jpjazzman

Also please add positions for sorting, really a pain without that.

Thanks!

Feb 07, 2011 05:11 AM
rating: 3
 
BP staff member Ken Funck
BP staff

Here's how you can do this yourself in Excel.

If all you need is to know the first position listed for the player in the FRAA column (i.e., the primary position at which PECOTA thinks the player will appear), create a column entitled "POS" at the end of the spreadsheet in Column AQ, and then use this formula in this column for the first player listed:

=LEFT(AF2,2)

then copy it into every player's row.

If, however, you want to see if the player is mentioned as playing that position at all, you'll need to create one column for each of the nine positions, and label them "1B", "2B", etc. Under the column you name "1B", you would enter this formula for the first player listed:

=IF(ISERR(FIND("1B",$AF2,1))=TRUE,"","1B")

The FIND function looks to see if the string "1B" is found somewhere in the FRAA column. The ISERR function looks to see whether there's an error (if it doesn't find string "1B", the result is a #VALUE! error -- you could leave those, but it looks kind of ugly). The IF function states what to do if there's an error (i.e., leave the cell empty) or if there isn't an error (i.e., put the value "1B" in the cell).

This will put the value "1B" in the column for that player if they're listed at 1B in the FRAA column, and an empty field if not. The "$" before the AF ensures that when you copy this cell, the reference to column AF (i.e., the FRAA column) will stay the same, so you can then copy the formula into your 2B column, edit the formula where it looks up and then stores "1B" to now say "2B", and copy that for all players. Follow the same steps for 2B, 3B, SS, LF, CF, RF and C, and you'll have a column for each position on which you can either sort or filter.

If you want a column that will describe if the player is listed by PECOTA for ANY outfield position, you can use this formula in its own column:

=IF(ISERR(FIND("F",$AF2,1))=TRUE,"","OF")

Hope this helps.

Feb 07, 2011 06:24 AM
 
stlpdx

OR...you could do it for us on a later release?

Feb 07, 2011 19:22 PM
rating: 0
 
norraist

Minor quibble -- seems that the find string for "C" will return a hit for both C and CF, for the same reason that using "F" can return any OF.

Feb 08, 2011 18:48 PM
rating: 0
 
BP staff member Ken Funck
BP staff

Search for "C " (i.e., c followed by a space). I should have specified that -- sorry.

Feb 08, 2011 19:09 PM
 
norraist

This would also work (saw your reply after I puzzled it out)


=IF(ISERR(FIND("C",$AF2,1))=TRUE,"",IF(ISERR(FIND("CF",$AF2,1))=TRUE,"C",""))
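
For anyone working outside Excel, the same lookup can be sketched in Python. This assumes the FRAA cell is a text string such as "CF -2, 1B 1" (a guess at the layout, not a documented format), and the function names are hypothetical. Matching whole position tokens sidesteps the C-versus-CF overlap, and it still reports C for a player listed at both C and CF.

```python
import re

# Whole-token match, so "C" is not found inside "CF".
POSITION_PATTERN = re.compile(r"\b(1B|2B|3B|SS|LF|CF|RF|C)\b")


def listed_positions(fraa_cell):
    """Return the position codes found in the cell text, in the order they appear."""
    return POSITION_PATTERN.findall(fraa_cell)


def plays_outfield(fraa_cell):
    """True if any outfield position (LF, CF, RF) is listed in the cell."""
    return any(pos in ("LF", "CF", "RF") for pos in listed_positions(fraa_cell))
```

The first element of `listed_positions(...)` plays the role of the `=LEFT(AF2,2)` column, under the same assumption that the primary position is listed first.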

Feb 08, 2011 20:16 PM
rating: 0
 
mblthd

PECOTA's not too pleased with Jason Heyward.

Feb 07, 2011 04:51 AM
rating: 0
 
CRP13

Or the brothers Upton. My fantasy outfield is screwed! :)

Feb 07, 2011 06:25 AM
rating: 0
 
billminick

How exactly do I access the PECOTA projections? This is not user-friendly at all. Please advise. Thanks.

Feb 07, 2011 05:05 AM
rating: 0
 
NYYanks826

Once you open the Excel file, you should see tabs at the bottom for batters and pitchers.

Feb 07, 2011 05:18 AM
rating: 0
 
jpjazzman

I think he means where the physical file is (which is a bit hidden)

Click the fantasy tab, then there is the fantasy box about halfway down the page on the right side with the spreadsheet link. Definitely a bit buried for some reason.

Feb 07, 2011 05:34 AM
rating: 2
 
BP staff member Neil deMause
BP staff

Or just click on "have fun, folks" at the end of the article.

Feb 07, 2011 06:03 AM
 
tylernu

Which is the very definition of burying the lede...the link should be much more obvious.

Feb 07, 2011 20:37 PM
rating: 2
 
Hoff

My goodness, that's quite the gap between Pujols and the rest of the hitters. Maybe he should offer the rest of them lessons.

Feb 07, 2011 05:09 AM
rating: 0
 
Hoff

The expected difference between him and Braun in WARP is as big as the one between Braun and J.D. Drew.

Feb 07, 2011 05:10 AM
rating: 1
 
Bob

Awesome. But where is "upside?" Believe it or not, THIS is the reason I subscribe to Baseball Prospectus.

Feb 07, 2011 05:17 AM
rating: 13
 
zstine1

It is an important part for keeper league owners. Please bring it back.

Feb 07, 2011 06:41 AM
rating: 0
 
mwright

Echo this sentiment. I'm sure you guys are swamped today but at some point would like to know if "upside" will be coming back or if there will be any other long-term value metric introduced in its place.

Feb 07, 2011 08:39 AM
rating: 1
 
pakdawgie

I'm assuming that it will take a couple of weeks - they need the long term projections (i.e. the info on the player cards) to calculate this.

Feb 07, 2011 08:52 AM
rating: 1
 
Tony

this was the first thing that I looked for, too!

(after looking to see what I had accidentally filtered when I noticed the WARP gap between Pujols and the next best hitter...)

Feb 07, 2011 15:08 PM
rating: 0
 
BP staff member Colin Wyers
BP staff

Upside requires the percentile forecasts, which typically aren't ready for the initial PECOTA release. As we get closer to the cards (and I'm not saying we'll wait until the cards are released) we'll get that added to the spreadsheet.

Feb 07, 2011 15:13 PM
 
Bob

Thanks, Colin. Any ballpark ETA?

Feb 07, 2011 16:36 PM
rating: 0
 
jlefty

Oh this is torture. The URL for the spreadsheet includes the words "baseball" and "fantasy" which means it's blocked here at the office--criminal, I know!

Feb 07, 2011 06:38 AM
rating: 2
 
leites

The comps for Carlos Santana are David Wright, Alvin Davis and Carl Yastrzemski (!). Does this mean PECOTA does not think Santana will remain a catcher?

Feb 07, 2011 06:39 AM
rating: 0
 
leites

Also amusing to see that one of the comps for Mitch Moreland is Adrian Gonzalez . . .

Feb 07, 2011 06:42 AM
rating: 1
 
zstine1

and chris iannetta being compared to gary sheffield and brian giles

Feb 07, 2011 06:46 AM
rating: 0
 
leites

And Gordon Beckham's #1 comp is Gary Sheffield. (Why does PECOTA seem to think Sheffield started out as a second baseman? A few years ago he was the top comp for Dustin Pedroia.)

Feb 07, 2011 06:47 AM
rating: 0
 
Benjarvis

Well, Sheff did start out as an infielder; 3B and SS, I believe.

Feb 07, 2011 07:00 AM
rating: 2
 
GoTribe06

He is also on Mike Trout's comp list. Anybody who is anybody has a Gary Sheffield comp.

Feb 07, 2011 08:05 AM
rating: 1
 
leites

Hah! Other players this year who have Sheffield as a comp include Bobby Abreu, Kosuke Fukudome, Shin-Soo Choo, Magglio Ordonez (who also has Stan Musial as a comp), Jacob Smolinski (who also has Ron Santo), and Carlos Beltran.

Feb 07, 2011 09:05 AM
rating: 1
 
leites

Although two of Smolinski's three comps are to near-hall-of-famers, he is not listed by KG among the top 20 prospects in the Marlins system . . .

Feb 07, 2011 09:13 AM
rating: 1
 
leites

Jay Austin's comps are Roberto Clemente, Robin Yount and Adrian Beltre . . .

Feb 07, 2011 07:18 AM
rating: 0
 
LynchMob

My favorite is Everett Williams comp'd to Mickey Mantle!

Feb 07, 2011 08:35 AM
rating: 2
 
leites

And Kevin has Austin as a two-star prospect ("An intriguing center fielder, Austin has speed and power potential, but his bat lags behind.")

Feb 07, 2011 09:09 AM
rating: 0
 
zstine1

Also, an AL or NL column, which has been in past spreadsheets, is very useful for AL- or NL-only league fantasy owners.

Feb 07, 2011 06:44 AM
rating: 1
 
Brian DewBerry-Jones
(244)

I'm getting the error "The requested URL /fantasy/files/PECOTA_20110207.xls was not found on this server."

Feb 07, 2011 07:01 AM
rating: 2
 
BP staff member Colin Wyers
BP staff

We're having some server difficulties at the moment, folks - we'll have everything back up and running in a few minutes.

Feb 07, 2011 07:04 AM
 
almartin

I'm also getting the 'requested URL' error. Please advise!

Feb 07, 2011 07:12 AM
rating: -1
 
almartin

I'm a premium subscriber but am getting a 403 Forbidden error. Is this on my end or yours?

Feb 07, 2011 07:23 AM
rating: 1
 
jgergeni

I am having the exact same problem. I'm wondering if it has to do with the use of "fantasy" in the spreadsheet name.

Feb 07, 2011 07:40 AM
rating: 0
 
almartin

It's working now - thanks guys!

Feb 07, 2011 07:46 AM
rating: 0
 
doog7642

No breakout scores above 10% for hitters? Seems odd.

Feb 07, 2011 07:14 AM
rating: 2
 
acmcdowell

Yes, it does seem odd, especially in light of the extremely high breakout values for some pitchers (20+ over 20%, with a mix of injury recoveries, job shifts, and young players).

Feb 07, 2011 10:12 AM
rating: 0
 
tbwhite
(361)

Pitchers are more unpredictable, so it makes sense that in general they would be more likely to break out or crash and burn. However, the lack of any hitter over 10% does feel low. I suppose the logical next question is what exactly does "breakout" mean, and has the definition changed at all from previous years ?

Feb 07, 2011 12:50 PM
rating: 0
 
doog7642

I have no math mind whatsoever, but this seems like a big deal. "Breakout" used to mean (if I'm not mistaken) the percentage chance that the player put up a statline that was 20% (I think) more than the baseline projection. If across the board, no hitter is having that happen more than 10% of the time, what does that mean? Does it mean the comps are consistently conservative? Does it mean that regression to the mean is eliminating potential outlier performance projections? I would like to understand this better.

Feb 07, 2011 13:39 PM
rating: 0
 
John Carter

The link "have fun folks" isn't working on my computer. My server can't find that web page.

Feb 07, 2011 07:21 AM
rating: 1
 
slepican

Anyone having luck accessing in the last hour? I am getting the 403 Forbidden error.

Feb 07, 2011 07:38 AM
rating: 0
 
geefsu

I can't access the page, either. Is there still a server issue or do I need to look into this further?

Feb 07, 2011 07:38 AM
rating: 0
 
BP staff member Colin Wyers
BP staff

Everyone, the server should be fixed now. Sorry for the delay - let us know if you have any more problems.

Also, league and position fields have been added to the spreadsheet, so if you were interested in those, go ahead and download it again.

Feb 07, 2011 07:42 AM
 
doog7642

Am I correctly understanding that once fielding is taken into account, PECOTA thinks Jim Edmonds is an outright improvement over Colby Rasmus in CF for the Cards?

Feb 07, 2011 07:47 AM
rating: 1
 
Cromulent

Thanks, guys. Looking forward to diving in. One quick request: always enjoyed the companion article Nate wrote about what surprised him, what projections looked high or low, etc. I don't know if that's planned, but it would be welcomed, at least by me.

Feb 07, 2011 07:53 AM
rating: 16
 
ATLExile

Not to pick individual projections apart, but Freddie Freeman as -17 FRAA at 1B, and below replacement level? Really? All the scouting I've heard rates his glove as above average, but PECOTA sees him as Adam Dunn.

Feb 07, 2011 07:56 AM
rating: 0
 
Mooser

So 381.1 WARP for hitters and 620 WARP for pitchers. I know that playing time is not adjusted yet, but seems too heavily weighted to pitchers. A leaderboard for WARP would show Pujols at #1, then 14 starting pitchers, and then Ryan Braun as the second best position player.

Feb 07, 2011 08:04 AM
rating: 2
 
doog7642

Perhaps there is a relationship between the low WARPs and the low breakout scores for hitters. Could it be that this PECOTA is particularly gunshy about suggesting significant steps forward for younger players?

Feb 07, 2011 08:09 AM
rating: 1
 
ATLExile

It sees Jason Heyward as taking a significant step backwards. Buster Posey seems to be treading water in rate terms, but taking an obvious step forward in playing time.

Feb 07, 2011 08:17 AM
rating: 0
 
dREaDS Fan

Down in the weeds, but Fred Lewis is on the Reds nowadays, not TOR. (I was curious how bad a LF the Reds will field this year between Lewis & Gomes.)

Feb 07, 2011 08:19 AM
rating: 2
 
LynchMob

Chris Young (the pitcher) now with Mets

Feb 07, 2011 09:12 AM
rating: 0
 
douglasgoodman

Anyplace in particular we should send names that seem like they should be in the spreadsheet but are not? Happened to notice Stolmy Pimentel was not there -- seemed like he was advanced enough to get a card...

Feb 07, 2011 08:33 AM
rating: 0
 
MattBey

So, Victor Marte is Carlos Zambrano's #2 comp? Let's look at the differences between Victor Marte and Carlos Zambrano.

Zambrano made his major league debut at age 20.
Marte made his major league debut at age 28.

Marte has pitched 39.2 career innings in the majors.
Zambrano has had 39.2 career temper tantrums in the majors.

Marte has zero professional starts according to B-R.
Zambrano has over 300 professional starts according to B-R.

Marte is 6'2" and fat
Zambrano is 6'5" and fattish

Somehow I don't see the comparison, I really don't. Nevermind Zambrano's been under 4.00 ERA for a decade now and PECOTA still thinks his true level is above it.

Feb 07, 2011 08:38 AM
rating: 5
 
MattBey

and an aside, Archer has Nick Adenhart as a comparison, does that mean PECOTA thinks Archer has a higher chance of flaming out by age 23? Sad that I even have to ask this question.

Feb 07, 2011 08:39 AM
rating: -19
 
CRP13

Horribly hilarious. I don't know if I should "plus" or "minus".

Feb 07, 2011 09:03 AM
rating: -9
 
LynchMob

I assume the basis for this comp can be explained by this statement from Colin's article last week ...

This year, we're encouraging PECOTA to rely more heavily on minor league comps for minor league players.

Feb 07, 2011 08:43 AM
rating: 0
 
MattBey

There's nothing wrong with the comparison between Archer and Adenhart, but does PECOTA think Adenhart had a massive baseball related injury that caused him to be out of baseball by 23? If so, wouldn't it factor that into Archer's projection, unfairly giving him a higher chance of flaming out?

Feb 07, 2011 08:51 AM
rating: 2
 
BP staff member Colin Wyers
BP staff

Adenhart is a special case as a comp, yeah. We present three comps as a rough guide to what the comps look like, but the actual PECOTA projections use a lot more players than that. So it probably has an impact on his MLB%, but not as much as you would think.

It's rare that a case like this happens, but not unheard of. In addition to Adenhart, you've got Cory Lidle, Roberto Clemente, Darryl Kile... there's 22 of them total, it looks like. (And I may have constructed my query a bit too narrowly, looking only at players who died in the same year as their last appearance.)

Again, I don't think this has a significant impact on anyone's numbers, but I do agree that it doesn't really help PECOTA to include these types of players in anyone's comp lists, for those seasons. I'll write up a fix tonight and have it out for the next round of PECOTAs.

Feb 07, 2011 16:00 PM
 
rscharnell

Has BP totally eliminated VORP for the WARP projections this season or will the VORP still show up on the player cards?

Feb 07, 2011 08:38 AM
rating: 2
 
Chad Supp

PECOTA does not share my affection for DLR.

Feb 07, 2011 08:42 AM
rating: 0
 
dREaDS Fan

de la Rosa? I caught that, too. Made me think twice about keeping him in my deep NL-only league.

Feb 07, 2011 09:31 AM
rating: 1
 
Peter Hood

When I open the spreadsheet, it's empty. No player data at all.

Feb 07, 2011 09:04 AM
rating: 0
 
Peter Hood

OK, fixed that; my Excel was hiding the tabs at the bottom.

Feb 07, 2011 09:07 AM
rating: 0
 
fgreenagel2

Yeah, where's the VORP?

Feb 07, 2011 09:10 AM
rating: 2
 
dconner

I'm a bit annoyed by the lack of VORP myself, as that's my number one metric. I guess I'll just have to start focusing on WARP and trying to make the mental adjustment myself.

Feb 07, 2011 10:36 AM
rating: 1
 
DLegler21

Yes, please add VORP, it's my primary research tool for my league. I hope it's not being discontinued, will cause all sorts of rework in my models.

Feb 07, 2011 10:55 AM
rating: 3
 
BP staff member Steven Goldman
BP staff

We will continue to present VORP here at BP.

Feb 07, 2011 11:47 AM
 
DLegler21

Thanks Steve. Can VORP be added to the file?

Feb 07, 2011 13:18 PM
rating: 2
 
Luke in MN

Defensively, Mauer's projected at -2 at catcher and McCann's a +2 (every catcher seems to be in that narrow range). Mauer's averaged about +5 FRAA a year over his career and McCann about -4. Have you changed the way you calculate this stat? They're both at similar spots in their career, so it seems odd to me.

Also, Justin Morneau as Mauer's #1 comp? They're pretty dissimilar hitters (and fielders for that matter).

Feb 07, 2011 09:15 AM
rating: 2
 
Richie

Any way of letting PECOTA know that Morrow and CJ Wilson are now starters? Among other things, then getting revised ERA and WHIP estimates for them?

Feb 07, 2011 09:47 AM
rating: 1
 
belowm

Also no longer with the Blue Jays: Miguel Olivo (SEA) and Mike Napoli (TEX). Not with the Mariners: Russell Branyan (FA) and Guillermo Quiroz (SD).

Feb 07, 2011 09:52 AM
rating: 0
 
PelotaDiSoldi

Wow, PECOTA does not believe in Delmon Young's power. I think his projections were better 4 years ago.

Not a big fan of Rasmus, either. Pegging him for .100+ less OPS this year than '10.

Other remarkable lines:
- Dan Johnson projected for the same HR total as Longoria and a higher OPS. Wow.
- it appears to be kind to Clay Buchholz, everyone's favorite regression candidate.
- PECOTA loves Haren. He and King Felix are the only two projected for 200+ K.
- Rumors of Beckett and Vazquez' demises appear to have been greatly exaggerated.

Feb 07, 2011 09:58 AM
rating: 0
 
Joe D.

I think Vazquez is precisely the kind of pitcher where you should give PECOTA a lot less say than for other pitchers.

If I recall correctly, PECOTA has no way of knowing that Vaz' velocity dipped significantly last season. The easier park/league will help, but in Rotoland, I am certainly not paying for the 3.79 ERA and ~200 Ks PECOTA is projecting.

Feb 07, 2011 13:44 PM
rating: 1
 
LynchMob

How long before you think PECOTA will incorporate into its algorithm data from hitfx/pitchfx? So that, for example, it will "know" that Vazquez lost some mph last year ...

Feb 07, 2011 14:17 PM
rating: 1
 
BP staff member Mike Fast
BP staff

I'm really looking forward to that day, and being part of making that happen.

The obvious problem is that that data doesn't exist prior to 2007, so the sample size is still very small. Every additional year of data helps us solve that problem.

The other "problem" is figuring out what things to incorporate from PITCHf/x, and how. Doing that with fastball velocity isn't trivial, but it's probably doable. Doing that with other pitch data is more problematic.

If you wanted me to give a ballpark guess, I'd say that I really hope that two years from now, PECOTA includes PITCHf/x information. That's not a guarantee or even a firm estimate or something I've discussed with Colin. It's me putting my finger up into the wind and projecting into the future how well we've done at digesting the PITCHf/x information that we have.

Feb 07, 2011 14:27 PM
 
Luke in MN

Are you not publishing a context-neutral measure for pitchers anymore? You have tAV for batters, but I assume the ERA and FRA numbers for pitchers include adjustments for park and league, etc. For my money getting the context-neutral numbers is a big deal.

Feb 07, 2011 10:02 AM
rating: 1
 
Mikedaddy

So, Stephen Strasburg is going to throw more innings (122.6) than Mat Latos (104), Ian Kennedy (100.6) and Phil Hughes (121)?

Feb 07, 2011 10:07 AM
rating: 2
 
jetheinenkel

Just noticed that too - PECOTA projects Mat Latos to pitch in only 104 IP with 20 GS, but in those 20 games, it projects him to be a top pitcher. Does this mean we should interpret PECOTA as suggesting he has a very high propensity for a season-ending or prolonged injury? All the comparables (Gallardo, Hughes, Elarton) had injuries early in their careers as well directly after a successful early-career season.

In some other articles, there's mention of a new method of analysis regarding a player's likelihood for injury. It would be useful to have a metric here denoting injury propensity so we can view the projections with that in mind.

Feb 07, 2011 10:11 AM
rating: 1
 
geoff

These have not yet been adjusted for playing time. Strasburg's projections therefore are a "what could have been." This was pointed out in a previous article and in the intro to the projections today, iirc.

Feb 07, 2011 10:16 AM
rating: 4
 
smocon

Dratted content filter at work!!!!

Can't wait for my book to arrive next week so that I can finally pin down my prediction for the '11 Brewers.

Feb 07, 2011 10:24 AM
rating: 0
 
barosey

Can someone point me to a description of 'MLB_PCT'?

I thought I understood, then saw Bobby Jenks at 7%, so perhaps not.

Feb 07, 2011 10:31 AM
rating: 2
 
knockoutking

heh glad to see JP Howell is projected to throw 103 games, 121 IP lol

9 k/9, 103 games? sounds like fantasy gold!

Feb 07, 2011 10:40 AM
rating: 1
 
McLovins

The Sort feature that is set up for each column on the Hitters tab is a huge time saver. Please add that for the Pitchers tab as well. Particularly when the other columns start filtering in like Upside, and Vorp etc... Thanks.

Feb 07, 2011 10:48 AM
rating: 0
 
jetheinenkel

If you're working in a recent version of MS Excel, you can set this feature up yourself by selecting the "Sort and Filter" button in the "Home" tab, then selecting "Filter."

Feb 07, 2011 10:58 AM
rating: 0
 
knockoutking

very easy to update/add this to the pitchers side as well (yourself)

Feb 07, 2011 11:01 AM
rating: 0
 
cjrhgarmon

This article is actually kind of depressing. I understand that the PECOTA used for the comparisons is stripped-down, but, if I am reading this correctly, the stripped-down PECOTA is in a dead heat with Marcel with regard to accuracy. My understanding is that Marcel is the most basic projection system out there: basically an age-adjusted three-year average. If you do all of this work and are no more accurate than Marcel, well then what's the point?

In my opinion, the real value of the PECOTA forecasts is not the mean projections. The value of the PECOTA is that it is the only projection system I know of that publishes the forecast distribution (e.g., breakout, attrition, etc.).

Feb 07, 2011 10:50 AM
rating: 8
 
leites

Last year Nate Silver wrote a piece in the NY Times making exactly your point:

"The key difference in Pecota, the forecasting system that I developed eight years ago to predict the performance of baseball players, was not that it did better than its competition, on average (it did in most years, but only by a tiny bit). Rather, it was that it looked at the uncertainty in the forecast as a feature rather than a bug.

For example, it didn’t just tell you how many home runs Derek Jeter would hit on average, but what a best-case scenario looked like and what a worst-case scenario looked like. This not only made the forecasting system more honest, but also provided a lot more information to the reader.

People often forget that there are essentially two parts to any forecast: what we can think of as the mean forecast (“our best guess is that Sarah Palin will win by 7 points”) and the confidence interval (“the margin of error on my guess is plus or minus 9 points”)."

http://fivethirtyeight.blogs.nytimes.com/2010/09/29/the-uncanny-accuracy-of-polling-averages-part-i-why-you-cant-trust-your-gut/?scp=7&sq=PECOTA&st=cse

Feb 07, 2011 11:12 AM
rating: 5
 
evo34

The goal should be to improve accuracy. Period. Any smart user is well aware of variance. So no, the goal of PECOTA should not be to explain what variance is; it should be to improve forecast quality (average error).

Feb 07, 2011 12:22 PM
rating: -2
 
TADontAsk

At some point, there is only so much accuracy you can attain. Should they continually work towards a better system? Absolutely. But unless you find a projection system that is 100% accurate - obviously not possible - then a report of the accompanying variance of said projections is just as important, if not more so, than the point projection itself.

Feb 07, 2011 12:42 PM
rating: 2
 
Tarakas

Actually, knowing the degree of variance is fairly handy.

Feb 07, 2011 15:28 PM
rating: 1
 
BP staff member Steven Goldman
BP staff

Today's release is just the beginning. The percentile breakdowns will be coming with the cards.

Feb 07, 2011 11:49 AM
 
rscharnell

Do you have an estimated date for that? I have a few days until my keepers are due in a league and would love to have as much information as possible before making the decision on my last few players. Thanks and great work.

Feb 07, 2011 15:15 PM
rating: 0
 
evo34

I, too, do not understand how/why Marcel is anywhere in this article. The goal is to be accurate, not just "no worse than the worst projection system out there." If your model is not ready for release yet, then don't release it. People can get/generate Marcel projections easily enough on their own, if they need something for today.

Feb 07, 2011 12:25 PM
rating: -3
 
BP staff member Mike Fast
BP staff

Colin is more of an expert than I am on this, but I can say that Marcel is far from the worst projection system out there. It is consistently among the best. It is one of the simplest, that is true, but simple does not mean bad.

The value of PECOTA over Marcel is presumably in a number of other areas, including but not limited to, forecasts of rookies and minor leaguers, forecasts of fielding, forecasting other categories that Marcel does not, depth charts, PFM. Percentiles, comparable players, breakout/collapse percents, etc., could also be on the list, though with some of those things it needs to be proven that they are accurate. (Colin published some on this in the fall.)

Feb 07, 2011 12:43 PM
 
evo34

Marcel most certainly is not "among the best." This has been proven over and over. CHONE has crushed Marcel over the years, and PECOTA used to until 2009.

My question remains: is this (the spreadsheet in this post) some kind of stripped-down pre-PECOTA algorithm, or is it the final algorithm you plan to use for the pre-season 2011 projections?



Feb 07, 2011 12:52 PM
rating: -2
 
BP staff member Mike Fast
BP staff

I emphatically disagree, unless by "crushed" you mean "performed marginally better". The best projection systems, by which I mean CHONE, ZiPS, PECOTA, have historically been in the same neighborhood as Marcel, and outperformance of Marcel by those systems on rate stats (e.g., OPS, ERA) has been small.

You should also note here that Colin has removed park adjustments from PECOTA and thrown out rookies in order to compare to Marcel. He mentioned that, but it bears repeating.

Feb 07, 2011 13:03 PM
 
evo34

Do you have aggregate data on this? Care to publish a study on it? "Crushed" means statistically significantly different than. Chone, for example, has shown itself to be superior to Marcel.

But according to you, they (the forecasts) are all the same in avg. quality, and there is no hope of outperforming consistently. Contradicting facts aside, even if that was the case, why would you (BP) build/re-vamp a very complex projection system, when you know the work will not result in any consistent outperformance over a simple past-years' stat average system? Just for kicks?

BP really can't have it both ways. The stance needs to be one of the following:

(1) We are committed to producing the most accurate baseball projection system possible. We have worked tirelessly on possible methods of attacking this problem and believe ours is the best. We plan to charge money for these projections (alone or as part of an editorial package), and will in return provide objective evidence of our progress -- both in backtested results and in real-time (end of season) performance measurement, vs. various other methods/systems.

(2) We have looked at the issue of how best to project player stats and have concluded that there is *no* benefit to doing anything more than averaging the players' past stats (Marcel). We believe that Marcel is and will always be about as good as the very best projection system, so we have formally given up. We will no longer be charging for fantasy projections and will refer all of our past customers to a calculator.

Obviously, stance #1 is preferable, but #2 would be fine as well. But trying to have both at once is....a problem.

Feb 07, 2011 13:34 PM
rating: -3
 
BP staff member Mike Fast
BP staff

I mentioned a couple links to studies already, below. What I am saying is nothing new. You don't seem to be completely grasping what I am saying, though. I am not saying that Marcel is identical to PECOTA in every way, nor did Colin in his article.

If all you want is a rate-stat projection for every player who was a regular in the major leagues last year, you will do just fine with Marcel. Don't spend your money to subscribe to BP if all you want is a PECOTA rate-stat projection for that set of established major-league players.

PECOTA offers much more than that, plus hopefully it is more accurate than Marcel even on rate stats for established players. But it's not going to be outlandishly better than Marcel on that. Nate Silver said that, Colin said that before today. I'm not making some horrible reveal of the awful truth that BP has been trying to conceal for years.

What you say on #1 is the case. BP is committed to producing the most accurate baseball projection system possible, and that includes the depth charts, Player Forecast Manager, percentiles, multi-year projections, projections of rookies and minor league players, etc., that go along with that and are not part of Marcel.

Feb 07, 2011 14:02 PM
 
BP staff member Mike Fast
BP staff

A couple of articles on projection system comparisons:
http://www.baseballprospectus.com/article.php?articleid=12102
http://www.baseballprospectus.com/unfiltered/?p=564

Feb 07, 2011 13:06 PM
 
George

Your WARP score is saying that the best player in baseball will be Albert Pujols. The next 13 are pitchers (including injury comebacks Peavy, Santana and Strasburg)? I don't buy it.

Feb 07, 2011 10:51 AM
rating: 1
 
jdtk99

Colin, what's the new replacement level, i.e., how many wins for a team with 0 WARP? Thanks.

Feb 07, 2011 11:33 AM
rating: 2
 
SC

Echoing this request, what is the MLB league batting average, ERA and WHIP assumed to be?

Feb 07, 2011 13:49 PM
rating: 0
 
evo34

I don't get this (below). Are you saying you are currently using some algorithm that is different from the one you will use to make final forecasts? That would make no sense.


"This is the first release of PECOTA, and as such will continue to undergo revisions through the remainder of the offseason. The program we use to generate the PECOTAs is continually evolving, and when we discover new ways to improve the forecasts, we'll make those changes and pass the updated forecasts on to you. We’ll also be updating periodically to keep up with players who switch teams."

Feb 07, 2011 12:19 PM
rating: 0
 
Juris

If past is prologue, don't expect major fundamental improvements in the algorithms between now and the start of the season. But as Colin has already indicated, as the season approaches and lineups are set, the park adjustments are instituted, and anomalies in the data are discovered and corrected, the later PECOTAs, including especially those used in the Depth Charts and the PFM, will reflect the latest information. (That's the way it has been done for years.)

Feb 07, 2011 12:42 PM
rating: 1
 
Brian Oakchunas

I thought Ichiro was meant to have a new and improved forecast for this year. Looks more pessimistic than ever.

Feb 07, 2011 12:48 PM
rating: 2
 
brooklyn55

Small thing, but it helps readability: could you format to drop the leading zeros, i.e., .345 instead of 0.345?
Thanks

Feb 07, 2011 13:04 PM
rating: 0
 
SC

This is something you can control in Microsoft Excel, and something they cannot.

Feb 07, 2011 13:47 PM
rating: 3
 
Tony

* select columns which contain the leading 0
* right click and select Format Cells
* under Category, select Custom
* in the Type box, erase "General" and type ".000" (NO quotes)
* click OK
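
For anyone reading the spreadsheet outside Excel, the same formatting can be sketched in Python. This is only an illustration; the helper name `fmt_rate` is made up here and is not part of anything BP distributes.

```python
def fmt_rate(value: float, places: int = 3) -> str:
    """Format a rate stat without the leading zero, e.g. 0.345 -> '.345'."""
    text = f"{value:.{places}f}"
    # Only strip the zero in front of the decimal point; leave 1.000 alone.
    return text[1:] if text.startswith("0.") else text


print(fmt_rate(0.345))  # .345
print(fmt_rate(1.0))    # 1.000
```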

Feb 07, 2011 15:17 PM
rating: 3
 
brooklyn55

Uh, isn't it an Excel SS we are getting? Thus, easier for all to have it formatted at source ...

Feb 07, 2011 15:21 PM
rating: -6
 
patwood0

I know a lot of studies have been done to evaluate PECOTA and other projection systems when it comes to OPS, but is there any information for the reliability of 5x5 fantasy stat projections? Is a SB projection more likely to be accurate than a batting average projection? How about WHIP vs. RBIs? I guess what I'm really looking for is the average historical delta for each stat category projection among players likely to be drafted in fantasy leagues.

Feb 07, 2011 13:12 PM
rating: 1
 
Brian Oakchunas

Really great point. Batting average gets lost in an OBP and stolen bases aren't even present. I would like to see comparisons in each stat--not just one.

Feb 07, 2011 15:04 PM
rating: -1
 
TheTimmer

I've just bought a Fantasy subscription for the year, but it won't let me see the Pecota data... is it only for Premium subscribers? I thought I had access last year as a fantasy guy???

Feb 07, 2011 13:17 PM
rating: 0
 
TheTimmer

Scratch that, it's working now...

Feb 07, 2011 13:20 PM
rating: 0
 
deacon14

Two questions on weighted mean. One, are these ballpark adjusted? In other words, would Dave Bush look different if he were on Texas instead of Milwaukee as shown? Two, until we have depth charts, is the best way to look at these to assume that the rate stats won't change based on playing time assumptions?

Feb 07, 2011 13:27 PM
rating: 0
 
wyliecoyote
(235)

Including error totals (not just FRAA or whatever) would be immensely helpful for Scoresheet, and related formats.

Feb 07, 2011 13:36 PM
rating: 0
 
WilliamWilde

The comps are one of my favorite parts of this data set. It seems that Frank Robinson is in a ton of comps (Braun, Posada, Hanley, Mike Stanton).
Anybody have an idea why this could be?

Feb 07, 2011 13:46 PM
rating: 1
 
deacon14

Are comps based on similar players at that age? In other words, could Stephen Strasburg comp to Dwight Gooden (hot young stud) but so could a 35-year-old expected to post a 6-5 record with a 4.71 ERA?

Feb 07, 2011 13:53 PM
rating: 1
 
hessshaun

Great question. It would be really cool if it compared to the exact season. Like Jamie Moyer, '89. Not a necessity, but it would be cool to get lost in baseball seasons. I know I would go from comp to comp to comp to comp with no real purpose or care really. Just interesting reading.

Feb 07, 2011 15:31 PM
rating: 0
 
greenr

Yes, similar player at that age.

Feb 08, 2011 05:22 AM
rating: 0
 
deacon14

Thank you.

Feb 08, 2011 08:46 AM
rating: -1
 
WilliamWilde

More Frank Robinson comps:

-Ethier
-Jay Bruce
-Luke Scott.!?
-Carlos Pena

Feb 07, 2011 13:49 PM
rating: 1
 
deacon14

Do pitcher wins just look at the pitcher's recent number of wins, or do they consider that player's stats with an average offense? With his actual offense? In other words, if Greinke puts up the numbers from the last couple of years, will his wins be higher in these projections (support from Braun and company) or use an average of his Royals days?

Feb 07, 2011 13:55 PM
rating: 1
 
CRP13

Trying to post this without sounding critical...because I admire what you are able to accomplish with PECOTA.

But...for the 3rd year running, these projections seem absurdly low and inconsistent. Some of them defy explanation.

Example:
Evan Longoria: 82/25/84 .263/.348/.474 (2011 PECOTA)
Evan Longoria: 88/27/101 .283/.361/.507 (3-Year Average)
Mike Stanton: 76/32/87 .247/.328/.496 (2011 PECOTA)

How can a guy with a 3-year track record of superstar stats be projected that low, while a sophomore with 1/2 of a MLB season and minor league translations be projected so well? I have no argument with Stanton's projection, but Longoria's looks ridiculous in comparison.

A few more that jumped out at me:

85/23/79 for Zimmerman?
72/23/77 for Hamilton?
76/18/62 for K Morales?
62/17/58 for Weeks?

I understand that PECOTA isn't a "prediction" system, it's a "projection" system, based on the probable result of a mathematical simulation.

But this year, it just doesn't pass the smell test at all!

I don't know the math, but is it possible PECOTA is forcing a regression to the mean TOO much?

Feb 07, 2011 14:00 PM
rating: 5
 
Joe D.

Hamilton's projection is in 515 PAs, which partially explains why they look a little low. Also seems pretty fair since he's averaged under 500 PAs the last four seasons, and is heading into his thirties. (Also of note: Hamilton's BABIP of .390(!!) in 2010...)
I'm sure PECOTA's high-end projections for Hamilton are quite good, but the weighted means are gently reminding us that Hamilton has missed nearly 30% of his team's games over the last four seasons.

Kinda similar issue with Weeks. PECOTA is perhaps a bit generous with Weeks, really: If Weeks does indeed notch 17 dingers and 58 RBIs, each would represent the second best total of his entire career, behind only last season. From 2005-2010, Weeks played in 53% of his team's games.

Feb 07, 2011 14:52 PM
rating: 2
 
Joe D.

* Division failure on my part. *

"From 2005-2010, Weeks played in 53% of his team's games."

Wrong. Should read:

"From 2005-2010, Weeks played in 65% of his team's games."

Feb 07, 2011 14:56 PM
rating: 0
 
CRP13

I understand your argument, I just don't agree. His career stats indicate that given approximately 500 AB (not PA), he is a 30 home-run hitter. This also ignores the fact that he was good for 156 games just 2 seasons ago, so the injury angle is overplayed.

If he's averaged 30/HR per 500 AB/PA over his career in the majors, why should his MEAN be 23? Shouldn't his mean be closer to 30, with a 90th percentile that reflects his talent level over 600/700 AB? 40? 45?

Feb 07, 2011 15:30 PM
rating: 0
 
BP staff member Colin Wyers
BP staff

Well, they really don't, actually. Hamilton's career HR/AB prorates out to 26 HR per 500 AB, or 24 prorated out over the 464 at bats listed in the PECOTA spreadsheet.

If Hamilton can stay on the field more than that (and it's not clear that he can), then he's probably going to top 23 HR. But given his age, and a little regression to the mean, I don't see what's surprising about projecting a guy to be just one home run under his career rate.
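
The proration above is simple arithmetic and can be checked with a short sketch. The function name `prorate` is made up for illustration; the inputs (26 HR per 500 AB, 464 projected AB) are the figures quoted in this comment.

```python
def prorate(count: float, base: float, target_base: float) -> float:
    """Scale a counting stat from one playing-time base to another."""
    return count / base * target_base


# Career rate of 26 HR per 500 AB, applied to the 464 AB in the spreadsheet.
hr_over_464_ab = prorate(26, 500, 464)
print(round(hr_over_464_ab, 1))  # 24.1
```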

Feb 07, 2011 15:45 PM
 
CRP13

What I'd really like to see, and have no right to demand, is projections ONLY of major league stats, with projected playing time, etc. I know that's coming on the depth charts. It just looks like if I added all of those counting stats (MLB proj. only), they wouldn't even come close to the totals achieved by MLB overall last season.

You and Joe are focusing on my speculation about Hamilton, but seem to be missing my larger point.

Why do ALL of the numbers look so low across the board?

Again, I really do appreciate the work that goes into this. I'm just trying to understand so that I don't unfairly criticize.

Feb 08, 2011 06:33 AM
rating: 1
 
jberkon

A very interesting article - and one that would be really helpful to address a lot of these comments - would be one dealing with the 10 or so players on which PECOTA and other projection systems (pick one: MARCEL, CAIRO, OLIVER, ZIPS, etc.) disagree. Perhaps the 10 on which PECOTA is more bullish and the 10 on which PECOTA is more bearish. And then try to explain why PECOTA differs. Basically, we want to understand why PECOTA believes that Longoria will experience a reasonably large drop-off at the age of 25 (one that is NOT being projected by the other systems), and why, say, Dan Johnson is expected to have such a good year. Is there some particular stat or physical attribute to which PECOTA attributes more weight than the other systems?

Feb 08, 2011 07:00 AM
rating: 1
 
BP staff member Mike Fast
BP staff

The overall set of projected numbers match up pretty closely with 2010 levels of offense. That is down from previous seasons, of course, so that may be why the numbers for hitters look low across the board.

In my experience, people also tend to perceive regression to the mean, while statistically correct, as making the numbers look too low for everyone.

Feb 08, 2011 07:32 AM
 
tbwhite
(361)

Regression to which mean? The MLB mean or the player's 3 year mean?

Longoria's 3 year mean slash stats are: .283/.361/.521

PECOTA's weighted mean for Longoria is .263/.348/.474



Feb 08, 2011 08:22 AM
rating: 1
 
BP staff member Mike Fast
BP staff

By regression to the mean, I was referring to regression toward the MLB mean. Everyone (or almost) is going to be projected to do worse than their career-best year. The human tendency is to assume that a career-best year defines a talent level, and we want to see PECOTA project them to repeat that. However, that's not the most likely outcome. Most likely, if a player did really well last year, he got a little lucky, and, conversely, if he did really poorly last year, he got a little unlucky.

You can see my specific comments about Longoria a few comments down. (Looks wrong to me, too.)

It's great to have the high level of interest in PECOTA that we do, but one of the bad side effects of 200 comments on the thread is that it's easy for ideas to get lost or crossed in cyberspace here.

Feb 08, 2011 08:30 AM
 
tbwhite
(361)

I understand the concept, or at least I think I do, but it seems to me that PECOTA should be focused on regression to the player's mean not the MLB mean. If a player has an established level of say a .280 TAv and suddenly posts a .310 one season, then I get it that he won't likely post another .310, he'll probably revert back to his previous level of performance. But if a guy cranks out .280 TAv seasons like clockwork, there is no reason to believe he is going to tend to revert to a .250 TAv simply because that is the MLB average. It seems to me that regressing to the MLB mean implies that no player is really better than another, and that good seasons are merely caused by luck. That's a premise which is obviously false.

I apologize if I'm putting words in your mouth, or have twisted things somehow, I'm just trying to get a better handle on how PECOTA works. It would be really helpful if someone could write an article that would describe in a little greater detail how PECOTA works. I know it's a fine line between explaining what's in the black box and giving it away, but I think there is a lot of confusion about just what PECOTA does.

Feb 08, 2011 08:53 AM
rating: 3
 
BP staff member Mike Fast
BP staff

I agree there's a lot of confusion about what PECOTA does. Quite a bit of it has been explained at one time or another in one place or another, some of it online, some of it in print. It might be helpful to see if some of that could be centralized or indexed in one place.

Part of the problem is that Colin's really the only one here who fully knows how PECOTA works. Nate and Clay would obviously be able to contribute on that front, but they're not around as much nowadays. Colin can't work on everything everyone thinks might be wrong with PECOTA or explain everything that everyone wants explained about PECOTA. Some high priority stuff he can and will, but other things will take longer.

So it falls to some other folks, like me, who know less about all the intricate details of PECOTA, to attend to some of the questions.

With that caveat, let me say that regression to the MLB mean does not imply that all players have the same talent level. If that were true, we'd just predict everyone to do league average next year. It's regression to the mean, not collapsing to the mean.

Tango's the king of explaining and pushing the implementation of the regression to the mean concept. The way he likes to explain it is that the longer the playing record we have for a given player, the less regression toward the mean that we need to apply. The way that Tango does regression to the mean for Marcel is to add in a half-season worth of league-average stats into the player's last three years of major-league stats.

As to the specific amounts of regression to the mean that PECOTA uses for its baseline forecast, I don't know. It's probably not terribly different than Marcel in the amount of regression for established major-league players, but for minor-league players, PECOTA has the advantage of using translated minor-league stats.
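
The half-season idea described above can be sketched roughly as follows. This is not Marcel's actual code and not PECOTA's: the 300 AB "half season", the .260 league average, and the unweighted pooling of seasons are all assumptions chosen for illustration, and the weighting of recent seasons is left out entirely.

```python
def regress_to_league(player_hits: float, player_ab: float,
                      league_avg: float, ballast_ab: float = 300) -> float:
    """Blend a player's record with `ballast_ab` at-bats of league-average hitting.

    `ballast_ab` stands in for "half a season" and is an assumed value.
    """
    league_hits = league_avg * ballast_ab
    return (player_hits + league_hits) / (player_ab + ballast_ab)


LEAGUE_AVG = 0.260  # assumed league batting average, for illustration only

# A .300 hitter over 1,500 AB is pulled only a little toward the league.
veteran = regress_to_league(450, 1500, LEAGUE_AVG)
# A .300 hitter over 150 AB is pulled much further.
rookie = regress_to_league(45, 150, LEAGUE_AVG)

print(round(veteran, 3))  # 0.293
print(round(rookie, 3))   # 0.273
```

The same fixed amount of league-average playing time moves the short record far more than the long one, which is the sense in which a longer playing record needs less regression.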

Feb 08, 2011 09:18 AM
 
Juris

Every player is subject to a regression effect, even the best and the worst. Keep in mind that regression toward the mean works in both directions -- "unluckily bad" performance is usually followed by improved performance the following year; "luckily good" performance is usually followed by worse performance the following year.

Take a look at this example from the BP archive: http://www.baseballprospectus.com/article.php?articleid=1897

Feb 08, 2011 09:41 AM
rating: 0
 
tbwhite
(361)

I think that study is flawed. Not all guys who hit between .300 and .310 are the same. Some of them are really .300 hitters who are performing as expected, some are .250 hitters having a "career year". Perhaps a very few, are actually superstars having an "off year". Odds are the first group doesn't decline much the next year, and the last group might actually improve a bit, but the middle group is likely to show a huge drop off as they revert to their normal level of performance.

Of course that middle group is going to dominate when you measure the delta in batting average in year n+1. After all, the 1st group will likely have a small delta, and the last group is probably too small to impact the average too much. So, you end up with a watered down measure of players coming back to Earth after career years. To generalize their decline to all batters in that .300 to .310 group seems like a mistake.

What I would like to see is the same analysis, but instead of grouping players by batting average in year n, group them by the difference of their year n batting average vs their career batting average(obviously limit to players with a meaningful sample size). Then look to see the average difference to career average in year n+1 for those groups. I think you would see that the group with the smallest differential in year n would also tend to have a smaller differential in year n+1 and that the distribution of outperformance vs underperformance would be close to random.
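
A rough sketch of that grouping, for anyone who wants to try it on real data. The function name, the 10-point bucket width, and the three sample rows are all made up for illustration; they are not real player seasons and prove nothing either way.

```python
from collections import defaultdict
from statistics import mean


def gap_persistence(seasons, bucket_points=10):
    """Group player-seasons by how far year n was from the career average.

    `seasons` is an iterable of (career_avg, avg_year_n, avg_year_n_plus_1).
    Returns {bucket: mean year n+1 gap vs. career}, where each bucket is the
    year n gap in points of batting average, rounded to `bucket_points`.
    """
    buckets = defaultdict(list)
    for career, year_n, year_next in seasons:
        gap_points = (year_n - career) * 1000
        key = round(gap_points / bucket_points) * bucket_points
        buckets[key].append(year_next - career)
    return {key: mean(gaps) for key, gaps in sorted(buckets.items())}


# Hypothetical rows only; a real study would also filter on sample size.
toy = [
    (0.300, 0.340, 0.310),
    (0.280, 0.320, 0.282),
    (0.290, 0.291, 0.288),
]
for bucket, next_gap in gap_persistence(toy).items():
    print(bucket, round(next_gap, 3))
```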

Feb 08, 2011 14:15 PM
rating: 0
 
tbwhite
(361)

I understand that it is regression to the mean, and not instantaneous reversion to the mean, but I do believe the implication is that all MLB players have the same talent level.

If Votto has a 3 year mean TAv of .325, but posted a .350 TAv in 2010, why should he revert towards the MLB mean TAv for 1B of .280 (I'm just making up a number) in 2011 instead of his own mean TAv of .325? Regressing back towards the MLB mean implies that even his own .325 number over the past 3 seasons doesn't reflect his true level of ability, that the .325 is nothing more than an outlier and isn't repeatable. That is nonsense.

Sometimes an absurd example helps illustrate a point. If someone stuck me at SS for the next 3 years for the Astros I might put up a line like .083/.100/.110. If someone wanted to project what my 4th season would look like they would be a damn fool if they projected me to improve just because I should regress to the ML mean and because average ML SS's hit .240/.290/.350 .

Theoretically, I just don't see the case for regression to the MLB mean (especially for established MLB players). Tango may use it, but it sounds more like a short-cut to a result than something based on a strong theoretical foundation. I base that comment on other comments that Marcel is a very simple system. I'm not familiar with it myself, but if it is in fact a simple projection method (and I don't mean that pejoratively), then regressing to the MLB mean seems like a simple means (no pun intended) to an end.

Feb 08, 2011 10:00 AM
rating: 0
 
CRP13

Side note: If you could post those numbers as an Astros SS for the next 3 years, you're already an improvement on whatever they've had.

Feb 08, 2011 10:06 AM
rating: 1
 
BP staff member Mike Fast
BP staff

The link that Juris posted above your comment is an excellent one to begin learning about what regression to the mean actually is.

Feb 08, 2011 10:19 AM
 
tbwhite
(361)

I think my comment still stands. The .350 hitters are by and large hall of famers or near hall of famers. You would get a much better prediction of their batting average in year n+1 by using their career batting averages(most of which are probably around .300) than a league average batting average of .275. If in making forecasts you assume regression to the league mean rather than the player mean, you will overstate the likely decline.

Feb 08, 2011 10:54 AM
rating: -1
 
TangoTiger

My response is on my blog.

Feb 08, 2011 11:49 AM
rating: 2
 
CRP13

This is along the lines I was thinking too. My other general questions include:
--Does PECOTA take into account why a player missed time? If a guy misses time for a 50-game PED suspension or a freak home-run celebration leg fracture, that's not a "time missed" that will affect his projected production in real life, yet I suspect PECOTA looks at it as if all missed time were created equal. Thus, Morales' ridiculously low projection. Would that there were an injury database to quantify injury types and cross-reference them to PECOTA algorithms.
--Is PECOTA giving TOO much weight to missed time? "Health is a skill" is a cliché, but injuries aren't predictable. Saying, "Hamilton/Weeks/Kinsler missed time for various (different) injuries over the past 3 seasons means the most likely outcome is that he will miss time for another injury in 2011" just doesn't make logical sense. If they had a weak joint and repeatedly suffer the same injury over and over, then fine. But that isn't too common a situation overall, and I don't think it's a very strong thing to base a projection on.
--Is PECOTA regressing TOO far to the mean? Especially applies if you are using an MLB mean as tbwhite suggests, but also true if using a player mean. Longoria (example) historically is far better than his projection. Longoria is also entering his 'statistical prime' and has no history of significant injury. Nobody expects him to post career highs every year, but something is fishy about a projection that sees him only as an above-average 3B.
--What's the deal with airline food and goofy WARP scores this year? I imagine this will be answered, so I'll leave it alone.

Feb 08, 2011 09:24 AM
rating: 0
 
Brian Cartwright

You are confused about what regression is. Including more than one season, with the most recent ones weighted more, is the method of looking at the relevant portions of a player's career. Regression is when we lack information about a player (even when he has played a lot, Colin has done articles on other sites about this). For example, if you have a 20 year old shortstop in AA, you might regress his stats to what all other 20 year old shortstops in AA have done. Or if he's a fat first baseman who hits a lot of homers. How does that group age?

Feb 08, 2011 15:29 PM
rating: 0
 
CRP13

I understand that part, but if players with extensive stats aren't being regressed somehow, and it's only the ones with little information to go on, how does one explain odd projections like Longoria, Zimmerman, Morales, etc?

(I can't buy the fact that Morales' projection is due to his broken leg, because it doesn't make sense that that injury will impact future performance so much)

Somehow, elite players are having their stats grabbed by the neck, throttled, then dragged back to the pack. Similarly, some crummy players are being given the ol' leg up to reach higher heights than they could with no assistance.

I have no idea WHY this is happening...I'll leave that to smart guys like you and Colin. But my engineering brain would like it explained or corrected for the sake of sanity.

Feb 09, 2011 06:27 AM
rating: 0
 
Brian Cartwright

Colin will have to answer any questions about specific players, or how PECOTA works, as it is his (adopted) baby, but for regression in general, you take the player's historical data and add a fixed amount of average performance of the group he's determined to be a member of.

Some players will have a sample of 1500+ weighted PAs. Adding 200 regression PAs means the regression is at most about 12% of the total (200 of 1,700). For another player who had 200 PAs in his only season so far, 200 regression PAs is 50% of his total.
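
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the general "add regression PAs" idea described in this comment, not PECOTA's actual code: the function name, the on-base rates, and the group average are made up for the example, and only the 200 regression PAs and the 1500/200 sample sizes come from the comment. For exactly 1,500 weighted PAs the regression share works out to 200/1,700, a little under 12%.

```python
def regress_rate(observed_rate, weighted_pa, group_rate, regression_pa=200):
    """Blend a player's observed rate with his group's average rate.

    The group average is added as if it were `regression_pa` extra plate
    appearances, so players with small samples get pulled toward it harder.
    """
    total_pa = weighted_pa + regression_pa
    regression_share = regression_pa / total_pa
    regressed = (observed_rate * weighted_pa + group_rate * regression_pa) / total_pa
    return regressed, regression_share


# Hypothetical numbers: both players show a .380 rate, the group averages .330.
veteran_rate, veteran_share = regress_rate(0.380, 1500, 0.330)
rookie_rate, rookie_share = regress_rate(0.380, 200, 0.330)

print(f"veteran: {veteran_rate:.3f} ({veteran_share:.1%} regression)")
print(f"rookie:  {rookie_rate:.3f} ({rookie_share:.1%} regression)")
```

The same observed rate ends up much closer to the group average for the 200-PA player than for the 1,500-PA player, which is the point of the comment.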

Feb 09, 2011 21:10 PM
rating: 0
 
Joe D.

"This also ignores the fact that he was good for 156 games just 2 seasons ago, so the injury angle is overplayed."

I strongly disagree, and think you are proving precisely why PECOTA should hedge on Hamilton.

Hamilton is going into his age-30 season, and you had to stretch back to his age-27 season to find the one time in his four-year career in which he was able to avoid the disabled list and put something close to a full season of work in.

Not putting in full seasons might not be as big a deal if it was simply a matter of minor injuries forcing him out 20-25 games a year. We're instead talking about missing 70+ games in each of two separate seasons. And in one of those (2009), he stunk (.741 OPS).

We should be skeptical of any projection system that *doesn't* ding Hamilton's playing time, and thus his HR/RBI/run totals.

Feb 07, 2011 20:49 PM
rating: 2
 
Brian Oakchunas

The Longoria projection is maybe a little low but not radically different from what he's done. Stanton had a lot more homers than PECOTA is projecting him for and a higher translated batting average so it's not spectacularly optimistic about him either.

Feb 07, 2011 15:19 PM
rating: 0
 
jberkon

The Longoria projection seems very odd. As CRP13 points out, his projection is well off the three-year average (in terms of the rate stats). PECOTA has as his comps Miguel Cabrera, David Wright, and Mark Teixeira, all of whom progressed nicely as young players. And PECOTA gives Longoria a 55% chance of improving this year. So why are the numbers so far off?

Colin, Ken, Mike, etc., any explanation on this one?

Feb 07, 2011 19:37 PM
rating: 3
 
BP staff member Mike Fast
BP staff

For Longoria, specifically, I agree that his numbers look low. His projected batting average seems about 20 points too low to me. "Seems too low to me" is not necessarily the same as "PECOTA made a mistake here". We'll look into it and see if there's anything wrong.

Feb 08, 2011 07:33 AM
 
jberkon

Thanks so much, Mike, for looking into this, to see if there is a larger issue at play. I can only begin to imagine the difficulty of putting together the PECOTAs and very much appreciate your (and everyone else's) responsiveness.

Feb 08, 2011 08:36 AM
rating: 0
 
T. Kiefer

PECOTA knows that in 2011 Evan won't find his cap.

Feb 08, 2011 10:10 AM
rating: 1
 
belowm

More misplaced players, this time on the Nats: Adam Kennedy (SEA), Willie Harris (NYM), Justin Maxwell (NYY). The more I look at this spreadsheet, the more I find. Scott Hairston is with NYM, not SDP, and Jerry Hairston signed with WAS. Jody Gerut is with SEA, not SDP. Pedro Feliz is with KCR, not STL. Felipe Lopez is with TBR, not BOS. Etc., etc.

Feb 07, 2011 14:05 PM
rating: 0
 
wizstan

Andres Torres projection is epic fail...

using anything pre 2009 in projecting Torres is a failure.

Feb 07, 2011 14:07 PM
rating: -12
 
LynchMob

What is it about Torres that makes you think the current algorithm does not apply to him? Is it something you think could/should be incorporated into the algorithm?

Feb 07, 2011 14:15 PM
rating: 1
 
wizstan

I think that a large break in a career should trigger a "reboot" of data. After several years of a different shape of performance you should stop trying to fit two dissimilar career patterns together.

Feb 07, 2011 14:23 PM
rating: -2
 
CRP13

I don't know about a reboot, but I think giving the most recent season/s the most weight makes sense.

Feb 07, 2011 14:32 PM
rating: 0
 
smocon

I wouldn't be shocked at all to see him drop from his 6 WAR number or whatever it was to below 2. That was a HUGE overperformance.

Same goes for Huff, although not to the same extent.

And I am somewhat of a Giants fan.

Feb 07, 2011 14:19 PM
rating: 0
 
wizstan

overperformance relative to what?

Watching him play, his speed is real, his defense is real, and everything he hits is hit hard. From the time he became an everyday starter until his appendix began bothering him, he was a .290/.370/.520 player. And that looked absolutely legit.

Feb 07, 2011 14:30 PM
rating: -1
 
BP staff member Colin Wyers
BP staff

Well, it doesn't seem to actually look that way. If you look at players who radically overperform their career stats, you actually don't gain any forecasting accuracy (in fact the opposite) when you throw out their older numbers. More recent numbers are of course more important in forecasts than older numbers, but not to the extent that you're implying here.

Now, is it possible that he overperforms his forecast? Sure. But history gives us a lot of examples of players who out of nowhere put up a fantastic season and then went on to be something less than fantastic. Of course, there are some that did that and continued being fantastic. But there are fewer of them. Torres could do either, but the odds are greater that he will be something less than his 2010 season alone implies.

Feb 07, 2011 15:02 PM
 
wizstan

Well, if by overperform their career stats you mean overperform what he did in the minors in 2006, I think it is ludicrous to regress to that performance. I think dropping all weight given to minor league stats more than three years old would significantly, and rationally, improve PECOTA. Likewise, dropping all major league numbers when there is a break of more than three years between MLB appearances would remove some dubious regressions.

Sure Torres is likely to regress, but not towards his slap-hitting persona of five years ago.

Try rerunning PECOTA on him but treating him as if his career started in 2007.

Feb 07, 2011 15:36 PM
rating: -2
 
sho044

If the year was 2007 instead of 2011, I think this same comment could have been found on a Gary Matthews Jr. post... that worked out well, didn't it?

Feb 07, 2011 15:46 PM
rating: 5
 
wizstan

Well, except for the fact that Matthews had a long CONTINUOUS history of one level of performance, and then one anomalous season.

Torres has had 4 years of hitting like this in the majors or minors, and at no time in the last 4 years has he NOT hit.

There is pretty much nothing in common here.

Feb 07, 2011 17:10 PM
rating: -2
 
choms57

I have a premium subscription, yet every time I try to download the PECOTA spreadsheet it asks me for a username and password, even though I'm logged in. I put my username and password in and it does not work. Can someone help me??

Feb 07, 2011 14:08 PM
rating: 0
 
choms57

This sucks, I've been a customer for two years and now I can't see the main thing I look forward to!

Feb 07, 2011 14:15 PM
rating: -2
 
BP staff member Ken Funck
BP staff

If you're having trouble accessing the file and your subscription is up-to-date, please send details to Customer Service, and they will work on fixing this for you:

http://www.baseballprospectus.com/contact.php

Feb 07, 2011 14:35 PM
 
BarryR

I know I'm cherry-picking here, but since Jhoulys Chacin is my cherry in a couple of leagues, I just have to pick him.

Chacin 4.93/1.51/ 6.7 K/9 -- really? In approximately the same number of IP, PECOTA has Radhames Liz at 4.78/1.48/7.5
Surely there is a mistake somewhere here. Jhoulys is also worse than Boof Bonser and Philip Humber. To put it kindly, this seems a little off to me.
Now other than Ubaldo at 4.02/1.39, and two relievers (Street and Betancourt), all Rockies pitchers are projected to be some variation on horrible. Is PECOTA predicting a broken humidor?

Feb 07, 2011 14:22 PM
rating: 5
 
Ameer

I noticed the same thing. Weird that it would project Chacin that negatively.

Feb 07, 2011 14:59 PM
rating: 1
 
mibush

Any idea when your "Team Tracker" will be updated with these projections?

Feb 07, 2011 16:14 PM
rating: 0
 
TaylorSanders

Ryan Braun 2nd highest hitter projection.

Feb 07, 2011 16:29 PM
rating: 0
 
matteson72

Here are your top 10 NL pitchers with the highest breakout rates:

1. Luke Gregerson (31%)
2. Johnny Cueto (29%)
3. Jason Marquis (29%)
4. Chad Gaudin (28%)
5. Huston Street (26%)
6. Pedro Feliciano (26%)
7. Kelvim Escobar (26%)
8. Mike Leake (26%)
9. Hong-Chih Kuo (25%)
10. Edinson Volquez (25%)

I'll go curl up in the corner until the next spreadsheet is released.

Feb 07, 2011 16:32 PM
rating: 1
 
Tony

Cinci's rotation looks pretty good for the future, no?

Feb 07, 2011 17:12 PM
rating: -1
 
hessshaun

Depends on whether or not you trust their company above. I would say that the Reds are about the only members who could potentially be on this list if it was accurate.

Feb 07, 2011 18:04 PM
rating: -1
 
doorbot

Ditto! Christmas is ruined!!!

Feb 07, 2011 17:54 PM
rating: 0
 
igjarjuk

"Breakout Rate is the percent chance that a hitter's EqR/27 or a pitcher's EqERA will improve by at least 20% relative to the weighted average of his EqR/27 in his three previous seasons of performance. High breakout rates are indicative of upside risk."

So what's the problem?
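
For readers who prefer code to glossary prose, here is a minimal sketch of how a rate like this could be computed from a set of possible outcomes. It assumes the outcomes are already available as a plain list (for example, what each comparable player did next); how PECOTA actually generates and weights those outcomes is not stated in this thread, and the function name and all numbers are hypothetical.

```python
def breakout_rate(outcomes, baseline, higher_is_better=True, threshold=0.20):
    """Share of outcomes that beat the baseline by at least `threshold`.

    For hitters (EqR/27) higher is better; for pitchers (EqERA) lower is
    better, so a 20% improvement means an EqERA at or below 80% of baseline.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    if higher_is_better:
        target = baseline * (1 + threshold)
        count = sum(1 for value in outcomes if value >= target)
    else:
        target = baseline * (1 - threshold)
        count = sum(1 for value in outcomes if value <= target)
    return count / len(outcomes)


# Hypothetical hitter: baseline of 5.0 EqR/27, four possible outcomes.
print(breakout_rate([4.0, 5.0, 6.1, 6.5], baseline=5.0))

# Hypothetical pitcher: baseline EqERA of 4.00, lower is better.
print(breakout_rate([3.0, 3.5, 4.0, 4.5], baseline=4.0, higher_is_better=False))
```

Note that nothing in this calculation depends on how good the baseline is. A pitcher with a poor three-year baseline has a lower bar to clear by 20%, which is consistent with the names on the list above.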

Feb 07, 2011 18:18 PM
rating: 5
 
greenr

http://www.baseballprospectus.com/glossary/index.php?mode=viewstat&stat=182

Feb 08, 2011 05:44 AM
rating: 0
 
fairacres

Am I missing something or are the individual players' batting averages off a bit? Pujols is forecast at 175 hits in 558 at bats -- .3136 on my calculator. PECOTA has him at "0.312". I spot checked a couple other players and found similar "errors."

Also, maybe instead of "Singles" you could include a "total hits" column?

Thanks -- great job, but I have to agree with other posts ---- overall, it seems as if each year, PECOTA's 50% numbers are a little conservative for hitters. . . taking some other examples, Adrian Gonzalez' line looks pretty similar to his average the past 4-5 years; wouldn't he be expected to do a bit better with 81 games in Fenway v. PETCO, and hitting in a better overall lineup? Same could be said for Adam Dunn moving to the Cell. . . .

Feb 07, 2011 18:11 PM
rating: 1
 
SC

IIRC this is a function of more significant digits deep in PECOTA and rounding. I.e., Pujols is forecast for 174.500 (175) hits in 558.499 (558) AB. The resulting BA is .31244.
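
A quick check of that rounding effect, using the illustrative unrounded values from this comment (they are an example of the mechanism, not Pujols' actual unrounded forecast). The helper name is made up, and `Decimal` is used because Python's built-in `round` rounds halves to the nearest even number.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value, places="1"):
    """Round the way a spreadsheet display would (halves go up)."""
    return Decimal(str(value)).quantize(Decimal(places), rounding=ROUND_HALF_UP)


hits, at_bats = 174.5, 558.499           # unrounded values inside the model
shown_hits = round_half_up(hits)         # what the spreadsheet displays
shown_ab = round_half_up(at_bats)

avg_full = hits / at_bats                            # average from unrounded inputs
avg_shown = float(shown_hits) / float(shown_ab)      # average recomputed by a reader

print(shown_hits, shown_ab)
print(round_half_up(avg_full, "0.001"), round_half_up(avg_shown, "0.001"))
```

The average computed from the unrounded inputs and the one recomputed from the displayed whole numbers differ by about a point and a half of batting average, the same size of gap noticed above.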

Feb 07, 2011 18:20 PM
rating: 2
 
fairacres

I can guarantee you Pujols will NOT have 174.5 hits in 2011 . . . . .

Feb 07, 2011 18:42 PM
rating: 0
 
jrmayne

This is a return to Nate's methodology, which I believe is better than a non-rounding method.

Feb 07, 2011 21:08 PM
rating: 0
 
stlpdx

This feels dangerously similar to the six weeks of 2010 pre-season PECOTA discussions. (Spoiler: it didn't end well).

Feb 07, 2011 19:37 PM
rating: -1
 
stlpdx

2 weeks and many comments later, nothing has changed my mind.

Feb 21, 2011 20:48 PM
rating: 0
 
stlpdx

And yes I'm the guy who posts on his own Facebook status

Feb 21, 2011 20:49 PM
rating: 0
 
Jivas
(649)

A couple of thoughts:

(1) Unless I've screwed up, the average "Improve" number is 23%. I believe this should be 50% by construction.

(2) Based on a super-quick review, a couple of Rockies hitters (Fowler, Iannetta) have what appear to be optimistic projections, and as noted above, the Chacin projection is pessimistic. I recommend double-checking the Coors Field park factor.

(3) I'm hoping future versions of the spreadsheet have the SS/Sim metric. This is/was very helpful for Scoresheet players.

Feb 07, 2011 19:48 PM
rating: 0
 
MattBey

(1) Not really. Even if "improve" meant that they performed better than they did the year before (it doesn't), we wouldn't expect this to be 50 percent by construction. For one reason, look at all the minor leaguers you're considering. A lot of filler players are going to move up a level and they're going to get weeded out. That alone isn't enough to lower it to something less than 25%, but it's a factor, you have to think.

Feb 07, 2011 20:09 PM
rating: 0
 
jrmayne

There are some pretty severe problems with the comps for minor league players.

Kila Ka'aihue: Nick Johnson, Joey Votto, Adrian Gonzalez.

Dan Johnson: Jason Giambi, Nick Johnson, Paul Konerko

Chris Carter: Chris Davis, Joey Votto, Evan Longoria.

(Note that Billy Butler's comps are Conor Jackson, Dan Johnson and Gaby Sanchez.)

So far, a problem. But then:

Rich Poythress: Adrian Gonzalez, Prince Fielder, Kent Hrbek

Seriously. I am not making that up. Next?

Clint Robinson: Adrian Gonzalez, Joey Votto, Ryan Garko.

Wil Myers is an awfully good prospect, but .256/341/410 next year?

Robinson Chirinos (comps Iannetta, Carlos Ruiz, and Todd Helton) projects at 275/360/469.

John Bowker? 268/342/457. Jason Dubois has a nice projection, and I would've guessed he was somewhere selling insurance rather than hitting pretty well in Iowa.

Gerald Sands' comps are Prince Fielder, Willie McCovey and Mark McGwire.

Jaff Decker has Willie Mays in his comps.

I think there's a systemic problem with the minor league comps. Matt Carpenter's top comp is Chipper Jones. Trent Oeltjen's comps are Carlos Beltran, Vada Pinson, and Roberto Clemente. Thomas Field's top comp is Tim Raines.

If you want to tell me these are right, it's going to take a lot of convincing. Jedd Gyorko's top two comps ought not be Eric Chavez and Steve Garvey. There are many more crazy comps out there.

Now, I know there are a lot of comps in the system, but it looks like the system has a severe bias; the system is looking for people who are one hell of a lot better than the player.

--JRM

Feb 07, 2011 21:39 PM
rating: 3
 
yadenr

I noticed as well. One of the first things I do is search for current players with my favorite past players as comps. I was fairly surprised to find that Eric Davis only shows up for Dennis Raben, who also has Willie McCovey(!) and Brandon Wood. Time to get that Raben jersey.

Feb 07, 2011 22:05 PM
rating: 0
 
tbwhite
(361)

This brings up an interesting question, I hope BP can address it, although I understand if they can't because it would reveal too much IP.

Just what do these top 3 comps mean? My understanding is that PECOTA is based off of the last 3 seasons. But when it looks for comps, does it consider age? I assume having a 28yo Mike Schmidt as a comp would be better than having a 38yo Mike Schmidt as a comp. It would be helpful to know just what specific year is being cited as a comp. Also, do the comps always tie back to the age of the player being projected? Do you only look at 25 year olds when trying to find comps for a player who will be 25 in 2011? Just wondering if perhaps Jaff Decker is being compared to a 41yo Willie Mays. It's about the only way that comp would make any sense.

Feb 07, 2011 22:09 PM
rating: 1
 
tbwhite
(361)

Upon further review Decker has a ~.950 OPS thru age 20 in the minors. Mays had a 1.017 OPS in the minors thru age 20. Maybe that's what PECOTA is picking up. It does seem like perhaps batting is over-weighted compared to fielding for minor leaguers. I mean a guy who has to play LF in A ball is very different from the greatest CF of all-time.

Feb 07, 2011 22:17 PM
rating: 0
 
BarryR

Decker spent his entire age 20 season in A ball. Mays spent a month in AAA, hitting .477 (yes, .477), with a 1.323 OPS, turning 20 in May, then spent the rest of his age 20 season in the NL. This is hardly comparable.

Feb 07, 2011 22:46 PM
rating: 1
 
Juris

@tb: If memory serves me correctly, the comps are always based on the player's age-cohort, so, for example, a given player's age 28 season's comps are always other players in THEIR age 28 seasons (not their prior or later careers). My inference about this is based on my understanding of how the projections are made -- by looking at the performance of the matching age-cohort of players in the database of all player-seasons from 1950 onward (adjusted in various ways).

Feb 09, 2011 09:43 AM
rating: 1
 
Juris

I should add that the comps are thus the "most comparable" of the larger subset of players from the same age cohort (in the 1950-2010 player database) who are most closely matched on a set of criteria that includes information not just on performance (stat lines) but also on such characteristics as position, handedness, and physical type (height and weight).

Feb 09, 2011 09:52 AM
rating: 1
 
Juris

So the "match" is determined by age, physical characteristics, position PLUS the (adjusted) "baseline performance" from the immediately previous 3 seasons of the given player and other players in the database. That is my understanding of the core method. How is the proximity between a player and his comparables actually calculated? I don't know the formula but the basic method is one of "nearest neighbor analysis." (See "nearest neighbor search" and "nearest neighbor analysis" in Wikipedia. Nate Silver uses a similar approach in some of his election forecasting, to take advantage of information from "neighboring" (most similar) states to help make election forecasts of particular states.)
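
Since the actual formula isn't public, here is only a generic nearest-neighbor sketch of the idea as I've described it: restrict the pool to the same age, put each trait on a common scale, and rank by distance. The feature names, the equal weighting of traits, the use of plain Euclidean distance, and every player and number below are assumptions made for illustration, not PECOTA's method.

```python
import math
from statistics import mean, pstdev


def nearest_comps(target, pool, features, k=3):
    """Return the k same-age players in `pool` closest to `target`."""
    same_age = [p for p in pool if p["age"] == target["age"]]
    if not same_age:
        return []

    # Standard deviation of each feature, so height (inches) and OPS
    # (fractions) contribute on a comparable scale.
    spread = {}
    for f in features:
        values = [p[f] for p in same_age] + [target[f]]
        spread[f] = pstdev(values) or 1.0   # avoid dividing by zero

    def distance(player):
        return math.sqrt(sum(
            ((player[f] - target[f]) / spread[f]) ** 2 for f in features
        ))

    return sorted(same_age, key=distance)[:k]


# Entirely hypothetical players.
target = {"name": "Target", "age": 20, "height": 70, "weight": 190, "ops": 0.950}
pool = [
    {"name": "A", "age": 20, "height": 70, "weight": 185, "ops": 0.940},
    {"name": "B", "age": 20, "height": 75, "weight": 230, "ops": 0.700},
    {"name": "C", "age": 28, "height": 70, "weight": 190, "ops": 0.950},
    {"name": "D", "age": 20, "height": 71, "weight": 195, "ops": 0.900},
]
comps = nearest_comps(target, pool, ["height", "weight", "ops"], k=2)
print([p["name"] for p in comps])
```

Player C is identical to the target in every trait but is excluded because he is 28, which is the age-cohort restriction in action. A real system would also need to handle level of competition and position, which is where much of the disagreement in this thread lies.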

Feb 09, 2011 10:45 AM
rating: 1
 
BP staff member Mike Fast
BP staff

Yes, you have explained the basic idea well.

Feb 09, 2011 12:47 PM
 
igjarjuk

I'm not convinced by all this name dropping that there are "severe problems" with the comps. I think you should offer up evidence on PECOTA's terms: what information did PECOTA use to determine the comps? For example, with the Decker-Mays comp, I think you're calling foul because of your knowledge of the extraordinary career that Mays had, but I'm pretty sure that's not how PECOTA is designed to work: I don't think it's using Mays' age 22+ seasons as the driver of the comparable match. Instead, Mays' earliest years are used to make the match. In this case Mays' later, very successful years are just a piece of a larger puzzle, one that in this case happens to contribute to a rosier outlook for Decker than if Mays were not a comp. This does not imply that a Mays-like year is an assured, or even likely, outcome. See the Comparable Player and Comparable Year entries in the glossary for more details.

Feb 07, 2011 22:26 PM
rating: 3
 
tbwhite
(361)

Mays' great career is what puts a target on this particular comp, but a LF with an OPS of .950 in the Calif League is not very comparable to a CF with a 1.017 OPS in a month in AAA followed by an .830 OPS in the majors in ~500 PA and oh yeah the RoY award.

My guess would be that the lack of fielding data is hurting this comp. If you look at BaseballReference.com for Mays it just says he played OF his rookie year, no breakdown by where in the OF.

Also, to be fair, the quality of Decker's comps is low (I think it was around 62), but I just find it hard to believe that a defensively challenged 20 yo OF who crushes high A ball doesn't have more closely comparable players than a HoF CF.

One more question: were the comps that are displayed cherry-picked? Because they seem like they are all ML players. I just can't believe that there isn't some obscure minor leaguer we never heard of who flamed out who is a better comp for Everett Williams than Mickey Mantle. And heck, Callison was pretty good too.

Feb 08, 2011 06:38 AM
rating: 2
 
Juris

Good comment. See my further explanation above, which is consistent with your explanation.

Feb 09, 2011 10:55 AM
rating: 0
 
tbwhite
(361)

Joey Votto is an interesting case. I would like someone at BP to please explain the following to me:

Votto has the highest "Improve" rate at 61%. Yet his forecast calls for just 29 HR's.

Digging deeper I see from Votto's player card that in 2010 he had 648 PA's , with 37 HRs, .324/.424/.600 for a .350 TAv.

For 2011, he is projected for 615 PA's, just 33 fewer than 2010. But he loses 8 homers, 26 points of AVG, 36 points of OBP, and 71 points of SLG. His TAv is .317 compared to .350 in 2010, and he has the greatest chance of improvement in 2011 of ALL hitters?

Something doesn't add up.

Feb 07, 2011 21:59 PM
rating: 0
 
jimnabby

A lot of the questions in this thread would be cleared up if everyone just ran over to the glossary for a few minutes.

"Improvement Rate is the percent chance that a hitter's EqR/27 or a pitcher's EqERA will improve *at all* relative the weighted average of his EqR/27 or EqERA in his three previous seasons of performance. A player who is expected to perform just the same as he has in the past will have an Improvement Rating of 50%."

It seems pretty likely that Votto will improve over the average of his last 3 seasons, doesn't it?

(Note that I'm not going to argue that this isn't a pretty non-intuitive way to define "improvement." Nor am I going to apologize for the triple negative in that last sentence.)
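
To make the definition concrete, here is a small sketch of how a forecast can sit below a player's most recent season and still count as "improvement" against a weighted three-year baseline. The season weights (1, 2, 3), the function names, and all the numbers are assumptions for illustration; the glossary quoted above does not give the actual weights, and these are not Votto's figures.

```python
def weighted_baseline(seasons, weights):
    """Weighted average of past seasons, oldest first."""
    return sum(s * w for s, w in zip(seasons, weights)) / sum(weights)


def improvement_rate(outcomes, baseline, higher_is_better=True):
    """Share of possible outcomes that beat the baseline at all."""
    if higher_is_better:
        better = sum(1 for value in outcomes if value > baseline)
    else:
        better = sum(1 for value in outcomes if value < baseline)
    return better / len(outcomes)


# Hypothetical hitter who went 4.0, 5.0, 6.0 in EqR/27 over three seasons.
baseline = weighted_baseline([4.0, 5.0, 6.0], weights=[1, 2, 3])

# Four hypothetical outcomes, every one of them at or below last year's 6.0.
outcomes = [5.0, 5.2, 5.5, 6.0]
print(round(baseline, 3), improvement_rate(outcomes, baseline))
```

In this made-up case, half of the outcomes beat the baseline even though none of them beats the most recent season, which is why a steadily rising player can carry a high rate alongside a forecast that looks like a step back.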

Feb 07, 2011 22:42 PM
rating: 11
 
RedsManRick

Exactly, Votto's Improvement rate is so high because he went from solid to very good to MVP over the last 3 years. So even slipping back to the very good category leaves him higher than his 3 year average.

That said, it certainly is not intuitive.

Feb 08, 2011 08:59 AM
rating: 1
 
tbwhite
(361)

Look at Votto's player card. His average TAv for the past 3 years is .325. His weighted mean forecast for 2011 is .317, and he has the HIGHEST probability of improvement among all batters.

Feb 08, 2011 09:26 AM
rating: 0
 
Richard Bergstrom

Out of curiosity, how much did Matt Wieters' rookie season projection change using this version of PECOTA?

Feb 08, 2011 00:48 AM
rating: 9
 
cmac314

Of the 1012 batters listed, just 29 have an Improve % of 50% or greater.

Something tells me that when all is said and done, 97% of MLB position players will not have declined in 2011.

For comparison's sake, the January 2010 PECOTA listed 483 batters, and 90 of them had an Improve % of 50% or greater.

Also in the Jan 2010 edition, there were 292 batters with a "Breakout" % of greater than 10%. This year there are zero. I don't believe the definition has changed. The 2011 leading Breakout batter candidate: Wilkin Ramirez, with his .289 projected OBP.

Something's screwy here. Where's Nate Silver?

Feb 08, 2011 02:02 AM
rating: 1
 
choms57

Just want to personally thank Colin, Ken, and Rob for emailing me and helping me figure out my PECOTA spreadsheet issue. BP is hands down the best site ever.

Feb 08, 2011 04:27 AM
rating: 5
 
CRP13

Best post in this thread. Very true.

Feb 08, 2011 09:29 AM
rating: 2
 
TADontAsk

I completely understand the age-related comps, and that when a 20-year-old prospect has a comp of Willie Mays, they're saying that he's most comparable to a 20-year-old Mays up to that point of Mays' career, NOT that he's going to have a Mays-type career.

What I DO question, is that there should be hundreds of comp possibilities. Thousands for the younger, minor league players. And with that consideration, it seems like the Mays, and the Kalines and the Younts are showing up at a higher frequency than we'd expect them to. Unless it just seems that way because we pause and pay extra attention when seeing "Willie Mays" on someone's comp list.

Feb 08, 2011 06:53 AM
rating: 1
 
tbwhite
(361)

There are 16 out of 1012 batters with Yount as a comp, 7 with Willie Mays.

Also, I disagree a bit. While the model isn't saying that Decker is going to definitively have a Mays-type career, it does use it as a possible outcome. It does affect the results. That's why I worry about current minor leaguers being compared to old Hall of Famers. The data that exists now for Jaff Decker doesn't exist for Mays when he was 20. We don't know how many games Mays played in CF at age 20. Also, is the height and weight data available season by season? How PECOTA deals or doesn't deal with that type of missing data can have a huge impact on the results. If it isn't penalizing Decker for playing LF in High A ball at age 20, when Mays was playing CF (presumably) in the majors at age 20, then the results aren't going to be worth much.

Feb 08, 2011 07:19 AM
rating: -1
 
BarryR

It doesn't matter what position Mays was playing in 1951. Mays, three months younger than Decker, was playing in the major leagues at age 20 after hitting .477 with a 1.323 OPS in AAA, while Decker was putting up a .950 OPS in the California League. These are not comparable hitters.
It would seem to me that the comparables should start with players who did the same thing at the same level, not with players who were regulars in the major leagues.

Feb 08, 2011 10:44 AM
rating: 1
 
jrmayne

That's not it. I've been looking at these comp lists for years with some care, and I'd bet Toyotas to Tonkas there are more HOF or active tremendous players in the comps. Four Mantle comps:

Everett Williams

Rymer Liriano

Randall Grichuk

Oswaldo Arcia

Mantle's age-20 season: 311/394/530, OPS+ of 161. In the big leagues.

Mickey Mantle should not be one of the top five hundred comps for these guys. Before, if you had Mantle or Kaline or Yount in your comps, you'd have an actual chance - if slight - of being similar to Mantle or Kaline or Yount. Rymer Liriano isn't going to be Mickey Mantle unless he gets bitten by a radioactive spider or something similar.

Dave Winfield is not a good comp for Calvin Anderson. Period. There is a problem, and it's a serious one, and while reboots are hard (especially in this case), someone should have caught this before publication. It took me twenty minutes to find the pattern and twenty more to confirm.

--JRM, welcoming our new Roberto Clemente, currently known as Jay Austin.

Feb 08, 2011 07:48 AM
rating: 4
 
TADontAsk

That's the thing. You'd expect this if it was listing the top 20 comps, or maybe even top 10. The confusion is that these are the top THREE comps.

Feb 08, 2011 08:59 AM
rating: 1
 
DLegler21

While I agree that it is troubling that great players are showing up in the top 3 comps so often, I think you are taking the term comp too literally.

If I understand PECOTA correctly, the comps determine the "shape" of a player's career (rise or fall from current, eventual decline, etc.) more so than the magnitude of the numbers. Comps determine this "shape" of the performance curve, which is then applied to the most recent level of performance (last 3 years) to come up with the projections.

Colin and team, is this a correct (if simplified) explanation?
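
The "shape versus magnitude" idea described above can be sketched as follows. This is a deliberately simplified illustration, not PECOTA's formula: it assumes each comp contributes only the ratio of his next season to his own prior level, that ratios are averaged by a similarity weight, and that the result is multiplied onto the projected player's baseline. The function name and all numbers are hypothetical.

```python
def project_from_comps(baseline, comps):
    """Apply the comps' average relative change to a player's baseline.

    `comps` is a list of (level_before, level_after, similarity_weight).
    Only the ratio after/before is used, so a comp who was far better than
    the projected player changes the shape, not the starting level.
    """
    total_weight = sum(weight for _, _, weight in comps)
    average_change = sum(
        (after / before) * weight for before, after, weight in comps
    ) / total_weight
    return baseline * average_change


# Hypothetical: a .260 TAv baseline, one comp who improved 10%, one who held steady.
comps = [
    (0.300, 0.330, 1.0),
    (0.280, 0.280, 1.0),
]
print(round(project_from_comps(0.260, comps), 3))
```

Under these assumptions a star comp does not drag the projection up to his own level, but he still matters if stars change from year to year differently than ordinary players do, which is the concern raised elsewhere in this thread.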

Feb 08, 2011 10:25 AM
rating: 0
 
BP staff member Mike Fast
BP staff

Yes.

Feb 08, 2011 10:42 AM
 
jrmayne

I understand this. But the comps are supposed to be comparable players under the theory that comparable players will age comparably. Uncomparable players will not age comparably. If you think a guy mashing some in High Desert at the age of 22 is going to follow the same route as a guy mashing in the bigs at age 22, I'd like to see some authority for that.

This is also a massive change to the comp system that built the Pecota brand; the comp system was based on actual comparable players, rather than people mountains better than the player being evaluated.

If the comps are nutty (and indeed they are) for minor leaguers, the shape is based not on actual comps but on a semi-random assortment of players.

Further, if the comps are no good, the percentile rankings lose substantial value.

Further further, it seems clear that these errors have made for bad projections for older minor leaguers - Kila, Dubois, Bowker, and others have wildly optimistic projections. If you think I'm wrong, I'll make a bet (cash to charity of winner's choice) and give you 7-5 on the over on all of the aging minor leaguers.

--JRM

Feb 08, 2011 11:40 AM
rating: 0
 
norraist

I was just wondering why players who are going to miss either full or half seasons (Strasburg and Johan Santana spring to mind) are listed as pitching so many innings?

Feb 08, 2011 08:35 AM
rating: 0
 
BP staff member Mike Fast
BP staff

With the depth charts will come human input about likely playing time. The weighted means spreadsheet is the output that PECOTA gives without knowledge about who has already suffered an injury that will cause them to miss time next year or whose playing time may change because of change in role.

Feb 08, 2011 08:43 AM
 
BP staff member Colin Wyers
BP staff

Because these are the weighted means, not adjusted for playing time. Playing time projections are coming in a week, with the depth charts.

The two separate types of forecasts exist because there are two different ways of using the forecasts. I mean, there are a bunch of guys in that spreadsheet whose forecasted MLB totals for next season I can give you without having to do a lick of math - 0s across the board.

Forecasting playing time for players we know won't get any playing time is useful... well, whether or not it's useful is up to you, so I shouldn't say that. But what it allows you to do is use PECOTA, if you like, to answer questions like, "What would the Nationals rotation look like if Strasburg was able to pitch this season?"

Now, there are many, many use cases of PECOTA that require good MLB playing time estimates. We recognize that. We've announced when we are publishing them.

Feb 08, 2011 08:54 AM
 
norraist

Very fair -- thank you for the quick response!

Feb 08, 2011 11:22 AM
rating: 1
 
Hokieball

So is this the year that there will FINALLY be a uniform player ID that matches everyone between PECOTA, the PFM downloads, and the customizable in-season stat reports? It would be awfully handy :)

Feb 08, 2011 08:50 AM
rating: 0
 
BP staff member Colin Wyers
BP staff

We'll be including IDs in the PFM output, and they will be the same ones we're including with PECOTA right now. I'll look into adding player IDs into the sortables.

Feb 08, 2011 08:55 AM
 
jlebeck66

This may be out of the scope of Colin's power, but player ID's for historical stat reports and for the minor league translations would be spiffy too.

Feb 08, 2011 11:06 AM
rating: 0
 
BP staff member Mike Fast
BP staff

Do not question the Colin's powers, or you risk being ground up and fed into his nutrient bath.

Feb 08, 2011 11:22 AM
 
Richard Bergstrom

If he was so powerful, why would he need a nutrient bath?

Feb 08, 2011 18:46 PM
rating: 2
 
BurrRutledge

Courtesy of wiki: In the original Doom Patrol series, The Brain was regularly portrayed as a disembodied brain, bobbing inside a sealed dome filled with a nutrient bath, hooked up with numerous machines, including a loudspeaker to convey his voice.

Feb 08, 2011 21:14 PM
rating: 1
 
Richard Bergstrom

Shows how long it's been since I read a comic book. The last time I did, it was Optimus Prime's disembodied head hooked up to numerous machines.

Feb 08, 2011 21:57 PM
rating: 0
 
Richie

Nishioka is missing altogether, I believe.

Feb 08, 2011 11:48 AM
rating: 0
 
worldtour

Can someone tell me why PECOTA loves Winston Abreu so much? Am I missing some Tommy John medical news, or am I reading the spreadsheet wrong??

Feb 09, 2011 06:47 AM
rating: 0
 
BindleStiff

I'm not sure if this has been noted already but for those who have issues accessing the spreadsheet using their username/password, please note that the username/password are case-sensitive when accessing the PECOTA spreadsheet.

Or at least they are for me (and have been every year). This is different than the main page login, where the username/password are not case-sensitive, so it can be a bit confusing if your username or password contains uppercase letters.

Feb 09, 2011 07:51 AM
rating: 2
 
Brian DewBerry-Jones
(244)

A problem that I see with the comps is that every one of the comps is a major league player. In the past, if you looked at the comps for a scrub in A ball, it was mainly other scrubs in A ball. What changed?

Feb 09, 2011 16:49 PM
rating: 2
 
jberkon

Agreed - this does seem like a difference. Would love to know why and whether, as jrmayne suggests, this is inflating certain players' projections

Feb 09, 2011 19:27 PM
rating: 2
 
Drew Miller

Is the Adrian Gonzalez projection adjusted for Petco and not Fenway? .880 OPS seems awfully low for Boston.

Feb 10, 2011 14:43 PM
rating: 0
 
Drew Miller

Also, there are a LOT of guys who are on the wrong teams. It's as though the offseason of trades never happened.

Feb 10, 2011 22:29 PM
rating: 0
 
sldeck

What about Minnesota's Tsuyoshi Nishioka? He appears to be MIA.

Feb 14, 2011 10:19 AM
rating: 0
 
rscharnell

Any update on when VORP will be added to the spreadsheets?

Feb 18, 2011 08:39 AM
rating: 2
 
jrbdmb

Concur strongly. For fantasy purposes VORP tends to be a much better measure than TAv or WARP, since most fantasy leagues do not take fielding and park adjustment factors into consideration. Please add this back in real soon, thanks.

Mar 08, 2011 12:25 PM
rating: 0
 
jrbdmb

Actually, at this point a better question is when will the next update be released? Waiting for any updates based on early ST, plus of course the return of VORP.

Mar 08, 2011 12:27 PM
rating: 0
 
