And here we are–the release of the 2011 PECOTAs.
While I have your attention, I’d like to say a few words about the production of the PECOTAs this year. I guess I don’t have to tell you that I’m filling some pretty big shoes here–Nate Silver is probably the most famous sabermetrician not named Bill James, and PECOTA is where Nate made his biggest mark in our community. And so I’m building on his work–and work by people like Clay Davenport and Gary Huckabay, too. I am, as they say, standing on the shoulders of giants. (In big shoes, apparently–this is what I get for mixing my metaphors.) So I owe them, and others I’ve probably neglected to mention, my deepest thanks.
But even with all that, I wouldn’t have gotten this far without a lot of help. I couldn’t have accomplished what I’ve done without the help of everyone here at Baseball Prospectus, who have really given me all the support I could ask for. But I want to thank a few people especially–it’s a fine line to walk, as if I list too few I risk upsetting someone unfairly left off, and if I list too many, you won’t finish reading the list. So with that in mind–very special thanks to Rob McQuown, Mike Fast, Ben Lindbergh, and Steven Goldman. Gentlemen, take a bow, and everyone, please, give them a round of applause.
Now, then, on to business.
This is the first release of PECOTA, and as such will continue to undergo revisions through the remainder of the offseason. The program we use to generate the PECOTAs is continually evolving, and when we discover new ways to improve the forecasts, we'll make those changes and pass the updated forecasts on to you. We’ll also be updating periodically to keep up with players who switch teams.
In addition, we have several PECOTA features yet to roll out–first will be the Depth Charts, which combine the PECOTA forecasts with estimates of a player’s role and playing time. Those should be ready a week from now, and will be available in two forms: the team Depth Chart pages, and the Player Forecast Manager (which is receiving some upgrades as well).
After that, we’ll be publishing the PECOTA cards, featuring perks like the percentiles and the ten-year forecasts. We’ll update you more as we get closer to that point.
We’re also revamping the cards to use the new Wins Above Replacement Player model we’ve developed. PECOTA has already been adapted to use the new WARP, so WARP baselines have shifted a bit from what you’re used to seeing. The biggest change is among relief pitchers, who take a major hit. Please keep this in mind as you review these forecasts. We know that many of you are relying on these forecasts for your fantasy teams, and we thought that it was better to get the forecasts out now rather than wait for when the entire site was ready to transition to new WARP.
Now, some of you may be asking, “How good are the PECOTAs this year?” Of course, we won’t know the answer for another eight months or so. But we can come up with an educated guess, if we make the assumption that there’s nothing special about predicting the 2011 season, and that a system that works over previous seasons will work in succeeding seasons.
In the course of producing the PECOTAs, we generate forecasts for every player who played from 1950 through 2010. These aren’t quite the same as the full PECOTAs–they are park-neutral, rather than being adjusted for the home park a player plays in. They are not age-adjusted. And for the most part, they do not reflect minor-league performance–we have major-league data for all of MLB history, but very little minor-league data. Still, they do represent a substantial portion of the PECOTA process.
It is time-prohibitive for us to generate full age curves for all of these historic forecasts, but we did adapt a simplified set of age adjustments for past purposes. These simplified PECOTA forecasts aren’t as accurate as the full PECOTAs, but they give us a chance to view how well PECOTA fares over a large swath of history.
There’s only one other projection system available and therefore able to be pitted against PECOTA for such a large part of baseball history: the Marcels, originally developed by Tom Tango. (The version we’re using in these tests was published by Jeff Sackmann.)
I “re-baselined” each forecast for each season in the test by subtracting the average forecast and adding in the average performance of the players forecasted (weighted by playing time, in both instances). Then I took a look at two tests–one is root mean square error (RMSE); if forecast errors are roughly normally distributed, about 68% of forecasts fall within that margin of error. The other is simply counting which forecast was closer to a player’s actual performance. Looking at offensive stats first:
P_OBP_RMSE | P_SLG_RMSE | M_OBP_RMSE | M_SLG_RMSE | P_OBP_SR | P_SLG_SR |
0.036 | 0.066 | 0.036 | 0.066 | 56% | 54% |
In terms of RMSE, our simplified PECOTA is in a dead heat with the Marcels. (And again, PECOTA is giving Marcels an edge, as it is forecasting everyone for a neutral park; the Marcels make no park adjustments, but most players do not switch teams or parks between seasons.) In terms of “success rate” (in other words, the percentage of head-to-head projection matchups "won"), PECOTA has a slight edge.
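The two tests described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up, ties are simply skipped, and the RMSE here is unweighted (the article does not say whether playing time was used as a weight in that step). It is not the code that produced the tables.

```python
import math

def rebaseline(forecasts, actuals, weights):
    """Shift forecasts so their weighted mean equals the weighted mean of actuals.

    weights stands in for playing time (an assumption about the exact weighting).
    """
    total = sum(weights)
    mean_forecast = sum(f * w for f, w in zip(forecasts, weights)) / total
    mean_actual = sum(a * w for a, w in zip(actuals, weights)) / total
    return [f - mean_forecast + mean_actual for f in forecasts]

def rmse(forecasts, actuals):
    """Root mean square error, unweighted for simplicity."""
    n = len(forecasts)
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / n)

def success_rate(system_a, system_b, actuals):
    """Share of head-to-head matchups in which system A is closer to the actual value.

    Ties are left out of both numerator and denominator (an assumption).
    """
    wins = losses = 0
    for a, b, actual in zip(system_a, system_b, actuals):
        err_a, err_b = abs(a - actual), abs(b - actual)
        if err_a < err_b:
            wins += 1
        elif err_b < err_a:
            losses += 1
    decided = wins + losses
    return wins / decided if decided else float("nan")

# Hypothetical OBP numbers for three players, purely for illustration.
system_a = rebaseline([0.330, 0.350, 0.310], [0.340, 0.360, 0.300], [600, 550, 400])
system_b = rebaseline([0.335, 0.340, 0.320], [0.340, 0.360, 0.300], [600, 550, 400])
actual = [0.340, 0.360, 0.300]
print(rmse(system_a, actual), rmse(system_b, actual))
print(success_rate(system_a, system_b, actual))
```

Two systems can tie on RMSE while one wins more head-to-head matchups, because RMSE is dominated by the largest misses and the success rate ignores the size of each miss.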
Now, for pitching:
P_ERA_RMSE | M_ERA_RMSE | P_ERA_SR |
1.22 | 1.22 | 48% |
Again, RMSE shows a dead heat. In terms of success rate, the Marcels have a slight edge over the simplified PECOTAs.
Like I said, this sort of testing emphasizes breadth rather than depth–PECOTA has forgone several of its advantages, like park adjustments and minor-league data. And yet it’s still producing accurate forecasts.
Now, just as a reminder–PECOTAs are available only to our subscribers (Premium and Fantasy) who are signed up for a whole year. If you haven’t already subscribed, you can do so here. Doing so doesn’t just get you access to PECOTA, but also to our fantasy tools like the Player Forecast Manager and Team Tracker, as well as exclusive access to some of the best baseball writing out there. If you’re already a subscriber, thank you for your support, and I truly hope that you all enjoy what we do as much as we enjoy doing it. I am continually amazed at how intelligent and knowledgeable our readers are, and I really do think that the people who read BP are among the best baseball fans I’ve ever known.
That’s all I've got–have fun, folks.
- Ryan Howard (41) is the only player hitting more than 40 homers this year.
- Joe Mauer (.317) will lead the league in batting average, with only a small handful of players even breaking .300.
- Juan Pierre will lead the league with 50 SB. Interestingly, Derrick Robinson has the next highest predicted steal total with 47.
- There will be a three-way tie with CC Sabathia, King Felix and Dan Haren for the league lead in wins (15). Those same three will lead the league in strikeouts, with King Felix taking the K crown at 204.
- Joe Nathan is going to be the runaway saves leader (40), with K-Rod behind him at 35.
Great stuff guys! This is one of the two most exciting days of the baseball pre-season, with the other being the release of the newest MLB: The Show video game.
This spreadsheet shows the weighted mean projections for each player - which means that the numbers shown in the spreadsheet take into account ALL possible outcomes for how a player will perform in 2011 and weighs them by how likely they are to happen.
All data is expressed as each player plays in the Major League environment of their currently listed team.
Please feel free to correct any portions of the above which are incorrect...
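For anyone who wants to see the weighted-mean idea as arithmetic, here is a minimal Python sketch. The outcome values and probabilities are invented for illustration and are not taken from the spreadsheet; how PECOTA actually builds its outcome distribution is not described here.

```python
def weighted_mean(outcomes):
    """Average a list of (value, probability) pairs, weighting each value by its probability."""
    total_prob = sum(prob for _, prob in outcomes)
    return sum(value * prob for value, prob in outcomes) / total_prob

# Hypothetical home-run outcomes for one player: (HR total, chance of that outcome).
hr_outcomes = [(15, 0.2), (25, 0.5), (35, 0.3)]
print(round(weighted_mean(hr_outcomes), 1))
```

Dividing by the total probability keeps the result sensible even if the listed chances do not sum to exactly 1.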
...getting a lot of work done, as you can tell.
(In the next iteration, can you add the league to each line? Easier to sort for AL- or NL-onlys!)
Thanks!
If all you need is to know the first position listed for the player in the FRAA column (i.e., the primary position at which PECOTA thinks the player will appear), create a column entitled "POS" at the end of the spreadsheet in Column AQ, and then use this formula in this column for the first player listed:
=LEFT(AF2,2)
then copy it into every player's row.
If, however, you want to see if the player is mentioned as playing that position at all, you'll need to create one column for each of the nine positions, and label them "1B", "2B", etc. Under the column you name "1B", you would enter this formula for the first player listed:
=IF(ISERR(FIND("1B",$AF2,1))=TRUE,"","1B")
The FIND function looks to see if the string "1B" is found somewhere in the FRAA column. The ISERR function looks to see whether there's an error (if it doesn't find string "1B", the result is a #VALUE error -- you could leave those, but it looks kind of ugly). The IF function states what to do if there's an error (i.e., put a blank space in the cell) or if there isn't an error (i.e., put the value "1B" in the cell).
This will put the value "1B" in the column for that player if they're listed at 1B in the FRAA column, and an empty field if not. The "$" before the AF ensures that when you copy this cell, the reference to column AF (i.e., the FRAA column) will stay the same, so you can then copy the formula into your 2B column, edit the formula where it looks up and then stores "1B" to now say "2B", and copy that for all players. Follow the same steps for 2B, 3B, SS, LF, CF, RF and C, and you'll have a column for each position on which you can either sort or filter.
If you want a column that will describe if the player is listed by PECOTA for ANY outfield position, you can use this formula in its own column:
=IF(ISERR(FIND("F",$AF2,1))=TRUE,"","OF")
Hope this helps.
=IF(ISERR(FIND("C",$AF2,1))=TRUE,"",IF(ISERR(FIND("CF",$AF2,1))=TRUE,"C",""))
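If you would rather do this outside the spreadsheet, the same idea can be sketched in Python. Matching whole position codes instead of substrings avoids the "C" versus "CF" collision that the nested formula above works around. The layout of the FRAA text is an assumption here (something like "CF 2, LF -1"), and the function names are made up for illustration.

```python
import re

# Whole-token match: "C" only counts when it is not part of "CF".
_POSITION_TOKEN = re.compile(r"\b(1B|2B|3B|SS|LF|CF|RF|C)\b")

def positions_listed(fraa_text):
    """Return the position codes found in the text, in order, without duplicates."""
    found = []
    for code in _POSITION_TOKEN.findall(fraa_text):
        if code not in found:
            found.append(code)
    return found

def is_outfielder(fraa_text):
    """True if any outfield position appears in the text."""
    return any(code in ("LF", "CF", "RF") for code in positions_listed(fraa_text))

# Hypothetical cell contents; the real column may be formatted differently.
print(positions_listed("CF 2, LF -1"))
print(positions_listed("C 3, 1B 0"))
print(is_outfielder("C 3, 1B 0"))
```

The first code returned plays the same role as the =LEFT(AF2,2) column, and it also works for a one-character code like "C".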
Click the fantasy tab, then there is the fantasy box about halfway down the page on the right side with the spreadsheet link. Definitely a bit buried for some reason.
(after looking to see what I had accidentally filtered when I noticed the WARP gap between Pujols and the next best hitter...)
Also, league and position fields have been added to the spreadsheet, so if you were interested in those, go ahead and download it again.
Zambrano made his major league debut at age 20.
Marte made his major league debut at age 28.
Marte has pitched 39.2 career innings in the majors.
Zambrano has had 39.2 career temper tantrums in the majors.
Marte has zero professional starts according to B-R.
Zambrano has over 300 professional starts according to B-R.
Marte is 6'2" and fat
Zambrano is 6'5" and fattish
Somehow I don't see the comparison, I really don't. Never mind that Zambrano's been under a 4.00 ERA for a decade now and PECOTA still thinks his true level is above it.
This year, we're encouraging PECOTA to rely more heavily on minor league comps for minor league players.
It's rare that a case like this happens, but not unheard of. In addition to Adenhart, you've got Cory Lidle, Roberto Clemente, Darryl Kile... there's 22 of them total, it looks like. (And I may have constructed my query a bit too narrowly, looking only at players who died in the same year as their last appearance.)
Again, I don't think this has a significant impact on anyone's numbers, but I do agree that it doesn't really help PECOTA to include these types of players in anyone's comp lists, for those seasons. I'll write up a fix tonight and have it out for the next round of PECOTAs.
Also, Justin Morneau as Mauer's #1 comp? They're pretty dissimilar hitters (and fielders for that matter).
Not a big fan of Rasmus, either. Pegging him for .100+ less OPS this year than '10.
Other remarkable lines:
- Dan Johnson projected for the same HR total as Longoria and a higher OPS. Wow.
- it appears to be kind to Clay Buchholz, everyone's favorite regression candidate.
- PECOTA loves Haren. He and King Felix are the only two projected for 200+ K.
- Rumors of Beckett and Vazquez' demises appear to have been greatly exaggerated.
If I recall correctly, PECOTA has no way of knowing that Vaz' velocity dipped significantly last season. The easier park/league will help, but in Rotoland, I am certainly not paying for the 3.79 ERA and ~200 Ks PECOTA is projecting.
The obvious problem is that that data doesn't exist prior to 2007, so the sample size is still very small. Every additional year of data helps us solve that problem.
The other "problem" is figuring out what things to incorporate from PITCHf/x, and how. Doing that with fastball velocity isn't trivial, but it's probably doable. Doing that with other pitch data is more problematic.
If you wanted me to give a ballpark guess, I'd say that I really hope that two years from now, PECOTA includes PITCHf/x information. That's not a guarantee or even a firm estimate or something I've discussed with Colin. It's me putting my finger up into the wind and projecting into the future how well we've done at digesting the PITCHf/x information that we have.
In some other articles, there's mention of a new method of analysis regarding a player's likelihood for injury. It would be useful to have a metric here denoting injury propensity so we can view the projections with that in mind.
Can't wait for my book to arrive next week so that I can finally pin down my prediction for the '11 Brewers.
I thought I understood, then saw Bobby Jenks at 7%, so perhaps not.
9 k/9, 103 games? sounds like fantasy gold!
In my opinion, the real value of the PECOTA forecasts is not the mean projections. The value of the PECOTA is that it is the only projection system I know of that publishes the forecast distribution (e.g., breakout, attrition, etc.).
"The key difference in Pecota, the forecasting system that I developed eight years ago to predict the performance of baseball players, was not that it did better than its competition, on average (it did in most years, but only by a tiny bit). Rather, it was that it looked at the uncertainty in the forecast as a feature rather than a bug.
For example, it didn’t just tell you how many home runs Derek Jeter would hit on average, but what a best-case scenario looked like and what a worst-case scenario looked like. This not only made the forecasting system more honest, but also provided a lot more information to the reader.
People often forget that there are essentially two parts to any forecast: what we can think of as the mean forecast (“our best guess is that Sarah Palin will win by 7 points”) and the confidence interval (“the margin of error on my guess is plus or minus 9 points”)."
http://fivethirtyeight.blogs.nytimes.com/2010/09/29/the-uncanny-accuracy-of-polling-averages-part-i-why-you-cant-trust-your-gut/?scp=7&sq=PECOTA&st=cse
The value of PECOTA over Marcel is presumably in a number of other areas, including but not limited to, forecasts of rookies and minor leaguers, forecasts of fielding, forecasting other categories that Marcel does not, depth charts, PFM. Percentiles, comparable players, breakout/collapse percents, etc., could also be on the list, though with some of those things it needs to be proven that they are accurate. (Colin published some on this in the fall.)
My question remains: is this (the spreadsheet in this post) some kind of stripped-down pre-PECOTA algorithm, or is it the final algorithm you plan to use for the pre-season 2011 projections?
You should also note here that Colin has removed park adjustments from PECOTA and thrown out rookies in order to compare to Marcel. He mentioned that, but it bears repeating.
But according to you, they (the forecasts) are all the same in avg. quality, and there is no hope of outperforming consistently. Contradicting facts aside, even if that was the case, why would you (BP) build/re-vamp a very complex projection system, when you know the work will not result in any consistent outperformance over a simple past-years' stat average system? Just for kicks?
BP really can't have it both ways. The stance needs to be one of the following:
(1) We are committed to producing the most accurate baseball projection system possible. We have worked tirelessly on possible methods of attacking this problem and believe ours is the best. We plan to charge money for these projections (alone or as part of an editorial package), and will in return provide objective evidence of our progress -- both in backtested results and in real-time (end of season) performance measurement, vs. various other methods/systems.
(2) We have looked at the issue of how best to project player stats and have concluded that there is *no* benefit to doing anything more than averaging the players' past stats (Marcel). We believe that Marcel is and will always be about as good as the very best projection system, so we have formally given up. We will no longer be charging for fantasy projections and will refer all of past customers to a calculator.
Obviously, stance #1 is preferable, but #2 would be fine as well. But trying to have both at once is....a problem.
If all you want is a rate-stat projection for every player who was a regular in the major leagues last year, you will do just fine with Marcel. Don't spend your money to subscribe to BP if all you want is a PECOTA rate-stat projection for that set of established major-league players.
PECOTA offers much more than that, plus hopefully it is more accurate than Marcel even on rate stats for established players. But it's not going to be outlandishly better than Marcel on that. Nate Silver said that, Colin said that before today. I'm not making some horrible reveal of the awful truth that BP has been trying to conceal for years.
What you say on #1 is the case. BP is committed to producing the most accurate baseball projection system possible, and that includes the depth charts, Player Forecast Manager, percentiles, multi-year projections, projections of rookies and minor league players, etc., that go along with that and are not part of Marcel.
http://www.baseballprospectus.com/article.php?articleid=12102
http://www.baseballprospectus.com/unfiltered/?p=564
"This is the first release of PECOTA, and as such will continue to undergo revisions through the remainder of the offseason. The program we use to generate the PECOTAs is continually evolving, and when we discover new ways to improve the forecasts, we'll make those changes and pass the updated forecasts on to you. We’ll also be updating periodically to keep up with players who switch teams."
Thnx
* right click and select Format Cells
* under Category, select Custom
* in the Type box, erase "General" and type ".000" (NO quotes)
* click OK
Anybody have an idea why this could be
-Ethier
-Jay Bruce
-Luke Scott.!?
-Carlos Pena
But...for the 3rd year running, these projections seem absurdly low and inconsistent. Some of them defy explanation.
Example:
Evan Longoria: 82/25/84 263/348/474 (2011 PECOTA)
Evan Longoria: 88/27/101 283/361/507 (3-Year Average)
Mike Stanton: 76/32/87 0.247/0.328/0.496 (2011 PECOTA)
How can a guy with a 3-year track record of superstar stats be projected that low, while a sophomore with 1/2 of a MLB season and minor league translations be projected so well? I have no argument with Stanton's projection, but Longoria's looks ridiculous in comparison.
A few more that jumped out at me:
85/23/79 for Zimmerman?
72/23/77 for Hamilton?
76/18/62 for K Morales?
62/17/58 for Weeks?
I understand that PECOTA isn't a "prediction" system, it's a "projection" system, based on the probable result of a mathematical simulation.
But this year, it just doesn't pass the smell test at all!
I don't know the math, but is it possible PECOTA is forcing a regression to the mean TOO much?
I'm sure PECOTA's high-end projections for Hamilton are quite good, but the weighted means are gently reminding us that Hamilton has missed nearly 30% of his team's games over the last four seasons.
Kinda similar issue with Weeks. PECOTA is perhaps a bit generous with Weeks, really: If Weeks does indeed notch 17 dingers and 58 RBIs, each would represent the second best total of his entire career, behind only last season. From 2005-2010, Weeks played in 53% of his team's games.
"From 2005-2010, Weeks played in 53% of his team's games."
Wrong. Should read:
"From 2005-2010, Weeks played in 65% of his team's games."
If he's averaged 30 HR per 500 AB/PA over his career in the majors, why should his MEAN be 23? Shouldn't his mean be closer to 30, with a 90th percentile that reflects his talent level over 600/700 AB? 40? 45?
If Hamilton can stay on the field more than that (and it's not clear that he can), then he's probably going to top 23 HR. But given his age, and a little regression to the mean, I don't see what's surprising about projecting a guy to be just one home run under his career rate.
You and Joe are focusing on my speculation about Hamilton, but seem to be missing my larger point.
Why do ALL of the numbers look so low across the board?
Again, I really do appreciate the work that goes into this. I'm just trying to understand so that I don't unfairly criticize.
In my experience, people also tend to perceive regression to the mean, while statistically correct, as making the numbers look too low for everyone.
Longoria's 3 year mean slash stats are: .283/.361/.521
PECOTA's weighted mean for Longoria is .263/.348/.474
You can see my specific comments about Longoria a few comments down. (Looks wrong to me, too.)
It's great to have the high level of interest in PECOTA that we do, but one of the bad side effects of 200 comments on the thread is that it's easy for ideas to get lost or crossed in cyberspace here.
I apologize if I'm putting words in your mouth, or have twisted things somehow, I'm just trying to get a better handle on how PECOTA works. It would be really helpful if someone could write an article that would describe in a little greater detail how PECOTA works. I know it's a fine line between explaining what's in the black box and giving it away, but I think there is a lot of confusion about just what PECOTA does.
Part of the problem is that Colin's really the only one here who fully knows how PECOTA works. Nate and Clay would obviously be able to contribute on that front, but they're not around as much nowadays. Colin can't work on everything everyone thinks might be wrong with PECOTA or explain everything that everyone wants explained about PECOTA. Some high priority stuff he can and will, but other things will take longer.
So it falls to some other folks, like me, who know less about all the intricate details of PECOTA, to attend to some of the questions.
With that caveat, let me say that regression to the MLB mean does not imply that all players have the same talent level. If that were true, we'd just predict everyone to do league average next year. It's regression to the mean, not collapsing to the mean.
Tango's the king of explaining and pushing the implementation of the regression to the mean concept. The way he likes to explain it is that the longer the playing record we have for a given player, the less regression toward the mean that we need to apply. The way that Tango does regression to the mean for Marcel is to add in a half-season worth of league-average stats into the player's last three years of major-league stats.
As to the specific amounts of regression to the mean that PECOTA uses for its baseline forecast, I don't know. It's probably not terribly different than Marcel in the amount of regression for established major-league players, but for minor-league players, PECOTA has the advantage of using translated minor-league stats.
Take a look at this example from the BP archive: http://www.baseballprospectus.com/article.php?articleid=1897
Of course that middle group is going to dominate when you measure the delta in batting average in year n+1. After all, the 1st group will likely have a small delta, and the last group is probably too small to impact the average too much. So, you end up with a watered down measure of players coming back to Earth after career years. To generalize their decline to all batters in that .300 to .310 group seems like a mistake.
What I would like to see is the same analysis, but instead of grouping players by batting average in year n, group them by the difference of their year n batting average vs their career batting average(obviously limit to players with a meaningful sample size). Then look to see the average difference to career average in year n+1 for those groups. I think you would see that the group with the smallest differential in year n would also tend to have a smaller differential in year n+1 and that the distribution of outperformance vs underperformance would be close to random.
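A rough sketch of that proposed grouping, in Python, might look like the following. Everything here is hypothetical: the field names, the 10-point bucket width, and the sample players are invented, and "career average" is taken as a single given number rather than being recomputed through year n.

```python
from collections import defaultdict

def next_year_gap_by_bucket(players, bucket_width=0.010):
    """Group players by (year-n BA minus career BA) and average the year n+1 gap per group.

    Returns {bucket index: mean of (year n+1 BA minus career BA)}, where bucket
    index 3 means roughly 30 points above career average in year n.
    """
    buckets = defaultdict(list)
    for p in players:
        gap_n = p["ba_n"] - p["career_ba"]
        gap_next = p["ba_next"] - p["career_ba"]
        buckets[round(gap_n / bucket_width)].append(gap_next)
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}

# Made-up players, only to show the shape of the input.
sample = [
    {"ba_n": 0.320, "career_ba": 0.290, "ba_next": 0.300},
    {"ba_n": 0.321, "career_ba": 0.291, "ba_next": 0.295},
    {"ba_n": 0.280, "career_ba": 0.290, "ba_next": 0.288},
]
for bucket, gap in next_year_gap_by_bucket(sample).items():
    print(bucket, round(gap, 3))
```

With real data you would also want the minimum sample size mentioned above before a player is allowed into any bucket.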
If Votto has a 3 year mean TAv of .325, but posted a .350 TAv in 2010, why should he revert towards the MLB mean TAv for 1B of .280 (I'm just making up a number) in 2011 instead of his own mean TAv of .325? Regressing back towards the MLB mean implies that even his own .325 number over the past 3 seasons doesn't reflect his true level of ability, that the .325 is nothing more than an outlier and isn't repeatable. That is nonsense.
Sometimes an absurd example helps illustrate a point. If someone stuck me at SS for the next 3 years for the Astros I might put up a line like .083/.100/.110. If someone wanted to project what my 4th season would look like they would be a damn fool if they projected me to improve just because I should regress to the ML mean and because average ML SS's hit .240/.290/.350 .
Theoretically, I just don't see the case for regression to the MLB mean (especially for established MLB players). Tango may use it, but it sounds more like it is a short-cut to a result rather than based on a strong theoretical foundation. I base that comment on other comments that Marcel is a very simple system. I'm not familiar with it myself, but if it is in fact a simple projection method (and I don't mean that pejoratively) then regressing to the MLB mean seems like a simple means (no pun intended) to an end.
--Does PECOTA take into account why a player missed time? If a guy misses time for a 50-game PED suspension or a freak home-run celebration leg fracture, that's not a "time missed" that will affect his projected production in real life, yet I suspect PECOTA looks at it as if all missed time were created equal. Thus, Morales' ridiculously low projection. Would that there was an injury database to quantify injury types and cross-reference them to PECOTA algorithms.
--Is PECOTA giving TOO much weight to missed time? "Health is a skill" is a cliché, but injuries aren't predictable. Saying, "Hamilton/Weeks/Kinsler missed time for various (different) injuries over the past 3 seasons means the most likely outcome is that he will miss time for another injury in 2011" just doesn't make logical sense. If they had a weak joint and repeatedly suffer the same injury over and over, then fine. But that isn't too common a situation overall, and I don't think it's a very strong thing to base a projection on.
--Is PECOTA regressing TOO far to the mean? Especially applies if you are using an MLB mean as tbwhite suggests, but also true if using a player mean. Longoria (example) historically is far better than his projection. Longoria is also entering his 'statistical prime' and has no history of significant injury. Nobody expects him to post career highs every year, but something is fishy about a projection that sees him only as an above-average 3B.
--What's the deal with airline food and goofy WARP scores this year? I imagine this will be answered, so I'll leave it alone.
(I can't buy the fact that Morales' projection is due to his broken leg, because it doesn't make sense that that injury will impact future performance so much)
Somehow, elite players are having their stats grabbed by the neck, throttled, then dragged back to the pack. Similarly, some crummy players are being given the ol' leg up to reach higher heights than they could with no assistance.
I have no idea WHY this is happening...I'll leave that to smart guys like you and Colin. But my engineering brain would like it explained or corrected for the sake of sanity.
Some players will have a sample of 1500+ weighted PAs. Adding 200 regression PAs means the regression is 11% of the total. For another player who had 200 PAs in his only season so far, 200 regression PAs is 50% of his total.
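The arithmetic above can be written out as a short Python sketch. The 200 regression PAs are just the figure used in this comment, not a documented PECOTA or Marcel setting, and the function names and sample rates are made up. At exactly 1,500 PAs the share works out to just under 12%, and it shrinks a little as the sample grows beyond that.

```python
def regression_share(player_pa, regression_pa=200):
    """Fraction of the blended sample that comes from league-average PAs."""
    return regression_pa / (player_pa + regression_pa)

def regress_to_mean(player_rate, player_pa, league_rate, regression_pa=200):
    """Blend a player's rate with the league rate by adding league-average PAs."""
    blended_pa = player_pa + regression_pa
    return (player_rate * player_pa + league_rate * regression_pa) / blended_pa

# Share of the total sample supplied by the regression PAs.
print(round(regression_share(1500), 3))
print(round(regression_share(200), 3))

# Hypothetical .350 OBP hitter in a .330 OBP league, at two sample sizes.
print(round(regress_to_mean(0.350, 1500, 0.330), 4))
print(round(regress_to_mean(0.350, 200, 0.330), 4))
```

The established player keeps most of his edge over the league, while the 200-PA player is pulled halfway back, which is the sense in which this is regression toward the mean and not collapse to it.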
I strongly disagree, and think you are proving precisely why PECOTA should hedge on Hamilton.
Hamilton is going into his age-30 season, and you had to stretch back to his age-27 season to find the one time in his four-year career in which he was able to avoid the disabled list and put something close to a full season of work in.
Not putting in full seasons might not be as big a deal if it was simply a matter of minor injuries forcing him out 20-25 games a year. We're instead talking about missing 70+ games in each of two separate seasons. And in one of those (2009), he stunk (.741 OPS).
We should be skeptical of any projection system that *doesn't* ding Hamilton's playing time, and thus his HR/RBI/run totals.
Colin, Ken, Mike, etc., any explanation on this one?
using anything pre 2009 in projecting Torres is a failure.
Same goes for Huff, although not to the same extent.
And I am somewhat of a Giants fan.
Watching him play, his speed is real, his defense is real, and everything he hits is hit hard. From the time he became an everyday starter until his appendix began bothering him he was a .290/.370/.520 player. And that looked absolutely legit.
Now, is it possible that he overperforms his forecast? Sure. But history gives us a lot of examples of players who out of nowhere put up a fantastic season and then went on to be something less than fantastic. Of course, there are some that did that and continued being fantastic. But there are fewer of them. Torres could do either, but the odds of him being something less than his 2010 season alone implies are greater.
Sure Torres is likely to regress, but not towards his slap-hitting persona of five years ago.
Try rerunning PECOTA on him but treating him as if his career started in 2007.
Torres has had 4 years of hitting like this in the majors or minors, and at no time in the last 4 years has he NOT hit.
There is pretty much nothing in common here.
http://www.baseballprospectus.com/contact.php
Chacin 4.93/1.51/ 6.7 K/9 -- really? In approximately the same number of IP, PECOTA has Radhames Liz at 4.78/1.48/7.5
Surely there is a mistake somewhere here. Jhoulys is also worse than Boof Bonser and Philip Humber. To put it kindly, this seems a little off to me.
Now other than Ubaldo at 4.02/1.39, and two relievers (Street and Betancourt), all Rockies pitchers are projected to be some variation on horrible. Is PECOTA predicting a broken humidor?
1. Luke Gregerson (31%)
2. Johnny Cueto (29%)
3. Jason Marquis (29%)
4. Chad Gaudin (28%)
5. Huston Street (26%)
6. Pedro Feliciano (26%)
7. Kelvim Escobar (26%)
8. Mike Leake (26%)
9. Hong-Chih Kuo (25%)
10. Edinson Volquez (25%)
I'll go curl up in the corner until the next spreadsheet is released.
So what's the problem?
Also, maybe instead of "Singles" you could include a "total hits" column?
Thanks -- great job, but I have to agree with other posts ---- overall, it seems as if each year, PECOTA's 50% numbers are a little conservative for hitters. . . taking some other examples, Adrian Gonzalez' line looks pretty similar to his average the past 4-5 years; wouldn't he be expected to do a bit better with 81 games in Fenway v. PETCO, and hitting in a better overall lineup? Same could be said for Adam Dunn moving to the Cell. . . .
(1) Unless I've screwed up, the average "Improve" number is 23%. I believe this should be 50% by construction.
(2) Based on a super-quick review, a couple Rockies hitters (Fowler, Iannetta) have what appear to be optimistic projections, and as noted above the Chacin projection is pessimistic. I recommend double-checking the Coors Field park factor.
(3) I'm hoping future versions of the spreadsheet have the SS/Sim metric. This is/was very helpful for Scoresheet players.
Kila Ka'aihue: Nick Johnson, Joey Votto, Adrian Gonzalez.
Dan Johnson: Jason Giambi, Nick Johnson, Paul Konerko
Chris Carter: Chris Davis, Joey Votto, Evan Longoria.
(Note that Billy Butler's comps are Conor Jackson, Dan Johnson and Gaby Sanchez.)
So far, a problem. But then:
Rich Poythress: Adrian Gonzalez, Prince Fielder, Kent Hrbek
Seriously. I am not making that up. Next?
Clint Robinson: Adrian Gonzalez, Joey Votto, Ryan Garko.
Wil Myers is an awfully good prospect, but .256/341/410 next year?
Robinson Chirinos (comps Iannetta, Carlos Ruiz, and Todd Helton) projects at 275/360/469.
John Bowker? 268/342/457. Jason Dubois has a nice projection, and I would've guessed he was somewhere selling insurance rather than hitting pretty well in Iowa.
Gerald Sands' comps are Prince Fielder, Willie McCovey and Mark McGwire.
Jaff Decker has Willie Mays in his comps.
I think there's a systemic problem with the minor league comps. Matt Carpenter's top comp is Chipper Jones. Trent Oeltjen's comps are Carlos Beltran, Vada Pinson, and Roberto Clemente. Thomas Field's top comp is Tim Raines.
If you want to tell me these are right, it's going to take a lot of convincing. Jedd Gyorko's top two comps ought not be Eric Chavez and Steve Garvey. There are many more crazy comps out there.
Now, I know there are a lot of comps in the system, but it looks like the system has a severe bias; the system is looking for people who are one hell of a lot better than the player.
--JRM
Just what do these top 3 comps mean? My understanding is that PECOTA is based off of the last 3 seasons. But when it looks for comps, does it consider age? I assume having a 28yo Mike Schmidt as a comp would be better than having a 38yo Mike Schmidt as a comp. It would be helpful to know just what specific year is being cited as a comp. Also, do the comps always tie back to the age of the player being projected? Do you only look at 25-year-olds when trying to find comps for a player who will be 25 in 2011? Just wondering if perhaps Jaff Decker is being compared to a 41yo Willie Mays. It's about the only way that comp would make any sense.
My guess would be that the lack of fielding data is hurting this comp. If you look at BaseballReference.com for Mays it just says he played OF his rookie year, no breakdown by where in the OF.
Also, to be fair, the quality of Decker's comps is low (I think it was around 62), but I just find it hard to believe that a defensively challenged 20yo OF who crushes High-A ball doesn't have more closely comparable players than a HoF CF.
One more question: were the comps that are displayed cherry-picked? Because they seem like they are all ML players. I just can't believe that there isn't some obscure minor leaguer we never heard of who flamed out who is a better comp for Everett Williams than Mickey Mantle. And heck, Callison was pretty good too.
Votto has the highest "Improve" rate at 61%. Yet his forecast calls for just 29 HRs.
Digging deeper, I see from Votto's player card that in 2010 he had 648 PAs, with 37 HRs, .324/.424/.600 for a .350 TAv.
For 2011, he is projected for 615 PAs, just 33 fewer than in 2010. But he loses 8 homers, 26 points of AVG, 36 points of OBP, and 71 points of SLG. His TAv is .317 compared to .350 in 2010, and he has the greatest chance of improvement in 2011 of ALL hitters?
Something doesn't add up.
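For reference, the arithmetic in this comment can be spelled out. A minimal Python sketch, using only the 2010 line and the changes quoted above; rate stats are kept as integer "points" to avoid floating-point noise, and the variable names are purely illustrative:

```python
# Votto's 2010 line as quoted above (rate stats in points, i.e. .324 -> 324).
votto_2010 = {"PA": 648, "HR": 37, "AVG": 324, "OBP": 424, "SLG": 600}

# Changes the comment attributes to the 2011 forecast.
changes = {"PA": -33, "HR": -8, "AVG": -26, "OBP": -36, "SLG": -71}

# Implied 2011 forecast line.
forecast = {stat: votto_2010[stat] + changes[stat] for stat in votto_2010}

# TAv drop in points: .350 in 2010 versus .317 projected.
tav_drop = 350 - 317

print(forecast)
print(tav_drop)
```

That works out to an implied 2011 line of .298/.388/.529 with 29 HR in 615 PA, and a 33-point drop in TAv.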
"Improvement Rate is the percent chance that a hitter's EqR/27 or a pitcher's EqERA will improve *at all* relative the weighted average of his EqR/27 or EqERA in his three previous seasons of performance. A player who is expected to perform just the same as he has in the past will have an Improvement Rating of 50%."
It seems pretty likely that Votto will improve over the average of his last 3 seasons, doesn't it?
(Note that I'm not going to argue that this isn't a pretty non-intuitive way to define "improvement." Nor am I going to apologize for the triple negative in that last sentence.)
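To make the quoted definition concrete, here is a rough sketch of how such a rate could be computed. It assumes a normal forecast distribution and a 3/2/1 weighting of the three prior seasons; neither is confirmed as what PECOTA actually does, and the function names and sample numbers are hypothetical.

```python
import math


def weighted_baseline(seasons, weights=(3, 2, 1)):
    """Weighted average of prior seasons, most recent first.

    The 3/2/1 weights are an assumption for illustration only.
    """
    pairs = list(zip(seasons, weights))
    total_weight = sum(w for _, w in pairs)
    return sum(value * w for value, w in pairs) / total_weight


def improvement_rate(forecast_mean, forecast_sd, baseline, higher_is_better=True):
    """Chance the forecast beats the baseline, assuming a normal distribution.

    Use higher_is_better=False for ERA-like stats where lower is better.
    """
    z = (forecast_mean - baseline) / forecast_sd
    if not higher_is_better:
        z = -z
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical hitter: a career year (8.0 runs per 27 outs) after two 5.0 seasons.
baseline = weighted_baseline([8.0, 5.0, 5.0])

# Forecast equal to the baseline gives exactly 50%.
rate_same = improvement_rate(baseline, 1.0, baseline)

# Forecast of 7.0 is below last season (8.0) but above the baseline,
# so the rate is above 50% even though a decline from last year is expected.
rate_after_career_year = improvement_rate(7.0, 1.0, baseline)

print(baseline, rate_same, rate_after_career_year)
```

Under these assumptions, a hitter coming off a career year can be projected to fall short of last season and still carry an "Improve" rate above 50%, which is the situation described for Votto.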
That said, it certainly is not intuitive.
Something tells me that when all is said and done, 97% of MLB position players will not have declined in 2011.
For comparison's sake, the January 2010 PECOTA listed 483 batters, and 90 of them had an Improve % of 50% or greater.
Also in the Jan 2010 edition, there were 292 batters with a "Breakout" % of greater than 10%. This year there are zero. I don't believe the definition has changed. The 2011 leading Breakout batter candidate: Wilkin Ramirez, with his .289 projected OBP.
Something's screwy here. Where's Nate Silver?
What I DO question, is that there should be hundreds of comp possibilities. Thousands for the younger, minor league players. And with that consideration, it seems like the Mays, and the Kalines and the Younts are showing up at a higher frequency than we'd expect them to. Unless it just seems that way because we pause and pay extra attention when seeing "Willie Mays" on someone's comp list.
Also, I disagree a bit. While the model isn't saying that Decker is definitively going to have a Mays-type career, it does use it as a possible outcome. It does affect the results. That's why I worry about current minor leaguers being compared to old Hall of Famers. The data we currently have for Jaff Decker doesn't exist for Mays when he was 20. We don't know how many games Mays played in CF at age 20. Also, is the height and weight data available season by season? How PECOTA deals or doesn't deal with that type of missing data can have a huge impact on the results. If it isn't penalizing Decker for playing LF in High-A ball at age 20, when Mays was playing CF (presumably) in the majors at age 20, then the results aren't going to be worth much.
It would seem to me that the comparables should start with players who did the same thing at the same level, not with players who were regulars in the major leagues.
Everett Williams
Rymer Liriano
Randall Grichuk
Oswaldo Arcia
Mantle's age-20 season: 311/394/530, OPS+ of 161. In the big leagues.
Mickey Mantle should not be one of the top five hundred comps for these guys. Before, if you had Mantle or Kaline or Yount in your comps, you'd have an actual chance - if slight - of being similar to Mantle or Kaline or Yount. Rymer Liriano isn't going to be Mickey Mantle unless he gets bitten by a radioactive spider or something similar.
Dave Winfield is not a good comp for Calvin Anderson. Period. There is a problem, and it's a serious one, and while reboots are hard (especially in this case), someone should have caught this before publication. It took me twenty minutes to find the pattern and twenty more to confirm.
--JRM, welcoming our new Roberto Clemente, currently known as Jay Austin.
If I understand PECOTA correctly, the comps determine the "shape" of a player's career (rise or fall from current level, eventual decline, etc.) more so than the magnitude of the numbers. The comps determine this "shape" of the performance curve, which is then applied to the player's most recent level of performance (last 3 years) to come up with the projections.
Colin and team, is this a correct (if simplified) explanation?
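If that reading is right, the idea can be sketched as follows. This illustrates the simplified description in this comment, not PECOTA's actual algorithm; the similarity weights, the before/after ratio approach, and all of the numbers are hypothetical.

```python
def project_from_comps(baseline, comps):
    """Apply the comps' average career 'shape' to a player's own baseline.

    comps is a list of (similarity, before, after) tuples, where before and
    after are the comp's performance in consecutive seasons. Only the ratio
    after / before is used, so the comp's absolute level does not matter.
    """
    total_similarity = sum(sim for sim, _, _ in comps)
    shape = sum(sim * (after / before) for sim, before, after in comps) / total_similarity
    return baseline * shape


# Two equally weighted hypothetical comps: one improved 10%, one held steady.
comps = [(1.0, 0.300, 0.330), (1.0, 0.250, 0.250)]
projection = project_from_comps(0.280, comps)  # about 0.294

# Far better comps with the same shape (every number doubled).
star_comps = [(sim, before * 2, after * 2) for sim, before, after in comps]
star_projection = project_from_comps(0.280, star_comps)

print(projection, star_projection)
```

Under this reading, doubling every comp's raw numbers leaves the projection unchanged, because only each comp's before/after ratio feeds the forecast. Whether the comps still describe the right shape is a separate question.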
This is also a massive change to the comp system that built the PECOTA brand; the comp system was based on actual comparable players, rather than people mountains better than the player being evaluated.
If the comps are nutty (and indeed they are) for minor leaguers, the shape is based not on actual comps but on a semi-random assortment of players.
Further, if the comps are no good, the percentile rankings lose substantial value.
Further further, it seems clear that these errors have made for bad projections for older minor leaguers - Kila, Dubois, Bowker, and others have wildly optimistic projections. If you think I'm wrong, I'll make a bet (cash to charity of winner's choice) and give you 7-5 on the over on all of the aging minor leaguers.
--JRM
The two separate types of forecasts exist because there are two different ways of using the forecasts. I mean, there are a bunch of guys in that spreadsheet whose forecasted MLB totals for next season I can give you without having to do a lick of math - 0s across the board.
Forecasting playing time for players we know won't get any playing time is useful... well, whether or not it's useful is up to you, so I shouldn't say that. But what it allows you to do is use PECOTA, if you like, to answer questions like, "What would the Nationals rotation look like if Strasburg was able to pitch this season?"
Now, there are many, many use cases of PECOTA that require good MLB playing time estimates. We recognize that. We've announced when we are publishing them.
Or at least they are for me (and have been every year). This is different than the main page login, where the username/password are not case-sensitive, so it can be a bit confusing if your username or password contains uppercase letters.