Last night's depth chart/PFM/PECOTA update was more than a routine refresh.
Yes, it did cover the changes from the last few days, like Elijah Dukes' release and Armando Galarraga being sent down. More importantly, though, it includes what I believe are the final substantive changes to the PECOTA process for this season. This is the first run of the PECOTA system that includes the full, finished percentile- and ten-year runs for every hitter and pitcher with a record in professional baseball.
It has been a long time coming, and for that I am truly sorry. In hindsight, I should have switched away from my day job at least 60 days earlier than I did, but the task, as laid out on paper, did not seem as daunting as the execution proved to be. I use this data myself (I've had two drafts already and will have three more in the next two weeks), so I do understand the hardship these delays cause. I want to thank Dave Pease for the thankless task he took on, serving as the front man for the process so that I could concentrate on the work itself. Any complaints about PECOTA, however, should be directed at me, not at Dave and not at BP in general.
The process started with a PECOTA version that produced data only for the book: one year, one forecast. Our initial set of modifications expanded those projections from one year to ten, but in a way that produced a wide (probably too broad) divergence of possible outcomes, and the system was very slow to run: it took about a week to generate every card. Toward the end of February we pushed a major upgrade for the hitters. It streamlined this process, removed an unnecessary program (which will make it much easier to move the system from one machine to another), and cut the processing time for all hitters from about four days to 18 hours, while also improving the system's accuracy when run on past years' data.
This update brings the same step to the pitchers. The forecasts for future seasons, which we are scrambling to get into the cards, are based on the ten-year performance of the current set of comparables, rather than on generating a new set of comps for each year. As with the hitters, the unnecessary calls to R have been removed and replaced with inline statistical calculations, and the processing time for all pitchers has dropped from 72 hours to 13. Once again, we see an improvement in tested accuracy compared with previous seasons:
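To make the "same comparables, every future year" idea concrete, here is a minimal sketch, not the actual PECOTA code. It assumes a hypothetical `Comparable` record holding a similarity weight and the comp's ERA in each following season, and computes similarity-weighted percentiles for each future season inline (no R) from that one fixed comp set. The names, the choice of ERA, and the percentile rule are all assumptions for illustration; note that for ERA a low percentile is the good outcome.

```python
from dataclasses import dataclass


@dataclass
class Comparable:
    """One historical comparable pitcher (hypothetical structure for illustration)."""
    similarity: float    # hypothetical similarity weight to the target pitcher
    future: list[float]  # the comp's ERA in each following season; shorter if his career ended


def weighted_percentile(values, weights, p):
    """Smallest value whose cumulative weight reaches fraction p of the total weight."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= p * total:
            return value
    return pairs[-1][0]


def project_future_years(comps, percentiles=(0.1, 0.5, 0.9), years=10):
    """Project each future season from the SAME comparable set, without re-matching comps."""
    projections = []
    for year in range(years):
        values, weights = [], []
        for comp in comps:
            if year < len(comp.future):  # only comps still pitching in that season count
                values.append(comp.future[year])
                weights.append(comp.similarity)
        if not values:
            projections.append(None)  # no comparable pitched that far out
            continue
        projections.append({p: weighted_percentile(values, weights, p) for p in percentiles})
    return projections
```

Because the comp set is matched once and then simply read forward season by season, the cost of a ten-year run grows with the number of seasons rather than with repeated comparable searches, which is one plausible reading of why the new process is so much faster.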
System | Hits | ER | HR | BB | K | Sum |
---|---|---|---|---|---|---|
2009 PECOTA | 15.19 | 12.96 | 4.56 | 10.05 | 15.94 | 58.70 |
BP2010 PECOTA | 14.51 | 12.86 | 4.61 | 9.96 | 16.33 | 58.27 |
February | 14.55 | 12.88 | 4.70 | 9.95 | 16.15 | 58.23 |
Now | 14.53 | 12.88 | 4.72 | 9.91 | 16.16 | 58.20 |
Now (Weighted Mean) | 14.40 | 12.72 | 4.69 | 9.79 | 16.02 | 57.62 |
These are the root mean square errors (RMSE) for a set of 300 pitchers from 2009, with every forecast pro-rated to the pitcher's actual innings total; the Sum column is the total of the five category errors. The last two rows represent the current run: "Now" is the 50th-percentile projection, and "Now (Weighted Mean)" is the weighted mean projection. For pitchers, unlike hitters, using the weighted mean produces a noticeable improvement. That is why the depth charts now use the weighted mean projection for pitchers instead of the 50th-percentile projection, and it is the main reason the numbers have changed as much as they have since the previous run.
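The evaluation behind the table can be sketched roughly as follows. This is an assumption-laden illustration, not BP's test harness: it supposes each record holds a pitcher's projected counts and projected innings alongside his actual counts and innings, scales the forecast by actual/projected innings, and takes the RMSE per category. The record layout and the `prorated_rmse` name are hypothetical.

```python
import math

CATEGORIES = ("H", "ER", "HR", "BB", "K")


def prorated_rmse(records):
    """Per-category RMSE of pitcher forecasts after scaling each forecast to actual innings.

    records: iterable of (projected_counts, projected_ip, actual_counts, actual_ip),
    where the *_counts are dicts keyed by CATEGORIES (hypothetical layout).
    """
    squared_error = {c: 0.0 for c in CATEGORIES}
    n = 0
    for projected, projected_ip, actual, actual_ip in records:
        scale = actual_ip / projected_ip  # pro-rate the forecast to the innings actually pitched
        for c in CATEGORIES:
            squared_error[c] += (projected[c] * scale - actual[c]) ** 2
        n += 1
    rmse = {c: math.sqrt(squared_error[c] / n) for c in CATEGORIES}
    rmse["Sum"] = sum(rmse[c] for c in CATEGORIES)  # mirrors the table's Sum column
    return rmse
```

Pro-rating to actual innings means a forecast is judged on its rates rather than on whether it guessed the pitcher's workload, so an injury-shortened season does not by itself inflate the error.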
The weighted mean projection, for those unfamiliar with the phrase, is an innings-weighted average of the different percentile forecasts for a pitcher's performance. Nate described it well here.
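As a rough illustration of that definition, here is a minimal sketch that averages a rate stat across percentile lines, weighting each line by its projected innings. The percentile lines and their numbers are invented for the example, and it assumes each line carries equal probability (as evenly spaced percentiles would); the real projection may weight things differently.

```python
def innings_weighted_mean(percentile_lines, rate_key):
    """Average a rate stat across percentile forecasts, weighting each line by its projected innings."""
    total_ip = sum(line["IP"] for line in percentile_lines)
    return sum(line[rate_key] * line["IP"] for line in percentile_lines) / total_ip


# Hypothetical percentile lines for one pitcher (numbers invented for illustration).
lines = [
    {"pct": 10, "ERA": 5.40, "IP": 60},
    {"pct": 50, "ERA": 4.20, "IP": 150},
    {"pct": 90, "ERA": 3.00, "IP": 190},
]
print(round(innings_weighted_mean(lines, "ERA"), 2))  # 3.81, versus 4.20 for the 50th percentile
```

For ERA, the innings-weighted average is the same as nine times total projected earned runs divided by total projected innings. In this made-up case the better outcomes come with more innings, so the weighted mean lands below the 50th-percentile ERA.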
Regardless, I can only imagine the gigabytes of data that needed to be wrangled.
This is finally the year Rickie Weeks arrives.
If you don't take Prince, you can get Adrian or Votto in the 2nd or 3rd round, respectively. Other options like Berkman and Pena are available many rounds later should somebody reach for either Adrian or Votto.
If you don't take Utley, you can get Kinsler or Pedroia in the 2nd or 3rd round, respectively. Or Roberts, Weeks, or Uggla many rounds later.
If you don't take Longoria, you can get Wright or Zimmerman in the 2nd or 3rd round, respectively. Young, Beckham, and Chipper are your later-round options here.
So, which of these options is the best for you:
1a) Prince, Kinsler, and Zimmerman
1b) Prince, Wright, and Pedroia
2a) Utley, Adrian, and Zimmerman
2b) Utley, Wright, and Pedroia
3a) Longoria, Adrian, and Pedroia
3b) Longoria, Kinsler, and Votto
or, another alternative:
4) Prince, Adrian and Pedroia/Zimmerman.
Discuss...
If I may offer some advice for next year: since you've gone through the hardship of automating this process, why not run your first set of projections immediately after the 2010 season ends? I'm unsure how park factors affect the future projections, but it seems to me that you've been running them all in a neutral park and league and then translating the results. If so, it makes sense to run one set of projections, lock in the data, and then spend the whole winter simply translating based on transactions. Then, if someone asks, "Why did so-and-so's numbers change?" you can simply say, "Because he got traded to the Mets, so he'll be in an easier league but a harder park," or, "Because he's slated to start the year in AA."
Thanks for all the hard work....