March 20, 2010
3/19 PECOTA Update: Weighted Means In for Pitchers
Yes, it did cover the changes from the last few days, like Elijah Dukes' release and Armando Galarraga being sent down. More importantly, though, it includes what I believe are the final substantive changes to the PECOTA process for this season. This is the first run of the PECOTA system that includes the full, finished percentile- and ten-year runs for every hitter and pitcher with a record in professional baseball.
It has been a long time coming, and for that I am truly sorry. The retrospective solution would have been to switch from the day job at least 60 days earlier than I did, but the task--in layout--did not seem as daunting as the execution proved to be. I use this data myself--I've had two drafts already, and will have three more in the next two weeks--so I do understand the hardship that comes from these delays. I want to thank Dave Pease for the thankless task he took on, playing the front man for the process to allow me to concentrate on the work itself--but any complaints about PECOTA should properly be directed at me, not to Dave, and not to BP in general.
The process started with a PECOTA version that only produced data for the book: one year, one forecast. We had an initial set of modifications that expanded those projections from one year to ten, but in a manner that led to a wide divergence of possible outcomes, probably too broad, and that was very slow to run: it would take about a week to run every card. Towards the end of February we pushed a major upgrade for the hitters. It streamlined this process, removed an unnecessary program (which will make it much easier to transfer the system from one machine to another), and reduced the processing time for all hitters from about four days to 18 hours, while also improving the accuracy of the system when run on past years' data.
This update represents that same step for pitchers. The future casts, which we are scrambling to get into the cards, are based on the 10-year performance of the current cast of comparables, not on generating a new set of comps each year. As with hitters, the unnecessary calls to R have been removed and replaced with inline statistical calculations, and the processing time to cover all pitchers is reduced from 72 hours to 13. Once again, we do see an improvement in the tested accuracy when compared to previous seasons:
These are the root mean square errors for a set of 300 pitchers from 2009, with all forecasts pro-rated to the pitcher's actual innings total. The last two columns represent the current run - "now" is the 50% projection, and now-WM is the weighted means projection. For pitchers, unlike hitters, the use of the weighted mean does result in a noticeable improvement in performance. And that is why the depth charts are now using the weighted mean projection for pitchers instead of the 50% projection, which is the main reason why the numbers have changed from the previous run as much as they have.
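For readers who want to see the mechanics, here is a minimal sketch of that kind of comparison: pro-rate each forecast to the pitcher's actual innings, then take the root mean square error against what actually happened. The function names, the choice of earned runs as the statistic, and the sample numbers are all assumptions made up for illustration; this is not the PECOTA test code.

```python
import math


def prorate(forecast_total, forecast_ip, actual_ip):
    """Scale a forecast counting stat (here, earned runs) to the actual innings."""
    if forecast_ip <= 0:
        raise ValueError("forecast innings must be positive")
    return forecast_total * actual_ip / forecast_ip


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical pitchers: (forecast ER, forecast IP, actual ER, actual IP)
sample = [(80, 180, 35, 90), (30, 60, 28, 70), (95, 200, 100, 210)]

predicted = [prorate(er, ip, act_ip) for er, ip, _, act_ip in sample]
actual = [act_er for _, _, act_er, _ in sample]
print(round(rmse(predicted, actual), 2))
```

Pro-rating first means a forecast is not penalized for missing on playing time, only for missing on the rate of performance.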
The weighted mean projection, for those unfamiliar with the phrase, is a weighted average (by innings) of the different percentile projections for a pitcher's performance. Nate described it well here.
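As a rough illustration of the idea, the sketch below averages a rate stat across percentile lines, weighting each line by the innings projected at that percentile. The function name, the use of ERA, and the three made-up percentile lines are assumptions for the example only; the actual system works from the full set of percentiles.

```python
def weighted_mean(percentile_lines):
    """Innings-weighted average of a rate stat across percentile lines.

    percentile_lines: list of (projected_innings, projected_rate) pairs,
    one pair per percentile forecast.
    """
    total_ip = sum(ip for ip, _ in percentile_lines)
    if total_ip <= 0:
        raise ValueError("total projected innings must be positive")
    return sum(ip * rate for ip, rate in percentile_lines) / total_ip


# Hypothetical pitcher: (innings, ERA) at a good, middle, and poor outcome
lines = [(200, 3.00), (150, 4.00), (50, 6.00)]
print(weighted_mean(lines))  # 3.75
```

In this made-up case the middle line has a 4.00 ERA while the weighted mean comes out at 3.75, because the better outcomes carry more innings and therefore more weight.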