
Baseball Prospectus home
  
  


March 20, 2010

BP Unfiltered

3/19 PECOTA Update: Weighted Means In for Pitchers

by Clay Davenport

Last night's depth chart/PFM/PECOTA update was more than just a simple update.

Yes, it did cover the changes from the last few days, like Elijah Dukes' release and Armando Galarraga being sent down. More importantly, though, it includes what I believe are the final substantive changes to the PECOTA process for this season. This is the first run of the PECOTA system that includes the full, finished percentile- and ten-year runs for every hitter and pitcher with a record in professional baseball. 

It has been a long time coming, and for that I am truly sorry. The retrospective solution would have been to switch from the day job at least 60 days earlier than I did, but the task--in layout--did not seem as daunting as the execution proved to be. I use this data myself--I've had two drafts already, and will have three more in the next two weeks--so I do understand the hardship that comes from these delays. I want to thank Dave Pease for the thankless task he took on, playing the front man for the process to allow me to concentrate on the work itself--but any complaints about PECOTA should properly be directed at me, not at Dave, and not at BP in general.

The process started with a PECOTA version that only produced data for the book: one year, one forecast. We had an initial set of modifications that expanded those projections from one year to ten, in a manner that led to a wide divergence of possible outcomes, probably too broad, and that was very slow to run: it would take about a week to run every card. Towards the end of February we pushed a major upgrade for the hitters, which streamlined this process, removed an unnecessary program (which will make it much easier to transfer the system from one machine to another), and reduced the processing time for all hitters from about four days to 18 hours, while also improving the accuracy of the system when run on past years' data.

This update represents that same step for pitchers. The future casts, which we are scrambling to get into the cards, are based on the 10-year performance of the current cast of comparables, not on generating a new set of comps each year. As with hitters, the unnecessary calls to R have been removed and replaced with inline statistical calculations, and the processing time to cover all pitchers is reduced from 72 hours to 13. Once again, we do see an improvement in the tested accuracy when compared to previous seasons:

System                 Hits     ER     HR     BB      K    Sum
2009 PECOTA           15.19  12.96   4.56  10.05  15.94  58.70
BP2010 PECOTA         14.51  12.86   4.61   9.96  16.33  58.27
February              14.55  12.88   4.70   9.95  16.15  58.23
Now                   14.53  12.88   4.72   9.91  16.16  58.20
Now (Weighted Mean)   14.40  12.72   4.69   9.79  16.02  57.62

These are the root mean square errors for a set of 300 pitchers from 2009, with all forecasts pro-rated to each pitcher's actual innings total. The last two rows represent the current run: "Now" is the 50% projection, and "Now (Weighted Mean)" is the weighted mean projection. For pitchers, unlike hitters, using the weighted mean does produce a noticeable improvement in accuracy. That is why the depth charts now use the weighted mean projection for pitchers instead of the 50% projection, and it is the main reason the numbers have changed as much as they have since the previous run.
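For readers who want to see the scoring idea concretely, here is a minimal sketch of a pro-rated root mean square error. It is not the actual PECOTA test code: the function names `prorate` and `rmse` and the three sample pitchers are hypothetical, made up only for illustration. The last lines simply re-add the five category errors from the "Now (Weighted Mean)" row to show how the Sum column is formed.

```python
import math

def prorate(forecast_total, forecast_ip, actual_ip):
    """Scale a forecast counting stat to the innings the pitcher actually threw."""
    return forecast_total * actual_ip / forecast_ip

def rmse(forecasts, actuals):
    """Root mean square error between paired forecast and actual values."""
    squared = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical pitchers (illustration only):
# (forecast strikeouts, forecast IP, actual strikeouts, actual IP)
sample = [
    (180.0, 200.0, 150.0, 180.0),
    (60.0, 70.0, 50.0, 50.0),
    (120.0, 150.0, 140.0, 160.0),
]

# Put every forecast on the pitcher's actual innings before comparing.
prorated_k = [prorate(fk, fip, aip) for fk, fip, _, aip in sample]
actual_k = [ak for _, _, ak, _ in sample]
k_rmse = rmse(prorated_k, actual_k)

# The Sum column is the five category errors added together,
# shown here with the "Now (Weighted Mean)" row from the table.
now_wm_row = [14.40, 12.72, 4.69, 9.79, 16.02]
now_wm_sum = round(sum(now_wm_row), 2)
```

Pro-rating first means a forecast is judged on its rates rather than on how well it guessed playing time.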

The weighted mean projection, for those unfamiliar with the phrase, is a weighted average (by innings) of the different percentile projections for a pitcher's performance. Nate described it well here.
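As a rough illustration of that definition, the sketch below averages a rate stat across percentile lines, weighting each line by its projected innings. This is one reading of the description, not the PECOTA implementation: the `weighted_mean` helper, the choice of ERA, and every number in `percentile_lines` are hypothetical.

```python
def weighted_mean(rates, innings):
    """Average the rates, weighting each one by its projected innings."""
    total_ip = sum(innings)
    return sum(r * ip for r, ip in zip(rates, innings)) / total_ip

# Hypothetical percentile lines for one pitcher (illustration only):
# percentile -> (projected ERA, projected IP)
percentile_lines = {
    10: (5.40, 90.0),
    25: (4.80, 120.0),
    50: (4.20, 150.0),
    75: (3.70, 170.0),
    90: (3.20, 190.0),
}

eras = [era for era, _ in percentile_lines.values()]
ips = [ip for _, ip in percentile_lines.values()]

wm_era = weighted_mean(eras, ips)       # weighted mean projection
median_era = percentile_lines[50][0]    # the 50% projection
```

In this made-up example the better outcomes carry more innings, so the weighted mean lands below the 50% line; whether that holds for a given pitcher depends on how innings vary across his percentiles.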

I will, of course, continue looking for places to improve the program, but I highly doubt that I will be able to both find and implement anything prior to Opening Day. So I'm declaring this version closed, except for any bugs that turn up and require a fix. For now, I'm looking forward to actually using the programs and not just building them...some of which I'll be doing this week.

Related Content:  The Process

13 comments have been left for this article.


dianagram

Thank you Clay for being forthright. It's a monumental task. Maybe BP (in general) botched the timing of the switch-over, but as one who has had to automate many manual processes over the years (my Operations Research background), I know it's not easy. Perhaps it would have been best to run the old and new ways in parallel for a year prior to the switch-over.

Regardless, I can only imagine the gigabytes of data that needed to be wrangled.

Mar 20, 2010 09:16 AM
rating: 6
 
BurrRutledge

Thanks, Clay!

Mar 20, 2010 09:20 AM
rating: 0
 
jberkon

Thanks Clay - very helpful. Does this mean that there will NOT be WMs for hitters in the spreadsheets?

Mar 20, 2010 09:36 AM
rating: 0
 
swarmee

I think based on previous posts, the numbers they're using tested out better on predicting 2009 data (without using any 2009 data) than the weighted means, so from what I recall, they're not going back to using the weighted means.

Mar 20, 2010 14:22 PM
rating: 0
 
Juris

Thanks a lot, Clay. Now you need a massively parallel computer system that will allow you to run your R code in a few minutes rather than hours. We're setting one up that has 24 PC processors in a "cloud computing" network (sorry not available to outsiders). You may be able to get something that doesn't rely on just one box.

Mar 20, 2010 13:58 PM
rating: 2
 
calhounite

Could someone tell me if the 1-digit Peavy value is correct? As an example, he is listed below the likes of M Byrd, outfielder, Chicago, in relative value, and below the likes of a shoulder-damaged Ted Lilly in a direct starting pitcher comparison.


Mar 20, 2010 17:25 PM
rating: 0
 
SnakeDoctor18

Anybody care to weigh in...my friends and I were having a discussion: if you have the 5th pick in a 6x6 rotisserie league with the extra offensive category as OPS, who do you pick, assuming 1-4 are Pujols, Hanley, A-Rod, and Braun? I argued for Longo, others said Prince, but the consensus was Utley, which I think is an overrated pick. Thoughts?

Mar 21, 2010 19:03 PM
rating: -6
 
gluckschmerz

Go with Longo and nab Votto and Weeks later in the draft.

This is finally the year Rickie Weeks arrives.

Mar 21, 2010 19:18 PM
rating: 0
 
mattjozga

Until he gets injured again...which is inevitable. Love me some Weeks, but let's be realistic!

Mar 22, 2010 07:53 AM
rating: -1
 
BurrRutledge

You can go wrong in that situation, but as long as you stay flexible and draft according to what the rest of the league is leaving for you in your next few picks you should be fine.

If you don't take Prince, you can get Adrian or Votto in the 2nd or 3rd round, respectively. Other options like Berkman and Pena are available many rounds later should somebody reach for either Adrian or Votto.

If you don't take Utley, you can get Kinsler or Pedroia in the 2nd or 3rd round, respectively. Or Roberts, Weeks, or Uggla many rounds later.

If you don't take Longoria, you can get Wright or Zimmerman in the 2nd or 3rd round, respectively. Young, Beckham, and Chipper are your later-round options here.

So, which of these options is the best for you:

1a) Prince, Kinsler, and Zimmerman
1b) Prince, Wright, and Pedroia
2a) Utley, Adrian, and Zimmerman
2b) Utley, Wright, and Pedroia
3a) Longoria, Adrian, and Pedroia
3b) Longoria, Kinsler, and Votto
or, another alternative:
4) Prince, Adrian and Pedroia/Zimmerman.

Discuss...

Mar 22, 2010 11:13 AM
rating: 0
 
BurrRutledge

2b should be Utley, Kinsler, Votto

Mar 22, 2010 11:15 AM
rating: 0
 
BurrRutledge

er... Utley, Wright, and Votto.

Mar 22, 2010 11:15 AM
rating: 0
 
Ira

Thanks Clay, for all your hard work!!!

If I may offer some advice for next year. Since you've gone through the hardship of automating this process, why not run your first set of projections immediately after the 2010 season ends? I'm unsure how future projections are affected by park factors, but it seems to me that you've been doing them all in a neutral park and league and then translating the results. If so, it makes sense to simply run a set of projections, lock in the data, and then spend the whole winter simply translating based on transactions. Then, if someone says, "why did so-and-so's numbers change?" you can simply say, "because he got traded to the Mets, and he'll be in an easier league but a harder park," or "because he's slated to start the year in AA."

Thanks for all the hard work....

Mar 23, 2010 07:21 AM
rating: 1
 