
February 26, 2010, 11:43 AM ET
PECOTA Update 2/26

by Clay Davenport

Got another round of updates done and sent out for PFM, depth charts, and the weighted means spreadsheet late last night. What’s changed?

* SS/Sim is in. Thanks to Mike for the help with those.

* Upside - the Upside is calculated from a series of player forecasts; it is essentially the player’s runs above average over a six-year period. Nate calculated Upside only by looking at the current set of comparable players; I’ve calculated it by iteratively running the player’s forecast into the future. The first time I ran it, the forecast was definitely too ‘hot’ - virtually every 20-year-old was eventually turned into a .350 hitter, as optimism built up like a runaway resonance effect. I’ve put a quick damper on that for now (reading from a lower forecast level will reduce the influence of age, which was the primary variable being over-emphasized); a larger fix, with the forecast starting out highly optimistic but regressing towards a median level over time, will take more time to run. I’ll also revisit the “classic Nate” method - I changed it in the first place because the iterative method worked better, but I’ve made enough other changes since then that I can’t be sure that’s still the case. I’ll run some tests starting from the early 2000s - whichever version makes the best projections for the late 2000s will enter the program.

* Steals - I know there was some discussion of stolen bases looking too optimistic, and there was a good reason for that, though a bad one - a piece of code that was supposed to regress stolen base percentage towards the league average was actually regressing it towards 1 (that is, 100%), producing a gain of roughly 10 points in success rate for the typical player. That is not noticeable on a guy with 5 steals, but very much so on a guy with 30.

* Strikeouts - hitter strikeouts were an area where the initial version was performing noticeably worse than last year’s PECOTA, and I made some changes that wipe out 75% of that difference (which does mean that the current version is still doing worse than last year’s for some reason, but with an average error about 0.3 worse instead of 1.3 worse). The change had to do with how the stats are weighted to determine a player’s baseline rate of performance. For both the tested player and his comparables, we build a weighted mean of each player’s prior three years of performance - this establishes a baseline that is then tested against the fourth year, and those differences are what drive PECOTA. How that weighted mean is built depends mainly on the age of the player - for very young and old players, the most recent year is the strongest driver, while for mid-career guys you tend more towards a simple three-year average. Across all age groups, though, strikeouts need to be strongly driven by the most recent year - and that change, allowing different stats to be weighted differently for the same player, is new. I wouldn’t be surprised to find that other stats will also benefit.

* Depth charts - John Perrotto collected a series of depth charts from the beat writers for each team, with their opinions about how the lineups would look come April. I’ve allowed those to influence, and in some cases strongly influence, my thinking on various players; I’ve also rejected them in places where the result simply made no sense. The beat writers were focused primarily on an Opening Day lineup, while I’m trying to establish patterns for the entire season - I can readily believe that a team would be stupid enough to start Sucky Player A in April, but I don’t think they’ll keep sticking with him in July.
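The iterative Upside calculation described in the Upside item above can be illustrated with a toy model. This is a minimal sketch under made-up assumptions, not PECOTA’s code: the aging curve in `age_delta`, the `damping` factor standing in for the “quick damper,” and the optional pull toward `median_raa` standing in for the “larger fix” are all hypothetical. Each season’s forecast feeds the next, which is what lets a young player’s optimism compound.

```python
from typing import Optional


def age_delta(age: int) -> float:
    """Hypothetical aging curve: change in runs above average (RAA)
    from one more year of age. Young players improve, older ones decline."""
    if age < 27:
        return 4.0 * (27 - age) / 7
    return -1.5 * (age - 27)


def upside(start_raa: float, start_age: int, years: int = 6,
           damping: float = 1.0, median_raa: Optional[float] = None,
           regress: float = 0.0) -> float:
    """Sum of forecast RAA over `years` seasons, feeding each season's
    forecast into the next one (the iterative approach).

    damping: scales the age effect (1.0 = undamped, < 1.0 = a quick damper).
    median_raa / regress: if median_raa is given, each season's forecast is
        pulled `regress` of the way back toward it.
    """
    raa = start_raa
    total = 0.0
    for year in range(years):
        raa += damping * age_delta(start_age + year)
        if median_raa is not None:
            raa += regress * (median_raa - raa)
        total += raa
    return total


undamped = upside(0.0, 20)
damped = upside(0.0, 20, damping=0.5)
regressed = upside(0.0, 20, median_raa=0.0, regress=0.3)
print(undamped, damped, regressed)
```

With these invented numbers, a league-average 20-year-old accumulates about 64 RAA undamped and about 32 with the damper, while regressing toward the median shrinks the total further because the gains can no longer pile up unchecked.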
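The stolen base bug from the Steals item above is easy to reproduce with a simple shrinkage estimate. This sketch assumes regression works by adding a fixed number of phantom attempts at the target rate; that method, the 20-attempt prior, and the 0.72 league rate are illustrative assumptions, not values from PECOTA. It shows why regressing toward 1 instead of the league average inflates the rate, and why the damage grows with attempts.

```python
LEAGUE_SB_PCT = 0.72  # assumed league-average success rate, for illustration only


def regressed_sb_pct(sb: int, cs: int, target: float, prior_attempts: int = 20) -> float:
    """Shrink an observed SB% toward `target`, as if the player had
    `prior_attempts` extra attempts at exactly that rate."""
    return (sb + target * prior_attempts) / (sb + cs + prior_attempts)


# 20 steals, 8 caught stealing: an observed rate of about 71%.
correct = regressed_sb_pct(20, 8, LEAGUE_SB_PCT)  # regress toward the league average
buggy = regressed_sb_pct(20, 8, 1.0)              # the bug: regress toward 1 (100%)

# The inflated rate matters in proportion to how often the player runs.
for attempts in (6, 40):
    extra_steals = attempts * (buggy - correct)
    print(f"{attempts} attempts: {extra_steals:.1f} extra projected steals")
```

In this made-up case the buggy rate is about 12 points too high, which costs less than one steal for a player with 6 attempts but nearly 5 for a player with 40.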
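The Strikeouts item above describes a three-year weighted baseline whose weights depend on age, with a new per-stat override. Here is a minimal sketch of that structure. The weights, the age cutoffs, and the stat names (`so_rate`, `bb_rate`) are hypothetical, and the sketch ignores any playing-time weighting for simplicity.

```python
from typing import Sequence

# Hypothetical weights for the three prior seasons, most recent first.
AGE_GROUP_WEIGHTS = {
    "young": (0.6, 0.25, 0.15),  # most recent year dominates
    "prime": (0.4, 0.33, 0.27),  # close to a simple three-year average
    "old": (0.6, 0.25, 0.15),    # most recent year dominates again
}

# Per-stat override: strikeout rate leans on the most recent year at every age.
STAT_WEIGHTS = {"so_rate": (0.7, 0.2, 0.1)}


def age_group(age: int) -> str:
    # Hypothetical cutoffs between age groups.
    if age < 25:
        return "young"
    if age > 32:
        return "old"
    return "prime"


def baseline(stat: str, rates: Sequence[float], age: int) -> float:
    """Weighted mean of a stat over the last three seasons (most recent first)."""
    weights = STAT_WEIGHTS.get(stat, AGE_GROUP_WEIGHTS[age_group(age)])
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)


# Same three seasons, different weighting depending on the stat.
print(baseline("so_rate", (0.20, 0.15, 0.10), 29))
print(baseline("bb_rate", (0.20, 0.15, 0.10), 29))
```

Looking the weights up by stat first and falling back to the age group is one simple way to let different stats for the same player be weighted differently.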

Another slew of words about the depth charts in general. PECOTA is a system geared for the projection of individual players. It is not run for teams - the depth chart takes playing time estimates from a person, looks up the PECOTA projection for each player, and adds those up for every player on the team to generate “team totals”. That is not the way to optimize the projection for a team. The sum of the individual projections is going to be greater than a proper team projection, and the sum of those team totals is going to be higher than a proper league projection. The reason the league doesn’t end up as high as the projections is not that the individual players projected will all do worse across the board - it is that teams will go deeper on their depth charts than we can reliably predict (and beyond the top two, it’s generally a crapshoot which minor leaguer gets called up). Some players are going to get hurt and fall dramatically short of their projected playing time, and there are likely to be more high estimates of PT than low ones.

We’re listing 2-4 players per position, probably about 30 ‘slots’ per team. The Diamondbacks, to pick a team more or less at random, last year used 3 players at shortstop, 4 each at catcher, second, and third, 5 each in center and right, 6 in left, and 8 at first, plus 13 used as a DH or PH at least 5 times, which is somewhere between 39 and 52 ‘slots’ depending on how you count the PHs. We list 17 or 18 pitchers per team; the average team in 2009 used more than 24. Attempting to constrain PECOTA to the depth charts - by changing the numbers to match the expected league total - will damage the forecast. There were elements in the depth charts that were doing just that; I’ve been removing them as I find them, but we’re still doing it for playing time. I may change that soon as well: since the new PECOTA does make a specific major league playing time estimate (the “Major” column on the spreadsheets is the expected percentage of his playing time that comes in the majors), it doesn’t need to be nearly so heavily reliant on the depth charts.
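The slot count for the Diamondbacks can be checked directly from the figures given above. The only number not taken straight from the post is the 30-slot depth chart size, which the post gives as “probably about 30.”

```python
# 2009 Diamondbacks: number of players used at each position, per the post.
players_used = {"C": 4, "1B": 8, "2B": 4, "3B": 4, "SS": 3, "LF": 6, "CF": 5, "RF": 5}
dh_or_ph = 13              # players used as a DH or PH at least 5 times
depth_chart_slots = 30     # approximate position-player slots listed per team

low = sum(players_used.values())  # count only the fielding positions
high = low + dh_or_ph             # also treat every DH/PH as a separate slot

print(f"slots used: {low} to {high}")
print(f"slots with no depth-chart entry: {low - depth_chart_slots} to {high - depth_chart_slots}")
```

This prints 39 to 52 slots used, leaving somewhere between 9 and 22 slots that no depth chart entry accounts for.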

But since we also use those depth charts, rightly or wrongly, to assess a team’s expected wins, we have to find a way to reconcile the individual projections (which tend to produce too many runs for the offense and allow too few against the defense). The runs scored and allowed totals that show up for each team have been balanced: the league’s total runs scored and total runs allowed are made to be equal, allowing Pythagorean win estimates to create a balanced won-loss record for the league. However, the batting line that goes with it is still just the sum of the individual player projections. So yes, there is a disconnect between the team slash line and the runs scored, and there is a disconnect between the sum of the players’ runs scored and the team runs scored. I haven’t figured out any way around that without compromising the quality of the individual projections.
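The balancing step described above can be sketched as follows. The post doesn’t say exactly how the totals are made equal, so this sketch assumes one simple approach: rescale every team’s runs scored and allowed so both league totals meet at their midpoint. The exponent of 2 is the classic Pythagorean choice; whatever exponent PECOTA’s standings actually use is not stated here. The team names and run totals are invented. Only runs are adjusted, so the summed batting lines would stay untouched, which is the disconnect the post describes.

```python
from typing import Dict, Tuple

Teams = Dict[str, Tuple[float, float]]  # team -> (runs scored, runs allowed)


def balance_runs(teams: Teams) -> Teams:
    """Rescale every team's runs scored and allowed so that league-wide
    runs scored equals league-wide runs allowed (both meet at the midpoint)."""
    total_rs = sum(rs for rs, _ in teams.values())
    total_ra = sum(ra for _, ra in teams.values())
    target = (total_rs + total_ra) / 2
    return {
        name: (rs * target / total_rs, ra * target / total_ra)
        for name, (rs, ra) in teams.items()
    }


def pythagorean_wins(rs: float, ra: float, games: int = 162, exponent: float = 2.0) -> float:
    """Expected wins from runs scored and allowed (classic Pythagorean form)."""
    return games * rs**exponent / (rs**exponent + ra**exponent)


# Summed player projections: too many runs scored, too few allowed.
raw = {"A": (820.0, 690.0), "B": (760.0, 720.0), "C": (700.0, 760.0)}
balanced = balance_runs(raw)
for name, (rs, ra) in balanced.items():
    print(name, round(rs), round(ra), round(pythagorean_wins(rs, ra), 1))
```

Because the Pythagorean formula is not linear, equal league run totals bring the league’s wins and losses close to even but not necessarily exactly even.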


Berselius

Clay, maybe I'm missing something here. I don't see how the PECOTAs would have to change based on your team projections. Don't you have a season simulator already for your playoff odds predictions? It would seem to me that PECOTA and the depth charts would just be the inputs to this, which would give the final projected standings. Am I crazy here?

Feb 26, 2010 09:07 AM
 
Berselius

Plus as far as the rest of the depth chart goes, why not just fill in the blanks with replacement level players?

Feb 26, 2010 09:12 AM
 
archilochusColubris

I'll second this motion. Seems like the easiest way of reconciling the individual and aggregated projections. Check what percentage of plate appearances should be expected to go to unlisted players and write in replacement-level stats in their place.

You wouldn't have to be supremely accurate with these replacement players (i.e., specifying by team) to improve the overall consistency of the team projections.

Feb 26, 2010 10:40 AM
 
Michael
(736)

Recognizing that the playing time changes will continue, are the rest of the projections stable yet? Is it safe to start using them to plan for fantasy drafts and auctions? Or are there other items on your to-do list, the way "investigate excessive stolen base projections" was previously?

Feb 26, 2010 09:42 AM
 
repstein

PFM question: Is it possible to tweak the interface so that users can see stats that don't count in their league scoring? For instance, my league uses standard 5x5 for our scoring, but when I'm looking at Excel spreadsheets on auction day (league rules: no Internet allowed) I'd like to see projections for OBP, SLG, GB%, etc. The way PFM runs now, if you set it to show those stats, it considers them in calculating your player values. Any way to allow you to see those stats alongside those that count for your league?

Feb 26, 2010 10:11 AM
 
ccweinmann

I have made this exact request every year since PFM was first introduced. My work-around is to run two spreadsheets, one for rank and the other with all the data I'm interested in, sort alphabetically, combine, and then re-sort by rank. A bit of a pain in the ass, though, compared to how easy I think it would be to just tweak PFM.

Feb 26, 2010 10:53 AM
 
Will

Instead of sorting alphabetically, just search for VLOOKUP in the Excel help menu.

Feb 27, 2010 13:18 PM
 
Steve Paulo

"SS/Sim is in"

Amazing timing... six hours until my SS draft starts! You guys really came through for me! Awesome!

Feb 26, 2010 10:15 AM
 
SC

Does PECOTA consider playing time when calculating rate stats? That is, some players (I'm thinking particularly younger players) can be expected to perform better/worse with 500 PA than with 200.

Feb 26, 2010 10:18 AM
 
misterdelaware

Mariano Rivera is still showing up as human in the latest projections. I assume this will be worked out later.

Feb 26, 2010 12:17 PM
 
PaulieNeu

Awesome

Feb 26, 2010 13:49 PM
 
evo34

When are the PECOTA cards really coming out?

Feb 26, 2010 14:03 PM
 
fawcettb

Excuse me for not knowing, but what the hell is SS/Sim and where do I find it? (I assume that "SS" means "Scoresheet"?)

Feb 26, 2010 14:56 PM
 
BP staff member Dave Pease
BP staff
(2)

http://www.baseballprospectus.com/glossary/index.php?search=sssim

Feb 26, 2010 15:55 PM
 
jdouge
(968)

Very glad to see the update and appreciate the explanation!

Feb 26, 2010 17:18 PM
 
acardonick

Sorry, but Upside is still goofy. As I said in an earlier thread, in 2008 there were 17 hitters w/Upside > 200 (don't have the 2009 numbers). The last run reduced the 2010 number from 342 to 122, so progress but still way too high (unless there has been some significant methodology change). At least the order of the players makes much more sense, though.

Drew

Feb 26, 2010 18:21 PM
 
ccseverson

Looks like the Upside number is now 10 years not the previous 7 so that's probably some of it.

Feb 28, 2010 01:44 AM
 
PeachPit

Maybe the answer is in the blog post and I just can't figure it out, but why are the PAs on the ML Hitters tab in the Pecota spreadsheet (playing time adjusted) always 5% lower than the PAs on the depth charts?

Feb 27, 2010 08:27 AM
 
John Carter

Can anybody, please, tell me why I am getting all zeros for the SSim column?

Feb 27, 2010 11:37 AM
 
vtadave

User error?

Feb 27, 2010 17:56 PM
 
Will

I'm glad there was a bug identified with the stolen bases. I'm in a 15 team league and PFM kept telling me Crawford was $20 more valuable than Pujols (speaking of - why the drop off in SLG and HR? I don't see it).

Feb 27, 2010 14:29 PM
 
jdouge
(968)

I liked the links on past cards to each player's DT cards and Baseball-Reference pages. That may already be in the works, but just in case it's not, a humble request . . .

Feb 27, 2010 22:08 PM
 
jdouge
(968)

Re-posting to the PECOTA *cards* thread :)

Feb 27, 2010 22:09 PM
 
charlie

Concerning SS/Sim, I read the Glossary definition, but am still not sure: is this a single, all-encompassing, stat for use in Scoresheet leagues, or does it just take the defensive rating and position eligibility into account? In other words, if I want to run PFM for a Scoresheet league, should SS/Sim be the only category selected, or an additional category, along with other metrics I'm interested in?

Feb 27, 2010 22:36 PM
 
Nathan J. Miller

FYI... in the 2-25 version of the weighted means spreadsheet...within the PT adjusted tab for pitchers, Jeff Manship, Edgar Osuna, and Wesley Wright are listed twice.

Mar 02, 2010 09:48 AM
 


Baseball Prospectus Unfiltered is powered by WordPress.
Copyright © 1996-2013 Prospectus Entertainment Ventures, LLC.