In case you haven’t yet noticed, a trip to team depth charts and individual PECOTA cards now features what is commonly referred to as in-season projection updates. A fantastically useful addition to the site, the updated PECOTAs incorporate data accrued this season and, through a nifty little process that will be outlined below, blend with the pre-season projection in order to inform on expectations for a player moving forward. Keep in mind when reading about the in-season update process that the actual PECOTA cards for individual players have not been altered save for a brief table beneath the player’s picture with the update itself; the vast majority of the original card has remained intact.
Clay Davenport utilizes his translations to make the updates. Rather than providing you with a laundry list of random steps, let’s discuss the new system through an example… say, Joe Mauer, since we recently discussed his chances at hitting .400 (score +1 for Seidman in the self-pimping column!).
Prior to the season, Mauer had a weighted mean projection of .307/.388/.436, based on aspects such as his track record, his age, expected park factors for the league, expected league performance, and the list goes on. The first step to providing his in-season projection involves the retranslation of his pre-season PECOTA to the external factors inherent in 2009. For instance, partly due to Minnesota running a park factor of 1.06 as opposed to the expected 0.98, and partly due to the current offensive levels of the junior circuit, Mauer’s retranslation of the original PECOTA jumps to .313/.402/.457; as in, if we expected Minnesota to boast that specific park factor entering the season, Mauer’s weighted mean would have looked like the aforementioned triple-slash line, not the original line atop this paragraph.
The next step involves translating Mauer’s 2006-09 seasons relative to the 2009 specifications, or, in other words, what his lines in years past would have looked like had they sported similar leaguewide offensive levels and park factors. The translated numbers from 2006-09 are then weighted, with the weight halved for each season further back: 2009=1, 2008=0.5, 2007=0.25, 2006=0.125.
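As a rough illustration of that weighting step, here is a minimal Python sketch. The function names are hypothetical, and the assumption that the seasons are combined as a simple normalized weighted average of each rate stat is mine; the article gives the weights but not the exact averaging formula.

```python
# Hypothetical helpers; the article specifies the weights, not the averaging formula.

def season_weights(current_year, n_seasons=4, decay=0.5):
    """Weight 1.0 for the current season, halved for each season further back."""
    return {current_year - i: decay ** i for i in range(n_seasons)}


def weighted_rate(rates_by_year, weights):
    """Normalized weighted average of one translated rate stat (assumed form)."""
    total_weight = sum(weights[year] for year in rates_by_year)
    weighted_sum = sum(rate * weights[year] for year, rate in rates_by_year.items())
    return weighted_sum / total_weight


weights = season_weights(2009)
print(weights)  # {2009: 1.0, 2008: 0.5, 2007: 0.25, 2006: 0.125}
```

Passing a dictionary of translated batting averages keyed by year to `weighted_rate` would give the weighted figure for that stat; the same call would be repeated for OBP and SLG.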
When the weights are run on the translated numbers, a weighted average of .339/.426/.538 results, which then has to be blended with the pre-season projection. The Twins had played 94 games, which accounts for 58 percent of their season, so the weighting would be 0.58*translated weighted average + 0.42*original but translated PECOTA. The end result is a .328/.416/.504 line. Based on what we have seen this year as well as what PECOTA saw in making the original projection, Mauer should hit right around .328/.416/.504 the rest of the season. The way it is set up, the more games played this season, the less weight the original PECOTA carries. From there, raw totals are derived from the playing time estimates in the updated PFM.
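The blend described above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the paragraph, not the actual update code; the function name `blend` is hypothetical, and it uses 94/162 directly rather than the rounded 0.58.

```python
def blend(in_season_line, preseason_line, team_games, season_games=162):
    """Blend two triple-slash lines, weighting the in-season line by games played."""
    w = team_games / season_games  # 94 / 162 is roughly 0.58
    return tuple(
        round(w * current + (1 - w) * original, 3)
        for current, original in zip(in_season_line, preseason_line)
    )


weighted_2006_09 = (0.339, 0.426, 0.538)   # translated, weighted 2006-09 line
translated_pecota = (0.313, 0.402, 0.457)  # pre-season PECOTA retranslated to 2009

print(blend(weighted_2006_09, translated_pecota, team_games=94))
# (0.328, 0.416, 0.504)
```

Because the weight is simply games played over 162, rerunning the same call later in the season shifts the result further toward the 2006-09 weighted line.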
So, if you’re ever curious about what a player is projected to do from here on out, simply take a trip to the team depth charts or to his individual PECOTA page. In case anyone isn’t sure where to find it, here is a link to Joe Mauer’s page: http://www.baseballprospectus.com/pecota/mauerjo01.php
Beneath his picture is a big, blue bolded link titled Projected Playing Time. Beneath that is a short, 3-line or so table with his playing time percentage and rates/raw totals projected from here on out. This is where it can be found on the individual cards. Relative to the team depth charts, click on the link for Projected Playing Time and you will be transported to the team page, with playing time estimates and projections from here on out.
If I am reading this correctly, you are correcting a player's performance for the season so far by weighing in his performance in prior seasons (without taking career path into consideration, i.e., it doesn't matter if he is closer to (or farther from) his expected peak now than in 2006), then projecting him to play at the level of his original PECOTA (translated) for the rest of the season. In other words, his performance so far this year doesn't change the estimate of his performance going forward. I can't see why this is the most logical approach. Wouldn't it make more sense to take his performance to date as is, translate his PECOTA for park and year and then adjust it to reflect actual performance this year relative to the projection and, finally, add the pro-rated results of the updated projection to his production so far?
You're slightly off course, which is likely skewing your conclusion. The current season DEFINITELY plays a large factor as it carries the most weight in the area currently accounting for 58%.
The part accounting for 58% is comprised of 1.0*translated 2009 performance, 0.5*translated 2008 performance, 0.25*translated 2007 performance, and 0.125*translated 2006 performance. And by translated, I mean that those seasonal lines are translated to what they would look like with the factors inherent in 2009.
That then carries the weight of team GP/162, with 1-(team GP/162) being the weight for the original, translated PECOTA. For Mauer, based on his 2006-09 translations of .339/.426/.538, and his translated pre-season PECOTA of .313/.402/.457, he is expected to hit .328/.416/.504 over the rest of the season.
But for someone like Justin Upton or Matt Wieters or Pat Burrell this doesn't make a lot of sense. I mean maybe it does, you'd have to test it.
In the case of Wieters, for example, that lovely PECOTA projection obviously hasn't panned out, and there is a ton more data on him now, data that clearly diverges off the rocket-like trajectory he was on until this season (I think you'd have to be a fool in retrospect to say PECOTA got a little fooled by sample size, maybe I'm out of bounds here, I grovel to Nate Silver). He's got an EqA of 230 or so and very little translated performance before that that gave him a super strong PECOTA. If he had started his career five months earlier and performed this way after being called up for two months last season, his PECOTA projection this year would not be for a rosy 800 obp after he did 675.
Likewise with Upton he's on a much more certain track now for the rest of his career, that is outweighed by a conservative pecota forecast. Maybe he will only pick up another 8 vorp for the rest of the season, but I can't imagine that a rerun pecota would tell us that.
So long story short, can't we just rerun the comps?
Might it be helpful to list the updated projections for the full season alongside the rest-of-season projections? The way the stats are presented on PECOTA cards ("2009 Total"), it appears that Mauer's projection has him hitting 9 HR in 272 PA for all of 2009.
It would actually be better to show actual results atop the rest-of-season updates, and then perhaps the total result combining what has happened and what is expected to happen, to show that Mauer, with 15-16 HR, projected to hit 9 more, would finish at 24-25.
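The layout suggested here (actual results, then the rest-of-season update, then the combined total) is simple addition. A minimal sketch, using the home run figures quoted in this comment and a hypothetical helper name:

```python
def projected_final(actual_so_far, rest_of_season):
    """Projected end-of-season total: what has happened plus what is expected."""
    return actual_so_far + rest_of_season


rest_of_season_hr = 9  # Mauer's rest-of-season HR projection quoted above
for hr_so_far in (15, 16):  # the comment gives his current total as 15-16
    total = projected_final(hr_so_far, rest_of_season_hr)
    print(f"actual {hr_so_far} + projected {rest_of_season_hr} = {total}")
```

This only works for counting stats; rate stats such as AVG/OBP/SLG would need to be combined on a playing-time-weighted basis instead.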
I really don't understand why you're using past seasons' worth of performance data. All that information is already included in the PECOTA projection. Why aren't the only two inputs for rest-of-season 2009 pre-2009 PECOTAs and 2009 performance data?
Sky, the past performance is included in the original PECOTA as you said, however, the weighting changes and that is what the in-season estimates attempt to gauge. For instance, at the beginning of the season, Mauer was at .307/.388/.436, based on say 2008 weighted at X, 2007 at Y and 2006 at Z. We update the pre-season based on current factors, which doesn't alter the pre-season weights, but just updates one component that translates the original weighted result.
That is then weighted against the translation of 2006-09 data, with new weights, IE 2009=1.0, 2008=0.5, 2007=0.25 and 2006=0.125, so the past performance is incorporated from a weighting standpoint. It IS already included in the original PECOTA but this past data is re-weighted, with 2009 being the heaviest.
If you just did pre-season PECOTA and performance right now, you would essentially be over-weighting the current performance, saying that the 2006-08 performance is worth 42% with the 2009 worth 58%, whereas the current performance needs to be regressed to the prior track record a bit as a means of estimating true talent, which helps us avoid over-weighting the current performance or ignoring prior, re-weighted performance.
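To put rough numbers on that argument, the sketch below works out how much each input ends up counting toward the final blend after 94 team games. It assumes the 1/0.5/0.25/0.125 weights are normalized to sum to one inside the in-season component, which is my reading rather than something stated outright; the function name and the "pre-season PECOTA" label are hypothetical.

```python
def effective_shares(team_games, current_year=2009, n_seasons=4,
                     decay=0.5, season_games=162):
    """Share of the final blend attributable to each season and to PECOTA."""
    blend_weight = team_games / season_games
    raw = {current_year - i: decay ** i for i in range(n_seasons)}
    total = sum(raw.values())  # 1 + 0.5 + 0.25 + 0.125 = 1.875
    shares = {year: blend_weight * w / total for year, w in raw.items()}
    shares["pre-season PECOTA"] = 1 - blend_weight
    return shares


for source, share in effective_shares(94).items():
    print(f"{source}: {share:.1%}")
```

Under that assumption, 2009 performance on its own accounts for roughly 31 percent of the blend at the 94-game mark, not the 58 percent it would carry if only pre-season PECOTA and 2009 data were combined.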
Basically, it boils down to reconciling what we thought his talent level was prior to the season with what we know about the player now. This true talent level is somewhere in between the 2006-08 weighted data and the 2006-09 re-weighted data with 2009 being the heaviest, with the ultimate weighting reliant on how far we are into the season, so when we are 160 games in, Mauer's talent level will be his 2006-09 weighted and translated performance.
1. Will this be updated on a regular basis?
2. Is the projected record updated to reflect the current standings plus expected performance rest of the way?
3. Pitching stats seem to be full season for games, innings and games started
4. Are the depth charts updated (ie. someone at BP thinks Ian Kennedy will get eight starts for the Yanks the rest of the way?)
1) Yes, definitely updated regularly, not a once per season sort of thing.
2) Yep, current record is the starting point here.
3) That is something we're working on, thanks for the catch.
4) The number of starts would be full season, which needs to be weighted by the amount of time left. In this case, around 40%, so that would be, what, 3 starts for Kennedy from here on out? Might not get there but he would be more likely to be called up than, say, Kei Igawa.
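The proration in answer 4 can be sketched as follows. The helper name is hypothetical, and the 94 team games figure is borrowed from the Mauer example in the article as an assumption about how much of the season remained.

```python
def prorate_to_rest_of_season(full_season_total, team_games, season_games=162):
    """Scale a full-season playing time estimate by the share of season left."""
    remaining_share = 1 - team_games / season_games
    return full_season_total * remaining_share


# Eight full-season starts for Kennedy with 94 team games played (assumed).
print(round(prorate_to_rest_of_season(8, team_games=94), 1))  # 3.4
```

That lands close to the "3 starts" estimate given above using a round 40 percent.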
Why are his 2009 stats being translated? I thought you were using the 2009 environment, so shouldn't his stats be unchanged since he already plays in that environment?
I misspoke, apologies. The 2009 stats are being translated, however since it is the 2009 environment they will output practically the same results. The issue with Mauer's stats, which I'll update in the article, had more to do with the timing of the translation last run and the stats selected.
In case anyone isn't sure where to find it, here is a link to Joe Mauer's page - http://www.baseballprospectus.com/pecota/mauerjo01.php
Beneath his picture is a big, blue bolded link titled Projected Playing Time. Beneath that is a short, 3-line or so table with his playing time percentage and rates/raw totals projected from here on out.
This is where it can be found on the individual cards. Relative to the team depth charts, click on the link for Projected Playing Time and you will be transported to the team page, with playing time estimates and projections from here on out.
It would be much better if that blue bolded title read "Projected Remaining Playing Time in Season". This would appropriately imply from now til end of season.
Also, unless this is continuously updated (i.e., on daily basis), it would be appropriate to label it "Projected Playing Time from {{Date}} to End of Season."
I agree. I've been trying to make sense of this new feature, but couldn't figure out which part of the page was the new feature in the first place. Something like "Projected Playing Time from {{Date}} to End of Season," would be a useful heading.
It looks like pitchers' individual pages have incorrect GS and IP listed in the projections. For example, his page claims that Tim Lincecum will have 30 more starts and 205 more IP, but all the other numbers look about right (28 BB, 104 K, 7-3 record).
I addressed that above - it's something that needs fixing. The GS and IP are full-season whereas everything else is rest of season. I'd imagine it will be addressed shortly.
Thanks Eric. This is a great addition to PECOTA and BP more generally.
So the new numbers that we see are PECOTA's projections of what the players will do from this point on, NOT what their total, end of the season numbers will be? (For instance, Pujols is projected to hit 22 homers, according to PECOTA, meaning that he'll hit 22 MORE homers this year, not, of course, a total of 22).
Also, is there a way to calculate new UPSIDE scores mid-season?
Correct, so you would add 22 to whatever Pujols currently has and that's what his end of season numbers are expected to be. An addition I'd like to implement is current numbers atop rest of season projections, with an updated end of season total beneath.
I love it. It's very helpful. Thanks. One addition to PFM which I have been pining for since its inception is what I call an "all things being equal" option, which allows you to compare players on a per game basis. For example, Marco Scutaro is valued highly in the PFM ($4.92 for my league parameters) due more than anything to his being projected for a ton of plate appearances. But if you have Scutaro and, say, Casey Blake ($4.60) on your fantasy team, and you KNOW they are both in that day's lineup, in spite of the higher value attributed to Scutaro, you could do the math and realize Blake is more valuable on a game by game basis and therefore put him into your utility slot for that night over Scutaro. Now, this is not too difficult to formulate on your own, but it does in many cases require finding a common denominator of plate appearances to determine who is better "all things being equal." I find myself doing these little equations all the time. I would LOVE an option which either simply shows a player's value on a per game basis, or allows for one to place all players with the same number of plate appearances to see how they would compare on that level.
I actually didn't even see it, apologies. Unfortunately I'm not much of a fantasy guy though I can certainly see the utility to be gained by such an addition.
Thanks, this is a great addition. It doesn't appear that the new forecasts impact the 7-yr projections. Any plans to have them do so, or will those continue to be a once-per-year only project?
I've been looking at these updated projections and I have to say I'm really surprised at some of them. I've compared them to the pre-season PECOTAs and I find it hard to believe that some of these players can be expected to do as well as PECOTA is projecting based on a half-season worth of data added to what they did previously.
For example, Jason Bartlett (pre-season PECOTAs from the 3/21 PFM)--
Pre-season: .257/.311/.346, .657 OPS
Rest-of-season: .307/.356/.440, .796 OPS (difference of .139)
Joe Mauer
Pre-season: .307/.388/.436, .824 OPS
Rest-of-season: .328/.416/.504, .920 OPS (difference of .096)
In the case of Jason Bartlett, he came into this season with 1702 PA and a triple slash line of .276/.337/.362, for a .699 OPS. Is it realistic to think that 300 PA with a .903 OPS has changed his projection that much, that from this point to the end of the season he's a .796 OPS hitter? In fact, I looked at what he's done since June 1--.288/.344/.396 or since July 1--.232/.338/.321, and it looks like Jason Bartlett has gone back to being Jason Bartlett after a hot first couple months.
These are the extreme examples in a favorable direction, but there are just as extreme examples in the negative direction (Kelly Johnson, .839 OPS to .707 OPS), and the pitching projections have similar extremes.
There is a difference between in-season and full-season. For instance, full season weights all the full season data with comparables and things like that, while in-season does its best "guesswork" based on whatever we input, and since 2009 is weighed the heaviest it is going to be skewed towards the numbers this year. Just because Greinke has a 2.97 ERA projected down the stretch doesn't mean he goes into 2010 with a 2.97 PECOTA.
I certainly understand and can appreciate the difficulty in doing in-season projections but maybe this shows that more regression is needed?
If Greinke's projected 2010 ERA is going to be off by more than let's say .20 from his end-of-season 2009 in-season projection then I think that that would suggest that the current year's stats are too heavily weighed. Maybe someone could look into this and see if maybe the current year's performance is weighed too heavily?
One thing you're missing when you're making the comparisons is something clearly stated in the post, being that the pre-season PECOTAs being used in the in-season projections ARE NOT the original pre-season PECOTAs.
What we do is translate the original PECOTA to 2009 specifications, including the offensive levels of the league and the park factors and such. It may not account for all of the gap you found in certain players but it is something to keep in mind; for instance, Mauer's original projection called for an adjustment from a .388 OBP/.436 SLG to a .402 OBP/.457 SLG... is it really that much of a stretch to think someone projected to be at .402/.457 (what his projection would have been if we suspected Minnesota would have a higher park factor, etc) would be projected to go .416/.504 from here on out, especially given the incredible first half?
Shouldn't the in-season projection be run the same as the pre-season, except that it's done now? Instead of doing two projections at different times and then blending the results, why not run the projection each day or each week and post the results.
The only issue is how to weight each season. What I was privately describing to you was a rolling or progressive weight, which as I understood your explanation of PECOTA should mathematically be the same. As we go from 0 to 100% of 2009, the weights for 2006, 2007 and 2008 should progressively decrease until the end of the current 2009 season, when 2006 will reach 0% and it effectively becomes the 2010 pre-season.
Glad to see this unfiltered post finally, as I had been scratching my head over the sudden stark differences on the Pecota cards between top-of-page projections and the data on the rest of the page! When you get a chance, it might be useful to add some text to your boilerplate Pecota template to explain what the first set of numbers are under the player picture?
As for the in-season projections themselves I'm still trying to get my head around the weightings. How much empirical work went into deciding those? I recall a piece here several years ago that basically concluded that April numbers had roughly 25% of the predictive value that previous seasons did for a player going forward, but that's about all the formal work I've seen here on the in-season question.
My first impression is that you're weighting 2009 in a bit too heavily - though I'm not even sure yet that I understand the description above (not a comment on the writing clarity, it's more a case of it all lending itself better to an XL formula than sentences).
My other (gut) feeling is that you're taking "new" park effects for 2009 far too seriously (in retranslating things). These are pretty voluble (er, and variable) between full seasons as it is, before even getting into the smaller sample size of a partial season.
All of that said, it's very nice to see BP venturing into this realm - even if it turns out that things need tweaking from this first attempt. So, just wanted you to hear "thanks" from another reader. Some new #s to puzzle over!
Eric,
Thanks for your responses in the comments. I do appreciate having in-season PECOTAs and I think it's a great addition even if I'm not convinced that they are as accurate as they could be. You mentioned above the offensive levels of the league and park factors as being part of the translated PECOTA projections. I looked at the offensive levels of the leagues and offense is basically the same in the AL and slightly down in the NL this year:
2008 AL .268/.336/.420, .756 OPS, 4.78 R/G
2009 AL .263/.334/.425, .759 OPS, 4.77 R/G
So would that suggest that park factors play a bigger part in some of the projections that seem a little extreme? And would those be solely this year's park factors, so based on a half-season's worth of data?
I agree with many other subscribers. While I like the idea here (of making in-season projections), your (Eric's) methodology is arbitrary at best and seriously flawed at worst.
First, recalculating everything based on 2-month park factors is chasing noise, and surely adding nothing but increased average error to your projections. There is no evidence whatsoever that if a first-half park factor differs from what was expected in the pre-season, the second-half park factor will be any closer to the first-half one. I.e., the sample is way too small for partial season park factors to be predictive. Frankly, you should already know this.
So the main concept behind in-season projections is to figure out how to best use any *new* information the current season has provided. Instead you arbitrarily add adjusted rate stats from past seasons to the equation. PECOTA is specifically designed to replace the method of using past stats directly to predict the future. You are basically throwing that out the window and saying that past rate stats -- as long as they are adjusted -- are more useful than PECOTA alone for predicting the current season. Where is the evidence for this? And if there is any, then why don't your pre-season projections use adjusted past rate stats to create the projections, rather than trusting PECOTA to handle all interpretation of the past on its own?
In short, you have created an arbitrary hodge-podge of a formula to project rest-of-season stats. It's not even clear what stats you are using. Are you using this season's raw ERA as an input? This season's raw BA as an input? I should hope not, as these stats are not predictive.
My hope is that you or someone at BP will do the research to figure out the optimal choice and weighting of in-season stats to use to come up with the most accurate rest of season projections. Whatever it is [and I have done some research on this topic], I am pretty confident it will differ significantly from what you threw together this season.
How often will these updates be run? Is this a one time only in season update or will they be recalculated on a weekly or daily basis?
Why did this question not get answered? Is the answer already printed somewhere else? This seems like a good question...