June 6, 2011
In-season PECOTA updates
We’re going to be pushing these updated forecasts into the Depth Charts and PFM (as well as the tops of the player cards) on a daily basis. The “last updated” fields in the DC and PFM will still change only when we have manually tweaked a player’s playing time. (If you come across any player news or have an update for our depth charts, you can let our Fantasy crew know at our comments page, sending an e-mail to firstname.lastname@example.org or sending @BProDepthCharts a message on Twitter.)
I’m sure you’re all more interested in the PECOTA side of things, and I’ll go into more detail on that in a moment, but I do want to emphasize that we’ve shifted baselines as well. If you see a pitcher with a lower forecast ERA than you’ve seen in the past even though he hasn’t pitched well to date, that’s because we now expect all pitchers to have a lower ERA than we did before the season started. The change in baselines is not necessarily meant to reflect our expectations of offensive levels for the rest of the season (as it gets warmer, we should see the baselines rise); it is meant to make it easier to compare a player’s season-to-date performance with his rest-of-season projection.
Now, as to the PECOTA updates: we are not rerunning the entire PECOTA process on a daily basis. First, that would simply be impractical; by the time we finished, the next day’s stats would already be waiting for us. Second, it would be the wrong tool for the job. Much of the computational horsepower behind PECOTA is spent figuring out how a player will change with age. The effects of aging between now and late August are small enough to ignore, and the process we use to model aging between seasons would be ill-suited to capturing them anyway.
Instead, we are taking a player’s season-to-date numbers and, in effect, “regressing” them toward the preseason PECOTA forecast. The weighting is determined by two things: (1) the player’s playing time so far this season and (2) the reliability of his preseason forecast. The more a player plays this season, the more his rest-of-season forecast can move; at the same time, a rookie’s forecast is more likely to move than an established veteran’s.
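We aren’t publishing the exact weights here, so what follows is only a minimal sketch of this kind of regression, assuming the reliability of a preseason forecast can be expressed as a hypothetical number of “prior” plate appearances: small for a rookie, large for a veteran. The function name and the `prior_pa` parameter are illustrative, not PECOTA’s actual internals.

```python
def rest_of_season_rate(preseason_rate: float,
                        observed_rate: float,
                        observed_pa: int,
                        prior_pa: float) -> float:
    """Blend a preseason forecast with season-to-date performance.

    prior_pa is a hypothetical reliability weight: how many plate appearances
    the preseason forecast is treated as being worth. Larger values mean the
    forecast moves less for the same amount of new playing time.
    """
    if observed_pa < 0 or prior_pa <= 0:
        raise ValueError("observed_pa must be >= 0 and prior_pa must be > 0")
    total = prior_pa + observed_pa
    return (prior_pa * preseason_rate + observed_pa * observed_rate) / total


# Same hot start (.400 over 200 PA) for two hypothetical hitters forecast at .300.
rookie = rest_of_season_rate(0.300, 0.400, 200, prior_pa=400)
veteran = rest_of_season_rate(0.300, 0.400, 200, prior_pa=1500)
print(f"rookie:  {rookie:.3f}")
print(f"veteran: {veteran:.3f}")
```

Under these made-up inputs the rookie’s forecast moves to about .333 while the veteran’s moves only to about .312; more playing time (a larger `observed_pa`) would pull both further toward .400.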
The first question I expect is why Jose Bautista’s rest-of-season forecast isn’t higher. It’s a good question. Let’s say this up front: Bautista is very nearly a singular case in the history of baseball in his transformation from journeyman to premier hitter. It is possible that a model like PECOTA, based on historical data, is having a hard time coping with a player as unique as Bautista. The trouble is that because Bautista is unique, the only way to test this proposition is to let him play and see what he does next. But the updated PECOTA is conservative by nature, not just for Bautista but for all hitters. This clashes with our expectations because of a phenomenon called recency bias: humans tend to overweight recent information at the expense of older information. One of the big benefits of using a forecasting system like PECOTA is that it forces us to confront our recency bias and to account for all of the information we have about a player.
The next question I expect is, “So why does Fangraphs have a much higher rest-of-season projection for Bautista than you do?” My answer is simple: they are wrong. Since this is obviously a self-interested statement, I shall explain why:
Fangraphs uses a stat called wOBA as its all-encompassing batting rate; conceptually, it and TAv are very similar. For our purposes, the main difference between them is that wOBA is scaled to look like OBP rather than batting average. (In Fangraphs’ implementation, it is reconciled to the league OBP for that season, unlike TAv, where the average is held constant over time.) Before the season, ZiPS (the projection system designed by Dan Szymborski) projected Bautista for a .381 wOBA. This is not far from where PECOTA had him; depending on how you handle converting between wOBA and TAv, the two could be identical forecasts of overall batting productivity.
Since the start of the season, Bautista has hit for a .516 wOBA in 235 plate appearances (by Fangraphs’ reckoning; other sites, such as Statcorner, figure wOBA slightly differently and so come up with different results). The updated ZiPS gives Bautista a projected .415 wOBA for the rest of the season, equivalent to a TAv somewhere around .330 (depending on the assumed league OBP for the rest of the season) and significantly higher than what rest-of-season PECOTA says. If we take a weighted average of his preseason forecast and his season-to-date performance, then to arrive at that rest-of-season projection you’d have to treat the preseason forecast as worth 698 plate appearances. That is right around the number of PAs Bautista had in 2010 alone, whereas a projection that took into account the previous three seasons would be worth closer to 1,500 PAs. ZiPS is underweighting Bautista’s preseason projection in favor of his most recent performance. If the point of a projection system is to help overcome recency bias, this kind of rest-of-season forecast does more harm than good: instead of combating recency bias, it reinforces it.
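The 698-PA figure can be checked by solving the weighted average for the weight on the preseason forecast. This is a minimal sketch using only the wOBA numbers quoted above; it treats the ZiPS update as a simple two-part weighted average, which is the framing of this argument rather than a description of how ZiPS actually computes its in-season numbers. The function names are hypothetical.

```python
def implied_prior_pa(preseason: float, observed: float,
                     observed_pa: float, rest_of_season: float) -> float:
    """Weight W (in PA) on the preseason forecast that makes
    (W * preseason + observed_pa * observed) / (W + observed_pa) == rest_of_season.

    Solving for W gives W = observed_pa * (observed - rest_of_season) / (rest_of_season - preseason).
    """
    if rest_of_season == preseason:
        raise ValueError("rest_of_season equals preseason; the weight is undefined")
    return observed_pa * (observed - rest_of_season) / (rest_of_season - preseason)


def blended(preseason: float, observed: float,
            observed_pa: float, prior_pa: float) -> float:
    """Weighted average of the preseason forecast and the season-to-date rate."""
    return (prior_pa * preseason + observed_pa * observed) / (prior_pa + observed_pa)


# Bautista's wOBA figures: .381 preseason ZiPS, .516 over 235 PA, .415 rest of season.
w = implied_prior_pa(0.381, 0.516, 235, 0.415)
print(f"implied weight on preseason ZiPS: {w:.0f} PA")

# The same blend if the preseason forecast were treated as worth about 1,500 PA.
print(f"blend with a 1,500 PA weight: {blended(0.381, 0.516, 235, 1500):.3f}")
```

The first line works out to about 698 PA. Weighting the preseason forecast at roughly 1,500 PA instead would put the same inputs at about a .399 wOBA rather than .415.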
And this is not an issue peculiar to Bautista’s singular nature. Let’s look at the top twenty players in ZiPS in terms of absolute change between the preseason projection and the current rest-of-season forecast:
Now, looking at in-season PECOTA:
First of all, the ZiPS projections show a lot more movement: 28 points of TAv (.028) at the most extreme, compared with 16 points (.016) for PECOTA. In fact, Saunders is the only player on the PECOTA list with a larger change than the lowest player on the ZiPS list. The next notable thing is that the players on the ZiPS list seem much more likely to be established veterans, while the PECOTA list leans much more heavily toward rookies. As a rule, veterans should be less amenable to projection changes than younger, inexperienced players. When you have three full seasons of a player in the majors, it should take more information to change your mind than it would for someone whose projection is based on less than a full MLB season plus some translated minor league data. These lists show PECOTA behaving that way, but not ZiPS.
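The expected pattern can be illustrated with a toy example. This is a minimal sketch, assuming a rest-of-season forecast that blends the preseason rate and the season-to-date rate, with a hypothetical `prior_pa` weight standing in for the reliability of the preseason forecast. The four hitters and all their numbers are invented; each gets an equally sized hot or cold start, and ranking them by absolute change puts the players with thin track records at the top.

```python
from dataclasses import dataclass


@dataclass
class Hitter:
    name: str          # hypothetical player label
    preseason: float   # preseason forecast rate (e.g. TAv)
    observed: float    # season-to-date rate
    observed_pa: int   # season-to-date plate appearances
    prior_pa: float    # hypothetical reliability weight of the preseason forecast


def rest_of_season(h: Hitter) -> float:
    """Reliability-weighted blend of the preseason forecast and season-to-date rate."""
    return (h.prior_pa * h.preseason + h.observed_pa * h.observed) / (h.prior_pa + h.observed_pa)


def biggest_movers(hitters: list[Hitter], n: int = 20) -> list[tuple[str, float]]:
    """Rank hitters by absolute change between preseason and rest-of-season forecasts."""
    changes = [(h.name, abs(rest_of_season(h) - h.preseason)) for h in hitters]
    return sorted(changes, key=lambda pair: pair[1], reverse=True)[:n]


# Each start is 50 points above or below the forecast over 220 PA;
# only the reliability of the preseason forecast differs.
hitters = [
    Hitter("veteran A", 0.280, 0.330, 220, prior_pa=1800),
    Hitter("veteran B", 0.270, 0.220, 220, prior_pa=1600),
    Hitter("rookie C", 0.260, 0.310, 220, prior_pa=350),
    Hitter("rookie D", 0.255, 0.205, 220, prior_pa=300),
]
for name, change in biggest_movers(hitters, n=3):
    print(f"{name}: {change:.3f}")
```

With these inputs both rookies move about 20 points while the veterans move about 5 or 6, which is the shape of list a reliability-weighted update should produce.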
The next question I anticipate is: when will these rest-of-season forecasts be available outside of the PFM? Right now I am working on incorporating them into the rest of our PECOTA offerings, including the 10-year forecasts, which I expect to debut sometime next week. In the very near future, we will also offer updated in-season numbers for players who are not included on the depth charts.