Yesterday I claimed that the rest-of-season projections at Fangraphs are wrong. Again, I am well aware this is a self-interested statement, as I made the claim while announcing our own rest-of-season PECOTA projections. Some people were skeptical of my illustrations, so I’ll go into the weeds here and explain why I think the methodology is incorrect, as opposed to simply disagreeable. First, I’ll lay out the methodology in neutral terms; then I will provide my own commentary.

What I’ve done is recreate at least a portion of the methodology used to compute rest-of-season ZiPS, based on this spreadsheet, in particular the portion having to do with HR. (If you want to look at the methods yourself, just download that spreadsheet, and copy the batter worksheet into a blank sheet so you can unhide the locked columns in the sheet.) First, take the number of games a player’s team has played and divide by 162. This is called G% in the original spreadsheet; for clarity I will call it G_RT.

Now you take that, along with a player’s preseason projected HR (which I’ll abbreviate ProjHR) and his season to date HR. To figure his projected rest-of-season HR according to ZiPS, you do this:

[Image: Formula for RoS ZiPS]

The first term of this is simply a weighted average, where a player’s preseason and season-to-date HR totals are averaged based upon weights of 11 and a value that floats along with team games (8 times G_RT). After 60 games, the weight is about 3; after 120 games, about 6; after a full season it is of course 8. Note that these are total home runs, not home run rates. The second term simply prorates the weighted average out to the remaining games of the season.
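As a rough illustration of the floating weight only, the sketch below assumes the season-to-date weight is 8 scaled by G_RT, which reproduces the values of about 3, about 6, and 8 quoted above, and that the two weights are normalized into shares. The function names are hypothetical, and the full rest-of-season formula is not reproduced here.

```python
SEASON_GAMES = 162
PRESEASON_WEIGHT = 11.0    # constant weight on the preseason projection
FULL_SEASON_WEIGHT = 8.0   # season-to-date weight once all 162 games are played


def g_rt(team_games: int) -> float:
    """Fraction of the 162-game season the team has played (G% in the spreadsheet)."""
    return team_games / SEASON_GAMES


def season_to_date_weight(team_games: int) -> float:
    """Assumed form of the floating weight: 8 scaled by G_RT."""
    return FULL_SEASON_WEIGHT * g_rt(team_games)


def preseason_share(team_games: int) -> float:
    """Share of the total weight carried by the preseason projection (an assumption)."""
    w = season_to_date_weight(team_games)
    return PRESEASON_WEIGHT / (PRESEASON_WEIGHT + w)


for games in (60, 120, 162):
    print(games, round(season_to_date_weight(games), 2), round(preseason_share(games), 3))
```

Under these assumptions the preseason projection still carries more than half the total weight at the end of the season, since 11 is larger than 8.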

The method in the ZiPS rest-of-season spreadsheet uses park adjusted data in this step, and then converts to a player’s particular park; I did not have access to the specific park factors being used. And as a shortcut, I used a constant value of G_RT for all players, rather than compute the correct value for each individual player.

And yet, even without those steps, I get a very good agreement with the values at Fangraphs – using the same data as I did for yesterday’s post, I come up with 23.09 projected HR for Bautista; the listed total with that data was 23. If I do this for all players, I get an average absolute error of .29; if I round the projected HR totals I did to whole numbers (as is done for the values I downloaded from Fangraphs), I get an average absolute error of .19. (If I limit myself to players with at least 100 projected AB rest-of-season, I get an average error of .19 as well.) So even with excluding the park factors, I get a very good agreement with the values currently being published; if Fangraphs has changed the methodology originally used in the spreadsheet I have, it’s been an extremely modest change.

Everything is computed by that same formula: AB, H, 2B, 3B, BB, SO, etc. And once you’ve done that, you have rest-of-season ZiPS.

Now, my thoughts: First, the weight for a player’s previous seasons is based on two factors–a constant of 11, and a player’s projected playing time. There is going to be a modest correlation between a player’s projected playing time and his previous playing time, of course, so in this sense I suppose there is some accounting for the reliability of a past projection, but it seems to occur almost entirely by accident. And for young players who are expected to hold down a job for a whole season (your Jason Heyward types, for instance) it treats those forecasts exactly the same as it does those for an experienced veteran with the same expected playing time.

Secondly, it treats all events the same way. If you take two players with the same hit totals and at-bats, one with a lot of HR and one with a lot of singles, the rest-of-season forecast at Fangraphs will treat those exactly the same when it comes to projecting their batting average going forward. The rest-of-season forecast also does not distinguish between a player whose low batting average is driven by a low BABIP and one whose low batting average is driven by a high strikeout rate. And it uses the same arbitrary 11 and 8 weights for HR as it does for triples.

Writing at ESPN today (how serendipitous!), Dan Szymborski, the creator of ZiPS and the in-season projection methodology, says:

Not all bad seasons are created alike, though. For example, changes in walk and strikeout rate are far more likely to be retained going forward than changes in batting average on balls in play (BABIP). A homer outage for a home run hitter is a greater cause of concern than a singles hitter having a single outage.

And he’s absolutely right; it’s just that the rest-of-season ZiPS projections don’t know those things. They treat changes in walk rates just the same as changes in hit rates.

The pitcher projections are essentially the same as the batter projections. There is a reduced set of stats to project, as one might expect, but ERA has some peculiarities of its own. A component ERA is used in place of standard ERA for projecting future performance, which would be comforting except:

  1. The component ERA involved includes H/9 as one of its terms (along with SO/9, BB/9 and HR/9); basic DIPS theory strongly suggests that accuracy would be improved by using an ERA estimator that does not include hits on balls in play. (The use of H/9 instead of something like BABIP also substantially shrinks the coefficient of the strikeout, meaning that the greater impact of strikeouts compared to outs on BIP for predictive purposes is being largely ignored.)
  2. The component ERA is computed twice, once before and once after park-adjusting the four inputs. After computing the park-adjusted component ERA, rest of season ZiPS then adds the difference between the unadjusted component ERA and observed ERA to the adjusted component ERA. That probably destroys any benefit you gain from using component ERA instead of just using observed ERA, especially since you’re using a version of component ERA that includes hits allowed.
  3. The same 11/8 ratio we saw for batting stats is used for the pitching projections as well, including ERA.
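The adjustment described in the second item can be sketched as a one-line calculation. The function and argument names are hypothetical, and the pitcher's numbers are made up purely for illustration.

```python
def era_input(observed_era: float, cera_unadjusted: float, cera_park_adjusted: float) -> float:
    """Park-adjusted component ERA plus the gap between observed ERA
    and the unadjusted component ERA, as described in item 2."""
    return cera_park_adjusted + (observed_era - cera_unadjusted)


# Hypothetical pitcher: peripherals imply a 3.90 ERA (3.80 after park
# adjustment), but he has actually allowed runs at a 5.10 clip.
value = era_input(observed_era=5.10, cera_unadjusted=3.90, cera_park_adjusted=3.80)
print(round(value, 2))
```

When the park adjustment is small, the two component ERA terms nearly cancel and the result sits close to the observed ERA, so whatever is inflating or deflating observed ERA is carried straight back in.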

The pitching methodology means that a pitcher whose peripherals suggest he’s pitching in line with his preseason projection can still see his numbers change significantly because of ball in play outcomes, or bases empty/men on splits, or relievers allowing a greater or lesser number of inherited runners to score.

Am I suggesting you shouldn’t use the rest-of-season ZiPS projections? Of course I am; but that is clearly a biased statement. So let me suggest instead, when looking at a player’s rest of season ZiPS forecast, ask yourself how much of the change is due to the player’s ability being better measured, and how much is due to the peculiarities in how the rest-of-season adjustments are being done.

I was one of the people who questioned you yesterday, and think this is a great follow up. tx.
Me too. As expected, I don't really understand this article, but that is an issue with my math skills, not the writing or methodology. Appreciate you going "into the weeds" to back up your statement.
Thanks for the follow-up, Colin. This answered my specific question.
OK, I'll agree that I wouldn't use the ZiPS method, for many of the reasons you discuss.

Can you give us more about PECOTA's method, to the level of detail that you examined ZiPS? This is what you wrote in the previous article:

"Instead, we are taking a player’s season-to-date numbers and, in effect, “regressing” them toward the pre-season PECOTA forecast. The weighting is determined by two things: (1) a player’s playing time so far this season and (2) the reliability of a player’s preseason forecast. The more a player plays this season, the more the rest-of-season forecast can move, but at the same time, the forecast for a rookie is more likely to move than that of an established veteran."

I also am more interested in what PECOTA is doing *right* than what Fangraphs is doing wrong. I.e., what weighting per stat is in-season PECOTA using and more importantly, what was the method used to arrive at these weightings?
(Cross-posted from my blog.)

This is a pretty simple issue as to what to do.

Suppose you have the weights for the past seasons as 5, 4, 3, and a regression toward the mean weight of 2.

You multiply each weight by the number of PA (with the regression part being 600 PA). That gives you the effective weight for your pre-season forecast.

Your in-season forecast would get a weight of 6 or 7.

That's it. That's pretty much all you have to do (with the understanding that weighting by component is better).

So, let's take Bautista. His PAs:
2011: 249 x 7
2010: 683 x 5
2009: 404 x 4
2008: 424 x 3
Regr: 600 x 2

So, before 2011, you add up those PA (meaning 2010, 2009, 2008, regression), and you get a PA weight of 7503.

If you include 2011, your total weight is 9246.

Therefore, the pre-season forecast will have a weight of 7503/9246 = 81%.
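The arithmetic above can be checked with a short sketch. The PA totals and weights are the ones listed in this comment; the variable and function names are hypothetical.

```python
# (plate appearances, per-PA weight) for each source feeding the pre-season forecast
preseason_sources = {
    "2010": (683, 5),
    "2009": (404, 4),
    "2008": (424, 3),
    "regression": (600, 2),
}
in_season = (249, 7)  # 2011 PA to date and its weight


def effective_weight(pa: int, weight: int) -> int:
    """Effective weight of one source: PA multiplied by its per-PA weight."""
    return pa * weight


preseason_total = sum(effective_weight(pa, w) for pa, w in preseason_sources.values())
total = preseason_total + effective_weight(*in_season)
preseason_share = preseason_total / total

print(preseason_total, total, round(preseason_share, 2))
```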

Guy is saying that ZiPS is using 78% / 22%? Doesn't seem outlandish considering I'd use 81% / 19%.

By using 78% / 22%, that means that if you start with the 2010 and earlier weights (7503), then that implies a total amount of weight of 7503/.78 = 9619. That means 2011 is getting a weight of 9619 - 7503 = 2116.

Given that he has 249 PA, that means he's counting 2011 as 2116 / 249 = 8.5.
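Backing out the implied 2011 weight from a 78% pre-season share is the same calculation run in reverse. A minimal sketch, with a hypothetical function name:

```python
def implied_in_season_weight(preseason_total: float, preseason_share: float, pa: int) -> float:
    """Per-PA weight the current season must carry for the pre-season
    forecast to end up with the given share of the total weight."""
    total = preseason_total / preseason_share
    return (total - preseason_total) / pa


w_2011 = implied_in_season_weight(preseason_total=7503, preseason_share=0.78, pa=249)
print(round(w_2011, 1))
```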

It is an overweight. I'm not sure I'd count it as a severe overweight. That would make the 2010 weight as 60% of the 2011 weight. That's fairly defensible.

I use 80%, but I think best-testing by Brian Cartwright has shown using 70%.

I don't think anyone is questioning how the theory of weighting by PA/IP works, or how one would implement it. The question is what are the optimal weights to use to provide the most accurate in-season forecasts? It's an interesting problem and one I would like to hear from Colin on.
I've already provided that in the past. The weights are as follows:

.9994^daysAgo for hitting
.9990^daysAgo for pitching

Pretty simple. You can try to fine-tune it with better empirical testing, and changing the weighting by the components.

But, this is not such a big unknown problem. I've published the above several times over the past several years.
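A minimal sketch of those two decay curves, assuming daysAgo is counted in calendar days; the names are hypothetical.

```python
HITTING_DECAY = 0.9994
PITCHING_DECAY = 0.9990


def decay_weight(days_ago: int, daily_decay: float) -> float:
    """Weight on an observation made `days_ago` days in the past."""
    return daily_decay ** days_ago


# Weight on a game played today, one year ago, and two years ago.
for years in (0, 1, 2):
    days = 365 * years
    print(
        years,
        round(decay_weight(days, HITTING_DECAY), 2),
        round(decay_weight(days, PITCHING_DECAY), 2),
    )
```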
If the optimal weighting scheme is that well known and accepted, I'm not totally sure why none of the three in-season projection systems (ZiPS, Oliver and PECOTA) are using it. Or is PECOTA now using that weighting scheme?

Does "daysAgo" refer to calendar days, or days of a season?
daysAgo is human days, because that's how a player ages, based on real-time.
That's if you assume that whatever phenomenon causing past data to lose its predictive value over time is generated solely by a ticking clock. The alternative hypothesis is that a player is more likely to implement/undergo fundamental changes in the off-season (new delivery/stance, fitness change, nagging injury healing, etc.), and so the existence of the off-season would cause more information value to evaporate from the past performance than would the passage of the same amount of time within a continuous season.

If the theory is true, April's performance would be more predictive of same-season October's performance than October's performance would be predictive of the following April's.

Hence, the question.
Also, can you post a link to your study?
The only study of Tango's on the matter that I've seen is this:

He can hopefully provide us with a link to any updated material.
What I'm doing is treating the preseason forecast as a set of Bayesian priors, similar to what Nate Silver has done in the past:

A player's batting line is broken down into a set of independent binomial components, and I compute the reliability of each component in terms of variance. The number of observations is the main criterion, but I'm weighting older observations less, and I'm also down-weighting minor league playing time. There's also a component based on the reliability of each skill - strikeout rates are more stable than triple rates, for instance.

The reliability of current season stats is computed similarly (you only need to use the playing time component here, you don't need to look at component reliability twice) and the two distributions are combined to come up with a new distribution.
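One simple way to picture combining two estimates by their reliability is precision weighting of a single binomial rate. This is only a sketch under that assumption, not the actual PECOTA calculation; the function names, and the idea of summarizing the preseason forecast as a rate backed by an "effective" number of trials, are hypothetical.

```python
from typing import Tuple


def binomial_variance(rate: float, trials: float) -> float:
    """Variance of a rate estimated from `trials` binomial trials."""
    if not 0.0 < rate < 1.0 or trials <= 0:
        raise ValueError("rate must be strictly between 0 and 1 and trials must be positive")
    return rate * (1.0 - rate) / trials


def combine(prior_rate: float, prior_trials: float,
            obs_rate: float, obs_trials: float) -> Tuple[float, float]:
    """Precision-weighted blend of a preseason rate and a season-to-date rate.

    Returns the combined rate and its variance.
    """
    prior_precision = 1.0 / binomial_variance(prior_rate, prior_trials)
    obs_precision = 1.0 / binomial_variance(obs_rate, obs_trials)
    total_precision = prior_precision + obs_precision
    rate = (prior_rate * prior_precision + obs_rate * obs_precision) / total_precision
    return rate, 1.0 / total_precision


# Hypothetical HR-per-PA component: forecast of 5% backed by 1500 effective PA,
# against 8% observed over 250 PA this season.
print(combine(0.05, 1500, 0.08, 250))
```

In this picture, down-weighting older or minor league playing time would shrink the effective trials behind the preseason rate, so the season-to-date numbers move the combined estimate further.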
I'm well aware of Tango's daily decay weighting system, where days ago refers to calendar days. The 1.0/0.8/0.6 seasonal weights for batters, and 1.0/0.7/0.5 for pitchers, conform to those numbers.

The problem is writing code to implement it. The formula for the weights isn't difficult, but each player's performance has to be analyzed game by game, instead of season by season. It can then only be used for leagues where play by play is available. This also bloats up the data tables, and presumably the processing time as well, creating and then analyzing those tables.

However, I am considering it for Oliver. On the good side, analyzing game by game would allow cutting off the sample in the middle of a past season, either those games beyond a certain number of days in the past, or after a sufficient sample size is reached (or both).
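A rough sketch of what game-by-game weighting with a day cutoff might look like. It is not Oliver's code; the game-log layout, the function name, and the choice of HR per PA as the statistic are all hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical game-log row: (days_ago, plate_appearances, home_runs)
GameLine = Tuple[int, int, int]


def weighted_hr_rate(games: Iterable[GameLine],
                     daily_decay: float = 0.9994,
                     max_days_ago: Optional[int] = None) -> float:
    """Decay-weighted HR per PA, optionally ignoring games older than a cutoff."""
    weighted_pa = 0.0
    weighted_hr = 0.0
    for days_ago, pa, hr in games:
        if max_days_ago is not None and days_ago > max_days_ago:
            continue  # outside the sample window, possibly mid-season
        w = daily_decay ** days_ago
        weighted_pa += w * pa
        weighted_hr += w * hr
    if weighted_pa == 0:
        raise ValueError("no plate appearances inside the cutoff window")
    return weighted_hr / weighted_pa


log = [(3, 4, 1), (4, 5, 0), (400, 4, 2), (1200, 4, 0)]
print(weighted_hr_rate(log))
print(weighted_hr_rate(log, max_days_ago=1095))
```

A cutoff based on accumulated sample size rather than days would need the games sorted from most recent to oldest, stopping once enough weighted PA have been collected.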
Discussion of this article at "The Book Blog"
Here you go:

Elapsed Days