There was some chatter on Twitter this morning about park factors, and Marc Normandin made the point that all of the most common park-adjusted offensive stats out there (TAv, OPS+, wRC+) use "generic" run-based park factors, not component park factors. (Baseball Prospectus does have component park factors which we use in PECOTA, and we use those to generate our run-based park factors, but we use run-based park factors in TAv.) Marc wondered if using one-size-fits-all rather than the component factors might lead to inaccuracies—after all, we know different parks affect different types of hitters in different ways, and our park adjustment methods here don't account for that. (Marc isn't the first to raise this point, by the way.) So why do we all do it this way?
The answer is not, as you might suspect, additional complexity—at least not for us here at Baseball Prospectus. We generate lines based on component park adjustments daily for in-season PECOTA anyway. No, this is a deliberate choice to park adjust TAv the way we do.
The important thing to remember is that we are not park-adjusting a player's actual production, in TAv or FairRA+ or any of our other park-adjusted metrics. We don't really care if a home run was wind-aided or wouldn't have been out without some friendly fence distances. Nor do we care if a warning track fly ball maybe would have been a home run somewhere else. Those are useful things to know if we want to project a player's skill, but TAv is not supposed to measure a player's skill; it's supposed to measure a player's value to his team. So our interest in park adjustment is not in seeing what a player would have done in a different park (I abjure those sorts of hypotheticals in value stats), but in accounting for the different value of a run in different park contexts. We all know that Juan Pierre wasn't the sort of player who could really take advantage of Coors back when he was on the Rockies, for instance. But the average player coming in to face the Rockies could, and that changed the run environment Pierre played in. Even if he couldn't hit additional home runs in Coors, the park still affected the run environment he played in, and in a value stat that's important to account for.
So we don't park adjust a player's own stats at all, for TAv or any other value metric. What we do instead (and this may sound like a pedantic distinction, but it really isn't) is apply park adjustments to the production of the average player, which is what we use as a baseline. As an example, here's how we park-adjust TAv, going off a player's RPA (or runs per plate appearance, where in this case runs is absolute linear weights runs):
RPA/(lgRPA*parkadjust) * lgRPA
Then we convert a player's park-adjusted RPA to TAv. (Minus the conversion back to RPA by multiplying by the league average, that's the formula for the stat RPA_PLUS available in the custom sortables, by the way. As a technical note, we do this on a per-PA basis and sum to get seasonal values, so each plate appearance is park adjusted based on the park where it occurred, not on the average across a player's parks, or a team's, for that matter.)
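To make the per-PA bookkeeping concrete, here is a minimal Python sketch of the formula above. It is an illustration under stated assumptions, not our production code: the names `PlateAppearance`, `rpa_plus`, `park_adjusted_runs`, and `park_adjusted_rpa` are hypothetical, the sample numbers are made up, and dividing the summed per-PA values by PA to get a seasonal rate is my reading of "sum to get seasonal values." The conversion from RPA to TAv is not shown.

```python
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    """One plate appearance (hypothetical container for this sketch)."""
    runs: float         # absolute linear weights runs credited for this PA
    park_factor: float  # run-based park factor where the PA occurred (1.0 = neutral)


def rpa_plus(runs: float, lg_rpa: float, park_factor: float) -> float:
    """RPA / (lgRPA * parkadjust): production relative to a park-adjusted baseline."""
    if lg_rpa <= 0 or park_factor <= 0:
        raise ValueError("lg_rpa and park_factor must be positive")
    return runs / (lg_rpa * park_factor)


def park_adjusted_runs(runs: float, lg_rpa: float, park_factor: float) -> float:
    """RPA / (lgRPA * parkadjust) * lgRPA, applied to a single plate appearance."""
    return rpa_plus(runs, lg_rpa, park_factor) * lg_rpa


def park_adjusted_rpa(pas: list[PlateAppearance], lg_rpa: float) -> float:
    """Adjust each PA by its own park, sum, then divide by PA (assumed aggregation)."""
    if not pas:
        raise ValueError("need at least one plate appearance")
    total = sum(park_adjusted_runs(pa.runs, lg_rpa, pa.park_factor) for pa in pas)
    return total / len(pas)


if __name__ == "__main__":
    # Made-up example: one PA in a hitter-friendly park, one in a neutral park.
    season = [PlateAppearance(0.25, 1.25), PlateAppearance(0.0, 1.0)]
    print(park_adjusted_rpa(season, lg_rpa=0.125))
```

Note that the player's own `runs` are never rescaled by anything specific to him. The park factor only moves the league-average baseline he is compared against, which is the distinction drawn above.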
A handful of notes:
- Most of the time, this is still going to be pretty close to what you get using component park factors, and definitely closer to that than to using park-unadjusted stats.
- This is definitely going to be closer to a player's "true" park-neutral value than what you will get if you use component park factors poorly.
- You are using component park factors poorly.
I mean, yes, there is some value of "you" where that last one isn't true, but for most people it most certainly is. Most published component park factors are exceptionally noisy, and to make things worse they're multiplicative, which works well for run-based park factors but is bad for the sort of component park adjustments we're talking about here. And to top it all off, they don't handle skills independently. (What's the point of using component park factors if you're just going to use PA as the denominator for everything?)
Which is to say, if you're trying to figure out the change in Josh Hamilton's home run production leaving Arlington, and you're using multiplication and division to do it, you're almost certainly going to overstate the change. (Resorting to a player's home-road splits is no better—that ignores sampling issues and home field advantage, which isn't just about wins and losses but about the underlying components as well.) A metric like TAv, while not intending to be a true talent metric, is probably going to do a better job of telling you about a player's park-neutral talent than those sorts of component park factors will.