There was some chatter on Twitter this morning about park factors, and Marc Normandin made the point that all of the most common park-adjusted offensive stats out there (TAv, OPS+, wRC+) use "generic" run-based park factors, not component park factors. (Baseball Prospectus does have component park factors which we use in PECOTA, and we use those to generate our run-based park factors, but we use run-based park factors in TAv.) Marc wondered if using one-size-fits-all rather than the component factors might lead to inaccuracies—after all, we know different parks affect different types of hitters in different ways, and our park adjustment methods here don't account for that. (Marc isn't the first to raise this point, by the way.) So why do we all do it this way?

The answer is not, as you might suspect, additional complexity—at least not for us here at Baseball Prospectus. We generate lines based on component park adjustments daily for in-season PECOTA anyway. No, this is a deliberate choice to park adjust TAv the way we do.

The important thing to remember is that we are not park adjusting a player's actual production, in TAv or FairRA+ or any of our other park adjusted metrics. We don't really care if a home run was wind-aided or wouldn't have been out without some friendly fence distances. Nor do we care if a warning track fly ball maybe would have been a home run somewhere else. Those are useful things to know if we want to project a player's skill, but TAv is not supposed to measure a player's skill, it's supposed to measure a player's value to his team. So our interest in park adjustment is not in seeing what a player would have done in a different park (I abjure those sorts of hypotheticals in value stats), but accounting for the different value of a run in different park contexts. We all know that Juan Pierre wasn't the sort of player who could really take advantage of Coors back when he was on the Rockies, for instance. But the average player coming in to face the Rockies could, and that changed the run environment Pierre played in. Even if he couldn't hit additional home runs in Coors, the park still affected the run environment he played in, and in a value stat that's important to account for.

So we don't park adjust a player's own stats at all, for TAv or any other value metric. What we do instead (and this may sound like a pedantic distinction, but it really isn't) is apply park adjustments to the production of the average player, which is what we use as a baseline. As an example, here's how we park-adjust TAv, going off a player's RPA (or runs per plate appearance, where in this case runs is absolute linear weights runs):

RPA/(lgRPA*parkadjust) * lgRPA

Then we convert a player's park-adjusted RPA to TAv. (Minus the conversion back to RPA by multiplying by the league average, that's the formula for the stat RPA_PLUS available in the custom sortables, by the way. As a technical note, we do this on a per-PA basis and sum to get seasonal values, so each plate appearance is park adjusted based upon the park where it occurred, not the average value of a player's, or for that matter a team's, parks.)
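A minimal sketch of that per-PA calculation, in Python. The function names and sample numbers are invented for illustration, the league quality adjustment is left out, and averaging the per-PA ratios is an assumption about how the seasonal sum is taken, not a description of the production code:

```python
def rpa_plus(pa_runs, pa_park_factors, lg_rpa):
    """Average, over plate appearances, of runs / (lgRPA * parkadjust)."""
    if len(pa_runs) != len(pa_park_factors):
        raise ValueError("need one park factor per plate appearance")
    if not pa_runs:
        raise ValueError("need at least one plate appearance")
    total = 0.0
    for runs, park_factor in zip(pa_runs, pa_park_factors):
        # The park factor scales the league-average baseline,
        # not the player's own runs.
        total += runs / (lg_rpa * park_factor)
    return total / len(pa_runs)


def park_adjusted_rpa(pa_runs, pa_park_factors, lg_rpa):
    """RPA_PLUS multiplied back onto the league RPA scale."""
    return rpa_plus(pa_runs, pa_park_factors, lg_rpa) * lg_rpa


# Hypothetical four-PA sample: two at a hitter's park, two at a pitcher's park.
runs = [0.50, 0.00, 0.25, 0.00]
parks = [1.10, 1.10, 0.95, 0.95]
print(park_adjusted_rpa(runs, parks, lg_rpa=0.12))
```

Because each plate appearance carries its own park factor, a player who splits time between parks never needs a single blended factor.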

A handful of notes:

  • Most of the time, this is still going to be pretty close to what you get using component park factors, and definitely closer to that than to using park-unadjusted stats.
  • This is definitely going to be closer to a player's "true" park-neutral value than what you will get if you use component park factors poorly.
  • You are using component park factors poorly.

I mean, yes, there is some value of "you" where that last one isn't true, but for most people it most certainly is. Most published component park factors are exceptionally noisy, and to make things worse they're multiplicative, which works well for run-based park factors but is bad for the sort of component park adjustments we're talking about here. And to top it all off, they don't handle skills independently (what's the point of using component park factors if you're just going to use PA as the denominator for everything?)

Which is to say, if you're trying to figure out the change in Josh Hamilton's home run production leaving Arlington, and you're using multiplication and division to do it, you're almost certainly going to overstate the change. (Resorting to a player's home-road splits is no better—that ignores sampling issues and home field advantage, which isn't just about wins and losses but about the underlying components as well.) A metric like TAv, while not intending to be a true talent metric, is probably going to do a better job of telling you about a player's park-neutral talent than those sorts of component park factors will.

Colin, just want some clarification (tell me if I'm wrong), but is this what is meant by 'component' vs. 'run-based' park factors?

component park factors: individual factors for singles, doubles, triples, HR, etc.
run-based: just a single factor based on runs

What about handedness of hitters? Is this accounted for in either?
Great article, Colin. If you don't mind, a couple of math questions.

Why introduce lgRPA the way you do? Why not just divide by parkadjust? Mathematically, isn't it the same thing?

By absolute linear weight runs, do you mean runs based on zero instead of average? If you were to use linear weights centered on average, would you approach the math differently?
To the first point -- you're right. I do it the way I do because I have uses for RPA_PLUS in its own right, so I calculate RPA_PLUS and then I multiply that by lgRPA to get back to RPA scale to derive TAv. If you don't care about getting RPA_PLUS, you don't need to worry about it. (I also omitted the math for the league quality adjustment, which is part of RPA_PLUS and TAv but wasn't relevant to this article.)
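Written out for a single park factor, and ignoring the league quality adjustment, the cancellation looks like this (the symbols are the ones from the formula in the article):

```latex
\frac{\mathrm{RPA}}{\mathrm{lgRPA} \times \mathrm{parkadjust}} \times \mathrm{lgRPA}
  = \mathrm{RPA\_PLUS} \times \mathrm{lgRPA}
  = \frac{\mathrm{RPA}}{\mathrm{parkadjust}}
```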

To the second, yes. Actually, what I do is figure linear weights relative to average, and then do:


But you can do the same thing with linear weights baselined to zero. You can't apply multiplicative park adjustment straight to RAA, though, because there are cases where a below-average RAA should lead to a positive park-adjusted RAA and you simply won't get that result unless you're dealing with absolute runs.
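A toy illustration of that sign problem, as a rough Python sketch. The hitter, the park factor, and the function names are all made up for the example, and this is not the calculation used for TAv:

```python
def adjusted_raa_from_absolute(rpa, lg_rpa, park_factor, pa):
    """Park-adjust absolute runs per PA, then compare to league average."""
    adjusted_rpa = rpa / (lg_rpa * park_factor) * lg_rpa
    return (adjusted_rpa - lg_rpa) * pa


def adjusted_raa_naive(rpa, lg_rpa, park_factor, pa):
    """Apply the park factor straight to RAA; the sign can never change."""
    raw_raa = (rpa - lg_rpa) * pa
    return raw_raa / park_factor


# Hypothetical hitter: slightly below league average in a pitcher's park.
rpa, lg_rpa, park_factor, pa = 0.115, 0.120, 0.90, 600
print(adjusted_raa_from_absolute(rpa, lg_rpa, park_factor, pa))  # positive
print(adjusted_raa_naive(rpa, lg_rpa, park_factor, pa))          # still negative
```

Dividing a negative number by a positive park factor can only make it a different negative number, which is why the adjustment has to happen on absolute runs.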
thanks, Colin. Perfect. And great point about multiplying park factors to negative RAA. Have you thought much about additive park factors?
Never mind, I see your answer below. I like the distinction between multiplicative vs. additive park factors for runs vs. components, respectively. Makes intuitive sense, though part of me still wonders if additive is best for runs as well.
I tried digging up the old Book Blog thread on the subject but I couldn't find the one I was thinking of.

In terms of team level (or even player-level), it could be true that additive is still better than multiplicative for runs-based park factors. I would need to either find some old analysis or do some new analysis to satisfy myself on this point.

For what we're doing here, though, I don't think it matters. Because we're applying the park factor to the league average production instead of the player's own production, there should be very little difference between either method (there would be none, I should think, except for the way I'm applying the league quality adjustment).
Since we're talking technical stuff, I should note that you can't use park factors straight off Baseball Reference (or our BPF in the sortables, for that matter) the way I describe here. Those are runs-per-out park factors, and to park adjust RPA_PLUS (and thus TAv) you need to apply the park factor on a PA basis. I calculate separate park factors for PA-denominated stats, but if you have out-denominated park factors and need to apply them to RPA, you can take the square root of the park factor and use that. I can't remember where I first read about that (I think it was either Tango's blog or BtBS), so while I remember the method I can't tell you off the top of my head WHY it works.
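The square-root shortcut as a short Python sketch. The function name is hypothetical, and the sketch assumes the park factor is expressed as a ratio where 1.00 is neutral; a factor published on a 100-based scale would need to be divided by 100 first:

```python
import math


def pa_park_factor(out_based_park_factor):
    """Approximate a per-PA park factor from a runs-per-out one via square root."""
    if out_based_park_factor <= 0:
        raise ValueError("park factor must be positive")
    return math.sqrt(out_based_park_factor)


# Hypothetical runs-per-out park factor of 1.10.
print(pa_park_factor(1.10))
```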
OK, I'm confused: Why is using multiplication and division the wrong way to use park factors?

CAIRO projected Hamilton to hit 27 homers in Arlington in 2013. I'd look at the three year park factors for homers (1.26 for Arlington and .79 for Anaheim), take the 15 we'd project for Arlington and reduce it by about 5.5 HR's. And that's a lot, for sure.

What I can't figure out is why that's going to radically overstate the loss, assuming no park changes in the last three years (and assuming the 2011 outlier result in Arlington is properly factored in at full weight.) Are you asserting that it's wrong because even three-year park factors are too noisy? I'm genuinely baffled, and would be pleased to be unbaffled.
Let's assume that the park factors are "true," that is to say without noise. The premise still holds.

Think of it this way -- what a 1.26 park factor says is that the average player hits 1.26 times as many home runs at that park as at other parks. So you divide by that and then multiply by the park factor for the new park. The implicit assumption behind that ends up being that the more HR you hit, the more affected you are by the change in parks. (Actually, I don't know your source -- is that home PF only or home-and-road PF?)

Consider two players, who in Arlington in 600 PA are expected to hit 30 and 35 home runs respectively. Now move them to a league average park:

35/1.26 = 27.8
30/1.26 = 23.8

That's a loss of 7.2 HR for the 35 HR guy, but 6.2 HR for the 30 HR guy. Is there any reason for us to think that the 35 HR guy is more aided by his park than the 30 HR guy? There really isn't. (If anything, my gut says the opposite is probably true -- I suspect the better you are at hitting HR, the more no-doubters you have.)
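The same arithmetic as a few lines of Python, taking the 1.26 factor at face value (the function name is just for this sketch):

```python
def to_neutral_park(home_runs, park_factor):
    """Multiplicative conversion: divide the HR total by the park factor."""
    return home_runs / park_factor


for hr in (35, 30):
    neutral = to_neutral_park(hr, 1.26)
    # Columns: HR in Arlington, HR in a neutral park, HR lost.
    print(hr, round(neutral, 1), round(hr - neutral, 1))
```

Under this method every hitter loses the same share of his home runs (1 - 1/1.26, a bit over 20 percent), so the raw loss grows in lockstep with the total.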
Another way to think about it is to assume that each batter/pitcher combination has some distribution of fly ball distances. What you are saying is that the shapes of those distributions are not identical from one pair of players to another, and/or they are not a shape that would lose area in the tail in perfect proportion to how far out one cuts off the tail by placing the OF fence there. Those seem like safe assumptions to me; the proportional, constant shape assumption is a reach.
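A rough way to see this in Python, under loud assumptions: fly ball distances are treated as normally distributed (real ones need not be), and the means, spread, and fence distances are invented for illustration only.

```python
import math


def share_beyond(fence_ft, mean_ft, sd_ft):
    """Share of a normal distance distribution landing beyond the fence."""
    z = (fence_ft - mean_ft) / sd_ft
    return 0.5 * math.erfc(z / math.sqrt(2))


def retained_share(mean_ft, sd_ft, near_fence_ft=380, far_fence_ft=400):
    """Fraction of near-fence home runs that still clear the far fence."""
    near = share_beyond(near_fence_ft, mean_ft, sd_ft)
    far = share_beyond(far_fence_ft, mean_ft, sd_ft)
    return far / near


# Two hypothetical hitters who differ only in average fly ball distance.
print(retained_share(mean_ft=330, sd_ft=40))
print(retained_share(mean_ft=350, sd_ft=40))
```

In this toy setup the two hitters keep different fractions of their home runs when the fence moves back, which a single multiplicative factor cannot represent.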