August 14, 2001
The Daily Prospectus
Park Effects
Yesterday's article on Wrigley Field, in which I pointed out that Wrigley has not been a great hitters' park for the past two seasons, generated some e-mail questions about park effects. I get a handful of these whenever I mention that this park or that park is good or bad for run scoring, but I've never addressed them in a column.
Paraphrasing, the questions look something like this:
You say that Wrigley Field looks more like a pitchers' park. Couldn't that be because the Cubs have had such lousy offenses the past few years, while their pitching has improved? If you put the Indians in Wrigley Field, wouldn't it look more like a hitters' park?
The short answer is, "no."
Before I go further, let me say that the methodology I'm about to explain is how park effects have been calculated up until now. At the end of the column, I'll address some of the problems we're running into this season.
Park effects are traditionally calculated by taking the number of events produced by both teams in a team's home games and dividing it by the number of those events in that team's road games. For example, in 1996, there were 813 runs scored, total, in Pirates home games. There were 796 runs scored in Pirates road games. Dividing the first number by the second yields 1.05, which we state as a percentage. "In 1996, Three Rivers Stadium increased run scoring by 5%."
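For readers who want to try the arithmetic themselves, here is a minimal sketch of the home/road ratio in Python. The function names and the optional per-game scaling are my own assumptions, not anything the traditional method prescribes. Note that a raw division of the two totals quoted above comes out nearer 1.02 than 1.05; one possible explanation, and it is only a guess, is that the published figure is scaled per game because the home and road game counts differ.

```python
def park_factor(home_events, road_events, home_games=None, road_games=None):
    """Ratio of events in a team's home games to events in its road games.

    Event counts include both teams. If both game counts are supplied,
    the ratio is taken per game (an assumption made for this sketch).
    """
    if road_events <= 0:
        raise ValueError("road_events must be positive")
    if home_games is None and road_games is None:
        return home_events / road_events
    if not home_games or not road_games:
        raise ValueError("provide both game counts or neither")
    return (home_events / home_games) / (road_events / road_games)


def as_percent_change(factor):
    """Express a park factor as a percentage change in the event rate."""
    return (factor - 1.0) * 100.0


# Totals quoted in the column for the 1996 Pirates (runs by both teams).
raw = park_factor(813, 796)
print(f"raw factor: {raw:.3f} ({as_percent_change(raw):+.1f}%)")
```

The same function works for any counting statistic, as long as the home and road totals both include the events of both teams.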
Calculating park effects this way cancels out the impact of an individual team's characteristics, because both teams' events are counted and the home team's own hitters and pitchers show up on both sides of the ratio. That the Braves have good pitching won't distort the numbers of Turner Field, because they carry that good pitching with them on the road. If the Indians have a great offense and no pitching, they'll have that for 81 home games and 81 road games, and against all the same opponents.
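The cancellation can be checked with a toy model. The sketch below assumes, purely for illustration, that the runs in a game are the product of a team "scale" (how much a club's hitters and pitchers raise or lower scoring), an opponent scale, and a park multiplier, and that the team plays each opponent equally at home and away. All the numbers are invented.

```python
def home_road_ratio(team_scale, home_park, opponents):
    """Home/road run ratio under a simple multiplicative model.

    opponents is a list of (opponent_scale, opponent_park) pairs; the team
    is assumed to play each opponent once at home and once on the road.
    """
    home = sum(team_scale * opp_scale * home_park
               for opp_scale, _ in opponents)
    road = sum(team_scale * opp_scale * opp_park
               for opp_scale, opp_park in opponents)
    return home / road


# Invented league: three opponents with their own scales and parks.
opponents = [(1.1, 1.05), (0.9, 0.95), (1.0, 1.00)]

weak = home_road_ratio(0.8, 1.08, opponents)    # low-scoring club
strong = home_road_ratio(1.3, 1.08, opponents)  # high-scoring club
print(weak, strong)
```

Under these assumptions the two ratios agree, because the team's scale multiplies both the numerator and the denominator. What the ratio measures is the home park relative to the mix of road parks, which is why the distribution of opponents matters so much in what follows.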
You can do this for any statistic. In 2000, 137 home runs were hit in County Stadium by the Brewers and their National League opponents (see below for more on that last clause). In Brewers road games in NL parks, 181 home runs were hit. County Stadium's final-season park effect on home runs was -28%.
I guess I should explain all of these caveats. Up until recently, each park saw the same distribution of teams each season, making this method fairly reliable. Park factors compared the same players playing in the same parks throughout the league, which is why doing the factors this way worked. The park was the variable, not the players.
Well, the introduction of interleague play changed that. The distribution of teams playing in, say, Wrigley Field, was different from the distribution of teams playing at Shea Stadium. Over the past few seasons, STATS, Inc. calculated its park factors by ignoring interleague games, a fair solution.
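Ignoring interleague games amounts to a filter applied before the ratio is taken. Here is a rough sketch; the `Game` record, its field names, and the per-game scaling are hypothetical choices for this example, not a description of how STATS, Inc. stores or processes its data, and the games listed are made up.

```python
from dataclasses import dataclass


@dataclass
class Game:
    home_team: str
    away_team: str
    home_league: str  # e.g. "NL" or "AL"
    away_league: str
    runs: int         # runs scored by both teams combined


def intraleague_park_factor(games, team):
    """Per-game run ratio, home vs. road, skipping interleague games."""
    runs_at_home = runs_on_road = 0
    home_games = road_games = 0
    for g in games:
        if g.home_league != g.away_league:
            continue  # interleague game: left out of both totals
        if g.home_team == team:
            runs_at_home += g.runs
            home_games += 1
        elif g.away_team == team:
            runs_on_road += g.runs
            road_games += 1
    if home_games == 0 or road_games == 0 or runs_on_road == 0:
        raise ValueError("not enough intraleague games for " + team)
    return (runs_at_home / home_games) / (runs_on_road / road_games)


# Invented games for a hypothetical NL team "T1".
games = [
    Game("T1", "T2", "NL", "NL", 8),
    Game("T1", "T3", "NL", "NL", 10),
    Game("T1", "AL1", "NL", "AL", 20),  # interleague, ignored
    Game("T2", "T1", "NL", "NL", 9),
    Game("T3", "T1", "NL", "NL", 11),
]
factor = intraleague_park_factor(games, "T1")
```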
This season, though, the unbalanced schedule has completely changed the distribution of teams playing in various parks. Park factors can be distorted, because the mix of teams playing in one park is nothing like the mix of teams playing in another. Simply dividing runs scored in a team's home games by runs scored in a team's road games is dangerous, because teams aren't playing home-and-home series, and are playing more games within their division than outside of it. A different overall set of players is responsible for the runs scored in every major-league park.
Given the importance of park factors in much of the work we do, it's essential that we account for this in evaluating parks, and by extension, players. Clay Davenport's park factors, for example, are actually weighted by games played in each individual park, something he's done for years. (The unbalanced schedules used by many minor leagues caused him to develop his methodology.)
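To give a feel for what weighting by games played in each park can look like, here is one possible sketch. It is not Clay Davenport's actual method, which isn't spelled out here. It assumes each team's raw home/road ratio equals its own park's factor divided by the games-weighted average factor of the parks it visited, and it solves for the factors by repeated substitution. The function name, the input layout, and the sample numbers are all hypothetical.

```python
def solve_park_factors(raw_ratio, road_games, iterations=200):
    """Estimate park factors from raw home/road ratios and road schedules.

    raw_ratio:  {team: home rate / road rate}
    road_games: {team: {host_team: games played in that host's park}}
    Factors are rescaled on every pass so that they average 1.0.
    """
    teams = list(raw_ratio)
    factors = {t: 1.0 for t in teams}
    for _ in range(iterations):
        updated = {}
        for t in teams:
            total = sum(road_games[t].values())
            # Games-weighted average factor of the parks this team visited.
            road_env = sum(n * factors[p]
                           for p, n in road_games[t].items()) / total
            updated[t] = raw_ratio[t] * road_env
        mean = sum(updated.values()) / len(updated)
        factors = {t: v / mean for t, v in updated.items()}
    return factors


# Invented three-team league with an unbalanced schedule.
example_ratio = {"A": 1.12, "B": 0.93, "C": 0.96}
example_sched = {
    "A": {"B": 12, "C": 6},
    "B": {"A": 12, "C": 6},
    "C": {"A": 6, "B": 6},
}
example_factors = solve_park_factors(example_ratio, example_sched)
```

The point of the weighting is that a team which makes most of its road trips to hitters' parks has an inflated road total, so its raw ratio understates its own park. A fixed number of passes is used here for simplicity; a real implementation would want a convergence check.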
The unbalanced schedule brings other issues into play: individuals will play in an unequal distribution of parks. NL East pitchers, traveling to Shea Stadium, Pro Player Stadium, and Turner Field, appear to have an easier time of it than NL Central pitchers taking trips to Enron, Cinergy, and the new parks in Milwaukee and Pittsburgh. How can we account for this when we compare their performances?
Beyond parks, what about facing an unequal distribution of teams? It seems clear that, say, your average NL Central pitcher will have faced an easier slate of teams than your average NL West pitcher in 2001. Is this a significant factor, and can we account for it in player and team evaluation?
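One simple way to frame both questions, offered as a sketch and not as a settled answer, is to describe each pitcher's context with weighted averages: the park factors of the places he pitched, weighted by innings there, and the scoring level of the lineups he faced, weighted by batters faced. The function names, the choice of weights, and the assumption that the two effects simply multiply are all mine, and the numbers are invented.

```python
def weighted_mean(pairs):
    """Weighted average of (value, weight) pairs."""
    pairs = list(pairs)
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in pairs) / total


def context_adjusted_rate(runs_per_nine, park_pairs, opponent_pairs):
    """Scale a pitcher's run rate by the parks and lineups he faced.

    park_pairs:     (park factor, innings pitched in that park)
    opponent_pairs: (opponent scoring relative to league, batters faced)
    Assumes the park and opponent effects multiply independently.
    """
    park_context = weighted_mean(park_pairs)
    opponent_context = weighted_mean(opponent_pairs)
    return runs_per_nine / (park_context * opponent_context)


# Invented pitcher who worked mostly in pitchers' parks against weak lineups.
parks = [(0.94, 110.0), (0.98, 60.0), (1.06, 30.0)]
lineups = [(0.95, 400), (1.00, 300), (1.05, 100)]
adjusted = context_adjusted_rate(4.00, parks, lineups)
```

A pitcher with a friendly context comes out with a higher adjusted rate than his raw one, and vice versa. Whether the size of that adjustment is significant is exactly the open question.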
One thing is certain, though: as people a hell of a lot smarter than me work on optimizing park-effect calculations, the systems they devise will all strive for the original goal, which is isolating the effect of a park on performance, independent of the team that plays there.
Joe Sheehan is an author of Baseball Prospectus.