I've written before about why I dislike the error and its cousins, owing to their subjective nature. But how much does it matter, especially over a long period of time? Is there a practical consequence to the subjectiveness of the error?

So here's what I did. I looked at the rate of errors per ball fielded by infielders in all of a team's home games, including the visitors, from 2002–2009, and did the same for the team's road games. Each of those rates was then regressed to the mean, and the regressed home rate was divided by the regressed road rate to produce a park factor. I then averaged all of the one-year regressed park factors over the full time span, and here's what I got:
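The procedure above can be sketched in a few lines of code. Everything here is illustrative: the error rates, the league average, and the regression weight are all made up, since the article doesn't specify how heavily the rates were regressed to the mean.

```python
from statistics import mean

def regress(rate, league_rate, weight=0.5):
    """Shrink an observed rate toward the league rate.
    `weight` (the share of the observed rate kept) is a hypothetical choice."""
    return weight * rate + (1 - weight) * league_rate

def park_factor(home_rate, road_rate, league_rate, weight=0.5):
    """Regressed home error rate over regressed road error rate,
    scaled so that 100 is a neutral park."""
    home = regress(home_rate, league_rate, weight)
    road = regress(road_rate, league_rate, weight)
    return 100 * home / road

# One-year factors for a hypothetical park, then averaged over the span,
# mirroring the procedure in the text. Rates are errors per ball fielded.
league = 0.015
yearly = [park_factor(h, r, league)
          for h, r in [(0.017, 0.015), (0.016, 0.015), (0.015, 0.016)]]
multi_year_factor = mean(yearly)
```

A park where errors are charged more often at home than on the road comes out above 100; the regression step simply pulls noisy one-year rates toward league average before taking the ratio.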

Team Park Factor
Anaheim/LA 97
Arizona 96
Atlanta 106
Baltimore 95
Boston 100
Chicago (AL) 103
Chicago (NL) 102
Cincinnati 96
Cleveland 105
Colorado 102
Detroit 103
Florida 99
Houston 98
Kansas City 97
Los Angeles 98
Milwaukee 105
Minnesota 89
Montreal 96
New York (AL) 105
New York (NL) 103
Oakland 96
Philadelphia 98
Pittsburgh 98
St. Louis 102
San Diego 92
San Francisco 103
Seattle 99
Tampa Bay 103
Texas 100
Toronto 102
Washington 101

The standard deviation of the group over the eight-year time span was about four percent in either direction; in other words, teams had an error park factor of between 96 and 104 about 68% of the time.
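As a rough sanity check on that figure, the averaged factors in the table above happen to show a similar spread; this snippet just re-keys the table as a list, with nothing new assumed:

```python
from statistics import mean, pstdev

# Averaged park factors from the table above, in the same order.
factors = [97, 96, 106, 95, 100, 103, 102, 96, 105, 102, 103, 99, 98,
           97, 98, 105, 89, 96, 105, 103, 96, 98, 98, 102, 92, 103,
           99, 103, 100, 102, 101]

avg = mean(factors)   # close to a neutral 100
sd = pstdev(factors)  # roughly four points
```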

Of course, this raises the question: how much of the difference between parks comes from the scorers, and how much from actual changes in the way balls are hit at infielders? Bear in mind that for something to count as an opportunity here, a fielder has to reach the ball. So the park effect would have to influence how cleanly a fielder can field a ball.

Now, I have my own suspicions (and familiarity with my work would probably tell you what those suspicions are). But it's just a suspicion – I'm not sure we really know, and I'm not sure that we will ever know for certain.

But it does give one pause – or at least it should give one pause – when using errors to attribute value to players, which is of course something we frequently do for hitters, pitchers and fielders right now. (Anyone who thinks we don't use errors to evaluate hitters, or that we shouldn't, should consider that the very act of counting an error as an at-bat but not as a time reached safely implicitly, if not explicitly, folds the error into an evaluation of a hitter's prowess.)

The average everyday player might reach base on an error, what, 5-10 times a season. So maybe 1-2 base hits are errors or vice versa depending on the park?

Doesn't this fall into an 'immaterial' category when doing estimates?
"I took a look at the rate of errors per balls fielded by infielders..."

Why just infielders? Am I missing something?
Probably because a) most OF errors are scored as hits, since the ball doesn't touch the glove, and b) the number of OF misplays actually scored as errors (dropped balls) is too minuscule to bother with.
I should clarify that this isn't fielding percentage; it looks at all balls fielded, period. And, with the exceptions of home runs and foul balls, every batted ball has to be fielded by somebody. With the exception of infield singles and home runs (and ground-rule doubles), every hit is fielded by an outfielder at some point.

So outfield errors (which, as you note, are much, much rarer – and probably a bit more cut-and-dried as far as the scoring is concerned) are a bit more difficult to tease out. You'd want to look at batted ball data to sort out the relationship, but then you're left puzzling out what bias is in the batted ball data versus the judgment of the official scorer.
Dumb question: does a higher number mean more errors, or fewer errors, at a given ballpark?
High means more errors, low means fewer errors.

Not to get all nit-picky on you, but the standard deviation you quoted applies to data that is normally distributed. The data you've posted here is a small set, but actually looks bi-modal. Anyone want to take a stab at what that might mean?
That's interesting. I don't know if it's really bimodal or not - it could simply be a strongly right-skewed distribution that needs some smoothing. You're right that I haven't given enough data points to say for sure. I'll have to look at this a bit more and report back.
I don't want to say I looked at a larger data set, but I looked at a more granular one - I figured three-year park factors for each year in the data set. (Anyone interested can get the data here.)

And I don't see any reason to think that it's bimodal, looking at a histogram of the data. That isn't to say it's strictly normal, though – there's a pretty clear right skew to the data.
Thanks for reporting back on it! I was struck by the lack of 100s in the table in the article, but it's clear that a larger data set fills it in better. I always like reading about the reliability of the source data that is used in so many of the advanced metrics, and this was a nice look at the way errors are scored.