I've written before about why I dislike the error and its cousins, due to their subjective nature. But how much does it matter, especially over a long period of time? Is there a practical consequence to the subjectiveness of the error?
So here's what I did. I looked at the rate of errors per ball fielded by infielders in all of a team's home games, including by the visitors, from 2002 to 2009, and did the same for each team's road games. Each of those rates was then regressed to the mean, and the regressed home rate was divided by the regressed road rate to produce a park factor. Then I averaged all the one-year regressed park factors over the full time span, and here's what I got:
Team | Park Factor |
---|---|
Anaheim/LA | 97 |
Arizona | 96 |
Atlanta | 106 |
Baltimore | 95 |
Boston | 100 |
Chicago (AL) | 103 |
Chicago (NL) | 102 |
Cincinnati | 96 |
Cleveland | 105 |
Colorado | 102 |
Detroit | 103 |
Florida | 99 |
Houston | 98 |
Kansas City | 97 |
Los Angeles | 98 |
Milwaukee | 105 |
Minnesota | 89 |
Montreal | 96 |
New York (AL) | 105 |
New York (NL) | 103 |
Oakland | 96 |
Philadelphia | 98 |
Pittsburgh | 98 |
St. Louis | 102 |
San Diego | 92 |
San Francisco | 103 |
Seattle | 99 |
Tampa Bay | 103 |
Texas | 100 |
Toronto | 102 |
Washington | 101 |
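For those who want to follow along, the method described above can be sketched in Python. To be clear, the numbers here are invented for illustration: the league error rate, the regression ballast of 400 balls fielded, and the team totals are all placeholders, not the actual inputs or regression amount used in the study.

```python
# Sketch of the park-factor method: regress home and road error rates
# toward the league rate, then take the ratio, scaled to 100.

LEAGUE_RATE = 0.018  # hypothetical league error rate per ball fielded
BALLAST = 400        # hypothetical regression amount (balls fielded)

def regress(errors, chances, league_rate=LEAGUE_RATE, ballast=BALLAST):
    """Shrink an observed error rate toward the league rate by adding
    'ballast' chances of league-average performance."""
    return (errors + league_rate * ballast) / (chances + ballast)

def park_factor(home_errors, home_chances, road_errors, road_chances):
    """Regressed home rate over regressed road rate, scaled so 100 is neutral."""
    home = regress(home_errors, home_chances)
    road = regress(road_errors, road_chances)
    return 100 * home / road

# One invented team-season; the real study averages one-year factors
# like this over 2002-2009 for each park.
pf = park_factor(home_errors=55, home_chances=2600,
                 road_errors=48, road_chances=2550)
```

Identical home and road rates come out to exactly 100, and the regression keeps a single extreme season from producing a wild factor.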
The standard deviation of the one-year park factors over the eight-year time span was about four percent in either direction; in other words, teams had an error park factor between 96 and 104 about 68% of the time.
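As a rough sanity check, the rounded eight-year averages in the table above show about the same spread (the figure in the article was computed from the one-year factors, so this is only a proxy):

```python
# Rounded eight-year average park factors from the table above.
factors = [97, 96, 106, 95, 100, 103, 102, 96, 105, 102, 103, 99,
           98, 97, 98, 105, 89, 96, 105, 103, 96, 98, 98, 102, 92,
           103, 99, 103, 100, 102, 101]

mean = sum(factors) / len(factors)
sd = (sum((f - mean) ** 2 for f in factors) / len(factors)) ** 0.5
# sd comes out just under 4, consistent with the ~4% spread cited.
```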
Of course, this raises the question: how much of the difference between parks is the scorers, and how much is an actual change in the way balls are hit at infielders? Bear in mind that for something to count as an opportunity here, a fielder has to reach the ball, so any real park effect would have to influence how cleanly a fielder can field a ball once he gets to it.
Now, I have my own suspicions (and familiarity with my work would probably tell you what those suspicions are). But it's just a suspicion – I'm not sure we really know, and I'm not sure that we will ever know for certain.
But it does give one pause – or at least it should – when using errors to attribute value to players, which is of course something we frequently do for hitters, pitchers, and fielders right now. (For anyone who thinks we don't use errors for evaluating hitters, or that we shouldn't: the very act of counting an error as an at-bat but not as a time reached safely implicitly, if not explicitly, folds the error into an evaluation of a hitter's prowess.)
Thank you for reading
This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.
Doesn't this fall into an 'immaterial' category when doing estimates?
Why just infielders? Am I missing something?
So outfield errors (which, as you note, are much, much rarer - and probably a bit more cut-and-dried as far as the scoring is concerned) are a bit more difficult to tease out. You'd want to look at batted ball data to sort out the relationship, but then you're left puzzling out what bias is in the batted ball data versus the judgement of the official scorer.
Not to get all nit-picky on you, but the standard deviation you quoted applies to data that is normally distributed. The data you've posted here is a small set, but actually looks bi-modal. Anyone want to take a stab at what that might mean?
And I don't see any reason to think that it's bimodal, looking at a histogram of the data. That isn't to say it's strictly normal, though - there's a pretty clear right skew to the data.