As regular readers may have gathered by now, I spend a lot of time thinking about the validity of the data that’s collected about baseball. The bee in my bonnet these days is really batted-ball data.

We can refer to one of two things when we talk about batted-ball data—trajectory data and location data. Trajectory data describes how the ball travels—typically subdivided into grounders, fly balls, line drives, and popups (also called infield fly balls). Location data typically describes where the ball went—distance and vector, basically.

For now, let’s focus on trajectory data. It’s easier to get our hands on, and easier to condense into a single quantity for study. Simply put—how accurate is the trajectory data we have? And how might the data be biasing our conclusions—about pitchers, hitters, and fielders?

The Story So Far

This isn’t the first time I’ve studied potential problems with batted-ball scoring. A few months ago, I looked at data from Retrosheet, which is produced by the Gameday stringers working for MLB Advanced Media. Those scorers sit in the press box and chart the games on a computer. What I found was a modest correlation between the height of the press box and the line-drive rate reported.

It’s an interesting finding, and it raises some questions about the validity of metrics based on batted-ball data. But it’s also a very tentative finding. So, in the past few months, I’ve been racking my brain trying to come up with another way to study the issue.

Well, it struck me. Here at Baseball Prospectus, we publish batted-ball rates for teams, both for batting and pitching. Fangraphs also reports similar figures on offense and defense.

Methodologically, these reports are practically identical—the only difference is that Fangraphs includes infield flies in fly balls (along with breaking them out as a separate category), while BP does not. But the results are strikingly different. Why?

Because here, we use the Gameday/MLBAM data. At Fangraphs, they use data provided by Baseball Info Solutions. BIS, rather than placing a scorer in the press box, uses video feeds to chart batted balls.

So we have two sources for ostensibly the same data, collected by two distinct data providers using two distinct methods. What can we learn from comparing the two data sets?

Running the Averages

My first step was to subtract infield flies from fly balls in the BIS data, so that we’re comparing apples to apples. The first thing to note is that the two data sources regularly disagree even on the average rates:

Year   GB_b   GB_r   FB_b   FB_r   LD_b   LD_r   IFFB_b  IFFB_r
2003   43.3%  46.1%  22.6%  27.4%  22.5%  18.5%  11.6%   8.0%
2004   44.2%  45.6%  25.3%  27.8%  18.9%  18.5%  11.6%   8.2%
2005   44.2%  45.6%  23.8%  27.9%  20.9%  18.3%  11.1%   8.2%
2006   43.7%  45.2%  26.5%  28.3%  19.6%  18.7%  10.2%   7.9%
2007   43.5%  45.1%  28.3%  28.0%  18.6%  19.1%   9.6%   7.9%
2008   43.9%  45.1%  26.0%  28.1%  20.2%  19.0%   9.9%   7.7%
2009   43.3%  44.9%  28.1%  28.6%  18.9%  18.8%   9.7%   7.7%
Total  43.7%  45.4%  25.8%  28.0%  19.9%  18.7%  10.5%   7.9%

In this case, "b" denotes data from BIS, and "r" denotes data from Retrosheet.

What this tells us is that someone using BIS data and someone using Retrosheet data do not mean precisely the same thing when they refer to a "fly ball" or a "line drive." It’s a subtle difference, to be sure, but one worth noting.
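To make the size of that disagreement concrete, here is a minimal Python sketch that differences the "Total" row by category. It assumes the table's columns alternate BIS ("b") and Retrosheet ("r") in the order ground balls, fly balls, line drives, infield flies; the dictionary names are illustrative only.

```python
# Overall rates (percent of batted balls) from the "Total" row above.
# Assumption: columns alternate BIS / Retrosheet in the order GB, FB, LD, IFFB.
bis = {"GB": 43.7, "FB": 25.8, "LD": 19.9, "IFFB": 10.5}
retrosheet = {"GB": 45.4, "FB": 28.0, "LD": 18.7, "IFFB": 7.9}

# Positive gap: BIS reports more of that batted-ball type than Retrosheet.
gaps = {kind: round(bis[kind] - retrosheet[kind], 1) for kind in bis}

for kind, gap in gaps.items():
    print(f"{kind}: {gap:+.1f} percentage points")
```

Under that reading of the columns, BIS scores more line drives and infield flies, and fewer grounders and outfield flies, than the press-box stringers do.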

Looking for Park Effects

The next step was to try and see if a team’s home park had an effect on the error rates. This is akin to figuring out park factors without the benefit of home/road splits—something like teaching an elephant to play the piano. Let’s accept from the outset that our elephant isn’t going to be able to play Rachmaninoff. But let’s see if we can at least get Chopsticks, shall we?

So let’s compare the data on a team-by-team basis, both on offense and defense. I took the rate for each team (using BIS and MLBAM data) and subtracted the average for that season to provide "normalized" rates. In other words, if I were looking at 2009 BIS data, instead of saying that a team had a line-drive rate of 23.8 percent, I would say its line-drive rate was about 5 percentage points above average.

Then I subtracted the normalized rate for MLBAM data from the normalized rate for BIS data to produce what we can call "errors" between the two sets, or "residuals," if you prefer a less loaded term. So if you have a team with a normalized rate of 5 percent using BIS data and 2 percent using MLBAM data, the "error" is 3 percentage points.
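The two subtractions can be sketched in a few lines of Python. The function names are hypothetical, and rates are written as fractions (0.05 for 5 percent); this is only meant to pin down the arithmetic, not to reproduce the actual scripts used.

```python
def normalized_rate(team_rate: float, league_rate: float) -> float:
    """Team rate minus the league average for the same season and source."""
    return team_rate - league_rate


def residual(bis_normalized: float, mlbam_normalized: float) -> float:
    """BIS normalized rate minus MLBAM normalized rate (the "error")."""
    return bis_normalized - mlbam_normalized


# Example from the text: +5 points in BIS data, +2 points in MLBAM data.
example = residual(0.05, 0.02)
print(f"{example:.3f}")  # 0.030
```

A positive residual means BIS saw more of that batted-ball type for the team, relative to league average, than MLBAM did.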

In order to see if the error was due to a consistent park effect, I compared the data for each park to the same data for the next season. The year-to-year correlations were:

LD: 0.503
FB: 0.383
GB: 0.584
IFFB: 0.215
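As a rough sketch of the year-to-year check: pair each park's residual in one season with the same park's residual in the next, then correlate the two lists. The code assumes a plain Pearson correlation and a hypothetical `{(park, year): residual}` layout; the values are made up for illustration and are not the study's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def year_pairs(residuals):
    """Pair each (park, year) residual with the same park's next-year value."""
    pairs = [
        (value, residuals[(park, year + 1)])
        for (park, year), value in sorted(residuals.items())
        if (park, year + 1) in residuals
    ]
    return [p[0] for p in pairs], [p[1] for p in pairs]


# Hypothetical line-drive residuals, for illustration only (not real data).
residuals = {
    ("A", 2007): 0.012, ("A", 2008): 0.015, ("A", 2009): 0.010,
    ("B", 2007): -0.008, ("B", 2008): -0.005, ("B", 2009): -0.009,
    ("C", 2007): 0.001, ("C", 2008): -0.002,
}

year_one, year_two = year_pairs(residuals)
print(round(pearson(year_one, year_two), 3))
```

If the residuals were pure scoring noise, this correlation should hover near zero; a persistent park effect pushes it upward.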

If you prefer a more visual representation, we can look at scatter plots. The x-axis represents the difference between normalized rates in year one; the y-axis represents the difference between normalized rates in year two.

[Scatter plot: ground balls]

[Scatter plot: fly balls]

[Scatter plot: line drives]

[Scatter plot: infield flies]

What this strongly suggests is that there are persistent, park-specific biases in how batted-ball trajectories are scored. (If this is so, we should expect this study to underreport the park-to-park bias, since a team’s home and road stats are lumped together.)
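The underreporting point can be illustrated with a little arithmetic. Assume, hypothetically, that a team plays half its games at home and that the scoring bias across its road parks averages out to roughly zero; the function name and numbers below are illustrative only.

```python
def season_residual(home_bias: float, road_bias: float = 0.0,
                    home_share: float = 0.5) -> float:
    """Blend of home and road scoring bias seen in full-season team totals."""
    return home_share * home_bias + (1.0 - home_share) * road_bias


# Hypothetical park whose scorer adds 3 percentage points of line drives.
print(f"{season_residual(0.03):.3f}")  # 0.015
```

Under those assumptions, a full-season residual shows only about half of the home park's actual bias.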

What This Means

Assuming there is bias represented in this data, there are three potential explanations:

  1. That the MLBAM data is correct and the BIS data is biased
  2. That the BIS data is correct and the MLBAM data is biased
  3. That both data sets are biased to some extent, and do not share the same set of biases

Which is true? The truth is, we don’t really know. And I’m not even sure, given the data available, that we can know. (It is possible that this could be better resolved with a more granular look at the data, but I can’t say for sure right now.)

But we can take a look at how the difference in batted-ball metrics affects current metrics. Let’s consider tRA for a minute. tRA is a component run estimator for pitchers (somewhat akin to SIERA, but with a focus on describing what happened, rather than predicting future performance) which takes into account batted-ball data. It’s an interesting case study since it’s displayed on two sites—StatCorner, which uses the MLBAM data, and Fangraphs, which uses the BIS data.

[Fangraphs, it should be noted, scales tRA to ERA instead of RA by multiplying by .92; all figures presented below from Fangraphs divide by .92 to place them on the same scale as the StatCorner figures.]
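For readers who want to redo that conversion, a minimal sketch follows. The function name is hypothetical and the input is a made-up value, not a published figure.

```python
ERA_TO_RA_FACTOR = 0.92  # scaling factor cited in the text


def fangraphs_to_ra_scale(fangraphs_tra: float) -> float:
    """Undo the ERA scaling so a Fangraphs tRA is comparable to StatCorner's."""
    return fangraphs_tra / ERA_TO_RA_FACTOR


# Hypothetical published Fangraphs value, for illustration only.
print(round(fangraphs_to_ra_scale(4.00), 2))  # 4.35
```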

I chose Wandy Rodriguez as an example because Houston’s Minute Maid Park has been one of the more extreme parks over the period studied, with an average difference of .014 in normalized line-drive rates. In other words, BIS has reported a higher line-drive rate on average than MLBAM for Houston players. Here’s how the two sites, using the same metric, view Rodriguez:

Year  Fangraphs  StatCorner
2007  4.53       4.26
2008  4.48       3.97
2009  3.91       3.41

Over a period of several years, the Fangraphs data is consistently higher than the StatCorner data.

Let’s look at another team with an extreme difference—the Mariners, with an average difference of -0.007. In other words, a park where BIS tends to see fewer line drives than MLBAM. Consider Felix Hernandez:

Year  Fangraphs  StatCorner
2007  3.83       4.19
2008  4.23       4.45
2009  3.32       3.27

Not as dramatic a difference as Wandy, but here Fangraphs shows King Felix as a better pitcher than StatCorner does in two of the three seasons, with the 2009 figures nearly identical.
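The season-by-season gaps are easy to tabulate. This sketch uses only the tRA figures quoted for the two pitchers; the variable names are illustrative.

```python
# tRA by season as (Fangraphs on RA scale, StatCorner), from the figures above.
tra = {
    "Wandy Rodriguez": {2007: (4.53, 4.26), 2008: (4.48, 3.97), 2009: (3.91, 3.41)},
    "Felix Hernandez": {2007: (3.83, 4.19), 2008: (4.23, 4.45), 2009: (3.32, 3.27)},
}

# Positive gap: the BIS-based (Fangraphs) figure is higher, i.e. worse.
gaps = {
    name: {year: round(fg - sc, 2) for year, (fg, sc) in seasons.items()}
    for name, seasons in tra.items()
}

for name, by_year in gaps.items():
    print(name, by_year)
```

Rodriguez's gap is positive in all three seasons, while Hernandez's is negative in 2007 and 2008 and slightly positive in 2009.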

What I want to point out is that this is not a difference of opinion between the two on how to evaluate pitchers: Graham MacAree of StatCorner collaborated with Dave Appelman of Fangraphs to implement tRA there. As Appelman points out:

There are a couple things which are different between the StatCorner version of tRA and the version implemented on FanGraphs. The main difference is we’re using Baseball Info Solutions batted ball stats instead of Gameday batted-ball stats. The other difference, though probably not as major is we’re using different park factors.

The thing is, the differences between BIS and MLBAM data are influenced by park biases that are persistent over time—no matter how much data you have, they will not wash out of the sample.

And what about fielding? Again, let’s consider a rather famous Mariner—Ichiro Suzuki. Tom Tango compared Ichiro’s rating in UZR between values computed using STATS data (which is similar to the MLBAM data in that it is collected in the press box) and values computed using BIS data. What he found was that using BIS data, Ichiro grades out as a very good defensive outfielder, while looking at STATS data he looks like a much less stellar outfielder.

The difference isn’t method—UZR was used in both cases. The difference is the underlying data. And the difference between the data sets is persistent and biased based upon a player’s home park.

And we simply don’t know who is right and who is wrong.

So what do we do about this? Stop taking the data at face value. Try to figure out what the errors in the data are and how they affect specific players over a period of years. And then adjust for those errors as best we can.

What We Can Do

The long-term answer is to get better data. Let’s face it—no matter how much we massage the data, there simply is not a way to objectively define the difference between a fly ball and a line drive. It is inherently a subjective and somewhat arbitrary distinction.

There’s a lot of work being done right now in precision batted-ball tracking, both with cameras and radars. Someday maybe that will percolate down, but it won’t tell us anything about players from before the introduction of those technologies. Failing that, a simple stopwatch could provide more accurate, quantifiable data than what we’re getting right now. And it is possible, to some extent, to review video of past games and get those measurements for players and seasons already past.

In the meantime, consider this my sabermetric crisis of faith. It’s not that I don’t believe in the objective study of baseball. I’m just not convinced at this point that something dealing with batted-ball data is, at least wholly, an objective study. And where does this leave us with existing metrics that use batted-ball data? Again, I’m not sure. I can tell you I’m a lot less comfortable accepting their conclusions, even over a large number of seasons, than I was in the past.