
Prompted by some of the complaints about umpiring, last week I investigated why watching on television may not give an accurate picture of where the ball is actually going. The short version—the position of the camera has a distorting effect on the image, and when the brain reconstructs a three-dimensional view, it is fooled by those distortions. Some people were skeptical of my claims.

Now, as it happens, we have a ready source of data we can use to evaluate observer-positioning bias in scoring balls and strikes from video: the plate discipline stats published by FanGraphs. Those figures are calculated using data provided by Baseball Info Solutions (BIS), which uses “video scouts” to collect data from baseball broadcasts (the same telecasts that we see as fans). These video scouts work from a representation of the batter, the plate, and the strike zone, and they map the perceived location of each pitch onto that image. That data is then aggregated.

Importantly for us, BIS rotates the scorers between games—so we don’t have to worry about a particular scorer being biased for or against a particular team; that should wash out of the data as simply noise.

Let’s start by looking at what data is available. According to the definitions on the site, the metrics we’ll be focusing on are:

O-Swing%: The percentage of pitches a batter swings at outside the strike zone.
Z-Swing%: The percentage of pitches a batter swings at inside the strike zone.
Swing%: The overall percentage of pitches a batter swings at.
Zone%: The overall percentage of pitches a batter sees inside the strike zone.

We can usefully define the relationship between them as:

$$\text{Swing\%} = \text{Zone\%} \times \text{Z-Swing\%} + (1 - \text{Zone\%}) \times \text{O-Swing\%}$$

With some basic algebra, we can rearrange the formula to solve for any one of the four rates given the other three.
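As a minimal sketch of that algebra in Python (the function names are hypothetical, and rates are written as fractions between 0 and 1 rather than percentages), here is the identity along with two rearrangements. Solving for Zone% gives Zone% = (Swing% - O-Swing%) / (Z-Swing% - O-Swing%), which only works when a batter's Z-Swing% and O-Swing% differ.

```python
def swing_pct(zone: float, z_swing: float, o_swing: float) -> float:
    """Overall swing rate from Zone%, Z-Swing% and O-Swing% (all as fractions)."""
    return zone * z_swing + (1 - zone) * o_swing


def zone_pct(swing: float, z_swing: float, o_swing: float) -> float:
    """Solve the identity for Zone%; undefined when Z-Swing% equals O-Swing%."""
    if z_swing == o_swing:
        raise ValueError("Z-Swing% and O-Swing% must differ to solve for Zone%")
    return (swing - o_swing) / (z_swing - o_swing)


def o_swing_pct(swing: float, zone: float, z_swing: float) -> float:
    """Solve the identity for O-Swing%; undefined when Zone% is 1."""
    if zone == 1:
        raise ValueError("Zone% must be below 1 to solve for O-Swing%")
    return (swing - zone * z_swing) / (1 - zone)


# Hypothetical batter: half his pitches are in the zone; he swings at 60% of
# those and at 20% of the pitches outside it.
s = swing_pct(0.5, 0.6, 0.2)
print(round(s, 3))                         # 0.4
print(round(zone_pct(s, 0.6, 0.2), 3))     # 0.5
print(round(o_swing_pct(s, 0.5, 0.6), 3))  # 0.2
```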

Now, if you think about it, you can split Zone% into two components:

• Zone% on pitches where the batter swung, and
• Zone% on pitches where the batter didn’t swing.

It is a bit less clear-cut than that, since checked swings blur the line between a swing and a take, but the split is close enough for the purposes of our discussion. FanGraphs does not publish those figures directly, but they can be derived easily enough from the formula above.
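One way to do that derivation, sketched under the simplifying assumption that every pitch is cleanly a swing or a take: the share of swung-at pitches that were in the zone is Zone% × Z-Swing% / Swing%, and the share of taken pitches in the zone is Zone% × (1 - Z-Swing%) / (1 - Swing%). The function name below is hypothetical and is not a FanGraphs field.

```python
def zone_split(zone: float, z_swing: float, o_swing: float) -> tuple[float, float]:
    """Return (Zone% on pitches swung at, Zone% on pitches taken).

    Inputs are fractions (0-1). Assumes every pitch is either a swing or a
    take, i.e. checked swings are ignored.
    """
    swing = zone * z_swing + (1 - zone) * o_swing  # overall Swing%
    if not 0 < swing < 1:
        raise ValueError("Swing% must be strictly between 0 and 1")
    zone_on_swings = zone * z_swing / swing             # in-zone swings / all swings
    zone_on_takes = zone * (1 - z_swing) / (1 - swing)  # in-zone takes / all takes
    return zone_on_swings, zone_on_takes


swung, taken = zone_split(0.5, 0.6, 0.2)
print(round(swung, 3), round(taken, 3))  # 0.75 0.333
```

Weighting the two shares by Swing% and 1 - Swing% recovers the overall Zone% (0.4 × 0.75 + 0.6 × 0.333 ≈ 0.5 here), which makes a handy sanity check.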

Previously, several analysts, notably Joe Pawlikowski and Nick Steiner, noticed a drift in the share of pitches BIS scores as in the zone. If we break this down into pitches swung at and pitches taken, you can see the pattern pretty clearly:

If a batter takes a pitch (that is to say, if the umpire has to decide whether the pitch was in the zone), the percentage of pitches scored as in the zone stays pretty consistent from season to season. If the batter swings, so that the umpire never has to rule on the pitch’s location, we see a fairly constant decline in the willingness of BIS’s scorers to mark it as in the zone.

There doesn’t seem to be an explanation for this that comes from the pitched balls themselves. It really isn’t plausible that pitchers could change their approach that much without it showing up in batters’ swing rates or in the called-strike rate.

There’s a likely explanation for why the data on pitches taken is more stable than the data on pitches swung at—in the case of a pitch taken, the video scout is getting immediate feedback in the form of the umpire’s calls. Now, there will be times when the scorer disagrees with the assessment of the umpire, but they always have the umpire’s assessment available to them when it comes time to make the decision. On pitches where the batter swung, the scorer must make the decision unaided—his only recourse is another scorer, who shares the same observational problems he has. (Since 2008, all parks have had PITCHf/x data, which could serve as an additional check; 2007 had partial coverage. But from 2002-06, no such objective data was available publicly.)

(What we don’t know is why it has been declining so steadily. That’s probably an important thing to know, since the answer likely tells us a lot about what collecting this sort of data entails. It just doesn’t seem that we can solve it with the rather coarse look at the data we have now.)

This raises a rather serious problem with using the data, incidentally: the zone call isn’t independent of what the batter does with the pitch. If you take two batters who see exactly the same distribution of pitches, but one has a tendency to swing more, the free swinger will end up with the higher zone percentage. Granted, the true share of pitches in the zone and a batter’s swing rate aren’t independent either: batters will swing more if you throw hittable pitches, and pitchers will pitch around a batter who can’t keep the bat on his shoulders.
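A toy calculation makes the point concrete. Everything in it is hypothetical: pitches are split into clear strikes, clear balls, and borderline pitches; the clear ones are always scored correctly; and the scorer is assumed to mark a borderline pitch as in the zone 70 percent of the time when the batter swings but only 30 percent of the time when he takes. The two batters see the identical mix of pitches and differ only in how often they swing at borderline pitches.

```python
def expected_zone_pct(
    borderline_swing_rate: float,
    clear_strike_share: float = 0.4,  # hypothetical; clear balls are the remaining 0.4
    borderline_share: float = 0.2,    # hypothetical share of borderline pitches
    p_zone_if_swung: float = 0.7,     # hypothetical scorer tendency on swings
    p_zone_if_taken: float = 0.3,     # hypothetical scorer tendency on takes
) -> float:
    """Expected scored Zone% for a batter under a toy, swing-dependent scorer.

    Clear strikes are always scored in the zone and clear balls never are,
    so only borderline pitches depend on the batter's swing decision.
    """
    p_borderline_scored_in = (
        borderline_swing_rate * p_zone_if_swung
        + (1 - borderline_swing_rate) * p_zone_if_taken
    )
    return clear_strike_share + borderline_share * p_borderline_scored_in


free_swinger = expected_zone_pct(0.8)
patient_hitter = expected_zone_pct(0.3)
print(round(free_swinger, 3), round(patient_hitter, 3))  # 0.524 0.484
```

Under these made-up numbers the free swinger's Zone% comes out four points higher even though both batters saw exactly the same pitches; the real size of any such effect would have to be measured, not assumed.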

The problem is that if two batters have different swing rates, their Zone% figures (and all the numbers derived from them, like O-Swing%) do not mean the same thing. This creates problems if you want to use the data to compare different players (which is to say, if you want to use the data at all).

Before we can discuss the general question of scorer bias in the data, we have to deal with the two LA teams, the Angels and Dodgers, in ’07. A cursory look at the data shows them as extreme outliers. According to BIS, Dodgers and Angels hitters saw only 39.5 percent and 41.0 percent of pitches in the zone, respectively, roughly 10 percentage points lower than the next-lowest team that season. We see much the same effect for the pitchers on those teams: both staffs were at 40.6 percent of pitches in the zone, more than 10 points lower than the next-lowest team that season. And if this is a park effect, we have to bear in mind that these team stat lines include both home and road games; in other words, the effect in the home park itself would have to be roughly twice as large.
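The factor of two comes from a simple decomposition, sketched here under two assumptions: about half of a team’s pitches come in home games, and its road parks average out to neutral. Writing Z for scored Zone%:

$$Z_{\text{team}} \approx \tfrac{1}{2} Z_{\text{home}} + \tfrac{1}{2} Z_{\text{road}} \quad\Longrightarrow\quad Z_{\text{home}} - Z_{\text{road}} \approx 2\,(Z_{\text{team}} - Z_{\text{road}})$$

So if a team’s season line sits 10 percentage points below neutral and all of that came from its home park, the gap in the home games themselves would be about 20 points.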

And we see no evidence that there really were that few strikes thrown in the City of Angels that year—Angels and Dodgers hitters placed 10th and 11th in walks, respectively, while the pitchers placed 27th and 17th. (In both cases, rankings are presented from most to fewest walks.)

What this proves is that for those two teams in that season, there were substantial problems in the data BIS was collecting. What it doesn’t tell us is how much such problems affected the other 268 team-seasons under consideration. But the problems with those two teams’ data are large enough to overwhelm any test we might care to run, so from here on out, I’ve excluded those two teams from all of the correlations I present.

To reiterate the preceding paragraph—in a study looking for potential park-related bias in data, I have thrown out the data points that carry the strongest suggestion of bias. This means that any tests will underreport the observed bias in the data set.

Focusing on pitches the batter swung at (as those should give us the clearest picture of the scorers’ ability to determine whether a ball was in the zone, independent of the signal given by the umpire), I took each team’s “Zone Swing” (Zone% on pitches swung at) for batting and pitching and subtracted the league average. This removes the year-over-year drift in the league average. (I also tried correcting the data by dividing by the league average rather than subtracting it, and it made no significant difference in the results.)
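The adjustment can be sketched as below, assuming the data sits in a dict keyed by (team, season). Two assumptions to flag: the league average is approximated as the unweighted mean of the team values for each season (a pitch-weighted league figure would be more exact), and the team labels and numbers are made up. The `ratio` flag covers the divide-instead-of-subtract variant mentioned above.

```python
from collections import defaultdict

Key = tuple[str, int]  # (team, season)


def above_average(values: dict[Key, float], ratio: bool = False) -> dict[Key, float]:
    """Express each team-season value relative to that season's league average.

    The league average is approximated as the unweighted mean of the team values
    for the season (an assumption; a pitch-weighted average would be more exact).
    With ratio=True, values are divided by the average instead of differenced.
    """
    by_season: dict[int, list[float]] = defaultdict(list)
    for (_team, season), value in values.items():
        by_season[season].append(value)
    league_avg = {season: sum(vs) / len(vs) for season, vs in by_season.items()}

    return {
        (team, season): value / league_avg[season] if ratio else value - league_avg[season]
        for (team, season), value in values.items()
    }


# Hypothetical Zone% on pitches swung at, keyed by (team, season).
zone_swing = {
    ("TeamA", 2008): 0.62, ("TeamB", 2008): 0.58,
    ("TeamA", 2009): 0.57, ("TeamB", 2009): 0.55,
}
adjusted = above_average(zone_swing)
print(round(adjusted[("TeamA", 2008)], 3))  # 0.02
```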

Then I looked at the correlation between Zone Swing Above Average for a team’s offense and its defense, which was a robust .46 (again, after excluding the ’07 LA teams; with them included, it jumps to .68). This scatterplot captures the effect pretty well:

What this tells us is that there is a substantial relationship between a team’s pitches in the zone (according to the BIS stringers) on offense and on defense. If this is a park effect, then again it is understating the magnitude of the effect by half, as this data includes a team’s home and road games. And again, this is exactly the sort of thing we should expect to see, given what we know about the positioning of the cameras in various parks and the effect this has on our perception.

(The picture looks quite a bit better for pitches where the batter didn’t swing: a similar study of team Zone% on pitches taken yields a correlation of only .06. Without more detailed data, it is difficult to say whether this is because it is easier to discern a pitch’s location when the batter doesn’t swing, or because it’s easier when the umpire tells you, at least to some extent, where it was. I suspect both are true, but I have no idea how much each contributes.)

And the effect shows some persistence from season to season. For each season, I averaged each team’s Zone Swing Above Average for hitting and pitching. The year-to-year correlation of that average was .21 (again, excluding the ’07 LA teams). That means past data can, to some extent, predict how these effects will play out for a team. I suspect we could do better still by including information on when teams switched parks or changed the placement of the center field camera.

So we have evidence that some people (namely, the BIS stringers) are affected by the positioning of the cameras when it comes to calling balls and strikes. (This is of a piece with previous examinations I’ve made of batted-ball data, and I suspect that the same would hold if hit location data were examined in a similar fashion.) But what about other people? Not to put too fine a point on it—what about you?

There is a tendency for people to be overconfident in their own abilities. As Dr. David Dunning, a social psychologist, puts it:

"People overestimate themselves," he says, "but more than that, they really seem to believe it. I've been trying to figure out where that certainty of belief comes from."

[One] problem is that in many areas of life, accurate feedback is rare. People don't like giving negative feedback, Dunning says, so it's likely we will fail to hear criticism that would help us improve our performance.

"It's surprising how often feedback is nonexistent or ambiguous," he asserts. "It's a pretty safe assumption that what people say to our face is more positive than what they're saying behind our backs." People also overestimate themselves out of ignorance, Dunning says. Take the ironic example of an elderly man who thinks he's an excellent driver but is a hazard on the road, or the woman who reads a book about the stock market and is ready to compete with a professional stockbroker.

When it comes to things like charting pitches or batted balls off the TV screen—what’s our feedback? What are we using to assess ourselves and our ability? In the case of the pitched ball, what we have is the umpire’s determination of ball or strike. And we only have that if the batter doesn’t swing. (And if we are quick to dismiss the umpire’s judgment when it disagrees with our own, we are losing even that meager feedback.) And for charting balls put into play—we lack even that level of feedback.

The point of this exercise is not to focus on any one person, or any one group of people—it is to illustrate our limits as humans, both in our ability to correctly interpret what our eyes are telling us and in assessing those abilities in ourselves.

This matters because sabermetrics is increasingly trying to answer questions that require this sort of observational data. That analysis is only as good as the underlying data, and the underlying data is only going to be as good as our ability to ask meaningful questions and get meaningful answers in return.

10/18