I spend an awful lot of time talking about baseball data—what data we have, how we can tell if data is good or bad, what data we need to answer certain questions.

Here at BP we use a lot of baseball data, most of it either seasonal accounts (now from the Palmer database) or play-by-play data compiled by the fine, fine folks at Retrosheet. Up until now, we’ve made only scattered use of one of the most exciting sources of data to come about in recent years: the PITCHf/x data collected by Sportvision for MLB Advanced Media’s Gameday product.

But PITCHf/x is here to stay, and we’ve got a lot of things that we want to start doing with it, like Mike Fast’s investigation of catcher framing. We’re not there yet, but we’re taking a first step toward incorporating more PITCHf/x data into our offerings.

Our first PITCHf/x-based report for our sortables is pitcher and batter plate discipline. These, in concept, are the same metrics published on FanGraphs and quoted throughout the Internet—Zone%, O-Swing%, and so forth. The definitions of the most commonly used figures:

  • O-Swing%: The percentage of pitches outside the strike zone that a batter swings at.
  • Z-Swing%: The percentage of pitches inside the strike zone that a batter swings at.
  • Swing%: The overall percentage of pitches a batter swings at.
  • Zone%: The percentage of all pitches a batter sees that are inside the strike zone.
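To make the definitions concrete, here is a minimal Python sketch that computes the four rates from a list of pitches. It assumes each pitch has already been classified as in or out of the zone and tagged with whether the batter swung; the `Pitch` type and the `plate_discipline` function are hypothetical names for illustration, not part of any BP or FanGraphs tool.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Pitch:
    in_zone: bool  # pitch crossed inside the strike zone (however the zone is defined)
    swung: bool    # batter swung, whether or not he made contact


def swing_rate(pitches):
    """Share of the given pitches that drew a swing (NaN if there are none)."""
    if not pitches:
        return math.nan
    return sum(p.swung for p in pitches) / len(pitches)


def plate_discipline(pitches):
    """Return Zone%, Swing%, Z-Swing%, and O-Swing% as fractions."""
    if not pitches:
        raise ValueError("need at least one pitch")
    in_zone = [p for p in pitches if p.in_zone]
    out_zone = [p for p in pitches if not p.in_zone]
    return {
        "Zone%": len(in_zone) / len(pitches),
        "Swing%": swing_rate(pitches),
        "Z-Swing%": swing_rate(in_zone),   # denominator: pitches in the zone
        "O-Swing%": swing_rate(out_zone),  # denominator: pitches out of the zone
    }


# Toy sample: two pitches in the zone, three outside it.
sample = [Pitch(True, True), Pitch(True, False),
          Pitch(False, True), Pitch(False, False), Pitch(False, False)]
rates = plate_discipline(sample)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

Because every pitch is either in or out of the zone, Swing% is a weighted mix of the other rates: Swing% = Zone% × Z-Swing% + (1 - Zone%) × O-Swing%. In the toy sample, 0.4 × 0.5 + 0.6 × 1/3 = 0.4.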

As I said, these are the same metrics presented at FanGraphs; what’s not the same is the results. The reason is the data we’re using. FanGraphs uses stringer-collected data from Baseball Info Solutions, recorded by “video scouts” off the same television broadcasts the rest of us watch. As alluded to above, we’re using PITCHf/x data. But even though the data sources differ, they still measure the same thing, right? More to the point, when they disagree, how do we determine which is better?

I’ve discussed some of the problems with charting balls and strikes from commercial video before:

There are a lot of things conspiring against you being able to judge balls and strikes off of video. You can sum it up broadly like this: your brain is a magnificent thing, and it takes the two-dimensional images you’re seeing on your television and reconstructs them so that you think you’re seeing them in three dimensions. It’s a marvelous process, and if you stop to think about it, it’s pretty amazing.

What it is not, however, is perfect.

To produce the view you see, the camera is positioned in the outfield at an offset, not directly in line with the mound and the plate, and then zoomed in to magnify the picture. This is, in essence, an act of deception: you are made to feel as though you’re watching from a little way behind the pitcher’s mound, when in reality you’re watching from the outfield bleachers.

What the offset does is distort your view of the strike zone; it’s the phenomenon of parallax. You can observe this yourself by going out to your car and checking the gas gauge from the passenger’s seat and then from the driver’s seat:

Description: Illustration of parallax using a car's gas gauge.
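A rough way to put a number on parallax is a top-down pinhole-camera sketch. It assumes the viewer judges the ball against the back of the plate while the ball actually crosses the front edge, 17 inches closer to the camera. The 400-foot camera distance and 15-foot sideways offset are hypothetical numbers chosen for illustration, not measurements of any park, and the function name is made up for this example.

```python
PLATE_DEPTH_FT = 17 / 12  # home plate, front edge to back point


def apparent_position(ball_pos, camera_pos, ball_dist, ref_dist):
    """Position on a reference plane that lines up with the ball in the image.

    Pinhole-camera model: a point's image coordinate is
    (position - camera_pos) / distance, so the ball (at ball_dist) appears
    to sit at the reference-plane point (at ref_dist) with the same coordinate.
    """
    return camera_pos + (ball_pos - camera_pos) * ref_dist / ball_dist


# Hypothetical setup: camera 400 ft from the front of the plate and
# 15 ft to one side; the ball crosses the front edge, but the viewer
# reads it against the back of the plate, 17 inches farther away.
CAMERA_DIST_FT = 400.0
CAMERA_OFFSET_FT = 15.0

for true_x in (0.0, 0.83):  # middle of the plate, roughly the edge of the zone
    seen_x = apparent_position(true_x, CAMERA_OFFSET_FT,
                               CAMERA_DIST_FT, CAMERA_DIST_FT + PLATE_DEPTH_FT)
    print(f"true {true_x:+.2f} ft -> apparent {seen_x:+.3f} ft "
          f"(error {12 * (seen_x - true_x):+.2f} in)")
```

Under these assumptions a pitch down the middle appears about 0.64 inches off, and the shift grows with the camera's offset. The same arithmetic applies vertically for a camera mounted above the field.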

You also have problems with depth perception—essentially your brain is “guessing” the depth based upon visual cues in the image. This is difficult enough under the best of circumstances—there are really, really good reasons human beings have two eyes instead of one. Cyclops would be a terrible baseball player. You can get some idea of how this works just by covering one eye and trying to judge distance, then doing it with both eyes open.

The other issue with using commercial video feeds is the frame rate. NTSC video, the North American broadcast standard, runs at roughly thirty frames per second. (Because of possible interference between the chroma and audio carrier signals, NTSC video has run at 29.97 frames per second since the introduction of color.) So each frame lasts a little over three-hundredths of a second.

That sounds like an awfully brief period of time, but it’s not as brief as the flight of a baseball pitched by a major leaguer. The time from release (defined here as 50 feet from the back edge of home plate) to the back edge of home plate is, for an average pitch, a shade over four-tenths of a second. Home plate itself is just a little more than 1.4 feet in length. So the time it takes for a pitched ball to cross home plate is only about a third of a frame.
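The frame arithmetic can be checked in a few lines of Python. This sketch uses the exact NTSC rate of 30000/1001 frames per second and assumes a 0.41-second flight over 50 feet, standing in for "a shade over four-tenths of a second"; it uses the average speed for the whole flight, which is a simplification.

```python
NTSC_FPS = 30000 / 1001       # about 29.97 frames per second
FRAME_S = 1 / NTSC_FPS        # about 0.0334 s per frame

RELEASE_TO_PLATE_FT = 50.0    # release point as defined in the text
FLIGHT_TIME_S = 0.41          # assumption: "a shade over four-tenths of a second"
PLATE_DEPTH_FT = 17 / 12      # home plate is 17 inches from front edge to back point

avg_speed_fps = RELEASE_TO_PLATE_FT / FLIGHT_TIME_S  # feet per second, whole flight
crossing_s = PLATE_DEPTH_FT / avg_speed_fps          # time spent over the plate
frames_over_plate = crossing_s / FRAME_S

print(f"one frame:       {1000 * FRAME_S:.1f} ms")
print(f"over the plate:  {1000 * crossing_s:.1f} ms")
print(f"in frames:       {frames_over_plate:.2f}")
```

This prints roughly 33.4 ms, 11.6 ms and 0.35 frames. Because the ball slows on its way in, its speed over the plate is below the flight average, so if anything this understates the time over the plate.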

And a camera typically won’t be recording for the entirety of a frame. Modern CCD-based cameras don’t use physical shutters, but all of them record for only a fraction of each frame. Consider the Sony HDC-1500, a camera model Fox Sports has used for its MLB playoff coverage. Its minimum shutter speed is 1/60 of a second, so the longest exposure it will take is roughly half a frame. At its fastest shutter speed, the camera may be recording for only a shade over one-hundredth of a frame.
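A simplified model shows how much the exposure length matters. This Python sketch treats each frame as a single exposure, as the text does (it ignores interlaced fields), and assumes the pitch's timing relative to the camera's clock is uniformly random. The crossing time uses the text's rough figures, and "a shade over one-hundredth of a frame" is taken as 1.1 percent of a frame, an assumption.

```python
NTSC_FPS = 30000 / 1001
FRAME_S = 1 / NTSC_FPS

# Time the ball spends over the plate, from the text's rough figures:
# 17-inch plate, 50 feet covered in about 0.41 s (an assumption).
CROSSING_S = (17 / 12) / (50.0 / 0.41)


def capture_probability(crossing_s, exposure_s, frame_s=FRAME_S):
    """Chance that some exposure overlaps the ball's time over the plate.

    Simplified model: one exposure of length exposure_s starts every
    frame_s seconds, and the pitch's timing relative to that clock is
    uniformly random. An overlap happens when an exposure starts within
    the window of length (crossing_s + exposure_s) ending when the ball
    leaves the plate.
    """
    return min(1.0, (crossing_s + exposure_s) / frame_s)


slow = capture_probability(CROSSING_S, 1 / 60)            # longest exposure in the text
fast = capture_probability(CROSSING_S, 0.011 * FRAME_S)   # assumed very fast shutter
print(f"1/60 s exposure:    {slow:.0%} chance of any image over the plate")
print(f"very fast exposure: {fast:.0%}")
```

Under these assumptions, even the slowest shutter misses the ball over the plate on roughly one pitch in seven, and a very fast shutter misses it on most pitches.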

What this means, in practical terms, is that there is no guarantee that the video camera will record an image during the moment when a pitched ball passes over the plate. The reason you feel as though it does is an optical illusion caused by the brain’s inability to distinguish between real and apparent movement. (In the past, this has been mistakenly referred to as “persistence of vision,” a real but unrelated optical phenomenon. The preferred nomenclature among the academic community appears to be “apparent motion.”)

To re-emphasize: your perception of the ball as it crosses the plate is the product of a number of optical illusions (chiefly apparent motion and parallax error). Your eyes give unreliable testimony as to the location of the pitched ball relative to other objects.

The obvious question is, how do we know these effects are meaningfully impacting the data BIS is collecting? In the past I’ve studied the BIS-based plate discipline statistics and found some disturbing irregularities. To sum up:

  • The size of the strike zone is inconsistent from season to season,
  • There are large and unexplained anomalies in the data set (particularly the ’07 Los Angeles teams), and
  • There are measurable park effects even after the obvious outliers have been excluded from the analysis.
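One simple way to look for a park effect in charted data is a home/road comparison, a standard approach to park factors; it is not necessarily the method used in the earlier study. This Python sketch compares a team's Zone% in its home park with its Zone% on the road and attaches a two-proportion z statistic. The counts are hypothetical, and the cutoff of 3 is an arbitrary choice for illustration.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class TeamSplit:
    home_in_zone: int   # charted in-zone pitches in this team's home park
    home_pitches: int
    road_in_zone: int   # same counts for this team's road games
    road_pitches: int


def zone_park_factor(s):
    """Ratio of home Zone% to road Zone% (1.0 means no apparent park effect)."""
    return (s.home_in_zone / s.home_pitches) / (s.road_in_zone / s.road_pitches)


def two_proportion_z(s):
    """Two-proportion z statistic for home vs. road Zone%.

    Treats pitches as independent trials, which they are not exactly, so
    read it as a rough guide rather than a formal test.
    """
    p_home = s.home_in_zone / s.home_pitches
    p_road = s.road_in_zone / s.road_pitches
    pooled = (s.home_in_zone + s.road_in_zone) / (s.home_pitches + s.road_pitches)
    se = math.sqrt(pooled * (1 - pooled) * (1 / s.home_pitches + 1 / s.road_pitches))
    return (p_home - p_road) / se


# Hypothetical counts for illustration only; not real BIS or PITCHf/x data.
splits = {
    "Team A": TeamSplit(5100, 10000, 5000, 10000),
    "Team B": TeamSplit(4600, 10000, 5050, 10000),
}
for team, s in splits.items():
    z = two_proportion_z(s)
    flag = "  <- worth a closer look" if abs(z) > 3 else ""
    print(f"{team}: factor {zone_park_factor(s):.3f}, z = {z:+.1f}{flag}")
```

Because the same teams appear in both samples, a large, persistent gap between home and road rates points toward something about the park (or the way it is charted) rather than the players.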

It is true that the quality of the BIS data appears to have improved since the introduction of PITCHf/x, most likely because BIS now uses the PITCHf/x data itself. Quoting from BIS founder John Dewan:

One of the questions that has come up is: How can the video scouts who track pitch location data at Baseball Info Solutions (BIS) be as good as Sportvision's very cool PITCHf/x technology that tracks pitch location using hi-tech camera angles. In short, how can a human being be as good as the technology?

The answer is that, at BIS, it's not simply human vs. technology. The equation at BIS is that technology PLUS human review is much better than technology alone. Let me explain. PITCHf/x technology is a huge step forward in baseball analytics and the pitch location data it provides is excellent. But not perfect. At BIS, they take it a step further. Thanks to the fact that PITCHf/x data is publicly available, when BIS video scouts review video to determine pitch location, they also have information about how PITCHf/x plotted the location. The video scout reviews both the actual video of the pitch and the PITCHf/x location to determine where the pitch is located. In essence, pitch location charting at BIS enhances the charting done by PITCHf/x to come up with what BIS believes to be the best data possible, a kind of Enhanced PITCHf/x.

We’ve discussed the problems with human observation already; how does PITCHf/x avoid them? Sportvision, the company that collects PITCHf/x, is allowed to install its own cameras directly in the ballpark. It can choose the lenses and measure the optical effects directly. Because it has more than one camera tracking every pitch, it can, in essence, use parallax to its advantage rather than be fooled by it. And by fitting a trajectory to the entire flight of the pitched ball rather than focusing on one point in the sequence, it avoids the problem of not having an image of the ball at the exact moment it crosses the plate. Under controlled circumstances, Sportvision’s engineers have established the accuracy of the PITCHf/x system to within an inch, or about a third of the width of a baseball. Given these massive advantages, a combined approach incorporating both stringer data and precision PITCHf/x data is more likely to degrade the quality of the data than to improve it.
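The trajectory-fitting point can be illustrated with a toy version. This Python sketch assumes a constant-acceleration model in each direction (the nine-parameter form commonly described for PITCHf/x: a starting position, velocity, and acceleration per axis), simulates 25 slightly erroneous position readings spread across the whole flight of a hypothetical pitch, fits a quadratic to each axis by least squares, and solves for where the fitted path crosses the front of the plate. The pitch, the error of up to 0.02 feet per reading, the sampling times, and the 17/12-foot reporting plane are all assumptions for illustration; this is not Sportvision's code.

```python
import math

PLATE_FRONT_Y_FT = 17 / 12  # assumed reporting plane: front edge of home plate


def evaluate(coefs, t):
    """Value of c0 + c1*t + c2*t**2."""
    c0, c1, c2 = coefs
    return c0 + c1 * t + c2 * t * t


def fit_quadratic(ts, vs):
    """Least-squares fit of v(t) = c0 + c1*t + c2*t**2 via the normal equations."""
    a = [[sum(t ** (i + j) for t in ts) for j in range(3)] for i in range(3)]
    b = [sum(v * t ** i for t, v in zip(ts, vs)) for i in range(3)]
    for col in range(3):  # Gaussian elimination with partial pivoting
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 3):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coefs = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        coefs[r] = (b[r] - sum(a[r][c] * coefs[c] for c in range(r + 1, 3))) / a[r][r]
    return tuple(coefs)


def plate_location(fit):
    """Solve y(t) = plate plane for the earlier root, then read off x and z."""
    c0, c1, c2 = fit["y"]
    cc = c0 - PLATE_FRONT_Y_FT
    # Numerically stable form of the smaller root; assumes y is decreasing (c1 < 0).
    t = 2 * cc / (-c1 + math.sqrt(c1 * c1 - 4 * c2 * cc))
    return evaluate(fit["x"], t), evaluate(fit["z"], t)


# Hypothetical pitch: (position, velocity, half the acceleration) per axis, in feet and seconds.
TRUE = {"x": (-1.5, 5.0, -5.0), "y": (50.0, -130.0, 14.0), "z": (6.0, -5.0, -10.0)}

# Simulated camera readings across the whole flight, each off by up to 0.02 ft.
ts = [k * 0.0165 for k in range(25)]
samples = {
    axis: [evaluate(c, t) + 0.02 * math.sin(1.7 * k + i) for k, t in enumerate(ts)]
    for i, (axis, c) in enumerate(TRUE.items())
}
fit = {axis: fit_quadratic(ts, vs) for axis, vs in samples.items()}

true_px, true_pz = plate_location(TRUE)
fit_px, fit_pz = plate_location(fit)
print(f"true  px={true_px:+.3f} ft, pz={true_pz:.3f} ft")
print(f"fit   px={fit_px:+.3f} ft, pz={fit_pz:.3f} ft")
```

Because every reading contributes to the fit, no single image has to be captured at the instant the ball reaches the plate, and the reading errors partly average out.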

BIS’s response to these concerns is not particularly reassuring:

As a way to test this, BIS conducted an impartial study. They selected the 100 pitches from their database of the 2010 season that represented the biggest discrepancies in pitch location between BIS data and raw PITCHf/x data. They then meticulously reviewed video once again on all these pitches. The video scouts reviewed the pitch location and selected the data source, either BIS or PITCHf/x, that they believed best represented the true location.

These impartial video reviewers chose BIS plotted pitch location data 55 percent more often than the raw PITCHf/x location as the correct location. The details: 59 choices for BIS pitch location (Enhanced PITCHf/x), 38 choices for the raw PITCHf/x location, 2 pitches that Pitch FX has since corrected, and one pitch where neither location was close.
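The "55 percent more often" figure in the quoted passage compares the two vote counts with each other (59 against 38); it is not a share of the 100 pitches. A few lines of Python restate it both ways.

```python
bis_votes, raw_votes, since_corrected, neither = 59, 38, 2, 1
total = bis_votes + raw_votes + since_corrected + neither

more_often = bis_votes / raw_votes - 1                    # relative to raw PITCHf/x votes
share_of_decided = bis_votes / (bis_votes + raw_votes)    # among pitches where one source won

print(f"pitches reviewed: {total}")
print(f"BIS chosen {more_often:.0%} more often than raw PITCHf/x")
print(f"BIS share of the {bis_votes + raw_votes} decided pitches: {share_of_decided:.0%}")
```

Of the 97 pitches where reviewers picked one of the two sources, BIS won about 61 percent.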

Let us take BIS at its word that the reviewers involved in this study were, in fact, impartial. The video feeds the reviewers were watching, however, were not impartial: they were the same video feeds the original video scouts reviewed. In other words, if the video source introduced a bias (parallax error originating from the placement of the center field camera, for instance), the reviewers would be more likely to agree with the video scouts than with the PITCHf/x data, even though both would be less able to tell the location of the pitched ball than the precision tracking system.

This isn’t to suggest that the PITCHf/x data is perfect, however—the accuracy under controlled conditions is likely to be higher than accuracy in the field, where the weight of the people sitting in the stadium is great enough in aggregate to actually move the stadium itself and thus the placement of the cameras.

So, based on work by Mike Fast, we’ve incorporated a series of calibration adjustments to “correct” the plate location data and give a better picture of where a pitch really was when it crossed the plate. And we’re maintaining a consistent definition of the strike zone, which means that a batter’s or pitcher’s numbers can be compared directly between seasons without fear that an apparent change is really a product of how the numbers are being crunched.
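The actual adjustments are not spelled out here, so this Python sketch only shows the general shape of the two steps the paragraph describes: subtract an estimated location error for each park, then classify every pitch against one fixed zone. The park names, offsets, and zone boundaries are all hypothetical, and in particular this is not the zone definition BP uses.

```python
# Hypothetical per-park calibration offsets in feet: (horizontal, vertical)
# amounts by which a park's system is assumed to misreport location.
PARK_OFFSETS_FT = {"PARK_A": (0.05, -0.03), "PARK_B": (-0.02, 0.04)}

# Assumed fixed zone, for illustration only: half the plate width (8.5 in)
# plus roughly a ball radius (1.45 in) on each side, and a fixed
# bottom and top. Real work would use a carefully chosen definition.
HALF_WIDTH_FT = (8.5 + 1.45) / 12
ZONE_BOTTOM_FT, ZONE_TOP_FT = 1.5, 3.5


def calibrate(park, px, pz):
    """Remove the park's assumed offset (measured = true + offset)."""
    dx, dz = PARK_OFFSETS_FT.get(park, (0.0, 0.0))
    return px - dx, pz - dz


def in_zone(px, pz):
    """Same zone everywhere, every season, so rates stay comparable."""
    return abs(px) <= HALF_WIDTH_FT and ZONE_BOTTOM_FT <= pz <= ZONE_TOP_FT


raw = ("PARK_A", 0.86, 2.50)  # a pitch that reads just off the plate
cal_px, cal_pz = calibrate(*raw)
print(f"raw {raw[1]:.2f} ft -> in zone? {in_zone(raw[1], raw[2])}")
print(f"calibrated {cal_px:.2f} ft -> in zone? {in_zone(cal_px, cal_pz)}")
```

Keeping `in_zone` identical across parks and seasons is what allows a player's rates to be compared from year to year; only the calibration step varies by park.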

This is only our first foray into PITCHf/x—rest assured, we’re not done yet. Even when it comes to the subject of plate discipline, we’re always considering new approaches and will work at incorporating the best analysis possible. So consider this an appetizer course.

(And we have a few non-PITCHf/x-related announcements still up our sleeves for the weeks to come, so we’ve expanded our Big September promotion into the first half of October as well.)

Could the fact that some stadium cameras are placed differently than others come into play? It has seemed to me (at least this season) that the cameras in Minnesota and Tampa are both more directly behind the mound (and higher up) than cameras in other parks. That could definitely skew the way one watches the video, no?
If you're referring to the TV cameras in center field, that's exactly the sort of thing Colin is talking about in the section of the article on parallax.
This article is fine, but we're in the middle of an exciting postseason, and we're getting almost no coverage of it here. The lead article is on PITCHf/x? That's the most exciting thing happening in baseball this week? Is this Baseball Prospectus, or the Online Journal of Sabermetrics?

I'm a Cardinals fan, and in the slate of new articles today, we have exactly nothing about that series yet again.

This isn't your fault Colin. But BP, during an exciting postseason, can we find time to give the dry statistical research articles a rest and actually pay attention to the baseball games being played? Even if the PITCHf/x stats announced here are the greatest thing ever, is the postseason the time to launch them? Might baseball fans be focused on something else?

You forget. There are fans of 22 other teams right now who would rather read Colin's article than yet another breakdown of the same Cardinals and Phillies teams that we've been reading about all year.

Albert Pujols is a good hitter.
Roy Halladay is a good pitcher.

Really, is there more that hasn't already been said a few dozen times?
Then why does BP cover the Yankees series? Because the Yankees don't get enough press coverage elsewhere?

And I'm asking for analysis of the games, not the players. There is a difference.
My reply was more of a sarcastic jab at the lack of acknowledgement of Colin's interesting article than a disagreement with you. If you think I was being serious, you missed my point.

This article was a pretty good read. Hijacking the comments thread to complain about something unrelated just irritated me. But you made your point, and the powers that be responded, so now we can all go out for a beer.
Sarcasm is notoriously wasted in electronic forums.

I'm not sure if commenting on something's placement as a lead story is hijacking, but if it is, I apologize. If Colin was in any way insulted, I particularly apologize.

A long-time subscriber, I've tried posting about my frustration with the site in other comment areas of the site, and I have sent a couple of emails. I thought I'd express my frustration with the site one last time before I shuffled on.

Hi, everyone. I just posted this in Jay's thread from yesterday and I'll include it here as well. We will have coverage of all the series beginning tonight, with Ben and Derek taking on the NL series to go with Jay and R.J. on the AL series.

I agree with much of the criticism here. This has purely been my error. I'm currently preoccupied with deadlines for our next book, Extra Innings: More Baseball Between the Numbers (we call it BBTN II around here), so while I did think about assignments for division series previews, I didn't think through continuing those beats into the series themselves.

Moreover, and I think this is the real lesson of what has happened here, I am the first BP editor to have more than one BBWAA member on staff, giving us the possibility of the kind of on-site coverage Jay has been giving us. Normally, BP writers are self-directed and our coverage hasn't been so systematic, but Jay's dispatches have been so well done that they pointed up our lack of analytic coverage of the other series. That will change immediately.

Finally, both Jay and I will have home park credentials should the Yankees or Phillies make it to the next round (and if either makes it to the World Series as well) and so we will definitely continue to have detailed on-site coverage throughout the end of the postseason. As always, I appreciate your feedback and I hope that you continue to throw both bouquets and brickbats as our work merits them.
--Steve, Editor-in-Chief
Parenthetically, we have also had requests for some in-game chats with the staff, as we have done for other key games in the past. This is something we plan to do beginning with the next round, when fewer overlapping games make scheduling a bit more rational.
Great! This is really good news, Steven. Thanks so much.
I just love the outright candor of Steve's post, while accepting the legitimacy of the readers' opinions.

Nothing to parse, and everything taken at face value. If only politicians were more like Steve has shown himself to be here.
It's because Steve is the man (and The Man).
Thanks for the reply, Steven. Your post in Jay's column must have gone up since I last checked it. As always, you are the classiest of class acts.
Get your head out of making the site better and watch some baseball!

Actually, I kind of agree that while the effort is appreciated the timing is odd to say the least.
I appreciate Colin's work. My post was not about him. My post is about what we're not seeing much of on the site anymore. At the end of the day, baseball games are at the heart of baseball analysis. Statistics are useful for what they tell us about baseball games. What I used to like about BP was it had both stat-based pieces and pieces that used the statistics to cover the game. The combination was nice.

Now, we mainly get the statistics. It's like buying a Reese's Peanut Butter Cup that contains ONLY peanut butter. I like peanut butter, but I would still miss the chocolate. Noting that I want some chocolate back in my peanut butter cup in no way disparages peanut butter or those who make it.

Given their large slate of writers, I would hope someone at BP could be assigned to watch an NL postseason series and write about it in an entertaining way that uses the insight that Baseball Prospectus provides.

If they don't have anyone who can do that, they could hire a special guest to cover postseason baseball. There has to be someone out there they can get.

As it is now, I have the disturbing impression that no one at BP is even watching the NL postseason series.
I agree. A baseball-only site has nothing on some playoff series? There's no excuse for that. Content please.
This is what I miss:

I know Colin was not hired to write this sort of piece. But can't someone be?
Don't we all miss Mr. Sheehan? I'm not sure he can be replaced.
Fine work, Colin.

To those upset at the timing of this post, you must not be on Twitter during these playoff games. Holy crap, it seems like 90% of the discussion is fans whining and bitching about the strike zones. Heck, even La Russa and Girardi have channeled their inner Phil Jackson in attempts to get the umps to change their zones mid-game/mid-series.

I'm complaining about the strike zone because I believe (perhaps incorrectly) that having the ump situate himself between the batter and the catcher impairs his view of outside corner pitches. If the ump stood directly behind the catcher (as they used to do, IIRC), they'd have a "truer" view, am I right?
You may be right. We don't have a control group any more to test the theory, as far as I know. I suspect that you're not right, and that it's more driven by umpires calling the zone relative to the catcher target.

I say that for two reasons. One, I looked at where a few umpires stand, and it had no obvious effect on their zone:
Home Plate Umpire Positioning

Two, if you look at the difference in zone between RHB and LHB, which I think is what bugs a lot of people, the zone for LHB is not actually wider, it's just shifted toward the outside (on both the inside and outside edges). This makes sense if it's due to the catcher target, which is shifted outside by 2-3 inches for LHB. But if it's because the umpire is in the slot and can't get a good view of the outside edge, why wouldn't he call the inside edge the same for RHB and LHB?
I think you'll find the answer if you look at the average horizontal angle a pitch comes in to a LHB and to a RHB. LHBs see more pitches that are moving from inside to outside than RHBs do. The umpire's strike zone is going to naturally shift to where pitchers are trying to pitch, even if that results in a shifted strike zone. By no means is this acceptable, but it's what happens.
There was a time (and I guess I'm dating myself) when NL umps wore chest protectors inside their shirts/coats and AL umps wore the "balloon" protectors, and as a result, accepted wisdom was that NL umps were better at calling the bottom of the zone accurately, and AL umps were better at calling the corners. So, standing above the catcher's head would make it a higher strike zone. Just seems like there are tradeoffs either way.
Great stuff - Can we get Team as a filter on the reports? I've noticed it's dropped off a lot of reports, so if I wanted to see those stats for all the say, Cardinals, it'd be a pain unless I put them all on my "tracker."

Which is a pain.
Great addition.

The four decimal places, and the format in general, are a little tough on the eyes though. I think it would be helpful to overhaul the presentation format, and not just for this stat report.

As an example, Jeff Euston's compensation data (another great addition!) has been a bear to find since the ticker at the top of the page disappeared.

I hate to use another website as an example (but Colin mentioned FanGraphs in the article, so I think it's OK), but a large part of the reason FanGraphs data is popular is the ease of navigation. Making custom reports is great, but I'd wager $1 that having easily navigable standard reports would result in more use.
Great article, Colin.

With that out of the way: The lack of playoff coverage continues to amaze/dismay me.
Love the addition of these stats and that they're pitch f/x based. And the performance increase of the stats page is noticeable and appreciated. Now just make the data as accessible and as well-presented as FanGraphs and you'll really have something. The current layout is somewhat painful to use.

A few suggestions:

1) Difficulty accessing definitions. Definitions are not displayed on the stats page and abbreviations are not always clear. One can access the glossary through the headers, but it requires loading a separate page -- either in a separate window or by leaving the stats page. Could these be hover tool-tips instead? Or at minimum, could the glossary search bar be placed on the report page as well?

2) Sorting. Having multi-layered sorting is nice, but it's of secondary importance/value to simple, quick sorts. Being able to sort quickly by clicking on the field headers would be a welcome addition. Perhaps you could add a neutral sort icon (e.g. "--") alongside the up and down arrows, with all three clickable, rotating through asc/desc/neutral.

3) Filtering. Again, having multi-layered filtering is nice, but I'd love to be able to filter on more than just Team/League/Pos/PA. Often I'm trying to obtain a list of batters that meet some statistical threshold. Perhaps this would cause performance issues, but it would be very nice to get all batters with OBP > .330, for example, without having to export a full list into Excel. Just adding one variable filter field would be a great addition.

4) Significant digits. A contact rate of 0.7906 is difficult to read. I assume you'd have to give up speed to display it as 79.1%, but it's a trade-off I'd personally take. Is the hundredths place meaningful? I'm guessing not. Heck, is the tenths? If I want a raw data export, the detail is helpful. But the current display can make it more difficult to interpret the data.

Just a few thoughts. Keep up the great work, Colin.
Are the strike zone definitions used based on the rulebook definition, or on Fast's findings about how the zone shifts based on batter handedness?
The strike zone definition used for these stats is the one I described here:
A Zone of Their Own
(after the fourth paragraph)
Is there any easy place to see the league averages for each metric? What's the league ave. O-Swing rate, for example?
Year Zone% Swing% Contact% Z-Swing% O-Swing% Z-Contact% O-Contact% SwStr%
2008 50.7% 45.8% 81.4% 63.0% 28.0% 87.9% 66.1% 8.4%
2009 51.0% 45.2% 81.3% 61.8% 28.0% 87.9% 65.6% 8.4%
2010 51.0% 45.4% 80.8% 61.7% 28.4% 87.8% 64.7% 8.6%
2011 50.8% 46.0% 80.8% 62.5% 28.9% 87.9% 64.7% 8.7%
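The four rows of league averages can be checked against the definitions in the article. Reading the columns as Zone%, Swing%, Contact%, Z-Swing%, O-Swing%, Z-Contact%, O-Contact% and swinging strikes per pitch (an interpretation inferred from how the numbers fit together, not labels confirmed in the thread), Swing% should equal Zone% × Z-Swing% + (1 - Zone%) × O-Swing%, Contact% should be the swing-weighted average of Z-Contact% and O-Contact%, and swinging strikes per pitch should be about Swing% × (1 - Contact%). This Python sketch recomputes each from the other columns.

```python
# Columns as interpreted here (an inference, not labels from the thread):
# Zone%, Swing%, Contact%, Z-Swing%, O-Swing%, Z-Contact%, O-Contact%, SwStr%
LEAGUE = {
    2008: (50.7, 45.8, 81.4, 63.0, 28.0, 87.9, 66.1, 8.4),
    2009: (51.0, 45.2, 81.3, 61.8, 28.0, 87.9, 65.6, 8.4),
    2010: (51.0, 45.4, 80.8, 61.7, 28.4, 87.8, 64.7, 8.6),
    2011: (50.8, 46.0, 80.8, 62.5, 28.9, 87.9, 64.7, 8.7),
}

predicted = {}
for year, row in LEAGUE.items():
    zone, swing, contact, z_swing, o_swing, z_contact, o_contact, sw_str = row
    in_swings = (zone / 100) * z_swing        # in-zone swings per 100 pitches
    out_swings = (1 - zone / 100) * o_swing   # out-of-zone swings per 100 pitches
    pred_swing = in_swings + out_swings
    pred_contact = (in_swings * z_contact + out_swings * o_contact) / pred_swing
    pred_sw_str = swing * (100 - contact) / 100  # missed swings per 100 pitches
    predicted[year] = (pred_swing, pred_contact, pred_sw_str)
    print(f"{year}: Swing% {swing} vs {pred_swing:.1f}, "
          f"Contact% {contact} vs {pred_contact:.1f}, "
          f"SwStr% {sw_str} vs {pred_sw_str:.1f}")
```

Every recomputed value lands within a couple of tenths of a point of the posted figure, which is consistent with rounding in the published percentages.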
Why is the SW STRK RT in the pitching plate discipline table so much higher than the ~8.5% historical average? The simple mean from the table is ~19.3%.
As we go down the road, hopefully we'll make this more user friendly, better formatted, easier to sort, etc.
Thanks for that table. Very interesting. Apologies for being dense, but what does CONT stand for? Thanks.

I'm worried a little about the impact on metrics like O-Swing and Z-Swing that could arise from the fact that players adjust to the individual umpire's strike zone in effect during a particular game. Therefore they are often pitching to, and selecting pitches based on, a strike zone that is different from the denominators of the metrics, no matter how carefully the metrics are adjusted. I guess as long as we apply the metrics to seasons and not individual games, it will probably even out enough.
CONT = contact

I agree we're a long way from being finished with our understanding of the strike zone, batter plate discipline, and how to measure them accurately, consistently, and in ways that have useful baseball meaning.
Thanks, Mike. We may be a ways from full confidence about all aspects of the matter, but we sure are a lot farther along than we used to be, thanks to your work and others'.