During every major-league game you watch, ball-tracking technology delivers live views of where the ball passes through the strike zone. These images are entertaining and seemingly unbiased, so it’s no surprise that there have been steadily growing calls to replace the home plate umpire and rely entirely on machine measurements to call balls and strikes in real time, i.e., a “Robo Zone” or “Robot Umps.”
The reality, however, is that there are myriad issues with implementing a ball-and-strike-calling system like this for real games. The primary issue is that using machine measurements to call balls and strikes will simply shift disagreements with the call from the umpire to the machine, or to the machine’s operators.
People watching a game also tend to forget that their viewing angle changes how the pitch appears to them. That can create the same animosity when the machine’s report does not match what they believe they saw. Your perception of a pitch can change significantly depending on whether you are watching from beyond the outfield wall or from the perspective of the pitcher, umpire, or batter.
To fully assess the implications of embracing our mechanical strike-zone-judging overlords, we’ll look at the issue from two standpoints: first, the potential issues that could limit or affect implementation of the technology, and second, the impact the switch would have on the game.
If you’ve ever seen people posting images of a “live strike zone” during games (Twitter, we’re looking at you), it’s clear that too many fans believe these systems are 100 percent accurate. They are not. The machines are reliable only within limited accuracy tolerances, and the top and bottom of the zone depend on where a human operator decides to set them. Sportvision claimed that its PITCHf/x system, first introduced in major-league stadiums in 2006 with data made public in 2007, was accurate to within one inch. In a 2011 analysis, Mike Fast found that the system’s error was actually smaller than that:
Just over half of the stadium-seasons have a measurement error below half an inch in each dimension. When both dimensions are combined, 29 percent of the stadium seasons are below half an inch of measurement error, 67 percent have less than one inch of error on average, and 98 percent are less than two inches.
Mike Fast – Spinning Yarn: How Accurate is PitchTrax?
There do not appear to be any publicly available strike zone accuracy tolerances or error rates from manufacturers of ball-tracking systems such as TrackMan and Flightscope. Are they accurate to within a tenth of an inch? One inch? Two inches? Four inches? It’s important to know.
Even if the companies did publish that data, MLB would still need to set allowable tolerances and then test the systems continually to verify and calibrate them. Those tolerances also need to cover processing time (i.e., the time that elapses between the pitch and the machine’s decision), or the game will slow down.
With PITCHf/x, Sportvision made a concerted effort to engage the public analytical community in refining and improving the product. This was most visible in its PITCHf/x Summits, which were aimed at collaborating with the public analysts who made PITCHf/x so popular. That collaboration not only improved PITCHf/x as an analytical tool but also built trust with the public. Max Marchi summed up the sentiment thusly:
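To make the idea of tolerances concrete, here is a minimal sketch of an acceptance check a league could run on a batch of test pitches. The limits, the `ToleranceSpec` name, and the rule that every test pitch must pass are hypothetical assumptions for illustration. They are not published MLB standards.

```python
from dataclasses import dataclass


@dataclass
class ToleranceSpec:
    # Hypothetical limits; a league would have to choose the real numbers.
    max_location_error_in: float  # largest allowed plate-location error, inches
    max_latency_s: float          # largest allowed pitch-to-call delay, seconds


def passes_acceptance(errors_in, latencies_s, spec):
    """Return True only if every test pitch meets both limits.

    errors_in: measured-minus-true plate location for each test pitch, inches.
    latencies_s: time from the pitch crossing the plate to the reported call.
    """
    worst_error = max(abs(e) for e in errors_in)
    worst_latency = max(latencies_s)
    return (worst_error <= spec.max_location_error_in
            and worst_latency <= spec.max_latency_s)


spec = ToleranceSpec(max_location_error_in=0.5, max_latency_s=1.0)
print(passes_acceptance([0.2, -0.4, 0.3], [0.30, 0.40, 0.35], spec))  # True: accurate and fast
print(passes_acceptance([0.2, -0.4, 0.3], [0.30, 1.50, 0.35], spec))  # False: one slow call
```

Checking the worst case rather than the average is a deliberate choice in this sketch. A single bad call in a key moment matters more than a good average.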
Sportvision has been doing a terrific work in the past few years in tracking every major league pitch and we are really fortunate that it (and MLBAM) let us put our hands on that wealth of data. Thus, pointing at miscalibrations is not meant as criticizing their amazing work, but rather as a way to give something back.
Max Marchi – Fine tuning PITCHf/x location data
Some current ball-tracking systems are known to have trouble reporting correct spin rates for certain pitch types, such as some sliders, low-spin pitches, and gyro-spin pitches. They also sometimes miss exit velocity and spin data on batted balls. Much of that is likely due to the way the Doppler effect is used to detect spin. But some of these errors could also come from occasional radio frequency (RF) interference, even though TrackMan and Flightscope likely operate in the relatively little-used 10.5GHz band (an estimate based on their mandatory FCC certification filings). RF interference can come from other transmitters (accidental or deliberate), from self-reflections, or from materials near the transmitter, such as cables or seats. In fact, at these frequencies, rain of any kind can be a significant source of interference that is hard to eliminate, even with the best processing.
Those are the kinds of measurement errors and error rates that need to be monitored. Neither Doppler-based nor camera-based machines report all of their measured data accurately 100 percent of the time. Even if they were accurate 99.9 percent of the time, what if that 0.1 percent is a strike called a ball on a 3-2 count with the bases loaded, two outs, and the score tied in the bottom of the ninth inning? It may seem like a silly thought exercise, but it’s important to consider these potential impacts on the game.
In the example below, you can see two views of the same pitch. From the mound view the pitch looks like a ball, but a closeup shows that it was a strike, just catching the edge of the zone. For a pitch this close to the strike zone, the accuracy tolerance of a machine calling balls and strikes is critical.
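For readers curious why spin, and gyro spin in particular, is hard for radar, here is a rough back-of-the-envelope sketch. It uses the standard two-way radar Doppler relation, f_d = 2·v·f0/c, with the article’s estimated 10.5GHz carrier and a ball radius of about 1.44 inches. Spin is modeled with a simple `los_fraction` parameter, which is our own simplification and not how any vendor’s processing works. It represents the share of the ball’s surface speed that points along the radar’s line of sight. With pure gyro spin, the spin axis points along the line of sight, that share goes toward zero, and almost no spin signature is left in the echo.

```python
import math

C_M_PER_S = 299_792_458.0   # speed of light
MPH_TO_M_PER_S = 0.44704    # exact conversion factor
BALL_RADIUS_M = 0.0366      # approximate baseball radius (about 1.44 in)
CARRIER_HZ = 10.5e9         # the article's estimated operating band (an assumption)


def doppler_shift_hz(radial_speed_m_per_s, carrier_hz=CARRIER_HZ):
    """Two-way radar Doppler shift: f_d = 2 * v * f0 / c."""
    return 2.0 * radial_speed_m_per_s * carrier_hz / C_M_PER_S


def spin_spread_hz(spin_rpm, los_fraction, carrier_hz=CARRIER_HZ):
    """Largest extra shift from the spinning surface, on top of the ball's own shift.

    los_fraction (0..1) is a simplification: the share of the surface speed that
    points along the radar's line of sight. It is near 1 when the spin axis is
    perpendicular to the line of sight and near 0 for pure gyro spin.
    """
    omega_rad_per_s = spin_rpm * 2.0 * math.pi / 60.0
    surface_speed = omega_rad_per_s * BALL_RADIUS_M
    return doppler_shift_hz(surface_speed * los_fraction, carrier_hz)


body = doppler_shift_hz(95 * MPH_TO_M_PER_S)        # ball moving at 95 mph
backspin = spin_spread_hz(2400, los_fraction=1.0)   # 2,400 rpm, axis across the beam
gyro = spin_spread_hz(2400, los_fraction=0.0)       # same rpm, pure gyro spin
print(f"{body:.0f} Hz body shift, +/-{backspin:.0f} Hz spin spread, {gyro:.0f} Hz for gyro")
```

On these assumptions, the ball’s own shift at 95 mph is roughly 3 kHz. A 2,400 rpm backspin spreads the echo by roughly 640 Hz on either side, while pure gyro spin collapses that spread to nothing. Real systems view the ball at an angle and use far more sophisticated processing. The sketch only shows why the geometry makes some spins harder to see.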
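The 99.9 percent scenario is easy to put into numbers. This sketch assumes that every pitch is an independent chance for a measurement error. It uses a 2,430-game regular season (30 teams × 162 games ÷ 2) and borrows the roughly 720,000 pitches per season cited later in this piece. Because it counts all pitches and not just called ones, treat the results as rough upper bounds.

```python
REG_SEASON_GAMES = 30 * 162 // 2   # 2,430 regular-season games
PITCHES_PER_SEASON = 720_000       # approximate season total cited in the article


def expected_errors(n_pitches, accuracy):
    """Expected number of mismeasured pitches at a given per-pitch accuracy."""
    return n_pitches * (1.0 - accuracy)


def prob_at_least_one_error(n_pitches, accuracy):
    """Chance that at least one of n pitches is mismeasured (independent errors)."""
    return 1.0 - accuracy ** n_pitches


pitches_per_game = PITCHES_PER_SEASON / REG_SEASON_GAMES   # roughly 296
print(round(expected_errors(PITCHES_PER_SEASON, 0.999)))            # mismeasured pitches per season
print(round(prob_at_least_one_error(pitches_per_game, 0.999), 2))   # share of games with at least one
```

That works out to roughly 720 mismeasured pitches a season. On these assumptions, about a quarter of games would contain at least one.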
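One way to handle pitches like this is to have the system admit when a call falls inside its own error band. The sketch below is a hypothetical illustration and not any vendor’s logic. It assumes that the plate-location coordinates describe the ball’s center in inches and that the ball radius is about 1.45 inches, so any part of the ball touching the zone counts. It also takes a stated measurement tolerance, `tol_in`.

```python
PLATE_HALF_WIDTH_IN = 8.5   # home plate is 17 inches wide
BALL_RADIUS_IN = 1.45       # a baseball is roughly 2.9 inches in diameter


def classify_pitch(px_in, pz_in, sz_bot_in, sz_top_in, tol_in):
    """Call a pitch from its measured center, flagging calls inside the error band.

    px_in: horizontal offset from the middle of the plate; pz_in: height above
    the ground. Any part of the ball counts, so the radius pads the zone.
    """
    # Per-axis margins to the padded zone: negative = inside, positive = outside.
    dx = abs(px_in) - (PLATE_HALF_WIDTH_IN + BALL_RADIUS_IN)
    dz = max((sz_bot_in - BALL_RADIUS_IN) - pz_in, pz_in - (sz_top_in + BALL_RADIUS_IN))
    # Taking the larger margin is a simple approximation that is slightly
    # conservative near the corners.
    margin = max(dx, dz)
    if abs(margin) <= tol_in:
        return "too close to call"
    return "strike" if margin < 0 else "ball"


# Hypothetical zone from 18 to 42 inches, with a 1-inch tolerance.
print(classify_pitch(0.0, 30.0, 18.0, 42.0, 1.0))    # strike
print(classify_pitch(10.0, 30.0, 18.0, 42.0, 1.0))   # too close to call
print(classify_pitch(14.0, 30.0, 18.0, 42.0, 1.0))   # ball
```

The wider the tolerance, the more pitches land in the middle category. That is exactly the band where a machine’s call is no more trustworthy than a human’s.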
Rob Arthur of FiveThirtyEight showed that when it came to pitch tracking, the new TrackMan systems that made up Statcast had been experiencing some issues:
Errors in both horizontal and vertical movement have never been higher in the four years that Statcast has made some of its data publicly available. So it’s not just your imagination as you watch the game on TV: In-broadcast representations of the strike zone (like FoxTrax) take their data from Statcast, and Statcast’s errors, in turn, have bred anger with umpires and confusion over how pitches are being called.
The root cause of Statcast’s troubles is unknown. The problems could originate in the hardware, the computer code processing the resulting data, or any other part of a complex system. The hardware part of Statcast — the part that actually tracks pitches — is a radar system sold by a company called TrackMan.
These ball-tracking systems all combine hardware and software that work together to track, analyze, and record each pitch thrown. The hardware (Doppler radar transceivers, camera systems, etc.) and the software are equal sources of potential problems. Like any software, it can have inaccuracies, bugs, or other issues that affect performance. And because software determines which measurement is reported, it also opens the door to hacking. In theory, it would be fairly trivial to modify the software to report pitches closer to or farther from the strike zone, toggling that to favor one team or another. Data in transit from the machine to the user can also be hijacked, or a spoofed measurement reported, again to favor one team. Securing the integrity of both the software and the link over which its data is sent is critical.
Even if the data is securely transmitted and received, it is also uncertain how, exactly, the two sensor types (camera and radar) will be used to make the calls directly. Will one sensor make the primary call, with the other relegated to backup or confirmation? What if they disagree? If they’re to be used together, what algorithm will combine the reports from each sensor? How will that algorithm be validated? Once again, measurement tolerances come into play: the overlap between the two sensors’ uncertainty regions can mean the difference between a dead rally and a half-inning that won’t end.
You also have to prepare for the machine breaking. Machines will break and, under Murphy’s Law, at the worst possible times, whether from hardware failures or software crashes. With no umpire calling balls and strikes behind the plate, backup systems would have to be ready to take over immediately, or the game would face intolerable delays. Would we need an umpire on standby to step in and make the calls if the system goes down mid-game?
Even calibration is tricky. Measuring the accuracy of one machine with another is risky, because the reference machine brings its own tolerance and accuracy issues. Calibration would therefore need to rely on more direct and indisputable physical methods, not unlike the foam board Sportvision used to monitor the accuracy of the PITCHf/x system:
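As one illustration of protecting data in transit, each call could carry a message authentication code. This minimal sketch uses Python’s standard `hmac` and `hashlib` modules with HMAC-SHA256. The message fields and the shared key are hypothetical. An HMAC detects tampering on the link, but it does not hide the data. On its own, it also does not stop replayed messages (a pitch ID or sequence number helps there), and it cannot protect against compromised software on the device itself.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign_call(payload: dict, key: bytes) -> dict:
    """Serialize a call and attach an HMAC-SHA256 tag computed with a shared key."""
    body = json.dumps(payload, sort_keys=True)
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_call(message: dict, key: bytes) -> Optional[dict]:
    """Return the payload if its tag is valid, or None if it was altered or forged."""
    expected = hmac.new(key, message["body"].encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, message["tag"]):
        return None
    return json.loads(message["body"])


key = b"hypothetical-shared-secret"
msg = sign_call({"pitch_id": 1, "call": "strike"}, key)
print(verify_call(msg, key))      # the original message verifies
forged = dict(msg, body=msg["body"].replace("strike", "ball"))
print(verify_call(forged, key))   # the altered message is rejected: None
```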
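Inverse-variance weighting is one common textbook way to combine two sensors. It is offered here only as an example and not as what any vendor does. Each reading is weighted by 1/σ², where σ is that sensor’s assumed standard error. Readings that differ by more than `k` combined standard deviations are flagged as a disagreement. The σ values and the threshold `k` are hypothetical inputs.

```python
import math


def fuse(radar_in, radar_sigma_in, camera_in, camera_sigma_in, k=3.0):
    """Combine two location estimates by inverse-variance weighting.

    Returns (fused_estimate, fused_sigma, disagree). disagree is True when the
    readings differ by more than k standard deviations of their difference.
    """
    w_r = 1.0 / radar_sigma_in ** 2
    w_c = 1.0 / camera_sigma_in ** 2
    fused = (w_r * radar_in + w_c * camera_in) / (w_r + w_c)
    fused_sigma = math.sqrt(1.0 / (w_r + w_c))
    diff_sigma = math.sqrt(radar_sigma_in ** 2 + camera_sigma_in ** 2)
    disagree = abs(radar_in - camera_in) > k * diff_sigma
    return fused, fused_sigma, disagree


print(fuse(10.0, 0.5, 12.0, 1.0))   # leans toward the more precise radar reading
print(fuse(10.0, 0.5, 14.0, 0.5))   # readings too far apart: flagged
```

The flag is where the hard policy questions begin. Averaging two sensors that genuinely disagree produces a confident-looking number that neither sensor actually measured.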
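Here is a minimal sketch of that fallback chain. It assumes each machine sends a periodic heartbeat and treats a hypothetical 0.5-second silence as meaning a source is down. The source names and the timeout are placeholders, and the umpire is treated as the always-available last resort.

```python
import time

HEARTBEAT_TIMEOUT_S = 0.5   # hypothetical: a source silent this long is treated as down


def choose_call_source(last_heartbeat, now=None, order=("primary", "backup", "umpire")):
    """Return the first source, in priority order, whose heartbeat is fresh.

    last_heartbeat maps a source name to the time (seconds, monotonic clock)
    of its most recent heartbeat. The umpire needs no heartbeat and is the
    final fallback.
    """
    now = time.monotonic() if now is None else now
    for source in order:
        if source == "umpire":
            return source
        seen = last_heartbeat.get(source)
        if seen is not None and now - seen <= HEARTBEAT_TIMEOUT_S:
            return source
    return "umpire"


print(choose_call_source({"primary": 10.0, "backup": 10.0}, now=10.2))  # primary
print(choose_call_source({"primary": 9.0, "backup": 10.0}, now=10.2))   # backup
print(choose_call_source({}, now=10.2))                                 # umpire
```

Note that the umpire sits at the bottom of the chain. Any fully automated design still has to answer who makes the call when everything above the umpire is down.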
Sportvision’s high tech lab (parking lot) and several baseball diamonds have played host to their engineers on many occasions to test the system. Setting up the PFX cameras, and placing a foam board on the front-plane of home-plate, they have been able to establish the level of accuracy of the px/pz values. Within 1 inch. Sometimes better. In any case, you and I can trust those plate locations within a third of a baseball.
Harry Pavlidis – The State of F/X: MLB.com Gameday and Baseball Analysis Improves
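A small worked example shows the calibration problem. If the device under test and the reference each have independent random error, the spread of their differences is the root-sum-square of the two. The difference alone therefore cannot tell you which machine is at fault. The error values below are hypothetical.

```python
import math


def difference_spread(device_sigma_in, reference_sigma_in):
    """Standard deviation of (device - reference) for independent random errors."""
    return math.hypot(device_sigma_in, reference_sigma_in)


print(round(difference_spread(1.0, 1.0), 2))   # machine vs. machine: about 1.41 in
print(round(difference_spread(1.0, 0.1), 2))   # machine vs. foam board: about 1.0 in
```

Against a near-perfect physical reference such as a foam board, nearly all of the observed spread belongs to the device being tested.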
Verifying a machine’s accuracy is important[i], but it’s not the only potential source of issues. Currently, a stringer or operator sets the top and bottom of the strike zone for each batter and each plate appearance. That process is just as subject to error as an umpire calling strikes behind the plate.
Even with optical detection and software that set the top and bottom of the strike zone automatically, we would face the same accuracy and tolerance issues as with the radar systems. We would also face the same potential for hacking or tampering to favor one team or another. And that assumes a machine-set zone is even feasible. It’s unlikely that a machine could accurately track the top and bottom of the zone in real time as a batter moves in the box. Batters would also likely develop ways to game an optical zone-detection system, much as catchers have developed pitch-framing techniques.
Rule 2.00: The Strike Zone
The STRIKE ZONE is that area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the hollow beneath the kneecap. The Strike Zone shall be determined from the batter’s stance as the batter is prepared to swing at a pitched ball.
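The rule’s arithmetic is simple once the landmarks are known. The hard part, as discussed above, is measuring those landmarks reliably. This sketch assumes the three heights are already available, in feet above the ground and taken from the batter’s stance as he prepares to swing. The example numbers are hypothetical.

```python
def rulebook_zone(shoulder_top_ft, pants_top_ft, knee_hollow_ft):
    """Top and bottom of the zone from body landmarks, following the Rule 2.00 wording."""
    top = (shoulder_top_ft + pants_top_ft) / 2.0   # midpoint of shoulders and pants top
    bottom = knee_hollow_ft                        # hollow beneath the kneecap
    if not bottom < top:
        raise ValueError("zone bottom must be below zone top")
    return top, bottom


top, bottom = rulebook_zone(shoulder_top_ft=4.6, pants_top_ft=2.9, knee_hollow_ft=1.6)
print(top, bottom)   # top is about 3.75 ft, bottom 1.6 ft
```

An error of a couple of inches in any landmark moves the zone edge by a similar amount, which is why the landmark measurement, not the formula, decides how trustworthy a machine-set zone would be.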
Additionally, the rule book strike zone is itself somewhat subjective and would likely need a more precise definition if machines were to call strikes in live games. Dr. David Kagan recently explored this aspect of the robo zone problem, noting that it remains one of the largest obstacles to implementation.
This would affect analysis of the game as well: it’s unwieldy and unreasonable to account for a unique zone on every pitch. More than 720,000 pitches were thrown in major-league games last year, and in theory each one has a different zone depending on the batter’s setup in the box. Even with one zone per batter, that’s still more than 1,300 different strike zones to analyze. Hence the rise of the “typical six-foot batter strike zone,” a relatively uniform instrument for post-game analysis.
The challenge is that even with this iteration of the zone for analysis, there isn’t consensus on the exact zone to use. This is an analysis challenge rather than a robo zone one, so we’ll save further discussion on this for another time.
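A common way to place every pitch on a shared analysis zone is to rescale each pitch’s height by its position within that batter’s own zone. It is shown here as a sketch, not as the method behind any particular chart in this piece. The reference bottom uses the roughly 1.75-foot consensus mentioned below, and the 3.4-foot reference top is a hypothetical placeholder.

```python
REF_BOTTOM_FT = 1.75   # the roughly 1.75 ft consensus bottom discussed in the article
REF_TOP_FT = 3.40      # hypothetical reference top, not a published standard


def normalize_height(pz_ft, sz_bot_ft, sz_top_ft, ref_bot=REF_BOTTOM_FT, ref_top=REF_TOP_FT):
    """Map a pitch height from a batter's own zone onto a shared reference zone.

    The pitch keeps its relative position, so halfway up the batter's zone
    becomes halfway up the reference zone.
    """
    frac = (pz_ft - sz_bot_ft) / (sz_top_ft - sz_bot_ft)
    return ref_bot + frac * (ref_top - ref_bot)


print(normalize_height(2.5, 1.5, 3.5))   # midpoint of this batter's zone -> reference midpoint
```

The catch is visible in the constants. Whoever picks the reference top and bottom is effectively choosing the zone, which is part of why there is no consensus.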
If there were a Robo Zone, at some point there would need to be agreement on how to interpret the rule book strike zone using machine data. Dr. Kagan pointed out that MLB now uses previous umpire calls, rather than stringers, to set the zone:
PITCHf/x originally used poorly paid “stringers” to sit in a dark room under the stands and manually turn a dial to set the top and bottom of the zone on the video image of the batter. Saunders reports that Statcast uses the previous calls of major league umpires to build a database of the top and bottom of the strike zone for each hitter.
Isn’t that ironic? Until MLB comes up with a machine-comprehensible definition of the top and bottom of the strike zone, machines will need the assistance of humans to define the strike zone for the machines.
Dr. David Kagan – The Physics of RoboUmp
We built a model that compared the umpire-called zone with the stringer-set zone to see what impact the change Kagan notes would have. In reality, the impact would be minimal. Below is a chart showing the distribution of the bottom of the zone as set by stringers (y-axis) and by umpires (x-axis):
There’s a strong consensus that the bottom of the zone is roughly 1.75 feet off the ground, with some variance for different batter heights and stances. When we tested the variation in called strikes, we found that using an umpire-set bottom of the zone instead of a stringer-set one would produce a 0.0015 percent difference in calls at the bottom of the zone. The change may be a good one from a process standpoint, but its impact is negligible.
Beyond the two-dimensional zone shown above, a machine can measure the strike zone as a three-dimensional volume above the plate rather than a flat, front-facing area. The image below illustrates what this volume looks like. It’s unlikely that umpires would call strikes in such a 3D space.
A machine could, however. So would the robo zone count as strikes the pitches that enter the zone through anything other than its front face? Such pitches are rare now, but pitchers could game the system by throwing more eephus pitches designed to drop into the top of the zone, or by some similar strategy. Again, these implications may seem outlandish, but they must be reconciled before moving forward with something that could drastically change the game.
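The difference between the two interpretations can be sketched in a few lines. This hypothetical example describes a pitch as sampled points (x, depth, height) in inches, with depth measured from the plate’s front edge toward the catcher. Home plate is modeled as the rulebook pentagon: 17 inches wide, with 8.5-inch sides and 12-inch back edges meeting at a point 17 inches deep. The track is a made-up, steeply descending eephus-like pitch. As a simplification of “any part of the ball,” the code checks the ball’s center against a zone padded by an approximate 1.45-inch radius.

```python
BALL_RADIUS_IN = 1.45   # approximate


def plate_half_width(depth_in):
    """Half-width of home plate at a depth behind its front edge, or None off the plate.

    The plate is 17 in wide for the first 8.5 in of depth; then two 12-in back
    edges taper to a point 17 in behind the front edge.
    """
    if depth_in < 0 or depth_in > 17.0:
        return None
    if depth_in <= 8.5:
        return 8.5
    return 17.0 - depth_in


def touches_zone(x_in, depth_in, z_in, sz_bot_in, sz_top_in):
    """True if a ball centered at this point overlaps the zone above the plate."""
    half = plate_half_width(depth_in)
    if half is None:
        return False
    return (abs(x_in) <= half + BALL_RADIUS_IN
            and sz_bot_in - BALL_RADIUS_IN <= z_in <= sz_top_in + BALL_RADIUS_IN)


def strike_front_plane(track, sz_bot_in, sz_top_in):
    """2D rule: only the sample at the front edge (depth 0) counts; the track must include one."""
    return any(touches_zone(x, d, z, sz_bot_in, sz_top_in) for x, d, z in track if d == 0)


def strike_volume(track, sz_bot_in, sz_top_in):
    """3D rule: any sample over the plate counts."""
    return any(touches_zone(x, d, z, sz_bot_in, sz_top_in) for x, d, z in track)


# Hypothetical tracks sampled every inch of depth; zone from 18 to 42 inches.
eephus = [(0.0, d, 44.0 - 0.5 * d) for d in range(-12, 30)]   # drops half an inch per inch
cookie = [(0.0, d, 30.0) for d in range(-12, 30)]             # flat, belt-high
print(strike_front_plane(eephus, 18.0, 42.0), strike_volume(eephus, 18.0, 42.0))  # False True
print(strike_front_plane(cookie, 18.0, 42.0), strike_volume(cookie, 18.0, 42.0))  # True True
```

The same measured pitch is a ball under the front-plane rule and a strike under the volume rule. That is exactly the kind of ambiguity that has to be settled before a machine makes the call.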
It’s important that we talk about the impact the robo zone would have on the game in real terms. Dr. Kagan did an excellent job outlining how random error would impact the called zone, so we won’t rehash that aspect here. What we will do, however, is examine real impacts to the game based on the available data today. How many pitches would change from ball to strike or vice versa?
The images above highlight the changes that would manifest should an automated system be implemented. The zones being used here correspond to a 24-inch-wide zone with the top and bottom set by stringers at each park. Pitches in red are strikes called by human umpires that would now be balls. Pitches in green are balls called by current umpires that the robo zone would call strikes. As you can see, the top and bottom of the zone would be significantly impacted, as would the outside corner for lefties (which is often left uncalled).
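The red/green comparison boils down to re-calling each pitch with a zone rule and tallying where the calls differ. This is a minimal sketch, not our actual model. It assumes each called pitch is a tuple of (umpire call, px, pz, zone bottom, zone top) in feet. It reads the 24-inch width as a limit on the ball’s center (|px| ≤ 1 ft), which is our assumption about how the ball’s width is handled.

```python
from collections import Counter


def in_24_inch_zone(px_ft, pz_ft, bot_ft, top_ft):
    """24-inch-wide zone with stringer-set top and bottom (center-based assumption)."""
    return abs(px_ft) <= 1.0 and bot_ft <= pz_ft <= top_ft


def changed_calls(pitches, in_zone):
    """Tally how an automated zone would change called pitches.

    "red": umpire strikes the zone would call balls.
    "green": umpire balls the zone would call strikes.
    """
    tally = Counter()
    for call, px, pz, bot, top in pitches:
        zone_call = "strike" if in_zone(px, pz, bot, top) else "ball"
        if call == "strike" and zone_call == "ball":
            tally["red"] += 1
        elif call == "ball" and zone_call == "strike":
            tally["green"] += 1
        else:
            tally["unchanged"] += 1
    return tally


sample = [
    ("strike", 1.2, 2.5, 1.5, 3.5),   # umpire strike off the plate -> red
    ("ball", 0.5, 2.5, 1.5, 3.5),     # umpire ball in the zone -> green
    ("strike", 0.0, 2.5, 1.5, 3.5),   # agreed strike -> unchanged
]
print(changed_calls(sample, in_24_inch_zone))
```

Running the same tally per game gives the kind of range of changed calls described later in this piece.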
These charts also highlight how differently stringers and umpires see the strike zone: umpires regularly call strikes on pitches below the bottom of the stringer zone. How might this change depending on the count? We know that the zone expands and contracts depending on whether the pitcher or the batter is ahead.
There are changes here, too. Life becomes significantly better for the player with the advantage, as pitchers who fall behind no longer have a larger zone to pitch to, while pitchers who get ahead will get all those strikes on the edge that are often called balls.
Table: Strike rate when the batter is ahead vs. when the pitcher is ahead
Once calibrated, there’s roughly a two-percentage-point difference between the way that humans call the game and the way that the robo zone would be called. That may not seem like much, but it could result in a significant increase in walks or strikeouts.
It would likely change pitcher strategy, since pitchers would be incentivized more than ever to get ahead early in the count, which would likely prompt early action from batters as well. With human umpires, the zone expands and contracts depending on the count and on whether the batter or the pitcher has the advantage, so pitchers who fall behind know that an expanding zone can help them battle back. When the zone no longer changes size with the count, pitchers will want to get ahead early so that hitters have to cover areas they haven’t traditionally had to defend when behind in the count (e.g., the top corners of the zone).
We looked at every major-league game in 2017 to see how many calls would change from ball to strike, or vice versa, under our model. The results ranged from as few as four changed calls in a game to as many as 50. Some games would be minimally affected, likely in a way that was hardly noticeable. Others would see significant changes in how they’re called. This highlights the inherent problem with a system in which nobody seems to agree on what the strike zone ought to be.
Pitchgrader’s analysis of a large body of collegiate strike zone data suggests more of the same, with umpires generally missing calls in the same locations as their major-league counterparts (albeit to different degrees). That said, the strike zone can change significantly from game to game and umpire to umpire, which suggests that implementing a robo zone in the collegiate game would have an even greater impact than in MLB.
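A toy model shows why a two-point shift can matter more than it sounds. The sketch treats a plate appearance as a walk through the ball-strike count in which every pitch is taken and called. It ignores swings, fouls, and balls in play, so it overstates both strikeouts and walks. `p_strike(balls, strikes)` is a hypothetical per-count called-strike rate, which lets you compare a fixed robo zone with a count-dependent human one.

```python
from functools import lru_cache


def strikeout_probability(p_strike):
    """P(strikeout) when every pitch is called; the only other outcome is a walk.

    p_strike(balls, strikes) returns the chance the next pitch is called a
    strike in that count.
    """
    @lru_cache(maxsize=None)
    def p_k(balls, strikes):
        if strikes == 3:
            return 1.0
        if balls == 4:
            return 0.0
        p = p_strike(balls, strikes)
        return p * p_k(balls, strikes + 1) + (1 - p) * p_k(balls + 1, strikes)

    return p_k(0, 0)


def robo_50(balls, strikes):
    return 0.50   # same zone in every count


def robo_52(balls, strikes):
    return 0.52   # zone that yields two more points of called strikes


def human(balls, strikes):
    # Hypothetical rates: zone grows when the batter is ahead, shrinks when the pitcher is.
    if balls > strikes:
        return 0.52
    if strikes > balls:
        return 0.48
    return 0.50


for name, rule in (("robo 0.50", robo_50), ("robo 0.52", robo_52), ("human", human)):
    print(name, round(strikeout_probability(rule), 3))
```

In this toy model, raising a constant called-strike rate from 0.50 to 0.52 moves the strikeout share from about 65.6 percent to about 69.3 percent, nearly double the per-pitch change. The count-dependent `human` rule shows how the expanding and contracting zone pulls outcomes back toward the middle.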
(How PitchGrader handles missed calls.)
Even with all these potential issues, there are potential solutions. A redundant system could use technology to reduce erroneous subjective calls. After all, everyone wants calls made both quickly and correctly. Even so, we would never favor a purely robo strike zone without an umpire.
Let’s assume we tried to build a reliable measurement system. It would need three elements:
- Doppler radar system
- Optical tracking system with video
- Home plate umpire
Together, the three elements would call balls and strikes by majority rule: if both machines call a strike but the umpire calls a ball, the umpire is overruled, and so on. Though this could work, it could also be an overly complex system, prone to more problems, breakdowns, delays, and errors than it solves.
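The voting rule itself is trivial to write. The hard parts are everything around it. This sketch assumes each source reports "strike" or "ball" and that a machine which is down reports None. As a hypothetical tie-break not described above, it also assumes the umpire decides when a missing machine leaves a 1-1 split.

```python
from collections import Counter


def majority_call(radar_call, optical_call, umpire_call):
    """Majority of the available calls; the umpire breaks ties.

    Each call is "strike", "ball", or None for a source that is down.
    """
    votes = Counter(c for c in (radar_call, optical_call, umpire_call) if c is not None)
    if votes["strike"] > votes["ball"]:
        return "strike"
    if votes["ball"] > votes["strike"]:
        return "ball"
    return umpire_call   # 1-1 split with one machine down: the umpire decides


print(majority_call("strike", "strike", "ball"))   # strike: the umpire is overruled
print(majority_call(None, "strike", "ball"))       # ball: tie-break goes to the umpire
```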
The various ball-tracking systems are an invaluable resource for live and postgame analysis, particularly in player development and scouting. The data they produce reveals details of the game like never before. It goes beyond sabermetrics into a new facet: process analysis rather than results analysis.
The data needs careful interpretation, consideration, and understanding to be useful. Even at 99.9 percent accuracy, the data these systems produce is an amazing and invaluable resource for player development, scouting, and general fan interest. That’s the best application of the systems that exist today.
Given all the potential problems and complications of implementing a robo strike zone, an unbiased and skilled umpire still seems to be our best choice.
Bio: Wayne Boyle is an electronics and software engineer with a career spanning 30 years. He is the founder of Pitchgrader, which makes advanced software used by top D1 and MLB teams for pro-level baseball 3D analysis, simulation, and projection. The software also turns pitched- and batted-ball tracking data into actionable insight for player development and scouting.
[i] Teams, especially colleges, need a fast and inexpensive way to run their own tests from time to time. A good method for testing strike zone accuracy is as follows. Print a 1” grid pattern on a foam board, with marks for the left and right sides of the plate and a height mark such as 10”. Secure the foam board to a stand with a wood backing, for example a 2’x4’ sheet of ¾” plywood with 2×4s attached to stand it up. Line up the foam board with the plate, make sure it stands straight up vertically, and confirm that the 10” mark is actually 10” from the ground. As each ball hits the target, mark the dent with a marker pen as #1, #2, etc., and confirm the foam board is still aligned before the next pitch. Afterward, compare the ball-tracking system’s values with the ruler-measured positions of the dents. A free calibration image that teams with ball-tracking systems can print on a 24” x 36” foam board is available from Pitchgrader here.
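Once the dents are measured, the comparison can be as simple as the sketch below. It assumes both sets of points are already in the same frame, with x in inches from the center of the plate and z in inches above the ground, and that they are listed in dent order (#1, #2, ...). The function name and output fields are hypothetical.

```python
import math


def calibration_report(tracked, measured):
    """Compare tracking-system locations with ruler-measured foam-board dents.

    tracked and measured are equal-length lists of (x_in, z_in) pairs.
    Returns the mean error (bias) on each axis and the RMS 2D miss distance.
    """
    if len(tracked) != len(measured) or not tracked:
        raise ValueError("need matching, non-empty lists of points")
    dx = [t[0] - m[0] for t, m in zip(tracked, measured)]
    dz = [t[1] - m[1] for t, m in zip(tracked, measured)]
    n = len(dx)
    return {
        "bias_x_in": sum(dx) / n,
        "bias_z_in": sum(dz) / n,
        "rms_in": math.sqrt(sum(a * a + b * b for a, b in zip(dx, dz)) / n),
    }


tracked = [(1.0, 20.0), (-2.0, 30.5)]    # hypothetical system readings
measured = [(0.5, 20.0), (-2.5, 30.5)]   # hypothetical ruler measurements
print(calibration_report(tracked, measured))
```

A consistent bias, as in this made-up example where every reading is half an inch to one side, points to a calibration offset. A large RMS with little bias points to random error.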