It is always better to try to evaluate performance than to do nothing, and that holds for almost everything. If you work for a huge telecom company (for instance) and there’s no tracking of any kind of project success or failure, that’s a major problem: ideas get floated, no one knows whether they succeed or fail, and when they go over budget and don’t work, no one learns a valuable lesson from the failure. Instead of remaining ignorant and believing that things are going well, spend the money and find out what’s really going on.

Suppose a cheap, primitive survey shows that 75% of the company’s projects fail utterly, and the 25% that succeed at all run, on average, 200% over the worst-case budget estimate and 50% over schedule. Sketchy as it may be, that’s useful knowledge: it’s immediately applicable and helps make the company better.

Now, no attempt to measure what’s going on will be perfect. Measuring budget alone means ignoring quality and the ability to meet deadlines; you need a more holistic view to track everything. But as long as you’re conscientious about what you’re doing and look to improve constantly, you’ll be much better off.

Along these lines, all we’ve had to evaluate umpiring are raw stats. If an umpire’s games tend to produce fewer walks and more strikeouts, it’s a good bet that he’s being generous with his personal strike zone. When that happens, hitters are forced to swing at bad pitches or get rung up on called third strikes. There are obvious problems with judging umpires this way: if one ump happens, by luck of the schedule, to work a lot of games started by good pitchers, he’ll look like more of a pitcher’s umpire.

Computerized ball-and-strike calling, even in its most primitive forms, is potentially a great tool for evaluation and a step toward finally getting the strike zone settled. Even if it’s just a set of overhead cameras mounted in domed stadiums, looking straight down on the plate and detecting only really bad calls, if it turns out a report like…

Umpire One: 10% of strikes 1″ off plate, 5% of strikes 2″ off plate, 1% of strikes 3″ or more off plate.
Umpire Two: 5% of strikes 1″ off plate, 1% of strikes 2″ off plate, 0% of strikes 3″ or more off plate.

…you’ve learned something and can go to umpires calling the Tom Glavine Personal Strike Zone and say “knock it off.” You’re still vulnerable to the next tier of bad calls, but reining in the worst ones helps.
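To make the hypothetical report concrete, here is a minimal Python sketch of how such a tally might be produced. Everything in it is an assumption for illustration: the input format (one record per called strike, giving the umpire and how many inches outside the edge of the plate an overhead camera placed the ball), the bucket edges, the function name `strike_zone_report`, and the sample data are all hypothetical, not part of any real tracking system.

```python
from collections import defaultdict


def strike_zone_report(called_strikes):
    """Tally how far off the plate each umpire's called strikes were.

    Hypothetical input: an iterable of (umpire, inches_off) pairs, one per
    called strike, where inches_off is how far outside the edge of the plate
    the overhead camera placed the ball (0 or less means it crossed the plate).

    Returns {umpire: (pct_1in, pct_2in, pct_3in_plus)}, each a percentage of
    that umpire's called strikes.
    """
    totals = defaultdict(int)
    # Buckets: [1" to under 2", 2" to under 3", 3" or more] off the plate.
    off_plate = defaultdict(lambda: [0, 0, 0])
    for umpire, inches_off in called_strikes:
        totals[umpire] += 1
        if inches_off >= 3:
            off_plate[umpire][2] += 1
        elif inches_off >= 2:
            off_plate[umpire][1] += 1
        elif inches_off >= 1:
            off_plate[umpire][0] += 1
    return {
        umpire: tuple(100 * count / n for count in off_plate[umpire])
        for umpire, n in totals.items()
    }


# Made-up sample data, purely to show the output shape.
sample = [("Umpire One", 0.0)] * 7 + [
    ("Umpire One", 1.5),
    ("Umpire One", 2.2),
    ("Umpire One", 3.0),
] + [("Umpire Two", 0.0)] * 4

for umpire, (one, two, three) in strike_zone_report(sample).items():
    print(f'{umpire}: {one:.0f}% of strikes 1" off plate, '
          f'{two:.0f}% 2" off plate, {three:.0f}% 3" or more off plate')
```

The point of the sketch is how little it needs: a single horizontal measurement per called strike is enough to separate an umpire who occasionally misses by an inch from one who routinely gives three.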

QuesTec is much more sophisticated than that, though.

The worst case presented by the umpires is that when QuesTec disagrees with the umpire, QuesTec is wrong at least half the time, and that it’s only accurate to within six inches. Here’s the thing: that can’t possibly be true. The plate is 17″ wide, more or less, so a pitch is a strike if it passes within 8.5″ of the center. Make a couple of quick assumptions: QuesTec misplaces the ball 3″ inside its true position as often as it gets it right, and as often as it misplaces it 3″ outside. Then QuesTec will call a middle-of-the-plate strike a strike every time, and its error can only change the call when the ball is between 5.5″ from dead center (where three extra perceived inches outside make it barely a ball) and 11.5″ (where taking three inches back inside barely makes it a strike). At a foot off the plate, it doesn’t matter which way QuesTec is wrong. Even within that range, most of QuesTec’s errors won’t affect the call: on a ball just outside, QuesTec would be right on or put the ball further outside, and it’s still a ball. For QuesTec to disagree with a correct umpire 50% of the time, it wouldn’t just have to be inaccurate; it would need some kind of easily diagnosed systemic malfunction that called every third pitch the opposite of what it was, kind of like John Shulock.
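The error-model arithmetic can be checked with a short Python sketch. It is a deliberately crude model under stated assumptions: it looks only at the horizontal dimension, takes the plate as 17″ wide (a strike is within 8.5″ of center), ignores the width of the ball and the vertical zone, and, following the worst-case assumption above, treats a 3″ inside miss, a correct reading, and a 3″ outside miss as equally likely. The function names `wrong_call_rate` and `uncertain_band` are hypothetical.

```python
# Assumption: a 17" plate, so a pitch is a strike when its horizontal offset
# from the center of the plate is at most 8.5". Ball width and height ignored.
HALF_PLATE = 8.5


def is_strike(offset):
    """True if a pitch at this horizontal offset (inches from center) is a strike."""
    return abs(offset) <= HALF_PLATE


def wrong_call_rate(offset, errors=(-3, 0, 3)):
    """Fraction of equally likely tracking errors that flip the true call.

    The default errors model the worst case in the text: the system reads
    the ball 3" inside, exactly right, or 3" outside, each equally often.
    """
    truth = is_strike(offset)
    flips = sum(1 for e in errors if is_strike(offset + e) != truth)
    return flips / len(errors)


def uncertain_band(max_error):
    """Offsets from center where an error of up to max_error inches can change the call."""
    return (HALF_PLATE - max_error, HALF_PLATE + max_error)


for offset in (0, 4, 6, 8, 9, 11, 12, 14):
    print(f'{offset:>2}" from center: wrong call {wrong_call_rate(offset):.0%} of the time')

print("3-inch error band:", uncertain_band(3))
print("1-inch error band:", uncertain_band(1))
```

In this model, pitches near the middle of the plate or 12″ or more from center are never miscalled, and even inside the uncertain band only one reading in three flips the call. Shrinking the error to 1″ narrows that band from 5.5″ to 11.5″ down to 7.5″ to 9.5″ from center.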

That’s pretty much the worst case: if you reduce the margin of error in locating the ball, the rate of agreement with a defined strike zone soars. The umpires’ representatives have damaged their own credibility here, like anyone who tries to convince you of something with a gaudy and obviously false statistic (I remember someone in college telling me that one in three American men would commit rape in their lifetime, for instance, which is ludicrous).

At the same time, evaluating job performance makes people nervous. The guy across the aisle who comes in at 10 or 10:30 every day, goes for a two-hour run at 11:30, and leaves at 4 after that exhausting day doesn’t want his productivity evaluated if he can avoid it. Introducing evaluation into the process will almost always result in resistance and efforts to sabotage it.

Umpires have fought evaluation. They’ve seized on some excellent-sounding arguments (debunked here) about how QuesTec forces them, in parks that have the system, to call games differently from their true strike zone. They’ve blamed questionable calls on QuesTec, gotten pitchers to lash out, and generally made a scapegoat of the system.

It’s disappointing. Umpiring has made great strides in the last couple of years. The greatest move was the purge-without-blood where a ton of baseball’s worst umps all resigned in a particularly boneheaded labor maneuver, but it goes beyond that. You’ll now see more umps run out to make better catch-or-trap calls, or watch a line drive hit the wall or the painted line. They hustle more, and seem to generally show better temperaments. I sometimes miss the Ron Luciano days when umpires had personalities, but the umpiring crews today are so much better than the old crews that it’s a compromise we’re all willing to make.

I’d have liked to see the umpires try to work with QuesTec a little more constructively. Sandy Alderson’s a smart guy, and he wants what’s best for baseball. There’s a compromise to be found. That said, Major League Baseball hasn’t done anything to make the process easier. MLB would have been much better off if it had worked out a partnership with QuesTec, with trials of different systems and a lot more feedback on which pitches the devices handled well and which they missed. In trials, it’s a lot easier to figure these things out: you can set up 60 cameras if you want, to evaluate the different two- and three-camera set-ups, and watch tons of video on disputed calls. If you were thorough, you could go so far as to test how operator training and pay affect calls (as umpires maintain they do). But as usual, it’s too late to undo the damage caused by half-assed implementation, so now we have no choice but to look on the bright side.

Any step toward improving calling balls and strikes is for the better. Even if the umpires’ worst claims about QuesTec are correct, the system would still be useful. Every baseball play starts with a pitch. It’s baseball’s line of scrimmage, and while it’s historically been inconsistent, that’s all the more reason for baseball to be working to improve it, and by extension, the game.

Thank you for reading

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.
