If you were at the MIT Sloan Sports Analytics Conference or on Baseball Twitter this weekend, you’ve already seen this video, but you probably won’t mind watching it once more.
That’s a sneak peek at Major League Baseball Advanced Media’s new play-tracking system, a marriage between radar (via Trackman) and camera (via ChyronHego) technology that promises to measure every movement that takes place on a baseball field. The as-yet-unnamed system, which was announced by MLBAM CEO Bob Bowman and CTO Joe Inzerillo at Sloan on Saturday, will be functioning at three ballparks—Miller Park, Target Field, and Citi Field—for the full 2014 season, with the rest theoretically rolling out by Opening Day 2015.
It’s been almost five years since we first heard about the possibility of an all-seeing system installed in every major-league park. In July of 2009, Bowman said he “hoped to have meaningful data flowing…from all 30 stadiums in 2010,” so some skepticism is understandable. But there’s an important difference between the 2009 claims and the ones from this weekend: All we’d seen before Saturday were mockups and proofs of concept. The numbers in the Heyward video, though, are the real thing—and since that play took place in July, we know that this presentation was in the works for a while. “Those are actual calculated data points from the plays…not mock-up values,” says Cory Schwartz, VP of Stats at MLBAM. “That thing’s operational,” added Lando Calrissian.
This sort of information doesn’t stimulate every fan’s pleasure centers to the same degree. It’s possible to appreciate the difficulty and aesthetic appeal of Heyward’s catch without knowing all the numbers involved, and for some people who still suffer from painful flashbacks to math class, it might not be welcome news that baseball broadcasts are about to start looking like word problems. (“If Jason Heyward accelerates at 15.1 ft/s² and reaches a top speed of 18.5 mph, how fast must his first step be for him to catch a ball with a hang time of 4.0 seconds at a distance of 83.2 feet?”)*
But you’re reading Baseball Prospectus, so you’re probably not one of those people. You’re probably one of the people for whom this is just about the best thing imaginable. So let’s discuss what we learned about the system, what we still don’t know, and what it all might mean for the future of baseball analysis. (Note: You can watch the full presentation and Q&A on YouTube, though you will have to pay or sign up for a free trial.)
How Does it Work?
For a while now, we’ve been waiting for FIELDf/x, the full-field extension of Sportvision’s PITCHf/x, HITf/x, and COMMANDf/x ball-tracking technologies. It now looks like it’s never going to get here. Sportvision’s system was completely camera based, which worked well for some requirements but not as well for others. As a result, it seems to have been surpassed by a more multi-disciplinary approach. Just as baseball teams (and baseball websites) have found that a blend of stats and scouting leads to better information and better decisions, the engineers in charge of digitizing every action on the field have found that blending two types of data collection leads to higher-quality capture.
The new system uses two camera arrays, each consisting of 3-6 cameras (depending on ballpark geography), installed 15 meters apart. The offset between the two arrays gives the system stereoscopic vision, much like human eyes, enabling it to judge depth and movement with great precision. But the new system also uses radar, and the combination of the two technologies further enhances its abilities. As Physics of Baseball researcher Alan Nathan wrote, “Video is the natural technology for tracking players on the field…Radar is the natural technology for tracking the batted ball.” Thus, Nathan concludes, “the merger of the two technologies, radar for the ball and video for the players, looks like it takes full advantage of the strengths of both.”
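To make the stereoscopic idea concrete, here is a minimal sketch of depth-from-disparity for an idealized pair of parallel pinhole cameras. The 15-meter baseline comes from the description above; the focal length, the pixel coordinates, and the function name are hypothetical, and nothing here is meant to represent the vendors’ actual software.

```python
def depth_from_disparity(x_left_px, x_right_px, focal_length_px, baseline_m=15.0):
    """Distance (meters) to a point seen by two parallel, rectified cameras.

    Idealized pinhole model: depth = focal_length * baseline / disparity.
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity


# Hypothetical numbers: a 2,000-pixel focal length and a fielder whose image
# shifts 300 pixels between the two views.
print(depth_from_disparity(1150.0, 850.0, focal_length_px=2000.0))  # 100.0
```

In this simplified model, a wider baseline produces a larger pixel shift for the same distance, which is one reason a 15-meter separation could help resolve depth across a full outfield.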
What Are the Obstacles?
Despite the dazzling presentation on Saturday, the system isn’t quite ready for primetime. There are still at least three potential problems that have to be dealt with before anyone at MLBAM will declare their mission accomplished.
The first is that the system is still measuring center of mass, not the player’s extremities. In other words, it can tell you if a runner’s body beat the ball to the bag, but it can’t say whether he got his hand in under a tag. The limiting factors, according to Inzerillo, are processing power and pixels: It takes a lot of both to resolve each figure in such fine detail. Given that cameras continue to add pixels and computers continue to add processing power, Inzerillo said, it’s “an inevitability” that “we will be able to eventually resolve body position down to where the guy’s fingertips are.” Of course, plenty of analysis is possible with center-of-mass tracking alone, as we’ll get to below.
Another obstacle is “occlusion,” the technical term for one player passing in front of another. When occlusion (or a collision) strikes, the computer can get confused, losing track of which player was which before the occlusion occurred. Inzerillo mentioned that this is less of a problem in basketball, both because the players are bigger and cover less ground and because cameras can be mounted overhead in indoor arenas. But he also said that he and his team have developed physics models that ensure that when the computer does have to “guess,” it picks the right player 80-90 percent of the time (up from 60-70 percent). When it guesses incorrectly, the operator can manually correct the record, and the computer will backdate its tracking data with the correct IDs attached to each player.
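Inzerillo didn’t describe the physics models in detail, so what follows is only a rough illustration of the general idea, not MLBAM’s method: assume each player keeps moving at his last known velocity, predict where he should be once the players separate, and pick the labeling that best fits the new detections. The function name, the data layout, and all the numbers are hypothetical.

```python
from itertools import permutations
from math import hypot


def reassign_ids(tracks, detections, dt):
    """Match unlabeled detections to known players after an occlusion.

    tracks: {player_id: (x, y, vx, vy)} last known state, in feet and ft/s
    detections: list of (x, y) positions seen once the players separate
                (at least one detection per tracked player)
    dt: seconds elapsed since the last known state
    Returns {player_id: (x, y)} for the assignment with the smallest total
    distance between predicted and observed positions.
    """
    ids = list(tracks)
    # Constant-velocity prediction: an assumption made for this sketch.
    predicted = {
        pid: (x + vx * dt, y + vy * dt) for pid, (x, y, vx, vy) in tracks.items()
    }
    best_cost, best = None, None
    for perm in permutations(detections, len(ids)):
        cost = sum(
            hypot(predicted[pid][0] - px, predicted[pid][1] - py)
            for pid, (px, py) in zip(ids, perm)
        )
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, dict(zip(ids, perm))
    return best


# Hypothetical play: a runner sprinting past a nearly stationary fielder.
tracks = {"runner": (0.0, 0.0, 25.0, 0.0), "fielder": (5.0, 1.0, -2.0, 0.0)}
detections = [(4.5, 1.2), (12.0, 0.3)]
print(reassign_ids(tracks, detections, dt=0.5))
# {'runner': (12.0, 0.3), 'fielder': (4.5, 1.2)}
```

Trying every permutation is only practical for the two or three players involved in a typical crossing. A guess like this can still be wrong when both players change direction while hidden, which is where the manual correction described above would come in.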
The third problem is simply having a hard drive and a database big enough to house all the data and nimble enough to allow for analysis. The radar component of the system samples the flight of the ball 20,000 times per second, and the camera component records the position of every player 30 times a second. That data, in addition to video stored for validation purposes, adds up to seven terabytes of information per game. In the future, MLBAM might not need to store the video, but for now, it’s a vital troubleshooting tool.
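A quick back-of-the-envelope calculation gives a sense of scale. The sampling rates and the seven-terabyte figure come from the presentation; the three-hour game length, the count of 13 people tracked, and the use of decimal terabytes are assumptions made purely for illustration.

```python
CAMERA_HZ = 30            # player positions per second (from the presentation)
RADAR_HZ = 20_000         # ball samples per second (from the presentation)
BYTES_PER_GAME = 7e12     # seven terabytes, assuming decimal units

GAME_SECONDS = 3 * 60 * 60   # assumption: a three-hour game
PEOPLE_TRACKED = 13          # assumption: 9 fielders, a batter, up to 3 runners

player_samples = CAMERA_HZ * GAME_SECONDS * PEOPLE_TRACKED
radar_samples_per_fly_ball = RADAR_HZ * 4   # a ball with a 4-second hang time
avg_bytes_per_second = BYTES_PER_GAME / GAME_SECONDS

print(f"{player_samples:,} player positions per game")                   # 4,212,000
print(f"{radar_samples_per_fly_ball:,} radar samples for one fly ball")  # 80,000
print(f"{avg_bytes_per_second / 1e6:.0f} MB per second on average")      # 648
```

Under these assumptions, a few million coordinate samples would occupy far less than seven terabytes even at dozens of bytes apiece, which suggests that the stored video accounts for most of the total.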
Claudio Silva, a PhD and professor of computer science and engineering at NYU Polytechnic School of Engineering who helped MLBAM design software for the system, explained, “One of the things we had to do to be certain [that the data reflects actual gameplay] was to design a whole validation scheme where we recorded our own video and designed algorithms that would independently generate some of the metrics to be compared to the data that we were getting out of the vendors.” Silva went on to say, “One of the goals of what we wanted to achieve was to virtually recreate the game using the geometric data.”
If the system is functioning properly, it should be able to reconstruct all of the events of a game by looking only at the data. To test this, MLBAM has used simplistic “LEGO man” renderings of events like this:
As well as more sophisticated representations like this:
If the system records a play incorrectly, MLBAM can tinker with the algorithms until the kinks are worked out. This year, they’ll be sharing the data with baseball operations departments for an additional level of validation. And if all goes well, we might see some of it. Which brings us to…
What Don’t We Know?
We’re still missing the most important piece of information: When, and to what extent, will we get to play with it?
MLBAM’s representatives danced around answers to those questions, possibly because they aren’t yet sure of the answers themselves.
Bowman began his presentation by stating that the technology would be available “for baseball operations and some fan use for 2014.” Later, he said, “This year, fans will be able to see this data and these videos.” Answers don’t get much more vague than that.
In the Q&A portion of the presentation, Inzerillo opened up a bit more:
If you look at what MLBAM has done historically on this front, we’ve made a lot of data available and we’ve had really good collaboration with the community. I would expect that whatever policy we come up with as far as dissemination, it’s going to live within the boundaries and the guidelines that we normally have done.
That sounds promising, although it’s worth noting that releasing nothing to the public (as has been very close to the case with HITf/x) is also within the boundaries of what MLBAM has done. And if they decide to do that again, they’ll be well within their rights.
As BP Director of Technology Harry Pavlidis described it to me, there are four levels of data processing involved in a project of this scope:
1) Raw Data
2) Calculation-Friendly Data
In this case, level one corresponds to raw camera/radar information, or pixel data. Level two, which Trackman would presumably provide to MLBAM, is a usable stream of information: timestamps, events, player IDs. Level three translates the information from level two into something more easily understood: “Player X took Y seconds to get to the ball.” Finally, level four adds insight to the level-three info: “Player X took Y seconds to get to the ball, and his route was inefficient.” The numbers in the Heyward video are a mix of level three and level four.
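As a rough illustration of how level-two data could be turned into level-three and level-four statements, the sketch below summarizes a made-up fielder track. The record layout and coordinates are hypothetical, and it assumes that “route efficiency” means straight-line distance divided by distance actually traveled, a definition MLBAM has not confirmed.

```python
from math import hypot

# Hypothetical level-two records: (seconds since contact, x feet, y feet)
# for one fielder, ending where he catches the ball.
track = [
    (0.0, 0.0, 0.0),
    (1.0, 9.0, 12.0),
    (2.0, 30.0, 40.0),
    (3.0, 54.0, 50.0),
    (4.0, 80.0, 60.0),
]


def summarize(track):
    """Return (elapsed seconds, feet traveled, route efficiency)."""
    elapsed = track[-1][0] - track[0][0]
    # Sum the straight segments between consecutive samples.
    traveled = sum(
        hypot(x2 - x1, y2 - y1)
        for (_, x1, y1), (_, x2, y2) in zip(track, track[1:])
    )
    # Assumed definition: direct distance divided by distance traveled.
    direct = hypot(track[-1][1] - track[0][1], track[-1][2] - track[0][2])
    return elapsed, traveled, direct / traveled


elapsed, traveled, efficiency = summarize(track)
# Level three: a human-readable restatement of the stream.
print(f"Player X took {elapsed:.1f} seconds and covered {traveled:.1f} feet.")
# Level four: the same facts with a judgment attached.
print(f"His route was {efficiency:.0%} efficient.")
```

With a level-two dump, anyone could compute numbers like these, or define efficiency differently. With only level-three and level-four output, the definitions are whatever the provider chose.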
Ideally, we’d get a data dump of level-two info, as we have for PITCHf/x. With that information, any avenue of analysis would be open. But it’s possible that we’ll have to settle for seeing some pre-processed level-three and -four output in GameDay or on TV broadcasts instead of doing any number-crunching as a community. While more graphics like those in the Heyward video would undoubtedly make broadcasts better, we might see some stagnation in online analysis if that’s where the access ends, especially if PITCHf/x is phased out in favor of the new system after 2014, in which case we might have less access to data than we do now, and an even greater knowledge gap between front offices and fans than already exists. This could go either way.
What Could We Do With It?
We’ll have to wait to see whether we’ll get our grubby hands on the data, but whoever has access to it will be able to move miles beyond where we are now with publicly available statistics. With a fully functional combined camera/radar system, an analyst (or team of analysts) could design a completely process-based player rating system in which outcomes—whether an outfielder catches a fly ball, whether a hitter’s batted ball falls for a hit—would be almost irrelevant.
Some teams are already doing this with HITf/x by focusing on speed off the bat and launch angle instead of hits and outs or even groundball rate and fly ball rate. If a batter is hitting the ball hard at an angle that typically leads to success in the long term, it’s not a cause for concern if he happens to hit a few line drives right at opposing fielders. Similarly, if a fielder is getting good jumps, acceleration, and top speed and taking efficient routes, it would be easy to excuse the occasional bobble or a change in the distribution of balls hit in his direction. Over three years ago, Greg Rybarczyk (who was recently hired by the Red Sox) and Sportvision’s Kate McSurley opened our minds to new possibilities by proposing the framework for a process-based defensive metric based on data we didn’t yet have. Now we (or some lucky baseball operations employees) are on the verge of applying that framework to real information.
We’ll also be able to analyze aspects of the game that we’ve largely overlooked. For instance, how big a lead does each player typically take, and what’s the optimal lead length?
How efficient is a baserunner’s route, and how much can its efficiency be optimized through instruction?
Would Omar Infante have scored on this play if he’d reached his top speed sooner? Or if the outfielder (not shown) had taken more than 1.1 seconds to transfer and release the ball?
How much of an “effective velocity” boost do certain pitchers gain from an extra-long extension?
(Note, just because it's kinda cool, that the ball left Fielder’s bat traveling faster than it did when it left Harvey’s hand.)
And how does the spatial map of a plus fielder differ from that of a subpar one?
For now, we’re content just to look at the pretty pictures. But there’s no end to the questions we can ask, and—theoretically, at least—no end to the questions we could answer. Of course, just because the data will be provided to every team doesn’t mean they’ll all be on an equal footing when it comes to leveraging it. Teams that are better equipped to handle and extract insights from large amounts of data will have an edge; expect to see the internet continue to be picked clean of its most astute analysts. The more a team has already invested in its infrastructure and analytics department, though, the less it might want to see the data go public and jeopardize whatever advantage it’s already carved out.
Another question to consider: What will this new system mean for traditional scouting? Some teams have already shifted resources away from advance or pro scouting as advances in technology have made it possible to fulfill some of the functions of a scout without hitting the road. When someday this system is installed in every professional ballpark, will pro scouts be reassigned to amateur coverage? When the system is installed in every college ballpark, will they be reassigned to the indy leagues? Will they eventually run out of places where the system isn’t installed? And if so, can the human eye add anything that isn’t captured by a complete record of where everyone was at all times?
Yes, probably. There will always be some role for human evaluators, whether it’s assessing makeup or determining whether an efficient fielder positions himself or has some help from his coaching staff. Still, it’s hard to imagine that that role won’t be reduced as baseball continues to embrace Big Data. In the final part of my diary series from Scout School, I wrote, “I’m not sure I’d encourage my grandson to go into scouting; it hasn’t happened yet, but in the long run, technology may make teams less reliant on organic eyes.” It’s going to take decades, but it’s hard not to see Saturday’s presentation as a step toward that eventual outcome. Since before “Beer or tacos?” we’ve been talking about the benefits of drawing upon both statistical and scouting information. But the distinction between the two is about to disappear.
During his presentation, Bowman said, “We think it’s going to change the way we argue about the game, but we don’t think it’s going to settle any debates. We hope it starts more.” That’s one way to make the system seem less threatening, but I’m not so sure it’s true. This is a system that’s designed to settle debates, and that's not something to be scared of. The debates it does settle will be the ones we would never have ended without it. And the discussions it starts will break ground that otherwise would have been buried.
*The video provided the answer—.02 seconds—but the presentation yielded another interesting tidbit: Heyward’s quick first step was just six degrees off the perfect line to the ball. No wonder his route was 97 percent efficient.