August 29, 2010
Between The Numbers
The PITCHf/x Summit Quasi-Liveblog
I have seen the future, and its name is FIELDf/x. OK, so we kind of knew that. But today, FIELDf/x started to seem a lot more real, and even more exciting than I’d imagined. You may have noticed that BP had a man on the scene at Sportvision’s PITCHf/x summit whose liveblog was actually live. So why am I doing this, when Colin already did? Well, for one thing, Colin arrived fashionably late, and I was all over those first 14 minutes that he missed. For another, his computer died before a lot of the fun started. And for still another (this is a third reason, now), I thought it might be fun to do a Simmons-style quasi-liveblog (written live, published later) that would free me from worries about frequent updates, and allow me to write at length. Most likely that length turned out to be a good deal longer than anyone has any interest in reading, but if you’re determined to catch up on the day’s intriguing events without sitting through eight hours of archived video, you’re welcome to peruse what lies below. If you’d like to follow along, here’s an agenda, and here’s where you should be able to find downloadable presentations in the near future.
Here we are in sunny California, home of the cutest girls in the world, if the Beach Boys are to be believed (I gather there’s also a more recent chart-topper that expresses a similar view). Okay, so by “we,” I mean the attendees at the 3rd (annual?) Sportvision PITCHf/x summit, held at the Westin San Francisco in—you guessed it—San Francisco. I, on the other hand, am watching from the other end of the continent, via a webcast that dubiously claims to be “hi-res,” despite being blurry enough to make deciphering text an adventure (I guess “hi-res” is relative, in the sense that there are even lower resolutions at which it could’ve been streamed). And sure, maybe the Beach Boys weren’t thinking of this particular gathering when they extolled the virtues of California’s beach bunnies. But never mind that—it’s a beautiful Saturday afternoon here in New York, and how better to spend it than to watch a video of some fellow nerds talk about baseball in a dark room some 3,000 miles away? Well, to describe the experience at the same time, of course. Let’s get this quasi-liveblog started.
12:00: We have video, but no audio. Fortunately, that PITCHf/x summit logo hanging behind what looks like a podium tells me that I’m watching the right webcam. That’s a relief—you never know what you’ll see on the other end of a webcam these days.
12:02: More from Tommy (maybe Tommy should be the one writing this thing?), this time via Twitter: “I have a lot of respect for the other 13 people who are streaming the Sportvision PITCHf/x conference right now.” Maybe they shouldn’t have put the viewer count on the website—I might have felt a little better about this whole exercise if I could’ve kept pretending that we worldwide webcast viewers numbered at least several thousand strong.
12:02: An Important Sportvision Guy is talking (note: not actual title). I gather from the agenda that it’s one Ryan Zander, offering “Welcome and Introductions.” He tells us that Sportvision’s mission is to “Develop data acquisition technologies that create the most accurate and robust objective data set,” or something to that effect (the slide changed before I could finish plagiarizing). I thought they were only in it to improve the caliber of article that pops up in my Google Reader account, so this comes as news to me.
12:03: Someone just turned the volume way up without warning. I’m wearing headphones and my ears may be bleeding, but I’m not letting go of this keyboard. No pain, no gain.
12:04: We’re watching a cbsnews.com video about Sportvision’s past endeavors. My main takeaway: former Sportvision CTO Marv White (who now works for ESPN) has excellent taste in shirts. The one I just saw would not have looked out of place under a Craig Sager suit.
12:05: An archived video version of Bruce Bochy warns of paralysis by analysis. Noted.
12:06: It’s really dark in this room. I thought California was supposed to be sunny?
OK, now the good stuff: a brief overview of FIELDf/x. When is it going to be available? From a tech side, the intent is a commercial launch by the start of the 2011 season, and they’re on track for that. From the business side, there’s nothing they can announce yet. But…
Who will have access? Intent is to have the data available for all the people who want it, including teams, broadcasters, and analysts. I don’t know that I qualify as any of those, but I’d like some FIELDf/x data, too. Sportvision wants the data to be as ubiquitous as PITCHf/x. That might be the best news we get all day.
The Giants provided data that was captured in AT&T Park (the first test ground for FIELDf/x) this season to the presenters for the purposes of this summit. So, you know, a Giant thank-you to them.
12:09: We’re up to 28 viewers! This summit may bring the internet to a standstill yet.
12:10: Dr. Alan Nathan in the house. Alan looks like a physics professor. Oh wait, he is a physics professor! He also looks like Alan Alda (but isn’t Alan Alda). Nathan gets a round of applause for driving a Prius. OK, now this feels a bit more like California.
We’re going to get most of the non-FIELDf/x-based presentations first, with the exception of John Walsh, who’s presenting from Italy, where it’s already getting late.
Infield Defense with FIELDf/x........... John Walsh
12:11: Nathan: “None of us, by the way, has ever seen John Walsh, so we’re going to see him for the first time.” Even from 3,000 miles away, the tension is palpable.
12:12: OH GOD HE’S HIDEOUS! No, I’m kidding. John Walsh is quite a handsome fellow, even over the internet and from thousands of miles away. “I’m very sorry that I couldn’t make the trip from Italy,” Walsh says. Sure you are, John. It must be rough, having to be in Italy and all.
12:13: We’re still seeing John’s face instead of his presentation. As I’ve just stated, I have nothing against John’s face, but I’m excited about seeing his presentation, too. Okay, there it is. We’re ready to go.
12:14: Walsh’s focus is on the potential of FIELDf/x, and the kinds of information he thinks we can learn from it. He’ll be discussing infield defense, specifically grounders.
12:14: “Ball events”: for every pitch, hit, catch, and throw, the exact time at which it occurred is recorded by the FIELDf/x system, down to 1/15th of a second.
Data includes all players involved in the event—both fielders and hitters. Positions of the players are monitored at every moment of the game, which equates to something like 50,000 measurements of each player on the field, for each individual game.
12:16: Walsh mentions that one center fielder he examined covered 8 miles in a single game, including all movements made. That’s the coolest thing I’ve ever heard, and we’re only 17 minutes in. No wonder you don’t see a lot of chubby center fielders. Okay, so maybe that’s because chubby players can’t hack it in CF, not because players can’t get fat running 8 miles a game. At least, not unless they’re Andruw Jones in Los Angeles.
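For the curious, here's roughly how a figure like that 8 miles could fall out of position data. This is a minimal sketch, not Walsh's actual method: it assumes a player's track is just a list of (x, y) positions in yards, and the names `distance_covered`, `yards_to_miles`, and the toy `track` are made up for illustration. It simply adds up the straight-line hops between consecutive samples.

```python
import math

YARDS_PER_MILE = 1760.0


def distance_covered(samples):
    """Sum the straight-line distance between consecutive (x, y) samples.

    `samples` is an assumed format: a list of (x, y) positions in yards,
    ordered by time. Real tracking data would likely need smoothing first,
    since measurement jitter inflates the total.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        total += math.hypot(x1 - x0, y1 - y0)
    return total


def yards_to_miles(yards):
    """Convert a distance in yards to miles."""
    return yards / YARDS_PER_MILE


# Hypothetical four-sample track for one fielder (yards).
track = [(0.0, 100.0), (3.0, 104.0), (3.0, 104.0), (9.0, 112.0)]
total_yards = distance_covered(track)
total_miles = yards_to_miles(total_yards)
```

With something like 50,000 samples per player per game, even tiny wobbles in the measured position would add up, so a real version would probably filter out the noise before summing.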
12:18: I’m working with dual monitors, but my computer won’t let me maximize the video on one screen and type on the other. I don’t expect that anyone out there cares, but I want the universe to know of my suffering.
12:19: Walsh is explaining his method for plotting plays. Dots mark fielders, and stop being plotted when the ball is caught. Arrow vectors indicate angle of the hit, from HITf/x. The presenters didn’t have the coordinates of the ball (it wasn’t included in the current dataset, though it will be eventually), so they had to extrapolate its location from other information.
12:21: We briefly lost John, but he’s back now, although a poltergeist appears to be in control of advancing his PowerPoint slides.
12:23: Now we’re seeing plots of plays with the infield in, and the shift on. The fielder markers are where we’d expect them to be in those situations. That’s not shocking or anything, but still, the backs of the middle-aged heads appear to be eating it up. This is the first time we’re seeing this kind of stuff, so I can’t blame them.
12:26: John points out that it’s interesting that we can see what players away from the ball are doing—are they making any movements toward the ball, covering, etc.
12:26: Can we see infielders leaning one way before each ball is hit? We hear in interviews that fielders want to know what the pitch type is so they can lean in one direction, but do they actually do it?
In the first play John looks at, the 3B leaned to his left (into the shortstop hole) and had to come back in the other direction for a hard-hit ball to his right, but fielded it anyway and started a DP. John finds that the 3B often leans toward the shortstop, while the shortstop often takes a step in toward the batter. We can also measure how long it takes each fielder to react to the ball being hit.
12:31: Now showing some routes taken by the fielders toward ground balls. Walsh observes that routes may initially appear suboptimal, but might not be when you take into account the fact that fielders need to get into good throwing position before gloving the ball.
12:33: You can do more with FIELDf/x than just make a bunch of pictures (not that the pictures aren’t pretty). Average hit speed, time the fielder has to catch the ball, average distance covered. This isn’t anything shocking, but it’s pretty cool that we can quantify it. Interesting stats: 3B have, on average, 1.48 seconds to field the ball—they don’t call it the “hot corner” for nothing. Middle infielders have 2 seconds or more. That half-second advantage allows them to cover more ground.
12:35: I’ve been informed that it’s not actually mother’s-basement black in the Westin San Francisco—that’s just how it looks on the webcast. I’m a little disappointed—I’d prefer to have heard that they had to turn the overheads down when several of the attendees covered their eyes and ran screaming from the unfamiliar light. I blog about baseball and have lived in a windowless basement, so I’m allowed to make that joke.
12:36: Now we’re being treated to plots of ground covered vs. catch time for 4 different positions. On the x-axis, we see the time it took the fielder to get to the ball. On the vertical axis, we see how much ground was covered. As John points out, those values should be strongly correlated, but by comparing plots between players, we can see who’s making the rangiest plays.
12:37: What about throwing the ball? The time of each catch and release is recorded, so we can find out how long an average shortstop takes to get a throw off (roughly a second on average, for the ground balls in this sample). Most throws are routine, and players aren’t trying to throw as fast as they can, so an average speed value doesn’t tell us how quickly a player could throw to first base if he wanted to. However, on potential double plays, players are throwing as hard as possible, so we can glean a lot from looking at those.
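One way to turn a ground-covered vs. catch-time plot into a ranking is to fit a line through all the plays and see which ones sit farthest above it. To be clear, this is my own sketch of the idea, not something Walsh presented: the ordinary least-squares fit, the field names (`fielder`, `time_s`, `ground_yd`), and the three sample plays are all assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def range_residuals(plays):
    """Return (fielder, residual) pairs: ground covered beyond the fitted
    expectation for the time available. Positive means rangier than average."""
    xs = [p["time_s"] for p in plays]
    ys = [p["ground_yd"] for p in plays]
    slope, intercept = fit_line(xs, ys)
    return [
        (p["fielder"], p["ground_yd"] - (slope * p["time_s"] + intercept))
        for p in plays
    ]


# Hypothetical plays: time to reach the ball (s) and ground covered (yd).
plays = [
    {"fielder": "A", "time_s": 1.0, "ground_yd": 4.0},
    {"fielder": "B", "time_s": 2.0, "ground_yd": 10.0},
    {"fielder": "C", "time_s": 3.0, "ground_yd": 12.0},
]
rangiest = max(range_residuals(plays), key=lambda pair: pair[1])
```

A straight line is the simplest possible baseline here. With real data you'd want far more plays per fielder, and probably a separate fit for each position, since third basemen and shortstops get very different amounts of time.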
12:38: The total time it takes to complete a double play turns out to be roughly 4 seconds. Life-altering information? No. Glad I know that now? Yes.
12:41: John reaches his conclusion/summary, and tells us, “There’s nothing interesting on this slide.” I beg to differ. Mentions the possibility of studying optimal positioning and “optimal leaning.” I’d love to know who leans most optimally. My money’s on Fat Joe.
12:42: Alan asks for questions for John. A stunned silence ensues. He might as well have been the monolith from 2001, imparting knowledge of a new and vastly powerful technology to an unsuspecting audience.
12:43: Someone finally musters the ability to speak. In response to his question, John answers that one can probably tell from the data whether a player backhands the ball or not, but it will require further study, since we have to adjust for the size of the player to speculate about how each fielder might be oriented.
As a reminder, all of the numbers we saw were generated from data on ground balls. Dave Allen will be talking about pop-ups a little later on.
12:48: A round of well-deserved applause for John. Now he can go back to lounging around his villa, or whatever one does while living in Italy and not delivering FIELDf/x presentations. High drama at the PITCHf/x conference: the time budget is already shot, and we may have to cut into the period allotted for lunch. PITCHf/x analysts need to eat, Alan!
Using Velocity Components to Evaluate Pitch Effectiveness........... Lentzner/Fast
12:49: Matt Lentzner takes the podium. We’re back to covering PITCHf/x data only. Borrring. This is so 2007. Just kidding.
Matt isn’t satisfied with current measures of movement or break; what we have is movement from the POV of the ball, but that’s the wrong frame of reference. How far a pitch moved doesn’t mean it was moving much from the batter’s perspective. “Straight” fastball is delivered off-center, and curves back toward the plate so that it appears to be moving pretty straight by the time it reaches the batter. Slider is moving straight, but because it’s released at an off-angle, it appears to be sliding across the plate.
Location and speed are accurately represented, but movement can be improved by switching to the batter’s frame of reference.
12:52: We’re still at 25 viewers. And the chat application below the video isn’t working, so I have no choice but to continue talking to this Word doc.
12:53: Doesn’t matter how far the pitch moved, matters how fast it seems to be moving when the batter’s trying to hit it.
12:54: Note: I’m confused for the first time today, but hanging in there. It won’t be the last. Matt: “I’m not really the master data guy.” That came out sounding like something else. But apparently he is the master diagram guy—or someone is. Seriously, check those displays out later, if you get a chance. They’re pretty spiffy.
OK, here’s the real master data guy: Mike Fast, and his well-groomed goatee.
Mike explains that when a ball’s location and movement correspond, a higher whiff% results.
12:57: Mike Fast has left the podium! He’s going rogue! No, he’s just taking his hand-held microphone up to the screen so he can gesture at the screen more effectively. Someone get that man a laser pointer.
12:58: The typical batter’s swing plane is roughly +7 degrees (that’s a slight uppercut). Not coincidentally, that angle corresponds to where batters contact the ball the best, which shows up on Mike’s graph.
12:59: Someone did just get that man a laser pointer! It’s almost like I’m in the room. Asking for a laser pointer at a PITCHf/x summit must be like asking if there’s a doctor in the house at an AMA convention. Notice that I’m giving fewer specifics during this presentation—that’s both because this presentation is a bit more difficult to explain without graphics, and also because Mike Fast is really, really smart. I’ll have to digest this later.
1:01: Showing list of “High Heat and Low Sinkers,” which ranks starting pitchers by, well, how high they throw their heat, and how low they throw their sinkers. Derek Lowe tops the sinker list, followed closely by Aaron Cook and Brandon Webb. Chris Volstad also representin’.
1:05: Matt takes over: “Back to the touchy-feely stuff here.” Whew. My brain needs a breather. When does Mike Fast’s brain get a breather?
Why a breaking ball “hangs” if thrown high: because the downward velocity component from the batter’s perspective is reduced. Why you want to throw a 4-seamer high and a 2-seamer low: you’re getting an additive effect. A pitch that doesn’t have as much hop, and that’s sinking downward, you’d want to deliver low in the zone, to maximize the downward velocity.
This can help us compare different types of pitches to each other: for instance, is this slider as good a pitch as that fastball? It also enables us to perform detailed analysis of a pitcher’s repertoire: we could determine the best arm angle and position on the rubber to maximize the effectiveness of each pitch. For example, if he stands on a certain side of the rubber, it might make his fastball play up. If Brian Bannister is watching, we might see him hopping all over the mound during his next start. Don’t be alarmed.
1:11: First female sighting, and she’s most definitely not filming an episode of The Real Housewives of D.C. Take that, Dibble.
1:14: Alan Nathan and Matt Lentzner talk about how HITf/x data might improve their work, and how much they’d like to have it. Uncomfortable laughter ensues. If I were a Sportvision official or team rep in the room with access to HITf/x information, I’d be just a little scared right now.
PITCHf/x Application in Player Development and Evaluation........... Glenn Schoenhals
1:14: Dr. Glenn “Butch” Schoenhals and his colleague Fred Vint at Scientific Baseball in Edmond, OK have spent the last 9 months trying to merge a bunch of discrete pieces of technology into one smooth process, so they could bring a player to their facility and analyze him efficiently, and, well, scientifically. Schoenhals has been a youth pitching coach for 15+ years, in addition to being a neurosurgeon. Vint is an experienced video analyst. They want to educate players of all ages using the most advanced technology possible. Overeager Little League parents are already frothing at the mouth at the prospect of getting little Junior f/xed.
The pair installed 8 cameras (3 of them of the PITCHf/x variety) at an indoor lab dubbed “The Zone.” These cameras capture the pitcher’s delivery from 3 views, and overlay the PITCHf/x graphics on an image of the CF view.
1:19: The two are talking about raising an army of 10-year-old Mariano Riveras (or something like that). Frightening, even before Fred jokingly calls Butch Dr. Frankenstein, but I, for one, welcome our new PITCHf/x overlords. If this thing catches on and spawns a new generation of scientifically trained pitchers, Jamie Moyer might be out of a job before he hits 60.
1:21: Vint and Schoenhals are showing images of “The Zone.” Futuristic-looking machines abound. I’m pretty sure some of these images were lifted directly from the Ivan Drago training sequences in Rocky IV. I’m waiting for the slide with the video of Little Leaguers saying “If he dies, he dies.” See that? Rocky reference. I wasn’t lying when I said this quasi-liveblog would be Simmons-style.
1:26: They also train umpires in “The Zone,” which offers “Real-time feedback in a no-consequence environment.” Of course, if you need a no-consequence environment and you can’t make it out to Edmond, OK, there’s always a Royals-Indians game in September.
1:27: Doc’s “discoveries.” What determines arm slot? The tilt of the torso and the lean of the body. With pitch plots, we should be able to determine arm slot to within 2 degrees.
1:32: The video feed freezes. Madness! Mayhem! I have absolutely no idea what’s taking place at the Westin San Francisco right now. I knew I should’ve bought that plane ticket/stowed away in Colin Wyers’ luggage.
1:33: And we’re back. Most youth coaches advocate hitting the top half of the ball. Butch isn’t buying it. The ideal swing path is level, rising slightly to meet the ball. What gives, every baseball coach I ever had?
1:35: High-speed video of Bengie Molina hitting a home run. I’ve never seen his belly jiggle with such remarkable clarity.
1:36: I’ve been watching three backs of heads since this shindig got under way at noon, and I’d love to know whom they belong to. One is balding, another wears glasses. Okay, this is a PITCHf/x summit—that probably doesn’t narrow things down much.
1:38: Summary slide. Strengths: will create unstoppable army of baseball robots. Weaknesses: unstoppable baseball robot creation is kind of expensive. My take: If you love your kids, they’re reasonably coordinated, and you want to retire on their MLB earnings, you’ll bring them to Scientific Baseball. By next year’s summit, they’ll be decanting fetuses straight into “The Zone,” where they’ll learn to throw harder than Livan Hernandez by the time they’re old enough to toddle.
1:44: Questioner asks what’s missing. Basically, they need a programmer to present their data more cleanly. Too bad there aren’t any programmers in the room.
1:47: Butch drops a quote widely attributed to Herbert Spencer: “There is a principle which is a bar against all information, which is proof against all arguments and which cannot fail to keep a man in everlasting ignorance—that principle is contempt prior to investigation.” Hopes and believes that “Scientific Baseball” doesn’t have to be an oxymoron.
1:48: Twelve-minute break to let adrenaline levels subside. The camera is trained on the Sportvision logo at the front of the room, but the top of Mike Fast’s head just entered the frame at the same time as the back of Jeremy Greenhouse’s! I’ve never felt so alive.
1:56: Okay, Jeremy Greenhouse’s head is taking up the entire frame during this intermission. Can this be an accident, or does he love the camera? Inquiring minds.
Okajima’s Mystery Pitch........... Matt Lentzner
2:00: Now that Jeremy’s done hamming it up for all 23 viewers, we can get back to business. Matt Lentzner returns to the podium to discuss Hideki Okajima’s “Rainbow Curve.” The pitch has almost no break—it’s slower than a slider, but faster than a curve.
Lentzner brings back the “Pitching Peanut” from last year, which shows pitch types by arm angle, and tells us that “finger drive” typically adds 9 mph to a pitch.
2:03: Message from Jeremy: “Sorry about that.” Uh huh. Sorry enough not to hog the frame during the next intermission? We’ll see.
2:05: Takeaway: While everyone’s been paying attention to Dice-K, Okajima may have been the one throwing a gyroball this whole time.
2:09: An attendee on the scene tells me there are no fewer than 5 women in the room, and not one of them was brought by her husband. Dibble, on the other hand, has yet to be seen. Perhaps he’s still conveniently on vacation.
Leaving the No-Spin Zone........... Alan Nathan
2:10: Emcee Alan Nathan takes over for his own presentation on the spin of batted balls. Why is there spin? Mostly due to friction between the ball and the bat. Topspin makes line drives nose-dive. Backspin keeps the ball in the air longer. As Alan points out, if you’ve ever watched or played baseball, you probably know this. But his presentation has angles and graphs.
2:11: Did the feed just briefly go dark again, or were the 23 other viewers and I just blinded with science?
2:12: Alan Nathan is holding a bat! Everyone back away slowly. No, no, it’s just for demonstration purposes. Plus, he just called himself a “pencil-necked geek,” so there goes the intimidation factor.
2:15: Nathan shows examples of balls landing with backspin, which makes for some tricky bounces. Balls hit to left or right tend to curve toward the foul line; balls hit to center tend to slice (to RF if hit by a righty, to LF if hit by a lefty).
2:17: Nathan is trying to develop a model for the bat-ball collision that predicts the spin of the batted ball. We (and by “we,” I mean Dr. Nathan) have a good understanding of head-on collisions (on the sweet spot of the bat), but not much knowledge about oblique collisions. In order to predict the spin of the ball, we need to do controlled lab tests, and then compare with HITf/x/HitTracker data.
2:19: From the safety of his mysterious laboratory, Nathan’s souped-up pitching machine sent balls flying at roughly 100 mph toward a cylindrical wood surface (built to mimic a bat) provided by Rawlings, and recorded the rebounds at 1000 FPS, then analyzed the recording (by means of visible markings on the ball) to get the final spin, speed, and angle.
2:21: Watching high-speed video: when the ball goes in with top-spin and contacts one part of the bat, it also goes out with top-spin, which means that the spin reversed. Off a different part of the bat, it goes in with top-spin, and goes out with back-spin. According to Nathan, the final spin of the ball is large, and nearly independent of the initial spin.
2:23: Results suggest a larger spin than previously thought—it’s a “gripping” phenomenon, not a “rolling” motion like a bowling ball. The “gripping” produces the spin. How? Physics, that’s how. Stop asking so many questions.
2:25: Nathan is still working on a complete dynamic model to predict the spin of a batted ball post-collision.
2:32: Lengthy physics-based discussion ensues.
2:40: 25-minute break for lunch. It’s hard to present on an empty stomach.
FIELDf/x System Overview........... Vidya Elangovan
3:08: Time for the afternoon session. Vidya from Sportvision now takes the stage to provide an overview of FIELDf/x. Or, you know, to talk about a sale going on tomorrow. One of those two things. Okay, enough Dibble jokes.
3:09: FIELDf/x captures player positions, identifies the players, captures “ball events” (pitching, hitting, catching, and throwing), and records the ball positions for hit and fielded balls (this bit is still a work in progress, and wasn’t included in the data distributed to the presenters, which is why it was estimated in some of the presentations).
First version has been installed at AT&T Park since April 2010—the game is “fully tracked and resolved” within 20-30 minutes after the final out. More testing at additional ballparks is planned for this season, with an eye toward a more extensive roll-out for the 2011 season. The information is recorded from 2-4 high-res cameras (let’s hope they’re higher-res than this so-called “hi-res” webcast) that cover the whole playing field, ideally from multiple angles. Computer algorithms automatically detect players and the ball (separating them from the background image), and automatic tracking identifies which player in one frame is the same player in the next. Finally, a mere human operator/observer verifies, corrects, and assists his or her machine masters, oiling various gears and doing his/her best to ensure that they don’t go all Skynet on us.
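To get a feel for that "same player in the next frame" step, here's a toy version. This is emphatically not Sportvision's algorithm, which wasn't described in any detail. It's a greedy nearest-neighbor matcher I'm assuming purely for illustration, with made-up names (`match_players`, `max_jump`) and positions in arbitrary field units.

```python
import math


def match_players(prev, curr, max_jump=3.0):
    """Greedily match known players to detections in the next frame.

    prev:     dict of player id -> (x, y) in the previous frame
    curr:     list of (x, y) detections in the current frame
    max_jump: assumed limit on how far a player can move between frames

    Returns a dict of player id -> index into `curr`. Players with no
    detection within `max_jump` are left out (a job for the human operator).
    """
    # Every candidate pairing, closest first.
    pairs = sorted(
        (math.dist(pos, det), pid, j)
        for pid, pos in prev.items()
        for j, det in enumerate(curr)
    )
    assigned, used = {}, set()
    for d, pid, j in pairs:
        if d > max_jump:
            break  # everything after this is even farther away
        if pid in assigned or j in used:
            continue
        assigned[pid] = j
        used.add(j)
    return assigned


# Hypothetical frame: two infielders, detections arrive in arbitrary order.
prev_frame = {"SS": (0.0, 0.0), "2B": (10.0, 0.0)}
detections = [(10.5, 0.2), (0.3, -0.1)]
matches = match_players(prev_frame, detections)
```

Greedy matching falls apart when two players cross paths or collide, and a real system presumably leans on something smarter, plus the human observer, for those cases.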
3:17: Vidya is showing video stills from the FIELDf/x cameras, then running through several of them to show how the computer tracks and analyzes the data. Picture the view from Arnold’s perspective in the bar scene at the beginning of Terminator 2, and you’ll have a pretty good idea of what this looks like. If the FIELDf/x operator ever hears one of the algorithms saying “Your clothes—give them to me” in thickly accented English, things have gone too far.
3:18: The challenges: uncontrolled lighting (clouds, player shadows, stadium shadows), “tricky” player uniforms (green can blend in with the grass—way to go, A’s), players standing still (FIELDf/x cameras, like our eyes, are very sensitive to motion, so they don’t suffer lollygagging lightly). Sportvision is attempting to track center of mass, projected down to the ground, but shadows can make the data more difficult to analyze. The camera’s dynamic range can’t always see players and balls in shadow, so post-processing tricks have to be employed. Player collisions also present a problem.
3:20: As does storage. A season’s worth of high-res data from all 30 stadiums—just the video alone—would require disk space into the petabytes. And I thought I was pretty cool for having a couple terabytes at my disposal. Not only does all that info need to be stored somewhere, but it needs to be stored in a way that can be analyzed.
3:22: Presenters were provided with 13 games of data, including raw positions showing where players were over time, and some ball events.
3:27: Vidya is now showing us a slide entitled “Why is it interesting?” As if anyone watching needs to be convinced.
3:29: Question about whether a ball that rolls away (say, a passed ball) would be tracked. Vidya explains that the algorithms are constantly being improved, and that theoretically, that should be possible.
3:30: Question about estimate of accuracy of player position. Sportvision hasn’t yet compared the results produced by the computer to other technologies or tracking systems. At this time, a human operator is marking events, but in the future, that process will be almost fully automated. A questioner asked about Sportvision’s camera FPS and resolution in megapixels, but Vidya ain’t telling. She does mention that PITCHf/x and FIELDf/x timestamps should match up within a few milliseconds, which should be good news for cross-f/x analysts.
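If the two systems' timestamps really do agree that closely, linking a pitch to its FIELDf/x record could be as simple as a nearest-timestamp lookup. Here's a minimal sketch under assumptions of my own: timestamps are plain seconds in a sorted list, the 5 ms tolerance is a stand-in for "a few milliseconds," and `nearest_within` is a hypothetical helper, not part of any Sportvision tool.

```python
from bisect import bisect_left


def nearest_within(sorted_times, t, tol):
    """Return the timestamp in `sorted_times` closest to `t`, or None if
    nothing lies within `tol` seconds. `sorted_times` must be ascending."""
    i = bisect_left(sorted_times, t)
    # Only the neighbors on either side of the insertion point can be closest.
    candidates = sorted_times[max(i - 1, 0): i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda c: abs(c - t))
    return best if abs(best - t) <= tol else None


# Hypothetical timestamps (seconds since some shared clock origin).
fieldfx_events = [10.000, 25.500, 41.250]
pitch_time = 25.503
TOLERANCE_S = 0.005  # assumed: "a few milliseconds"

linked = nearest_within(fieldfx_events, pitch_time, TOLERANCE_S)
unlinked = nearest_within(fieldfx_events, 30.0, TOLERANCE_S)
```

The binary search keeps each lookup cheap, which matters when a single game produces hundreds of thousands of records.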
From Raw Data to Analytical Database........... Peter Jensen
3:34: Peter Jensen (who looks a bit like Epic Beard Man) takes the stage to discuss how one transforms the data being collected by Sportvision into a form that can be analyzed.
3:37: Jensen is showing us how the sausage is made, on the analysis end. The presenters received 400,000 lines of data per game, and Jensen says the data presentation will need to be tweaked to make it more user-friendly.
3:40: Peter shows us an animated diagram of the data overlaid on an image of a field. Dots representing the batter and fielders move around as the time stamp advances, and the whole production can be stopped at any time. That’s the end result we’re looking for, but it takes a lot of work to make that vision a reality.
3:44: Now we’re looking at another diagram of the data, but this time the entire play is presented in one image. Fielder and batter trajectories are represented as colored lines.
3:45: Jensen slide: “Questions FIELDf/x Can Help Answer.” I don’t see “The meaning of life” listed anywhere, but the image is a little blurry, so I might be missing it. Everything short of that appears to be included.
3:50: I’m not really a database guy, but my impression is that Jensen is telling Sportvision how their data presentation could be improved. I don’t want to speak for my 29 fellow viewers, but I think it’s safe to say that the online audience is hanging on every word.
3:52: A few minor technical difficulties. Jensen: “Sorry for all the difficulties with this. Worked just fine on my computer at home.” Nice to know that even hardcore sabermetricians need tech support sometimes.
3:54: Jensen is showing us the macros he used to create the animations we saw from the raw data provided by Sportvision. You know those images of endlessly scrolling data from The Matrix? This is less green, but looks like that in all other respects. Peter Jensen is sabermetric Neo.
3:58: Jensen is showing us one of his play animations alongside an actual video of the play. They match up pretty well, but FIELDf/x has identified an umpire as a baserunner, rather than as a man in blue. That might be a little embarrassing, if FIELDf/x were capable of feeling embarrassed.
4:00: Jensen concludes that the system is complicated, but worth the effort—if it can save an organization one run a year, it’s worth several hundred thousand dollars, and he thinks it’s capable of doing much more than that. Peter thinks that player evaluation, player education and training, and player health and aging are the areas in which FIELDf/x has the most potential to help a major-league team.
4:04: Unofficial count of teams with reps in attendance, provided by a friend on the inside: 12.
Using FIELDf/x to Assess Fielders’ Routes to Fly Balls........... Dave Allen
4:07: Dave Allen takes the stage. Rob Neyer recently described Allen as the “non-stoned, baseball-research version of Hyde from That 70s Show.” He wasn’t lying. Allen also looks a lot like David Appelman, bringing the count of FanGraphs writers named David who look like David Appelman to a grand total of two. Allen apologizes to Jeremy and/or Greg Rybarczyk for potentially infringing on their presentations to come. Are you going to take that lying down, guys? I’d love to see a WWE-style confrontation here, even if the whole thing turns out to be a work.
4:08: Allen brings up concern over the reliability of fielding metrics and inaccuracy and bias in batted-ball classifications/locations, undoubtedly warming Colin’s heart. Allen also points out that FIELDf/x offers a way to address these problems.
4:09: Allen reconstructs the paths taken to a ball from the time it left the bat to the time the play was completed. Includes image of where each fielder started before each play. CF is a bimodal distribution, while other positions look more like smears. The whole thing looks something like a Rorschach blot. Allen zooms in to show each fielder starting point by position, broken down by batter handedness. FIELDf/xstacy.
4:12: We can assess how direct each path taken to a ball was. Field trajectory represented by a series of circles—farther apart when fielder is moving quickly, so you can tell whether a fielder is making the catch on the run, or coming to a stop before gloving the ball. Data is captured in yards, though it can be converted into other units. Dave shows some circuitous routes, as well as some that appear to be optimal. Straight line to the ball can be plotted against the actual path taken, to see what the deviation looks like.
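For anyone who wants to try the straight-line comparison at home, here is a minimal sketch. It assumes a fielder's track is just a list of (x, y) positions in yards; the function names and the sample tracks are hypothetical, not anything Dave or Sportvision showed.

```python
import math

def path_length(points):
    """Total distance covered along consecutive (x, y) samples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def route_efficiency(points):
    """Direct start-to-end distance divided by distance actually covered.

    1.0 means a perfectly straight route; lower values mean a more
    circuitous one. This ratio is an illustrative choice, not Allen's metric.
    """
    covered = path_length(points)
    if covered == 0:
        return 1.0  # the fielder never moved
    return math.dist(points[0], points[-1]) / covered

# Made-up tracks in yards, both ending at the same spot.
direct = [(0, 0), (3, 4), (6, 8)]
curved = [(0, 0), (4, 0), (4, 3), (6, 8)]

print(route_efficiency(direct))
print(route_efficiency(curved))
```

A ratio like this ignores timing entirely, so a fielder who takes a straight line slowly still scores perfectly. It only captures the deviation part of what Dave showed.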
4:20: Well-deserved round of applause for Dave Allen. Questioner brings up the role of wind, which could influence the routes taken by outfielders, and asks whether Sportvision will be incorporating weather information into the data. Answer: “that’s absolutely something that we can throw in there with the data.” We just broke 30 viewers. Word must be catching on.
4:30: Peter Jensen comes back on to run through some video of plays he’d looked at with the FIELDf/x data. As in, actual video. Who knew it was possible to embed MLBAM video? Sabermetric Neo strikes again. I’d like to think that as Jensen plays these videos, someone, somewhere in this conference room is asking, “What’s he doing?” And someone else is whispering, “He’s beginning to believe.”
4:35: We’ve entered a 20-minute break period. Sabermetricians love to schmooze.
4:55: We’re back. Sabermetricians are also pretty punctual, it seems. When they say 20-minute break, they mean 20-minute break.
Measuring Base Running with FIELDf/x........... Mike Fast
4:56: Mike Fast returns to the fray, this time to talk about measuring baserunning with FIELDf/x.
4:57: Pizza Cutter shout-out! Of course, that’s former BP author Russell Carleton, to the layman.
4:58: Diagram of an “Example Baserunner Track,” showing the complete trajectory of a runner who doubled, moved to 3rd on a groundout, and then came home on the 3rd out. What can we learn? Time to first, maximum speed, acceleration, optimal running routes, size of leads, and stolen base attempts.
4:59: Player may appear not to have touched the bag in these diagrams because Sportvision tracks center of mass and player is often leaning when rounding the bag, meaning that his feet won’t be directly under the center of mass.
5:01: Diagram of an unspecified runner (we wouldn’t want to hurt anyone’s feelings) going from first to third on a LD double to right. Speed peaks around 18 mph. Starts moving before the pitch is released, gets up to about 6 mph on the initial break when the pitcher delivers the ball, then slight delay before the ball is hit. Each milestone (ball delivered, ball fielded, etc.) marked clearly on the graph. Great stuff.
5:03: Acceleration graphs. Acceleration peaks at 25 ft/s^2. Mike wanted to determine the best measure of reaction time, and settled on the time from when the event occurred (ball was hit) until the runner reached peak acceleration, which turned out to be 0.6 seconds.
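A rough sketch of that reaction-time definition, assuming we already have a runner's speed (in ft/s) at a series of timestamps. The function name, the differencing scheme, and the numbers are all made up for illustration; Mike didn't show his actual code.

```python
def reaction_time(times, speeds, contact_time):
    """Seconds from contact until peak acceleration, plus that peak (ft/s^2).

    Acceleration is estimated by differencing consecutive speed samples and
    is assigned to the midpoint of each interval (an assumption of this sketch).
    """
    best_t, best_a = None, float("-inf")
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        t_mid = (times[i] + times[i - 1]) / 2
        if dt <= 0 or t_mid < contact_time:
            continue  # skip bad samples and anything before the ball is hit
        accel = (speeds[i] - speeds[i - 1]) / dt
        if accel > best_a:
            best_t, best_a = t_mid, accel
    if best_t is None:
        raise ValueError("no samples after contact")
    return best_t - contact_time, best_a

# Hypothetical samples: seconds since contact, speed in ft/s.
times = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
speeds = [0.0, 1.0, 3.0, 8.0, 12.0, 14.0]

delay, peak = reaction_time(times, speeds, contact_time=0.0)
print(delay, peak)
```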
5:07: Another anonymous runner. First runner was someone with a reputation as “fast,” second is someone with a reputation as “slow.” Even the “slow” runner gets up to 14-15 mph while rounding the bases, and actually accelerates more quickly than the first guy. Fast echoes a comment made by Jensen earlier: even the players whom we think of as being “bad” at certain aspects of baseball are still really, really good.
5:10: We’re now looking at diagrams of the size of leads off first base. Some of this early FIELDf/x analysis is of the “Hey, look, the data captures what’s actually happening” variety. And that’s just fine. We’re five hours in, and I know I’m not even close to tired of it yet.
5:14: Fast clarifies that these velocities are calculated, not measured. In other words, FIELDf/x doesn’t tell you how fast a player is moving at any particular time; the analyst derives that information from the distance traveled and time elapsed.
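In code, that derivation is just distance over time between consecutive position samples. The sketch below assumes samples of (time in seconds, x in yards, y in yards), in keeping with the earlier note that the data is captured in yards; the names and the sample values are hypothetical.

```python
import math

# 1 yard/second = 3 ft/s = 3 * 3600 / 5280 mph
YARDS_PER_SEC_TO_MPH = 3 * 3600 / 5280

def speeds_mph(samples):
    """Average speed in mph over each interval between position samples.

    samples: list of (t_seconds, x_yards, y_yards), sorted by time.
    """
    result = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        distance = math.hypot(x1 - x0, y1 - y0)
        result.append(distance / dt * YARDS_PER_SEC_TO_MPH)
    return result

# A runner covering 8.8 yards in one second is moving at about 18 mph.
track = [(0.0, 0.0, 0.0), (1.0, 8.8, 0.0)]
print(speeds_mph(track))
```

Differencing raw positions like this amplifies any noise in the tracking, so a real analysis would probably smooth the track first. That step is left out here.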
5:19: Questioner asks Fast how FIELDf/x deals with sliding runners. He refers the question to Sportvision’s Greg Moore, who repeats that Sportvision tries to find the center of mass and project down to the ground. The projection to the ground might have some degree of uncertainty during slides at this stage of development. There is some potential for the data to be detailed and granular enough for the cameras to capture something like an outstretched arm or hand, but that would take higher-resolution images, and isn’t feasible at this stage.
FIELDf/x of Probabilities: Converting Time and Distance into Outs ........... Jeremy Greenhouse
5:23: Time for Jeremy Greenhouse to make an appearance that doesn’t involve upstaging the hanging FIELDf/x summit banner during intermissions. Just kidding, Jeremy.
5:24: Jeremy leads off with a nice flowchart of the possibilities provided by each Sportvision data system, and then moves into his attempt to model the probability of a baserunner reaching safely. Dependent variable is whether the runner is safe or out, incorporates a range of inputs into a local regression. Not much data to work with: data provided by Sportvision and the Giants included only 4 stolen base attempts.
5:25: Catcher “pop” times were all between 2.0 and 2.2 seconds; runners had far more control over success of stolen base attempts than catchers.
5:27: All-time high of 33 viewers! Jeremy turns out to be the afternoon’s biggest draw.
5:29: Analysis of stolen base success probability as a function of lead length. Only 4 attempts, but with more data, this could be very instructive for training purposes.
5:30: Greenhouse hemorrhages nerd cred by confessing that he doesn’t know how to operate a laser pointer.
5:31: Jeremy takes a similar approach to modeling the probability of a fielder catching the ball, using only distance, time, and hang-time as inputs because of the limited extent of the data. This process would allow us to isolate positioning in evaluation, and enable us to evaluate the routes and range of individual fielders. On balls that weren’t caught, he had to estimate hang-times and landing points, since the complete ball trajectory wasn’t provided in this dataset.
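To make the idea concrete, here is a toy version of an out-probability estimate. It is not Jeremy's model: it swaps in a simple kernel-weighted average of past outcomes (a local-constant smoother) over just two inputs, the distance the fielder has to cover and the hang time. The bandwidths, function name, and every data point are invented for illustration.

```python
import math

def catch_probability(distance, hang_time, plays, bw_dist=10.0, bw_time=0.5):
    """Kernel-weighted share of caught balls among similar past plays.

    plays: list of (distance_ft, hang_time_s, caught), with caught as 0 or 1.
    bw_dist and bw_time are assumed Gaussian bandwidths controlling how
    quickly dissimilar plays stop counting.
    """
    num = den = 0.0
    for d, h, caught in plays:
        z = ((d - distance) / bw_dist) ** 2 + ((h - hang_time) / bw_time) ** 2
        w = math.exp(-0.5 * z)
        num += w * caught
        den += w
    if den == 0:
        raise ValueError("no comparable plays")
    return num / den

# Entirely made-up plays: short runs with long hang times get caught.
plays = [
    (20, 4.5, 1), (30, 4.0, 1), (40, 5.0, 1),
    (60, 3.5, 0), (80, 4.0, 0), (90, 3.0, 0),
]

easy = catch_probability(25, 4.5, plays)
hard = catch_probability(85, 3.5, plays)
print(easy, hard)
```

Re-running an estimate like this at each instant of a play, with the remaining distance and remaining hang time, is one way to get a probability that changes as the fielder moves.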
5:35: On 95% of flies, it was readily apparent whether the fielder would catch the ball for the duration of the play. Only rarely was the outcome truly in question. Jeremy runs through each of those borderline plays, showing the trajectories of each primary fielder colored to indicate the out probability at each given interval.
5:39: Brad Hawpe play: starts with a >80% chance of catching the ball, but freezes in place and fails to make the play. Difficult to represent visually, because the out probability plummets while Hawpe stands in place and time elapses. In a different Hawpe play, his first step gives him a lower probability of catching the ball, since he broke in the wrong direction. Rumor has it no Rockies reps are in attendance, but they’re not missing out, since they’ve enjoyed a front-row seat for this sort of action for the last several years.
5:43: Jeremy wraps up and gets a nice hand. 34 viewers! Talk about going out on top.
Where Fielders Field: Spatial and Time Considerations........... Matt Thomas
5:51: Matt Thomas works as a stringer in St. Louis. Initial fielding positions: records where fielders stood prior to plays taking place. Finds that outfielders position themselves farthest from home plate late in the game. Nearest to home plate when pitchers are at bat.
5:59: Color-coded comparison of initial positioning by batter handedness. Matt has found slight differences in positioning between the Cardinals and visiting teams, which he attributes to familiarity with the ballpark.
6:02: Breaks down initial positioning by batting order. Fielders stand closer to home plate for batters likely to bunt or hit the ball weakly, and further away for middle-of-the-order batters. No word on whether fielders telegraph their moving in on weak batters as ostentatiously as possible in order to make weak batters feel even more inadequate.
6:13: We’re down to 26 viewers. Someone get Greenhouse back on stage. If these ratings fall any lower, we might not get picked up for another season.
6:20: The sabermetricians may be getting sleepy. No questions for Matt, so we’re taking a 15-minute break. If history is any guide, that means we’ll be reconvening at exactly 6:35.
6:35: Like clockwork. If sabermetricians ran major metropolitan transit systems, no one would ever be late for work.
SCOUTf/x........... Max Marchi
6:36: Here comes Max Marchi, who flew in all the way from Italy to deliver this talk. Hear that, John Walsh? When you live in Italy and you’re delivering a talk in San Francisco, you actually go to San Francisco. It’s just common courtesy. I swear—some people.
6:38: Max provides a recap of what PITCHf/x has taught us over the past few years. It’s grown up so fast. Velocity, movement/deception, stamina, selection/sequencing/command/control, inducing “fishing,” etc. Also goes over PITCHf/x analysis of catchers: blocks, framing, pitch-calling, and controlling the running game. Hitters: plate discipline, scouting info, power, etc.
6:44: FIELDf/x: potential to study defensive range and positioning and arm ratings. Marchi notes that FIELDf/x won’t only enable us to study subjects that have heretofore proved impenetrable, but will also let us take more accurate looks at topics we’ve covered before. For instance, John Walsh’s arm ratings wouldn’t have to rely on inferences from play-by-play data; they could incorporate actual speed and accuracy information from FIELDf/x.
6:48: Marchi reproduces a quote by Joe DiMaggio, in which the Yankee Clipper suggests that fielders should be in motion before they actually hear the crack of the bat. Max looked for evidence of this effect, and found it, identifying outfielder movement in between pitcher release and contact with the bat.
6:49: Sabermetrics sounds way better with an Italian accent.
6:52: Max goes over some of the same ground that Mike and Jeremy went over, providing a quick look at FIELDf/x’s potential for baserunning analysis. We can also analyze the catcher’s role, divorcing the pitcher’s performance at holding the runner from the catcher’s attempt to throw him out.
6:55: Max’s final slide is a map with his air route from Italy overlaid. John Walsh should be ashamed of himself. Max visited Buffalo and Cleveland in between landing in New York and taking off again for San Francisco, so it’s safe to say that he’s seen some of our nation’s finest cities, as well as Buffalo and Cleveland.
True Defensive Range (TDR): Getting out of the Zone........... Greg Rybarczyk
6:56: Up to the podium strides Greg Rybarczyk, who calls this event the highlight of his summer. There might be a joke to be made here, but not by someone who’s just written 7,000+ words about the same event.
6:57: Greg confesses that it’s a little early to be developing a new defensive metric with only 13 games of data available, explaining, “But have you checked to see how many acronyms are still available? Most of the good ones are gone.”
6:59: One of the drawbacks of our current defensive metrics is that we can’t (without consulting video) tell whether outfield catches required dives, or were proverbial cans of corn. Our reliance on zones is also a handicap in that generic zones might have to expand or compress to fit specific fields, distorting the quality of the data. Zone size isn’t standardized, and some zones incorporate both easy and difficult plays. Certain parks cut off zones, or have regions that aren’t technically covered by zones at all. Greg notes that the front lines in WWI were closer together than the boundaries of one particular zone in deep center field. Some loss of precision is unavoidable with zone-based systems.
7:10: Animation of ball in flight synchronized with Nyjer Morgan in motion draws a few oohs and ahhs from the assembled onlookers. If and when this technology is automated and standardized, actually watching a broadcast might seem so 2010.
7:16: Framework for TDR: as Jeremy pointed out earlier, we can assign values to plays made or missed based on the probability of the out. If a play is 90% likely to be an out, we can credit the fielder who makes it with 1 minus that out probability, or 0.1 of a play. Greg will need a larger sample to make TDR a reality, but “it should converge to a stable value in a much smaller sample size than all these zone-based metrics.” This is continuous data, and doesn’t have to be divided into buckets.
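The bookkeeping behind that framework is simple enough to sketch. Crediting a made play with 1 minus the out probability comes from the talk; debiting a missed play by the out probability is my assumption, borrowed from how plus/minus-style systems usually work, and the function names are hypothetical.

```python
def play_credit(out_probability, made):
    """Plays above or below expectation for a single chance.

    Made play: 1 - p (from the talk). Missed play: -p (assumed here).
    """
    if not 0.0 <= out_probability <= 1.0:
        raise ValueError("out_probability must be between 0 and 1")
    return (1.0 if made else 0.0) - out_probability

def total_credit(chances):
    """Sum of credits over (out_probability, made) pairs."""
    return sum(play_credit(p, made) for p, made in chances)

# Made-up chances: a routine catch, a tough catch, and a coin-flip miss.
chances = [(0.9, True), (0.2, True), (0.5, False)]
print(total_credit(chances))
```

Turning plays into runs would take another step that isn't covered here.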
7:20: Once we have enough data, we can determine how much of a player’s defensive value comes from being positioned well, versus being able to move well after the play commences.
7:21: Possible defensive diagnostic test for prospects: position prospect at specific spots in the outfield, and program ball launcher to launch to certain randomized locations to determine the prospect’s “range line.” Then combine with baseline MLB data on range line percentiles and spread of typical outfield chances, and come up with an accurate estimate of TDR without any actual MLB data. Do this multiple times to make sure you didn’t catch prospect on an especially good or bad day, and you have a fairly accurate assessment of defensive value that wouldn’t have been possible before. Very exciting idea, though I can’t speak for scouts, who might be quaking in their boots.
7:30: “Butch” Schoenhals chimes in Oklahoma-style to urge Greg to “bring it on,” TDR-wise. I second that emotion.
The Future of Sportvision’s Data Collection........... Greg Moore
7:34: We’ve reached the bottom of the 9th. Greg Moore takes the stage to run through Sportvision’s future data collection possibilities.
7:36: Sportvision hopes to make FIELDf/x completely real-time, which would open up innumerable broadcast opportunities. Other ideas that Sportvision has discussed internally:
CONTROLf/x: Track the catcher’s mitt in an effort to determine the pitcher’s command. Data could also simply become more granular: track the batter’s feet throughout the pitch; track the batter’s bat; track the catcher’s feet, etc.
Moore: “Where we’re really going: MINDf/x. What are the players thinking?” Promises MINDf/x by 2015. Well-played,
7:40: Some highlights from the open discussion following Moore’s presentation:
Mike Fast: “You could work on one game of data for many, many, many days, and not examine its depths.”
Peter Jensen: “It’s a huge volume of data, but it is manageable if you define your questions.”
7:56: Our first Rob Neyer sighting!
8:00: All right: 8 hours and 8,000 words later, our work is finished. Thanks to all involved; the future of baseball analytics has never looked brighter.