Our own Harry Pavlidis, Baseball Prospectus’ director of data analysis, has been among the most groundbreaking voices when it comes to using the PITCHf/x, HITf/x and FIELDf/x data that Sportvision tracks in major-league and minor-league parks.
Attending the Sloan Sports Analytics Conference in Boston this weekend made it clear that this kind of tracking data is where many future analytic innovations will come from. Pavlidis was part of the panel called "The Revolution in Visual Tracking Analysis" and afterward sat down for a Q&A on how the industry got started, the difference between public and private data, and more.
He spoke on the panel about his Twitter conversations with then-Oakland Athletics and current Arizona Diamondbacks pitcher Brandon McCarthy about pitch types and talked in the Q&A about the other players who have come to embrace the PITCHf/x data. Here's the text, edited slightly for length, which also touches on applications to DIPS theory and implementation of FIELDf/x findings through proper coaching.
ZL: Could you give a brief explanation of why PITCHf/x is free and the rest of them aren't?
HP: It was an unproven thing. Sportvision partnered with Major League Baseball Advanced Media to make it a mandatory thing for all the teams. So in 2008, they all had the data. Why it was shared publicly, I don't know. I can see why they had the relationships, but my theory is that they didn't know what to do with the data. Sportvision had no analyst capabilities. They were basically supporting broadcasters and were like 'we need analysts.' So with BAM, they tried to open the data up, I guess. Because that's what they did later with HITf/x and FIELDf/x. So even though it's private, we had our hands on small parts of it. They were able to help them refine the product. … That's part of this dynamic is if you put it out there, you can get the innovation. I think because PITCHf/x was the first, it made it possible and may have even made it necessary. I don't think that's ever going to happen again.
ZL: Were there some ill effects of it? I know you mentioned during the panel that you weren't sure it was the right thing to do to make it public.
HP: I mean from their business perspective. Has it been profitable? It sounds like they had challenges coming up with a way to monetize PITCHf/x. … I think it was so new and so unknown what could possibly be done and the first level of interest in adopting it was on Major League Baseball Advanced Media. They wanted to have it for the entertainment value of the game. It was a good thing because it was necessary because the data has gotten a lot better. People like me and (BP alum and Astros analyst) Mike Fast and whatnot who have worked with it have given them feedback.
ZL: In HITf/x, what components go into the general realm of "quality of contact?"
HP: You basically get the velocity.
ZL: Which is a vector?
HP: They give you actually all three vectors. You get the X, Y, Z of velocity and you can combine those together and say this is the speed. You can't tell if it was off the barrel of the bat – they don't have that impact – but they do have the velocity and the angle. You can tell how hard a ball was hit. I'm not sure you can tell if it was a line drive or a fly ball. I'm not sure you can tell if it was a ground ball or a line drive. With the sample data set I did some charting to show those overlaps – so you can have trajectory and speed that are the same but human tagging is very different.
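As a rough illustration of what combining the X, Y and Z velocity components into a speed and an angle looks like, here is a minimal Python sketch. The axis convention (x toward the first-base side, y toward center field, z up), the feet-per-second units and the function name `batted_ball_summary` are assumptions made for illustration, not Sportvision's actual HITf/x format.

```python
import math

FPS_TO_MPH = 3600 / 5280  # feet per second -> miles per hour


def batted_ball_summary(vx, vy, vz):
    """Summarize a batted ball from its initial velocity components.

    Assumed (hypothetical) convention: components are in ft/s, with
    x toward the first-base side, y toward center field, and z up.
    """
    # Speed is the magnitude of the velocity vector.
    speed_fps = math.sqrt(vx**2 + vy**2 + vz**2)
    # Vertical angle: elevation of the ball above the horizontal plane.
    vertical_angle = math.degrees(math.atan2(vz, math.hypot(vx, vy)))
    # Horizontal (spray) angle: 0 is straight up the middle under this convention.
    spray_angle = math.degrees(math.atan2(vx, vy))
    return {
        "speed_mph": speed_fps * FPS_TO_MPH,
        "vertical_angle_deg": vertical_angle,
        "spray_angle_deg": spray_angle,
    }


# Made-up numbers, not real HITf/x data: a ball hit straight up the middle.
example = batted_ball_summary(vx=0.0, vy=120.0, vz=50.0)
print(example)
```

Speed and angle alone describe only how the ball leaves the bat. As Pavlidis notes, two balls with the same numbers can still be tagged differently by human stringers.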
ZL: Because you don't know the spin?
HP: Part of it could be the spin. Part of it could also be that there's a big thing where it's a line drive if it's a hit, it's a fly ball if it's caught (in human tagging). That's overstating the problem but that's what happens a lot. I think Colin Wyers found that bias analytically. (Here’s some of Wyers’ writing on the topic.) Also, if the ball skips through the infield in one bounce and just catches the back of the dirt, is that a ground ball or a line drive? A different person may answer that differently even though stringers are all given the same training and retraining. There have been situations where stringers have to go back and retrain because there were parks where you never ever had a line drive home run. There were parks that had tons of line drive home runs. I was finding park effects that were really stringer effects and I wrote about that a few years ago. So we try to match the HITf/x data to the tags. You don't know if it's the spin or if the tag's wrong. It's kind of a tricky conundrum, which is why FIELDf/x gives you the whole trajectory.
ZL: Does the quality of contact data poke holes in DIPS theory or supplement DIPS theory?
HP: Supplement. Basically what you're saying is that the pitcher can only control so much. What he may be able to control in some circumstances is the quality of contact, because there are some pitchers who for their career, beat DIPS and there are pitchers who do worse. You could probably find those guys who aren't getting as hard of an impact off the bat. You have to look through a lot of data and get a lot of patterns and regress it and understand it a little bit – kind of swim in it – but you could probably figure out where it's meaningful in terms of quality of pitching and how stable it is.
It would supplement the DIPS theory because it would say the pitcher has some control over the quality of the batted ball but once it comes off the bat it's out of their control. How it comes off the bat, they have some control and we might be able to quantify that.
The problem is that the data's not public. It's hard to take a study and do peer-reviewed research. That's another big thing – it's not just not having as many eyeballs, it's peer review. It's me doing something and having four other guys go 'make it better' or try to reproduce your results. That's science, basically, and it's best done in the light of day. But this isn't science for the sake of advancing society. It's for winning. It's that balance. What Sportvision's done about releasing some sets of data, I think that's the way to go. Not all of it but enough to get some juice out there and some ideas about how it can be used.
ZL: With the advances in FIELDf/x, do you think we'll get to the point where what we call 'shifts' now, we'll just start calling 'defense' where everyone's defended differently like basketball players are defended differently?
HP: I think so. The Cubs' approach that they've talked about publicly is that they do spray charts for their own pitchers. This gets to the contact thing as well, obliquely. Pitchers give up certain patterns of base hits; it's not just the hitter. So the Cubs use that information and they came up with really, really good positioning.
One of the things about defensive positioning is that sometimes it can't be taught. Like it might be easy to position guys at the beginning of each at-bat, but not many players are smart enough to know that if a guy is late on a fastball, take a step over to right field.
ZL: Does a coach have to be a big cog in that?
HP: Possibly. Some teams don't even coach on that. (Alfonso) Soriano joined the Cubs and they didn't give him any outfield coaching. Soriano's first outfield coaching came last year with (first base coach) Dave McKay.
ZL: I enjoyed your Brandon McCarthy story. Are there a lot of examples of players who are into this stuff?
HP: Brandon McCarthy is, Max Scherzer is, Brandon Morrow just pulled some stuff off Brooks' site to show Ricky Romero some stuff about throwing sinkers. Trevor Bauer is into pitch types, but I don't think he looks at PITCHf/x data. So if I ask him questions about if I saw this or saw that, he'll write back to me. McCarthy, I don't know if he looks at it or not, but I know Max Scherzer has looked at it in the past and clearly Brandon Morrow.
So there are guys out there who are into it. They probably don't tell too many folks because inside the locker room, that's not typical. It's 'perform, play,' not 'analyze.' I think pitchers obviously have those natural questions like, 'Why did that pitch get crushed… I thought I was throwing that well, what happened?' They may want to start looking at it. Hitters just want advance scouting reports and PITCHf/x can provide that just like a human advance scout in some cases but at a lower cost.
ZL: What will you be doing in your new role with the Washington Post?
HP: A weekly blog, stat-oriented. We're going to be basically working with the beat writers as well, helping them develop content on their own, so basically doing research support. Also working with their interactive app department to do cool visualizations of data online.
I'm thrilled to be involved in it because a major paper is saying, 'let's take this data and do stuff with it, not just show it in a chart in a Sunday paper, let's actually have regular content, let's actually embed this content throughout the staff.' I don't know of any other paper that's doing that and I hope we set a good example.