January 9, 2012
Resident Fantasy Genius
In the fantasy and analytics community, we tend to talk about players in terms of the components of their production. We don’t talk about ERA; we talk about strikeouts and walks and ground balls. We don’t talk about batting average; we talk about strikeout rate and BABIP. Those who have read my work or talked with me know that I also like to blend stats and scouting. I tend to say that stats tell us the “what” and scouting tells us the “why.” Stats tell us that a player hits for power, while scouting tells us that he does so because he has strong wrists, good bat speed, loft in his swing, etc. In other words, scouting can break the component stats down further into component abilities.
In recent years, some of these scouting components have been quantified, much to the delight of analysts and fans of the game. PITCHf/x has been particularly influential in the analysis of pitchers, capturing things like pitch type, velocity, movement, spin, location, and release point. Unfortunately, its hitting counterpart, HITf/x, is not publicly available. Still, there is some publicly available data that can be useful for batters, such as HitTracker. MLBAM—one of the creators of PITCHf/x—also provides something interesting that will serve as today’s topic: quality of contact data.
MLBAM stringers, when recording various aspects of ballgames, are tasked with classifying certain balls in play as “sharply” or “softly” hit. Today, I’d like to examine whether these classifications present us with any useful information. As I’ve shown before, a hitter’s BABIP takes roughly two-and-a-half years to “stabilize” (I know hardcore analysts hate this word, and I explain why in the linked piece, but for familiarity’s sake I’ll continue to use it). That’s a long time before we begin to get meaningful information about the kind of contact that a hitter makes, which is why this “sharp” and “soft” stuff is so appealing. After all, it stands to reason that if a player makes hard contact with the ball, the ball is going to fall safely for a hit more often than it otherwise would.
To start, let’s check out the 2011 leaders and trailers in “sharp” contact percentage:
2011 “Sharp” Contact on BIP Leaders
2011 “Sharp” Contact on BIP Trailers
For the most part, these lists seem to pass the eye test. Elite hitters are on the leaders list, and scrubs or spray hitters are on the trailers list. Aside from some curious inclusions like John Buck and Eduardo Nunez on the leaders list and Jhonny Peralta and Bobby Abreu on the trailers list, they look pretty solid.
That’s a good start, but more rigorous testing is needed to see if this data can be useful. To that end, I ran a split-half correlation to see how “stable” sharp and soft contact is for hitters. I’ve explained the methodology before, so I won’t bother going over it again, but you can read about it here if you’re interested.
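For readers who haven’t seen the linked methodology, here is a minimal sketch of one common version of the split-half approach. It is not necessarily the exact procedure behind these numbers. It splits each hitter’s balls in play into alternating halves, computes the rate in each half, and correlates the halves across hitters. It then uses the Spearman-Brown prophecy formula to project how large a sample is needed before that correlation reaches a target. The 0.7 target, the 0/1 outcome lists, and the function names are all assumptions for illustration. A split-half correlation at or below zero never reaches any positive target, which is what a stat that does not stabilize would look like.

```python
import math
from typing import List, Optional, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length lists (assumes nonzero variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def split_half_r(players: List[Sequence[int]]) -> float:
    """Correlate each hitter's odd-numbered-BIP rate with his even-numbered-BIP rate.

    Each entry is a hypothetical list of 0/1 flags, one per ball in play
    (1 = classified "sharp"); every list needs at least two entries.
    """
    odd_rates: List[float] = []
    even_rates: List[float] = []
    for outcomes in players:
        odd = outcomes[0::2]   # 1st, 3rd, 5th, ... balls in play
        even = outcomes[1::2]  # 2nd, 4th, 6th, ... balls in play
        odd_rates.append(sum(odd) / len(odd))
        even_rates.append(sum(even) / len(even))
    return pearson(odd_rates, even_rates)


def sample_needed(r_half: float, n_half: float, target: float = 0.7) -> Optional[float]:
    """Spearman-Brown projection: sample size at which the correlation reaches `target`.

    `n_half` is the sample size behind each half; the 0.7 default is an assumed
    threshold. Returns None when r_half <= 0, since no sample size ever gets there.
    """
    if not 0 < target < 1:
        raise ValueError("target must be between 0 and 1")
    if r_half <= 0:
        return None
    # Solve target = k*r / (1 + (k - 1)*r) for the sample-length multiplier k.
    k = target * (1 - r_half) / (r_half * (1 - target))
    return k * n_half
```

For example, `sample_needed(0.5, 150)` works out to about 350 balls in play. These are illustrative numbers, not results from this study.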
Another excellent sign: BABIP takes two-and-a-half years to reach the level of stability these stats reach in half a year! That’s very promising. Now for the big test: How well does sharp and soft contact predict BABIP, and more importantly, does it predict BABIP better than BABIP predicts itself?
DNS stands for “Does Not Stabilize,” which is very disappointing. While BABIP correlated with itself takes 2.4 years to stabilize, neither sharp nor soft contact correlates with BABIP strongly enough to ever stabilize using this method. This is a shame, since sharp and soft contact stabilize so quickly, but when it comes right down to it, it doesn’t matter whether batters are hitting the ball “sharply” if it doesn’t translate into more hits.
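The article doesn’t spell out how the predictive test was run. One plausible version is a cross-half correlation. It correlates a hitter’s Sharp Percentage in one half of his sample with his BABIP in the other half, and compares that with BABIP correlated against itself across the same halves. The sketch below assumes each hitter’s sample has already been split into two halves. The `Half` fields are hypothetical. It also uses a simplified BABIP (hits on balls in play divided by balls in play) instead of the full (H − HR)/(AB − K − HR + SF) formula.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Half:
    """One half of a hitter's balls in play (field names are hypothetical)."""
    bip: int    # balls in play in this half
    sharp: int  # of those, balls classified "sharp"
    hits: int   # hits on those balls in play

    @property
    def sharp_pct(self) -> float:
        return self.sharp / self.bip

    @property
    def babip(self) -> float:
        # Simplified BABIP: hits on balls in play / balls in play.
        return self.hits / self.bip


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length lists (assumes nonzero variance)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cross_half_r(first: List[Half], second: List[Half],
                 predictor: Callable[[Half], float]) -> float:
    """Correlate `predictor` from one half with BABIP from the other half.

    `first[i]` and `second[i]` must describe the same hitter.
    """
    return pearson([predictor(h) for h in first], [h.babip for h in second])


# Sharp% as the predictor vs. BABIP predicting itself:
# cross_half_r(odd, even, lambda h: h.sharp_pct)
# cross_half_r(odd, even, lambda h: h.babip)
```

If the Sharp Percentage correlation were as large as BABIP’s self-correlation, sharp contact would be the quicker read. The same Spearman-Brown projection can then convert either correlation into a sample size.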
For one more test, I wanted to see whether the data might be useful at the extremes. Perhaps those who hit the most “sharp” balls post better BABIPs than those who hit the fewest. To test this, I looked at the aggregate BABIPs for the top and bottom 10 percent of batters since 2007 in Sharp Percentage (sharply hit balls divided by balls in play):
While the top 10 percent does post a slightly better BABIP than the bottom 10 percent, the difference is negligible. If we do the same thing for Soft Percentage, the results are slightly different:
Here, the two groups diverge a bit more, but still not to the point where this data becomes super actionable, especially in light of our other evidence against it.
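The extremes comparison can be sketched as follows. The sketch ranks hitters by Sharp Percentage, takes the top and bottom 10 percent, and computes each group’s aggregate BABIP. Here “aggregate” is read as pooling hits and balls in play before dividing, so high-volume hitters count more than they would in an average of individual BABIPs. That reading, the `Hitter` fields, and the lack of a minimum-BIP qualifier are all assumptions. Passing a different `rate` function gives the Soft Percentage version.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Hitter:
    """Per-hitter totals over the sample period (field names are hypothetical)."""
    name: str
    bip: int    # balls in play
    sharp: int  # balls in play classified "sharp"
    soft: int   # balls in play classified "soft"
    hits: int   # hits on balls in play


def aggregate_babip(group: List[Hitter]) -> float:
    """Pool hits and balls in play across the group, then divide once."""
    return sum(h.hits for h in group) / sum(h.bip for h in group)


def extreme_babips(hitters: List[Hitter],
                   rate: Callable[[Hitter], float] = lambda h: h.sharp / h.bip,
                   frac: float = 0.10) -> Tuple[float, float]:
    """Return (top-group BABIP, bottom-group BABIP) after ranking hitters by `rate`."""
    ranked = sorted(hitters, key=rate, reverse=True)
    k = max(1, int(len(ranked) * frac))  # size of each tail group
    return aggregate_babip(ranked[:k]), aggregate_babip(ranked[-k:])


# Soft Percentage version:
# extreme_babips(hitters, rate=lambda h: h.soft / h.bip)
```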
While this all suggests that MLBAM’s sharp and soft data is largely useless for practical purposes, it does not mean that quality of contact data, as a concept, is useless. The underlying logic that harder-hit balls become hits more frequently is still sound, and tests by our own Mike Fast have shown it to be true. The MLBAM data just doesn’t capture it well enough, and that could be for any number of reasons.
First and foremost, whether a ball is hit sharply or softly is a subjective judgment. Since 2007, just three percent of balls in play have been classified as “sharp” and five percent as “soft,” both of which seem low to me. Additionally, the two don’t correlate very well with each other. That seems a bit strange: you might expect a player who doesn’t hit many sharp balls to hit more soft ones than the average hitter, but that doesn’t seem to be the case, as it takes nearly four and a half years for one classification to predict the other with even moderate accuracy.
Some stringers may also be more diligent than others, creating inconsistencies from park to park. (I’d guess Florida’s stringer fit this description this past year, with Stanton and Buck leading the list and Hanley Ramirez, Logan Morrison, and Emilio Bonifacio all placing in the top 20.) It’s also possible that just three categories (sharp, soft, and neither) aren’t granular enough to carry much information.
In any case, examining the validity of this type of data is important, as it’s this type of stuff that will hold the key to the next level of baseball analysis. We’ve milked now-common stats like BABIP and HR/FB for all we can at this point, and in order to understand them (and, subsequently, player performance and value) better, we’re going to need to take this next step down into more granular components. I was reading Jonah Keri’s The Extra 2% this past week (yes, I know, I’m very behind), and he talks about how the Rays have quantifiable data on swing plane and bat speed (among other things) that previously fell entirely in the domain of scouts. This isn’t to imply that scouts are or will become obsolete (they still hold a very important place in the game), but decreased subjectivity and increased precision on things like this will do wonders for analyzing players. I look forward to the day we analysts receive access to this type of data, but for the time being, we’ll have to make do with what we have.