
In the fantasy and analytics community, we often tend to talk about players in terms of components of their production. We don’t talk about ERA; we talk about strikeouts and walks and ground balls. We don’t talk about batting average; we talk about strikeout rate and BABIP. Those who have read my work or talked with me know that I also like to blend stats and scouting. I tend to say that stats tell us the “what” and scouting tells us the “why.” Stats tell us that a player hits for power, while scouting tells us that he does so because he has strong wrists, good bat speed, loft in his swing, etc. In other words, the component stats can get broken down into component abilities via scouting.

In recent years, some of these scouting components have been quantified, much to the delight of analysts and fans of the game. PITCHf/x has been particularly influential in the analysis of pitchers, capturing things like pitch type, velocity, movement, spin, location, and release point. Unfortunately, its hitting counterpart, HITf/x, is not publicly available. Still, there is some publicly available data that can be useful for batters, such as HitTracker. MLBAM—one of the creators of PITCHf/x—also provides something interesting that will serve as today’s topic: quality of contact data.

MLBAM stringers, when recording various aspects of ballgames, are tasked with classifying certain balls in play as “sharply” or “softly” hit. Today, I’d like to examine whether these classifications present us with any useful information. As I’ve shown before, a hitter’s BABIP takes roughly two-and-a-half years to “stabilize” (I know hardcore analysts hate this word, and I explain why in the linked piece, but for familiarity’s sake I’ll continue to use it). That’s a long time before we begin to get meaningful information about the kind of contact that a hitter makes, which is why this “sharp” and “soft” stuff is so appealing. After all, it stands to reason that if a player makes hard contact with the ball, the ball is going to fall safely for a hit more often than it otherwise would.

To start, let’s check out the 2011 leaders and trailers in “sharp” contact percentage:

2011 “Sharp” Contact on BIP Leaders

| Last | First | BIP | Sharp% |
|---|---|---|---|
| Stanton | Mike | 313 | 10.2% |
| Buck | John | 332 | 7.8% |
| Holliday | Matt | 329 | 7.3% |
| McCann | Brian | 352 | 7.1% |
| Pujols | Albert | 479 | 6.7% |
| Weeks | Rickie | 321 | 6.2% |
| Ramirez | Hanley | 258 | 6.2% |
| Prado | Martin | 474 | 6.1% |
| Nunez | Eduardo | 266 | 6.0% |
| Desmond | Ian | 426 | 5.9% |
| Braun | Ryan | 427 | 5.9% |
| Ramos | Wilson | 294 | 5.8% |
| Bautista | Jose | 350 | 5.7% |
| Morrison | Logan | 334 | 5.7% |
| Berkman | Lance | 360 | 5.6% |

2011 “Sharp” Contact on BIP Trailers

| Last | First | BIP | Sharp% |
|---|---|---|---|
| Miles | Aaron | 396 | 0.5% |
| Getz | Chris | 330 | 0.6% |
| Tejada | Ruben | 272 | 0.7% |
| Abreu | Bobby | 376 | 0.8% |
| Izturis | Maicer | 376 | 0.8% |
| Kennedy | Adam | 306 | 1.0% |
| Suzuki | Ichiro | 592 | 1.0% |
| Carroll | Jamey | 389 | 1.0% |
| Andino | Robert | 363 | 1.1% |
| Bartlett | Jason | 449 | 1.1% |
| Ludwick | Ryan | 348 | 1.2% |
| Raburn | Ryan | 255 | 1.2% |
| Callaspo | Alberto | 418 | 1.2% |
| Peralta | Jhonny | 408 | 1.2% |
| Pennington | Cliff | 399 | 1.3% |

For the most part, these lists seem to pass the eye test. Elite hitters are on the leaders list, and scrubs or spray hitters are on the trailers list. Aside from some curious inclusions like John Buck and Eduardo Nunez on the leaders list and Jhonny Peralta and Bobby Abreu on the trailers list, they look pretty solid.

That’s a good start, but more rigorous testing is needed to see if this data can be useful. To that end, I ran a split-half correlation to see how “stable” sharp and soft contact is for hitters. I’ve explained the methodology before, so I won’t bother going over it again, but you can read about it here if you’re interested.

| Stat | Denominator | Stabilizes | Years |
|---|---|---|---|
| Sharp | CON-HR-E-I-SF-SH | 315 | 0.5 |
| Soft | CON-HR-E-I-SF-SH | 438 | 0.7 |
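For readers who want to try the general idea themselves, here is a minimal sketch of a split-half correlation. It is not the exact procedure from the linked piece. It assumes each hitter's balls in play are available as a list of 0/1 flags (1 meaning the stringer marked the ball "sharp"), and it runs on simulated hitters rather than real MLBAM data. The function names and the 1% to 8% range of true rates are my own illustrative choices.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def split_half_r(players, n, rng):
    """Correlate each hitter's rate in one random half against the other half.

    players: list of per-hitter lists of 0/1 flags, one flag per ball in play
             (1 = classified "sharp"). Hitters with fewer than 2*n flags are skipped.
    n:       number of balls in play in each half.
    """
    first, second = [], []
    for flags in players:
        if len(flags) < 2 * n:
            continue
        sample = rng.sample(flags, 2 * n)  # random draw without replacement
        first.append(sum(sample[:n]) / n)
        second.append(sum(sample[n:]) / n)
    return pearson(first, second)


def simulate_players(count, bip, rng):
    """Hypothetical hitters whose true sharp rates fall between 1% and 8%."""
    players = []
    for _ in range(count):
        true_rate = rng.uniform(0.01, 0.08)
        players.append([1 if rng.random() < true_rate else 0 for _ in range(bip)])
    return players


if __name__ == "__main__":
    rng = random.Random(2011)
    hitters = simulate_players(count=300, bip=1200, rng=rng)
    for n in (50, 150, 300, 600):
        print(n, round(split_half_r(hitters, n, rng), 3))
```

In a run like this, the correlation should climb as the halves get larger. A "stabilization" figure is just a summary of where on that curve a chosen threshold is reached.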

Another excellent sign: BABIP takes two-and-a-half years to reach the level of stability these stats reach in roughly half a year! That’s very promising. Now for the big test: How well do sharp and soft contact predict BABIP, and more importantly, do they predict BABIP better than BABIP predicts itself?

| Stat 1 | Stat 2 | Denominator | Stabilizes | Years |
|---|---|---|---|---|
| BABIP | BABIP | CON-HR-E-I-SF-SH | 1601 | 2.4 |
| Sharp | BABIP | CON-HR-E-I-SF-SH | DNS | |
| Soft | BABIP | CON-HR-E-I-SF-SH | DNS | |

DNS stands for “Does Not Stabilize,” which is very disappointing. While BABIP correlated with itself takes 2.4 years to stabilize, neither sharp nor soft contact correlates at all with BABIP using this method. This is a shame, since sharp and soft contact are so easily predictable, but when it comes right down to it, it doesn’t matter if batters are hitting the ball “sharply” if it doesn’t translate to more hits.
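To see how a pairing can fail to stabilize no matter how much data piles up, here is a rough sketch built on two assumptions of mine that the tables above do not confirm. First, that “stabilizes” means the split-half correlation reaches 0.5. Second, that a stat's own split-half correlation grows as n / (n + k), the standard Spearman-Brown shape, where k is its stabilization point. Under those assumptions, the correlation between one stat and a different stat can never exceed the correlation between the underlying skills (called `rho` here, a made-up input), so a weak link means a DNS.

```python
from math import sqrt


def reliability(n, k):
    """Expected split-half r at sample size n for a stat that reaches r = 0.5 at n == k."""
    return n / (n + k)


def cross_r(n, k_x, k_y, rho):
    """Expected r between stat X in one half and stat Y in the other half.

    rho is the hypothetical correlation between the two underlying skills;
    noise in each half attenuates it by the square root of the reliabilities.
    """
    return rho * sqrt(reliability(n, k_x) * reliability(n, k_y))


def stabilization_point(k_x, k_y, rho, threshold=0.5, max_n=100_000):
    """Smallest n where |cross_r| reaches the threshold, or None for "DNS"."""
    if abs(rho) <= threshold:
        return None  # cross_r approaches rho from below and never gets there
    for n in range(1, max_n + 1):
        if abs(cross_r(n, k_x, k_y, rho)) >= threshold:
            return n
    return None  # not reached within max_n


if __name__ == "__main__":
    # BABIP against itself: the same stat in both halves, skills perfectly correlated.
    print(stabilization_point(1601, 1601, rho=1.0))
    # Sharp% (k = 315) against BABIP (k = 1601) with a made-up skill correlation of 0.3.
    print(stabilization_point(315, 1601, rho=0.3))
```

The 315 and 1601 figures come from the tables above; the 0.3 is purely illustrative and not an estimate of the real relationship.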

Running one more test, I wanted to see if there might be some use at the extremes. Perhaps those who are hitting the most “sharp” balls post better BABIPs than those who are hitting the least. To test this, I looked at the aggregate BABIPs for the top and bottom 10 percent of batters since 2007 in Sharp Percentage (sharply hit balls divided by balls in play):

| Sharp% | BABIP |
|---|---|
| Top 10% | 0.296 |
| Bottom 10% | 0.292 |

While the top 10 percent does post a slightly better BABIP than the bottom 10 percent, the difference is negligible. If we do the same thing for Soft Percentage, the results are slightly different:

| Soft% | BABIP |
|---|---|
| Top 10% | 0.286 |
| Bottom 10% | 0.298 |

Here, the two groups diverge a bit more, but still not to the point where this data becomes super actionable, especially in the light of our other evidence against it.
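The top-and-bottom comparison above can be sketched in a few lines. This version assumes one row per hitter with counts of balls in play, hits on those balls, and sharply (or softly) hit balls, and it takes “aggregate” to mean pooling hits and balls in play across the group rather than averaging each hitter's BABIP. The field names are hypothetical, and BABIP is simplified to hits divided by balls in play.

```python
def decile_babip(rows, key):
    """Pooled BABIP for the top and bottom 10 percent of hitters by a contact rate.

    rows: list of dicts with 'bip', 'hits', and a count under `key`
          (for example 'sharp' or 'soft').
    Returns (top_babip, bottom_babip).
    """
    ranked = sorted(rows, key=lambda r: r[key] / r["bip"], reverse=True)
    cut = max(1, len(ranked) // 10)  # at least one hitter per group

    def pooled(group):
        return sum(r["hits"] for r in group) / sum(r["bip"] for r in group)

    return pooled(ranked[:cut]), pooled(ranked[-cut:])


if __name__ == "__main__":
    # Ten made-up hitters with 100 balls in play each.
    sample = [{"bip": 100, "hits": 30, "sharp": s} for s in range(1, 11)]
    sample[-1]["hits"] = 32  # the hitter with the most sharp contact
    sample[0]["hits"] = 28   # the hitter with the least
    print(decile_babip(sample, "sharp"))
```

Pooling weights everyday players more heavily than part-timers, which seems like the sensible choice here, but averaging per-hitter BABIPs would be a defensible alternative.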

While this all suggests that MLBAM’s sharp and soft data is largely useless for practical purposes, it does not mean that quality of contact data, as a concept, is useless. The underlying logic that harder-hit balls become hits more frequently is still sound, and tests by our own Mike Fast have shown it to be true. The MLBAM data just doesn’t capture it well enough, which could be for any number of reasons.

First and foremost, deciding which balls are hit sharply or softly is a subjective distinction. Since 2007, just three percent of balls in play have been classified as “sharp” and five percent as “soft,” both of which seem low to me. Additionally, the two don’t correlate very well, which seems a bit strange. You might expect a player who doesn’t hit many sharp balls to hit more soft ones than the average hitter, but that doesn’t seem to be the case: it takes nearly four and a half years for one classification to predict the other with just moderate accuracy.

| Stat 1 | Stat 2 | Denominator | Stabilizes | Years |
|---|---|---|---|---|
| Sharp | Soft | CON-HR-E-I-SF-SH | 2918 | 4.4 |

Some stringers may also be more diligent than others, creating inconsistencies from park to park (I’d guess Florida’s stringer fit this description this past year with Stanton and Buck leading the list and Hanley Ramirez, Logan Morrison, and Emilio Bonifacio all placing in the top 20). It’s also possible that just three distinctions (sharp, soft, and neither) aren’t complex enough to hold much relevance.

In any case, examining the validity of this type of data is important, as it’s this type of stuff that will hold the key to the next level of baseball analysis. We’ve milked now-common stats like BABIP and HR/FB for all we can at this point, and in order to understand them (and, subsequently, player performance and value) better, we’re going to need to take this next step down into more granular components. I was reading Jonah Keri’s The Extra 2% this past week (yes, I know, I’m very behind), and he talks about how the Rays have quantifiable data on swing plane and bat speed (among other things) that previously fell entirely in the domain of scouts. This isn’t to imply that scouts are or will become obsolete (they still hold a very important place in the game), but decreased subjectivity and increased precision on things like this will do wonders for analyzing players. I look forward to the day we analysts receive access to this type of data, but for the time being, we’ll have to make do with what we have.

Gep7Llaro
1/09
Is this different from ESPN Inside Edge's Well-Hit data?
derekcarty
1/09
I imagine it is. Where do they have that data?
Gep7Llaro
1/09
http://myinsideedge.com/Glossary_Main.aspx

BaseballHQ also has Hard Hit Ball data. I think it's the same as the Inside Edge Well-Hit data, though.
derekcarty
1/09
Yeah, this doesn't speak to any other companies who record hard-hit ball data, though I think it would be very interesting to test them out. Does the Inside Edge data get displayed at ESPN anywhere?
Gep7Llaro
1/09
No, you have to be a subscriber.
derekcarty
1/09
ESPN Insider subscriber, or to that Inside Edge site?
Gep7Llaro
1/09
I don't have either, so I may be wrong; however, as far as I know, it's a separate subscription.

For what it's worth, here were the leaders in Hard Contact Percentage, taken from the Baseball Forecaster:
David Ortiz
Ryan Braun
Carlos Beltran
Jose Bautista
Miguel Cabrera
Ryan Doumit
Adrian Beltre
Albert Pujols
Andrew McCutchen
Lance Berkman