
In late June, [Jeff] Luhnow and his brain trust gathered in the general manager’s box at Minute Maid Park in Houston to watch a 27-year-old pitcher whom they consider an indicator of what their process can yield. Collin McHugh was plucked from the scrap heap last December after bouncing between the Colorado Rockies and their Triple-A affiliate. – Extreme Moneyball

This was the world’s introduction to a story that has mesmerized the statistics-inclined baseball world since it was revealed 11 months ago. It’s a story of statistics, of innovation, of dedication to valuing the undervalued, and of persevering in the face of doubt from contemporaries and fans alike. Pitch spin has quickly become a hot topic among researchers across the sabermetric landscape. The methodology described in the Bloomberg article quoted above has spawned several attempts at replication, which, while valuable, are flawed because of the data limitations caused by traditional PITCHf/x outputs. New data sources give us the tools we need to properly replicate the Astros’ methodology.

Statcast puts analyses like the ones the Astros are reported to have done within the grasp of fans and analysts like us.

The Astros’ analysts noticed that McHugh had a world-class curveball. Most curves spin at about 1,500 times per minute; McHugh’s spins 2,000 times. The more spin, the more the ball moves during the pitch—and the more likely batters are to miss it. Houston snapped him up. “We identified him as someone whose surface statistics might not indicate his true value,” says David Stearns, the team’s 29-year-old assistant general manager.” – Extreme Moneyball

The Astros tweaked McHugh’s approach, tasking him with throwing fewer sinkers and more high fastballs. They encouraged him to use his high-spinning curveball, and taught him one weird trick to get MLB hitters out that professional hitting coaches hated them for sharing. They used data, insight, and coaching to turn a scrap-heap player into a highly productive major-league starter.

McHugh was worth 4.3 DRA-based WARP last season, and has already accumulated another 1.1 wins this year. Considering how little risk the club incurred to pick up McHugh, it’s safe to slide this one into Luhnow’s win column.

***

There’s an important nuance in understanding how to analyze pitch spin that needs to be clarified. I urge you to read (and re-read) Alan Nathan’s most recent work on pitch spin. Nathan is, undoubtedly, the premier expert in the public space regarding pitch spin and the resulting impact on the ball as it travels to the plate.

PITCHf/x, as the data is reported out by MLB Advanced Media, produces what we’ll call “calculated spin rate.” Essentially, PITCHf/x takes the total movement of the pitch, removes the effect of gravity, and then calculates the spin necessary to generate that movement. This methodology has some inherent flaws. First, outside factors like elevation and game-time temperature greatly affect the movement of a pitch. Second, the calculation does not correct for the effect of air drag on the pitch. As a result, the spin rates reported in raw PITCHf/x data are generally inaccurate.

That said, the raw data can be re-calculated in order to get accurate spin numbers, as Nathan has discussed. It’s worth noting here that Nathan uses a slightly different algorithm than MLBAM, one derived from recent laboratory tests. Incorporating factors such as air density and temperature allows you to determine the useful, or movement-generating, spin. What does that mean exactly?

In general, the spin vector $\omega$ can be written as the vector sum of two components. One is parallel to the direction of motion, and I will refer to it as the “gyrospin” component $\omega_G$. The other is perpendicular to the direction of motion, and I will refer to it as the “transverse spin” component $\omega_T$. Remember that only $\omega_T$ contributes to the movement. In general, if we can measure the movement, we can determine both the magnitude and direction of $\omega_T$, which is exactly what PITCHf/x does. But PITCHf/x has no way to determine anything about $\omega_G$. We may refer to the transverse spin as the “useful spin”, since it is directly related to the amount of movement, in the sense that increasing the transverse spin will increase the movement. However, increasing the gyrospin will not increase the movement. As the title says, all spin is not alike. Note that the total spin rate is the Pythagorean sum of $\omega_T$ and $\omega_G$:

$$\omega = \sqrt{\omega_T^2 + \omega_G^2}$$

In layman’s terms, the total spin of any given pitch comprises both useful or movement-generating spin (the transverse component, $\omega_T$, above) and “gyrospin,” which doesn’t generate movement (the parallel component, $\omega_G$, above). By doing some more robust calculations, we can begin to isolate the useful spin that actually helps the pitcher generate movement.
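To make the split concrete, here is a minimal Python sketch of the decomposition described above. It is only an illustration: the spin and velocity vectors are made-up numbers, the helper names (`split_spin`, `cross`, and so on) are hypothetical, and it is not the PITCHf/x or Statcast calculation. It projects the spin vector onto the direction of motion to get the gyrospin, takes the remainder as the transverse spin, and checks that the two recombine as a Pythagorean sum.

```python
import math

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    """Length of a 3-vector."""
    return math.sqrt(dot(a, a))

def split_spin(spin, velocity):
    """Split spin into (gyro, transverse) parts relative to velocity.

    gyro is the projection of spin onto the direction of motion;
    transverse is whatever is left, perpendicular to the motion.
    """
    scale = dot(spin, velocity) / dot(velocity, velocity)
    gyro = tuple(scale * v for v in velocity)
    transverse = tuple(s - g for s, g in zip(spin, gyro))
    return gyro, transverse

# Assumed, illustrative values only: spin in rpm, velocity in mph,
# with the ball travelling along the y-axis.
spin = (1200.0, 1600.0, 0.0)
velocity = (0.0, 90.0, 0.0)

gyro, transverse = split_spin(spin, velocity)
total = norm(spin)

print("gyrospin (rpm):   ", norm(gyro))
print("transverse (rpm): ", norm(transverse))
print("total (rpm):      ", total)
print("Pythagorean sum:  ", math.hypot(norm(gyro), norm(transverse)))

# Only the transverse part survives the cross product with velocity,
# which is why gyrospin adds to the total without adding movement.
print("|spin x velocity|:", norm(cross(spin, velocity)))
print("|transverse|*|v|: ", norm(transverse) * norm(velocity))
```

In this toy case, more than half of the total spin points along the direction of travel, so the movement-generating share is well below the headline spin rate.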

Statcast data helps us fill in the holes. Statcast provides us with the total spin, an observed data point, which will help us paint a more complete picture. Jeff Passan teased us with this revolution recently, highlighting that “teams have been getting all of the Statcast information in giant raw data files, to do with whatever it pleases. And some have glommed onto the spin of pitches – how many revolutions per minute it is turning – as a component worth studying. It’s far too early to tell whether it will prove as valuable a marker as velocity. What’s undeniable is that the spin of a ball has a demonstrable effect of how hitters react to it.”

One point in the above paragraph is an important one. Statcast provides the total spin as opposed to calculated (or movement) spin data. That means that it is actually counting the revolutions on a pitch as opposed to uncovering it algorithmically. Passan was able to confirm this upon further inquiry. The value here is that not only can we determine how much a pitch spins, but we can also ascertain how much of the spin is actually causing movement on the pitch.

***

MLBAM graciously supplied us with the Statcast spin data for every pitcher who has thrown at least 50 curveballs and/or sliders this season. With the help of Dr. Nathan we analyzed Statcast spin data in conjunction with properly calculated PITCHf/x spin data to get a more complete picture of what spin data actually looks like. Below are two tables showcasing some high-level data from the datasets as a whole:

Curveballs

| Pitcher Count | Avg. Thrown (PITCHf/x) | Avg. Thrown (Statcast) | Avg. Useful Spin (rpm) | Avg. Total Spin (rpm) | Peak Total Spin (rpm) | % Useful Spin |
| --- | --- | --- | --- | --- | --- | --- |
| 92 | 163 | 138 | 1,520 | 2,280 | 2,770 | 65% |


Sliders

| Pitcher Count | Avg. Thrown (PITCHf/x) | Avg. Thrown (Statcast) | Avg. Useful Spin (rpm) | Avg. Total Spin (rpm) | Peak Total Spin (rpm) | % Useful Spin |
| --- | --- | --- | --- | --- | --- | --- |
| 168 | 198 | 186 | 520 | 2,060 | 2,740 | 25% |


All the spin numbers are rounded to the nearest ten because, in reality, 10 rpm is a fraction of a rotation as a pitch travels to the plate. Hat tip to Tom Tango on that one. Also, the discrepancies in the average number of pitches thrown by the pitchers included in this study stem from the differing pitch classification methods of MLBAM, which provides the Statcast data, and Pitch Info, which provided the PITCHf/x classifications. Finally, there are some sampling error concerns because a few useful spin ratios come out over 1.0, which is physically impossible.
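The rounding argument is easy to check with back-of-the-envelope arithmetic. The sketch below assumes a pitch that covers roughly 55 feet from release to the plate at a constant 90 mph; both figures are illustrative assumptions rather than values from the dataset, and real pitches slow down in flight. The function name `rotations_in_flight` is hypothetical.

```python
def rotations_in_flight(spin_rpm, distance_ft=55.0, speed_mph=90.0):
    """Rotations a ball completes on its way to the plate.

    distance_ft and speed_mph are assumed, illustrative defaults;
    the speed is treated as constant, which ignores drag.
    """
    speed_ft_per_s = speed_mph * 5280 / 3600   # mph -> feet per second
    flight_time_s = distance_ft / speed_ft_per_s
    return spin_rpm / 60 * flight_time_s       # rpm -> rev per second

# A 10 rpm difference versus the average curveball total spin
print(f"10 rpm:   {rotations_in_flight(10):.3f} rotations in flight")
print(f"2280 rpm: {rotations_in_flight(2280):.1f} rotations in flight")
```

Under those assumptions a 10 rpm difference amounts to well under a tenth of one rotation over the whole flight of the pitch, against roughly 16 rotations for an average curveball.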

While the average total spin and peak spin numbers for both pitches are nearly the same, sliders have a much lower ratio of useful spin than do curveballs. This is illustrated below in charts showcasing the full sample for both pitches, with useful spin plotted against the total spin. Note that the orange line indicates a ratio of 1.0, meaning that every single rotation on the ball is useful:

The curveball chart has a few outliers that live above the orange line where their useful spin per PITCHf/x is greater than their total spin as determined by Statcast. As I said, this is physically impossible, and thus can be interpreted as some sampling error where Statcast and PITCHf/x aren’t playing nicely together. This is not unexpected: We are merging two disparate datasets.

The slider chart doesn’t have any pitchers above the line, which is promising, but also likely a function of the fact that sliders have a much lower ratio of average useful spin to total spin than do curves.
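As a rough sketch of the comparison behind the charts, the Python below computes a useful-spin ratio, flags any ratio above 1.0 as a data problem, and backs out the implied gyrospin from the Pythagorean relationship. The two "average" rows reuse the rounded league averages from the tables; the outlier row is entirely hypothetical and the function names are illustrative. Note that a ratio of averages need not match the average of per-pitcher ratios, so it will not necessarily reproduce the % Useful Spin column.

```python
import math

def useful_ratio(useful_rpm, total_rpm):
    """Share of total spin that generates movement."""
    return useful_rpm / total_rpm

def implied_gyrospin(useful_rpm, total_rpm):
    """Gyrospin implied by total^2 = useful^2 + gyro^2.

    Returns None when useful spin exceeds total spin, which is
    physically impossible and points to a measurement or merge issue.
    """
    if useful_rpm > total_rpm:
        return None
    return math.sqrt(total_rpm ** 2 - useful_rpm ** 2)

samples = [
    ("curveball average (table)", 1520, 2280),
    ("slider average (table)", 520, 2060),
    ("hypothetical outlier", 2600, 2500),  # made-up row, not real data
]

for label, useful, total in samples:
    ratio = useful_ratio(useful, total)
    gyro = implied_gyrospin(useful, total)
    if gyro is None:
        print(f"{label}: ratio {ratio:.2f} > 1.0, flag for review")
    else:
        print(f"{label}: ratio {ratio:.2f}, implied gyrospin {gyro:.0f} rpm")
```

With the table averages, the implied gyrospin comes out at about 1,700 rpm for curveballs and about 1,990 rpm for sliders, which is another way of seeing that most slider spin is not generating movement.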

***

According to the excerpt from the Bloomberg article included above, Collin McHugh’s curveball spins at more than 2,000 rpm while the major-league average sat around 1,500 at the time. This data appears to be from PITCHf/x. (It matches PITCHf/x results, though the source of those numbers wasn’t disclosed in the article so we can’t know for sure.) Statcast gives us a new, more complete look at McHugh’s curveball: He does in fact throw one of the fastest-spinning curveballs in the game with average spin rates over 2,500 rpm and a peak at nearly 2,800 rpm. Compare that to the MLB average for total spin of just over 2,200 and it’s easy to see why the Astros might have been enamored.

That, however, isn’t the whole story. McHugh is one of those outliers whose useful spin per PITCHf/x exceeds what Statcast tells us is the total spin on his curveball. While there’s some error here, it would seem safe to say that McHugh isn’t just among the best in the league at spinning a curveball. More importantly, McHugh is among the best in the business at throwing a curveball loaded with useful spin.

It is this particular data point, I believe, that the Astros paid specific attention to in identifying undervalued pitchers. One could argue, and I might, that curves with high ratios of useful spin are better than ones with lower useful spin ratios for several reasons.

For one thing, these pitches have close to the maximum movement possible because nearly all of their spin is of the useful variety. It would also seem plausible that McHugh has a propensity for better command on his curveball than most, for a couple of reasons. First, his average and max spin readings are fairly close, which indicates that most of his curveballs are spinning at roughly the same rate. Additionally, a high percentage of this spin is causing movement on the pitch, which again would seem to promote consistency. With consistency comes command, and with command comes the ability to throw strikes.

With that hypothesis in mind, we can use Statcast data combined with PITCHf/x data tweaked using Nathan’s formula to identify who the most McHugh-like pitchers in MLB might be in 2015. Over the next few weeks we’ll dive into the data and attempt to find those diamonds in the rough, just as Houston’s front office was able to a few years ago.

Thanks to Alan Nathan for his help in deciphering, manipulating, and understanding the Statcast and PITCHf/x data, and for his ongoing support of this analysis.

nkhare
7/23
Awesome stuff. Can't wait!!
lichtman
7/23
Good article and good insight on McHugh and spin rates in general. "First his average and max spin readings are fairly close, which indicates that most of his curveballs are spinning at roughly the same rate. Additionally, a high percentage of this spin is causing movement on the pitch, which again would seem to promote consistency." I certainly agree that having consistent useful spin rate likely leads to better than average command. I am having a hard time, though, understanding why more useful spin in general leads to greater command. While more useful spin is probably harder to hit simply because it produces more movement, I would think that more movement (more useful spin) would be harder to control not easier. That is one reason why curveballs and sliders are harder to command than fastballs. But, that is a minor quibble with an excellent article.
BSLJeffLong
7/23
Thanks MGL. The thought process re: command was that a pitch that consistently has nearly all of its spin be of the useful variety would be more consistent than a pitch whose spin is split between useful and gyrospin. The idea then being that consistency means that a pitcher knows where it's going which leads to better command. I think the part that I glossed over originally was that I made the assumption that a pitch having a higher % of useful spin would have less volatility, in terms of the ratio on a per pitch basis, than one that is split more evenly. In thinking about that more, I'm not sure that's true, because we don't really know anything about the per pitch ratios using season-long data.
morro089
7/23
Warning: I studied Aerospace Engineering in school (BA only) and love baseball so I've thought about this before. Kind of rusty though. I won’t be offended if you don’t read. I think the consistent (“useful”) spin helps with consistency. I debated explaining why, but it boils down to “it’s doing the same thing so the same thing is going to happen.” I’m torn on the argument of does increasing useful spin help with consistency. TL;DR I think the faster it spins the less control there is. First thing to remember is the baseball itself isn't causing the motion; it's the air around it that is pushing the baseball. Look at this link showing “vortices” following a round object (2D only). https://en.wikipedia.org/wiki/Reynolds_number#/media/File:Vortex-street-animation.gif Those "vortices" will push/pull the ball (thus knuckleball). These vortices are happening all around the baseball (top/bottom, left/right). These vortices are pushing the baseball due to momentum. The other thing “pushing” the baseball is pressure differential due to differences in “relative velocities”. Think of relative velocity as two parts. One part horizontal velocity (pitch speed) and another part the speed at the surface of the ball because of how it's spinning. As the spin increases the "relative velocity will increase on the top of a baseball (90 MPH pitch plus 40 MPH[1] due to spin because the spin is going the same direction as the pitch) and decrease on the bottom (90 MPH minus 40 MPH because the ball is spinning away from the pitched location on the bottom). The relative velocity on the sides will be approximately 90 MPH (assuming perfectly vertical spin even though that doesn’t happen in real life). The effects of these two things will be a function of the air’s “Reynolds Number,” which is basically a number that tells you how a fluid (like air!) will react. If you change the Reynolds Number then you change how the fluid reacts. 
One of the biggest drivers of the Reynolds Number is the velocity of the object (baseball), but the baseball has different velocities depending on where you are on the baseball (hint: relative velocities). THEREFORE. As you increase the spin rate you, hypothetically, actually increase the difference in the Reynolds Number from the top of the baseball to the bottom leaving much more “random variance” in how the air reacts to the baseball. You also have skin drag versus form drag. Form drag is the drag you think of when watching race cars. Skin drag is due to the fact that air is sticky (it’s a fluid!) and wants to stay on the baseball (or race car too actually, or Olympic swimmer). As velocity increases, form drag becomes more important than skin drag so you might have varying types of drag between the top and bottom of the baseball (but it’s probably going so fast skin drag isn’t really involved). Speaking of air sticking to baseballs, it’s more or less sticky based on the Reynolds Number. The higher the Reynolds Number the more resilient a baseball becomes to differences in pressure. Of course you still have the difference again between the top and bottom of the baseball. I really, really think that’s going to be your biggest obstacle with a higher spin rate. I wasn’t expecting to think that when I started this essay because it’s counterintuitive to the knuckleball (no spin) logic. Of course this all may not be true. It's possible once the Reynolds number is high enough the air's movement may become ... constant? As constant as moving, turbulent air in an open system (not in a pipe) can be. Example: the water flowing around bluff bodies inside of a pipe acts the same way once the Reynolds Number becomes high enough (these bluff bodies aren’t spinning though!). I’d argue all this talk of spin rate becomes moot though if a pitcher’s mechanics aren’t repeatable. 
It wouldn’t surprise me if the pitcher’s mechanics are 10x more important than how fast and/or consistent his curveball spins. Backing up though, great article. Only suggestion would be a picture to explain the two different spin rates. How I read it and how you’re explaining how it affects the spin of the ball are two different things and I don’t think you’re wrong, I think I’m confused and a picture would help. Make it in Paint for all I care. For reference I always review these NASA articles: https://www.grc.nasa.gov/www/K-12/airplane/cyl.html https://www.grc.nasa.gov/www/K-12/airplane/beach.html [1] 1,500 RPM on a 9" baseball is equal to 40 MPH.[1] 2,000 RPM is equal to 53 MPH. 1500 RPM = 90,000 RPH; 9" ball diameter = 28.27" circ = .000446 mile cir. RPH*circ = miles/hour.
a-nathan
7/24
Just for the record, the circumference of the ball is 9", not the diameter. So, your speed numbers are too large by a factor of 2*pi. That is, the surface speed of a ball rotating at 1500 rpm is 6.4 mph, not 40 mph. Also, as you point out, skin drag is totally unimportant for a baseball moving at typical game speeds, where the Reynolds number is of order 1-2 x 10^5. Form drag completely dominates (drag~square of velocity). Pitch-tracking data from PITCHf/x are consistent with this quadratic dependence of drag on velocity. No one has yet computed the motion of a spinning baseball through the air from first principles. So one relies instead on a phenomenological model for the drag and Magnus force to parameterize the dependence on spin and velocity, with unknown "fudge factors" lumped into lift and drag coefficients. In such a model, the Magnus force is proportional to the vector cross product of spin and velocity, which vanishes when the spin is parallel/antiparallel to the velocity. Therefore only the component of spin perpendicular to the velocity results in a Magnus force, hence movement. This is the "transverse" or "movement" spin. The component of spin parallel to the velocity (the "gyrospin") does not lead to movement, since the vector cross product vanishes. The total spin is the Pythagorean sum of the transverse and gyrospin. Trackman measures the total; the transverse part is inferred from the movement. The relationship between the transverse spin and the movement needs to be separately determined in controlled experiments in the laboratory. This has been done at speeds and spins typical of the game, although the relationship is not so well known for very high spins.
a-nathan
7/24
Actually you are off by a factor of pi (not 2*pi), so that surface speed at 1500 rpm is about 13 mph.
morro089
7/24
I see you've done this before http://www.hardballtimes.com/dissecting-a-mystery-pitch/
morro089
7/24
Ha. Once you get going you don't see how ridiculous your numbers look. I'm glad you corrected that because it makes a lot more sense. And the Magnus effect is of course what I should have read up on. I was better off just linking to that wiki page for half of my comment. Thank you
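The surface-speed correction worked out in the comments above can be checked with a few lines of arithmetic. This is a minimal sketch that takes the 9-inch circumference from the thread as given; the function name `surface_speed_mph` is hypothetical. It also reproduces the original slip of treating 9 inches as the diameter, which inflates the answer by a factor of pi.

```python
import math

INCHES_PER_MILE = 63360

def surface_speed_mph(spin_rpm, circumference_in=9.0):
    """Speed of a point on the ball's equator due to spin alone."""
    inches_per_hour = spin_rpm * 60 * circumference_in
    return inches_per_hour / INCHES_PER_MILE

for rpm in (1500, 2000):
    print(f"{rpm} rpm -> {surface_speed_mph(rpm):.1f} mph")

# The original slip: 9 inches used as the diameter, so the
# circumference (and the speed) comes out pi times too large.
mistaken = surface_speed_mph(1500, circumference_in=9.0 * math.pi)
print(f"1500 rpm with 9-inch diameter -> {mistaken:.1f} mph")
```

At 1,500 rpm the spin contributes about 13 mph at the surface, in line with the corrected figure in the thread, and the mistaken version lands at about 40 mph.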