Most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.
Alan Nathan is Professor Emeritus of Physics at the University of Illinois at Urbana-Champaign. After a long career doing things like measuring the electric and magnetic polarizabilities of the proton and studying the quark structure of nucleons, he now devotes his time and effort to the physics of baseball. He maintains an oft-visited website devoted to that subject: go.illinois.edu/physicsofbaseball.
In my line of work, I get asked questions by all sorts of different people, such as reporters, kids doing science fair projects (and their mothers), and diehard baseball fans. Some recent examples: How much farther will a fly ball travel in Denver? What’s the deal with those BBCOR bats? Is the baseball juiced? Should the Red Sox trade Jacoby Ellsbury? Okay, I confess that no one has (yet) asked my opinion on that last question. However, one question that I get asked quite often is the following: Can we predict the landing point of a fly ball just after it leaves the bat? That’s what I want to talk about in this article.
Let me try to sharpen up that question a bit. Suppose we have data telling us the velocity of a fly ball just after leaving the bat, so that we know the batted ball speed, vertical launch angle, and horizontal spray angle. How well does that information determine the landing point? Such a question might arise, for example, in a batting cage situation. You measure the batted ball velocity—perhaps with a portable HITf/x or TrackMan system—and immediately tell the batter that he just hit a 385-ft home run, without the ball ever leaving the batting cage. But is this really possible? In a simpler, gravity-only world—I like to refer to it as the “Physics 101” world—it most definitely is possible. Under such conditions, once the initial velocity is known, the ball follows a trajectory that is completely predictable, landing in a location that can be calculated precisely with no more knowledge than one learns in the second week of Physics 101. End of discussion, right?
Wrong! Our real world is much more complicated because the ball experiences the additional forces of drag and lift as it interacts with the surrounding air. The drag, or more simply air resistance, slows the ball down. The lift, or Magnus force, acts on a spinning baseball to deflect it in a direction that depends on the spin axis. In particular, a fly ball hit with backspin will have a Magnus force that is primarily in the upward direction, opposing gravity. Still, in an ideal world, we could predict the landing point from the initial velocity, although to do so would involve a more complicated calculation and would require that we have complete knowledge of the drag and lift forces, the latter requiring that we know the spin rate and axis. Unfortunately, we don’t know these things perfectly well, so that will lead to some uncertainty in our predicted landing location. The big question now is how much uncertainty.
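To make the contrast between the two worlds concrete, here is a minimal Python sketch. It is not the aerodynamics model used in this article. It assumes a flat two-dimensional trajectory, a constant drag coefficient `cd`, and a constant lift coefficient `cl` standing in for backspin. The ball mass, radius, air density, launch height `z0`, default coefficient values, and the function names `vacuum_range_ft` and `fly_ball` are all illustrative assumptions.

```python
import math

G = 9.80665            # gravitational acceleration, m/s^2
MPH_TO_MS = 0.44704    # mph -> m/s
FT_PER_M = 1.0 / 0.3048

# Assumed ball and air properties (approximate, for illustration only).
MASS_KG = 0.145
RADIUS_M = 0.0368
RHO_AIR = 1.2          # kg/m^3, roughly sea level


def vacuum_range_ft(speed_mph, launch_deg):
    """Gravity-only ("Physics 101") range over level ground: v^2 sin(2*theta) / g."""
    v = speed_mph * MPH_TO_MS
    return v * v * math.sin(2.0 * math.radians(launch_deg)) / G * FT_PER_M


def fly_ball(speed_mph, launch_deg, cd=0.35, cl=0.20, z0=1.0, dt=0.001):
    """Step a 2-D trajectory with drag and backspin lift until the ball lands.

    cd and cl are assumed constants; returns (distance_ft, hang_time_s).
    """
    k = 0.5 * RHO_AIR * math.pi * RADIUS_M ** 2 / MASS_KG  # lumped constant, 1/m
    v0 = speed_mph * MPH_TO_MS
    theta = math.radians(launch_deg)
    x, z = 0.0, z0
    vx, vz = v0 * math.cos(theta), v0 * math.sin(theta)
    t = 0.0
    while z >= 0.0:
        v = math.hypot(vx, vz)
        # Drag opposes the velocity; backspin lift is perpendicular to it.
        ax = -k * v * (cd * vx + cl * vz)
        az = -G + k * v * (cl * vx - cd * vz)
        vx += ax * dt
        vz += az * dt
        x += vx * dt
        z += vz * dt
        t += dt
    return x * FT_PER_M, t


if __name__ == "__main__":
    print("gravity only:", round(vacuum_range_ft(100, 26), 1), "ft")
    for cd in (0.30, 0.35, 0.40):
        dist, hang = fly_ball(100, 26, cd=cd)
        print("cd =", cd, "->", round(dist, 1), "ft,", round(hang, 2), "s")
```

The gravity-only formula gives roughly 527 ft for a 100 mph ball launched at 26 degrees, and it always gives the same answer for the same initial velocity. In the second function, the answer moves as soon as `cd` or `cl` is changed while the initial velocity stays fixed.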
In principle, the question I have posed is an easy one to answer, given the availability of new technologies in MLB ballparks for tracking the baseball. One simply looks at the distribution of landing points for balls hit within a narrow range of batted ball speeds and angles. Such an analysis would be particularly well suited for venues in which TrackMan is installed, since the initial velocity vector, landing location, and hang time are readily provided by that system. The analysis is also well suited for venues with FIELDf/x installed, so that HITf/x provides the initial velocity and FIELDf/x provides the landing point and hang time. Unfortunately the data from neither system is publicly available, so I must resort to another technique, which I will now describe.
My technique uses data from two different sources. First I have the precise landing location and hang time for every home run hit in the majors during the 2009 and 2010 seasons, courtesy of Greg Rybarczyk’s ESPN Home Run Tracker. Second, for 8803 of these home runs, I have the location of the ball-bat impact and the initial velocity vector, courtesy of Sportvision’s HITf/x. Armed with the initial position and velocity, an aerodynamics model is fine-tuned to reproduce the landing point and hang time, with the result being that the entire trajectory can be reconstructed to a high level of accuracy. The trajectory can then be extrapolated to find the total distance the ball would have traveled had it eventually reached field level. It is the extrapolated distance—call it R—that I now want to investigate to see how well it is determined by the initial velocity vector.
But first a digression to bloviate a bit about this technique. There are three parameters in my model that are adjusted to fit the data: an average drag coefficient Cd plus two components of spin, wb and ws. The drag coefficient governs how rapidly the ball is slowing down as it moves through the air. The sidespin ws determines how much the ball deflects horizontally from its initial spray angle (hook or slice) during its flight. The backspin wb acts along with gravity to determine the hang time. Given three pieces of information—the x,y,z coordinates of the ball at the hang time—plus the initial velocity vector, it is always possible to find a solution, i.e., the corresponding Cd, wb, and ws. Once those parameters are known, the full trajectory can be calculated to a very high level of accuracy, including the extrapolation to field level to find the total distance R.
The validity of this technique has been verified with dedicated experiments I conducted a few years ago using a portable TrackMan device. Since TrackMan measures the full trajectory, one can compare it directly with the one determined by the technique I have described. It works remarkably well. Simulations confirm that finding and show that, once the initial velocity, the landing point, and hang time are specified, there is very little wiggle room left over for determining the rest of the trajectory. The technique is very powerful and one that I have utilized many times for baseball analysis.
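The fitting idea can be illustrated with a deliberately reduced sketch. It is not the three-parameter fit described above. It works in two dimensions, so there is no sidespin, and it adjusts only two unknowns: a constant drag coefficient `cd` and a constant lift coefficient `cl` standing in for backspin. These are tuned until the simulated ball sits at the observed position at the observed hang time. The lumped constant `K`, the launch height `z0`, the starting guess, and the function names `position_at` and `fit_cd_cl` are assumptions, and the solver is a plain Newton iteration with a finite-difference Jacobian.

```python
import math

G = 9.80665  # m/s^2
# Assumed lumped constant rho*A/(2m) for a baseball in sea-level air, in 1/m.
K = 0.5 * 1.2 * math.pi * 0.0368 ** 2 / 0.145


def position_at(t_end, v0, theta, cd, cl, z0=1.0, dt=0.001):
    """Ball position (x, z) in metres after t_end seconds of flight."""
    x, z = 0.0, z0
    vx, vz = v0 * math.cos(theta), v0 * math.sin(theta)
    for _ in range(round(t_end / dt)):
        v = math.hypot(vx, vz)
        ax = -K * v * (cd * vx + cl * vz)
        az = -G + K * v * (cl * vx - cd * vz)
        vx += ax * dt
        vz += az * dt
        x += vx * dt
        z += vz * dt
    return x, z


def fit_cd_cl(x_obs, z_obs, t_obs, v0, theta, guess=(0.30, 0.10), h=1e-5):
    """Newton iteration for the (cd, cl) pair that reproduces (x_obs, z_obs) at t_obs."""
    cd, cl = guess
    for _ in range(25):
        x, z = position_at(t_obs, v0, theta, cd, cl)
        rx, rz = x - x_obs, z - z_obs
        if math.hypot(rx, rz) < 1e-6:
            break
        # Forward-difference Jacobian of (x, z) with respect to (cd, cl).
        xa, za = position_at(t_obs, v0, theta, cd + h, cl)
        xb, zb = position_at(t_obs, v0, theta, cd, cl + h)
        j11, j12 = (xa - x) / h, (xb - x) / h
        j21, j22 = (za - z) / h, (zb - z) / h
        det = j11 * j22 - j12 * j21
        cd -= (j22 * rx - j12 * rz) / det
        cl -= (-j21 * rx + j11 * rz) / det
    return cd, cl


if __name__ == "__main__":
    v0, theta = 100 * 0.44704, math.radians(26)
    # Synthetic "observation" generated from known coefficients, then recovered.
    x_obs, z_obs = position_at(4.0, v0, theta, 0.35, 0.20)
    print(fit_cd_cl(x_obs, z_obs, 4.0, v0, theta))
```

The usage example builds a synthetic observation from known coefficients and checks that the fit recovers them. This reduced version has two unknowns and two measured coordinates, so the solution is pinned down in the same way that three coordinates pin down three parameters in the full problem.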
Okay, end of my digression and back to the analysis. The first figure is a histogram of distances R for a small subset of home runs with an initial speed between 99 and 101 mph and an initial launch angle between 24° and 28°. I will further restrict the analysis to homers for which the air density, which depends on temperature and elevation and affects the air resistance, is confined to a ±2% range. The blue-hatched plot includes all 281 home runs satisfying my conditions. There is considerable breadth to the R distribution, extending from below 380 ft to above 430 ft, with a mean of 403.9 ft and a standard deviation of 16.1 ft. There is nothing very special about the speed or launch angle range I chose for the analysis, except that they were the most common values in the data set. Suffice it to say that my conclusion does not depend on those values and persists throughout the full data set. We are forced by the data to the following conclusion:
The initial velocity poorly determines the landing point.
That is the primary conclusion of my investigation. The remainder of this article will focus on three possible reasons why R is so poorly determined by the initial velocity: variation in wind, drag coefficient, and backspin.
First let’s talk about wind. In my analysis, wind was not included in the fitting procedure. Instead I looked at events in covered stadiums, including retractable roof stadiums with the roof closed, for which wind is not an issue. Restricting the analysis to these stadiums reduces the sample size to 49 events, for which the distribution is given by the red histogram. Statistically speaking, this distribution is indistinguishable from the previous one, with a mean of 405.5 ft and an identical standard deviation of 16.1 ft. I conclude that wind is not the primary contributing factor to the breadth of R.
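The selection behind both histograms (a narrow window in speed, launch angle, and air density, with an optional covered-stadium restriction) can be sketched in a few lines of Python. The record layout, the field names, the reference density `ref_density`, and the function name `summarize` are hypothetical. The handful of rows in `TOY_HOMERS` are invented purely to exercise the function and are not taken from the data set discussed here.

```python
from statistics import mean, stdev


def summarize(homers, speed_mph=(99.0, 101.0), launch_deg=(24.0, 28.0),
              ref_density=1.20, density_tol=0.02, covered_only=False):
    """Return (count, mean, standard deviation) of distance for the selected homers."""
    picked = [
        h["distance_ft"] for h in homers
        if speed_mph[0] <= h["speed_mph"] <= speed_mph[1]
        and launch_deg[0] <= h["launch_deg"] <= launch_deg[1]
        and abs(h["air_density"] / ref_density - 1.0) <= density_tol
        and (h["covered"] or not covered_only)
    ]
    if len(picked) < 2:
        return len(picked), None, None  # too few events for a spread
    return len(picked), mean(picked), stdev(picked)


# Invented placeholder rows (NOT real measurements), air density in kg/m^3.
TOY_HOMERS = [
    {"speed_mph": 100.2, "launch_deg": 25.1, "air_density": 1.19, "covered": False, "distance_ft": 396.0},
    {"speed_mph": 99.5, "launch_deg": 27.0, "air_density": 1.21, "covered": True, "distance_ft": 410.0},
    {"speed_mph": 100.8, "launch_deg": 24.5, "air_density": 1.20, "covered": True, "distance_ft": 420.0},
    {"speed_mph": 104.0, "launch_deg": 26.0, "air_density": 1.20, "covered": False, "distance_ft": 431.0},  # too fast
    {"speed_mph": 100.0, "launch_deg": 31.0, "air_density": 1.20, "covered": False, "distance_ft": 388.0},  # too steep
    {"speed_mph": 100.5, "launch_deg": 26.5, "air_density": 1.00, "covered": False, "distance_ft": 426.0},  # thin air
]

if __name__ == "__main__":
    print("all selected:", summarize(TOY_HOMERS))
    print("covered only:", summarize(TOY_HOMERS, covered_only=True))
```

Running the same function with and without `covered_only=True` mirrors the comparison between the blue and red histograms. If wind were the dominant effect, the covered-stadium spread would come out noticeably smaller.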
Now let’s take a look at the variation in the other two factors by referring to the second figure, a scatter plot of Cd vs. wb for the 281 homers satisfying the initial conditions. The points are color coded by total distance R, with red, black, and blue corresponding to R<395, 395≤R≤415, and R>415 ft, respectively.
There is a wealth of information contained in this plot, and I will now summarize the essential features and what they teach us.
- For a given value of wb, R decreases as Cd increases. This certainly makes good sense physically, since drag is expected to reduce the distance. Furthermore, the spread of Cd values for a given wb suggests a possible variation in drag coefficient from one baseball to another. This is certainly new information and something we would like to know more about. I’ll save that for another day.
- For a given value of Cd, R increases as wb increases. Again this makes sense, since larger backspin keeps the ball in the air longer so that it travels farther.
- Contours of constant R are roughly parallel diagonal lines on this plot, extending from lower left to upper right, with the higher R lines lying lower. As one climbs along one of these lines of constant R, both Cd and wb increase. However, the tendency of the increased Cd to reduce R is counterbalanced by the tendency of wb to increase R, thereby keeping R constant.
- There is a moderately strong positive correlation between Cd and wb, suggesting that the drag on a baseball increases with increasing spin, all other things equal. While such an effect is well known for a golf ball, it has only been speculated for a baseball. While I would not characterize the evidence here as being a smoking gun, it certainly is suggestive. Once again, this is something we’d like to understand better.
So, let’s summarize what we have learned. We have found that the initial velocity vector is not sufficient to determine the total distance traveled, and that is the primary conclusion of this study. We have also found that wind is not the major cause. We have further shown that variation in both the drag coefficient and the backspin accounts for the spread of distance values, although some of the effects—particularly the correlation between drag and spin—are quite subtle. There are suggestions of a spin-dependence as well as a ball-to-ball variation to the drag on a baseball.
I don’t want anyone to get the impression that this is a completed piece of research. It is not. There are some annoying puzzles in the data that I have not completely sorted out yet. For sure, the conclusion about the initial velocity not determining the landing point is very firm and not likely to change with additional data and analysis. However, my conclusions about the reasons are still tentative, and I suspect there is a lot more to be gleaned from data such as these. In particular, it would be very nice to use the techniques I described here to analyze a larger data set that is not restricted only to home runs. I look forward to continuing this research.