In the bottom of the seventh inning of an August matchup between the San Francisco Giants and Washington Nationals, Erik Miller struck out Luis García Jr. on a slider eight inches below the bottom of the zone. García tried to make a late adjustment to reach out and make contact, but the pitch was much too far from where he expected it when he started his swing. He flailed, missed the pitch completely, then turned and hung his head as he walked back to the dugout.
It’s easy enough to note that García was fooled, but why was he fooled? And why does this happen to hitters across the league every night? The disconnect between a batter’s expectations and their observations of where a pitch would and did cross the plate has been a topic of study and discussion for years, but only now do we have the public data to examine it in more detail. Leveraging both pitch tracking and newly released bat tracking data, I examined how batters track pitches, how much measurement error that tracking carries, and how it influences their swing shapes and swing decisions. What emerges is a dynamic process in which early, uncertain information initiates the swing, while later, refined information influences whether the hitter follows through or takes.
To explore this topic fully, I have to cover a lot of ground. First, I’ll walk through the new bat tracking data from Statcast and how we can use it to infer when and where a batter expected a pitch to cross the plate. From there, I’ll look at how batters track pitches and how we can use that knowledge to build a pitch detection model that mimics a batter’s behavior. Finally, I’ll tune that model to best predict both the shape of a batter’s swing and their swing/take decision. The results are tentative, but they hold interesting implications for how we evaluate hitters and pitchers and how teams should develop pitchers to take advantage of batters’ visual limitations.
Bat Tracking
Though fans have had access to pitch tracking data for nearly two decades now, it wasn’t until 2024 that we started to get the same for bat tracking. Statcast’s initial bat-tracking outputs included bat speed and swing length, and this year they expanded to stance metrics, attack angle and direction, intercept point, and swing path tilt. For this work I focused on those final two—intercept point and tilt—which together give us an idea of what the batter expected the pitch to do. To illustrate this, consider the following two plots of Shohei Ohtani’s well-hit balls in 2024.

Swing path tilt is defined by Statcast as “the angular orientation of the ‘plane’ of the swing, as compared to the ground,” where “higher angle is a ‘steeper’ swing (further from horizontal) and a lower angle is a ‘flatter’ swing (closer to horizontal).” A swing’s intercept point is the “point at which their bat is nearest to the baseball on a swing (whether or not they make contact). This can be measured either relative to the front of home plate (closest to the pitcher) or to the batter’s center of mass.” Two clear patterns emerge from the plots. First, Ohtani’s swing path tilt increases as pitch height drops. Second, his intercept point in the “y” direction (from the plate toward the pitcher’s mound) gets deeper toward the plate as the pitch location moves further away and as velocity increases. Using these relationships on well-hit balls, we can model a batter’s expected tilt and intercept point as a function of pitch type, location, and velocity.[1] When the actual tilt of a swing is greater than what the model expects for that batter on that pitch, we can infer that they swung under the pitch, meaning the pitch was higher than they expected. When the intercept point is further toward the pitcher than expected, we can infer that they were ahead of the pitch, meaning the pitch was slower than they expected.
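The labeling logic above can be sketched in a few lines. This is a minimal sketch, not the author’s code: it swaps the mixed model from footnote [1] for a single-batter ordinary least squares line, and it assumes the intercept point is measured in inches in front of the batter (larger means further toward the pitcher). The function names are hypothetical.

```python
from dataclasses import dataclass


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x.

    A simplified, single-batter stand-in for the mixed model in footnote [1],
    e.g. expected tilt as a linear function of pitch height (plate_z).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


@dataclass
class SwingLabel:
    under: bool  # True: swung under the pitch (pitch was higher than expected)
    ahead: bool  # True: swung early (pitch was slower than expected)


def label_swing(actual_tilt: float, expected_tilt: float,
                actual_int_y: float, expected_int_y: float) -> SwingLabel:
    # A steeper-than-expected swing means the batter expected a lower pitch,
    # so they swung under it; a flatter one means they swung over it.
    under = actual_tilt > expected_tilt
    # An intercept further toward the pitcher than expected (assumed here to
    # be a larger int_y) means the bat arrived early: ahead of the pitch.
    ahead = actual_int_y > expected_int_y
    return SwingLabel(under=under, ahead=ahead)


# García example from the text: expected tilt 43.6 deg vs. actual ~30,
# expected intercept ~32 in. vs. actual ~36 in.
garcia = label_swing(actual_tilt=30.0, expected_tilt=43.6,
                     actual_int_y=36.0, expected_int_y=32.0)
print(garcia)  # SwingLabel(under=False, ahead=True): over and ahead
```

The flatter-than-expected tilt and deeper-than-expected intercept reproduce the “over and ahead” reading of the García swing.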
Modeling Pitch Detection and Tracking
Batters predict the trajectory of the pitch by interpreting both which pitch type is coming and how the pitch would continue to move if it were that pitch type. They do this by tracking the release and trajectory of the ball’s flight and subconsciously comparing that trajectory to those that they’ve seen from pitches throughout their career. If the pitch gets released straight toward the plate with little arc, it likely resembles a fastball. If it “pops” out of hand and follows a more curved arc, it is likely a curveball.[2] We can replicate this process with a classifier model, which takes information about the pitch’s trajectory and outputs probabilities for each of the possible pitch types. The specific inputs used for the classification model are shown in the figure below.
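Footnote [3] mentions a CatBoost multiclass model. As a dependency-free illustration of the same idea (trajectory features in, pitch-type probabilities out), here is a Gaussian naive Bayes sketch instead. The two features, their names, and every profile number are hypothetical.

```python
import math

# Hypothetical profiles for one pitcher: (mean, sd) of two early-flight
# features per pitch type. Real inputs would come from tracking data.
PROFILES = {
    "FF": {"release_angle": (-1.0, 0.6), "early_arc": (0.2, 0.10)},
    "SL": {"release_angle": (-0.2, 0.6), "early_arc": (0.5, 0.15)},
    "CU": {"release_angle": (1.5, 0.7), "early_arc": (1.0, 0.20)},
}


def normal_logpdf(x: float, mean: float, sd: float) -> float:
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def pitch_type_probs(features: dict[str, float],
                     profiles: dict = PROFILES) -> dict[str, float]:
    """Return P(pitch type | features), assuming features are independent
    Gaussians within each type and all types are equally likely a priori."""
    log_scores = {
        ptype: sum(normal_logpdf(features[name], *prof[name]) for name in features)
        for ptype, prof in profiles.items()
    }
    # Normalize in log space (log-sum-exp) so the probabilities sum to 1.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


probs = pitch_type_probs({"release_angle": -0.9, "early_arc": 0.25})
print(max(probs, key=probs.get))  # FF
```

Uniform priors keep the sketch in line with footnote [2], which leaves pitcher usage tendencies out of scope.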
By combining these probabilities with the expected movement of each pitch type given the pitcher’s arm angle, we can predict where the pitch would have crossed the plate had it been any one of those possible pitch types. Inside the model, this produces something like the plot below, where the X marks the actual location of the pitch and the cloud represents the distribution of possible locations. In this case the pitch had the highest probability of being a fastball (34%), so the cloud of expected locations is darkest where a fastball released similarly would have crossed the plate. The pitch was actually a slider, which the model gave only a six percent chance based on its release and initial trajectory.
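One way to represent that cloud is as a probability-weighted mixture, with one Gaussian per possible pitch type centered where that pitch type would have crossed the plate. This sketch assumes circular Gaussians and hypothetical locations, spreads, and weights; it is an illustration, not the author’s implementation.

```python
import math
from typing import NamedTuple


class Component(NamedTuple):
    weight: float  # classifier probability for this pitch type
    mean_x: float  # expected plate_x (ft) if the pitch were this type
    mean_z: float  # expected plate_z (ft) if the pitch were this type
    sd: float      # spread (ft), standing in for tracking uncertainty


def cloud_density(x: float, z: float, comps: list[Component]) -> float:
    """Density of the expected-location cloud at plate location (x, z)."""
    total = 0.0
    for c in comps:
        r2 = ((x - c.mean_x) ** 2 + (z - c.mean_z) ** 2) / c.sd ** 2
        total += c.weight * math.exp(-0.5 * r2) / (2 * math.pi * c.sd ** 2)
    return total


def cloud_mean(comps: list[Component]) -> tuple[float, float]:
    """Probability-weighted average expected location."""
    w = sum(c.weight for c in comps)
    return (sum(c.weight * c.mean_x for c in comps) / w,
            sum(c.weight * c.mean_z for c in comps) / w)


# Hypothetical three-type cloud; weights are classifier probabilities.
cloud = [
    Component(0.5, 0.2, 2.8, 0.3),  # fastball: crosses highest
    Component(0.3, 0.6, 2.0, 0.3),  # slider
    Component(0.2, 0.3, 1.6, 0.3),  # changeup
]
print(cloud_mean(cloud))
```

The cloud is darkest wherever the highest-probability pitch type would have crossed, which is why a low-probability slider can end up far from the densest region.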
Tuning the Expected Location Model: Swing Initiation
To recap: at this point we have an estimate, based on the bat tracking data, of whether the batter was ahead of or behind a pitch and whether they were under or over it. We also have a model that predicts the expected location and velocity of a pitch using the same information a batter would use to make those predictions. To better understand how the hitter’s prediction process works, we need to tune our model to match what we observe in the bat tracking data.
The first step is to take our expected location distribution and sum the probability that the pitch would have crossed the plate higher than it actually did, then do the same for our expected velocity distribution. These give us the probability that a swing would be over the pitch (its complement being the probability it would be under it) and the probability that it would be ahead of the pitch. Doing this for every pitch and measuring how well those probabilities match reality allows us to score the model’s performance.
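If the expected location (or velocity) cloud is treated as a Gaussian mixture, the “higher than it actually did” sum has a closed form via the normal CDF, and the Brier score (the loss named in footnote [3]) measures how well those probabilities match outcomes. The mixture form and all numbers below are assumptions for illustration.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def prob_expected_above(actual: float,
                        comps: list[tuple[float, float, float]]) -> float:
    """P(expected value > actual value) under a mixture of
    (weight, mean, sd) Gaussians, e.g. expected plate_z or velocity."""
    return sum(w * (1.0 - normal_cdf(actual, m, s)) for w, m, s in comps)


def brier(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Hypothetical pitch: expected-height cloud (ft) vs. where it actually crossed.
z_cloud = [(0.6, 2.6, 0.4), (0.4, 1.9, 0.4)]
p_over = prob_expected_above(actual=1.2, comps=z_cloud)
p_under = 1.0 - p_over  # expected lower than actual: the swing passes under
# Hypothetical velocity cloud (mph): expecting faster means swinging early.
p_ahead = prob_expected_above(actual=84.0, comps=[(0.6, 94.0, 2.0), (0.4, 85.0, 2.0)])
print(round(p_over, 3), round(p_ahead, 3))
```

Scoring the model is then `brier(predicted_under_probs, observed_under_labels)` and the same for the ahead probabilities, averaged across pitches.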
When we run our model on the raw values of the inputs described above, it does an incredible job of predicting pitch type but a horrible job of predicting these under/over and ahead/behind likelihoods. The reason is simple: the model sees perfect information, while the batter’s eyes do not. The model can tell the difference between a 32-degree and a 31.5-degree arm angle; the human eye from sixty feet away cannot. To mirror this uncertainty, we replace the raw inputs with bins, grouping each input into buckets rather than exact values. For example, instead of a precise arm angle we could use six arm angle bins ranging from over the top to submarine. We then optimize the number of bins for each input, along with the decision point (how long after release we measure the trajectory), to best predict whether a swing would be over/under or ahead of/behind a pitch.[3]
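Binning is what injects the batter’s imprecision into the model’s inputs. A minimal sketch with equal-width bins; the arm angle range (-30 to 90 degrees), the feature names, and the bin counts are hypothetical placeholders for the values the optimizer would tune.

```python
def bin_value(x: float, lo: float, hi: float, n_bins: int) -> int:
    """Map x to one of n_bins equal-width buckets on [lo, hi]; values outside
    the range are clamped into the first or last bucket."""
    if n_bins < 1 or hi <= lo:
        raise ValueError("need n_bins >= 1 and hi > lo")
    width = (hi - lo) / n_bins
    idx = int((x - lo) // width)
    return min(max(idx, 0), n_bins - 1)


def bin_features(raw: dict[str, float],
                 spec: dict[str, tuple[float, float, int]]) -> dict[str, int]:
    """Replace each raw input with its bin index. `spec` maps an input name to
    (lo, hi, n_bins); the n_bins values are what the optimizer searches over."""
    return {name: bin_value(raw[name], *spec[name]) for name in spec}


# Hypothetical spec: six coarse arm-angle bins, finer bins for early arc.
SPEC = {"arm_angle": (-30.0, 90.0, 6), "early_arc": (0.0, 1.5, 12)}

# 32 and 31.5 degrees land in the same bucket, as they would to the eye.
print(bin_value(32.0, -30.0, 90.0, 6), bin_value(31.5, -30.0, 90.0, 6))  # 3 3
```

Fewer bins means a blurrier input; an optimizer such as the Optuna setup in footnote [3] would search over each input’s `n_bins` alongside the decision point.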
After running that optimization process hundreds of times, we found that our predictions performed best with a very early decision point; highly uncertain estimates of the pitch’s velocity, release angle, and arm angle; and relatively more precise estimates of how far the pitch had traveled by the decision point and how arced its trajectory was during that time. Under the hood, much of that improvement comes from better predicting the likelihood a batter would be ahead of or behind a pitch, with more modest improvements in predicting over/under likelihood. This suggests that the decision of when and how to initiate a swing happens very early after release and, at that point, rests on highly uncertain estimates of the pitch’s type and trajectory, drawn mostly from its initial movement out of hand.
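The two inputs the tuned model wanted relatively precise, how far the pitch had traveled by the decision point and how arced its path had become, can be computed from constant-acceleration kinematics. This is a sketch: the “arc” definition (the angle the velocity vector has turned since release), the coordinate conventions, and the example numbers are assumptions, not the author’s feature definitions.

```python
import math

Vec = tuple[float, float, float]


def position(p0: Vec, v0: Vec, a: Vec, t: float) -> Vec:
    """Constant-acceleration position at time t (s) after release."""
    return tuple(p + v * t + 0.5 * acc * t * t for p, v, acc in zip(p0, v0, a))


def velocity(v0: Vec, a: Vec, t: float) -> Vec:
    return tuple(v + acc * t for v, acc in zip(v0, a))


def angle_deg(u: Vec, w: Vec) -> float:
    """Angle between two vectors in degrees."""
    dot = sum(x * y for x, y in zip(u, w))
    cos = dot / (math.hypot(*u) * math.hypot(*w))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def decision_point_features(p0: Vec, v0: Vec, a: Vec,
                            t_decision: float) -> dict[str, float]:
    p_t = position(p0, v0, a, t_decision)
    return {
        "distance_ft": math.dist(p0, p_t),
        # Hypothetical "arc" measure: how far the flight direction has turned.
        "arc_deg": angle_deg(v0, velocity(v0, a, t_decision)),
    }


# Hypothetical pitch in feet and ft/s, with y decreasing toward home plate.
feats = decision_point_features(p0=(-1.5, 54.0, 6.0), v0=(5.0, -130.0, -5.0),
                                a=(-8.0, 25.0, -25.0), t_decision=0.1)
print(feats)
```

Moving `t_decision` earlier shrinks both features toward what little a batter can see just out of hand.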
Returning to that slider to García makes this more tangible. For that pitch location, García’s swing typically has a tilt of around 43.6 degrees, and given the pitch’s velocity, we’d expect an intercept point roughly 32 inches in front of his body. His actual tilt, however, was around 30 degrees, and his actual intercept point was nearly 36 inches in front of him: a flatter swing and an earlier intercept than expected, confirming what we see in the video of him being ahead of and over the pitch. Applied to the pitch’s release and trajectory, our newly optimized pitch detection model gave a 96% probability that he would in fact swing over it and a 71% probability that he would be ahead of it.
Tuning the Expected Location Model: Swing Decision
At this point we have a better understanding of why swings look the way they do, but what about the decision to swing at all? For that we can return to our expected location model. Instead of predicting where a pitch would be relative to where it actually was, we can use our expected locations to predict whether a pitch would be in the zone or out of it. Assuming that batters’ swing probabilities are heavily influenced by a pitch’s called strike probability, we can rerun our optimization with the goal of maximizing the correlation between swing decisions and in-zone probability. This yields results that run surprisingly counter to those of our previous optimization.
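For the swing-decision version, the same kind of cloud can be summarized as an in-zone probability and then correlated with actual swings. This sketch assumes independent Gaussian errors in x and z, an approximate zone half-width of 0.83 ft (half the 17-inch plate plus a ball radius), hypothetical top and bottom edges, and Pearson (point-biserial) correlation; the article does not specify which correlation was used.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def in_zone_prob(comps: list[tuple[float, float, float, float]],
                 half_width: float = 0.83, bottom: float = 1.5,
                 top: float = 3.5) -> float:
    """P(expected location is in the zone) for a mixture of
    (weight, mean_x, mean_z, sd) Gaussians with independent x and z errors.
    Zone edges (ft) are hypothetical defaults; real ones vary by batter."""
    total = 0.0
    for w, mx, mz, sd in comps:
        px = normal_cdf(half_width, mx, sd) - normal_cdf(-half_width, mx, sd)
        pz = normal_cdf(top, mz, sd) - normal_cdf(bottom, mz, sd)
        total += w * px * pz
    return total


def pearson(xs: list[float], ys: list[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical pitches: in-zone probabilities vs. whether the batter swung.
zone_probs = [in_zone_prob([(1.0, 0.0, 2.5, 0.3)]),
              in_zone_prob([(1.0, 1.4, 1.0, 0.3)]),
              in_zone_prob([(0.6, 0.2, 2.2, 0.3), (0.4, 0.5, 1.2, 0.3)])]
swings = [1, 0, 1]
print(pearson(zone_probs, swings))  # the quantity the optimizer would maximize
```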
For predicting swing/take decisions, the model performs best with very precise trajectory information captured over a much longer period after release. By the time batters make their final decision, they likely have a better, three-dimensional estimate of where the pitch is headed, rather than the noisy two-dimensional estimate they used to initiate their swing. Looking again at our example pitch, the re-optimized model has homed in more closely on where the pitch ended up while still suggesting a 64% probability that it would be in the zone. With two strikes, this was clearly more than enough to force García’s hand(s) and get the K.
Discussion
These two sets of results support a multistage process for a batter’s swing. Within the first 100 milliseconds, a batter uses a highly uncertain and largely two-dimensional view of the trajectory to decide whether to initiate their swing. This initiation strongly influences the eventual tilt and especially the intercept point of the swing, and batters who are better able to observe this early trajectory information will more often be on time with the pitch and require fewer late adjustments to their path.
Once the swing is underway, the hitter enters an adjustment and decision-making stage. Here they refine their estimate of where the pitch is headed using a more three-dimensional view of its trajectory. While we lack the data (at least on the public side) to determine the type and magnitude of the adjustments made during this stage, our results suggest that the ultimate adjustment, whether to commit to the swing or to take, can happen extremely late in the pitch’s flight toward the plate. Though how well a batter tracks a pitch during the early stage likely correlates with how well they track it during the later stage, the different demands of each stage call for a more nuanced view of whether a batter with good swing decisions has good eyes or an aptitude for adjustments.
This framework also carries implications for pitching development and analysis. Since swing initiation depends on noisy early cues, pitchers can use tunneling differently depending on pitch type. A changeup in the zone may only need to tunnel with the pitcher’s fastball through the first stage, long enough to disrupt timing, since the batter is unlikely to take it. Conversely, a slider below the zone may need to tunnel longer: disrupting the batter’s timing won’t matter if the trajectory becomes obvious during the second stage and the batter lays off.
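A rough way to quantify “tunneled through the first stage” versus “tunneled longer” is to compare where two pitches appear, from the batter’s eye, at successive times after a shared release. This sketch uses constant-acceleration trajectories; the eye position, both trajectories, and the sample times are hypothetical, and the function names are illustrative only.

```python
import math

Vec = tuple[float, float, float]
EYE: Vec = (0.0, 0.0, 5.5)  # hypothetical batter eye position (ft)


def position(p0: Vec, v0: Vec, a: Vec, t: float) -> Vec:
    """Constant-acceleration position at time t (s) after release."""
    return tuple(p + v * t + 0.5 * acc * t * t for p, v, acc in zip(p0, v0, a))


def angular_separation_deg(p1: Vec, p2: Vec, eye: Vec = EYE) -> float:
    """Angle between two ball positions as seen from the batter's eye."""
    u = tuple(a - e for a, e in zip(p1, eye))
    w = tuple(b - e for b, e in zip(p2, eye))
    cos = sum(x * y for x, y in zip(u, w)) / (math.hypot(*u) * math.hypot(*w))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def separation_over_time(pitch_a, pitch_b, times):
    """pitch_a/pitch_b: (p0, v0, a) tuples. Returns (t, degrees apart) pairs."""
    return [(t, angular_separation_deg(position(*pitch_a, t), position(*pitch_b, t)))
            for t in times]


release = (-1.5, 54.0, 6.0)
fastball = (release, (4.0, -132.0, -6.0), (-10.0, 28.0, -14.0))
changeup = (release, (4.0, -120.0, -5.0), (-14.0, 24.0, -26.0))
for t, deg in separation_over_time(fastball, changeup, [0.05, 0.10, 0.20, 0.30]):
    print(f"t={t:.2f}s  separation={deg:.2f} deg")
```

Measuring the angle from the eye, rather than raw distance, reflects the point about viewing angles in the next paragraph.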
Finally, when studying tunneling or pitch detection we must always consider the point of view of the batter, making requisite adjustments for the angles at which they view the pitch, and the levels of uncertainty with which they track its trajectory.
Conclusion
There is much more work to be done in this area, from improvements in methodology to the incorporation of additional data to greater personalization of how these behaviors vary across individual players. New data open up new possibilities, and it’s exciting when we’re able to study baseball in ways that would have been difficult, if not impossible, in years past. This project focused on how batters process information, but more broadly it shows how we can leverage tracking data in creative ways to better understand pitchers and hitters alike. I look forward to seeing what comes next as we all continue to dive in.
[1] The model specifications are swing_path_int ~ 1 + (plate_z|batter/pitch_group) and int_y ~ 1 + (tf + plate_x_bat_flip|batter/pitch_group)
[2] Batters also use pre-pitch information, such as knowledge of the pitcher’s usage tendencies, as well as other cues like the pitcher’s arm speed or potentially the ball’s seam orientation, but these are considered out of scope for this work.
[3] For the nerds: I’m using a CatBoost multiclass model and Optuna for the optimization with the target of minimizing the average Brier loss in under probability and ahead probability.
This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.