Schrodinger's Bat: On Atmosphere, Probability, and Prediction

“All of us could take a lesson from the weather. It pays no attention to criticism.”
–Unknown

This summer we explored a variety of topics related to the wealth of new information available through PITCHf/x, ranging from profiling pitchers and evaluating umpires to the underlying physics and plate discipline. Today we’ll revisit the atmospheric effects we discussed in late May, only this time we’re armed with almost ten times as much data as we had back then. I’ll also have a few words on the differences between models and reality, and between probabilities and predictions.

Slow Ride

Back in May, when we had about 40,000 pitches from ten ballparks to analyze courtesy of the PITCHf/x system, we examined just how much a pitched ball slows down on its way to the plate. Now, with over 325,000 pitches from 28 ballparks, we can revisit that discussion with a much larger sample.

You’ll recall that in summarizing Robert K. Adair’s The Physics of Baseball we noted that the drag force on a moving baseball is proportional to the cross-sectional area of the ball, the square of its velocity, the density of the air, and the drag coefficient, a dimensionless number that varies with the velocity and “roughness” of the ball. Perhaps counter-intuitively, Adair shows that the drag coefficient for a baseball actually drops from around 0.40 at 75 mph to less than 0.30 at 100 mph. This is a consequence of both the speeds at which a ball travels in a major league game and the uneven surface created by the stitches, which produces turbulent flow. Even so, because drag grows with the square of the velocity, the net result is that the faster a ball is thrown, the more drag it experiences. That is, faster pitches should lose a greater percentage of their velocity on the way to the plate.
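
To make the proportionality concrete, here is a minimal sketch of the standard drag equation, F = 0.5 * rho * Cd * A * v^2. The ball diameter and air density are assumed textbook values, not figures from the article; the two drag coefficients are the approximate ones Adair gives. Even though Cd falls from 0.40 to 0.30, the squared velocity term wins out, so the 100 mph pitch experiences about a third more drag than the 75 mph pitch.

```python
import math

# Assumed constants: typical textbook values, not figures from the article.
BALL_DIAMETER_M = 0.0737   # regulation baseball, about 2.9 inches across
AIR_DENSITY = 1.2          # kg/m^3, roughly sea level at a mild temperature
MPH_TO_MS = 0.44704        # exact conversion from mph to m/s


def drag_force(speed_mph: float, drag_coefficient: float,
               air_density: float = AIR_DENSITY) -> float:
    """Drag force in newtons: F = 0.5 * rho * Cd * A * v**2."""
    area = math.pi * (BALL_DIAMETER_M / 2) ** 2  # cross-sectional area in m^2
    v = speed_mph * MPH_TO_MS
    return 0.5 * air_density * drag_coefficient * area * v ** 2


# Adair's approximate drag coefficients at the two speeds cited in the text
slow = drag_force(75, 0.40)
fast = drag_force(100, 0.30)
print(f"75 mph: {slow:.2f} N, 100 mph: {fast:.2f} N, ratio {fast / slow:.2f}")
```

Passing a smaller air_density, as for thinner air at altitude or on a hot day, scales the force down in direct proportion.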

In the previous article, I showed a graph plotting the percentage difference between starting and ending velocities for pitches thrown with little break at 86 to 100 miles per hour; in other words, mostly fastballs and changeups, as indicated by a low break length value (the maximum deviation from the straight-line path). That plot showed a steady and somewhat parabolic increase in percentage difference as velocity increased, from just over 10 percent at 86 miles per hour to just over 12 percent at 100. With the additional data and, just as importantly, the 18 additional ballparks, we can now re-run the plot over a wider range, from 77 to 100 miles per hour:

The plot now shows both the percentage difference between starting and ending velocity (in pink, using the y-axis on the left) and the average difference in miles per hour (in blue, using the axis on the right).

As in the previous graph, this one shows the same non-linear increase in percentage difference and, to a lesser extent, in actual difference as the speed of the pitch increases. This is again consistent with the models discussed by Adair, and tells us that each additional mile per hour a pitcher adds to his fastball brings diminishing returns, since the pitch loses a greater percentage of its velocity, and that percentage grows the harder a pitcher throws. For example, a fastball thrown at 90 miles per hour will cross the plate at 82.3 mph, while one thrown six miles per hour harder will cross at 86.7 mph, a gain of only 4.4 mph at the plate. Even with the diminishing returns, pitchers still benefit from greater initial velocity, since it also compresses the already small window of time a hitter has to react to the pitch.
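
The arithmetic behind that example can be checked in a few lines, using only the velocities quoted in the text. The harder pitch loses about 9.7 percent of its speed against about 8.6 percent for the slower one, which is why a 6 mph advantage at release shrinks to 4.4 mph at the plate.

```python
def pct_loss(start_mph: float, end_mph: float) -> float:
    """Percentage of the starting velocity lost by the time the pitch reaches the plate."""
    return 100 * (start_mph - end_mph) / start_mph


# Velocities quoted in the text
slow_start, slow_end = 90.0, 82.3
fast_start, fast_end = 96.0, 86.7

print(f"{pct_loss(slow_start, slow_end):.1f}% vs {pct_loss(fast_start, fast_end):.1f}%")
print(f"release gap {fast_start - slow_start:.1f} mph, plate gap {fast_end - slow_end:.1f} mph")
```

The first line prints 8.6% vs 9.7%, and the second shows the 6.0 mph gap at release becoming 4.4 mph at the plate.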

There is an interesting difference between this graph and the one shown in the previous article, however. You’ll notice that for the 86 to 100 mph range mentioned above, the percentages are lower here, running from 8.2 to 11.6 percent. In the previous analysis, I didn’t select a uniform distance from the plate at which to measure. Throughout the season MLBAM changed the distance at which it recorded the starting velocity: it was 55 feet initially, went down to 40 and 45 feet, and ultimately settled on 50 feet (a change made after the first article was published in late May). For that reason, the plot here uses only pitches recorded at the new standard of 50 feet. In addition, with the inclusion of 18 additional parks, it turns out that a couple of those parks, as discussed below, seem to slow pitches down less. In any case, when weighted by starting velocity, the average major league pitch thrown with little break decelerates by 8.8% between the point 50 feet from the plate and the front of the plate.
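
A minimal sketch of how curves like these could be built follows. It assumes pitch records carrying a start speed, end speed, the distance at which the start speed was recorded (y0), and a break length. The field names echo PITCHf/x but are treated as assumptions, as is the 6-inch cutoff for "little break", since the article doesn't state its threshold. The key step is discarding anything not recorded at 50 feet, so that every pitch is measured over the same distance.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pitch records; field names are modeled on PITCHf/x but assumed here.
pitches = [
    {"start_speed": 90.4, "end_speed": 82.6, "y0": 50.0, "break_length": 4.1},
    {"start_speed": 90.1, "end_speed": 82.2, "y0": 50.0, "break_length": 3.8},
    {"start_speed": 95.8, "end_speed": 86.9, "y0": 50.0, "break_length": 3.5},
    {"start_speed": 91.0, "end_speed": 83.9, "y0": 55.0, "break_length": 4.0},   # older 55 ft setting
    {"start_speed": 78.2, "end_speed": 71.0, "y0": 50.0, "break_length": 11.5},  # too much break
]

MAX_BREAK = 6.0  # hypothetical cutoff (inches) for "little break"


def velocity_curve(records, y0=50.0, max_break=MAX_BREAK):
    """Group pitches by whole mph of start speed; return {mph: (pct_diff, mph_diff)}."""
    bins = defaultdict(list)
    for p in records:
        if p["y0"] != y0 or p["break_length"] > max_break:
            continue  # keep only a uniform start distance and low-break pitches
        bins[int(p["start_speed"])].append(p)
    curve = {}
    for mph, group in sorted(bins.items()):
        pct = mean(100 * (p["start_speed"] - p["end_speed"]) / p["start_speed"] for p in group)
        diff = mean(p["start_speed"] - p["end_speed"] for p in group)
        curve[mph] = (pct, diff)
    return curve


for mph, (pct, diff) in velocity_curve(pitches).items():
    print(f"{mph} mph: {pct:.1f}% ({diff:.1f} mph)")
```

The two dictionary values correspond to the two series in the plot: the pink percentage curve and the blue curve of average miles per hour lost.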

Moving on, we can break down the numbers above to compare the deceleration of the ball in afternoon and evening games. In the graph below, afternoon games (those starting before 5 p.m.) are shown by the red line and evening games by the blue line.

As in the previous discussion of this topic, the deceleration is greater in the evening, as would be expected, since in the evening temperatures and humidity both decrease, which combine to increase the density of the air and therefore the drag on the ball. The difference is statistically significant at the 99 percent confidence level, so the PITCHf/x system clearly does pick up differences in air density. On average, the difference amounts to less than a quarter of a mile per hour by the time the ball reaches the plate; although real, it is likely not perceptible at the practical level, which fits with our experience (neither fans nor players, as far as I’m aware, can tell pitches thrown during the day from those thrown at night).
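
The article doesn't say which test produced the 99 percent figure. One reasonable choice, sketched here as an assumption, is Welch's two-sample t statistic. With tens of thousands of pitches per group the t distribution is practically indistinguishable from the normal, so the p-value below uses the normal tail via math.erfc. The sample values are invented for illustration.

```python
import math
from statistics import mean, variance


def welch_test(a, b):
    """Welch's t statistic for mean(a) - mean(b), plus a two-sided p-value from the
    normal approximation, which is adequate when both samples are large."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided tail area of a standard normal
    return t, p


# Hypothetical per-pitch percentage losses, repeated to mimic large samples
afternoon = [8.5, 8.7, 8.6, 8.9, 8.4, 8.6, 8.8, 8.5] * 50
evening = [8.8, 9.0, 8.9, 9.1, 8.7, 8.9, 9.0, 8.8] * 50

t, p = welch_test(evening, afternoon)
print(f"t = {t:.1f}, two-sided p = {p:.1e}")
```

For small samples, the p-value should instead come from the t distribution with the Welch-Satterthwaite degrees of freedom.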

Another way to look at the effect of air density is to slice the data by the temperature recorded at the start of the game. While this isn’t perfect (as in game three of the NLDS at Coors Field, where the game-time temperature was 73 degrees but had dropped into the mid-50s by the second inning), an aggregate calculation by temperature should give us an idea of how temperature affects the deceleration of the ball. The following graph records the average percentage difference between starting and ending velocity on pitches with little break, for games played at temperatures ranging from 31 to 92 degrees:

The graph shows quite a bit of variation because of the small sample sizes at some temperatures, but the overall trend, expressed through the dotted trend line (a linear regression with a correlation coefficient of r = 0.61, p < 0.01), clearly indicates a relationship between temperature and the amount a pitched ball slows down on its way to the plate. The effect is about half a percentage point for every 10 degrees, or roughly half a mile per hour.
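
A trend line like this one is typically an ordinary least-squares fit. The sketch below, which assumes per-temperature averages as input, returns the slope and intercept and converts a slope in percentage points per degree into miles per hour per 10 degrees. The sample data are hypothetical. At 90 mph, half a percentage point works out to 0.45 mph, consistent with the "roughly half a mile per hour" figure.

```python
def least_squares(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def ten_degree_effect_mph(slope_pct_per_degree, pitch_mph=90.0):
    """Convert a slope in percentage points per degree into mph per 10 degrees."""
    return slope_pct_per_degree * 10 / 100 * pitch_mph


# Hypothetical per-temperature averages lying exactly on a line
temps = [40, 50, 60, 70, 80, 90]
pct_diffs = [11.0 - 0.05 * t for t in temps]

slope, intercept = least_squares(temps, pct_diffs)
print(f"slope {slope:.3f} points/degree, {ten_degree_effect_mph(slope):.2f} mph per 10 degrees")
```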

Having shown that pitches do indeed decelerate as expected, and do so differently by time of day and temperature, the next step is to examine the differences in ballparks. However, since there are 28 parks, a graphical representation doesn’t work particularly well; the following table lists the park, the number of pitches analyzed (using the same definition of starting location and break length used in the graphs above), the average game time temperature, and the average percentage difference in starting and ending velocities for pitches between 81 and 97 miles per hour*.


Ballpark                       Pitches Temp(F) PctDiff
------------------------------------------------------
Safeco Field                      6721      68   11.2%
Rogers Centre                     3411      70   11.0%
Petco Park                        6868      71   10.9%
U.S. Cellular Field               7896      76   10.4%
Busch Stadium                     6182      84    9.8%
Dodger Stadium                    7046      76    9.6%
Minute Maid Park                  4796      73    9.5%
Fenway Park                       4170      74    9.4%
Great American Ball Park          4917      84    9.2%
PNC Park                           933      73    9.2%
------------------------------------------------------
Chase Field                       5003      81    9.0%
McAfee Coliseum                   6035      67    8.8%
Shea Stadium                      2361      74    8.8%
Miller Park                       4961      75    8.8%
Hubert H. Humphrey Metrodome      5422      69    8.3%
Angel Stadium of Anaheim          5908      80    8.3%
Jacobs Field                      2235      73    8.3%
Tropicana Field                   2177      72    8.1%
Kauffman Stadium                  4477      81    8.0%
Yankee Stadium                    2679      74    8.0%
------------------------------------------------------
Coors Field                       4060      78    7.8%
Turner Field                      5844      84    7.4%
Rangers Ballpark in Arlington     7019      88    7.4%
Wrigley Field                     6065      75    7.3%
AT&T Park                         5919      65    7.2%
Citizens Bank Park                4080      79    7.1%
Dolphin Stadium                   1536      86    5.5%
Comerica Park                     5374      74    5.2%

This indicates that Safeco Field slows the ball the most, by over 11 percent, while Comerica Park does so by just over five percent. There doesn’t seem to be much correlation here with elevation or with whether a park plays as a hitter’s or pitcher’s park, although Citizens Bank Park, Rangers Ballpark in Arlington, and Coors Field are all near the bottom. That said, there is a small negative correlation with temperature (r = -0.32, p < 0.10). The values for Comerica and Dolphin Stadium are troubling because they are so much lower than the rest; it seems as if something is going on at those parks, perhaps in the way the system is calibrated, to yield such low values. As a result, I wouldn’t necessarily take these numbers at face value.
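
As a check, the correlation quoted here can be recomputed directly from the Temp and PctDiff columns of the table. This sketch uses an unweighted Pearson correlation across the 28 parks, which is an assumption about how the figure was produced, and it comes out at roughly -0.32, matching the text.

```python
# Game-time temperature (F) and percentage slowdown for each park, in table order
TEMPS = [68, 70, 71, 76, 84, 76, 73, 74, 84, 73,
         81, 67, 74, 75, 69, 80, 73, 72, 81, 74,
         78, 84, 88, 75, 65, 79, 86, 74]
PCT_DIFF = [11.2, 11.0, 10.9, 10.4, 9.8, 9.6, 9.5, 9.4, 9.2, 9.2,
            9.0, 8.8, 8.8, 8.8, 8.3, 8.3, 8.3, 8.1, 8.0, 8.0,
            7.8, 7.4, 7.4, 7.3, 7.2, 7.1, 5.5, 5.2]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


r = pearson_r(TEMPS, PCT_DIFF)
print(f"r = {r:.2f} across {len(TEMPS)} parks")
```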

Finally on this topic, in a previous column we also took a stab at determining how the break of the ball differs from park to park using the pFX value recorded by the PITCHf/x system. This value, reported in inches, combines the vertical and horizontal movement of the pitch relative to the straight line drawn between its starting and ending locations. It is defined as the hypotenuse of the right triangle formed by the horizontal and vertical components, with the effects of gravity removed from the vertical component. As a result, these components reflect the movement of the pitch due to the Magnus force generated by the spinning baseball. While perhaps not the most intuitive measure (it produces large positive vertical movement values for fastballs, which drop less than a pitch thrown without spin would), it should give us a pretty good way to assess whether atmospheric effects influence the flight of the baseball.
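
The hypotenuse calculation is simple enough to sketch directly. The component names pfx_x and pfx_z follow PITCHf/x naming but are treated here as assumptions. The sketch takes gravity to have already been removed from the vertical component, as the text describes, and the sample pitch is hypothetical.

```python
import math


def pfx_magnitude(pfx_x: float, pfx_z: float) -> float:
    """Total spin-induced movement in inches: the hypotenuse of the horizontal (pfx_x)
    and vertical (pfx_z) components, with gravity already removed from pfx_z."""
    return math.hypot(pfx_x, pfx_z)


# Hypothetical fastball: 6 inches of horizontal movement and 9 inches of "rise"
# relative to a pitch thrown without spin.
print(round(pfx_magnitude(-6.0, 9.0), 1))  # 10.8
```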

In order to look at this question, I created two data sets: one for fastballs and one for breaking balls (most curveballs and some sliders). Then I filtered the data so that only pitchers who threw 25 or more fastballs (or breaking balls) at a particular park and 25 or more at all other parks combined were included. Finally, I computed the ratio of each pitcher’s average pFX at the specific park to his average at all other parks, and then derived a weighted average of those ratios across all pitchers. The result is a value relative to 1.00, which can be thought of as a park effect for pitches**. For good measure, I also computed a weighted average of the difference in pFX over all the comparisons to get a feel for the magnitude of the difference in movement. For breaking balls, this procedure yielded 204 individual pitcher ratios from a total of over 29,000 pitches; for fastballs it was 886 ratios and approximately 81,000 pitches.
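
The procedure can be sketched as follows, assuming a flat list of (pitcher, park, pFX) records for a single pitch type. Weighting each pitcher’s ratio by his total pitches at the park and elsewhere follows the second note below. The function name, data layout, and sample records are hypothetical.

```python
from collections import defaultdict

MIN_PITCHES = 25  # threshold from the text, applied at the park and elsewhere


def park_pitch_factor(pitches, park):
    """Pitch-weighted mean of each pitcher's (avg pFX at park) / (avg pFX elsewhere),
    plus the matching weighted mean of the pFX differences in inches.

    `pitches` is an iterable of (pitcher, park, pfx) tuples for one pitch type.
    Returns None if no pitcher meets the threshold in both samples."""
    here = defaultdict(list)
    away = defaultdict(list)
    for pitcher, where, pfx in pitches:
        (here if where == park else away)[pitcher].append(pfx)

    total_weight = ratio_sum = diff_sum = 0.0
    for pitcher, at_park in here.items():
        elsewhere = away.get(pitcher, [])
        if len(at_park) < MIN_PITCHES or len(elsewhere) < MIN_PITCHES:
            continue  # not enough pitches in one of the two samples
        avg_here = sum(at_park) / len(at_park)
        avg_away = sum(elsewhere) / len(elsewhere)
        weight = len(at_park) + len(elsewhere)  # total pitches thrown
        ratio_sum += weight * avg_here / avg_away
        diff_sum += weight * (avg_here - avg_away)
        total_weight += weight
    if total_weight == 0:
        return None
    return ratio_sum / total_weight, diff_sum / total_weight


# Hypothetical data: pitcher B has too few pitches at the park and is dropped.
sample = ([("A", "Coors", 8.0)] * 30 + [("A", "Elsewhere", 10.0)] * 30
          + [("B", "Coors", 9.0)] * 10 + [("B", "Elsewhere", 9.0)] * 100)
print(park_pitch_factor(sample, "Coors"))
```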

The results of all this can be summarized in the table below, first sorted by the fastball ratio:


                         Fastballs             pFX Breaking Balls      pFX
Ballpark                   Pitches   Ratio    Diff Pitches   Ratio    Diff
--------------------------------------------------------------------------
Coors Field                   6271    0.79   -2.73     899    0.86   -1.34
PNC Park                      1360    0.90   -1.14       0     n/a     n/a
Comerica Park                12727    0.92   -1.01    1024    1.23    1.41
AT&T Park                    10845    0.92   -1.01     918    1.16    0.92
Minute Maid Park              8446    0.92   -0.96    1396    1.22    1.91
Fenway Park                   9119    0.93   -0.90    1184    1.01    0.03
Turner Field                  9092    0.93   -0.86     321    1.09    0.59
Jacobs Field                  6826    0.95   -0.63     653    1.00    0.02
Yankee Stadium                4687    0.96   -0.43     356    1.05    0.51
McAfee Coliseum              14761    0.97   -0.40    2961    1.02    0.13
--------------------------------------------------------------------------
Wrigley Field                10711    0.97   -0.35    1713    0.90   -0.98
Angel Stadium of Anaheim     11975    0.98   -0.25    2089    1.04    0.33
Rangers Ballpark             16197    0.98   -0.22     956    0.87   -1.21
Miller Park                   9521    0.99   -0.18    1151    0.98   -0.30
Great American Ball Park      8801    0.99   -0.15     771    1.09    0.64
Dolphin Stadium               2867    1.00   -0.05     146    0.94   -0.25
Busch Stadium                13830    1.00   -0.03    1913    0.90   -0.88
Citizens Bank Park            6463    1.01    0.08     749    1.22    1.49
Kauffman Stadium             10972    1.02    0.20     636    0.92   -0.75
Hubert H. Humphrey           11627    1.02    0.20    1312    1.09    0.58
--------------------------------------------------------------------------
Rogers Centre                 7606    1.05    0.50    1439    1.10    0.68
U.S. Cellular Field          14176    1.07    0.80    1207    0.83   -1.45
Tropicana Field               6502    1.07    0.83     994    1.23    1.55
Dodger Stadium               10669    1.08    0.92    1446    0.86   -1.35
Safeco Field                 16878    1.08    0.98     919    0.89   -0.97
Shea Stadium                  4399    1.11    1.40     693    1.04    0.28
Chase Field                  11574    1.13    1.47    1401    0.94   -0.37
Petco Park                   12298    1.13    1.57    1192    0.92   -0.66

At the low end, Coors Field seems to have the biggest effect, with fastballs getting only 79 percent as much movement (2.73 inches less) as fastballs thrown by the same pitchers at other parks. Keep in mind that this includes both vertical and horizontal components. Broken down, fastballs at Coors drop more, by roughly two inches (presumably because the backspin on the fastball doesn’t counteract gravity as well in less dense air), and also tail less, by about 2.3 inches. At the other end of the spectrum, Petco Park seems to enhance fastball movement by 13 percent, keeping the ball up (by one and a half inches) and allowing it to tail roughly a half inch more. Overall, this list seems to accord pretty well with our expectations, with Coors as the outlier and places with denser air, like Petco and Dodger Stadium, at the other end.

One caution here: in comparing the average values for pitchers who throw 25 or more pitches both at a given park and at other parks, you necessarily run into some bias in their road parks, since they’ll likely throw more pitches within their own division. For example, Rockies pitchers also pitch frequently at Petco Park, and Padres pitchers at Coors Field, which could magnify (or perhaps, in this case, cancel out) the results for those two parks.

And now we can sort the same list by ratio for breaking balls (PNC Park did not have enough data):


                         Fastballs             pFX Breaking Balls      pFX
Ballpark                   Pitches   Ratio    Diff Pitches   Ratio    Diff
--------------------------------------------------------------------------
U.S. Cellular Field          14176    1.07    0.80    1207    0.83   -1.45
Dodger Stadium               10669    1.08    0.92    1446    0.86   -1.35
Coors Field                   6271    0.79   -2.73     899    0.86   -1.34
Rangers Ballpark             16197    0.98   -0.22     956    0.87   -1.21
Safeco Field                 16878    1.08    0.98     919    0.89   -0.97
Busch Stadium                13830    1.00   -0.03    1913    0.90   -0.88
Wrigley Field                10711    0.97   -0.35    1713    0.90   -0.98
Petco Park                   12298    1.13    1.57    1192    0.92   -0.66
Kauffman Stadium             10972    1.02    0.20     636    0.92   -0.75
Dolphin Stadium               2867    1.00   -0.05     146    0.94   -0.25
--------------------------------------------------------------------------
Chase Field                  11574    1.13    1.47    1401    0.94   -0.37
Miller Park                   9521    0.99   -0.18    1151    0.98   -0.30
Jacobs Field                  6826    0.95   -0.63     653    1.00    0.02
Fenway Park                   9119    0.93   -0.90    1184    1.01    0.03
McAfee Coliseum              14761    0.97   -0.40    2961    1.02    0.13
Shea Stadium                  4399    1.11    1.40     693    1.04    0.28
Angel Stadium of Anaheim     11975    0.98   -0.25    2089    1.04    0.33
Yankee Stadium                4687    0.96   -0.43     356    1.05    0.51
Hubert H. Humphrey           11627    1.02    0.20    1312    1.09    0.58
Turner Field                  9092    0.93   -0.86     321    1.09    0.59
--------------------------------------------------------------------------
Great American Ball Park      8801    0.99   -0.15     771    1.09    0.64
Rogers Centre                 7606    1.05    0.50    1439    1.10    0.68
AT&T Park                    10845    0.92   -1.01     918    1.16    0.92
Citizens Bank Park            6463    1.01    0.08     749    1.22    1.49
Minute Maid Park              8446    0.92   -0.96    1396    1.22    1.91
Tropicana Field               6502    1.07    0.83     994    1.23    1.55
Comerica Park                12727    0.92   -1.01    1024    1.23    1.41
PNC Park                      1360    0.90   -1.14       0     n/a     n/a

Here the results are somewhat mixed, although Coors Field still ranks third with a ratio of 0.86, equating to 1.34 inches less movement (1.6 inches less horizontally, with essentially no change vertically). This time, however, Petco Park has a ratio below 1.00, indicating that, despite the assumption of denser air, breaking balls actually break slightly less there than elsewhere. It’s difficult to say exactly why this would be the case (barring a systemic, data, or calculation problem), but it’s possible that because a breaking ball isn’t thrown with pure overspin, but rather a combination of overspin and sidespin, the greater Magnus force in denser air causes the ball to break more horizontally while also keeping it elevated, as we’ve seen with the fastball. In fact, for breaking balls at Petco the average horizontal movement is about a quarter of an inch more, but the vertical movement is just over an inch less. In any case, comparing pFX for breaking balls seems to tell us less than doing so for fastballs, perhaps because of smaller sample sizes, greater variability, the complexity of the movement, or some combination of the three.

Here you can also see that Comerica Park (tied with Tropicana Field at 1.23) has the highest breaking-ball ratio, and on closer inspection the difference is almost entirely in the vertical component (over two inches). Given the possibility of a systemic problem with the data at Comerica, suggested by the deceleration analysis above, I would be hesitant to take that measurement at face value.

Notes:

* This value was calculated by first averaging the percentage difference for each mile per hour between 81 and 97 mph (since there was data for all 28 parks in that range) and then averaging those averages. This procedure ensures that a pitching staff that throws harder on average isn’t biasing the results by including more pitches in the upper velocity ranges.
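
The two-step average in this note can be sketched as follows. The (start speed, percentage loss) input format and the inclusive 81 to 97 mph bin edges are assumptions. The usage lines show why the two steps matter: a hypothetical staff that throws mostly 95 mph pitches pulls a simple pooled average toward the higher loss at that speed, while the binned version counts each mile per hour once.

```python
from collections import defaultdict


def binned_average(pitches, lo=81, hi=97):
    """Average the per-mph mean percentage loss over whole-mph bins lo..hi (inclusive),
    so each velocity bin counts once regardless of how many pitches it holds.

    `pitches` is an iterable of (start_mph, pct_loss) pairs."""
    bins = defaultdict(list)
    for start, pct in pitches:
        mph = int(start)
        if lo <= mph <= hi:
            bins[mph].append(pct)
    if not bins:
        raise ValueError("no pitches in the requested velocity range")
    bin_means = [sum(v) / len(v) for v in bins.values()]
    return sum(bin_means) / len(bin_means)


# Hypothetical staff: 10 pitches at 85 mph losing 8%, 90 pitches at 95 mph losing 10%
data = [(85.2, 8.0)] * 10 + [(95.4, 10.0)] * 90
pooled = sum(pct for _, pct in data) / len(data)
print(f"pooled {pooled:.1f}%, binned {binned_average(data):.1f}%")
```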

** For example, for curveballs thrown at Coors Field, the calculation amounts to building the following table of pitchers who threw 25 or more pitches both at Coors Field and at other parks:


                       Coors Field     Other Parks
Name              T    Pitches     pFX Pitches     pFX   Ratio
--------------------------------------------------------------
Matt Morris       R         26    7.74     198   11.18    0.69
Matt Herges       R         38   10.90      29   10.12    1.08
Jeremy Affeldt    L         28    4.86      48    7.02    0.69
Taylor Buchholz   R         56    8.29      61    8.21    1.01
Jeff Francis      L         49    5.54     101    7.75    0.72
Ubaldo Jimenez    R        103    9.92      61    8.05    1.23
Franklin Morales  L         27    6.32      74    9.34    0.68
Total                      327             572
                                     Weighted Averages    0.86

As you can see, the ratio of pFX at Coors to pFX at other parks is calculated for each pitcher. These ratios are then averaged, with each weighted by the pitcher’s total pitches (at Coors plus other parks), to produce the value of 0.86.
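
The arithmetic in this note can be reproduced from the table. The sketch below recomputes each pitcher’s ratio from the listed pFX values and weights it by his total pitches at Coors and elsewhere. It also computes the matching weighted pFX difference; both the 899-pitch total and the resulting difference of about -1.34 inches line up with the Coors Field breaking-ball row in the tables above.

```python
# (pitches at Coors, pFX at Coors, pitches elsewhere, pFX elsewhere), from the table
curveballs = {
    "Matt Morris":      (26, 7.74, 198, 11.18),
    "Matt Herges":      (38, 10.90, 29, 10.12),
    "Jeremy Affeldt":   (28, 4.86, 48, 7.02),
    "Taylor Buchholz":  (56, 8.29, 61, 8.21),
    "Jeff Francis":     (49, 5.54, 101, 7.75),
    "Ubaldo Jimenez":   (103, 9.92, 61, 8.05),
    "Franklin Morales": (27, 6.32, 74, 9.34),
}

total = ratio_sum = diff_sum = 0.0
for n_coors, pfx_coors, n_other, pfx_other in curveballs.values():
    weight = n_coors + n_other  # total pitches thrown by this pitcher
    ratio_sum += weight * pfx_coors / pfx_other
    diff_sum += weight * (pfx_coors - pfx_other)
    total += weight

print(f"pitches {total:.0f}, ratio {ratio_sum / total:.2f}, diff {diff_sum / total:.2f}")
```

This prints pitches 899, ratio 0.86, diff -1.34.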

—

Probability Is Not Prediction

Before closing this week, I wanted to pen a few words about the nature of probability, prediction, and modeling related to sabermetrics. While this may be the quintessential example of preaching to the choir, at least I’ll feel better after venting a little.

This topic first caught my attention because of recent articles in the mainstream media on the Diamondbacks’ run differential and how, in 2007, it ended up being a very poor predictor of their season record. Having scored 712 runs and allowed 732, the Diamondbacks would have been expected to win approximately 79 games. However, at season’s end they had won 11 more games than that (10.6, to be precise) along with a division title, a feat that places them ninth on the list of teams since 1901 in bettering their Pythagenpat estimate in terms of games won (and 12th in terms of winning percentage over what would be expected), as shown in the table below:


               Actual                            Pythagenpat
Year  Team     W    L   WPct   RS    RA      WPct     W       +W
------------------------------------------------------------------
1905  DET     79   74   .516   512   602     .430    66.2    12.8
2004  NYA    101   61   .623   897   808     .550    89.1    11.9
1984  NYN     90   72   .556   652   676     .484    78.4    11.6
1970  CIN    102   60   .630   775   681     .558    90.4    11.6
2005  ARI     77   85   .475   696   856     .404    65.5    11.5
1954  BRO     92   62   .597   778   740     .523    80.6    11.4
1972  NYN     83   73   .532   528   578     .461    71.9    11.1
1924  BRO     92   62   .597   717   675     .528    81.3    10.7
2007  ARI     90   72   .556   712   732     .490    79.4    10.6
1955  KC1     63   91   .409   638   911     .339    52.6    10.4
1961  CIN     93   61   .604   710   653     .538    82.8    10.2
1932  PIT     86   68   .558   701   711     .493    76.0    10.0
1997  SFN     90   72   .556   784   793     .495    80.1     9.9
2004  CIN     76   86   .469   750   907     .409    66.3     9.7
1977  BAL     97   64   .602   719   653     .543    87.5     9.5
1943  BSN     68   85   .444   465   612     .383    58.5     9.5
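
For readers who want to reproduce the expected-wins column, here is a sketch of the Pythagenpat calculation. The exponent power of 0.287 is an assumption: versions in circulation use values of roughly 0.28 to 0.29, and the article doesn’t say which one it used, so results can differ from the table in the third decimal place. For the 2007 Diamondbacks it gives an expected total of about 79 wins, against their actual 90.

```python
def pythagenpat(runs_scored, runs_allowed, games, power=0.287):
    """Expected winning percentage. The Pythagorean exponent is itself
    (total runs per game) ** power; 0.287 is an assumed, commonly cited value."""
    exponent = ((runs_scored + runs_allowed) / games) ** power
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)


wpct = pythagenpat(712, 732, 162)  # 2007 Diamondbacks
print(f"expected {wpct:.3f} ({wpct * 162:.1f} wins); actual 90 wins")
```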

While this is great for fans of the Snakes, the unfortunate side effect is stories that seem to throw the baby out with the bathwater, featuring quotes like this one from D’backs outfielder Eric Byrnes:

I laugh. I just laugh. Because it doesn’t really apply to what this team is. It doesn’t apply to winning baseball. I mean, I don’t blame the number-crunchers, the computer geeks, for not being able to come up with a formula for how we got here. But there’s a lot more that goes into sports than numbers.

Well. Whatever else may be said, the reality is certainly not that “computer geeks” and “number-crunchers” (not to worry, no offense taken) are frustrated by their supposed inability to produce a perfect formula, as the quote implies. At its core, any quantitative formula or methodology is an attempt to model things that happen in the physical world. Being a model, and therefore necessarily limited, it cannot (as researchers in any field understand) take into account every factor that may influence the outcome. That a formula like Pythagenpat has had such overwhelming success (95 percent of teams since 1901 come within eight games of their estimated wins) is a testament to the fact that its model of wins and losses, built on the underlying relationship of runs scored, runs allowed, and run environment, accounts for a significant percentage of the final outcome.

But rather than rending garments and putting on sackcloth and ashes, cases like the 2007 Diamondbacks and those in the table above should, and actually do, serve as an impetus for improving, extending, and testing the limits of the model. That is just what good analysts and thoughtful reporters are doing as they discuss the roster construction and strategic decisions (for example, the deployment of the bullpen) made by Arizona and its staff. So Mr. Byrnes can rest easy: analysts are not overwhelmed or dismayed, but rather embrace his team’s ability to put wins on the board.

Interestingly, in the same article as the Eric Byrnes quote above, general manager Josh Byrnes had this to say on his team’s success:

In spring training we actually put forth a lot of information, just internally, that this roster composition can win. And internally, we believed in it the whole way. And as the season went along, it proved that as the games got more important, a lot of these players just kept getting better.

His statement is a bit cryptic, but it could certainly be the case that Byrnes and his staff did some analysis under which “beating Pythagoras” was not all that unexpected. If so, that’s great news, since ideas have a way of spreading, and those additional factors will become part of the positive feedback loop that drives these kinds of efforts.

A subtler but related point is that some people seem to think the models used to discuss events are necessarily predictions, and so take a “told you so” approach when the outcome seems improbable according to the model. But probabilities are not predictions: beyond the fact that the models used to generate the probabilities are incomplete, even unlikely events do in fact happen. Only if you could replay the event hundreds or thousands of times could you say with confidence that the model is not useful.
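
To put rough numbers on that point, here is a back-of-the-envelope sketch. Purely for illustration, it assumes that misses against Pythagenpat are normally distributed with mean zero, and it backs out a standard deviation of about four wins from the "95 percent within eight games" figure quoted earlier. Under that assumption, beating the estimate by 10.6 games is roughly a one-in-two-hundred event. That is rare in any single season, but something you would expect to see several times across a century of team-seasons, which fits the handful of teams in the table above.

```python
import math

# Assumption: Pythagenpat misses are roughly normal with mean zero. If 95 percent of
# teams land within eight games, the standard deviation is about 8 / 1.96 wins.
SD_WINS = 8 / 1.96


def prob_beating_by(extra_wins, sd=SD_WINS):
    """Chance that a team beats its Pythagenpat estimate by at least `extra_wins`."""
    return 0.5 * math.erfc(extra_wins / (sd * math.sqrt(2)))


p = prob_beating_by(10.6)  # the 2007 Diamondbacks' margin
print(f"probability {p:.4f}, about 1 team-season in {1 / p:.0f}")
```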

So now go relax and enjoy the postseason and all the variability and randomness that it entails.

Thank you for reading

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.


Dan Fox
