May 2, 2007
The Big Picture
What is the optimum level of scoring for Major League Baseball? The answer to the question tells us when we need to tinker with the game. If a long-term trend shows runs are too far from the optimum, steps can be taken to restore balance. Rule changes, equipment redesign, and/or park modifications could all be used to bring runs in line with a long-term norm.
So how to find this optimum? Once again, the attendance data enters the picture. Our desire is to find a function that transforms runs per game for a season into attendance per game. Where that function reaches a maximum tells us the optimum scoring level. Looking at raw attendance per game played, as in previous essays, won't work here. We need a normalization function to take into account the underlying growth of the industry over time. The first one I'll try simply takes the percentage change from the previous season. The following scatter plot graphs runs per game versus the percentage change in league attendance per game:
I should note that there are two outliers removed from this data. In 1919 and 1946, the end of the World Wars spiked attendance; those two numbers were so far out from all the others I decided to remove them from this chart.
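The percentage-change normalization and the removal of the two postwar seasons can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the attendance figures in the usage example are placeholders, not real league data.

```python
# Seasons dropped from the chart because the end of each World War spiked attendance.
OUTLIER_YEARS = {1919, 1946}

def pct_change_by_year(attendance_by_year):
    """Map each year to the percentage change in attendance per game
    from the previous season. Years with no prior season are skipped."""
    changes = {}
    for year, attendance in attendance_by_year.items():
        previous = attendance_by_year.get(year - 1)
        if previous:  # skip missing (or zero) prior-season values
            changes[year] = 100.0 * (attendance - previous) / previous
    return changes

def drop_outliers(changes, excluded=OUTLIER_YEARS):
    """Return a copy of the changes without the excluded seasons."""
    return {year: c for year, c in changes.items() if year not in excluded}

# Placeholder numbers for illustration only; NOT actual attendance data.
sample = {1917: 1000, 1918: 800, 1919: 1600, 1920: 2000}
cleaned = drop_outliers(pct_change_by_year(sample))
```

Each remaining year's percentage change would then be paired with that season's runs per game to form one point on the scatter plot.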
As you can see, the data looks fairly random. However, around 8.7 runs per game, there are very few negative years, and no big negatives. So our eyeballs suggest that one value, but let's see what the math tells us. On the scatter plot are two trend lines, one a second-order polynomial, the other a fourth-order polynomial. Both of these polynomials are good for fitting data that you expect to exhibit a 'hump.' As you can see, both trend lines display a maximum around 9.0 runs. Bear with me, but using calculus, we can calculate these maximums exactly: each one falls where the first derivative of the fitted equation equals zero. For the quadratic, the optimum scoring level is 9.2 runs per game. The fourth-order equation gives us an optimal run scoring level of... 8.8. Score one for the eyeball.
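For readers who want to see the calculus step spelled out, here is a minimal Python sketch of locating a polynomial's maximum from its first derivative. The fitted coefficients are not given in this column, so the example polynomial is hypothetical and chosen only because its peak is easy to verify. The function names and the grid-plus-bisection search are also my assumptions, not the method used to produce the figures above.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients in descending-power order) via Horner's rule."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

def poly_derivative(coeffs):
    """Return the coefficients of the first derivative, descending-power order."""
    degree = len(coeffs) - 1
    return [c * (degree - i) for i, c in enumerate(coeffs[:-1])]

def local_maxima(coeffs, lo, hi, steps=2000, tol=1e-9):
    """Find points in [lo, hi] where the derivative crosses from positive
    to non-positive, i.e. where the curve stops rising and starts falling."""
    deriv = poly_derivative(coeffs)
    grid = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    maxima = []
    for a, b in zip(grid, grid[1:]):
        if poly_eval(deriv, a) > 0 and poly_eval(deriv, b) <= 0:
            # Bisect to narrow down the sign change of the derivative.
            while b - a > tol:
                mid = (a + b) / 2
                if poly_eval(deriv, mid) > 0:
                    a = mid
                else:
                    b = mid
            maxima.append((a + b) / 2)
    return maxima

# Hypothetical quadratic -(x - 9)^2, NOT one of the fitted trend lines.
example_quadratic = [-1.0, 18.0, -81.0]
peaks = local_maxima(example_quadratic, lo=6.0, hi=12.0)
```

For a quadratic a*x^2 + b*x + c the same answer comes straight from -b/(2a). The search above matters for the fourth-order case, where the derivative is a cubic and can have more than one root inside the range of the data.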
Please notice the low R-squared values for these equations, meaning they have very little predictive value. A second way to normalize the data puts attendance per game in constant terms relative to 1901. This method is analogous to measuring prices of goods in terms of constant dollars, adjusting for inflation. I used an exponential model of growth, with a rate of .0216. The formula:
Expected Attendance = 3246*e^(.0216*t)
... where 3246 is the average attendance per game in 1901, and t is the number of years since 1901. This number for a season became the analog of the consumer price index (CPI); let's call it the Baseball Attendance Term (BAT). So to put average attendance per game in terms of 1901 attendance:
Constant Attendance = Attendance(year)*BAT(1901)/BAT(year)
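The two formulas translate directly into code. The sketch below uses the constants stated above (3246 and .0216); the function names are mine, and the attendance figure in the usage line is a made-up placeholder rather than a real season's number.

```python
import math

BASE_YEAR = 1901
BASE_ATTENDANCE = 3246   # average attendance per game in 1901
GROWTH_RATE = 0.0216     # exponential growth rate from the model

def bat(year):
    """Baseball Attendance Term: expected attendance per game for a season."""
    t = year - BASE_YEAR  # years since 1901
    return BASE_ATTENDANCE * math.exp(GROWTH_RATE * t)

def constant_attendance(attendance, year):
    """Express a season's attendance per game in 1901 terms."""
    return attendance * bat(BASE_YEAR) / bat(year)

# Placeholder input for illustration only; NOT an actual attendance figure.
adjusted = constant_attendance(20000, 1990)
```

Because BAT(1901) is just 3246, the adjustment amounts to multiplying a season's attendance by e^(-.0216*t). A season that exactly matched the growth model would come out at 3246 in constant terms.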
So now we can look at attendance in terms of a fixed number. The next graph represents a scatter plot of runs per game versus constant attendance:
Now, I did not remove outliers from this data set. The quadratic trend line doesn't fit the data well at all, maxing out at 11.788 runs per game, but the fourth-order equation does fit. Again, using calculus, the maximum occurs at 9.05 runs per game; again, the R-squared values are very low. In both cases, however, the R-squared values are better for the fourth-order equations, so let's use those.
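Since the choice between the two trend lines rests on R-squared, here is a short sketch of how that statistic is computed from observed values and a model's predictions. It shows the standard definition (one minus the ratio of residual to total sum of squares); I am assuming, not asserting, that the charting software behind these trend lines computes it the same way.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: the share of variance in the
    observed values that the fitted values account for."""
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total
```

A value near 1 means the curve tracks the data closely. A value near 0, as with the fits here, means the curve explains little more than the simple average would.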
The two models indicate that the optimum level of runs per game lies somewhere in a range around 8.8 to 9.05 runs per game--let's call it 8.9. As this column progresses, this number will be central to any suggestions about changing the game. Some pundits feel Major League Baseball should take action to lower the number of hit batters. But if that's to happen, we need to ask ourselves how far that change might move us away from this number--8.9 runs per game. The same applies to intentional walks. If policies designed to decrease those go into effect, can they be designed so that they don't cause a drift away from the optimum?
Then there is the question of the designated hitter. How does the league make up for the offense lost if the AL abolishes the DH? What if the even more unlikely scenario of league-wide adoption of the DH takes place? The majors would need to be ready to counter a surge in offense.
Remember, this number, 8.9 runs per game, represents a long-term trend. A season ticking up to 10.5 runs per game is fun once in a while; the probability of setting batting records goes up. Likewise, the low-scoring years give pitchers a chance to enter the record books. But when long-term trends emerge that move scoring too far away from the sweet spot, the league needs to examine how to bring the game back into balance.
The NFL constantly tinkers with its rules for this very reason. They move the placement of kickoffs, play with the field goal rules, even narrow or widen the distance between the hash marks, all with the goal of maintaining a scoring optimum. Baseball needs to keep an eye on 8.9 runs per game not only when it institutes changes, but when the natural flow of the game takes it in one direction for too long a period of time.