June 8, 2010
Today Tommy Bennett takes a stab at some mathematical modeling regarding the possibility of a pitcher throwing a perfect game, a feat to which we baseball fans are almost in danger of becoming jaded, given the fact that we've seen two official ones plus another with a big ol' asterisk attached in roughly one-third of a season. Last week, I crunched some numbers on behalf of San Jose Mercury News writer Dan Brown (no, not that guy), who once upon a time gave my Futility Infielder site its first mainstream media mention, not to mention its long-enduring slogan, "the scrappiest place on earth." At the risk of exposing the limitations of my mathematical skillz — I hail from the Liberal Arts wing of BP, which means I can only wade so far out into the sea of spreadsheets before I'm over my head — the least I could do was return the favor by adding some back-of-the-envelope figgerin' to his assignment to investigate the sudden spate of perfectos.
Like Tommy, I subscribe to the view that the increased number of games is a major explanation for so many hitless wonders. In a given season there are roughly twice as many games in the 30-team, 162-game era as there were in the 16-team, 154-game era. Going back to 1901, when the AL began play in parallel with the NL, about 0.063% of team-games (each game counted once for each team) wind up as no-hitters, and 0.005% wind up as perfect games. That means under current conditions we should expect to see about three no-hitters per year, and a perfect game every four years. Note that the latter figure is about one-fifth of Tommy's starting-point estimate based upon the likelihood of a pitcher stringing together nine consecutive 1-2-3 innings.
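The per-season expectations come from multiplying each long-run rate by the number of team-games in a season. Here's a minimal Python sketch of that figgerin', assuming the percentages are per team-game (the reading that reproduces "about three no-hitters per year"); the variable names are mine, chosen for illustration.

```python
# Back-of-the-envelope: expected no-hitters and perfect games per season.
# Assumption: the 1901-2010 rates quoted above are per team-game.

TEAM_GAMES = 30 * 162        # current schedule: 4,860 team-games per season
OLD_TEAM_GAMES = 16 * 154    # 16-team, 154-game era: 2,464 team-games per season

NO_HITTER_RATE = 0.063 / 100     # 0.063% of team-games
PERFECT_GAME_RATE = 0.005 / 100  # 0.005% of team-games

# Expected counts are simply rate x opportunities.
no_hitters_per_year = NO_HITTER_RATE * TEAM_GAMES
perfect_games_per_year = PERFECT_GAME_RATE * TEAM_GAMES
years_per_perfect_game = 1 / perfect_games_per_year

print(f"Schedule ratio, now vs. 16 teams: {TEAM_GAMES / OLD_TEAM_GAMES:.2f}")
print(f"No-hitters per year:    {no_hitters_per_year:.2f}")
print(f"Perfect games per year: {perfect_games_per_year:.3f}")
print(f"Years per perfect game: {years_per_perfect_game:.1f}")
```

With these inputs the schedule ratio comes out to about 1.97, no-hitters to about 3.06 per season, and perfect games to one roughly every 4.1 years.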
If we set aside the 19th century and divide major league history into pre-expansion and post-expansion eras (1901-1960 and 1961-present), we see that the rates of no-hitters across the two eras are pretty similar, but the rates of perfect games are not (note that I'm excluding Don Larsen's 1956 perfect game because it happened in postseason play):
Era         NH + PG   NH only   PG
1901-1960   0.067%    0.065%    0.002%
1961-2010   0.060%    0.053%    0.007%
1901-2010   0.063%    0.058%    0.005%
While the rate of no-hitters has gone down slightly in the post-expansion era, the rate of perfect games has skyrocketed, becoming about 3.4 times as likely as it was in the pre-expansion era.
Looking at the results of a single year, there's really no sensible interpretation of the distribution of no-hitters and perfect games other than randomness. The year-to-year standard deviation of the no-hitter rate is 0.056%, which means that about two-thirds of the time we should expect to see between 0.3 and 5.7 no-hitters per year given the current schedule of 4,860 team-games per year (30 x 162). Because we're obviously bounded by zero at the low end, we start to see that we need a few years' worth of data before we can fairly assess the frequency of no-hitters. As for perfect games, we need an even larger sample.
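The one-standard-deviation band is just the mean rate plus or minus one SD, scaled to a season's worth of team-games; the "two-thirds of the time" reading is the usual rule of thumb for a roughly normal distribution, which is only an approximation here given the floor at zero. A minimal sketch, with variable names of my own choosing:

```python
TEAM_GAMES = 30 * 162      # 4,860 team-games per season
MEAN_RATE = 0.063 / 100    # long-run no-hitter rate per team-game
SD_RATE = 0.056 / 100      # year-to-year standard deviation of that rate

# A season can't have fewer than zero no-hitters, so clamp the low end.
low = max(0.0, MEAN_RATE - SD_RATE) * TEAM_GAMES
high = (MEAN_RATE + SD_RATE) * TEAM_GAMES

print(f"One-SD band: {low:.1f} to {high:.1f} no-hitters per season")
```

Using the rounded figures quoted here, the band runs from about 0.34 to 5.78, so the upper end lands a hair above 5.7; the small gap presumably comes from rounding in the published rates.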
If we further divide modern baseball history into decades starting in 1901 (1901-1910 through 2001-2010), we find that the frequency of no-hitters has a strong inverse correlation with scoring levels (r = -0.8): the lower the scoring rate, the more likely a no-hitter, and the higher, the less likely. The current decade (which is nearing its end) ranks as the second-least-likely one in which to throw a no-hitter:
Decade    PG%      NH%      R/G
1911-20   0.000%   0.116%   4.04
1961-70   0.009%   0.105%   4.03
1901-10   0.008%   0.087%   3.92
1951-60   0.000%   0.081%   4.37
1971-80   0.000%   0.070%   4.21
1991-00   0.009%   0.055%   4.97
1941-50   0.000%   0.048%   4.35
1981-90   0.007%   0.044%   4.46
1931-40   0.000%   0.041%   5.20
2001-10   0.009%   0.037%   4.80
1921-30   0.004%   0.032%   4.96
(R/G = runs scored per team per game)
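For anyone who wants to check the r = -0.8 figure, here's a minimal Python sketch computing the Pearson correlation coefficient between runs per game and no-hitter rate straight from the decade table; the helper name pearson_r is my own, not anything from BP's toolkit.

```python
from math import sqrt
from statistics import mean

# (decade, no-hitter rate in %, runs per team per game), copied from the table
DECADES = [
    ("1911-20", 0.116, 4.04), ("1961-70", 0.105, 4.03),
    ("1901-10", 0.087, 3.92), ("1951-60", 0.081, 4.37),
    ("1971-80", 0.070, 4.21), ("1991-00", 0.055, 4.97),
    ("1941-50", 0.048, 4.35), ("1981-90", 0.044, 4.46),
    ("1931-40", 0.041, 5.20), ("2001-10", 0.037, 4.80),
    ("1921-30", 0.032, 4.96),
]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rpg = [row[2] for row in DECADES]
nh = [row[1] for row in DECADES]
r = pearson_r(rpg, nh)
print(f"r(R/G, NH%) = {r:.2f}")
```

On the rounded table values this comes out to about -0.80, matching the figure in the text.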
I realize that scoring levels aren't the most direct predictor of no-hitters and perfect games (a focus on batting average or on-base percentage would provide a more granular look), but in terms of the conception of high- or low-offense eras, it's what I chose to work with. In any event, if we divide the data into even larger chunks than decades, the correlation between scoring rates and no-hitter frequency gets stronger still. Splitting the 110 years into five 22-year chunks yields r = -.95, suggesting that the larger our sample size, the more predictive scoring rates are of no-hitters.
On the other hand, if we check the correlation between scoring levels by decade and perfect-game frequency, we find essentially no relationship (r = .01), and even with the 22-year chunks, we only get r = -.23, which is fairly faint. Remember, this covers somewhere between 25,000 and 50,000 games per decade, and yet there's really no pattern we can spot at that level, nothing to suggest that an era (to say nothing of a sliver of a season, as in 2010 to date) should be more or less likely to yield a perfect game.
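The same calculation applied to the perfect-game column shows how little signal there is. This minimal sketch reuses a hand-rolled Pearson correlation (again, pearson_r is my own helper name) on the PG% and R/G columns of the decade table:

```python
from math import sqrt
from statistics import mean

# (decade, perfect-game rate in %, runs per team per game), from the decade table
DECADES = [
    ("1911-20", 0.000, 4.04), ("1961-70", 0.009, 4.03),
    ("1901-10", 0.008, 3.92), ("1951-60", 0.000, 4.37),
    ("1971-80", 0.000, 4.21), ("1991-00", 0.009, 4.97),
    ("1941-50", 0.000, 4.35), ("1981-90", 0.007, 4.46),
    ("1931-40", 0.000, 5.20), ("2001-10", 0.009, 4.80),
    ("1921-30", 0.004, 4.96),
]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rpg = [row[2] for row in DECADES]
pg = [row[1] for row in DECADES]
r_pg = pearson_r(rpg, pg)
print(f"r(R/G, PG%) = {r_pg:.2f}")
```

The rounded table values give roughly 0.02 rather than .01, presumably because the three-decimal percentages lose some precision, but either way the correlation is effectively zero. With only a handful of perfect games per decade, a single game more or less swings a decade's rate considerably, which helps explain why no pattern emerges.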
Anyway, hopefully I didn't stray too far from the strike zone while landing a couple of quotes adjacent to some from Bert Blyleven, a pitcher for whom I've got some affection, not to mention Ron Gardenhire, whose self-description as a futility infielder happened more or less in parallel with the genesis of my site back in 2001. Fun company to keep...