
Today Tommy Bennett takes a stab at some mathematical modeling regarding the possibility of a pitcher throwing a perfect game, a feat to which we baseball fans are almost in danger of becoming jaded, given the fact that we've seen two official ones plus another with a big ol' asterisk attached in roughly one-third of a season. Last week, I crunched some numbers on behalf of San Jose Mercury News writer Dan Brown (no, not that guy), who once upon a time gave my Futility Infielder site its first mainstream media mention, not to mention its long-enduring slogan, "the scrappiest place on earth." At the risk of exposing the limitations of my mathematical skillz — I hail from the Liberal Arts wing of BP, which means I can only wade so far out into the sea of spreadsheets before I'm over my head — the least I could do was return the favor by adding some back-of-the-envelope figgerin' to his assignment to investigate the sudden spate of perfectos.

Like Tommy, I subscribe to the view that the increased number of games is a major explanation for so many hitless wonders. In a given season there are roughly twice as many games in the 30-team, 162-game era as there were in the 16-team, 154-game era. Going back to 1901, when the AL began play in parallel with the NL, about 0.063% of team-games have wound up as no-hitters, and 0.005% as perfect games. That means under current conditions we should expect to see about three no-hitters per year, and a perfect game every four years. Note that this perfect-game frequency is about one-fifth of Tommy's starting-point estimate based upon the likelihood of a pitcher stringing together nine consecutive 1-2-3 innings.
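As a rough check on that arithmetic, the sketch below multiplies the historical rates by the current schedule. It assumes the percentages apply per team-game (30 teams x 162 games = 4,860 chances per season), which is the reading that reproduces the figures quoted; the variable names are illustrative only.

```python
# Back-of-the-envelope check of expected no-hitters and perfect games per season.
# Assumption: the historical rates are per team-game, i.e. each team's side of a
# game counts as one chance to throw a no-hitter.
TEAMS = 30
GAMES_PER_TEAM = 162
team_games = TEAMS * GAMES_PER_TEAM  # 4,860 team-games per season

NO_HIT_RATE = 0.063 / 100    # no-hitters (perfect games included), 1901-2010
PERFECT_RATE = 0.005 / 100   # perfect games only, 1901-2010

no_hitters_per_year = team_games * NO_HIT_RATE
perfect_games_per_year = team_games * PERFECT_RATE
years_per_perfect_game = 1 / perfect_games_per_year

print(f"Expected no-hitters per year:    {no_hitters_per_year:.2f}")
print(f"Expected perfect games per year: {perfect_games_per_year:.3f}")
print(f"Years between perfect games:     {years_per_perfect_game:.1f}")
```

Run as written, this lands on roughly three no-hitters a year and a perfect game about once every four years, in line with the estimates above.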

If we set aside the 19th century and divide major league history into pre-expansion and post-expansion eras (1901-1960 and 1961-present), we see that the rates of no-hitters across the two eras are pretty similar, but the rates of perfect games are not (note that I'm excluding Don Larsen's 1956 perfect game because it happened in postseason play):

```
Era         NH + PG   NH only      PG
1901-1960    0.067%    0.065%    0.002%
1961-2010    0.060%    0.053%    0.007%
1901-2010    0.063%    0.058%    0.005%
```

While the rate of no-hitters has gone down slightly in the post-expansion era, the rate of perfect games has skyrocketed, with a perfect game now about 3.4 times as likely as it was in the pre-expansion era.

Looking at the results of a single year, there's really no sensible interpretation for the distribution of no-hitters and perfect games other than randomness. The standard deviation for the percentage of no-hitters in a given year is 0.056%, which means that about two-thirds of the time we should expect to see between 0.3 and 5.7 no-hitters per year given the current schedule of 4,860 team-games per year (30 x 162). Because we're obviously bounded by zero at the low end, we start to see that we need a few years' worth of data to begin assessing the frequency of no-hitters fairly. As for perfect games, we need an even larger sample.
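The one-standard-deviation band can be sketched the same way. This assumes the band is centered on the 0.063% long-run rate and that yearly rates are roughly normal, which is what the "about two-thirds of the time" rule of thumb implies. Because the inputs here are the rounded percentages, the upper end comes out a touch higher than the 5.7 quoted.

```python
# One-standard-deviation band for no-hitters per season, from rounded figures.
# Assumptions: the band is centered on the 1901-2010 rate of 0.063%, and the
# yearly rate is treated as roughly normal (the "two-thirds" rule of thumb).
TEAM_GAMES = 30 * 162        # 4,860 team-games per season

MEAN_RATE = 0.063 / 100      # long-run no-hitter rate
SD_RATE = 0.056 / 100        # year-to-year standard deviation of that rate

# The low end is clamped at zero because a season cannot have negative no-hitters.
low = max(0.0, (MEAN_RATE - SD_RATE) * TEAM_GAMES)
high = (MEAN_RATE + SD_RATE) * TEAM_GAMES

print(f"About two-thirds of seasons: {low:.1f} to {high:.1f} no-hitters")
```

With a standard deviation nearly as large as the mean, the band almost touches zero, which is why a single season says so little.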

If we further divide modern baseball history by decades starting in 1901 (so, 1901-1910, through 2001-2010), we find that the frequency of no-hitters has a very strong inverse correlation with scoring levels (r = -0.8), which is to say that the lower the scoring rate, the more likely there is to be a no-hitter, and the higher, the less likely. The current decade (of which we're nearing the end) rates as the second least-likely one in which to throw a no-hitter:

```
Decade       PG%       NH%     R/G
1911-20    0.000%    0.116%    4.04
1961-70    0.009%    0.105%    4.03
1901-10    0.008%    0.087%    3.92
1951-60    0.000%    0.081%    4.37
1971-80    0.000%    0.070%    4.21
1991-00    0.009%    0.055%    4.97
1941-50    0.000%    0.048%    4.35
1981-90    0.007%    0.044%    4.46
1931-40    0.000%    0.041%    5.20
2001-10    0.009%    0.037%    4.80
1921-30    0.004%    0.032%    4.96
```
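For anyone who wants to check the decade-level correlation, here is a minimal sketch that computes a Pearson r from the rounded NH% and R/G columns above. The `pearson` helper is written out by hand for illustration; the original figure was presumably computed from unrounded data, so expect agreement only to about a decimal place.

```python
from math import sqrt

# (decade, no-hitter % of games, runs per game), copied from the rounded table.
DECADES = [
    ("1911-20", 0.116, 4.04),
    ("1961-70", 0.105, 4.03),
    ("1901-10", 0.087, 3.92),
    ("1951-60", 0.081, 4.37),
    ("1971-80", 0.070, 4.21),
    ("1991-00", 0.055, 4.97),
    ("1941-50", 0.048, 4.35),
    ("1981-90", 0.044, 4.46),
    ("1931-40", 0.041, 5.20),
    ("2001-10", 0.037, 4.80),
    ("1921-30", 0.032, 4.96),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


nh_rates = [row[1] for row in DECADES]
runs_per_game = [row[2] for row in DECADES]
r_nh = pearson(nh_rates, runs_per_game)

print(f"No-hitter rate vs. runs per game: r = {r_nh:.2f}")
```

Even from the rounded columns the result comes out at roughly -0.8: lower-scoring decades see more no-hitters.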

I realize that scoring levels aren't the most direct route to measuring no-hitters and perfect games — a focus on batting average or on-base percentage would provide a more granular look — but in terms of the conception of high- or low-offense eras, it was what I chose to work with. In any event, if we divide the data into even larger chunks than decades, we get an even better correlation between scoring rates and no-hitter frequency. Splitting the 110 years into five 22-year chunks, we get a correlation of r = -.95, confirming that the larger our sample size, the more predictive scoring rates are of no-hitters.

On the other hand, if we check the correlation between scoring levels by decade and perfect game frequency, we find that the relationship is essentially random (r = .01), and even if we up it to the 22-year samples, we only get a correlation of r = -.23, which is fairly faint. Remember, this is covering somewhere between 25,000 and 50,000 team-games per decade, and yet there's really no pattern we can spot at that level, nothing that particularly clues us in to the idea that an era (to say nothing of a sliver of a season, as in 2010 to date) should be more or less likely to yield a perfect game.
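The same kind of check works for perfect games. This sketch pairs the rounded PG% and R/G columns from the decade table and computes a Pearson r by hand; the helper name is illustrative. With only a handful of perfect games in the whole sample, the exact value is sensitive to rounding, so treat it as a sanity check rather than a reproduction.

```python
from math import sqrt

# (decade, perfect-game % of games, runs per game), from the rounded decade table.
DECADES_PG = [
    ("1911-20", 0.000, 4.04),
    ("1961-70", 0.009, 4.03),
    ("1901-10", 0.008, 3.92),
    ("1951-60", 0.000, 4.37),
    ("1971-80", 0.000, 4.21),
    ("1991-00", 0.009, 4.97),
    ("1941-50", 0.000, 4.35),
    ("1981-90", 0.007, 4.46),
    ("1931-40", 0.000, 5.20),
    ("2001-10", 0.009, 4.80),
    ("1921-30", 0.004, 4.96),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )


pg_rates = [row[1] for row in DECADES_PG]
scoring = [row[2] for row in DECADES_PG]
r_pg = pearson_r(pg_rates, scoring)

print(f"Perfect-game rate vs. runs per game: r = {r_pg:.2f}")
```

The result sits very close to zero, consistent with the "essentially random" reading above.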

Anyway, hopefully I didn't stray too far from the strike zone while landing a couple of quotes adjacent to some from Bert Blyleven, a pitcher for whom I've got some affection, not to mention Ron Gardenhire, whose self-description as a futility infielder happened more or less in parallel with the genesis of my site back in 2001. Fun company to keep…

aquavator44
6/08
Could the change in bat specifications help at all? MLB decreased the maximum barrel diameter from 2.75" to 2.61" over the offseason to keep bats from breaking as much. Perfect games are too random an occurrence for this change to increase the likelihood of one happening, but is there a way to tell if this is having an effect on scoring? Or is there just too much noise in run-scoring levels?
jjaffe
6/08
Interesting point, but one I'd have to leave to the physics experts. Scoring is down 3.1 percent from last year, a fluctuation typical of recent history. Given that we haven't hit the warmest part of the season, when offensive rates typically improve, it's certainly not a whole lot to get worked up about.
aquavator44
6/08
I found it interesting that it wasn't a bigger story before the season, actually. I didn't even know about the change until a few weeks ago.
BillJohnson
6/08
I'm not convinced there's a thing going on here other than small sample size, but if there is, consider: what the data are saying is that the fraction of no-hitters that are perfectos is currently the greatest it's ever been. This, I claim, is not hard to understand at all. The things that make no-hitters imperfect are errors and walks. Fielding percentages today are the highest they've ever been, for reasons partly technological (bigger and better gloves, well-groomed fields) and maybe partly procedural (changes in the willingness to call things errors rather than hits). Walks are also rarer than at many (though not all) times during baseball history. These factors taken together should certainly lead to a higher percentage of perfectos among no-nos, shouldn't they?
jjaffe
6/08
Your point makes a good deal of theoretical sense, and it probably holds in the grand scheme, but I'm not sure the data entirely bears it out, at least in terms of the very recent past. Walk rates have been more or less flat for a while and are actually up a bit relative to recent years (see http://www.baseballprospectus.com/statistics/sortable/index.php?cid=392886), and the current MLB fielding percentage (.983) is actually down a point from the .984 it held at from 2007-2009. Ten years ago it was .981, 20 years ago it was .980, 30 years ago it was .978, so I think we're talking a very, very small effect at best.
BillJohnson
6/08
Well, it's really the comparison to 30 (and many more) years ago that I meant to make, not anything about the "very recent" developments. Sure, the difference between current and 07-09 is insignificant (statistics of small numbers, etc.), just as the current "cluster" of perfectos probably is. It's the longer term, where "modern" fielding and perfectos on a 20-year time frame may be correlated, that needs to be examined to see if my hypothesis explains what's been seen.