May 29, 2012
No No-No, No Cry
Believe it or not, most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.
Craig Glaser is an Application Developer at Bloomberg Sports, where he helped design and implement the algorithms that make Bloomberg’s fantasy baseball tool Front Office tick. He has previously written articles for The Hardball Times, Surviving the Citi, Amazin’ Avenue, and his own site, Sabometrics. A member of SABR, he has recently participated in panels at the SABR Analytics and 50th Anniversary of the Mets conferences. In a prior life, he studied Experimental Economics and Cognitive Psychology at NYU, focusing on how people perceive probabilities, a field of study that continues to color his view of life and the sport of baseball. You can find his musings about sports, probability, and everything else on Twitter @Sabometrics.
When sample size is invoked in baseball research, it is almost always prefaced with the word “small.” Analysts often attempt to identify the true talent level of an individual player, and for that purpose, a single baseball season can be maddeningly brief. We’ve all watched enough Shane Spencers to have some (though often not enough) perspective when we see a Will Middlebrooks come up and have an extremely impressive first few games as a major leaguer. Of course, as anyone who has lived through a mediocre season knows, playing 162 games takes time. The 2011 season featured 2,429 games and 185,245 plate appearances. If you’re not focused on a specific player, the season is anything but small.
I’ve heard that you see something new in every game of baseball you watch. I’m not sure I would go quite that far, but the long season allows for incredibly rare peaks to go along with the typical valleys. No-hitters are one such peak and, while they often say more about the length of the season and the probabilities involved than the skill of the pitcher, it’s always fun to see the pitchers who are good and lucky enough to achieve one get their moment in the sun.
On April 3, 2012, Mets starter Jon Niese completed six innings against the Atlanta Braves without giving up a hit. No Met has ever thrown a no-hitter in the franchise’s 50-year history, and Mets fans have always been eager to see one. Niese gave them hope. I may have been the only Mets fan in the world with mixed feelings about it.
A few weeks before Niese’s outing, I was asked to take part in the 50th Anniversary of the New York Mets Conference, held at my alma mater, Hofstra University. As part of my panel (titled “By The Numbers: Statistics and Analytics”), I was asked to prepare a 5-10 minute presentation on a topic of my choosing. I decided to examine just how unlikely it is that the Mets have never thrown a no-hitter. I had started my research, and everything was coming together nicely.
While I had always rooted hard for a Mets no-hitter before, I now had a selfish reason to hope that it didn’t happen in the first month of the season. I knew that my research would be more pertinent and more fun to present if the no no-hitter streak was still intact. So it was with a combination of relief and sadness that I watched Freddie Freeman record a hit in the seventh. I figured that Mets fans could wait another month, and that I’d be able to enjoy the eventual no-hitter more fully after giving my presentation.
My research focused on three questions:
All three of these questions can be answered with a binomial probability distribution, but to use one, you need to estimate one key piece of information: the probability that a start will result in a no-hitter. I used two models to estimate it: a naïve model, and a model based on Out Percentage that has been used by Rob Neyer and Bill James. Additionally, to make the presentation more interesting, I decided to abandon sound statistics and add a third model, based on all of the former Mets pitchers who have thrown no-hitters for other teams.
The primary question, how unlikely it is that the Mets have never thrown a no-hitter, can be answered with a very simple equation. You take the probability of not throwing a no-hitter (1-p) and raise it to the number of opportunities the Mets have had to throw one (g, the number of games they have played): (1-p)^g. Calculating the probability of exactly n no-hitters is a little more complex: C(g, n) * p^n * (1-p)^(g-n), where C(g, n), also written gCn, is the number of ways to choose which n of the g games are the no-hitters (think back to high school math). Setting n = 0 recovers the simple equation, since C(g, 0) = 1.
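For readers who want to check the arithmetic, here is a minimal Python sketch of the two formulas above. It assumes, as the naïve model does, that every game is an independent trial with the same probability p; the function names are illustrative and not taken from the original research.

```python
from math import comb


def prob_exactly_n(n: int, g: int, p: float) -> float:
    """Binomial probability of exactly n no-hitters in g games,
    treating each game as an independent trial with probability p."""
    return comb(g, n) * p**n * (1 - p) ** (g - n)


def prob_none(g: int, p: float) -> float:
    """Probability of zero no-hitters in g games: (1 - p)^g."""
    return (1 - p) ** g


# The n = 0 term of the binomial formula reduces to (1 - p)^g.
p = 0.000625
g = 8008
assert abs(prob_exactly_n(0, g, p) - prob_none(g, p)) < 1e-15
```

Because the binomial probabilities over n = 0..g sum to 1, the same function can be used to build the full distribution of possible no-hitter totals.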
The naïve model assumes that each start in the major leagues is equally likely to become a no-hitter. Between the birth of the Mets in 1962 and May 27th, 2012, there were 209,764 starts made by major-league pitchers, with 131 ending up as no-hitters. This gives us a p(no-hitter) of .000625.
While no-hitters always involve some amount of luck, they are not completely random, so we should expect the Neyer-James model to be more precise than the naïve model. The logic behind their method is simple: you look at the two stats that matter for no-hitters, outs and hits allowed, and calculate the “out percentage” (outs / (outs + hits)). You then raise this out percentage to the 26th power. (There is an average of one out on the bases per game, which makes 26 more accurate than 27.) This gives us a better estimate of the probability of a no-hitter, since it captures the quality of the team’s pitching and defense and the effects of the stadiums the games took place in. (Note: All numbers are from 1962 – 5/27/12.)
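The two per-start probabilities can be reproduced with a few lines of Python. This is a sketch under the article’s own assumptions (independent batters, 26 batter-outs needed); the helper names are hypothetical. Plugging in the rounded .757 gives about .00072, which matches the .000718 figure up to the rounding of the out percentage.

```python
# Naive model: every start league-wide is equally likely to be a no-hitter.
STARTS = 209_764      # MLB starts, 1962 through May 27, 2012 (from the article)
NO_HITTERS = 131      # no-hitters over the same span (from the article)
p_naive = NO_HITTERS / STARTS


def out_percentage(outs: int, hits: int) -> float:
    """Neyer-James 'out percentage': outs / (outs + hits)."""
    return outs / (outs + hits)


def p_no_hitter(out_pct: float, outs_needed: int = 26) -> float:
    """Chance of recording outs_needed batter-outs with no hit in between,
    treating each batter as independent. 26 rather than 27 because about
    one out per game is made on the bases."""
    return out_pct ** outs_needed


p_mets = p_no_hitter(0.757)  # Mets' team out percentage, rounded as in the article

print(f"naive: {p_naive:.6f}")
print(f"Mets:  {p_mets:.6f}")
print(f"increase: {p_mets / p_naive - 1:.0%}")
```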
When you look at the Mets’ out percentage, the fact that they have never thrown a no-hitter becomes even more amazing. The Mets’ out percentage of .757 is the second-highest over the past 50 seasons, trailing only that of the Dodgers, which means that the Mets were the second-most likely team to throw a no-hitter. This is not surprising, since the Mets have often featured great pitching and played in parks that favored pitching over offense. The new p(no-hitter) when using the Mets’ out percentage rises to .000718, an increase of about 15 percent.
We can now get to the heart of the matter and use our two probabilities to start answering some questions:
According to the naïve model, there is only about a .67% chance that a team with 8,008 starts would have yet to throw a no-hitter. When we customize the model to the Mets, the probability is cut in half, to a .32% chance of no no-hitters in 8,008 games. That's only a little more likely than selecting one of Juan Pierre's 7,660 career plate appearances at random and having it be one of his 16 career home runs (about .21%).
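Those three percentages can be checked directly. The sketch below simply plugs the article’s figures into (1-p)^g and into Pierre’s home-run rate; the function name is illustrative.

```python
GAMES = 8008
P_NAIVE = 0.000625   # naive model (league-wide no-hitters per start)
P_METS = 0.000718    # Neyer-James model using the Mets' out percentage


def prob_no_no_hitters(g: int, p: float) -> float:
    """Probability of going g games without a no-hitter: (1 - p)^g."""
    return (1 - p) ** g


for label, p in [("naive", P_NAIVE), ("Mets", P_METS)]:
    print(f"{label}: {prob_no_no_hitters(GAMES, p):.2%}")
# naive: 0.67%
# Mets: 0.32%

# Juan Pierre comparison: 16 home runs in 7,660 plate appearances.
print(f"Pierre HR rate: {16 / 7660:.2%}")
# Pierre HR rate: 0.21%
```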
Each model predicts five as the most likely number of no-hitters for the team. Additionally, to match the low probability of zero no-hitters, we’d have to go pretty far out on the distribution, with 11 no-hitters (.83%) being slightly more likely than zero by the naïve model and 13 no-hitters (.38%) being slightly more likely than zero by the “out percentage” model.
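A short Python sketch can find both the most likely number of no-hitters and the point in the right tail where the distribution drops back to the probability of zero. It uses the binomial formula from earlier with the article’s two values of p; the `summarize` helper and the cap of 30 no-hitters are illustrative choices, not part of the original analysis.

```python
from math import comb


def pmf(n: int, g: int, p: float) -> float:
    """Binomial probability of exactly n no-hitters in g games."""
    return comb(g, n) * p**n * (1 - p) ** (g - n)


def summarize(g: int, p: float, max_n: int = 30):
    """Return (mode, tail_n, P(tail_n), P(0)), where tail_n is the largest
    count that is still more likely than throwing zero no-hitters."""
    probs = [pmf(n, g, p) for n in range(max_n + 1)]
    mode = max(range(len(probs)), key=probs.__getitem__)
    tail_n = max(n for n in range(mode, len(probs)) if probs[n] > probs[0])
    return mode, tail_n, probs[tail_n], probs[0]


for label, p in [("naive", 0.000625), ("out percentage", 0.000718)]:
    mode, tail_n, p_tail, p_zero = summarize(8008, p)
    print(f"{label}: mode={mode}, {tail_n} no-hitters ({p_tail:.2%}) "
          f"vs. zero ({p_zero:.2%})")
```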
The third model, based on former Mets who have thrown a no-hitter for other teams, is not a statistically sound one, but it is probably the most fun of the three to discuss. For this model, I went pitcher by pitcher and calculated a specific p(no-hitter) for each one as p = (# of no-hitters / # of starts). The most recent addition to this list, Philip Humber, is a perfect example of why this model doesn’t make sense statistically. Having thrown one no-hitter in only 36 career starts gives him an extremely high rate of no-hitters, one he is not likely to sustain over his career. In fact, one could use the Neyer-James model at the pitcher level and come up with a better estimate. There is one huge time-saving advantage to this weak model, however: it allows you to completely ignore any pitcher who never threw a no-hitter in his career, since his p(no-hitter) equals zero.
Once we have the p(no-hitter) for each of these pitchers, we can raise his (1-p) to the number of starts he made for the Mets to find the probability that he would not have thrown a no-hitter for them. If we then multiply all of these values together, we get the probability that none of these pitchers would have thrown a no-hitter for the Mets.
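The per-pitcher calculation can be sketched as follows. The pitcher names and numbers below are hypothetical placeholders, not the figures from the tables; the point is only to show the structure: each pitcher’s career rate, raised to his Mets starts, multiplied across pitchers.

```python
from math import prod

# HYPOTHETICAL rows: (career no-hitters, career starts, starts as a Met).
# Placeholders only; substitute the real figures from the tables.
pitchers = {
    "Pitcher A": (1, 250, 40),
    "Pitcher B": (2, 400, 150),
}


def prob_none_for_mets(no_hitters: int, career_starts: int, mets_starts: int) -> float:
    """Chance a pitcher throws no no-hitter in his Mets starts, using his
    career rate p = no_hitters / career_starts as the per-start probability."""
    p = no_hitters / career_starts
    return (1 - p) ** mets_starts


per_pitcher = {name: prob_none_for_mets(*row) for name, row in pitchers.items()}
overall = prod(per_pitcher.values())  # none of them throws one for the Mets
print(f"P(no Mets no-hitter from this group): {overall:.1%}")
```

Pitchers who never threw a no-hitter have p = 0, so their factor is exactly 1 and they can be left out, which is the time-saving shortcut mentioned above.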
First, the pitchers who threw a no-hitter after leaving the Mets (note: Hideo Nomo pitched for the Mets in between his pair of no-hitters):
And the pitchers who threw a no-hitter before joining the Mets:
Multiplying all of these probabilities together gives us a 2.1% probability that none of these pitchers would have thrown a no-hitter for the Mets. By any of these three models, it is pretty surprising that the Mets have not thrown a no-hitter. The more accurate the model, the lower the probability becomes.
Armed with these numbers, I started putting my presentation together. Presenting to Mets fans, I wanted to establish the right narrative: to avoid telling this as a negative story and instead spin it as something amazing and unique about the Mets. I hoped that I could convince my fellow Mets fans that while it might be easier to celebrate the rarity of a no-hitter, we could also take some joy from the rarity of no no-hitters.
At 8,008 games, the Mets’ no no-hitter streak is the longest and the rarest in baseball, even without considering the more accurate Neyer-James model. In my mind, it has become part of the identity of Mets fans. When the Red Sox finally won the World Series after 86 years, their fans were obviously thrilled. Yet some Red Sox fans also felt that they had lost a piece of their identity. The torturous wait helped fans bond together and became a key part of the experience of being a fan of the team. With no-hitters, the stakes are much lower (I shudder at the thought that any Mets fan would trade one of the team’s two World Series titles for a no-hitter), and going 8,000 games without a no-hitter is an order of magnitude less likely than going 100 years without winning the World Series.
I thought that after presenting my data and writing this article, I would go back to wholeheartedly rooting for the Mets to throw a no-hitter, but now that it’s done, I’m not so sure. The Mets could throw a no-hitter tomorrow, and for a week, we’d have something to celebrate. We’d also lose a piece of our identity—a statistic and a feeling that makes the franchise and its fans unique. We’d become just another team with one no-hitter. I’m not sure I can get behind that.
That brings me to the last question I’ll address: How long would the Mets’ streak have to grow before it became as unlikely as throwing a no-hitter in any individual game? Using the naïve model, the probability of no no-hitters in 11,801 games is about the same as the probability of a no-hitter in any one of those games. With the Neyer-James model for the Mets, this number drops to 10,079 games. With 8,008 games down, the Mets need to go 3,793 more games by the naïve model, or 2,071 more games by the Neyer-James model, to achieve what I like to call the no no-hitter challenge.
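The break-even streak length is the smallest g for which (1-p)^g falls to p or below, which works out to g = ceil(ln p / ln(1-p)). A minimal Python sketch using the article’s two probabilities (the function name is illustrative):

```python
import math


def break_even_games(p: float) -> int:
    """Smallest g with (1 - p)^g <= p: the streak length at which going g
    games without a no-hitter is no more likely than a no-hitter in one game."""
    # Both logs are negative, so the ratio is positive; log1p(-p) computes
    # ln(1 - p) accurately for small p.
    return math.ceil(math.log(p) / math.log1p(-p))


GAMES_SO_FAR = 8008
for label, p in [("naive", 0.000625), ("Neyer-James (Mets)", 0.000718)]:
    g = break_even_games(p)
    print(f"{label}: {g:,} games, {g - GAMES_SO_FAR:,} to go")
# naive: 11,801 games, 3,793 to go
# Neyer-James (Mets): 10,079 games, 2,071 to go
```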
In a way, I’m hedging my bets. By anticipating 2,071 more games without a no-hitter, I set myself up for success either way. Either they’ll continue their incredibly rare streak long enough to complete the challenge during the 2025 season, or they won’t, and I’ll get to celebrate the no-hitter itself. So, will I be rooting for Johan Santana, R.A. Dickey, or Jon Niese to finish it off next time one of them doesn’t allow a hit through seven innings?
Maybe if it’s also a perfect game (p = .00007).