August 26, 2013
Baseball: Like Nothing You've Ever Seen Before
They say that if you stick around long enough at a baseball game, you'll see something that you've never seen before. Something odd and surprising like a double play where the catcher records both outs at third base or an invisible home run or Teddy Roosevelt winning the Presidents Race. While I was thinking about that particular baseball axiom, another question popped into my head. If I stick around long enough at a baseball game, would I see something that had happened before?
The phrase "something that has happened before" in the context of baseball might seem a little strange. After all, there will be plenty of strikeouts and walks and singles and routine grounders to third each night. Each game is composed of the same small suite of basic events that, when added together, produce a final score. But I got to wondering the following: Have there ever been two games that have unfolded in exactly the same manner?
Here it becomes an issue of resolution. There have been plenty of 5-4 games in baseball history, so if our only notation of a baseball game is the final score, then we run into the problem of not having very many unique events. Since 1901 (through 2012), the most common final score of a game has been 3 to 2 (10,507 times), followed by 4 to 3 for the home team (10,106 times).
Still, unique final scores pop up every now and then. In 2010, Milwaukee and Pittsburgh played the first 20-0 game in MLB history. As you might imagine, all of the unique scores involve at least one team putting up at least 18 runs (an 18-13 game between the Cardinals and Phillies in 1932 holds the record for the lowest number of runs by a winning team in a unique game). There have been 41 games in MLB since 1901 whose final score has never been duplicated. Baseball trivia junkies love that kind of stuff.
Let's see what happens when we drill down one more level to see whether we can find a game where the line score (the score by innings, both home and visitor) has never happened again. For some reference, the most common line score since 1901 has actually been a 1-0 game with the home team scoring the lone run in the bottom of the ninth (274 times), followed by a 1-0 game in which the home team’s run scores in the first inning (248 times). What may surprise you is that line scores are actually more likely to be unique than duplicative. There are a lot of line scores that have happened exactly once in modern baseball history.
It's not even just exotic extra-extra-inning #weirdbaseball games or games that finish 30-3 (S - Wes Littleton, 1) that have a unique 18-digit code. On the last day of the 2012 season (October 3rd), nine of the games played featured line scores that had never been seen before, including a seemingly innocuous 5-1 victory by the home-team Dodgers over the visiting Giants. The Giants scored their only run in the fourth inning, while the Dodgers tallied one in the fifth, one in the sixth, and put it away with three in the eighth. That combination had never happened before. Ever. Seriously.
(Gory math: It's not all that shocking when you think about it. Consider a universe in which teams are able to score either nothing or one run, with no multi-run innings. In this world, there are more than 250,000 (2 ^ 18 = 262,144) possible line score outcomes, and there were only 182,759 games played from 1901-2012. And that's before we allow that two runs could score in an inning. Or three.)
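For anyone who wants to check that arithmetic, here is a minimal Python sketch. The function name `possible_line_scores` is my own invention for illustration, and the model assumes every game has exactly 18 half-innings, which ignores extra innings and home teams that skip the bottom of the ninth.

```python
HALF_INNINGS = 18        # nine innings, visitor and home
GAMES_PLAYED = 182_759   # games from 1901-2012, as quoted in the text

def possible_line_scores(max_runs, half_innings=HALF_INNINGS):
    """Count line scores when each half-inning scores 0..max_runs runs."""
    return (max_runs + 1) ** half_innings

zero_or_one = possible_line_scores(1)
print(zero_or_one)                  # 262144
print(zero_or_one > GAMES_PLAYED)   # True
print(possible_line_scores(2))      # 387420489
```

Allowing just one more possible value per half-inning (zero, one, or two runs) pushes the count from about a quarter of a million to more than 387 million.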
In fact, just looking at line scores, the majority (70.4 percent) of games since 1901 have featured line score combinations (home and visitor) that have never happened before or since. Lest you think all the good ones are taken, in 2012, 73.7 percent of all games played had a historically unique line score combination. If you go to a baseball game and all you do is watch the line score as it slowly reveals itself on the scoreboard, you are still more than likely to see something that has never been done before in a major-league game. We should be more impressed when a game features a line score that has happened before!
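A percentage like that can be computed by counting how often each line score appears and then asking what share of games carry a line score that appears exactly once. The sketch below shows one way to do it. The function `unique_share` and the four toy games are made up for illustration and are not the actual query or data behind the numbers above.

```python
from collections import Counter

def unique_share(line_scores):
    """Fraction of games whose line score appears exactly once."""
    if not line_scores:
        return 0.0
    counts = Counter(line_scores)
    unique_games = sum(1 for score in line_scores if counts[score] == 1)
    return unique_games / len(line_scores)

# Toy data: each game is (visitor runs by inning, home runs by inning).
games = [
    # The 5-1 Dodgers win described earlier; no bottom of the ninth.
    ((0, 0, 0, 1, 0, 0, 0, 0, 0), (0, 0, 0, 0, 1, 1, 0, 3)),
    # Two identical 1-0 games with the run scoring in the first inning.
    ((0, 0, 0, 0, 0, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0, 0, 0)),
    ((0, 0, 0, 0, 0, 0, 0, 0, 0), (1, 0, 0, 0, 0, 0, 0, 0)),
    # A made-up walk-off game.
    ((2, 0, 0, 0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0, 0, 0, 3)),
]

print(unique_share(games))  # 0.5
```

Storing each line score as a pair of tuples keeps it hashable, so it can serve directly as a `Counter` key.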
Suddenly, my chances of success in finding a game that had an identical twin (no, not you, Ozzie Canseco) look rather slim. But, undaunted, I charged forward. I loaded up my Retrosheet database that holds all events from 1950-2012. I looked to see whether there was any pair of games in that database that had even the first 20 Retrosheet event codes the same.
This is a really low bar. For example, Retrosheet codes all outs on balls in play as a "2." In this case, a grounder to third was considered the same as a fly ball to center. There are codes for some events, such as stolen bases and passed balls, but a single where a runner is thrown out on the bases gets the same code (20) as a plain old single where everyone just moves up a base and the same as when someone goes first to third on that single. It also doesn't matter whether the single went to left or center or right. There were no games that matched after 20 events, so I backed up a bit.
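The search itself can be sketched as grouping games by their first n event codes and keeping any group with more than one game. This is an illustration under assumptions, not the code I actually ran: the function `games_sharing_prefix` and the game ids are hypothetical, and the toy data uses only the two codes mentioned above (2 for an out on a ball in play, 20 for a single).

```python
from collections import defaultdict

def games_sharing_prefix(event_codes_by_game, n):
    """Return groups of game ids whose first n event codes are identical."""
    groups = defaultdict(list)
    for game_id, codes in event_codes_by_game.items():
        if len(codes) >= n:  # skip games with fewer than n events
            groups[tuple(codes[:n])].append(game_id)
    return [ids for ids in groups.values() if len(ids) > 1]

# Toy data with hypothetical game ids.
games = {
    "game_a": [2, 2, 20, 2, 2],
    "game_b": [2, 2, 20, 2, 20],
    "game_c": [20, 2, 2, 2, 2],
}

print(games_sharing_prefix(games, 4))  # [['game_a', 'game_b']]
print(games_sharing_prefix(games, 5))  # []
```

Grouping by prefix avoids comparing every game against every other game, which matters when the database holds six decades of play-by-play.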
When I tried to get a little more specific by requiring that the balls in play actually match in terms of the type of batted ball and to whom they were hit (Retrosheet sometimes lacks data on the subject, especially going further back), my ability to find a game that even started the same as another faded quickly.
There are only two pairs of games (a 1979 game between the Pirates and the Giants, and a 2010 game between the Cardinals and the Rockies; a 1977 game between the Padres and the Mets and a 2004 game between the Dodgers and the Cubs) that match even through eight events. Both pairs stop matching in the top of the second inning.
Well, if I wasn't going to get much past the first inning in finding two games that twinned each other, maybe I could at least find a couple of half-innings (no matter where they happened in the game) that have mirrored each other. This actually isn't much of a challenge. In 2012, there were 491 separate half-innings in which all three hitters struck out. The five most common half-innings:
What if I required not only that the outcomes of the plate appearances match, but also that the pitch sequences match? Could I find such a pair?
It turns out that in 2012, there were two half-innings that matched up with one another (out of 43,477!). On August 21st, the bottom of the third inning of a Yankees-White Sox game matched up beautifully with the top of the sixth of a June 17th game between the Astros and Rangers.
In 2011, there was a similar match, but none in 2010 or 2009. From 1993 to 2012 (the years for which Retrosheet has consistent pitch sequence data), there have been 361 half-innings that have been repeated at least once (out of 822,842 half-innings played during those years). The most commonly played half-inning happened four times and involved a first-pitch 6-3 groundout by the first batter, a ball and another 6-3 grounder, and then another 1-0 count resulting in a 5-3 grounder. There were 10 half-innings that had been played three times each, and 350 matching pairs. A full match, even at the half-inning level, doesn't happen very often.
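To make the matching rule concrete, here is a small sketch that treats a half-inning as a sequence of (pitch sequence, outcome) pairs and counts exact repeats. The function `repeated_half_innings` and the sample data are illustrative assumptions. The pitch letters follow what I understand to be Retrosheet's convention (B for a ball, C for a called strike, S for a swinging strike, F for a foul, X for a ball put in play), and the outcome strings are simplified stand-ins.

```python
from collections import Counter

def repeated_half_innings(half_innings):
    """Map each half-inning that occurs more than once to its count."""
    counts = Counter(half_innings)
    return {inning: n for inning, n in counts.items() if n > 1}

# The four-time half-inning described above: a first-pitch 6-3 groundout,
# a ball then a 6-3 groundout, a ball then a 5-3 groundout.
grounders = (("X", "63"), ("BX", "63"), ("BX", "53"))
# Same outcomes, but the third batter takes a called strike instead of a ball.
near_miss = (("X", "63"), ("BX", "63"), ("CX", "53"))
strikeouts = (("CSS", "K"), ("CFS", "K"), ("SSS", "K"))

half_innings = [grounders, strikeouts, grounders, near_miss]
repeats = repeated_half_innings(half_innings)

print(len(repeats))        # 1
print(repeats[grounders])  # 2
```

The near miss shows why full matches are so rare: a single different pitch is enough to make two otherwise identical half-innings count as distinct.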
Billions and Billions
Maybe the order in which things happen actually matters for something. Maybe it doesn't. (We've never really asked the question, have we?) If there is a greater lesson to be learned, it's that while we think we have a lot of data on the game of baseball, what we really have is a small subset of what might be, and for most of those potential outcomes we have only a single case study. The astronomer Carl Sagan was fond of expressing his wonderment at the universe, pointing out that there were billions upon billions of stars in the universe and that we have studied only a few of them. How little we know about the universe and how wondrous it is! It's like that with baseball. We think we know a lot, and that we've seen it all, but there's a lot more to discover. In fact, if you watch a baseball game, chances are good that you really will see something that no one has seen before.