August 2, 2006
The Odds of Chase Utley Catching DiMaggio
So what are Chase Utley's chances of getting his hitting streak to 56 games?
Short answer for the really impatient: 1 in 194.
How I got there:
Hitting streaks are notoriously difficult beasts for a statistician to deal with, even though they seem to be a simple application of probability. You would think that it would work like this: take some representation of his ability to get a hit, say his current average of .328, and then invert it to .672 (1 minus his average), his chance of not getting a hit in any AB. If he gets four ABs a game, his chance of not getting a hit in any game is just .672 raised to the fourth power, or .204. You have to reverse it again to get .796 (his chance of getting a hit in a game), and raise that to the number of games needed (currently 23), so you have .796 raised to the 23rd power, or about 0.5%, roughly 1 chance in 190.
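The arithmetic in that paragraph can be checked in a few lines. This is only a sketch of the back-of-the-envelope method, assuming a fixed 4 AB in every game and the same .328 chance in every AB; the function name `streak_odds` is made up for illustration.

```python
def streak_odds(avg: float, ab_per_game: int, games_needed: int) -> float:
    """Chance of getting at least one hit in each of `games_needed` games.

    Assumes every AB is an independent trial with hit probability `avg`
    and that every game has exactly `ab_per_game` at-bats.
    """
    p_hitless_game = (1.0 - avg) ** ab_per_game   # .672 ** 4, about .204
    p_hit_game = 1.0 - p_hitless_game             # about .796
    return p_hit_game ** games_needed             # about .796 ** 23


odds = streak_odds(avg=0.328, ab_per_game=4, games_needed=23)
print(f"{odds:.2%}, or about 1 chance in {1 / odds:.0f}")
```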
The big problem with this approach is that he doesn't get 4 AB every game, and the permutations of juggling 3 AB games and 5 AB games quickly grow into a nightmare. I decided to take a shortcut by modeling Utley's chances in almost exactly the way I model the rest of the season in the playoff odds reports. A simulation can build the probabilities by trial and error and good counting skills, rather than by solving an exact equation, although the simple off-the-cuff method did exceptionally well.
The model starts with a game counter. For each game, a random number will tell us how many AB Utley gets today, based on his distribution this season. Here's that list:
AB    G
-------
 1    2
 2    2
 3   18
 4   51
 5   26
 6    5

In the two games with one AB, Utley was a pinch-hitter, which we can safely assume is not going to happen while his streak is on the line. The games with 2 AB, however, were fully legitimate games. I'm going to mirror the distribution here, ignoring the 1 AB games; however, there is another small problem which I'm going to mention and then ignore. The Phillies no longer have Bobby Abreu and his .427 OBP; in his place they have David Dellucci, who's more like a .360 OBP guy. That should cost the Phillies about .4 plate appearances per game, and that's 23 plate appearances between now and the end of the year. If last night's game is part of a trend with Utley moving from a mostly #2 hitter to #3 as a further consequence of the move, then he personally stands to lose about 8 ABs from the distribution presented above.
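For anyone who wants to try the approach, here is a minimal sketch of a simulation along these lines. It is not the actual playoff-odds code; it assumes a flat .328 chance of a hit in every AB, draws each game's AB count from the list above with the 1 AB games dropped, and stops at the 23 games needed. The names `AB_GAMES` and `simulate_streaks` are invented for the example.

```python
import random

# Games by number of at-bats, from the list above, with the two
# pinch-hitting (1 AB) games left out.
AB_GAMES = {2: 2, 3: 18, 4: 51, 5: 26, 6: 5}


def simulate_streaks(avg=0.328, max_games=23, trials=100_000, seed=0):
    """Return counts where counts[n] is the number of trials in which
    the streak survived n or more additional games (counts[0] == trials)."""
    rng = random.Random(seed)
    ab_values = list(AB_GAMES)
    weights = list(AB_GAMES.values())
    counts = [0] * (max_games + 1)
    for _ in range(trials):
        survived = 0
        for _ in range(max_games):
            # Pick today's number of at-bats from the season distribution.
            ab = rng.choices(ab_values, weights)[0]
            # The streak continues if any at-bat produces a hit.
            if any(rng.random() < avg for _ in range(ab)):
                survived += 1
            else:
                break
        # A streak of `survived` games is also a streak of every shorter length.
        for n in range(survived + 1):
            counts[n] += 1
    return counts


if __name__ == "__main__":
    trials = 100_000
    counts = simulate_streaks(trials=trials)
    for n in (1, 7, 11, 17, 23):
        print(f"{n:2d} more games: {counts[n] / trials:.3%}")
```

With a sample this size the long streaks are noisy; raising `trials` toward a million steadies the tail at the cost of run time.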
Utley's batting average is .328. That's for this season only; for his career he's at .292. He's hitting .344 at home and .310 on the road, .357 vs lefties and .315 vs righties, and if I dug in enough I could tell you what his average was on Wednesdays but I don't really care. The point is that his average on any given day is probably not .328, but somewhere between about .250 and .400 depending on who's pitching, who's umpiring, where he's playing, and the weather conditions that day, not to mention things like whether his breakfast is agreeing with him or not. This sounds pointless, but it isn't: streaks, which by definition end with the weakest link, are enormously sensitive to anything that lowers the probability, even for a day. A steady 4 AB a game produces better odds than a mix of 3-4-5 AB games, assuming the 3 and 5 are equal, because the 3 AB games hurt your chances a lot more than the 5 AB games help them. Likewise with average: a steady .328 average will produce better odds of building a long streak than any normal distribution around it. Of course, I wrote this up while running the simulation, and as it turns out it really didn't make that much difference; running at .328 plus or minus 100 points instead of a steady .328 reduced his chances by less than 5%, so this is another thing that I'm going to mention and then ignore.
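The point about 3 AB games hurting more than 5 AB games help is easy to see with numbers. A small sketch, assuming a flat .328 in every AB and, for the mixed case, an even split between 3 AB and 5 AB games (the helper name `p_hit_in_game` is just for illustration):

```python
def p_hit_in_game(avg: float, ab: int) -> float:
    """Chance of at least one hit in a game with `ab` at-bats."""
    return 1.0 - (1.0 - avg) ** ab


AVG = 0.328

# Every game has exactly 4 AB.
steady = p_hit_in_game(AVG, 4)

# Half the games have 3 AB and half have 5 AB, so the average is still 4.
mixed = 0.5 * p_hit_in_game(AVG, 3) + 0.5 * p_hit_in_game(AVG, 5)

print(f"steady 4 AB: {steady:.3f}")
print(f"3/5 AB mix:  {mixed:.3f}")
```

The steady schedule comes out near .796 per game and the mix near .780, and that gap gets compounded once for every game the streak has to survive.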
There's also the likelihood that increasing scrutiny as he goes along raises the pressure and hurts his ability to play. I have no idea how to model hair falling out from stress and what it does to one's batting average, but consider this: if we simply say that the stress will cost his batting average a point a game (so that at 23 games it will be down to .305), his chances of reaching 56 games drop by a third. Drop his average by 2 points a game and he loses another third of what's left (leaving about 44% of the original odds).
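A rough way to see the size of that effect without the full simulation is sketched below. It assumes a flat 4 AB per game rather than the real distribution, and it assumes the drop is applied from the first game on, so that the 23rd game is played at .305; both are simplifications of mine, and `streak_odds_with_decay` is a made-up name.

```python
def streak_odds_with_decay(avg=0.328, drop_per_game=0.0, ab=4, games=23):
    """Chance of a hit in each of `games` games when the batting average
    falls by `drop_per_game` with every game played."""
    odds = 1.0
    for g in range(1, games + 1):
        day_avg = avg - drop_per_game * g          # game 23 at .305 for a 1-point drop
        odds *= 1.0 - (1.0 - day_avg) ** ab        # chance of a hit in game g
    return odds


base = streak_odds_with_decay()
one_point = streak_odds_with_decay(drop_per_game=0.001)
two_points = streak_odds_with_decay(drop_per_game=0.002)

print(f"1 point a game:  {one_point / base:.0%} of the original odds")
print(f"2 points a game: {two_points / base:.0%} of the original odds")
```

Because of the fixed 4 AB assumption, the ratios land in the same neighborhood as the figures quoted above rather than matching them exactly.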
So, then, using a simple .328 average, along with Utley's current distribution of ABs per game, here's how many times he had a streak of N games or longer in a million trials. He comes in with a 33-game streak, so 23 is the magic number to tie and 24 is the number to beat DiMaggio:
 N   Trials
-----------
 1   797735   Utley's streak ends tonight in 20.2% of the simulations.
 2   635216
 3   505901
 4   402784
 5   320310
 6   255260
 7   202923   20% chance of reaching 40 games.
 8   161574
 9   128797
10   102361
11    81611   8% chance of reaching Rose and Keeler at 44.
12    65013
13    51736
14    41081
15    32719
16    25957
17    20693   Fifty game streak, 2% chance.
18    16461
19    13142
20    10463
21     8288
22     6537
23     5166   1 in 194 chance of reaching DiMaggio
24     4105   1 in 244 of passing him
25     3302
26     2618
27     2086   60 game streak, .2%. Notice we're losing an order of magnitude every 10 games.
28     1654
29     1295
30     1022
31      836
32      661
33      518
34      413
35      320
36      254
37      206   Seventy game streak, .02%.
38      155
39      122
40      102
41       83
42       67
43       58
44       52
45       46
46       33
47       27
48       19
49       16
50       11
51        9
52        7
53        5
54        4
55        3
56        3
57        1   One in a million of finishing the year with the streak intact, or 90 games.

Special thanks to BP reader Timo Seppa for suggesting the column idea.