Having played the first half of his career before the Second World War, Joe DiMaggio is not eligible to be on Albert Pujols’ PECOTA comparables list. However, there’s little doubt that the Yankee Clipper would place high atop the table if he had been born just 10 years later. The similarity scores listed the pair as the best age-based likenesses for one another entering the season, and the events of this year are likely only to enhance the comparison.

DiMaggio won his first batting title and his first MVP award in 1939; at age 24, he was one year older than Pujols’ currently listed age. DiMaggio, unlike Pujols, had been heralded as a top prospect from the time he was a teenager playing in the PCL, and was coming off a fine triplet of seasons in the big leagues. But 1939 was his coming-out party, much like 2003 has been for Pujols.

Conveniently enough, DiMaggio, limited by a foot injury that he suffered in April, played in just 120 games that season, almost exactly the total that Pujols has accumulated up until now. Compare DiMaggio’s ’39 against Pujols’ current campaign, and the similarities are striking:

      G   AB    R    H  2B  3B  HR  RBI  BB  SO  SB  AVG  OBP  SLG  EqA
JD  120  462  108  176  32   6  30  126  52  20   3 .381 .448 .671 .358
AP  122  461  108  171  43   1  34  108  53  46   2 .371 .438 .690 .365

Certainly, there are some differences: Pujols has displayed a bit more power, although it has come in an era in which balls leave the yard more frequently. DiMaggio, legendary for his ability to make contact, had fewer strikeouts. But considering that Pujols plays in a league environment in which strikeouts are nearly twice as common as in Joe D’s day, his record in that department is no less impressive. DiMaggio had a few more RBIs–but hitting immediately behind Red Rolfe (.404 OBP) and Charlie Keller (.447 OBP), he also had a few more opportunities.

DiMaggio had one tool that Pujols lacks, however: speed, which shows up not in his stolen bases (DiMaggio was a 77% basestealer for his career and could have stolen more bags if he’d wanted to) but in his triples and graceful defense in center field. Pujols, on the other hand, has demonstrated a propensity to stay healthy that Joe D never did, something that bodes well for his continued development.

Mostly, though, it’s the similarities that stand out, right down to their uniform numbers (No. 5), their temperaments (introspective, but without being sullen), and their busty female companions.

And then there’s the small matter of The Streak. FIFTY-SIX is one of those sublime numbers that needs no introduction to a baseball fan. Pujols, entering tonight, is at a humdrum 30, a threshold reached by such luminaries as Ken Landreaux and Jerome Walton.

But except for Ichiro Suzuki, there is likely no player more qualified than Pujols to sustain a hitting streak in today’s game. He’s a lifetime .335 hitter who doesn’t strike out or walk very much (in his case, that’s not meant as a criticism). He plays in a good lineup that gets him to the plate fairly often. He makes adjustments quickly, and doesn’t display any particular weakness against any particular type of pitcher:

Against          AVG
vs. RHP         .365
vs. LHP         .391
vs. Power       .376
vs. Finesse     .448
vs. Groundball  .404
vs. Flyball     .406

Another strength: Pujols’ play early in the year, when he continued to hit well despite battling an elbow injury. This suggests that he might be less vulnerable to day-to-day tweaks and bruises than an ordinary player, something important since the Cardinals–locked in a three-way pennant race–are not likely to give him another day off unless the league requires them to.

So what are the odds that he can hit in the next 26 straight and catch Joe D?

The customary way to estimate a player’s likelihood of maintaining a hitting streak is a straightforward application from Probability 101, and can be figured from a player’s games, hits, and at bats. The idea is, first, to determine what a player’s odds are to have at least one hit on any particular day, and second, to extrapolate that calculation over as many days as you’re looking for the streak to stay intact.

Odds of streak being broken = (1 - (H/AB)) ^ (AB/G)

In the equation above, H/AB is, of course, batting average. The probability that a player doesn’t get a hit in a given at-bat is one minus his batting average. Raise that amount to the power representing the number of at-bats that a player is likely to have in a given game, and you come up with the probability that a player goes hitless on the day, his streak broken.

These are Pujols’ odds based on his 2003 numbers to date. The .371 batting average he’s hit for on the season is above his career norm, but below what he’s hit for during the course of the streak itself, so it seems like a reasonable number to use.

Pujols' odds of streak being broken = (1 - (171/461)) ^ (461/122)
                                    = (1 - (.371)) ^ (3.78)
                                    = .174

That comes out to a 17.4% chance that Pujols’ streak is broken on any given day.

Odds of maintaining streak for one day = 1 - ((1 - (H/AB)) ^ (AB/G))
                                       = 1 - .174
                                       = .826

Conversely, Pujols has an 82.6% chance to maintain the streak for one day.

Odds of maintaining streak for n days = (1 - ((1 - (H/AB)) ^ (AB/G))) ^ n
                                      = (.826) ^ 26
                                      = .0070

In this case, of course, we’re primarily concerned with Pujols’ ability to hit for 26 more days consecutively. The odds of that occurring, according to the formula above, are about 0.7%, or 1-in-142. For the sake of reference, those are better odds than Gary Coleman has of winning the California recall, but worse than the chance that you’ll see the Madonna-Missy Elliott Gap Jeans commercial at least twice in any given 15-minute span of television watching. (The odds that Pujols hits in 14 more games consecutively, to match the NL record held by some guy named, help me here, Peter Tulip, are considerably better: about 1-in-14.)
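For readers who would rather check the arithmetic than trust it, here is a minimal Python sketch of the three formulas above. The function and variable names are made up for illustration, and the sketch carries the same assumption as the formulas: every at-bat is treated as an independent trial at the season batting average.

```python
# Sketch of the streak arithmetic above; names are illustrative only.

def hitless_game_prob(hits, at_bats, games):
    """Chance of going hitless in one game, treating each at-bat as independent."""
    batting_avg = hits / at_bats
    ab_per_game = at_bats / games
    return (1 - batting_avg) ** ab_per_game


def streak_prob(hits, at_bats, games, n_games):
    """Chance of at least one hit in each of the next n_games."""
    return (1 - hitless_game_prob(hits, at_bats, games)) ** n_games


# Pujols' 2003 line to date, taken from the table earlier in the piece.
H, AB, G = 171, 461, 122

p_broken = hitless_game_prob(H, AB, G)         # roughly .174
p_catch_dimaggio = streak_prob(H, AB, G, 26)   # 26 more games to reach 56
p_match_nl_record = streak_prob(H, AB, G, 14)  # 14 more games to reach 44

print(f"streak broken on a given day: {p_broken:.3f}")
print(f"26 more games: about 1-in-{1 / p_catch_dimaggio:.0f}")
print(f"14 more games: about 1-in-{1 / p_match_nl_record:.0f}")
```

Because the code carries the unrounded figures through, its results can differ from the hand calculation in the last digit or so.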

Still a long shot, no doubt, but it’s a testament to Pujols’ greatness that the feat is even within the realm of possibility. If he were a .300 hitter, rather than a .370 hitter, he’d face considerably longer odds–about 1-in-2500.

In fact, we might be underestimating Pujols a little bit. He’s averaged fewer than 3.8 at bats per game, but that’s partly because he was sitting on the bench a lot when he was hurt early in the year, appearing only as a pinch hitter, or pulled from the game for a defensive replacement (Pujols’ elbow problem made him a gimpy thrower) as soon as the Cardinals took the lead. Remove the games in which Pujols received fewer than three plate appearances–the minimum that a player in the starting lineup is guaranteed–and his AB/G rises to just above 4.0. His odds of matching DiMaggio’s streak increase correspondingly, to around 1-in-80, or about the odds that you sat through Gigli without walking out.
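The sensitivity described in the last two paragraphs is easy to see by rerunning the same formula with different inputs. The sketch below is only that, a sketch: the 4.0 at-bats per game is a round stand-in for "just above 4.0," and the function name is invented for the example.

```python
# How the 26-game odds move with batting average and at-bats per game.
# Same independence assumption as the formulas in the article.

def streak_prob(batting_avg, ab_per_game, n_games):
    """Chance of at least one hit in each of n_games consecutive games."""
    p_hit_in_game = 1 - (1 - batting_avg) ** ab_per_game
    return p_hit_in_game ** n_games


N_GAMES = 26
PUJOLS_AVG = 171 / 461      # season average to date
SEASON_AB_RATE = 461 / 122  # about 3.78 at-bats per game
STARTER_AB_RATE = 4.0       # assumed round figure for full starts only

p_season = streak_prob(PUJOLS_AVG, SEASON_AB_RATE, N_GAMES)
p_starter = streak_prob(PUJOLS_AVG, STARTER_AB_RATE, N_GAMES)
p_300_hitter = streak_prob(0.300, SEASON_AB_RATE, N_GAMES)

for label, p in [
    ("season AB/G", p_season),
    ("4.0 AB/G", p_starter),
    (".300 hitter", p_300_hitter),
]:
    print(f"{label:>12}: about 1-in-{1 / p:.0f}")
```

A quarter of an at-bat per game moves the odds far more than intuition suggests, because the per-game probability is compounded 26 times.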

Then again, it’s also possible that we’re being too generous. The formulas above assume that the outcome of any one at-bat is independent of every other, but that’s not really the case. Things that even out over the course of a season are not likely to even out in any particular game. If you struck out in your first at-bat because you’re facing Pedro Martinez in a twinight start at Safeco Field, the same conditions are going to drag down your likelihood of getting a hit in your second at-bat too. Moreover, when conditions are unfavorable to hitting, not only are batting averages reduced, but so is the number of at-bats that each player receives, as a team won’t cycle through its lineup as many times: a player trying to maintain a hitting streak is denied extra at-bats on the very days that he is likely to need them most.
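A toy calculation shows which way this effect cuts. Everything in the sketch below is hypothetical: it pretends that half of Pujols’ games come in tough conditions (a .321 hitter getting 3.60 at-bats) and half in easy ones (a .421 hitter getting 3.96), numbers chosen only so that they straddle his actual season figures. It is not an estimate of real game-to-game variation.

```python
# Toy illustration: game-to-game variation in conditions lowers streak odds.
# All day-type numbers are hypothetical, picked to straddle .371 and 3.78.

N_GAMES = 26


def hitless_prob(batting_avg, ab_per_game):
    """Chance of going hitless in a game with the given per-at-bat average."""
    return (1 - batting_avg) ** ab_per_game


# Baseline from the article: every game looks like the season average.
baseline_hitless = hitless_prob(0.371, 3.78)

# (share of games, batting average that day, at-bats that day)
day_types = [
    (0.5, 0.321, 3.60),  # tough pitcher or park, fewer trips to the plate
    (0.5, 0.421, 3.96),  # favorable conditions, more trips to the plate
]
mixed_hitless = sum(
    share * hitless_prob(avg, ab) for share, avg, ab in day_types
)

baseline_streak = (1 - baseline_hitless) ** N_GAMES
mixed_streak = (1 - mixed_hitless) ** N_GAMES

print(f"hitless game, uniform days: {baseline_hitless:.3f}")
print(f"hitless game, mixed days:   {mixed_hitless:.3f}")
print(f"26-game streak, uniform: about 1-in-{1 / baseline_streak:.0f}")
print(f"26-game streak, mixed:   about 1-in-{1 / mixed_streak:.0f}")
```

The bad days hurt more than the good days help, so the mixed schedule produces more hitless games and a longer shot at the streak, even though the overall average is no lower.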

But that’s not really the point. We could jump through more mathematical hurdles to try to refine our estimate, but however you slice it, it’s still too early to say that Pujols has better than a marginal shot at DiMaggio’s hitting streak.

It isn’t too early, however, to say that Pujols embodies much of DiMaggio’s greatness. If the most famous of all baseball records is to fall, Pujols will be a worthy successor.

Thanks to Mark Armour for helping to compile the historical information for this piece.