A great deal of attention, both here at our site and elsewhere, has been focused on the possibility that Derrek Lee will win the Triple Crown. For the most part, the tone of the analytical commentary has been skeptical. Winning the Triple Crown is difficult, the argument goes–just look at how long it’s been since it was last accomplished–and Lee isn’t the player to do it anyway.
I decided to perform a statistical analysis of this question to add a bit more, well, statistical analysis to the debate. But first let me make a quick assertion that will seem to contradict decades’ worth of history: winning the Triple Crown shouldn’t inherently be all that difficult. There’s nothing all that difficult about leading the league in batting average, home runs, or RBI. Of course only one guy gets the opportunity to lead the league, per category and per year, but it isn’t a feat on the order, say, of hitting .400, or hitting in 56 straight games, or hitting 73 home runs.
The trick, of course, is managing to do all three things in the same season. That can indeed be fairly difficult in a league in which one player “dominates” a category–think Wade Boggs and batting average, or Mark McGwire and home runs. But there are few such players in the league today, especially with Barry Bonds on the shelf and Ichiro Suzuki off to a slow start. Most great hitters, in fact, tend to do nearly everything well–that is one of the hallmarks of the modern era. I actually think we’ve been somewhat “unlucky” not to have seen a Triple Crown winner since 1967–if you went back to 1967 and simulated what followed one million times, I think you’d wind up with at least a couple of Triple Crown winners in a typical trial.
So headline number one is that the Triple Crown is a fairly realistic possibility this season. But there’s a second headline too: Lee might not be the guy who is most likely to do it.
Let’s go to the computer tape.
What I did was to figure out the probabilities of various players leading the league in each of the three Triple Crown categories. We’re going to focus on the National League for now–I’ll touch on the American League briefly at the end. The procedure I applied was as follows:
- I took the top 15 players in each league in each category, through last night’s games. The only exception was Clint Barmes, who won’t qualify for the batting title and was excluded from the analysis.
- In order to estimate a player’s productivity for the balance of the season, I took a quick-and-dirty weighted average based on one-fourth of his rate to date on the season, and three-fourths of his preseason PECOTA weighted mean projection. Sean Casey, for example, has hit .322 on the year and had a PECOTA projection of .294; his “true” batting average under this method is assumed to be one-fourth of .322 plus three-fourths of .294, which works out to .301. It’s true that we could have come up with a more creative sort of method–but I don’t expect it would produce substantially different results.
- In terms of playing time, I took a strict linear projection based on the number of at bats to date, and the number of team games remaining.
- The balance of the season was then simulated 10,000 times, and combined with actual, year-to-date results to determine who the league leaders were.
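For readers who want to see the mechanics, here is a minimal Python sketch of the procedure above, applied to batting average only. It is not the code behind the numbers in this article. The function names, the assumption that each remaining at-bat is an independent trial at the blended rate, and the two hitters in the example are all hypothetical. Only the one-fourth/three-fourths blend, the Sean Casey figures, the linear playing-time projection, and the 10,000 trials come from the description above.

```python
import random


def blended_rate(rate_to_date, pecota_rate, weight_to_date=0.25):
    """One-fourth of the season-to-date rate plus three-fourths of the projection."""
    return weight_to_date * rate_to_date + (1 - weight_to_date) * pecota_rate


def projected_remaining_ab(ab_to_date, team_games_played, team_games_remaining):
    """Strict linear projection: same at-bats per team game for the rest of the year."""
    return round(ab_to_date / team_games_played * team_games_remaining)


def simulate_batting_title(players, trials=10_000, seed=0):
    """Estimate each player's chance of finishing with the highest batting average.

    Assumption (not from the article): every remaining at-bat is an independent
    hit/no-hit draw at the player's blended "true" average. Exact ties go to
    whichever player is listed first.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in players}
    for _ in range(trials):
        best_name, best_avg = None, -1.0
        for name, p in players.items():
            future_hits = sum(rng.random() < p["true_avg"] for _ in range(p["ab_left"]))
            final_avg = (p["hits"] + future_hits) / (p["ab"] + p["ab_left"])
            if final_avg > best_avg:
                best_name, best_avg = name, final_avg
        wins[best_name] += 1
    return {name: count / trials for name, count in wins.items()}


# Sean Casey's blend from the text: .322 to date, .294 PECOTA projection.
print(round(blended_rate(0.322, 0.294), 3))  # 0.301

# Two made-up hitters, purely for illustration (not real season lines).
ab_left = projected_remaining_ab(ab_to_date=320, team_games_played=80, team_games_remaining=82)
hypothetical_players = {
    "Hitter A": {"hits": 120, "ab": 320, "ab_left": ab_left, "true_avg": 0.304},
    "Hitter B": {"hits": 108, "ab": 320, "ab_left": ab_left, "true_avg": 0.335},
}
print(simulate_batting_title(hypothetical_players, trials=1_000))
```

The same loop would work for home runs or RBI by swapping in a per-at-bat rate for that category, though RBI in particular depends on lineup context that this sketch ignores.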
Here are the results of that simulation for batting average:
NL Batting Crown Race

| Player | Probability |
|---|---|
| Lee, D | 55.1% |
| Pujols | 36.2% |
| Cabrera | 3.9% |
| Abreu | 1.7% |
| Casey | 1.5% |
| … | |
| Field | 1.8% |
This is strictly a two-man race. Lee has a huge leg up on the field; he could hit “just” .290 or so and still wind up with the batting title. Still, there is some stiff competition in the form of Albert Pujols. We’re estimating that Pujols is going to hit .335 from this point onward, and Lee .304; there’s still plenty enough time in the season for there to be a fight to the finish.
By the way, I don’t think that estimate is too pessimistic for Lee. We’re talking about a 29-year-old who is a career .266 hitter. There has obviously been some substantial and fundamental improvement–but Lee has just as obviously been extremely lucky on top of that.
NL Home Run Race

| Player | Probability |
|---|---|
| Pujols | 32.1% |
| Lee, D | 29.9% |
| Jones | 20.3% |
| Dunn | 18.1% |
| Glaus | 6.1% |
| Lee, C | 3.1% |
| Delgado | 1.1% |
| … | |
| Field | 2.5% |
This contest is a little bit more wide-open, but the same two players stand at the top of the chart. One important thing here is that both Pujols and Lee (as well as Andruw Jones) have tremendous health records; if Troy Glaus gets an owie, or Adam Dunn sits against a tough lefty, that gives the guy who is in the lineup every day a leg up.
NL RBI Race

| Player | Probability |
|---|---|
| Pujols | 56.8% |
| Lee, D | 23.9% |
| Lee, C | 18.8% |
| Delgado | 4.3% |
| Burrell | 1.3% |
| … | |
| Field | 1.0% |
I wasn’t surprised to see Pujols in the lead here; he was the co-RBI leader in the last batch of our PECOTA projections, and is just five RBI behind Carlos Lee right now. But I was surprised to see him as far out in front of the field as he wound up. It helps tremendously that Pujols is hitting behind two excellent OBPs in David Eckstein and Larry Walker; PECOTA’s RBI projections are sensitive to this sort of lineup analysis. Derrek Lee hasn’t been nearly as fortunate–in fact he’s hit with about 20 percent fewer runners on base than Pujols thus far this season. Carlos Lee has actually had a ton of RBI opportunities this year, more even than Pujols, but that’s likely to change once folks like Brady Clark and Jeff Cirillo come back to earth. Most importantly of all, at least until we have another eighteen months or so worth of evidence to the contrary, there’s still every reason to believe that Pujols is a substantially better hitter than either of the Lees.
Putting this all together is another trick. It’s clear that Pujols and Lee are the only viable candidates–a few others like Miguel Cabrera and Carlos Delgado rank in the top fifteen in each of the three categories, but their chances of actually winning any of them, let alone all three of them, are pretty slim. But just what are Pujols and D-Lee’s chances of actually winning the Triple Crown?
One way of estimating this is simply by multiplying together the probabilities of their winning each of the three categories. But the way that I’ve done this analysis, the simulations for each category were done independently, when in fact the actual output in those departments is not independent. A home run, for example, not only helps a player toward the leaderboard in that department, but also produces at least one RBI, as well as a base hit. In layman’s terms, if a hitter gets “hot,” he’s going to benefit in all three categories at once.
So multiplying the probabilities together really represents the Lower Bound for a player’s Triple Crown chances. The Upper Bound, conversely, is represented by a player’s minimum probability in any of the three categories. If Carlos Delgado has a 4.3% chance of leading the league in RBI, then his chances of winning the Triple Crown cannot be greater than 4.3%, no matter how phenomenal his performance in the other departments.
| Player | BA | HR | RBI | Lower Bound | Upper Bound |
|---|---|---|---|---|---|
| D. Lee | 55.1% | 29.9% | 23.9% | 3.9% | 23.9% |
| Pujols | 36.2% | 32.1% | 56.8% | 6.6% | 32.1% |
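As a quick check on the arithmetic, the two bounds can be recomputed from the category probabilities in a few lines of Python. This is only a sketch, and the function name `triple_crown_bounds` is mine. The lower bound is the product of the three probabilities (what you would get if the categories really were independent), and the upper bound is the smallest of the three.

```python
from math import prod


def triple_crown_bounds(p_ba, p_hr, p_rbi):
    """Return (lower, upper): product of the three probabilities, and their minimum."""
    probs = (p_ba, p_hr, p_rbi)
    return prod(probs), min(probs)


# Category probabilities taken from the simulation results above.
chances = {
    "D. Lee": (0.551, 0.299, 0.239),
    "Pujols": (0.362, 0.321, 0.568),
}

for name, probs in chances.items():
    lower, upper = triple_crown_bounds(*probs)
    print(f"{name}: lower {lower:.1%}, upper {upper:.1%}")
# D. Lee: lower 3.9%, upper 23.9%
# Pujols: lower 6.6%, upper 32.1%
```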
Pujols ranks higher than Lee in both Lower and Upper Bound; he’s actually the better Triple Crown candidate.
Those are some pretty broad ranges between the bounds, however, so it would be nice to know just how to interpret them. I decided to tackle this by taking the geometric mean of the two bounds. This produces a slightly more conservative estimate than the traditional, arithmetic mean. Does that make it right? I don’t know, but it “feels” about right to me, and I’m absolutely exhausted from preparing for a cross-town move in an apartment with busted air conditioning, so we’re going with it. Using the geometric mean, we estimate that:
- Lee has about a 10% chance of winning the Triple Crown this year, and
- Pujols has about a 15% chance of winning the Triple Crown.
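Here is the same calculation as a short Python sketch, using the rounded bounds from the table as inputs. The function name is my own. The arithmetic mean is printed alongside to show how much more conservative the geometric mean is when the two bounds are this far apart.

```python
from math import sqrt


def geometric_mean(lower, upper):
    """Geometric mean of two bounds: the square root of their product."""
    return sqrt(lower * upper)


# (lower bound, upper bound) for each player, as listed in the table.
bounds = {
    "D. Lee": (0.039, 0.239),
    "Pujols": (0.066, 0.321),
}

for name, (lower, upper) in bounds.items():
    geo = geometric_mean(lower, upper)
    arith = (lower + upper) / 2
    print(f"{name}: geometric {geo:.0%}, arithmetic {arith:.0%}")
# D. Lee: geometric 10%, arithmetic 14%
# Pujols: geometric 15%, arithmetic 19%
```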
So there could be as much as a 25% chance that a National League player wins the Triple Crown this season. It’s going to make for some exciting stuff, especially for those of us in the Midwest.
A couple of other footnotes.
- There was some talk on the internal BP chatter list about whether D-Lee is more likely to win the Triple Crown, or to hit .400. This one isn’t even close. Assuming that his “true” batting average is .304, Lee hit .400 or better less than once per 10,000 simulations. Even if we increase his “true” batting average to .350, he’s still fighting an uphill battle, and would hit .400 only about one time in 100.
- The “average” league-leading totals in each category are a .347 batting average, 47 home runs, and 135 RBI. So those are the figures that Pujols and Lee will need to shoot for.
Finally, let’s run very briefly through the American League; it turns out the results aren’t nearly as interesting here.
AL Batting Crown Race

| Player | Probability |
|---|---|
| Guerrero | 45.1% |
| Roberts | 26.8% |
| Damon | 11.5% |
| Rodriguez | 5.5% |
| Young | 2.8% |
| Tejada | 2.7% |
| Anderson | 1.4% |
| Ichiro | 1.4% |
| … | |
| Field | 3.0% |
I’ve cheated a bit and included Ichiro in the mix, even though he was a couple of points shy of the top fifteen at night’s end. The best Triple Crown candidate is Alex Rodriguez, but he hasn’t been a huge batting average hitter for a few years now, doesn’t play in a favorable ballpark, and is facing some tough competition in the form of folks like Vladimir Guerrero and Brian Roberts.
AL Home Run Race

| Player | Probability |
|---|---|
| Teixeira | 33.5% |
| Rodriguez | 27.0% |
| Ortiz | 19.2% |
| Ramirez | 16.0% |
| Soriano | 9.3% |
| Tejada | 3.6% |
| Konerko | 3.3% |
| Sexson | 1.9% |
| … | |
| Field | 0.5% |
A-Rod does better here, but again there’s a lot of competition.
AL RBI Race

| Player | Probability |
|---|---|
| Ortiz | 55.3% |
| Ramirez | 27.2% |
| Teixeira | 16.0% |
| Rodriguez | 6.6% |
| Tejada | 1.4% |
| … | |
| Field | 0.4% |
PECOTA really likes the Red Sox guys, and it’s auspicious that the Boston offense has really started to get on track over the past couple of weeks. It might have the Red Sox sluggers in the wrong order, since the preseason lineup analysis assumed that David Ortiz would hit cleanup, benefiting from Manny Ramirez’s high OBP, rather than the other way around. In fact, Terry Francona has settled on Ortiz in the three-hole instead, but that isn’t necessarily much better news for Rodriguez.
| Player | BA | HR | RBI | Lower Bound | Upper Bound |
|---|---|---|---|---|---|
| Rodriguez | 5.5% | 27.0% | 6.6% | 0.1% | 5.5% |
Taking the geometric mean gives Rodriguez Triple Crown chances of about 0.7 percent. Miguel Tejada ranks next, but rates as only about a 1-in-500 longshot.