May 18, 2005
Lies, Damned Lies
Can A-Rod and Pujols Beat Aaron's Record?
Joe Sheehan asked me, prior to his appearance to discuss Barry Bonds' future on ESPN's Outside the Lines last night, if I had any way to estimate the chance that Albert Pujols and Alex Rodriguez will break Hank Aaron's home run record. In other words, a version of Bill James' Favorite Toy, no doubt to be inspired in some large part by PECOTA.
It's worth mentioning something before we proceed further. Though the Favorite Toy is one of James' more popular and accessible inventions, it has not to my knowledge been validated empirically. That is, while it produces some answers that look about right and can spark some lively barroom discussions, we have no way of knowing whether it is accurate. My guess, actually, is that the Favorite Toy tends to overestimate the chance that a certain record will be surpassed, mostly because it doesn't account for the way in which problematic events in a player's career path tend to snowball. For example, the Favorite Toy might estimate that, say, Ivan Rodriguez has a break-even chance of reaching 3,000 hits, based on an assumption that he will play about seven more seasons and average 140 hits per year (which would give him 3,031). The problem is that, if Rodriguez gets only, say, 90 hits in 2007, that likely indicates that something has gone seriously wrong with him (probably an injury), and would radically reduce his projection for future seasons. But if Rodriguez had a good year in 2007 and had, say, 170 hits, it would probably not substantially increase our estimate of his productivity in the years beyond that, as he'd still be on the wrong side of the aging curve.
A potentially more accurate way to go about estimating a player's chances of breaking a certain record is to examine comparable players, which is exactly what PECOTA does. Of course it will require some care to do this properly, but intuitively it seems reasonable that, if we can identify a certain number of similar players, and a certain number of those players ended their careers favorably, then the player in question has about that likelihood of ending his career favorably.
PECOTA uses as many as 100 comparable players in order to form its estimates. For purposes of this exercise, we will restrict things to the top 20 comparables, as listed on Rodriguez's and Pujols' PECOTA cards. Here, for example, are A-Rod's best 20 comparables:
1. Dale Murphy
2. Mike Schmidt
3. Tony Perez
4. Sal Bando
5. Johnny Bench
6. Frank Robinson
7. Dave Winfield
8. Ken Boyer
9. Bobby Bonds
10. Gil Hodges
11. Pedro Guerrero
12. Eddie Murray
13. Doug DeCinces
14. Chet Lemon
15. Vern Stephens
16. Eddie Mathews
17. Dick Allen
18. Reggie Smith
19. Richie Sexson
20. Rocky Colavito

Intuitively, you will recognize right away that comparables like Mike Schmidt and Dave Winfield, players who were productive well into their 30s, are favorable, while others like Pedro Guerrero and Dale Murphy are unfavorable. You may also have some questions about some of the comparables further down the list, and it's worth noting that the PECOTA comparables are motivated mostly by performance during a three-season period, and not over a player's entire career. If the goal of PECOTA were to produce career forecasts, rather than single-season forecasts, I might have done things a bit differently, and that is a limiting factor here.
Let's use Winfield as our example and think about how we might use his career to make some inferences about how Rodriguez is going to perform in the remainder of his. Winfield hit a raw total of 311 homers from age 29 onward. If we combine that number with the 381 that Rodriguez hit through age 28, we come up with 692, a fair bit short of Aaron. But this math underestimates the production implied for Rodriguez for a couple of reasons:
The way to get around these problems is not to compare Winfield's production to Rodriguez's directly, but rather to compare Winfield's production to Winfield's. So for example we might estimate that, based on his career path to date and the run-scoring context that he was performing in, Winfield would hit 32 home runs at age 30. In fact he hit 37, or about 16% more than our prediction for him. The implication is that, if Winfield's career path tells us something about Rodriguez's, then Rodriguez would also hit about 16% more home runs than our estimate for Rodriguez's productivity at age 30.
So if we predicted, based on his previous performance, that Rodriguez would hit 40 home runs at age 30, his implied productivity from Winfield's career is 16% more than 40, or about 46 homers. Notice that this resolves some of the problem from having somewhat weaker players like Doug DeCinces listed as comparables for Rodriguez. PECOTA is not really saying that Rodriguez is going to perform like Doug DeCinces going forward. Rather, it's saying that, if DeCinces tended to outperform a reasonable baseline prediction of DeCinces' production, then he's just similar enough to Rodriguez that we can infer that Rodriguez is also going to outperform his baseline. This is a subtle distinction, but an important one, and one of the backbones of PECOTA.
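The adjustment described above amounts to scaling Rodriguez's baseline by the comparable's actual-to-expected ratio. Here is a minimal Python sketch using only the example figures quoted in the text; the function name `implied_hr` is hypothetical, and this is an illustration of the idea rather than PECOTA's actual code.

```python
def implied_hr(comp_actual, comp_baseline, player_baseline):
    """Scale the player's baseline by the comparable's actual/baseline ratio."""
    ratio = comp_actual / comp_baseline
    return player_baseline * ratio


# Figures from the text: Winfield's baseline at age 30 was 32 HR and he hit 37;
# Rodriguez's baseline at age 30 is taken to be 40 HR.
winfield_ratio = 37 / 32
arod_age_30 = implied_hr(37, 32, 40)

print(f"Winfield beat his own baseline by {winfield_ratio - 1:.0%}")
print(f"Implied Rodriguez HR at age 30: {arod_age_30:.0f}")
```

Note that the comparable's raw home run total never enters the result directly; only his performance relative to his own baseline does, which is why a weaker hitter can still serve as a useful comparable.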
Back to Winfield. The following is the number of home runs that Winfield actually hit in each season from age 29 onward, as well as the number of implied Rodriguez home runs based on the method explained above.
Year   Age   Actual HR   Implied A-Rod HR
1981   29    13          37
1982   30    37          55
1983   31    32          52
1984   32    19          30
1985   33    26          38
1986   34    24          33
1987   35    27          32
1988   36    25          37
1989   37    0           0
1990   38    21          31
1991   39    28          39
1992   40    26          39
1993   41    21          31
1994   42    10          18
1995   43    2           4
Subtotal     311         476
A-Rod Career thru '04    382
Total                    858

From Winfield's career path, we infer that Rodriguez will hit 476 more home runs, giving him 858 total, smashing Aaron's record. Of course, Winfield's late career went very favorably, even compared to other great players. By the way, if the numbers for 1981 look funny, it's for good reason. We're giving Rodriguez credit for the 13 home runs he's hit on the season to date, which is somewhat higher than the rate that we'd otherwise estimated for him. Also, PECOTA adjusts everything upward to a 162-game schedule, so Winfield (and Rodriguez) are not punished for the strike year in 1981. Of course, this may be a dubious assumption, at least until a new CBA is signed.
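The totals in the Winfield table can be checked with a few lines of Python. This sketch simply adds up the implied season figures as listed and uses the career-to-date figure exactly as it appears in the table; the variable names are illustrative.

```python
# Implied A-Rod HR by season (ages 29 through 43), copied from the Winfield table.
implied_by_season = [37, 55, 52, 30, 38, 33, 32, 37, 0, 31, 39, 39, 31, 18, 4]
career_to_date = 382  # "A-Rod Career thru '04" as listed in the table
AARON_RECORD = 755

subtotal = sum(implied_by_season)
total = career_to_date + subtotal

print(f"Implied future HR: {subtotal}")
print(f"Implied career HR: {total}")
print(f"Passes Aaron: {total > AARON_RECORD}")
```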
Here is what we get when we perform the same analysis for each of Rodriguez's top 20 comparables, excluding Richie Sexson, who is obviously pretty far from finishing his career.
Player            Implied A-Rod Career HR
Mike Schmidt      905
Dave Winfield     862
Frank Robinson    831
Tony Perez        776
Eddie Murray      769
Doug DeCinces     671
Ken Boyer         631
Reggie Smith      630
Sal Bando         621
Gil Hodges        613
Johnny Bench      600
Dale Murphy       588
Eddie Mathews     587
Rocky Colavito    571
Dick Allen        570
Bobby Bonds       565
Chet Lemon        536
Pedro Guerrero    531
Vern Stephens     492

Five of the 19 comparables, or 26%, have an implied HR total that surpasses Aaron's mark of 755. So we estimate that Rodriguez has about a one-in-four chance of eventually topping Aaron, or perhaps a bit higher if we want to give extra credit to the players who rank higher on A-Rod's comparables list, as PECOTA does in producing its forecasts.
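The one-in-four estimate is just the unweighted share of comparables whose implied total clears 755. A short Python sketch of that count, using the figures from the table above (the variable names are illustrative, and no rank weighting is applied):

```python
AARON_RECORD = 755

# Implied A-Rod career HR for each comparable, from the table above.
implied_career_hr = {
    "Mike Schmidt": 905, "Dave Winfield": 862, "Frank Robinson": 831,
    "Tony Perez": 776, "Eddie Murray": 769, "Doug DeCinces": 671,
    "Ken Boyer": 631, "Reggie Smith": 630, "Sal Bando": 621,
    "Gil Hodges": 613, "Johnny Bench": 600, "Dale Murphy": 588,
    "Eddie Mathews": 587, "Rocky Colavito": 571, "Dick Allen": 570,
    "Bobby Bonds": 565, "Chet Lemon": 536, "Pedro Guerrero": 531,
    "Vern Stephens": 492,
}

# Comparables whose implied total would surpass Aaron.
beat = [name for name, hr in implied_career_hr.items() if hr > AARON_RECORD]
share = len(beat) / len(implied_career_hr)

print(f"{len(beat)} of {len(implied_career_hr)} comparables pass {AARON_RECORD}")
print(f"Estimated chance: {share:.0%}")
```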
We can use a similar method in order to achieve a "weighted mean" estimate for Rodriguez' HR production in each remaining season of his career (for these purposes, we can put Sexson back in the mix, accounting for his production at age 29 but excluding him from the average thereafter). This won't tell us anything about probability, but it does give us a good over/under number to shoot for:
A-Rod Weighted Mean Projection

Year   Age   Implied HR
2005   29    40
2006   30    39
2007   31    42
2008   32    33
2009   33    29
2010   34    24
2011   35    17
2012   36    15
2013   37    11
2014   38    8
2015   39    5
2016   40    3
2017   41    1
Subtotal     266
Career Total 647

So a good over/under looks to be about 650 home runs for Rodriguez. For what it's worth, I'd take the over, especially considering that players are lasting longer nowadays than they used to.
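For readers curious how a weighted mean of this kind can be computed, here is a rough Python sketch. The input values and the rank-based weights below are made up for illustration, since the actual PECOTA weights and per-comparable season figures are not given in this article. A value of `None` stands for a comparable with no data at that age, as with Sexson beyond age 29.

```python
def weighted_mean(values, weights):
    """Weighted mean over the comparables that have a value (None = no data)."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        raise ValueError("no comparables with data for this season")
    return sum(v * w for v, w in pairs) / total_weight


# Hypothetical implied HR for one season from four comparables, best match first.
implied = [46, 38, None, 30]
# Hypothetical weights favoring closer matches (not PECOTA's real weights).
weights = [4, 3, 2, 1]

print(weighted_mean(implied, weights))
```

Dropping a comparable with no data also drops his weight from the denominator, so the remaining players are averaged among themselves rather than being diluted by a missing value.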
Now, running the same calculations for Pujols:
Player            Implied Pujols Career HR
Hank Aaron        907
Frank Robinson    783
Eddie Murray      761
Carlton Fisk      733
Mark McGwire      657
Jack Clark        557
Jose Canseco      544
John Olerud       515
Rico Carty        508
Rocky Colavito    499
Johnny Bench      466
John Mayberry     464
Will Clark        460
Jeff Burroughs    351
Bob Robertson     263
Ron Blomberg      218

Three of the 16 comparables, or about 19%, pass the 755 threshold, while one other comes very close (again, we're excluding players like Manny Ramirez who have a lot of their careers left to play). Note that the variance on these estimates is a little higher, which is natural since Pujols is at an earlier stage of his career; it is easy to forget just how good Jeff Burroughs was in the early going.
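One rough way to see the wider spread in the Pujols estimates is to compare the two lists of implied career totals directly. The Python sketch below uses the population standard deviation as the measure of spread; that choice is an assumption made here for illustration, not something the PECOTA method specifies.

```python
from statistics import mean, pstdev

# Implied career HR totals from the two comparables tables.
arod = [905, 862, 831, 776, 769, 671, 631, 630, 621, 613,
        600, 588, 587, 571, 570, 565, 536, 531, 492]
pujols = [907, 783, 761, 733, 657, 557, 544, 515, 508, 499,
          466, 464, 460, 351, 263, 218]

for label, values in (("Rodriguez", arod), ("Pujols", pujols)):
    print(f"{label}: n={len(values)}, mean={mean(values):.0f}, "
          f"std dev={pstdev(values):.0f}, range={min(values)}-{max(values)}")
```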
Pujols Weighted Mean Projection

Year   Age   Implied HR
2005   25    38
2006   26    39
2007   27    33
2008   28    33
2009   29    30
2010   30    29
2011   31    29
2012   32    27
2013   33    24
2014   34    26
2015   35    22
2016   36    17
2017   37    17
2018   38    9
2019   39    8
2020   40    6
2021   41    3
2022   42    4
2023   43    2
Subtotal     395
Career Total 555

This is a career path at least something in the spirit of Aaron's, with excellent consistency down the line, which has been a hallmark of Pujols' career to date. But there are also enough comparables whose careers ended unfavorably that Pujols' estimate begins to tail off quite a bit after about age 34, especially as Pujols does not run well and is not an especially good "athlete." This point is underscored by looking at the chart below, which compares Aaron's actual production to the weighted-mean projections for Rodriguez and Pujols.
It is a bit incomplete to say that Aaron's consistency was the reason for his success. More specifically, it was Aaron's consistency late in his career that allowed him to break Ruth's record. Rodriguez has a substantial head start on Aaron at present, and we project that he'll remain ahead of Aaron's pace up through about age 37. However, a sampling of his comparables reveals that A-Rod has a good chance to hit the wall at right about that point, just as Mark McGwire did. Being a productive player at age 37 is in many ways much harder than being a superstar at age 27; that's why Aaron was so remarkable.