Like many baseball fans, I was taken aback by Derek Jeter’s declaration of his impending retirement. Add Roy Halladay’s somewhat quieter press conference and Mariano Rivera’s farewell tour, and a trio of players I admired tremendously will soon be out of baseball. It’s a bittersweet fact of the big leagues that just as one generation of transcendent superstars is born (e.g., Mike Trout, Bryce Harper), another trails off gently into retirement.
The question of career length has more than emotional import within the business of baseball. Because many players reach the market at an advanced age, many free agent contracts carry a substantial risk that the player’s useful career will end before his deal does. Even when a player does remain employed late into his 30s, injuries and the relentless decline of aging can make the terminal years of his career unproductive.
We usually view the question of when a given player’s career will end as largely unforeseeable. One twisted swing of the bat or one wrong pitch can cause catastrophic, career-destroying damage. Given Halladay’s magnificent 2011 (arguably the best season of his illustrious career) and his relatively healthy history to that point, who could have foreseen that a string of seemingly innocuous strains would bring about the end of his career?
Because of the randomness of injuries and the overarching shadow of survivor bias, studying career length is a statistically thorny issue. To overcome these obstacles, I’ll employ a special type of regression model called a Cox proportional hazards model. This kind of model is designed to estimate how survival varies as a function of other factors, such as different medical treatments; it’s the kind of tool medical researchers use to understand whether patients on a drug live longer than those given a placebo. The analysis is therefore somewhat macabre, if also (and more importantly) technically accurate.
Indicators and Myths
The first factor I wanted to investigate was how career lengths might differ depending on the dominant position a player fielded. I was inspired to look into this by the collection of heuristics on the subject I have seen repeated in baseball circles: that second basemen and catchers fall off in their early 30s, and that first basemen last longer.
Some methodology, should you be interested or wish to confirm my results: I used Sean Lahman’s player database to extract career lengths for more than 10,000 players, and I limited my sample to careers beginning post-1900. I linked these records to fielding data and extracted the position at which a player had logged the most innings as his dominant position. Then I fit the survival model.
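For readers who want to follow along, here is a minimal Python sketch of the data preparation described above. It is an illustration, not the code behind the results in this article: the rows are invented, the field layout is a hypothetical stand-in rather than the exact Lahman schema, and career length is assumed to be the span from first to last season, inclusive.

```python
from collections import defaultdict

# Hypothetical fielding rows: (player_id, season, position, outs_played).
# These values are invented for illustration only.
fielding_rows = [
    ("playerA", 1995, "SS", 3900),
    ("playerA", 1996, "SS", 4100),
    ("playerA", 2010, "3B", 1200),
    ("playerB", 2001, "C", 2500),
    ("playerB", 2002, "1B", 2600),
    ("playerB", 2003, "1B", 2700),
]


def summarize_careers(rows, first_season=1901):
    """Return {player_id: (career_length_in_seasons, dominant_position)}."""
    outs_by_position = defaultdict(lambda: defaultdict(int))
    seasons = defaultdict(set)
    for player_id, season, position, outs in rows:
        outs_by_position[player_id][position] += outs
        seasons[player_id].add(season)

    careers = {}
    for player_id, played in seasons.items():
        if min(played) < first_season:
            continue  # keep only careers that began after 1900
        totals = outs_by_position[player_id]
        # Dominant position: the one with the most playing time logged.
        dominant = max(totals, key=totals.get)
        # Assumption: career length spans first to last season, inclusive.
        career_length = max(played) - min(played) + 1
        careers[player_id] = (career_length, dominant)
    return careers


careers = summarize_careers(fielding_rows)
```

In this toy input, playerA comes out as a shortstop with a 16-season career and playerB as a first baseman with a three-season career.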
The first thing I examined was how pitchers age relative to position players:
This graph charts what percentage of players (y-axis) are still playing at various points in their career (x-axis). It looks like a flight of stairs because some fraction of players are lost every year, represented by the height of each step. So for instance, we can see that 20-25 percent of players play in only a single season, which is represented by the initial drop from 1 to ~.75. I imagine that this pool of players is composed primarily of roster-filling Triple-A types: replacement-level players who never have much of a chance to make it in the majors.
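A staircase curve of this kind is usually computed with the Kaplan-Meier estimator. The sketch below is one minimal way to do it in plain Python; the career lengths are invented, and treating still-active players as censored observations is an assumption on my part about how such data would be handled.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time with at least one observed career end.

    durations: career lengths in seasons
    observed:  True if the career has ended, False if the player is still
               active (right-censored); this handling is an assumption
    """
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        # Players whose careers lasted at least t seasons are still at risk.
        at_risk = sum(1 for d in durations if d >= t)
        ended = sum(1 for d, o in zip(durations, observed) if d == t and o)
        if ended:
            survival *= 1 - ended / at_risk  # height of this stair step
            curve.append((t, survival))
    return curve


# Invented toy data: eight players, two of them still active.
toy_durations = [1, 1, 2, 3, 3, 5, 8, 8]
toy_observed = [True, True, True, True, False, True, True, False]
km_curve = kaplan_meier(toy_durations, toy_observed)
```

With this toy input, two of the eight players leave after one season, so the first step drops the curve from 1 to 0.75.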
If you’ve wrapped your head around this graph, then it should (hopefully) be obvious that pitchers, represented by the blue line, consistently show lower survival percentages at the same ages than position players (the red line). The model confirms that statistically, pitchers are much more likely to have their careers end prematurely relative to position players. That notion comports well with what we know about pitching injuries. Jeff Zimmerman and Josh Kalk’s excellent work has shown that a pitcher has something like a 40 percent chance of going on the disabled list in a given year, and any one of those DL trips can spell doom (just ask Halladay).
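To give a sense of what the model is doing in the pitcher comparison, here is a bare-bones sketch of a Cox fit with a single covariate (1 for pitchers, 0 for position players). It maximizes the partial likelihood by Newton-Raphson and uses the Breslow treatment of tied career lengths. The function name and data are hypothetical, and a real analysis would more sensibly rely on an established survival analysis package.

```python
import math


def fit_cox_single(durations, observed, x, iterations=50, tol=1e-10):
    """Estimate the log hazard ratio for one covariate (Breslow ties)."""
    beta = 0.0
    for _ in range(iterations):
        gradient = 0.0
        hessian = 0.0
        for t_i, o_i, x_i in zip(durations, observed, x):
            if not o_i:
                continue  # censored careers enter only through risk sets
            s0 = s1 = s2 = 0.0
            for t_j, x_j in zip(durations, x):
                if t_j >= t_i:  # player j was still active at time t_i
                    w = math.exp(beta * x_j)
                    s0 += w
                    s1 += w * x_j
                    s2 += w * x_j * x_j
            mean = s1 / s0
            gradient += x_i - mean
            hessian -= s2 / s0 - mean * mean
        if hessian == 0.0:
            break  # no variation in the covariate, nothing to estimate
        step = gradient / hessian
        beta -= step  # Newton-Raphson update
        if abs(step) < tol:
            break
    return beta


# Invented toy data: five pitchers (x=1) and five position players (x=0).
toy_durations = [1, 2, 3, 5, 8, 2, 4, 6, 9, 12]
toy_observed = [True] * 10
toy_is_pitcher = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

cox_beta = fit_cox_single(toy_durations, toy_observed, toy_is_pitcher)
hazard_ratio = math.exp(cox_beta)
```

Because the toy pitchers tend to leave earlier, the fitted coefficient is positive and the hazard ratio is above one. Note that if every pitcher left before every position player, the estimate would not settle on a finite value, which is one reason to prefer a tested library for real data.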
Pitchers have shorter careers: this much we already knew. Let’s now turn specifically to the position players and look within this heterogeneous mix. For the following analysis, I restricted the sample to the players who played in at least two consecutive seasons, as I think that pool is of broader interest than the type of roster fillers who lasted only a single year.
The Hazard Ratio represents the model’s estimate of how likely a player at a given position is to see his career end at any given point, relative to a baseline. It’s best considered in comparison to a ratio of one, which is (conveniently enough) almost exactly the estimate for corner outfielders. If the Hazard Ratio is above one, the player is more likely to be removed from MLB than a corner outfielder; if it is below one, less likely. I’ve bolded the positions whose hazard ratios deviate most significantly from one. These data are perhaps better understood in the context of a graph, as follows.
Rather than plot nine overlapping and undecipherable lines, I’ve condensed the graph to the five positions with the most interesting patterns. The most significant effect belongs to Designated Hitters (grey), who are almost 40 percent more likely to end their careers in a given time span than corner outfielders. Conversely, the players with the least risk (or the most protection) are two defensive standouts, catchers (blue) and shortstops (purple). In fact, looking over the table, you might notice a concordance between a given position’s hazard ratio and that position’s standing on the defensive spectrum.
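To make a hazard ratio concrete: under the proportional hazards assumption, a group’s survival curve equals the baseline survival curve raised to the power of the hazard ratio. The sketch below applies that relationship. The baseline percentages are invented for illustration, and 1.4 is only a rounded stand-in for the “almost 40 percent” figure quoted above.

```python
def scaled_survival(baseline_survival, hazard_ratio):
    """Apply S_group(t) = S_baseline(t) ** hazard_ratio at each time point."""
    return [s ** hazard_ratio for s in baseline_survival]


# Hypothetical baseline: share of corner outfielders still active at
# successive career checkpoints (invented numbers, not model output).
baseline = [1.0, 0.9, 0.8, 0.65, 0.5]

# Rounded stand-in for a hazard ratio of roughly 1.4.
dh_curve = scaled_survival(baseline, 1.4)
```

A ratio above one pulls the whole curve below the baseline, and a ratio below one lifts it above. In this toy case, a checkpoint at which half of the baseline group remains corresponds to roughly 38 percent of the higher-risk group remaining.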
In some ways, I find the idea that shortstops and catchers are most protected counterintuitive. Both positions are athletically demanding, albeit in different ways, and catching, in particular, is notoriously hard on the body. The flip side of these demanding defensive positions, however, is that players who spend several years at them can always move down the defensive spectrum and still contribute positive value.
We’ve seen this most recently with Joe Mauer’s journey to first base. Even though he’s no longer a catcher, he will be able to contribute via his thundering bat for some years, and that will prolong his career. By the same token, fading shortstops can often be solid third basemen or second basemen for some time after. Conversely, at the other end of the defensive value spectrum, first basemen too unhealthy to play the field have only one position left open to them, designated hitter, and you have to be a pretty solid hitter to stick there. And many designated hitters move to that position because of injury troubles, so it makes sense that they are the players most likely to drop out of MLB.
The Risk of Free Agency
A cool aspect of this statistical model is that I can pick out all of the players who reached a certain service time and then look at their risk only after that point. Since I opened this article by referring specifically to the survival risk inherent in big free agency contracts, I figured I’d re-run the analysis but restrict it to the players who had already accrued enough service time to hit free agency. Generally, special circumstances notwithstanding, this number is six years, and I tacked on an additional year to account for the “cup of coffee” year many prospects get (which doesn’t count for service time, but does in my model). Re-running the analysis, we arrive at this:
By limiting ourselves to only the players who stuck around for seven or more years, we drastically reduce the sample size (from thousands per position to a few hundred for each). Strikingly, in this more select sample, none of the hazard ratios differ significantly from one. With that said, I don’t want to get too hung up on p-values because there are two interesting cases here: second basemen, who are oh-so-close to the magic threshold (.05); and designated hitters, who once again lead the pack in early retirement.
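The restriction to seven-plus seasons can be sketched as a simple filter applied before refitting. The player records below are hypothetical, and keeping the original time scale (rather than restarting the clock at year seven) is my assumption; since every remaining player enters the sample at the same point, the comparison between positions at later career ages is unaffected by that choice.

```python
from collections import Counter

# Hypothetical records: player_id -> (career_length_in_seasons, position).
toy_careers = {
    "playerA": (16, "SS"),
    "playerB": (3, "1B"),
    "playerC": (7, "2B"),
    "playerD": (11, "DH"),
}


def late_career_sample(careers, min_seasons=7):
    """Keep only players whose careers lasted at least min_seasons."""
    return {
        player_id: record
        for player_id, record in careers.items()
        if record[0] >= min_seasons
    }


late = late_career_sample(toy_careers)

# Sample size per position after the cut, to see how much data remains.
players_per_position = Counter(position for _, position in late.values())
```

Counting the survivors per position is a quick way to see how much the sample shrinks before reading too much into any single hazard ratio.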
Second basemen, of course, are notoriously short-lived in the league, or so the received wisdom goes. I went ahead and re-ran the model for years 8+, 9+, and 10+, and in each case, second basemen came out as having significantly higher hazard ratios, in the range of 1.15-1.2 (that is, a 15-20 percent increased risk of leaving MLB relative to the other players). This ratio is considerable, given the huge amount of money recently given to a certain free agent second baseman (I’ll return to this point). As a side note, although my analysis can’t speak to the reasons underlying the accelerated attrition of second basemen, the usual explanation given is related to the difficulty and strain of turning the double play in the presence of sliding baserunners. The designated hitter result never becomes significant, largely owing to the fact that the sample size is so much smaller than the others (since the DH rule came into being in 1973).
The length of time for which players remain useful and employed is of vital importance, not only to the emotional health of fans but also to the economics of the front office. Because of the complexities in predicting career length, drawing sound conclusions on this subject is no trivial task. But with a little help from a different kind of model, I find that players at different positions have substantially different rates of attrition.
Most obviously, pitchers are never a sure thing, a lesson to which we have all grown accustomed. Even among position players, though, interesting patterns emerge: players higher up on the defensive spectrum seem to last longer than the designated hitters and first basemen who contribute less defensive value. Finally, in examining a later-career sample of players, I see a kernel of truth to the old folk tale that second basemen rarely last long.
I think that result is worth dwelling on, especially in the context of Robinson Cano’s enormous 10-year contract. To summarize that episode: In a stunning reversal of their usual roles, the Seattle Mariners poached Cano from the deep-pocketed Yankees with an unexpected $240M offer. At the time, I chalked it up to the Yankees’ newfound desire to keep a slim and agile payroll. That explanation promptly exploded when the Yankees went on their annual spending spree, leaving me confused as to why they’d let Cano walk.
The Yankees supposedly offered a greater annual value but wanted Cano to sign for fewer years. Randy Levine said at the time, “We just don't believe our policy is for players over 30 years old, we don't believe in 10-year contracts.”
It may be that the Yankees dodged a bullet in avoiding Cano’s decade-long contract. With the increased risk the model predicts for aging second basemen, the years and money owed to Cano look even worse than they did at the time. Cano may be the rare second baseman who bucks the trend (see Rogers Hornsby), but for a team already on the hook for considerable risk, the statistics portend a potential disaster.