My grandfather used to say that in heaven, everyone was 25. He figured that was the perfect age in life. You’re old enough that you’re not a kid any more, but young enough to enjoy everything. Grandpa lived to age 93, and more than six years later, I still miss the guy. This one’s for you, Grandpa.
So what’s the perfect age to be if you’re a baseball player? For a while now, there’s been a small brouhaha going on between those who say that the peak age is 27 and those who hold out for age 29. Now that I’m past both of those landmarks myself, it doesn’t seem like that big a difference, but in a profession where a player might play six years if he’s good, knowing which of those six years will be his best is vital to a team.
The problem with doing this sort of work is that baseball is a logistical nightmare in terms of doing well-controlled research. Players are not selected at random (like I used to teach my research methodology students) and there is a severe bias in who gets to play and who doesn’t. Indeed, we have an entire genre of radio which exists for people to call in and complain when a manager plays the not-so-good guys. Still, the joy of doing research … and yes, there is joy in doing research … is being able to crack some of these issues, despite the fact that they drive you nuts.
As someone who has dealt with more children than I care to mention (and that was before I became a dad), this question of “peak age” struck me as a development question, the same way that I’m often asked questions about whether Junior (no, not Ken Griffey Jr.) is on target with his developmental milestones. But, I wasn’t comfortable with one of the hidden assumptions, one built very deeply into how our culture perceives development, which people tend to make in this line of research. We assume that players develop in a gradual and relatively uniform manner consistent with their age. It’s a one-size-fits-all approach that’s reflected in the other major developmental measurement that’s a common feature in our society, schools.
Kids who are 12 belong in sixth grade and are all roughly at the same point in life, right? Maybe not. Kids develop in different ways and at different rates. Go to any sixth grade classroom, and you’ll see that the idea of uniform development is preposterous. Kids hit puberty at different ages; girls hit puberty before boys, and it’s all on display for you right there in your average sixth grade homeroom. Sure, in the aggregate, kids at 12 years old are “middle school” material. But what about this individual kid?
In education and in child development more generally, if kids aren’t learning or developing as quickly as we’d like them to, it’s not legal to just remove them from the population. But in baseball, that’s exactly what happens. Players who develop quickly are politely invited to be part of the team. Players who don’t develop so well are simply sent packing.
So I propose that we first look at this question of peak age from the other direction. When do players generally become just good enough to become regular players in MLB, and when do they stop being good enough? I took all players who started their careers after 1980 and ended their careers before 2009. Only seasons in which the player had at least 200 plate appearances counted. It left a sample just shy of 1,000 batters (997, to be exact). It’s not a surprise that most players debut sometime between their age-23 and age-27 seasons (using April 1 age). What surprised me was the distribution for when players left the game. Take a look:
There’s a spike at age 27 for players leaving the game, but then after that, the rate of attrition falls for a few years and then spikes again at age 31. Odd.
I took a look at when players left the game as a function of what age they debuted at. I wanted to make sure that these twin spikes weren’t some sort of selection artifact based on debut age. Looking individually at every debut age group, there was a similar pattern. Generally, there were attrition spikes around age 27 or 28, and then again around 31 or 32. And then there was one other spike I noted. The most dangerous year for attrition for a batter is his first year. For example, more than 30 percent of players who debut (i.e. have their first 200-plate appearance season) at the age of 26 don’t have another season in which they get regular reps. So, there seem to be three major winnowing periods in baseball for batters: the first year, age 27, and age 31.
Not shockingly, players who made their debut younger tended to be the guys who stuck around longest and were most likely to clear those three hurdles. They also tended to be better players. Indeed, a quick stroll through the “survival rates” for each of the groups in the study is enlightening. You can read the chart below as “of all of the players who debuted at age 24, X percent of them survived (had another season of 200-plus plate appearances) past their first year, X percent survived past age 27, and X percent survived past age 31.”
                        Survived Past
            -------------------------------------
Debut Age   First Year      Age 27      Age 31
0-22           93.8%         72.5%       47.8%
23             88.1%         64.9%       39.3%
24             81.7%         64.0%       35.5%
25             82.9%         61.7%       33.6%
26             69.1%         55.5%       29.2%
27             72.0%         72.0%       33.3%
28             67.3%           --        33.7%
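The survival percentages above come down to a simple tally per debut-age group. Here’s a minimal sketch of that calculation; the careers in the sample data are invented stand-ins (the real study used 997 batters’ 200-plus PA seasons), and the function name is mine, not the author’s.

```python
# Hedged sketch of the "survived past X" tally. Each career is reduced to
# (debut_age, last_age) for seasons with 200+ plate appearances.
def survival_rates(careers):
    """Return the share of players surviving past their first year,
    past age 27, and past age 31."""
    n = len(careers)
    past_first = sum(1 for debut, last in careers if last > debut) / n
    past_27 = sum(1 for _, last in careers if last > 27) / n
    past_31 = sum(1 for _, last in careers if last > 31) / n
    return past_first, past_27, past_31

# Toy example: four invented careers
sample = [(23, 23), (24, 28), (22, 33), (25, 30)]
print(survival_rates(sample))  # (0.75, 0.75, 0.25)
```

In the article’s chart, this tally is run separately for each debut-age bucket, which is why the rows can be read as conditional survival rates.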
The fact that the players survived past these hurdles says something about their relative quality. Teams do not hang on to 30-year-olds with no skills. But, do members of these groups peak at different times? To find that out, I went into the statistical toolbox and pulled out one of my favorites. Remember, if you don’t like statistical gore, just say “and then a miracle happened” and skip to “the results.”
Warning! Gory Methodological Details!
I used a mixed linear model, with one fixed factor: age. I also used an AR(1) covariance matrix (auto-regressive, first order). This type of covariance matrix comes in very handy in this type of research, because it specifically corrects for the fact that we have several repeated observations for the same player. This is important because there are some players who are in the sample at age 27, but not at age 28 (because they “retired”). The covariance matrix sniffs out the fact that the group still present at 28 was better at age 27 than the retirees and corrects for it when spitting out the relevant output.
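For readers who want to see what an AR(1) structure actually looks like, here’s a small illustration (not the author’s code): with autocorrelation rho, the modeled correlation between two seasons k years apart is rho**k, so adjacent seasons are assumed more alike than distant ones.

```python
import numpy as np

def ar1_corr(n_seasons, rho):
    """Build the n x n AR(1) correlation matrix: entry (i, j) is
    rho ** |i - j|, i.e. correlation decays with the gap between seasons."""
    idx = np.arange(n_seasons)
    return rho ** np.abs(idx[:, None] - idx[None, :])

# Four seasons of one player's repeated observations, rho = 0.5
print(ar1_corr(4, 0.5))
```

The single parameter rho is estimated from the data by the mixed-model routine; the matrix itself is just the assumed shape of within-player correlation.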
My dependent variable was OPS. (Yes, I know I didn’t use your favorite No.1 measure for a player; fire when ready.) The output that comes out the other end can be read “if you took an average player from the sample, and only told me his age, I would expect that his OPS for the year would be X.” Of course, we’d know more than just a player’s age, but the point is to come to some sort of aggregate conclusion.
I split the players in my dataset up, again by debut age and by the last talent-age “hurdle” (first year, 27, 31) they cleared. So, we may have a player who debuted at 24 and made it past 27, but not to 31. If he didn’t clear the “first year” hurdle, then he only played one year, which is, by definition, his best (and worst) year. I found the age at which the model’s predicted OPS was highest. The numbers here are peak ages.
                     Last Hurdle Cleared
            ------------------------------------
Debut Age   First Year      Age 27      Age 31
0-22            24            26          31
23              25            26          30
24              26            27          31
25              25            27          28
26              26            29          29
27              --            27          29
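Reading a peak age off the model is just a matter of finding the age with the highest predicted OPS. A minimal sketch, with invented predictions standing in for the mixed model’s actual output:

```python
# Hypothetical predicted OPS by age for one debut-age/hurdle group.
# The real values come from the mixed model; these are made up.
predicted_ops = {24: .745, 25: .752, 26: .761, 27: .768, 28: .763, 29: .755}

# Peak age = the age whose predicted OPS is highest
peak_age = max(predicted_ops, key=predicted_ops.get)
print(peak_age)  # 27 in this invented example
```

Running this argmax separately for each debut-age/last-hurdle cell is what fills in the table above.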
Players who stay in the league longer have later peaks, roughly around the age of 29 or 30, which is what J.C. Bradbury found using a sample that included players with longer careers (minimum 5,000 plate appearances, which is roughly eight years at 600-ish PA per year). Those players who play only into their early 30s and who comprise the plurality of players in MLB have peaks around age 27. Those who espouse the age-27 model (Mitchel Lichtman being only the most recent) generally use models that are variations on “what’s the most common age to hit the high point?” No wonder they get 27.
Another surprising finding is that good-but-not-great players (those who made it to age 27, but not 31) who debut later tend to peak later. There’s no such thing as one magical age where all forward motion stops. Some guys are late bloomers, and have the same sort of arc as others … they just do it later in life. Those who stick around for a long time, however, show the opposite pattern. In that group, those who debut earlier have later peaks. What to make of this two-trajectory model?
It’s tempting to say that players who come up early on the phenom track are a riskier lot. If they have a long career, they’re likely to have a longer arc of improvement. But, if they have a short to mid-range career, they’ll peak quicker. However, we’ve seen that earlier debut generally heralds a greater chance of a longer career. When they do flame out, it’s generally a bigger fireball, but the chances of a fireball are actually lower. It’s a tradeoff.
The phenom track can be compared to the Brook Jacoby track. (For those who didn’t spend the ’80s in Cleveland, Jacoby was the good, serviceable corner infielder for the Indians who actually made a couple of All-Star teams.) In general, it seems that some players come up in their mid-20s, have a two-three year period where they improve, and then fall back to earth (and out of baseball) by the time they hit age 30. The two-three year period appears to be constant. It’s just a matter of when they bloom. Of course, the problem is that when a player is coming up, we have no way to know which track he will fall on. His debut age does give us some idea, but it’s not a guarantee.
When Bill James originally took up this question, he suggested that players generally peak earlier than is generally thought, and decline more rapidly than is generally thought. He might have inadvertently been picking up on a wrinkle in how people think about the game. The good players do peak around 29, and those are the players about whom we first think. The great unwashed mass of players peak earlier.
The obvious take-home from this study is that method and sample will affect the answer to the question “at what age does a player peak?” I’d argue that this very fact means that the discussion of the one age for player peaks is actually kinda silly. Even beyond the usual cries that “You have to treat everyone as an individual!”, assigning one number to “peak age” vastly oversimplifies the situation. Sure, if we’re playing a probability game of “given no other information than his age, when can we expect this guy’s peak?”, then 27 is the best guess.
But to a team making a multi-million dollar bet on a free agent, it’s also the type of number that has the illusion of being a lot more informative than it really is. There are some concepts that can be reduced to a simple rule of thumb, and while the rule obscures the details, it’s easier to employ than having to sort through the mess of data. I don’t think this is one of those cases. Player development works in a much more complicated way than is generally thought.
Actually, I think we have a lot to learn from studying the margins. People are fascinated by the exceptional cases, but there's so much more to be learned by figuring out why it is that some players just don't make it.
Technical question: did you experiment with any other covariance matrices before settling on AR(1)? Particularly, if we believe that players have performances that vary around some "true" talent level (a nontrivial assumption), a compound symmetric model might be more appropriate.
Just curious. And once again, I really liked this.
So if you think players have underlying skill levels that change substantially over time, an AR(1) structure is pretty reasonable. If you think players have a basically stable underlying skill level and random variability of outcome is the main cause of changes in year-to-year performance, a CS structure might be better. There are lots of other options for covariance structures, some more restrictive than others. For example, an "unspecified" structure lets you fit unique correlations for any year-to-year differences (corr(n,n-1)=.2, corr(n,n-2)=.5, corr(n,n-3)=.1, etc.). This can be useful if you have a large sample size and can afford to estimate a lot of different parameters. (In contrast, the main advantage of AR(1) is that it requires you to estimate only one parameter.)
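The contrast between the two structures is easy to see by construction: compound symmetry assigns the same correlation to every pair of seasons, whereas AR(1) correlation decays as rho**k with the gap k. A quick sketch (illustrative only, not anyone's actual model code):

```python
import numpy as np

def cs_corr(n, rho):
    """Compound-symmetric correlation: rho between every pair of
    distinct seasons, 1.0 on the diagonal."""
    return np.full((n, n), rho) + (1 - rho) * np.eye(n)

M = cs_corr(4, 0.5)
# Under CS, seasons 1 and 4 are exactly as correlated as seasons 1 and 2,
# which is the "stable true talent plus noise" assumption in matrix form.
print(M[0, 1], M[0, 3])  # 0.5 0.5
```

Under AR(1) with the same rho, the seasons-1-and-4 entry would instead be 0.5**3 = 0.125, so the two structures encode genuinely different beliefs about how quickly a player's "true" level drifts.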
One other thing to consider is estimating unique covariance matrices for different categories of player. It's possible that, say, catchers have weaker (or stronger) year-to-year correlations than, say, OFs. Allowing for heterogeneity in your covariance matrices could allow you to pick up on this.
One possibility is the contract/free agency cycle -- or the tendency to write contracts in certain standard units. A high performance player may get a new multiyear (4-5 year) contract. A "survivor" may get a 1 or 2 year contract renewal. And so the outstanding player who has a strong quick start is, in effect, "guaranteed" a multiyear renewal.
Another way to look at this also involves the idea of heterogeneity. In mortality models, we imagine an underlying (unmeasured) trait that distinguishes people by their innate "frailty." Those who are "frail" are going to get beaten down by events -- influenza or other diseases or injuries -- while those who are "strong" will survive these threats. To link my previous paragraph with this one, I would propose that the frail (injury prone, game missers) also are the ones who get one-year renewal contracts while the strong get multi-year contracts. There are some frailty models in actuarial and demographic (mortality) research that might be applicable here.
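The frailty idea in the comment above can be made concrete with a toy simulation: each player draws an unobserved "frailty" that scales his per-season chance of washing out. All of the numbers here (hazard rates, the frailty values, the 20-season cap) are invented purely for illustration.

```python
import random

def simulate_career(base_hazard=0.15, rng=random):
    """Simulate one career: a hidden frailty multiplies the per-season
    attrition hazard. Returns (frailty, seasons survived)."""
    frailty = rng.choice([0.5, 2.0])  # "strong" vs "frail", unobserved
    seasons = 0
    while rng.random() > base_hazard * frailty and seasons < 20:
        seasons += 1
    return frailty, seasons

random.seed(0)
careers = [simulate_career() for _ in range(10000)]
strong = [s for f, s in careers if f == 0.5]
frail = [s for f, s in careers if f == 2.0]
# The frail group washes out much sooner on average, even though frailty
# itself was never observed -- only career length is.
print(sum(strong) / len(strong), sum(frail) / len(frail))
```

The modeling point is that if you fit attrition without a frailty term, the survivors at any age are disproportionately the "strong" type, which is the same selection problem the AR(1) correction is wrestling with in the article.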
I also wonder what impact the 6th year of a career has on long-term career possibilities, as it is after this year that a player gets his 1st crack at free agency, and long-term $ commitment (excepting those arb-eligible cases who have their arb and maybe some FA years "bought out" early in their career, like Longoria). Would Gary Matthews Jr. still be in the majors if he hadn't had that career year in his FA-to-be season?
If I hope to get any point across in this article, it's that this mantra of "peak at 27" is over-played. If you absolutely forced me at gunpoint to say one number, 27 is the best number, but it vastly over-simplifies things. Human development isn't that linear or precise. It's messy, and I think that the bulk of the work is in getting in and cleaning up the mess.
Or is it not enough of an effect to challenge the assumption that only the best of the best start at an early age?
When a player gets called up he is either a prospect or filling in for an injured veteran. From the big league club's perspective they are asking themselves, "Can this player hack it in the major leagues?" If yes, we'll keep giving you PA. If no, pack your bags. Once a player has established his PA are worth the league minimum, the next question for the big club is, "should we offer this player arbitration or non-tender him?" For most of these players (it appears from your findings), even during arbitration years players will continue to be good value for their team. Once a player reaches the free agent market (I think 27 is a good age for that but have no evidence) the same question arises. Is paying this player $x per win during his supposed prime an upgrade over my 21-25 year old cost-controlled player? If he is worth it, he'll have a job; if he isn't, he won't. The last question obviously combines the first two. Is this player who is past his peak more valuable than a 21-23 year old cost-controlled player or a free agent who should be entering his "prime"?
I'm no economist, but I'd love to see how economics play into player peak studies.
Next, I thought about exactly what JDSussman mentions, but I don't think it's an issue. Ultimately, when a player is non-tendered, they are not banned from baseball. They can receive their market rate. If they can receive their market rate, that means that if they are approximately a quad-A player, they can get a minor league deal and still will get a chance to play if they can sneak above replacement level. For players who reach six years of service time, they will face a similar situation, where once they are a free agent they can get paid according to their quality, and if they are above replacement level, they can play.
What is possible is that the investment of playing a player below replacement level in hopes that he will learn something starts to become less and less valuable as he gets closer to six years service time. That could be playing a role, but probably not a very large one like the one we see in the graph above. There could also be team bias factoring in somehow where the team that drafted the player is the only one who thinks he can perform above replacement level, but I don't know if that's much of an issue. It certainly is less of an issue with the latest CBA allowing teams to sign their own free agents.
Basically, I really doubt it's a big issue, even if I think it's possible I'm missing something. I guess the most obvious thing to check is if there are a pair of modes for number of years of service time.
When I was reading the article, I thought of the rookie performance of players like Greg Maddux and Tom Glavine. Clearly they stayed in the major leagues because teams saw the development potential.
It seems to make much more sense to start with draft age and then plot each player's translated stats from that point. With that, you don't have to worry about selection bias and you don't have to worry about guys who simply fall off the map. By using translated minor league stats, a guy who gets demoted to AAA isn't gone at all.
And you defeat most of your selection bias since there are guys who hang around forever in the minors because they aren't good enough to make the majors and are too stubborn to retire.