Recently, there’s been a decent amount of chatter regarding how baseball players age, and I have to admit that it’s mostly my fault. In a study recently published in the Journal of Sports Sciences, I find that players tend to peak around the age of 29; this finding has been met with resistance from some individuals in the sabermetric community, where 27 has long been considered the age when players peak. Will Carroll and Christina Kahrl graciously asked if I would be willing to defend my findings on Baseball Prospectus. I agreed, and I thank Will and Christina for the opportunity to do so.
Due to the length of the explanation, I have broken the analysis into two parts. Part I explains the empirical problems faced when estimating aging, and examines why past sabermetric studies have failed to properly measure player aging; Part II explains my recent study.
Part I. It Usually Begins with Bill James
For the past few years, I’ve been investigating the financial worth of players for a book that will be coming out next winter. Valuing present performance is difficult enough, but players are normally under team control for several years rather than signing annual arrangements. Soon after I began this project, I realized that to understand player contracts I was going to have to gain a better understanding of how players age. I found previous sabermetric studies to be instructive but deficient for reasons that I’ll explain below. I also asked for guidance from my exercise physiologist colleagues who were familiar with the academic literature on aging and athletic performance. These studies were good, but did not account for several aspects unique to baseball, such as changes in baseball’s competitive environment. Therefore, I decided to conduct my own study.
Of course, my investigation began with Bill James. Like many baseball analysts, my first introduction to sabermetrics was through James. Nearly a decade ago, my colleague Doug Drinen pulled out his prized childhood collection of Bill James Baseball Abstracts for me to peruse, and I was awed by James’s approach. In my mind, this was the economic way of thinking, applied to baseball. I doubt that is what James intended, but I remembered saying to Doug, “if I had stumbled on James when I was 12, it would have changed my life.” Instead of having to wait until freshman economics before I was introduced to the lens through which I was already viewing the world, I could have learned about it through my favorite sport. Of the essays I read, James’s “Looking for the Prime” always stuck with me, because of James’s keen understanding of the difficulty in using baseball performance statistics to measure aging.
The essay is most frequently referenced as a successful challenge to the conventional wisdom that baseball players peak between 28 and 32, with James bluntly stating, “that one truism is blatantly false.” James used his “Value Approximation Method” (VAM) to measure player performance by aggregating individual VAM ratings of players by age. He found that the player-age of 27 had the highest total performance of any age and concluded, “If you must assign a five-year peak period to all players regardless of description, the best shot would be 25 to 29.”
But in hindsight, the evidence for this was actually quite weak. James would eventually abandon VAM because it was, in his own words, “ultimately undermined by the lack of logic behind it” (NBJHA, p. 338). Also, aggregating performance into age buckets is biased by the fact that many players play baseball in their mid-twenties before they wash out; these buckets are full because of the number of people at this age who play, not because those players are so good.
But even while taking the popular notion to task, James was quick to note that current studies of aging, including his own, suffered from a unique problem. When players are deemed to fall below major-league quality, they stop playing; therefore, we can’t measure how their performance changes. He referred to this unobserved hypothetical performance as “white space.” When we look at players who do play to measure aging, we have to keep this selection bias in mind.
James’s decision to aggregate total performance by age was done to avoid a problem that Pete Palmer, another prominent early sabermetrician, had run across in his own study of aging. Palmer had taken average performance of all players by age and found performance to be nearly flat across ages. The reason for this is not that players don’t age, but that good players get to play when they are young and old, while marginal players do not; thus, the remaining good players make the tails look comparatively flat. Ultimately, James acknowledged that his own method didn’t solve the problem, and left it as a question for future analysts to address:
I think that any successful statistical analysis of aging must find, and none of us has yet, some way to deal with “the white space.” In the Palmer study, all the players within the white space are not counted; in the VAM study all players in the white space are counted at zero. Neither is correct (pp. 196 – 197).
Two other methods have also been employed by sabermetricians attempting to quantify aging: the mode method, and the delta method. The mode method identifies the age at which players typically have their best season. While a player’s best season marks his observed peak by definition, it’s inappropriate to assume that the most common age at which players have their best season is the pure product of aging. The mode peak age will likely lie below a player’s expected peak, because there are two main factors that cause players to decline and leave the game: aging, and non-aging-related injuries.
Injuries and age are correlated, but players also suffer performance declines that have nothing to do with age. For example, in 2008 Chien-Ming Wang severely damaged his foot while running the bases, and it’s unlikely that Wang will pitch as well as he did when he was 27. This injury could have struck Wang just as easily at 24 or 36. Because players who wash out of baseball are normally replaced by young players, there will be many more players having their best season at a young age. When we look at the mode, however, we are not differentiating among the causes of deterioration; non-aging attrition gives more players the opportunity to have peak ages earlier rather than later. Studies that employ the mode to measure aging inadvertently pick up this bias, which has nothing to do with aging.
The delta method looks at players in adjacent seasons to see how their performances change to measure aging, but this method suffers from a different type of bias. In order to play in consecutive seasons, a player must normally have a good first season. Players with bad seasons don’t get an opportunity to be in the sample. Thus, many players who had lucky-good seasons will stay in the sample to decline the following year, but lucky-bad players won’t get an opportunity to improve. The lucky-good deterioration will not be canceled out by lucky-bad improvements; therefore, this selection bias (which I call “the survivor effect”) will exaggerate aging.
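The survivor effect is easy to demonstrate with a toy simulation (my own sketch, with invented numbers; none of this comes from the studies discussed). Every player below has identical, constant true talent, so the true aging effect is exactly zero, yet the delta method reports a steep decline because only lucky-good first seasons earn a second look:

```python
import random

random.seed(0)

# Invented setup: true talent is identical for everyone and does not
# change between seasons, so any measured "aging" is pure artifact.
TALENT = 0.0
N = 100_000

deltas = []
for _ in range(N):
    year1 = TALENT + random.gauss(0, 1)  # observed = talent + luck
    year2 = TALENT + random.gauss(0, 1)  # fresh, independent luck
    # Survivor effect: only players whose first season looked good
    # get a second season and thus enter the delta sample.
    if year1 > 0.5:
        deltas.append(year2 - year1)

avg_delta = sum(deltas) / len(deltas)
print(f"average change among survivors: {avg_delta:.2f}")  # well below zero
```

Lucky-good players regress toward their true talent, while the lucky-bad players who would have regressed upward never reappear in the sample to cancel them out.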
Recently, Mitchel Lichtman used a modified delta-method approach to quantify aging that attempts to correct for the survivor effect. His solution was to include players who would typically be dropped from the sample (i.e., players who do not play in consecutive years) by assigning them hypothetical performances for the following year. This sounds good in theory, but where does this hypothetical performance come from? Lichtman explains:
The projection is their last three years lwts per 500 PA, weighted by year (3/4/5) added to 500 PA of league average lwts for that age minus five. In other words, I am regressing their last three years lwts (weighted) toward five runs lower than a league average player for that age.
While Lichtman believes using five runs below average generates a “conservative” projection, the substitution is just a guess informed by nothing more than a hunch. In this case, the guess imposes the outcome for the exact factor we are trying to measure: the estimated decline is a pure product of the assumption. Thus, it is no surprise that Lichtman’s adjusted delta-method estimates yield results that differ little from his raw delta-method estimates.
The good news is that we don’t have to guess what players might have done; instead, we can look at how players who continue to play over several years age, and not rely on snapshots of one-time annual changes. Such a sample has two advantages over the delta method. First, the fluctuations in player performance due to random noise, not aging, will be smoothed out over time to generate a trend. Second, and more importantly, it allows us to track individual players over time to see how each player progresses according to his own unique baseline.
Doing this analysis requires using a multiple regression analysis technique that allows us to observe how a cross-section of players change over time, while controlling for other potential factors that affect performance but have nothing to do with age. Part II describes the study I conducted using this method and discusses its results.
Part II. Looking for the Prime in a New Way
In Part I, I identified problems with previous studies of aging that motivated me to conduct a new study on the subject; here, I explain the method in general terms. If you are curious about the technical details, they are available in my paper.
I used 86 years of baseball performance data from 1921 through 2006 to produce a large sample for generating estimates. The historical sample also allowed me to compare how player aging may have changed over several decades.
In order to see how players improve and decline with age, it’s necessary to use players with sufficiently long careers to quantify age changes. I included players who played at least ten seasons, with a minimum of 5,000 career plate appearances for hitters and 4,000 career batters faced for pitchers. In each season included in the sample, a hitter must have at least 300 plate appearances and a pitcher at least 200 batters faced. Furthermore, because when players begin and end their careers is not random (good players tend to start earlier and end later than inferior players), I only looked at player performances from the ages of 24 to 35, even though careers extend beyond this range.
A potential problem with the sample is that the restrictions that allow players to be tracked over a long period of time mean that I am using a cohort of good baseball players. If good players age differently from bad players, then the results might not be applicable beyond this group. However, as I detail below, aging does not appear to be correlated with performance and the results do not change when I lessen the restrictions for inclusion in the study.
The Estimation Method
I used a multiple regression technique specifically designed to address common problems that arise when estimating relationships across many different units over time. Multiple regression analysis allows me to estimate the impact of individual factors while holding other potential contributing factors constant. For example, the performance of a player will be affected not just by his age, but by his natural ability. By including a player’s average career performance in the estimation model, I was able to hold this factor constant, so that the effects of age and natural ability could be determined independently. In addition, it’s important to adjust for the changing playing environments in which players play. I controlled for the influence of home parks by adjusting each player’s performance using park effects. Accounting for changes across eras is a bit more difficult; for example, a hitter who began his career in the late-1980s, when offense was low, and played into the early-2000s, when it was high, may appear not to have aged much at all in raw statistics. Therefore, I converted each player’s performance into a z-score, measuring how many standard deviations a player’s performance is from the league average that year. This way, as the run environment changes over a player’s career, his performance will be measured relative to his peers so as to reveal changes from aging.
I used the data to look at several different aspects of player performance from the general to the specific. Overall, I found that both hitters and pitchers peaked around age 29. However, some skills peaked earlier and others peaked later.
Peak Age by Skill

Hitters                    Pitchers
Metric          Peak Age   Metric          Peak Age
Linear Weights  29.4       ERA             29.2
OBP             30.0       Strikeout Rate  23.6
SLG             28.6       Walk Rate       32.5
AVG             28.4       Home Run Rate   27.4
Walk Rate       32.3
2B+3B Rate      28.3
Home Run Rate   29.9
The table reveals that player skills peak at different times, often quite far apart from each other. Hitters peak in batting and slugging average at 28 while continuing to improve in their home-run hitting and walking abilities until 30 and 32, respectively. Home runs peaking later than doubles and triples indicates that foot-speed on the basepaths fades before hitting power. In addition, batters may be using veteran knowledge to better manage the strike zone (or possibly becoming friendlier with umpires) to walk more and hit with power as they age. Pitcher strikeout ability peaks around 24, while walk prevention peaks nine years later. Again, veteran know-how appears to be playing a role in improving performance to compensate for diminishing physical skills. This is consistent with something that exercise physiologists have documented among golfers, who hit more fairways as driving distance begins to fade. It’s also been observed that athletic feats that involve quick bursts of speed and strength peak earlier than skills that rely more on endurance and knowledge.
The length of the historical sample also allowed me to observe how peak performance has changed over time. The graph below maps the peak ages for hitters and pitchers by decade of birth:
For hitters, the lowest and highest peaks occur among the oldest and newest cohorts, and there is a slight upward trend in peak age; however, the trend is not continuously increasing. For pitchers, there is no trend in peak ages. Overall, there is some evidence that peak ages may have risen for hitters, but it’s not clear that the higher peak age in the present is different from random fluctuation. While it might be surprising that the dramatic improvements in wealth and health over the past century have not increased baseball players’ peaks, this is actually consistent with evidence from other sports. World records consistently fall in sports, but the age at which records are typically recorded has remained constant. While athletes are improving relative to the past, they continue to reach their highest potential at the same age.
Revisiting Sample Choice
The most frequent critique of my study has been that the results are derived from a sample of good players. That is certainly a possibility, but the way in which the regression technique estimates the aging function means that the estimates should not be affected much by the restricted sample. Whether good players decline early in their 20s or in their mid-30s, the regression will identify this. For example, look at the wide disparity in peaks between strikeout and walk abilities for pitchers.
That a long playing career is required for inclusion in the sample doesn’t mean that we cannot investigate the correlation between playing ability and aging. If playing ability is positively correlated with aging, then elite players should age differently than good players, just as good players will age better than bad ones. I compared how Hall of Fame players aged relative to the entire sample, but I found very little difference. If anything, Hall of Fame players peaked a little earlier than other players. Another source of information is to examine how player ability correlates with another product of aging, longevity. A recent study by sociologists Jarron Saint Onge, Richard Rogers, and Patrick Krueger found that though major-league baseball players have life expectancies five years higher than the average American male, better players don’t live longer than inferior players.
As a final test, I estimated the model using a larger sample of hitters who only had 1,000 career plate appearances. This had very little effect on the estimated aging function. The diagram below maps the aging functions for the original and less-restrictive samples, with the improving and declining performance measured in standard deviations below the peak. The estimated peak ages in both samples are virtually identical, and the rises and declines are similar. Thus, it appears that the sample restrictions are not biasing the estimates of peak age upwards.
I began my investigation into how baseball players age in order to address some potential problems with past studies. It turns out that, after correcting for those flaws, the peak age of baseball players appears to be around 29, and possibly 30 for hitters in modern times. Of course, some players will peak earlier and others later, but this is a general benchmark.
I find it interesting that despite the unwavering pronouncement about peak age with which his essay opened, James’s tone was tempered in his general conclusion:
Good hitters stay around, weak hitters don’t. Most players are declining by age 30; all players are declining by age 33. There are differences in rates of decline, but those differences are far less significant for the assessment of future value than are the differing levels of ability (James, 1982, p. 205).
And that’s probably about as technical as we need to get.
J.C. Bradbury is an economist and associate professor at Kennesaw State University. He runs the website Sabernomics.com and is the author of The Baseball Economist. He can be reached here.
I guess my main question from this research is about the extended period over which a player is at his most valuable, as opposed to just his single peak age. If the general benchmark is 29, then is the range of top performance 27-31? 26-32?
It could be that, or could it be that the longer a player is in the game, the more leeway he gets from umpires on close pitches?
With how far away from the other skills this is, I'd think it has to be something more significant than umpire favoritism. Such as skill improvement.
Some players choose to take the walks (Sammy Sosa). Others (Garret Anderson) choose not to.
You just can't select the sample based on "long survivorship."
The former is unlikely to be an issue; these are all elite players on any absolute scale, and the differences among them are unlikely to affect the shape of the career arcs due solely to aging. Independent research confirms this.
I'm much more worried about the latter potential bias. I understand that the point of the study was to identify when players peak when aging is the only issue, but in real life it isn't the only issue, as the Wang example makes clear. Part of why "players peak at 27" may be because they get hurt, and are never quite as good afterward even if they aren't washed out. Players who missed significant time mid-career are eliminated from your sample -- unless I misunderstood?
I would even assert that, if you want to understand contract value, there's no point in distinguishing loss of value due to aging from loss of value due to non-career-ending injury. Unless, that is, you also have a model that is good at predicting differential injury-related risk for different players.
At any rate, an excellent writeup of some VERY interesting research. Now to dig into the technical details of your longitudinal regression methodology...
Dave: I agree with you that this study appears to focus solely on aging, while teams may be more interested in overall attrition, whether through aging or injuries. If all players age the same (say 29 is peak), and have, say, a 10% chance of career-ending or degrading injury every year, then it would lower the mode peak age. You might peak at 29 without injury, but have only a 75% chance of making it, for example. If you could find the rate of "freak" injury, and correlate it with this aging chart, you might find that 27 is a mode peak age. Or maybe not, anyone have some free time?
Thanks for the article, engaging piece.
The motivation of this study is to understand player contracts. If I have a 24 year old player, I don't know if he'll still be playing when he's 35. If I did, that would be very helpful information.
To give an extreme example...say I have a 21 year old pitcher with an ERA of 3.09. I want to know if locking him up for his age-44 season is worthwhile. In general, the answer is no because pitchers - injuries or not - just aren't good when they're 44. But what if I do an empirical study containing only pitchers who pitched when they were 21-44? Well, now I'm looking at Nolan Ryan and it looks like a great idea. But I don't know that I have Nolan Ryan, because I don't know that my current 21 year old pitcher will be pitching when he's 44. Knowing that would tell me something about his ability and, more importantly, the change in his ability over time.
It's perfectly plausible that on average (say) 35-yr-olds preserve 80% of their EqA from when they were 29, and that this is essentially independent of what that age-29 value was. If you're looking at a 30-yr-old Mickey Mantle, that gives you an idea of how good he will still be at 35. If you're looking at a 30-yr-old Kevin Young, it gives you an idea how long he will have been out of baseball when he's 35.
(I gave that example in terms of percentages, but it could just as easily be how much VORP a player loses per year after his peak, on an absolute scale.)
If you look at his study, his curve predicts that players at age 29 should have a higher OPS than they did at age 27. They don't. I've checked. (And I controlled for the change in league averages, as Bradbury suggested to me in a rebuttal.)
He doesn't find that conclusion persuasive enough to change his model. But given his assumptions, I don't know that it makes sense to take his aging curve and apply it prospectively to players outside of his sampling criteria. And his sampling criteria pretty much rule out his aging curve for most players we would have an interest in projecting.
Incidentally, I wonder what result J.C. would have found if he had included minor-league statistics and translated them to the majors. Probably that players peak at 27.
It seems to me that Bradbury has conditioned on a player playing through age 34 whereas you don't seem to be assuming this condition.
This just means that you have to be careful when trying to apply his results for purposes of prediction. It also means that there is still a lot of work to be done if we are to properly understand the role aging plays in performance for baseball players.
I don't see the practical utility of an aging curve between ages 24 and 34 that requires me to know that the players who fit the curve played through age 34 - by that point I can just look at what they actually did.
Nate Silver had a great article a while back about forecasting and aging where he describes some hypothetical aging curves for MLB players. Well, because of the way Bradbury designed this study, he's a lot more likely to capture the career trajectory of a guy like Nate's Lenny Latebloomer than he is someone like Eddie Early, who is likely out of the league well before he gets to the late ages implied by Nate's graphs.
If I understand this correctly, PECOTA allowed for an unlimited variety of aging curves, and the probability distributions of the projections themselves were asymmetrical.
While from the article that Colin cites it would appear that "on average" Nate accepted the idea of an age 26-27 average peak, the method of projections used by PECOTA didn't depend on whether or when there was such a peak. The historical performance of the comparables selected from the database of all player-seasons since 1946 provided a wide range of "futures" for making projections for each player.
This method also implicitly allowed for the likelihood of injuries that beset the "comparables." (I don't know what it did if the comparables retired. Hence, I don't know how susceptible PECOTA might be to the type of selection effect that JC is being questioned about.)
I think PECOTA's occasional larger mistakes had to do with two factors: inappropriate MLE's based on minor league data, and inappropriate park factors. These types of error were not intrinsic to the "similar players" method but rather to how the performance data were massaged.
Thus, while I think this hunt for the right shape of the aging curve is worthwhile as an exercise, I'm not sure that looking for the "average" is as important as looking for the "variance" or different patterns of aging -- not just skill by skill, but for players of different "type" as PECOTA characterized them (by position, handedness, body type, speed).
The test began by taking a sample of one group of players at ages 27 in 2006 and 29 in 2008 and compared their OPS. You reported that the change was negative; therefore, "What we see does not seem to support Bradbury's conclusion." I then pointed out that when you made a minor correction for the decline in offense from 2006 to 2008 that the players all improved almost exactly as my model predicts. You also used a sample of players from 1997--2008 to see how their performance changes from 27 to 29 and found an OPS decline of 0.006. I showed that league offense declined during this period. Upon learning this, you then abandoned the specific 2006/2008 example and focused on normalizing the larger sample that included additional years. This sample showed a decline of 0.007. I'm a bit unsure why after controlling for the declining offense the change became more negative, but I suspect it has something to do with using another sample.
What we have here is the result of cherry-picking. You have taken a few data points out of a larger sample and found a relationship that appears to be incongruent with the whole. Such incongruencies can be found in any sample of data, without damning the overall result. Let me show you in a picture.
Here we have an aging function that maps a sample of data (it's just a collection of semi-random points to illustrate) for a large population of players. The points are average performances by age for the players, and they deviate from the estimated function, both above and below. As the dotted line shows, it's possible for there to be a negative relationship between 27 and 29, with 29 still being the peak of the aging function. If I picked 29 and 32, it would appear that players were peaking after the estimated peak. There are many of these potential relationships that can be found between any two points, and none should be used for evaluating the overall fit of the model.
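A small numerical sketch makes this concrete (the data points are invented for illustration, not drawn from the study). Fit a quadratic aging function by least squares and read the peak off the vertex: even with a deliberate dip from age 27 to age 29 in the raw points, the fitted curve still peaks near 29.

```python
# Fit y = a*t^2 + b*t + c by least squares on invented data, where the
# true underlying curve peaks at 29 but the points at ages 27 and 29
# have been nudged so the raw 27->29 change is negative.
ages = list(range(24, 36))
perf = [0.75, 0.84, 0.91, 0.99, 0.99, 0.97, 0.99, 0.96, 0.91, 0.84, 0.75, 0.64]

m = sum(ages) / len(ages)   # centering age simplifies the algebra:
t = [x - m for x in ages]   # sum(t) and sum(t**3) are zero by symmetry
n = len(t)
St2 = sum(v * v for v in t)
St4 = sum(v ** 4 for v in t)
Sy = sum(perf)
Sty = sum(v * y for v, y in zip(t, perf))
St2y = sum(v * v * y for v, y in zip(t, perf))

b = Sty / St2                                       # linear coefficient
a = (n * St2y - St2 * Sy) / (n * St4 - St2 * St2)   # quadratic coefficient

peak_age = m - b / (2 * a)   # vertex of the fitted parabola
print(f"observed change 27->29: {perf[5] - perf[3]:+.2f}")
print(f"estimated peak age: {peak_age:.1f}")
```

The two cherry-picked ages move against the curve, yet the estimated peak barely budges, because the fit is disciplined by every point in the sample.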
Furthermore, while you focus on predicting outside the function, I don't think you understand what the regression estimate is. The aging function is "picked" by the procedure to minimize prediction error. It is a prediction, designed to be the "best" predictor. And it draws on information from a much larger sample while controlling for many other factors not included in your analysis.
PS -- I made some attempts to use html, but I'm not sure if they will work, so I included separate links as well.
Sorry, but I keep getting lost in the paragraph about "predicting outside the function". Never have a Smidgen (heavy sugar bomb) before reading an article like this, though, I tried to counteract it with a stiff espresso.
Logically, just as many of you, I had trouble with JC's original sample. However, if JC's conclusion held up even when he changed the pool of players to those requiring 1000 career ABs and the results were the same, then I don't see why it doesn't have clearer predictive value.
How significant is the decline Colin found from 27-year-olds to 29-year-olds? Is the take home here that players reach a plateau rather than a peak?
Does the decline that Colin finds come from players getting hurt, then becoming less productive afterwards? Isn't that a valid form of decline?
Perhaps, the confusion between decline and aging is the disconnect here. Perhaps, our old understanding is correct in that players overall decline after 27. Perhaps, as long as a player stays healthy, he will most likely peak at 29 or 30. Is that useful information?
It's the latter.
The 5000 at bat requirement would eliminate those Ben Grieve types who were certainly good enough to expect a long career, but it didn't materialize. Those players aren't noise. They are valid data points.
Career killing injuries shouldn't be ignored either.
Perhaps, what needs to be done is to look at a sample of players who were, at least, at a certain level at a certain age and see how all of them age - whether they make it to 5,000 at bats or not.
If you want to determine the effects of aging on health in the general population, you can look at every adult who lived to age 85 to see how they aged and what maladies they suffered. But then you've excluded a very large portion of the sample, because we don't all live to age 85. The ultimate 'aging' effect has already taken its toll before then.
Same in baseball. By reducing the sample to those who are still in the game at age 35, and started by age 24, we've reduced the sample to players who will be eligible for the Hall of Fame. The rest have retired due to injury, or been forced out because they can't perform, etc. So, while it might be useful to understand the aging effects in potential Hall of Fame eligible players for some analyses, it may not be useful in other analyses (including the general profile of a ballplayer).
One way to expand the information in order to determine whether there are 'survivor' biases would be to re-run the analysis on different populations and see how it differs. For instance, what if we look at players who were playing at Age 24 through Age 31, Age 32, Age 33. Do the peaks of the different skills shift significantly?
I think it's reasonable to expect that they will.
Doing so does not defeat the purpose of the study as long as it is properly understood what we're being told, which is that of all players good enough to play for ten years (and there are a lot in there who really are not good) the general aging curve peaks at around 29.
This data can then be used for further studies, which might yield the answer to the question: "should we sign this pitcher through age 44?"
I don't know if JC is going to respond to comments, but I've suggested before using a selection model. Is there a reason you're not doing this? Estimate separately the probability that someone will be playing next year (using, ideally, some type of "shock" for identification...position quality? Strength of team prospects at that position?) and control for that probability. This should get rid of selection bias if done correctly. I'm not an expert at selection models so maybe someone else could jump in here and help out...
It just seems like everyone's trying to do weird stuff to account for selection when they need to attack the problem head-on. You want an "experiment" that makes sense. Say you could find a group of players that you knew had a 100% chance of playing next year because they had compromising pics of their GM (these pics were handed out randomly). Well, you wouldn't even need anyone else in your sample - you would just use this group to get at the core issue here. Of course, this group doesn't exist. But say you could estimate that some group has a 60% chance of playing next year because their team's farm system isn't great. And another group has a 40% chance. Comparing the results from the 2 groups accounts for "selection" and you can figure out the true effect then. That's the underlying experiment. You just need to find some "shock" to Pr(playing next year) that is unrelated to this year's performance.
My study is imperfect, as making sample choices always involves tradeoffs. I went with what I thought was best, and I openly acknowledged the downside as well as the upside. Furthermore, as I stated above, when I relax the sample-inclusion requirements considerably, the results do not change. When I look at HOF players versus the entire sample, they don't appear to age much differently.
I would welcome anyone to study aging in baseball through new methods. As I have highlighted above, the methods previously employed to measure aging have some serious problems, and I think that my study handles these issues better. I don't intend for this study to be the last word on the subject, and I would be delighted to see further studies that employ advanced empirical methods designed to handle the relevant issues.
Hitters born in 1980 are not shown. If steroids increased the peak age for hitters, I would expect to see it in hitters born in 1980. Is this data available?
For all intents and purposes, your intercept is at 24.0 years of age. Anything less than 24.0 is out of bounds.
If your model is correct, there's no problem at all evaluating the estimated curve outside the sample range. Of course, the quadratic model is an approximation, and it's possible to argue that it's not valid at the extremes, like age 23.6. But that's hard to justify when it's so close to the sample: agreeing that the model is OK at 24.0 but arguing that at 23.6 it's just too far off, at least not without a serious argument about why those 0.4 years make a big difference.
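To make the "it's so close to the sample" argument concrete, here is a toy fit (the curve shape, noise level, and coefficients are invented, not JC's data): fit a quadratic on ages 24-35, then compare the fitted value at the sample edge, 24.0, with the value at 23.6.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" aging curve peaking at 29, observed with noise
# over the sample range 24-35 (all numbers made up for illustration).
ages = np.repeat(np.arange(24, 36), 50).astype(float)
perf = -0.5 * (ages - 29.0) ** 2 + 100.0 + rng.normal(0, 2.0, ages.size)

b2, b1, b0 = np.polyfit(ages, perf, 2)     # fitted quadratic coefficients
f = lambda a: b2 * a * a + b1 * a + b0

# Evaluating 0.4 years outside the sample barely moves the estimate:
print(f"f(24.0) = {f(24.0):.2f}  (edge of sample)")
print(f"f(23.6) = {f(23.6):.2f}  (4% beyond the sample range)")
```

The gap between the two values is on the order of the curve's own year-to-year change, which is the point: nothing dramatic happens 0.4 years past the boundary if the quadratic is a decent approximation there.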
"You should refuse to consider values of the function even 4% outside your sample domain" is like, "you should wear a helmet even if you're only walking to the backyard in case a meteor hits." I think you have to use a bit of common sense here.
"Extrapolation beyond the original range of X values is hazardous and should be avoided. In other words, one should not engage in predictions for values of the independent variable that are outside the range used in the study." (cf. Pedhauzer).
My original comment above did not question whether the extrapolation is reasonable or likely to give a wrong answer in this case, but instead whether it's standard practice. It's not.
I disagree with the "standard practice." I think in this case what JC did is not hazardous and need not be avoided.
BTW, when CAN you extrapolate? I don't get it ... is this specific to curvilinear regression, or any best fit line?
Should I just stop worrying about global warming because the future is outside the data set?
Or, if my flesh-eating disease seems to be spreading from my toes to my ankles to my knees, should I avoid worrying that it's going to spread to the rest of me?
Snarkiness not meant for the commenters here, but to the textbook that preaches blind adherence to a rule of thumb.
Fundamentally, what JC is doing is descriptive -- estimating the functional relationship between age and (average) performance over a certain age range. If he had been estimating a survival function, based on all players who were on ML rosters at age 24, he'd still be constrained to those who are 24.0 at the start of the survival function.
If he were really interested in what happens from earlier ages, say age 20 or 21, there are data out there to do that. He wouldn't need to use data from 24-35 year-olds.
JC's not engaging in forecasting or projecting into the future or retrojecting into the past or to older or younger ages. He's describing a functional relationship that he observes between two end points. (And as some have noted in the comments here, he's not necessarily describing the performance of all players who were in the ML at any time from age 24 to 35, but only a highly selected subset who survived the entire interval.)
Sure, we know from typical survival analysis, for example, that the probability that somebody will reach age 60 is a lot higher if they've already reached age 50 than if they're still only age 20. But that's been established based on descriptive analysis of a much larger data set of those who were alive at age 20 and then survived or died between age 20 and 60, not one based only on those who survived to age 50. True, at age 60 I may feel my chances of surviving to 80 are pretty good; but if I looked at a life-table based on observation of actual populations past and present, I'd be able to estimate that the odds are just about 50-50, not just "pretty good."
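The life-table arithmetic here is simple enough to spell out. The numbers below are invented, chosen only to illustrate the "about 50-50" point: conditional survival is computed from the full cohort's table, not from a sample restricted to those who made it.

```python
# Toy life table: l[x] = number (per 100,000 births) still alive at age x.
# These values are made up for illustration, not real actuarial data.
l = {20: 98_000, 50: 92_000, 60: 85_000, 80: 42_000}

# Unconditional: probability a member of the birth cohort reaches 80.
print(f"P(reach 80)            = {l[80] / 100_000:.2f}")

# Conditional on having reached 60 -- still derived from the whole
# cohort's table, which is why it carries no survivor bias.
print(f"P(reach 80 | alive 60) = {l[80] / l[60]:.2f}")
```

The conditional probability is just a ratio of two counts from the unrestricted table, which is exactly the design JC's 10-year sample filter does not have.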
As for your forecasts or conjectures or your concern about flesh-eating diseases, sure, make them. Use good sense. Draw on experience. But that's not what JC is purporting to do when he's describing a relationship based on actual observations (data) within a specific age range. He's doing descriptive statistics about behavior within a specific age range. And the standard practice in that type of analysis is not to extrapolate -- especially in this case, since data beyond the 24-35 boundary actually exist.
Thanks for posting this.
The better aging model would seem to be the one that has predictive value. If what Colin W. says above is true (that Bradbury's system is decidedly inferior at making predictions), then in what situations would it be useful? This doesn't seem like the system you'd want to use to evaluate contracts, and that's what Bradbury was intending to do.
All that said, let's give Bradbury some credit for writing up his work very clearly and presenting it here.
Let's say we found that the most common peak age is 27, and attempted to estimate aging changes according to the percentage of diminished observations at each age. When I have a number of players washing out of the league at a young age, the mode will be biased downward. If I have a player who is 27, what should I expect him to do? If I used the mode, I would expect him to diminish. But this expectation is based on something that is irrelevant to this player's situation. On average, 27-year-olds improve, not decline.
One of the interesting things that I noticed in my study is that among players who remain in my sample---that is, they don't just quit playing baseball, so complete attrition isn't the cause---the mode peak is about a year lower than the average peak. Though 28 is the mode, there are many players having their peaks into their early-30s, which pushes the mean upwards. Even among players who don't get knocked out, they have their production knocked down. And the longer you stay in the game, the more opportunity you have to suffer one of these random injuries. And they are purely random, which means knowing the age of the player is irrelevant to predicting such attrition.
Oh, I do echo the sentiments in these comments that this was a clearly written, thought-provoking article. Thank you. It is because of its clarity that so many readers understood it well enough to leave as many comments as they have.
So he comes up with an observed aging curve for a very, very small subset of players who by definition peak late and age gracefully (gradually). If we assume that all players have somewhat different "true" aging curves (if, because of nothing else, their differences in physiological versus chronological age), his subset of players is one that necessarily is going to have an aging curve with a late peak and a gradual decline - otherwise they likely would not have lasted that long and played as much and as regularly as they did.
In addition to that, and to make matters worse, the trajectory he found is not even a trajectory of "true talent." Because of his selection bias, it will necessarily be comprised of players who, by chance alone, had late peaks and gradual declines.
To illustrate that, let's say that all players have the exact same true aging curves. Now, if we let all players play 10 years, obviously by chance alone, some players will peak at 26, some at 32, etc. (even though they all have the same true peak). And some players will have steep post-peak (and pre-peak of course) trajectories and some will have shallow ones (in fact, every possible shape will occur if we have enough players in our sample), again by chance alone, even though they all have the same "true" shape. The players who peak late by chance and have a gradual performance decline by chance alone will tend to dominate JC's sample. Basically JC's sample (a VERY small subset of MLB players) consists of players who have true trajectories that peak late and decline gradually AND players whose "observed" peaked late and declined gradually, by chance alone. Is it any surprise that he finds a peak of 29 or 30 and a gradual decline after that? Heck if we look at players who played 15 years and 7000 PA, we are likely to get a later peak and more gradual decline still! Does the name Bonds sound familiar?
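MGL's thought experiment can be checked with a quick Monte Carlo. In the sketch below (all numbers invented), every player has the identical true curve peaking at 27; each career also gets a random persistent "tilt" plus season-to-season noise, purely by chance. Requiring a player to stay above a performance floor for all twelve seasons selects the careers that tilted upward by luck, and the observed peak among those survivors comes out well after 27.

```python
import random

random.seed(1)

AGES = list(range(24, 36))
def true_curve(a):                     # identical "true" curve: peak at 27
    return 100.0 - 0.6 * (a - 27) ** 2

CUTOFF = 80.0                          # must stay above this every season
all_peaks, survivor_peaks = [], []
for _ in range(50_000):
    # Each career gets a chance "tilt" (persistent luck, health, etc.)
    # plus season noise; true talent is the same for everyone.
    tilt = random.gauss(0, 1.5)
    obs = [true_curve(a) + tilt * (a - 24) + random.gauss(0, 5.0) for a in AGES]
    peak_age = AGES[obs.index(max(obs))]
    all_peaks.append(peak_age)
    if min(obs) > CUTOFF:              # "survives" all 12 seasons as a regular
        survivor_peaks.append(peak_age)

mean = lambda xs: sum(xs) / len(xs)
print(f"mean observed peak, all players: {mean(all_peaks):.2f}")
print(f"mean observed peak, survivors:   {mean(survivor_peaks):.2f}")
```

Even with one shared true peak at 27, the long-career filter alone pushes the survivors' observed peak noticeably later, which is the selection effect being described.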
I am sorry, but with every fiber in my body, I think that it is ridiculous to characterize JC's resulting trajectories as "of the typical MLB player," or some such thing. It is an "observed" (as opposed to "true" - representing the changes in true talent of a player over time) aging trajectory of a very small subset of players who we already knew had long and prosperous careers. Nothing more and nothing less. Can someone tell me any practical use for this kind of data?
Your comments come off a little overcharged.
The first that popped into my mind was the old saw about no models being correct, but some are useful. Even taking all of these criticisms at face value, we're still left with a very good study on the subset of data involving players who show good survivorship. Even if this can't represent all MLB players, it will still have its uses, and I think you risk oversimplifying by saying it can't have value if it can't be used for predictive purposes. Plenty of studies contribute simply by effectively showing where the next missing link resides.
If you wanted to know how long a 20-year-old man was going to live, you would look at men who were alive at age 20. You would not look at men who lived long lives and conclude that they were a representative sample of 20-year-olds.
That is: like MGL, I believe that JC's finding that hitters peak at 29 is completely due to his selective sampling of long-tenured players. That's even accepting his method itself without qualification.
I know that JC disagrees with my conclusions on this point, as he disagrees with MGL's. Either MGL and I are wrong, or JC is wrong.
Or, maybe one of us is 90% wrong and 10% right; or maybe we're each half right and half wrong; or maybe we're all full of crap. If you're interested, take a look and judge for yourself.
All my comments on JC's study are here. For the 1000 PA case in particular, look for the "part II" post.
I think dpowell, MGL, ProTrade, etc. are right -- when you limit your analysis exclusively to people who play from ages 24-35 you are necessarily excluding anyone who peaks early. If I wanted to know how a 20 year old would age, you're suggesting that I take a look at all people who survive from age 20-60, ignoring all of the 40 & 50-somethings who drop dead from heart disease and other age-related illnesses. That's introducing bias through sampling.
I don't have access to your full text (my university's not a JSS subscriber), but if you had a working paper version available somewhere, I'd like to read it. There are so many conditions you're using to generate this sample that it's hard to understand what applies to the various robustness checks since it's not particularly clear from this article or explanations in comments. I'd also like to see the working paper version for the full set of regression equations; there's no reason to expect this aging function to behave quadratically, for example. It seems more intuitive to expect a quicker increase (as age and experience are both positive forces on performance) and an asymmetric decrease on the back end (i.e. a flat in the middle as experience compensates for immediately post-prime age, then further deterioration of skills that experience can't account for).
Send me an e-mail, and I'll send you a reprint.
The quadratic fits the data just fine and has the advantage of being simple. I experimented with higher-order polynomials and used fractional polynomial estimation; all yielded similar results.
Despite its negative rating. :)
Thanks for the comments. I'll do my best to get to all of them. Unfortunately, it probably won't be until tomorrow that I get to the bulk of them. Why of all days did Mark McGwire have to admit he used steroids?
If you want to see a rough measure of how the peak might differ, you could take the extreme bounds for each variable's confidence interval (positive for age and negative for age-squared, and vice versa); the peak age estimates range from 29.28 to 29.5. But you can't hold one constant and push the other to its extreme, because that would generate a shape far different from the best estimate.
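For readers following along, the peak of a fitted quadratic comes straight from the coefficients. The numbers below are invented stand-ins chosen to land near the reported estimates; they are not the study's actual coefficients.

```python
# Peak age of a fitted quadratic f(a) = b0 + b1*a + b2*a^2 is where
# f'(a) = b1 + 2*b2*a = 0, i.e. a* = -b1 / (2 * b2).
# Hypothetical coefficients, chosen only so the peak lands near 29.4:
b1, b2 = 5.88, -0.10
peak = -b1 / (2 * b2)
print(f"peak age: {peak:.2f}")        # -> 29.40

# Jointly shifting both coefficients toward the ends of (hypothetical)
# confidence intervals moves the implied peak only slightly:
for c1, c2 in [(5.80, -0.0990), (5.96, -0.1010)]:
    print(f"peak with (b1={c1}, b2={c2}): {-c1 / (2 * c2):.2f}")
```

Because the peak is a ratio of the two coefficients, moving them together in the directions the fit allows changes the peak far less than pushing one to its extreme while holding the other fixed.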
"CLARIFY: Software for Interpreting and Presenting Statistical Results by Michael Tomz, Jason Wittenberg, and Gary King; version: 2.1, 1/5/2003. This is a set of easy-to-use Stata macros that implement the techniques described in Gary King, Michael Tomz, and Jason Wittenberg's "Making the Most of Statistical Analyses: Improving Interpretation and Presentation". To install Clarify, type "net from http://gking.harvard.edu/clarify" at the Stata command line. The documentation [ HTML | PDF ] explains how to do this. We also provide a zip archive for users who want to install Clarify on a computer that is not connected to the internet. Winner of the Okidata Best Research Software Award. (All questions, bugs, requests: Clarify Listserv [Un-]Subscribe or Search archives). Also try -ssc install qsim- to install a wrapper, donated by Fred Wolfe, to automate Clarify's simulation of dummy variables."
I don't think using MLEs for minor league data will work. My MLEs already have a fair amount of aging built in, and when I found that comparing each year's projection with the next year's actual performance worked well to zero out any mean errors in the projections, it was worthless as an aging curve.
Sounds like there's work to do to figure out how to harness it for predictive purposes (maybe model its results by longevity simulations or injury risk classes?), but there should be some building blocks in there.
As I reiterate throughout the two parts, there really is no one-size-fits-all aging curve. And in practice, you are better off addressing each player on a case-by-case basis.
For example, if you have a 31 year-old FA that you are thinking about signing, you would want, at the very least, to look at the aging curves of similar players during the modern era - for example, full-time, 30-32 yo players who have played for X years already.
None of the generic curves I discuss, or JC's, or anyone else's, will be much help. You have to look at specific aging patterns for similar-type players, including such things as body-type, speed, injury history, etc.
In addition, a player's own historical trajectory might give you some idea as to his future trajectory.
Really, the only three things you want to take away from this article, including the second part a-coming, are:
One, if you look at players who have already played for 10 seasons and many IP, as JC did, of course you get a very different aging curve than you would expect from any player before the fact, even in the middle of their careers. To extrapolate that to all players, most players, or even the "generic" player, sans the very part-time ones or the ones with very short careers, is ridiculous, as Tango, Phil B., and many others have already stated.
Two, the modern era appears to have a significantly different aging curve, probably for all players. I arbitrarily defined the modern era as post-1979, but it could be anything really.
And three, if you absolutely have to answer the question, "What does the average aging curve look like for MLB players, including those who do not have long and/or illustrious careers (and many of these part-time players DO reach their peaks), and what is their peak age," the answer probably looks something like my last curve, at least in the modern era, although the one in the next installment after I adjust for survivor bias is probably more appropriate.
And that curve (for the "average" MLB player) is not unlike what we have thought all along - a fairly steep ascent until a peak of 27 or 28, and then a gradual decline which gets a little steeper if and when a player gets into his thirties and beyond. There is simply no way that we can expect a player (not knowing anything else about him, such as body type) who has not already finished a long career (or come close to finishing) to peak at 29 or 30, as JC suggests.
Of course it makes no real practical sense to talk about a playerâ€™s peak age and his trajectory after he has already finished his career.
JC's players are a subset of these players. Some of these (5/2000) players will go on to amass 10 years and 5000 PA and some will not. When I looked at all of these 5/2000 players going forward, I found a peak age of 27-28 and the same basic overall trajectory that I found for ALL players using my delta method corrected for survivor bias.
So obviously the subset of these 5/2000 players who do not make it to the 10/5000 level (JC's sample) peak earlier (and probably have a steeper decline after their peak) than the players who do, as you would expect.
Just more evidence that JC's sample is a biased set of players who peak later than the "average" player as well as later than the full-time player with 5 years under his belt.
Again, what purpose does it serve to determine the trajectory of this very small subset of players and why does JC refer to this trajectory as that of the "average MLB player" rather than a very small, biased, subset of players who play MLB?
I really don't understand his point and why this article is even called "How do baseball players age?" as if his very small subset of players represents the average or typical player. And why does JC think that his results are in opposition to that of myself, Tango, Bill James, and others who looked at ALL players and not just those who played for at least 10 years with at least 5000 PA? Obviously the smaller the subset of players we look at after the fact, and the longer and more prosperous the career, the later the peak age we will find and the shallower the curve after that peak, almost by definition. What is the point, I ask for the umpteenth time?
I use my aging research in order to help us with projecting player performance. Most or all of the other analysts that JC criticizes do the same, or at least that is implicit in their work. You can't possibly do that with JC's data.
I think that this is JC against the world on this one. There is no one in his corner that I am aware of, at least no one that actually does any serious baseball work. And there are plenty of brilliant minds who thoroughly understand this issue who have spoken their piece. Either JC is a cockeyed genius and we (Colin, Brian, Tango, me, et al.) are all idiots, or...
I'm using the same data to estimate aging functions, but I'm using a different estimation technique. Though advanced, it is quite common and controls for identified problems with other methods. Inserting a -5 runs below average performance by age for non-survivors as a correction for the survivor effect arbitrarily imposes aging into the sample. It's just the delta method all over again, and I've explained the problems with the delta method.
For anyone interested in other things that I have written on the topic of the past few weeks, I post links below.
How Do Players Age?
More on Player Aging
On Other Methods for Estimating Aging
Aging and Selective Sampling
Doesn't this study boil down to the following statement?
"Players who play longer than average have later peaks than average."
And isn't that almost begging the question?
This particular topic is new to me and I come into it with an open mind, unfamiliar with whatever controversy there is around it. Your "Us vs. Them" approach isn't working for me here. I'm sorry if you don't think an economist can cross over into baseball unless he's got a 100% watertight study, but unfortunately, this is actually how research often progresses. It is increasingly accepted within scientific fields.
I say this because I went and downloaded Bradbury's full article last night, and poked around enough on the intertubes to find that your initials are everywhere on this critique. Let it go, man, you're beating a dead horse, and are so wrapped up in it you can't see the good. You focus entirely on whether the study is flawless (it's not, and he says so), and not on what kind of instruction it can provide going forward. Honestly, it smacks of a grudge, and such things don't come off well within debate over published research.
Bradbury states in his ABSTRACT, he's trying to "...isolate the effect of age on several player skills..." Paring a dataset down to try to understand underlying processes is OK, as long as you acknowledge it. He does this. It doesn't mean he's saying that all players will peak at 29. He's providing evidence that these skills might do so, unfettered by things like injury, and limited to players good enough (or lucky enough, or persevering enough) to play during those ages.
Here's some text from his conclusion:
"...while controlling for many exogenous factors
that influence player performances and might inhibit
the isolation of the impact of ageing, this study finds
that both groups reach peak performance around age
29. Players peak earlier in skills that require more
athleticism, and later in those that require less
I don't see him making the claims that it's the end-all study that you seem to think he's saying it is, or that it predicts what will happen to a given player. He's trying to isolate factors so he can look at the effect of ageing on SKILLS, sans confounding factors.
Your choices, as I see it, are to throw rocks all over the intertubes about it not being representative because of the filtering (or because he's not in your sandbox), or to do some follow up research to find out HOW REPRESENTATIVE it might be if you applied it to other, different populations.
His model and methods are well documented. With your skills, you could easily replicate it, then compare the shape and fit of different models that address your concerns. What would happen if you widened the age group, still including age-29? Compressed it? Fit different curves and compared them with an AIC? That's how you win the argument, not by saying "There is no one in his corner that I am aware of, at least that actually does any serious baseball work."
Bradbury has provided something possibly worth building on, and instead it sounds like a very entrenched faction is just telling him to get out of their sandbox.
But Tango and Nate Silver had done better studies of aging before Bradbury's, and MGL did a better study after. Those are the studies we should build on. That's what's being pointed out in the comments.
"Players who play longer than average have later peaks than average."
And isn't that almost begging the question?
Yes and yes.
Not only do players who play longer have later "true" (underlying talent) peaks, but their observed peaks (which are not necessarily the same as their "true" peaks - for example, a player might have his best season at age 22 or at age 36) are also going to be later than their true peaks. It is sort of survivor bias in reverse. Players in JC's sample tend to have gotten lucky all along the way, pushing the peak age forward as they do so and flattening out the post-peak part of the curve as they go forward in their careers.
"I especially appreciate the positive feedback."
When I write an article or do research, I especially appreciate the criticism. I have nothing to learn from the "pats on the back." But that's just me...
Not that I disagree with your substantive point, but it would be cool to keep your responses exclusively substantive.
Did I have to explicitly say, "Warning! Time out. This next comment I am about to make has nothing to do with the argument at hand. Nothing at all. It is a side-bar."
I've spoken my peace (or is it piece?) on JC's research and on aging in general, and I don't really have anything more to say on either issue, otherwise I run the risk of being even more redundant and repetitive than I already have been. And as always, I could be wrong on one or more of the assertions that I have made. Not to mention the fact that there is a lot of muddy water and gray area in this particular topic.
Jesus, dude, have a little bit of class, you're already ahead on the merits of the argument.
Tango and MGL are the best, and The Book Blog is an incredible resource for folks who care about keeping up with the SABR community. It's totally free. By the way, it's no accident that the new hires are all regular participants in The Book Blog comments. TBB has done an incredible amount to connect new members of the community and their work to a larger audience. In sum: they are awesome. Buy The Book.
Beyond that, it's really disappointing that whoever is in charge (KG?, Will?, CK?) of vetting this stuff let JC take the point. I re-upped with BPro because of the new hires, any of whom would have done a better job, both writing the article and responding substantively to criticism.
It's an indictment of the higher-ups that they would turn to JC of all people to lead this discussion. Hopefully Mike Fast (who might not object to being called a pitch f/x guru?) won't mind me quoting him:
"BP giving first serve to JC, who has decidedly the most intellectually inferior position of the three groups and isnâ€™t really engaging either of them, is a very poor way to start. Iâ€™d like to see more debate between Philâ€™s ideas and MGLâ€™s ideas. Instead we get stuck in a stupid rut learning nothing because JC defends his position against all comers and against all logic and will make no concessions."
He does later say
"...itâ€™s tough to have the real debate in comments to an article, as Tango mentioned, but also, I and many others have no ability to have a voice in the â€œdiscussionâ€ because weâ€™re not subscribers. "
So I'm probably not out of order. See here:
Ultimately, my point is that there's no real reason to be hosting JC here in this manner. Let him have his piece and a link or maybe even a full rebuttal, but to let him lead off is either shit-stirring or ignorant. I don't appreciate either.
I think there are probably some issues with JC's sampling procedure (looking forward to reading the full paper, though, thanks for sending it to me JC!) but let's note that MGL hasn't exactly been defending his modified delta method against the argument that his baseline performance projection is completely arbitrary. Seems to me that no one has adequately answered the question yet, because, as Matt and dpowell mention earlier, no one has bothered to model the selection process. Whoever does THAT first can claim to be the person who came up with the first mostly satisfactory answer to the peak age question.
The whole point is that he's been, at length, not forthcoming about his methodology. This is a significant trend. What have you read about his thoughts on replacement level? Like I said, I've been following this discussion for a long time. I'm pretty disappointed that there was no one up top at BP who hadn't done the same. I know that integrating the new hires to create a cohesive whole will take some time, but this is, at least, clear evidence that it hasn't happened yet. And may perhaps signal something worse.
I quoted Mike Fast first because he's a reputable representative of the saber community. But also because I see this as whole debate as, ultimately, a diversion from the point, which is that there have been numerous useful studies published, one by Nate Silver no less, that the readership (who presumably have questions about the matter if WC/CK/KG signed off in the first place) should have much more time with.
Most of the comments here seem to be related to people not having read or not believing this paragraph.
Or believing it, but arguing that if "virtually identical" means only slightly lower, that is actually strong evidence of a lower peak when interpreted more closely.
I argue for both of the above in my post, which I linked to in one of the other comments.
I'm looking at both the original and less restricted graphs in the original article and it strikes me that the real point here is that the average player will plateau somewhere between age 27 and 30, with only slight differences within that age range.
But the whole process strikes me as generic to the point of meaninglessness. Several people have analogized this to trying to figure out when a 20 year old will die. If I wanted to find out roughly when a 20 year old would die, though, I wouldn't just average out the age of death of all people who reached age 20 around the world and come out with a figure. I'd need to know where that person lived, his family medical history, his or her gender, income level, eating and exercise habits, and a million other things. To bring the analogy back to baseball, if I want to predict the aging path of a baseball prospect, I would never turn to a generic aging curve. Rather, I'd compare him to a set of similar players and base it off that.
Back when Bill James was making these comments, hitters were viewed differently. A good player was judged on batting average, home runs, runs and RBIs. These days, we look at OPS, especially walks and home runs. From JC's "Peak Age by Skill" chart, a lot of the traditional stats peaked around 28 such as BA, 2B/3B, SLG... which affect runs scored or driven in. The "Moneyball" stats such as walks and home runs and OBP peak at the 30-32 range.
So, the results of the study aside (and whether there's cherrypicking or not), is it possible that player valuation has changed since the Age 27 theory came about?
"The projection is their last three years lwts per 500 PA, weighted by year (3/4/5) added to 500 PA of league average lwts for that age minus five. In other words, I am regressing their last three years lwts (weighted) toward five runs lower than a league average player for that age.
While Lichtman believes using five runs below average generates a "conservative" projection, the substitution is just a guess informed by nothing more than a hunch. In this case, the guess imposes the outcome for the exact factor we are trying to measure: the estimated decline is a pure product of the assumption. Thus, it is no surprise that Lichtman's adjusted delta-method estimates yield results that differ little from his raw delta-method estimates."
JC completely mis-characterizes or does not understand what I was doing.
He seems to imply that I am assuming a 5 run decrease for all of the "one-year" players (those who do not get a Year II).
I am not. I am assuming a Year II performance equal to a basic Marcel projection. While aging is or should be a part of a Marcel, so that it is true that I have to make some aging trajectory assumptions in order to construct the projection, the "5 runs worse than average" is the mean that I am using in the regression that is part of the Marcel (the projection). That is completely different than assuming a Year II which is 5 runs worse than Year I. That would be ridiculous. And that is what JC is implying that I am doing, I think.
Normally a Marcel regresses toward a mean which is the league average performance of similar players (age, size, etc.). The reason I used a mean (to regress toward) that was 5 runs worse than a "generic" mean is that these players who do not see the light of day in Year II tend to be fringe players, and therefore the means of their population are likely worse than the means of the population of all players.
In fact, if anything, I think I used a conservative (too optimistic/too high) mean. I contemplated using a mean which was 10 runs worse than a "generic" mean (mean of all MLB players).
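The arithmetic of that regression is simple enough to sketch. Here is a toy Python version based on the description quoted above (3/4/5 weights on the last three seasons, plus 500 PA of a below-average regression target); the simplifying assumption that each season is a full 500 PA, and all the sample numbers, are mine, not MGL's.

```python
# Toy sketch of a Marcel-type projection that regresses toward a mean
# sitting `shift` runs below league average.  Assumes (for simplicity)
# that each of the last three seasons was a full 500 PA.

def project_year2(lwts_per_500, league_avg=0.0, shift=-5.0):
    """lwts_per_500: lwts per 500 PA for the last three seasons,
    oldest first.  Returns the projected Year II rate per 500 PA."""
    weights = [3, 4, 5]                    # oldest to most recent
    target = league_avg + shift            # e.g. 5 runs below average
    # weighted sample PA, plus 500 "ballast" PA at the regression target
    num = sum(w * 500 * r for w, r in zip(weights, lwts_per_500)) + 500 * target
    den = sum(w * 500 for w in weights) + 500
    return num / den

# A fringe player with made-up rates gets pulled toward the low target:
proj = project_year2([2.0, -1.0, 3.0])
```

Note how little leverage the target has under these assumptions: the ballast is 500 of 6,500 weighted PA, so moving the target by 5 runs moves the projection by only about 0.4 runs, which is consistent with the small sensitivities MGL reports below.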
Interestingly, as you can see from the charts in my articles, even using a "low" mean when doing the regressing, all of the players' projections in Year II were BETTER than in Year I until age 30. So these players actually showed a "peak" age of 29 or 30 (it is not a "true" peak because Year I is an "unlucky" year), which pushed my overall peak age slightly forward.
The most important thing is that whether I used a typical MLB mean for the regressing, 5 runs less than that (as I did) or even 10 runs less than that (which, as I said, may have been even more correct), it would not have changed my results. So criticizing that aspect of my work cannot indict the conclusions generated from that work.
Because, if it only makes a little difference, and the peak still comes up around 27 regardless, wouldn't that pretty much settle the question of peak with respect to the delta method?
That is: the problem with the delta method is the dropouts. If the results are robust (roughly the same) regardless of what reasonable method you use to compensate for the dropouts, doesn't that give you a solid conclusion?
As I said, JC's criticism of the "5 runs worse than average" turns out to be a red herring, as whatever I use does not significantly affect the results.
Really, the survivorship problem is not as large as I thought it would be. If I do not include these players (who do not have a Year II), and thus my remaining players are a little lucky in all of their Year I's (that is the problem with not including the non-survivors), there is essentially a plateau from 27 to 28 (a .1 run increase from 27 to 28 actually).
Once I include the non-survivors and use the "5 runs less" for the mean that I regress towards, the 27 to 28 interval shows a .4 run increase (rather than .1 run without the non-survivors).
If I do not use "5 runs less" for the mean (I simply use a standard league average), that 27 to 28 interval is now a .5 increase rather than .4.
If I were to use "10 runs less" for the mean rather than "5 runs less", I get a .3 run increase for that same interval.
So, the peak age and overall trajectory is not very sensitive to the mean I use for the regression in the projections for the non-survivors.
Well, JC wanted a response to his criticism on that topic, and I think I have provided an adequate one.
We are merely trying to balance out the players who do play a Year II, because they will have tended to be slightly lucky in ANY year (Year I) that is followed by a subsequent year. And when we use the delta method, we are only including player seasons in which there is both a Year I and a Year II, so all of the players we include will tend to show a false decline (or a falsely modest increase) in ANY pair of years.
In order to account or adjust for that, we include ALL players, even the ones who do not get a Year II (for any given Year I), by creating a phantom Year II and using a Marcel-type projection for that Year II performance (which doesn't really exist). That way, we can simulate a random, controlled experiment in which all players are forced to play at least one more year at any age. That would be the only way we could really ascertain true aging curves and peaks: either force all players to play until they are 40 years old or so, or at least force all players to play "one more year" whether they were allowed to or not (and then use the delta method, because we have players who have played only 2 years, 3 years, 5 years, 10 years, etc., and we want to include all of these players, unlike JC).
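As a concrete illustration of those mechanics (not MGL's actual code or data), here is a minimal delta-method sketch in which non-survivors get a phantom Year II from a supplied projection function; the player records and the toy projection rule are invented.

```python
from collections import defaultdict

# Minimal sketch of the delta method with phantom Year II seasons for
# non-survivors.  All records and the projection rule are illustrative.

def delta_method(pairs):
    """pairs: (age_in_year1, year1_rate, year2_rate) tuples.
    Returns the average Year I -> Year II change at each age."""
    deltas = defaultdict(list)
    for age, y1, y2 in pairs:
        deltas[age].append(y2 - y1)
    return {age: sum(ds) / len(ds) for age, ds in sorted(deltas.items())}

def with_phantoms(survivor_pairs, dropouts, project):
    """dropouts: (age, year1_rate) records with no real Year II.
    `project(age, y1)` supplies a phantom Year II, e.g. a Marcel-type
    projection regressed toward a below-average mean."""
    phantoms = [(age, y1, project(age, y1)) for age, y1 in dropouts]
    return delta_method(survivor_pairs + phantoms)

# Survivors alone show an average gain at 27; adding a regressed
# phantom for the dropout pulls the average delta back down.
survivors = [(27, 0.0, 2.0), (27, 4.0, 3.0)]
dropouts = [(27, -3.0)]
regress = lambda age, y1: 0.5 * y1 + 0.5 * (-5.0)  # toy projection
curve = with_phantoms(survivors, dropouts, regress)
```

A real implementation would also weight each delta by playing time (e.g. by the two seasons' PA); that detail is dropped here to keep the mechanics visible.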
Actually forcing all players to play until they were 40 or so (and starting them in the majors when they were 20 or so) would not give us a very good answer either. That would answer the question, "What does the average aging curve look like for all players who had some time in MLB and were allowed to play until they were 40, regardless of how well they aged or how well they played?" That would be sort of the reverse of JC's sample, but equally biased.
Forcing players to play one more year, which is essentially what I am doing when creating those "phantom Year II's," creates a little bias as well, because there are reasons other than Year I bad luck why these players do not get to play in Year II (although bad luck is definitely part of it for some of them). Still, it is a good method for balancing out the slightly lucky players who do get a Year II at any age. And using the "5 runs worse" regression that JC does not like is actually a good way to counteract that bias.
So using all players AND creating phantom Year II's for the non-survivors, and then using the delta method to construct an aging curve, is, I believe, far and away the best method of answering the question, "How does the typical MLB player age?" - where "typical" means all players combined, from the ones who have a cup of coffee, to the ones who play for 5 or 6 years, to the ones - as in JC's sample - who have long and illustrious careers.
I still think that assuming a trajectory for the missing players builds the final result into the actual aging profile in some way (again, I'm fine with that for answering the peak-age question narrowly).
I _hate_ selection models and avoid using them at almost all costs. But this entire topic is dependent on selection so I think any method which does not attack the selection issue head-on is a non-starter. That stinks, but it's just the nature of the topic.
Also, in order to make the argument that this corrects for bias, shouldn't the skewness of the distribution of dropouts matter? If the distribution is sufficiently positively skewed, it will bias results regardless of whether you use a hypothetical Year II. If the mode of the distribution of dropouts is at age 25, for example, then even if they are fringe players projected forward, they're going to peak at something less than or equal to age 26 and bias the results leftwards. I have no idea whether the distribution actually is positively skewed, but someone should put up a plot of the ages at which players end their careers.
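That skewness check is easy to run once you have the career-ending ages. A small sketch using the standard Fisher-Pearson moment formula, with made-up ages standing in for real retirement data:

```python
# Sketch of the suggested check: is the distribution of career-ending
# ages positively skewed?  The ages below are invented for illustration.

def sample_skewness(xs):
    """Fisher-Pearson moment coefficient of skewness: m3 / m2**1.5.
    Positive -> long right tail (most dropouts young, a few hang on)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

end_ages = [24, 25, 25, 25, 26, 26, 27, 28, 30, 33, 36]  # hypothetical
skew = sample_skewness(end_ages)  # > 0 would support the commenter's worry
```

With real retirement ages, a histogram of `end_ages` would be the plot the commenter asks for; the single skewness number just summarizes the same shape.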
While this might not be of much practical value in and of itself, I'd think that combining it with some information on catastrophic injury rates could help one build a pretty spiffy career arc projector, not to mention inform the multimillion dollar investments GMs have to make.
So this really needs a companion piece quantifying injury rates as a function of age (and position). Though I suppose we've needed that for, well, forever. I'm guessing, like many others, that including major injuries shifts the peak age down, though I'd be curious to see exactly how much.