Alas! for this gray shadow, once a man—
So glorious in his beauty and thy choice,
Who madest him thy chosen, that he seem'd
To his great heart none other than a God!
I ask'd thee, "Give me immortality."
Then didst thou grant mine asking with a smile,
Like wealthy men who care not how they give.
But thy strong Hours indignant work'd their wills,
And beat me down and marr'd and wasted me,
And tho' they could not end me left me maim'd
To dwell in presence of immortal youth,
Immortal age beside immortal youth,
And all I was in ashes.
– “Tithonus,” by Alfred Tennyson
There is an uneasy overlap between sabermetric analysis and forecasting things to come. To be sure, not all prognostication (not even most of it, I would say) comes from sabermetricians, people who would call themselves sabermetricians, or even people who are well versed in the work of sabermetricians. At the same time, the sort of skillset and temperament required to do sabermetrics frequently leads one to the conclusion that predicting baseball is hard and that the sum of what we don’t know about the future often exceeds the sum of what we do know.
But predicting the future is sometimes very useful and often very interesting; while it is not the only application of sabermetrics (something I feel we should emphasize more often), it is certainly a valid field of inquiry. And one thing sabermetrics does very well is examine previously held conceptions about the game objectively and quantifiably.
What I want to examine is how age affects how we project a team’s performance going forward. The common assumption is that, all else being equal, being young is better than being old. A young team has promise yet unfulfilled, while an old team is on the decline and needs to consider rebuilding. But is this really true? And how much does it matter?
I took all teams from 1950 on and figured their winning percentage and the age of their batters (weighted by plate appearances, omitting pitchers hitting) and pitchers (weighted by innings pitched). Then I found the record of the franchise over the next five seasons (counting a team’s results even if it changed names or city in the interim, figuring that roster constitution is more vital than geography or nomenclature for our purposes here). Then I ran an ordinary least squares regression to see how these factors worked together to predict future wins. (The results were similar when I limited the scope of the study to the next three seasons.)
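The weighted-age bookkeeping described above can be sketched in a few lines. This is a hypothetical illustration, not the actual study code: the four-batter roster and its plate-appearance totals are made up, and the real calculation runs over every team-season in the sample.

```python
# Hypothetical four-man illustration of a PA-weighted batter age;
# the ages and plate-appearance totals are invented.
batters = [
    (24, 650),   # (age, plate appearances)
    (27, 600),
    (31, 500),
    (35, 250),
]

# Weight each age by playing time so that a part-time veteran moves
# the team average less than an everyday player does.
total_pa = sum(pa for _, pa in batters)
age_bat = sum(age * pa for age, pa in batters) / total_pa
```

The same weighting, with innings pitched in place of plate appearances, produces the pitcher age.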
I should speak briefly as to what a regression like this does and how to interpret the results. A linear regression is so named because it outputs a linear formula that takes the following form:
y = m*x + b
Where y is what we want to predict, x is what we’re using to predict y, m is the coefficient (also called the slope, when we have only one predictor variable) and b is the constant (which can also be called the intercept).
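For the one-predictor case, a toy fit makes those pieces concrete. The data here are invented points lying exactly on the line y = 2x + 1, so the fitted slope and intercept simply recover m = 2 and b = 1:

```python
import numpy as np

# Made-up points that fall exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Degree-1 least-squares fit returns (slope, intercept).
m, b = np.polyfit(x, y, 1)
```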
In this case, with three predictors, you have three different terms that are multiplied by coefficients. Each of those coefficients has a measure of our confidence in the finding, called a p-value. As a shorthand, we can say that p-values below 0.05 typically indicate statistical significance (this is a useful shorthand, but it should not be taken as a dogmatic rule).
We also have a measure of the overall effectiveness of our regression, the adjusted R-squared. R-squared measures the percentage of the variance in y that is explained by our model. Why does it need to be adjusted? Because of the way OLS is figured, adding another explanatory variable will improve R-squared, even if the new variable does not actually increase our understanding. Adjusted R-squared controls for the number of variables, so that it only increases when a new variable increases R-squared more than we would expect if the new variable were random (i.e., lacked any additional explanatory value).
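The standard adjustment can be written as a one-line function. The inputs below (an R-squared of 0.25 from 100 observations and 3 predictors) are round hypothetical numbers, not figures from this study:

```python
def adjusted_r2(r2, n, k):
    """Penalize R-squared for the number of predictors k,
    given n observations (the standard adjustment)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical example: R-squared of 0.25, 100 observations, 3 predictors.
adj = adjusted_r2(0.25, 100, 3)
```

With zero predictors the adjustment does nothing; each added predictor shaves a bit more off, which is exactly the penalty for complexity described above.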
So now let’s take a look at the output of our regression (I used Gretl, although other software packages would probably give a similar output):
            coefficient     std. error    t-ratio    p-value
  const      0.432562       0.0250882     17.24      6.56e-061  ***
  W_PCT      0.357024       0.0165703     21.55      6.53e-090  ***
  AGE_BAT   -0.00442296     0.000899343   -4.918     9.71e-07   ***
  AGE_PIT    0.000608661    0.000813805    0.7479    0.4546

Statistics based on the weighted data:

  Sum squared resid   2286.318    S.E. of regression   1.235414
  R-squared           0.245282    Adjusted R-squared   0.243770
What can we make of that? The p-value on AGE_PIT is very, very much above our rule of thumb of .05, so we conclude that it isn't significant. We don't necessarily think that means the age of pitchers is irrelevant, just that it isn't adding any additional understanding of what's going on here. (I should note that AGE_BAT and AGE_PIT are correlated, even after omitting pitchers hitting from the calculation.)
What does omitting the AGE_PIT variable do to our regression?
            coefficient     std. error    t-ratio    p-value
  const      0.440695       0.0226061     19.49      1.28e-075  ***
  W_PCT      0.359260       0.0162960     22.05      1.61e-093  ***
  AGE_BAT   -0.00414516     0.000818937   -5.062     4.67e-07   ***

Statistics based on the weighted data:

  Sum squared resid   2287.172    S.E. of regression   1.235232
  R-squared           0.245000    Adjusted R-squared   0.243992
Our adjusted R-squared increases, ever so slightly. In practice, the two models will differ very little most of the time. But there is no reason to prefer a more complicated model that offers less explanatory power: even though most cases will be practically unaffected, a few rare cases could be thrown off by a meaningful amount.
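The mechanics of this comparison can be sketched on synthetic data, using plain least squares rather than Gretl. Every number in the data-generating step below is invented: pitcher age is built to correlate with batter age while adding nothing of its own, mimicking the situation described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for the real data: pitcher age is correlated with
# batter age, but only batter age (plus current record) drives the outcome.
w_pct = rng.normal(0.500, 0.07, n)
age_bat = rng.normal(28.3, 1.2, n)
age_pit = 0.6 * age_bat + rng.normal(11.3, 1.0, n)  # redundant predictor
y = 0.44 + 0.36 * w_pct - 0.004 * age_bat + rng.normal(0.0, 0.04, n)

def adj_r2(X, y):
    """Fit OLS via least squares and return adjusted R-squared."""
    X1 = np.column_stack([np.ones(len(y)), X])       # intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    k = X.shape[1]
    return 1 - (1 - r2) * (len(y) - 1) / (len(y) - k - 1)

full = adj_r2(np.column_stack([w_pct, age_bat, age_pit]), y)
reduced = adj_r2(np.column_stack([w_pct, age_bat]), y)
```

As in the tables above, the two fits land almost on top of each other; the redundant variable buys essentially nothing.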
(I do want to caution as well that you cannot directly compare raw coefficients: different variables have different scales and standard deviations, and comparing the coefficient of a variable that averages .500 with that of a variable that averages 28 can cause problems. Even if two variables have the same average, different standard deviations lead to differences in the scale of their coefficients that make direct comparison misleading. In this case, however, winning percentage is truly a more important predictor than the average age of position players.)
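One common way to make coefficients comparable is to standardize them: multiply each raw coefficient by its predictor's standard deviation and divide by the outcome's standard deviation, putting everything in common units. The spreads used below are hypothetical stand-ins, not the sample's actual standard deviations:

```python
# Hypothetical spreads: winning percentage and the outcome vary by about
# .070, batter age by about 1.5 years. Raw coefficients from the text.
sd_w_pct, sd_age_bat, sd_y = 0.07, 1.5, 0.07

beta_w_pct = 0.359260 * sd_w_pct / sd_y       # standardized W_PCT effect
beta_age = -0.00414516 * sd_age_bat / sd_y    # standardized AGE_BAT effect
```

Under these assumed spreads, a one-standard-deviation change in current record moves the prediction several times as much as a one-standard-deviation change in batter age, which is the sense in which winning percentage is the more important predictor.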
So what does this mean, exactly? Teams in our sample ranged from an average age of 24 to 34. That means the most extreme possible difference in ages leads to a difference in expected win percentage over the next five seasons of 0.041, or roughly seven games per season.
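That back-of-the-envelope figure is just the AGE_BAT coefficient times the widest age gap in the sample, converted to games over a 162-game schedule:

```python
# Widest age gap in the sample (ages 24 to 34) times the AGE_BAT
# coefficient from the reduced regression.
coef_age = 0.00414516
pct_diff = (34 - 24) * coef_age       # difference in expected win pct
games_per_season = pct_diff * 162     # roughly seven games per season
```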
But that’s the most extreme case. How does it apply to real teams—say, 2012 teams? Here you go:
AGE_BAT    AGE_W_PCT
 26.2        0.009
 26.7        0.007
 27.1        0.005
 27.2        0.005
 27.4        0.004
 27.4        0.004
 27.5        0.004
 27.4        0.004
 27.7        0.003
 28.0        0.002
 27.9        0.002
 28.0        0.002
 28.1        0.001
 28.1        0.001
 28.4        0.000
 28.3        0.000
 28.7       -0.001
 28.7       -0.001
 28.7       -0.001
 29.0       -0.002
 29.1       -0.003
 29.7       -0.005
 29.7       -0.005
 29.5       -0.005
 29.9       -0.006
 29.9       -0.006
 29.8       -0.006
 30.6       -0.009
 31.6       -0.013
 33.0       -0.019
The last column is the difference in expected win percentage between a team of that age and the average team in the regression. Looking at age alone, a team like the Yankees would expect to win three games a season fewer over the next five years than a team with their won-loss record otherwise would. Over at the other end of the spectrum, a team like the Astros would expect to win one game per season more than a team with the same record but of an average age. Now given the sizable differential in other measurable attributes, you might still want to be the Yankees (even though it would mean passing up a chance to hang out with Mike Fast and Kevin Goldstein in the break room). But the age of a roster has a real impact on our estimates of how a team will perform going forward.