Alas! for this gray shadow, once a man—
So glorious in his beauty and thy choice,
Who madest him thy chosen, that he seem'd
To his great heart none other than a God!
I ask'd thee, "Give me immortality."
Then didst thou grant mine asking with a smile,
Like wealthy men who care not how they give.
But thy strong Hours indignant work'd their wills,
And beat me down and marr'd and wasted me,
And tho' they could not end me left me maim'd
To dwell in presence of immortal youth,
Immortal age beside immortal youth,
And all I was in ashes.
– “Tithonus,” by Alfred, Lord Tennyson

There is an uneasy overlap between sabermetric analysis and forecasting things to come. To be sure, not all prognostication (not even most of it, I would say) comes from sabermetricians, people who would call themselves sabermetricians, or even people who are well versed in the work of sabermetricians. At the same time, the sort of skillset and temperament required to do sabermetrics frequently leads one to the conclusion that predicting baseball is hard and that the sum of what we don’t know about the future often exceeds the sum of what we do know.

But predicting the future is sometimes very useful and often very interesting; while it is not the only application of sabermetrics (something I feel we should emphasize more often), it is certainly a valid field of inquiry. And one thing sabermetrics does very well is examine previously held conceptions about the game objectively and quantifiably.

What I want to examine is how age affects how we project a team’s performance going forward. The common assumption is that, all else being equal, being young is better than being old. A young team has promise yet unfulfilled, while an old team is on the decline and needs to consider rebuilding. But is this really true? And how much does it matter?

I took all teams from 1950 on and figured their winning percentage and the average age of their batters (weighted by plate appearances, omitting pitchers hitting) and pitchers (weighted by innings pitched). Then I found the record of the franchise over the next five seasons (counting a team’s results even if it changed its name or city in the interim, on the theory that roster constitution matters more than geography or nomenclature for our purposes here). Then I ran an ordinary least squares regression to see how these factors worked together to predict future winning percentage. (The results were similar when I limited the scope of the study to the next three seasons.)
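As a sketch of the data prep, here is how a playing-time-weighted average age can be computed. The function and its inputs are illustrative assumptions, not the actual dataset:

```python
import numpy as np

def weighted_age(ages, weights):
    """Playing-time-weighted average age (PA for batters, IP for pitchers).

    Weighting by playing time means a 38-year-old reserve with 40 PA
    barely moves a team's average age.
    """
    ages = np.asarray(ages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float((ages * weights).sum() / weights.sum())

# Tiny illustration: a 25-year-old regular (300 PA) and a 35-year-old
# part-timer (100 PA) average out closer to the regular's age.
print(weighted_age([25, 35], [300, 100]))  # → 27.5
```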

I should speak briefly about what a regression like this does and how to interpret its results. A linear regression is so named because it outputs a linear formula of the following form:

y = m*x + b

where y is what we want to predict, x is what we’re using to predict y, m is the coefficient (also called the slope when we have only one predictor variable), and b is the constant (also called the intercept).
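For a concrete single-variable example (with entirely made-up data), a least-squares fit recovers m and b:

```python
import numpy as np

# Made-up points lying exactly on the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Fit a degree-1 polynomial (a line) by least squares.
m, b = np.polyfit(x, y, deg=1)
print(m, b)  # → 2.0 1.0 (up to floating-point error)
```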

In this case, with three predictors, we have three different terms, each multiplied by its own coefficient. Each of those coefficients comes with a measure of our confidence in the finding, called a p-value. As a shorthand, we can say that p-values below 0.05 typically indicate statistical significance (this is a useful shorthand, but it should not be taken as a dogmatic rule).
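The shape of a three-predictor fit can be sketched with simulated data. Every number below is invented for illustration; the variable names simply mirror the ones in the regression output, and the "future" response is built to depend on W_PCT and AGE_BAT but not AGE_PIT:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Simulated team-seasons (NOT the real data).
w_pct   = rng.normal(0.500, 0.070, n)
age_bat = rng.normal(28.0, 1.5, n)
age_pit = rng.normal(28.0, 1.5, n)

# Fake future winning percentage: depends on w_pct and age_bat only.
future = 0.43 + 0.36 * w_pct - 0.004 * age_bat + rng.normal(0.0, 0.03, n)

# Design matrix with a constant column, as in the Gretl output.
X = np.column_stack([np.ones(n), w_pct, age_bat, age_pit])
coef, _, _, _ = np.linalg.lstsq(X, future, rcond=None)
const, b_wpct, b_agebat, b_agepit = coef
# b_wpct and b_agebat come back near their true values, while b_agepit
# hovers near zero, which is the situation an insignificant p-value flags.
```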

We also have a measure of the overall effectiveness of our regression, the adjusted R-squared. R-squared measures the percentage of the variance in y that is explained by our model. Why does it need to be adjusted? Because of the way OLS is figured, adding another explanatory variable can never lower R-squared and will almost always raise it, even if the new variable does not actually increase our understanding. Adjusted R-squared controls for the number of variables, so that it only increases when a new variable improves R-squared by more than we would expect if the new variable were random (i.e., lacked any additional explanatory value).
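The adjustment is a simple formula: 1 - (1 - R²) · (n - 1) / (n - k - 1), where n is the number of observations and k the number of predictors. The article doesn’t report its sample size, so the n used below is an assumption, chosen because roughly 1,500 team-seasons is consistent with the pair of R-squared values in the output that follows:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# n = 1500 is an assumed sample size, not a figure from the article.
print(adjusted_r2(0.245282, 1500, 3))  # ≈ 0.2438
```

Note that the adjusted value is always at or below the raw R-squared, and the gap widens as you pile on predictors.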

So now let’s take a look at the output of our regression (I used Gretl, although other software packages would probably give a similar output):

             coefficient    std. error    t-ratio    p-value
  const       0.432562      0.0250882     17.24      6.56e-061 ***
  W_PCT       0.357024      0.0165703     21.55      6.53e-090 ***
  AGE_BAT    -0.00442296    0.000899343   -4.918     9.71e-07  ***
  AGE_PIT     0.000608661   0.000813805    0.7479    0.4546

Statistics based on the weighted data:

Sum squared resid    2286.318   S.E. of regression   1.235414
R-squared            0.245282   Adjusted R-squared   0.243770

What can we make of that? The p-value on AGE_PIT is very, very far above our rule of thumb of .05, so we think it isn’t significant. We don’t necessarily think that means the age of pitchers is irrelevant, just that it isn’t adding any additional understanding of what’s going on here. (I should note that AGE_BAT and AGE_PIT are correlated, even after omitting pitchers hitting from the calculation.)

What does omitting the AGE_PIT variable do to our regression?

             coefficient   std. error    t-ratio    p-value
  const       0.440695     0.0226061     19.49      1.28e-075 ***
  W_PCT       0.359260     0.0162960     22.05      1.61e-093 ***
  AGE_BAT    -0.00414516   0.000818937   -5.062     4.67e-07  ***

Statistics based on the weighted data:

Sum squared resid    2287.172   S.E. of regression   1.235232
R-squared            0.245000   Adjusted R-squared   0.243992

Our adjusted R-squared increases, ever so slightly. In practice, the difference between the two models will usually be slight. But there’s no reason to prefer a more complicated model that offers less explanatory power: most predictions will be practically unaffected, but some rare cases could be thrown off by a meaningful amount.

(I do want to caution as well that you cannot directly compare coefficients. Different variables have different scales and standard deviations, and comparing the coefficient of a variable that averages .500 with that of a variable that averages 28 can cause problems. Even when two variables have the same average, different standard deviations can put their coefficients on scales that make direct comparison misleading. In this case, however, winning percentage truly is a more important predictor than the average age of position players.)

So what does this mean, exactly? Teams in our sample ranged in average age from 24 to 34. That means the most extreme possible difference in ages leads to a difference in expected winning percentage over the next five seasons of 0.041, or roughly seven games per 162-game season.
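That 0.041 figure is just the AGE_BAT coefficient from the two-predictor model multiplied by the ten-year age spread; a quick check of the arithmetic:

```python
age_bat_coef = -0.00414516   # from the regression output above
age_spread = 34 - 24         # oldest minus youngest average age in the sample

diff_pct = abs(age_bat_coef) * age_spread
print(diff_pct)        # ≈ 0.0415 of winning percentage
print(diff_pct * 162)  # ≈ 6.7 wins over a 162-game season
```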

But that’s the most extreme case. How does it apply to real teams—say, 2012 teams? Here you go:




[Table: the thirty 2012 teams and the age-based adjustment to each one’s five-year projection; the numeric columns are not reproduced here. Teams, in order: Kansas City Royals, Houston Astros, Washington Nationals, Seattle Mariners, New York Mets, Pittsburgh Pirates, San Francisco Giants, San Diego Padres, Oakland Athletics, Detroit Tigers, Toronto Blue Jays, Chicago Cubs, Baltimore Orioles, Cleveland Indians, Atlanta Braves, Miami Marlins, Arizona Diamondbacks, Los Angeles Angels, Colorado Rockies, Cincinnati Reds, Minnesota Twins, St. Louis Cardinals, Boston Red Sox, Milwaukee Brewers, Chicago White Sox, Texas Rangers, Tampa Bay Rays, Los Angeles Dodgers, Philadelphia Phillies, New York Yankees.]

The last column is the difference in expected winning percentage between a team of that age and the average team in the regression. Looking at age alone, a team like the Yankees would expect to win three fewer games per season over the next five years than an otherwise identical team of average age. Over at the other end of the spectrum, a team like the Astros would expect to win one game per season more than a team with the same record but of average age. Now, given the sizable differential in other measurable attributes, you might still want to be the Yankees (even though it would mean passing up a chance to hang out with Mike Fast and Kevin Goldstein in the break room). But the age of a roster has a real impact on our estimates of how a team will perform going forward.