Yesterday’s column made the claim that small differences in age among high school hitters can have a dramatic impact on their return as draft picks. Today, I intend to prove that claim.
Rather than simply looking at the youngest and oldest players in each draft year, I’ve taken all 846 players in the draft study and separated them by age into five roughly equal bins: Very Young, Young, Average, Old, and Very Old. I then calculated the combined expected value of the players in each bin based on where they were drafted and the combined Discounted WARP that they actually generated.
(If you want the technical details: “Very Young” players were less than 17 years, 296 days old on draft day; “Young” players were between 17 years, 296 days and 18 years, 38 days; “Average” players were between 18 years, 38 days and 18 years, 120 days; “Old” players were between 18 years, 120 days and 18 years, 200 days; “Very Old” players were more than 18 years, 200 days old.)
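As a rough illustration of how those cutoffs sort players, here is a minimal Python sketch. The function name `age_bin` is hypothetical, and the treatment of a player who lands exactly on a boundary is an assumption wherever the text above only says "between."

```python
# Cutoffs are (years, days) pairs copied from the parenthetical above.
# Tuples compare element by element, so (17, 300) > (17, 296) and
# (18, 0) > (17, 364), which is what we need for ages with days < 365.

def age_bin(years, days):
    """Return the age bin for a player's age on draft day."""
    age = (years, days)
    if age < (17, 296):       # "less than 17 years, 296 days"
        return "Very Young"
    if age < (18, 38):        # assumption: exactly 18y 38d counts as Average
        return "Young"
    if age < (18, 120):       # assumption: exactly 18y 120d counts as Old
        return "Average"
    if age <= (18, 200):      # "Very Old" is strictly more than 18y 200d
        return "Old"
    return "Very Old"


if __name__ == "__main__":
    for sample in [(17, 200), (18, 0), (18, 100), (18, 150), (18, 300)]:
        print(sample, "->", age_bin(*sample))
```

The sample ages in the loop are arbitrary test values, not players from the study.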
Here are the results:
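| Age group | # Players | XP | DW | Return |
|---|---|---|---|---|
| Very Young | 169 | 386.31 | 482.26 | 24.84% |
| Young | 169 | 405.98 | 453.05 | 11.59% |
| Average | 170 | 390.68 | 418.58 | 7.14% |
| Old | 168 | 407.65 | 282.65 | -30.66% |
| Very Old | 170 | 370.42 | 249.07 | -32.76% |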
And here it is in graph form:
As you can see, there is an almost shockingly smooth progression in the data. Very Young players, as a whole, return 25 percent more value than expected by their draft slots. Young and Average players also return positive value, whereas Old and Very Old players return substantially less value than expected. The gap between the youngest and oldest groups of players isn’t quite as large by this method as what I measured in Part 1 yesterday—the youngest group returns about 86 percent more value than the oldest group as opposed to 117 percent. It’s still an enormous difference.
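The Return column is simply DW divided by XP, minus one. The sketch below recomputes it from the XP and DW totals in the table, and also reproduces the "86 percent" comparison between the youngest and oldest bins. The function names are my own; note that the Old and Very Old bins come out negative, meaning they returned less than their draft slots predicted.

```python
# (XP, DW) totals per bin, copied from the 1965-1996 table above.
BINS = {
    "Very Young": (386.31, 482.26),
    "Young": (405.98, 453.05),
    "Average": (390.68, 418.58),
    "Old": (407.65, 282.65),
    "Very Old": (370.42, 249.07),
}


def pct_return(xp, dw):
    """Percent return relative to the draft-slot expectation."""
    return (dw / xp - 1.0) * 100.0


def relative_edge(a, b):
    """Percent by which bin a's value per unit of expectation exceeds bin b's."""
    ratio_a = BINS[a][1] / BINS[a][0]
    ratio_b = BINS[b][1] / BINS[b][0]
    return (ratio_a / ratio_b - 1.0) * 100.0


returns = {name: pct_return(xp, dw) for name, (xp, dw) in BINS.items()}

if __name__ == "__main__":
    for name, value in returns.items():
        print(f"{name:<10} {value:+.2f}%")
    edge = relative_edge("Very Young", "Very Old")
    print(f"Very Young vs. Very Old: about {edge:.0f}% more value")
```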
This difference does not appear to have changed over the years. Here’s the same data as above but limited to players drafted in the first 16 years of our study, from 1965 to 1980:
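| Age group | # Players | XP | DW | Return |
|---|---|---|---|---|
| Very Young | 86 | 197.87 | 261.67 | 32.24% |
| Young | 88 | 223.69 | 249.44 | 11.51% |
| Average | 86 | 224.57 | 277.59 | 23.61% |
| Old | 88 | 211.91 | 179.16 | -15.45% |
| Very Old | 86 | 216.94 | 135.91 | -37.35% |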
And here’s the data from players drafted in the last 16 years of our study, from 1981 to 1996:
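| Age group | # Players | XP | DW | Return |
|---|---|---|---|---|
| Very Young | 82 | 179.42 | 218.79 | 21.94% |
| Young | 83 | 185.98 | 190.93 | 2.66% |
| Average | 81 | 167.11 | 124.57 | -25.46% |
| Old | 84 | 185.16 | 146.36 | -20.95% |
| Very Old | 82 | 168.39 | 101.20 | -39.90% |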
The data in each half of the study is not quite as smooth, which isn’t surprising given that the sample size is half as large. In the first half, Average players are a better value than Young players, while in the second half, Average players return less value than Old players. But otherwise, both halves of the data show the same thing: the younger the player, the better the return on investment.
And if you compare the Very Young players with the Very Old players, you’ll notice that the advantage enjoyed by the youngest set of players is greater in each half of the data than in the data as a whole. From 1965 to 1980, Very Young players return 111 percent more value than Very Old players; from 1981 to 1996 they return 103 percent more value.
It seems counterintuitive that the advantage enjoyed by young players is greater in each half than in the study as a whole. But there’s a reason for this, and it isn’t very well known: high school players are getting older over time. This isn’t limited to baseball; a better way of putting it is that high school students are getting older over time. There is a societal trend towards holding back children from starting school early. Whereas 40 years ago, parents frequently tried to get their soon-to-be-five-year-old child (whose birthday might be in October or November) into kindergarten, today’s parents frequently will hold back their five-year-old (whose birthday might fall in July or August) until the following year. There is a growing belief in educational circles, though the data on this is controversial, that kids who are among the oldest in their class do better academically than those who fall on the youngest end of the spectrum.
And in fact, the average high school player drafted in 2010 is roughly three months older than the average high school player drafted in 1965 (many thanks to Diane Firstman for her help in mining that data). When I broke the data into two halves above, the age cutoffs for Very Young, Young, etc., were about six weeks higher for the draft group from 1981 to 1996 than they were for the draft group from 1965 to 1980.
This is why, I believe, the results we get from pooling the data for all players from 1965 to 1996, without regard to the year they were drafted, might actually underestimate the advantage younger players have. Derek Jeter and Jason Kendall would not have ranked among the 10 youngest high school hitters from 1965. But when teams are drafting, what matters is the draft pool in front of them, and both Jeter and Kendall were among the five youngest high school hitters in 1993. As draft classes get older as a whole, the youngest players in each class get older as well—but so do the oldest players, giving the youngest players the same age advantage they've always had.
Ultimately, we’re splitting hairs here. We can safely say that the youngest 20 percent of high school hitters in any particular year will return, on average, about double what the oldest 20 percent of high school hitters will.
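To put a number on the pooling effect, the sketch below computes the youngest-versus-oldest edge for the pooled data and for each half, then adds the two halves' bins together so that every player is classified relative to his own era's cutoffs. That last "era-relative" figure is my own arithmetic on the XP and DW totals in the tables above, not a result stated in the article, and the helper names are hypothetical.

```python
# (XP, DW) totals for the youngest and oldest bins, copied from the tables above.
POOLED = {"Very Young": (386.31, 482.26), "Very Old": (370.42, 249.07)}
FIRST_HALF = {"Very Young": (197.87, 261.67), "Very Old": (216.94, 135.91)}   # 1965-1980
SECOND_HALF = {"Very Young": (179.42, 218.79), "Very Old": (168.39, 101.20)}  # 1981-1996


def edge(bins):
    """Percent by which Very Young out-returns Very Old, per unit of expected value."""
    young_xp, young_dw = bins["Very Young"]
    old_xp, old_dw = bins["Very Old"]
    return ((young_dw / young_xp) / (old_dw / old_xp) - 1.0) * 100.0


def combine(a, b):
    """Add XP and DW bin by bin, keeping each era's own age cutoffs."""
    return {k: (a[k][0] + b[k][0], a[k][1] + b[k][1]) for k in a}


ERA_RELATIVE = combine(FIRST_HALF, SECOND_HALF)

if __name__ == "__main__":
    for label, bins in [
        ("Pooled cutoffs", POOLED),
        ("1965-1980", FIRST_HALF),
        ("1981-1996", SECOND_HALF),
        ("Era-relative cutoffs", ERA_RELATIVE),
    ]:
        print(f"{label:<22} {edge(bins):.0f}% more value")
```

Under these assumptions the era-relative edge lands between the pooled figure and the per-half figures, at a little more than double, which is consistent with the rule of thumb above.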
If you would prefer a graphical, as opposed to mathematical, view of the data, this graph takes the scatter plot from yesterday’s article but separates players into two groups—players who were still 17 on draft day vs. players who were 18 or older:
There are two best-fit lines on the graph, and the red one, corresponding to 17-year-olds, tracks significantly above the black one. You’ll probably notice that while only 33 percent of the players in the study were still 17, a preponderance of the red dots (indicating the 17-year-old players) sit above the best-fit lines.
We can sum up all the data above by performing a second linear regression, this time making a player’s age a variable along with his pick number. If we do so, here is the formula we get:
Expected Return = 21.30 – (1.17 * Age) + (11.14/SQRT(PK))
First off, the p-value for the age variable is very low at just .0063. This means that there is less than a one percent chance that we would get data like this if there weren’t an actual correlation between age and expected return. This is a statistically significant result.
Secondly, we can now estimate to what degree teams should be drafting younger players higher than they already are. If Player A is exactly one year younger than Player B, and they were both selected with the same pick in the draft, Player A should be expected to return an additional 1.17 Discounted WARP over his career. Because the value of draft picks does not go down in a linear fashion, we can’t say that one year of age is worth exactly X number of picks in the draft—X changes depending on where you are in the draft.
We can say that, using the above formula, 1.17 Discounted WARP is roughly the difference between the expected values of picks #24 and #100. In other words, a 17-year-old player drafted #100 overall has as much expected value as an 18-year-old drafted #24. If a player who might look like a third-round pick on talent alone happens to be a full year younger than his draft class, he ought to be considered a late-first-round pick.
That is a massive, massive impact. One year of age is the difference in the expected value of pick #25 and pick #11. It’s bigger than the difference between pick #5 and pick #8. And remember, this is even after adjusting for the fact that teams—at least some teams—may already be taking age into consideration and drafting younger players earlier than they would otherwise. They clearly don’t take age into account enough.
Even a six-month difference is meaningful. The difference in value between a player born in, say, October and in April is the difference in value between the #100 pick and the #43 pick, or the difference between the #30 pick and the #18 pick.
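The pick comparisons above follow directly from the regression formula. Here is a minimal sketch that plugs the published coefficients into a function and checks each pair; the function names are my own, and the formula is taken at face value with age in years and PK as the overall pick number.

```python
import math

# Coefficients from the 1965-1996 regression quoted above.
INTERCEPT, AGE_COEF, PICK_COEF = 21.30, 1.17, 11.14


def expected_return(age_years, pick):
    """Expected Discounted WARP for a player of a given age and overall pick."""
    return INTERCEPT - AGE_COEF * age_years + PICK_COEF / math.sqrt(pick)


def pick_gap(earlier_pick, later_pick):
    """Expected-value gap between two picks for players of the same age."""
    return PICK_COEF / math.sqrt(earlier_pick) - PICK_COEF / math.sqrt(later_pick)


if __name__ == "__main__":
    print(f"One year of age:  {AGE_COEF:.2f}")
    print(f"#24 vs. #100:     {pick_gap(24, 100):.2f}")
    print(f"#11 vs. #25:      {pick_gap(11, 25):.2f}")
    print(f"#5 vs. #8:        {pick_gap(5, 8):.2f}")
    print(f"Six months:       {AGE_COEF / 2:.3f}")
    print(f"#43 vs. #100:     {pick_gap(43, 100):.3f}")
    print(f"#18 vs. #30:      {pick_gap(18, 30):.3f}")
    # A 17-year-old at #100 and an 18-year-old at #24 come out nearly equal.
    print(f"{expected_return(17, 100):.2f} vs. {expected_return(18, 24):.2f}")
```

The #24 versus #100 gap works out to about 1.16, close enough to the 1.17 age coefficient that the two players in the last line are separated by only about a hundredth of a win.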
It’s hard to overstate the importance of this. I can’t say that major league teams have ignored age completely when drafting players, but age has clearly been subordinate to present talent, and this study argues strongly that this has been a mistake. If Player A grades out slightly better than Player B, but Player B is 6 or 12 months younger than Player A, teams have been drafting Player A first, and they should have been drafting Player B.
As this data set ends with the 1996 draft, it is quite possible that the edge towards younger players has diminished if some teams have privately done their own research and realized the bonanza to be had in younger high school hitters. In order to study whether this was true or not, I performed an abbreviated study of high school hitters drafted from 1997 to 2003.
For this seven-year span, I calculated Discounted WARP in the same way as above, with the exception that I only looked at the first eight years after the draft (this way, even players drafted in 2003 had a full eight years of data through 2011). This is an incomplete measure of a player’s value, since we’re cutting off every player’s contribution after the age of 27, but it’s the best we can do at this point.
As I did with the data set from 1965–1996, I used linear regression to come up with a formula to estimate a player’s DW based on his pick number. That formula was:
XP = (6.39/SQRT(PK)) + .04
I then grouped the 176 players in this study into five groups by age, from the youngest 20 percent (those who were younger than 18 years, 15 days old) to the oldest 20 percent (those who were at least 18 years, 263 days old). Here are the results:
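| Age group | # Players | XP | DW | Return |
|---|---|---|---|---|
| Very Young | 35 | 49.10 | 64.55 | 31.47% |
| Young | 35 | 56.48 | 69.32 | 22.73% |
| Average | 35 | 40.32 | 50.22 | 24.55% |
| Old | 36 | 41.00 | 25.71 | -37.29% |
| Very Old | 35 | 40.09 | 17.19 | -57.12% |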
According to the data, it appears that the importance of a draft pick’s age has, in fact, changed over time… but not in the direction you’d expect: the advantage enjoyed by young players increased dramatically from 1997 to 2003. The average return from the youngest 20 percent of draft picks during this span was more than triple the return of the oldest 20 percent.
If those numbers are hard to wrap your mind around, let’s go back to looking at anecdotes. From 1997 to 2003, 22 high school hitters drafted in the Top 100 were at least 18 years, 293 days old. Just two of them reached the majors: Sergio Santos, who only made it after he converted from shortstop to reliever, and Jorge Padilla, who got 25 below-replacement-level at-bats for the Nationals in 2009, when he was 29 years old. None of the other 20 players sniffed the majors, including a #4 overall pick (Corey Myers).
Meanwhile, among the 22 youngest high school hitters drafted in that span were Daric Barton, Carl Crawford, Grady Sizemore, Adam Jones, and Brandon Phillips—none of whom were drafted in the top 25 picks. Crawford was taken #52, Phillips #57, and Sizemore (who, granted, got $2 million to sign) #75.
Here’s the data from 1997 to 2003 expressed in chart form:
The yellow dashed line is the best-fit line for all the players in the study. The median age of the players in the study was about 18.4 years old, so the players were split into two groups: those younger than the median (represented by green squares) and those older than the median (represented by blue x’s).
You don’t need linear regression to see that the green squares are floating to the top of the graph, while the blue x’s tend to hug the zero line. The best-fit green line for the younger players is dramatically higher than the best-fit (blue) line for the older players. (The blue x at the top of the chart, by the way, is David Wright, who at 18.45 years old was barely above the median age.)
Much like I did with the data from 1965 to 1996, I performed a linear regression for the 1997 to 2003 data that included a player’s draft status and his age as variables. The formula I got was this:
Expected Return = 19.96 – (1.08 * Age) + (5.97/SQRT(PK))
What you’ll notice is that the coefficient for a player’s draft pick number (5.97) is much lower than it was in the study from 1965 to 1996 (11.14). This isn’t surprising, because in the more recent data set, we’re only looking at how they played in the first eight years after they were drafted instead of the first 15, so their expected return should be lower.
But by comparison, the coefficient for a player’s age (1.08) is hardly changed from the previous formula (1.17). What that means is that, relative to where the player was drafted, his age had a significantly greater impact from 1997 to 2003 than it did from 1965 to 1996.
The data from 1965 to 1996 suggested that a player drafted #100 overall could be expected to perform as well as a player one year older who was drafted #24 overall. But from 1997 to 2003, the impact of age was so great that the 17-year-old player drafted #100 was as valuable as the 18-year-old player drafted #13 overall.
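Both equivalences can be recovered by solving each regression for the pick at which an older player matches a younger one. Setting the two expected returns equal and rearranging gives sqrt(PK_older) = c / (c / sqrt(PK_younger) + a * years), where a is the age coefficient and c is the pick coefficient. The sketch below applies that to both eras; `equivalent_pick` is a hypothetical helper name, not something from the study.

```python
import math

# (age coefficient, pick coefficient) from the two regressions quoted in the article.
ERAS = {
    "1965-1996": (1.17, 11.14),
    "1997-2003": (1.08, 5.97),
}


def equivalent_pick(younger_pick, years_younger, age_coef, pick_coef):
    """Pick at which a player `years_younger` years OLDER has the same
    expected return as the younger player taken at `younger_pick`."""
    # Pick-related value the older player needs: the younger player's
    # pick term plus the age bonus he has to make up.
    target = pick_coef / math.sqrt(younger_pick) + age_coef * years_younger
    return (pick_coef / target) ** 2


if __name__ == "__main__":
    for era, (age_coef, pick_coef) in ERAS.items():
        pick = equivalent_pick(100, 1.0, age_coef, pick_coef)
        print(f"{era}: a one-year-younger player at #100 matches pick #{pick:.1f}")
```

Because the pick coefficient roughly halved between the two data sets while the age coefficient barely moved, the same one-year gap buys a much larger jump up the draft board in the later period.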
The conclusion is clear: at least as recently as 2003, the baseball industry as a whole massively underrated the importance of age in drafting high school hitters and massively undervalued high school hitters who still needed their parents’ permission to sign their contract. While we simply don’t have enough data to evaluate more recent drafts, Mike Trout and Jason Heyward are two powerful data points in support of the notion that the advantage towards younger high school hitters in the draft is still there, and teams ignore it at their own peril.
Additional studies are needed to determine whether a similar edge towards younger players exists with pitchers or at the college level; if it does, it is almost certainly a smaller one. But even a smaller edge is worth exploiting. There are fewer and fewer market inefficiencies remaining in the post-Moneyball era, and they usually require a hell of a lot more research than simply finding out a player’s date of birth. Implementing this evidence into an organization’s draft preparation is free and painless and ought to have a significant impact on where players are selected.
In the 2011 draft, we had the rare circumstance where two highly touted high school players, drafted close together, were widely separated by date of birth. With the #5 overall pick, the Kansas City Royals took the first high school hitter in the draft, world-class tools goof Bubba Starling. Three picks later, the Cleveland Indians took the second high school hitter off the board, Florida shortstop Francisco Lindor.
Dozens of articles were written on Starling and Lindor leading up to the draft, but to the best of my knowledge, not one of them made mention of this simple fact: while Starling was born on August 3, 1992 (he actually turned 19 before the signing deadline), Lindor was born on November 14—November 14, 1993. Lindor is more than 15 months younger than Starling and will be younger at the end of next season than Starling was on the day he was drafted.
There are many reasons to think that Starling, despite his advanced age, will meet the formidable expectations placed on him. And speaking as a Royals fan, I hope he does. But if these numbers are even close to being correct, the younger player should have been drafted first. It wouldn’t be the first time.
An expanded version of this article will appear in the forthcoming book Extra Innings: More Baseball Between the Numbers from Baseball Prospectus.
"Secondly, we can now estimate to what degree teams should be drafting younger players higher than they already are. If Player A is exactly one year younger than Player B, and they were both selected with the same pick in the draft, Player A should be expected to return an additional 1.17 Discounted WARP over his career."
"That is a massive, massive impact."
Ok and also this one:
"It's hard to overstate the importance of this."
The latter two statements are based on what seem to be very large and therefore important differences in percentage above and beyond expected draft pick return. That method, imo, has significant disadvantages in that the expected returns are so low that very small differences lead to dramatic changes in return on investment, but on the playing field those differences are actually quite small.
And by quite small, I mean like 1.17 WARP over a career. Now I know someone will say at ~5M/WARP that's over 5M dollars and that's not small at all.
And that's fine, but only if we believe that WARP (or WAR or whatever) precisely measures player value down to two decimal points.
If I told you that you had to pick between two players' careers, and over the course of those careers they were separated by 1.17 WARP, would you still say that the differences were "massive" and "could not be overstated"?
I don't think so. Whether we're at the utility infielder level (4 vs 5.17 WARP) or the star level (60 vs 61.17 WARP), there just isn't that much difference. Those differences are not beyond the noise in the measurements themselves. It would be quite likely that BP, FanGraphs and BRef would be in disagreement as to who had the better career simply based on their individual implementation of their comprehensive win value stat.
Interesting study and I do think there is something real here, but the unfortunate dependence on percentages of very small expected production has led to small on field differences being magnified.
I think you missed a point about the discounting of the WARP in this analysis. A 1.17 Discounted WARP is different than 1.17 WARP. Though I agree with you that this appears to be a small difference over a 15 year career. Not sure what this would mean to actual WARP over that timeframe, but it's more (much more) than 1.17.
For the sake of the analogy below, let's assume that the actual difference in the players' WARP might be 2.4 (pulled this out of my hat, but go with me for a moment).
Let's say that knowledge of this inefficiency had impacted a team's draft several years ago, without other teams catching on to their draft strategy. Let's also posit that at any given time, 10 of the players on their roster had been drafted by the organization. Which specific ten players are on the roster may change over time as players leave for free agency, suffer injuries, etc. However, over the next fifteen years, and all else being equal, the aggregate of the 10 drafted players would produce 240 more wins than the competition.
That's an average advantage of 16 wins a year. I think that could be important at an organizational level.
Nice job, Rany. I love learning something new about baseball.
At the value of $5m per win, that's $80m value each year. That's a big chunk of a team's payroll.
To philly - if I understand this all correctly, the discounted WARP applies to each year of the respective players' careers. As Rany stated in Part 1, the gap closes a little bit each year, but the point is that, over the 15-year career that Rany is using as his framework, the younger player is enjoying a discounted WARP advantage each year.
That means the difference in quality - and return on investment - for the younger player is substantially greater than just 1.17 discounted WARP.
That being said, this study is very well done, and has some important implications. I fear that some implications may be overstated based on the nature of the dataset (most cases are near zero, with several considerably above the median).
Dang it!
...but the last table showing 1997 to 2003 high school players proves that the league has caught up to the idea of drafting high school batters for age. By your percentages, it looks like the XP and DW data were switched. You probably want to correct that.
#75 Rays, Granden Goetzman DOB: 11/14/92
#79 Cards, Charlie Tilson DOB: 12/2/92
#81 Redsox, Williams Jerez DOB: 5/16/92
#84 Reds, Gabriel Rosa DOB: 7/2/93
I know, I know. Sample size, US vs. PR schools, etc. But Rany's work here shines a new light on viewing drafts and creates another level of interest in how the drafts unfold. Thanks Rany.
And while I know this theory can't work every time, I can't help thinking of a fairly recent scenario where it doesn't hold up well.
In 2007, the 2nd and 3rd picks were Mike Moustakas and Josh Vitters, respectively - both high ceiling prep 3B. With some differences, yes, but as close to an apples to apples comparison as we're likely to get. Yet while Vitters was Very Young, Moustakas was Very Old. And I can't imagine there's any team that would prefer Vitters to Mous at this point - acknowledging that there's still some non-zero chance for Vitters to break out.
Offered in the spirit of constructive dialogue.
Obviously, it appears Moustakas was the better choice - although Vitters is still young enough to have a say in that. But again, this is a general principle, and there will always be specific exceptions.
(I'm writing about this specific point for my ranyontheroyals.com blog, by the way.)
I'm also not sure I follow the rationale for the discount factor selected. It seems to me that either it should be evaluated for the player's entire career or for the time the player remains under club control, since that is what the value of the draft pick is.
As I said, I may just not be following it completely.
The average may be a smooth progression, but it could be that the expectations generally hold true for 95% of the population and it's merely the types of outliers in each group that move the average.
But is that a bad thing? As Kevin Goldstein likes to remind us, teams aren't drafting for role players; they're drafting for stars. If the underlying reason behind these findings is simply that a younger player has a 5% higher chance of becoming a star player, that's reason enough to draft him, I think.
What I mean is that perhaps older players while having less average value also exhibit less variance in their value, they are more predictable. If younger players are more boom or bust, teams might fear completely missing on a Top 100 pick, and prefer high floor, lower ceiling players in the early rounds. Then in the later rounds when the cost of completely missing on a guy isn't as high, they go for the younger, boom or bust types.
You can't really judge the quality of returns without an assessment of the risk incurred. So, yes younger players offer higher returns on average, but perhaps it is simply because they are riskier, in which case there might not really be a market inefficiency. There's only a market inefficiency when you can show that you can generate better returns without taking on extra risk.
Great series, by the way. Am enjoying the read.
On a related point, I think there might be a sample-selection problem here as well. The first article stated that 10% of the draft picks were discarded because there was no DOB info for them. The hypothesis was that they flamed out too soon, so their careers didn't progress to the point where someone might care to note their DOB. Isn't it possible that the very young draft picks are more likely to flame out than the older picks, so that the part of the sample that was discarded was disproportionately young? If that's true, it would create a positive bias on the average return estimates for the young (i.e., make the average return for the young look larger than it really is).
I think there needs to be a qualification to usage of this data and you hit on it:
"I can't say that major league teams have ignored age completely when drafting players, but age has clearly been subordinate to present talent, and this study argues strongly that this has been a mistake."
Present talent. If a player's talent is requisite of his age, then this is not an issue. Is Bubba Starling's talent level equal to what it should be considering he is older than his competition? The fact that he is raw at the plate and already at an age at which some players are in full-season baseball has to make you question his future success.
There are so many angles this kind of study could expand into but the fact that there is statistical proof that confirms that age relative to competition is very important is huge.
Thank you.
If a younger-for-their-year player is good enough to be drafted, overcoming the bias towards older players in terms of quality/quantity of instruction received to that point, they will tend to perform better once the quality of development and instruction becomes equal for all.
With all of the studies that have been done on baseball, especially in the recent years, the fact that you have unearthed this significant of a market inefficiency speaks volumes to how much thought and work you have put into this study.
I can't wait to read more on this subject, and see how the post-2003 draft picks pan out.
If you are born in October and start school at age 5, you are by definition going to HAVE to be an old draft pick. But that certainly wouldn't mean that no one born in October can ever be a great ballplayer.
I remember reading years ago in one of Bill James' baseball Abstracts that the great ones make it to the majors when they are young. They might not set the league on fire at first, but they make it there young.
Mantle, Yount, Bonds, Griffey, etc.
So Hosmer may have been an "old" draft pick, but he made it young to the big leagues, and acquitted himself quite well. My hunch is that the latter will be much more predictive of his career path than the former.
The problem with players born in October is NOT that they are old draft picks. Actually, let me rephrase that: there is NO PROBLEM with players born in October.
The problem is that teams are drafting players born in October TOO EARLY. The problem is that they see a player born in October, and a player born the following May, and they don't account for the fact that the October player has 7 additional months of physical maturity.
The problem isn't that a player born in October can't be any good, or can't even be worth a #1 overall pick. Eric Hosmer is an October player, and obviously that pick looks great right now. The problem is that teams are drafting TOO MANY October players, and not enough May players, in the early rounds of the draft.
The age at which a player reaches the major leagues is more predictive than the age at which he is drafted? Of course. But on draft day, teams don't have the luxury of knowing which high school players are going to make the majors within 3 years. Looking at a player's date of birth is another tool teams should use to maximize their chances of finding those players.
Risk - Overall, it seems to me that you're lacking a risk measurement (variance comes to mind). Are the younger players more varied in their output? It doesn't seem so, but this would be important.
Cost - Does it cost more to sign & develop these guys? Do fewer of them sign?
Draft position as a measure of consensus value estimate - We all know that some kids fall in the draft because their demands are known to be extravagant, or whatever. Possibly the young sample is overweight on guys who fall due to signability and/or get way above-slot bonuses? I guess that ties back into the cost question, above. And maybe I should read your 2005 article.
Is capping the downside at zero unrealistic? Probably, but given the scant data you'd have to work with in terms of the resources spent on a player before he's out of baseball, I don't know what else you'd do here, exactly...but the absence of any cost/risk in the calculation could be distorting.
Discount rate...I think 8 is probably too high, but on the other hand you should probably cut the analysis off before year 15, when player salaries are set at market rates after 8-10? years. There may be great production in year 12, but you're likely to have paid through the nose for it.
Super article.