October 14, 2011
Doctoring The Numbers
Starting Them Young, Part Two
Yesterday’s column made the claim that small differences in age among high school hitters can have a dramatic impact on their return as draft picks. Today, I intend to prove that claim.
Rather than simply looking at the youngest and oldest players in each draft year, I’ve taken all 846 players in the draft study and separated them by age into five roughly equal bins: Very Young, Young, Average, Old, and Very Old. I then calculated the combined expected value of the players in each bin based on where they were drafted and the combined Discounted WARP that they actually generated.
(If you want the technical details: “Very Young” players were less than 17 years, 296 days old on draft day; “Young” players were between 17 years, 296 days and 18 years, 38 days; “Average” players were between 18 years, 38 days and 18 years, 120 days; “Old” players were between 18 years, 120 days and 18 years, 200 days; “Very Old” players were more than 18 years, 200 days old.)
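For readers who want to reproduce the binning, here is a minimal Python sketch of the age cutoffs listed above. The function name `age_bin` and the handling of a player whose age falls exactly on a cutoff are assumptions made for illustration; the study does not specify either.

```python
from bisect import bisect_right

# Cutoffs from the text, expressed as (years, days) on draft day.
CUTOFFS = [(17, 296), (18, 38), (18, 120), (18, 200)]
BINS = ["Very Young", "Young", "Average", "Old", "Very Old"]


def age_bin(years, days):
    """Return the age bin for a draft-day age given as whole years plus days."""
    # Tuples compare years first and then days, so no leap-year math is needed.
    # Assumption: an age exactly on a cutoff is placed in the older bin.
    return BINS[bisect_right(CUTOFFS, (years, days))]


print(age_bin(17, 200))  # falls below the first cutoff
print(age_bin(18, 100))  # falls between the second and third cutoffs
```

Keeping the age as a (years, days) pair mirrors how the cutoffs are stated and avoids converting to a decimal age.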
Here are the results:
And here it is in graph form:
As you can see, there is an almost shockingly smooth progression in the data. Very Young players, as a whole, return 25 percent more value than expected by their draft slots. Young and Average players also return positive value, whereas Old and Very Old players return substantially less value than expected. The gap between the youngest and oldest groups of players isn’t quite as large by this method as what I measured in Part 1 yesterday—the youngest group returns about 86 percent more value than the oldest group as opposed to 117 percent. It’s still an enormous difference.
This difference does not appear to have changed over the years. Here’s the same data as above but limited to players drafted in the first 16 years of our study, from 1965 to 1980:
And here’s the data from players drafted in the last 16 years of our study, from 1981 to 1996:
The data in each half of the study is not quite as smooth, which isn’t surprising given that the sample size is half as large. In the first half, Average players are a better value than Young players, while in the second half, Average players return less value than Old players. But otherwise, both halves of the data show the same thing: the younger the player, the better the return on investment.
And if you compare the Very Young players with the Very Old players, you’ll notice that the advantage enjoyed by the youngest set of players is greater in each half of the data than in the data as a whole. From 1965 to 1980, Very Young players return 111 percent more value than Very Old players; from 1981 to 1996 they return 103 percent more value.
That seems counterintuitive, that the advantage enjoyed by young players is greater in each half than in the study as a whole. But there’s a reason for this, which is something that isn’t very well known: high school players are getting older over time. This isn’t something limited to baseball; a better way of putting it is that high school students are getting older over time. There is a societal trend towards holding back children from starting school early. Whereas 40 years ago, parents frequently tried to get their soon-to-be-five-year-old child—whose birthday might be in October or November—into kindergarten, today’s parents frequently will hold back their five-year-old—whose birthday might fall in July or August—until the following year. There is a growing belief in educational circles—the data on this is controversial—that kids who are among the oldest in their class do better academically than those who fall on the youngest end of the spectrum.
And in fact, the average high school player drafted in 2010 is roughly three months older than the average high school player drafted in 1965. (Many thanks to Diane Firstman for her help in mining that data.) When I broke the data into two halves above, the age cutoffs for Very Young, Young, etc., were about six weeks higher for the draft group from 1981 to 1996 than they were for the draft group from 1965 to 1980.
This is why, I believe, the results we get from pooling the data for all players from 1965 to 1996, without regard to the year they were drafted, might actually underestimate the advantage younger players have. Derek Jeter and Jason Kendall would not have ranked among the 10 youngest high school hitters from 1965. But when teams are drafting, what matters is the draft pool in front of them, and both Jeter and Kendall were among the five youngest high school hitters in 1992. As draft classes get older as a whole, the youngest players in each class get older as well, but so do the oldest players, giving the youngest players the same age advantage they’ve always had.
Ultimately, we’re splitting hairs here. We can safely say that the youngest 20 percent of high school hitters in any particular year will return, on average, about double what the oldest 20 percent of high school hitters will.
If you would prefer a graphical, as opposed to mathematical, view of the data, this graph takes the scatter plot from yesterday’s article but separates players into two groups—players who were still 17 on draft day vs. players who were 18 or older:
There are two best-fit lines on the graph, and the red one—corresponding to 17-year-olds—tracks significantly above the black one. You’ll probably notice that while only 33 percent of the players in the study were still 17, a preponderance of the red dots (indicating the 17-year-old players) sit above the best-fit lines.
We can sum up all the data above by performing a second linear regression, this time making a player’s age a variable along with his pick number. If we do so, here is the formula we get:
Expected Return = 21.30 – (1.17 * Age) + (11.14/SQRT(PK))
First off, the p-value for the age variable is very low at just .0063. This means that there is less than a one percent chance that we would get data like this if there weren’t an actual correlation between age and expected return. This is a statistically significant result.
Secondly, we can now estimate to what degree teams should be drafting younger players higher than they already are. If Player A is exactly one year younger than Player B, and they were both selected with the same pick in the draft, Player A should be expected to return an additional 1.17 Discounted WARP over his career. Because the value of draft picks does not go down in a linear fashion, we can’t say that one year of age is worth exactly X number of picks in the draft—X changes depending on where you are in the draft.
We can say that, using the above formula, 1.17 Discounted WARP is roughly the difference between the expected values of picks #24 and #100. In other words, a 17-year-old player drafted #100 overall has as much expected value as an 18-year-old drafted #24. If a player who might look like a third-round pick on talent alone happens to be a full year younger than his draft class, he ought to be considered a late-first-round pick.
That is a massive, massive impact. One year of age is the difference in the expected value of pick #25 and pick #11. It’s bigger than the difference between pick #5 and pick #8. And remember, this is even after adjusting for the fact that teams—at least some teams—may already be taking age into consideration and drafting younger players earlier than they would otherwise. They clearly don’t take age into account enough.
Even a six-month difference is meaningful. The difference in value between a player born in, say, October and in April is the difference in value between the #100 pick and the #43 pick, or the difference between the #30 pick and the #18 pick.
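The pick equivalences quoted above can be checked directly from the regression formula. The sketch below is only an illustration: the function names `expected_return` and `equivalent_pick` are made up here, and the coefficients are copied from the formula in the text.

```python
import math

# Coefficients from the 1965-1996 regression quoted in the text.
INTERCEPT = 21.30
AGE_COEF = 1.17    # Discounted WARP lost per additional year of age
PICK_COEF = 11.14  # multiplier on 1 / sqrt(pick number)


def expected_return(age, pick):
    """Expected Discounted WARP for a player of a given age and pick number."""
    return INTERCEPT - AGE_COEF * age + PICK_COEF / math.sqrt(pick)


def equivalent_pick(pick, years_younger):
    """Pick at which a player `years_younger` years older has equal expected value."""
    # The younger player's age bonus is added to his pick term, then the
    # 1 / sqrt(pick) relationship is inverted to recover a pick number.
    target = PICK_COEF / math.sqrt(pick) + AGE_COEF * years_younger
    return (PICK_COEF / target) ** 2


for pick, gap in [(100, 1.0), (25, 1.0), (100, 0.5), (30, 0.5)]:
    print(f"pick #{pick}, {gap} years younger -> about pick #{equivalent_pick(pick, gap):.1f}")
```

Rounded to whole picks, these come out to #24, #11, #43, and #18, matching the figures above.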
It’s hard to overstate the importance of this. I can’t say that major league teams have ignored age completely when drafting players, but age has clearly been subordinate to present talent, and this study argues strongly that this has been a mistake. If Player A grades out slightly better than Player B, but Player B is 6 or 12 months younger than Player A, teams have been drafting Player A first, and they should have been drafting Player B.
As this data set ends with the 1996 draft, it is quite possible that the edge towards younger players has diminished if some teams have privately done their own research and realized the bonanza to be had in younger high school hitters. In order to study whether this was true or not, I performed an abbreviated study of high school hitters drafted from 1997 to 2003.
For this seven-year span, I calculated Discounted WARP in the same way as above, with the exception that I only looked at the first eight years after the draft (this way, even players drafted in 2003 had a full eight years of data through 2011). This is an incomplete measure of a player’s value, since we’re cutting off every player’s contribution after the age of 27, but it’s the best we can do at this point.
As I did with the data set from 1965–1996, I used linear regression to come up with a formula to estimate a player’s DW based on his pick number. That formula was:
XP = (6.39/SQRT(PK)) + .04
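As a rough sketch of how this formula can be used, the Python below computes the expected DW for a pick and compares a group's combined actual DW with its combined expected DW. The function names and the sample group are hypothetical, and dividing total actual by total expected is an assumption about how a group's return relative to expectation would be computed.

```python
import math


def expected_dw(pick):
    """Expected eight-year Discounted WARP for a pick number (1997-2003 formula)."""
    return 6.39 / math.sqrt(pick) + 0.04


def return_ratio(players):
    """Combined actual DW divided by combined expected DW.

    `players` is a list of (pick_number, actual_dw) pairs.
    """
    total_actual = sum(actual for _, actual in players)
    total_expected = sum(expected_dw(pick) for pick, _ in players)
    return total_actual / total_expected


# Made-up group for illustration only; these are not players from the study.
sample_group = [(10, 3.0), (50, 0.0), (90, 1.5)]
print(f"expected DW at pick #1: {expected_dw(1):.2f}")
print(f"sample group return ratio: {return_ratio(sample_group):.2f}")
```

A ratio above 1.0 would mean the group returned more than its draft slots predicted, and a ratio below 1.0 would mean it returned less.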
I then grouped the 176 players in this study into five groups by age, from the youngest 20 percent (those who were younger than 18 years, 15 days old) to the oldest 20 percent (those who were at least 18 years, 263 days old). Here are the results:
According to the data, it appears that the importance of a draft pick’s age has, in fact, changed over time… but not in the direction you’d expect: the advantage enjoyed by young players increased dramatically from 1997 to 2003. The average return from the youngest 20 percent of draft picks during this span was more than triple the return of the oldest 20 percent.
If those numbers are hard to wrap your mind around, let’s go back to looking at anecdotes. From 1997 to 2003, 22 high school hitters drafted in the Top 100 were at least 18 years, 293 days old. Just two of them reached the majors: Sergio Santos, who only made it after he converted from shortstop to reliever, and Jorge Padilla, who got 25 below-replacement-level at-bats for the Nationals in 2009, when he was 29 years old. None of the other 20 players sniffed the majors, including a #4 overall pick (Corey Myers).
Meanwhile, among the 22 youngest high school hitters drafted in that span were Daric Barton, Carl Crawford, Grady Sizemore, Adam Jones, and Brandon Phillips—none of whom were drafted in the top 25 picks. Crawford was taken #52, Phillips #57, and Sizemore (who, granted, got $2 million to sign) #75.
Here’s the data from 1997 to 2003 expressed in chart form:
The yellow dashed line is the best-fit line for all the players in the study. The median age of the players in the study was about 18.4 years old, so the players were split into two groups: those younger than the median (represented by green squares) and those older than the median (represented by blue x’s).
You don’t need linear regression to see that the green squares are floating to the top of the graph, while the blue x’s tend to hug the zero line. The best-fit green line for the younger players is dramatically higher than the best-fit (blue) line for the older players. (The blue x at the top of the chart, by the way, is David Wright, who at 18.45 years old was barely above the median age.)
Much like I did with the data from 1965 to 1996, I performed a linear regression for the 1997 to 2003 data that included a player’s draft status and his age as variables. The formula I got was this:
Expected Return = 19.96 – (1.08 * Age) + (5.97/SQRT(PK))
What you’ll notice is that the coefficient for a player’s draft pick number (5.97) is much lower than it was in the study from 1965 to 1996 (11.14). This isn’t surprising: in the more recent data set, we’re only looking at how players performed in the first eight years after they were drafted instead of the first 15, so their expected return should be lower.
But by comparison, the coefficient for a player’s age (1.08) is hardly changed from the previous formula (1.17). What that means is that, relative to where the player was drafted, his age had a significantly greater impact from 1997-2003 than it did from 1965-1996.
The data from 1965 to 1996 suggested that a player drafted #100 overall could be expected to perform as well as a player one year older who was drafted #24 overall. But from 1997 to 2003, the impact of age was so great that the 17-year-old player drafted #100 was as valuable as the 18-year-old player drafted #13 overall.
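One way to see why the same one-year age gap is worth more picks in the later period is to put the two sets of coefficients side by side. The sketch below is illustrative only; the `ERAS` table and the function name are assumptions, with the coefficients taken from the two formulas in the text.

```python
import math

# (age coefficient, pick coefficient) from the two regressions in the text.
ERAS = {
    "1965-1996": (1.17, 11.14),
    "1997-2003": (1.08, 5.97),
}


def equivalent_pick(pick, years_younger, age_coef, pick_coef):
    """Pick at which a player `years_younger` years older has equal expected value."""
    target = pick_coef / math.sqrt(pick) + age_coef * years_younger
    return (pick_coef / target) ** 2


for era, (age_coef, pick_coef) in ERAS.items():
    # The age coefficient relative to the pick coefficient shows how much
    # weight a year of age carries compared with draft position.
    ratio = age_coef / pick_coef
    pick = equivalent_pick(100, 1.0, age_coef, pick_coef)
    print(f"{era}: age/pick ratio {ratio:.3f}, #100 pick one year younger ~ pick #{pick:.1f}")
```

The ratio of the age coefficient to the pick coefficient is larger in the 1997-2003 formula, which is why the #100 pick moves up to roughly #13 rather than #24.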
The conclusion is clear: at least as recently as 2003, the baseball industry as a whole massively underrated the importance of age in drafting high school hitters and massively undervalued high school hitters who still needed their parents’ permission to sign their contract. While we simply don’t have enough data to evaluate more recent drafts, Mike Trout and Jason Heyward are two powerful data points in support of the notion that the advantage towards younger high school hitters in the draft is still there, and teams ignore it at their own peril.
Additional studies are needed to determine whether a similar edge towards younger players exists with pitchers or at the college level; if it does, it is almost certainly a smaller one. But even a smaller edge is worth exploiting. There are fewer and fewer market inefficiencies remaining in the post-Moneyball era, and they usually require a hell of a lot more research than simply finding out a player’s date of birth. Implementing this evidence into an organization’s draft preparation is free and painless and ought to have a significant impact on where players are selected.
In the 2011 draft, we had the rare circumstance where two highly touted high school players, drafted close together, were widely separated by date of birth. With the #5 overall pick, the Kansas City Royals took the first high school hitter in the draft, world-class tools goof Bubba Starling. Three picks later, the Cleveland Indians took the second high school hitter off the board, Florida shortstop Francisco Lindor.
Dozens of articles were written on Starling and Lindor leading up to the draft, but to the best of my knowledge, not one of them made mention of this simple fact: while Starling was born on August 3, 1992 (he actually turned 19 before the signing deadline), Lindor was born on November 14—November 14, 1993. Lindor is more than 15 months younger than Starling and will be younger at the end of next season than Starling was on the day he was drafted.
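The age gap between the two players is easy to verify from the birth dates given above. This is a small sketch using Python's standard library; the helper name `whole_months_between` is made up for illustration.

```python
from datetime import date

# Birth dates as given in the text.
starling_born = date(1992, 8, 3)
lindor_born = date(1993, 11, 14)


def whole_months_between(earlier, later):
    """Number of complete calendar months from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # If the day of the month has not been reached, the last month is incomplete.
    if later.day < earlier.day:
        months -= 1
    return months


gap_days = (lindor_born - starling_born).days
gap_months = whole_months_between(starling_born, lindor_born)
print(f"Lindor is {gap_days} days ({gap_months} full months) younger than Starling")
```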
There are many reasons to think that Starling, despite his advanced age, will meet the formidable expectations placed on him. And speaking as a Royals fan, I hope he does. But if these numbers are even close to being correct, the younger player should have been drafted first. It wouldn’t be the first time.
An expanded version of this article will appear in the forthcoming book Extra Innings: More Baseball Between the Numbers from Baseball Prospectus.