

October 14, 2011
Doctoring The Numbers
Starting Them Young, Part Two
Yesterday’s column made the claim that small differences in age among high school hitters can have a dramatic impact on their return as draft picks. Today, I intend to prove that claim.
Rather than simply looking at the youngest and oldest players in each draft year, I’ve taken all 846 players in the draft study and separated them by age into five roughly equal bins: Very Young, Young, Average, Old, and Very Old. I then calculated the combined expected value of the players in each bin based on where they were drafted and the combined Discounted WARP that they actually generated. (If you want the technical details: “Very Young” players were less than 17 years, 296 days old on draft day; “Young” players were between 17 years, 296 days and 18 years, 38 days; “Average” players were between 18 years, 38 days and 18 years, 120 days; “Old” players were between 18 years, 120 days and 18 years, 200 days; “Very Old” players were more than 18 years, 200 days old.) Here are the results:
And here it is in graph form:
As you can see, there is an almost shockingly smooth progression in the data. Very Young players, as a whole, return 25 percent more value than expected by their draft slots. Young and Average players also return positive value, whereas Old and Very Old players return substantially less value than expected. The gap between the youngest and oldest groups of players isn’t quite as large by this method as what I measured in Part 1 yesterday—the youngest group returns about 86 percent more value than the oldest group as opposed to 117 percent. It’s still an enormous difference. This difference does not appear to have changed over the years. Here’s the same data as above but limited to players drafted in the first 16 years of our study, from 1965 to 1980:
And here’s the data from players drafted in the last 16 years of our study, from 1981 to 1996:
The data in each half of the study is not quite as smooth, which isn’t surprising given that the sample size is half as large. In the first half, Average players are a better value than Young players, while in the second, Average players return less value than Old players. But otherwise, both halves of the data show the same thing: the younger the player, the better the return on investment. And if you compare the Very Young players with the Very Old players, you’ll notice that the advantage enjoyed by the youngest set of players is greater in each half of the data than in the data as a whole. From 1965 to 1980, Very Young players return 111 percent more value than Very Old players; from 1981 to 1996 they return 103 percent more value.
It seems counterintuitive that the advantage enjoyed by young players is greater in each half than in the study as a whole. But there’s a reason for this, which is something that isn’t very well known: high school players are getting older over time. This isn’t something limited to baseball; a better way of putting it is that high school students are getting older over time. There is a societal trend towards holding back children from starting school early. Whereas 40 years ago, parents frequently tried to get their soon-to-be-five-year-old child (whose birthday might be in October or November) into kindergarten, today’s parents frequently will hold back their five-year-old (whose birthday might fall in July or August) until the following year. There is a growing belief in educational circles, though the data on this is controversial, that kids who are among the oldest in their class do better academically than those who fall on the youngest end of the spectrum. And in fact, the average high school player drafted in 2010 is roughly three months older than the average high school player drafted in 1965 (many thanks to Diane Firstman for her help in mining that data).
When I broke the data into two halves above, the age cutoffs for Very Young, Young, etc., were about six weeks higher for the draft group from 1981 to 1996 than they were for the draft group from 1965 to 1980. This is why, I believe, the results we get from pooling the data for all players from 1965 to 1996, without regard to the year they were drafted, might actually underestimate the advantage younger players have. Derek Jeter and Jason Kendall would not have ranked among the 10 youngest high school hitters from 1965. But when teams are drafting, what matters is the draft pool in front of them, and both Jeter and Kendall were among the five youngest high school hitters in 1993. As draft classes get older as a whole, the youngest players in each class get older as well, but so do the oldest players, giving the youngest players the same age advantage they've always had.
Ultimately, we’re splitting hairs here. We can safely say that the youngest 20 percent of high school hitters in any particular year will return, on average, about double what the oldest 20 percent of high school hitters will. If you would prefer a graphical, as opposed to mathematical, view of the data, this graph takes the scatter plot from yesterday’s article but separates players into two groups: players who were still 17 on draft day vs. players who were 18 or older:
There are two best-fit lines on the graph, and the red one, corresponding to 17-year-olds, tracks significantly above the black one. You’ll probably notice that while only 33 percent of the players in the study were still 17, a preponderance of the red dots (indicating the 17-year-old players) sit above the best-fit lines.
We can sum up all the data above by performing a second linear regression, this time making a player’s age a variable along with his pick number. If we do so, here is the formula we get:
Expected Return = 21.30 - (1.17 * Age) + (11.14/SQRT(PK))
First off, the p-value for the age variable is very low at just .0063. This means that there is less than a one percent chance that we would get data like this if there weren’t an actual correlation between age and expected return. This is a statistically significant result.
Secondly, we can now estimate to what degree teams should be drafting younger players higher than they already are. If Player A is exactly one year younger than Player B, and they were both selected with the same pick in the draft, Player A should be expected to return an additional 1.17 Discounted WARP over his career. Because the value of draft picks does not go down in a linear fashion, we can’t say that one year of age is worth exactly X number of picks in the draft; X changes depending on where you are in the draft. We can say that, using the above formula, 1.17 Discounted WARP is roughly the difference between the expected values of picks #24 and #100. In other words, a 17-year-old player drafted #100 overall has as much expected value as an 18-year-old drafted #24. If a player who might look like a third-round pick on talent alone happens to be a full year younger than his draft class, he ought to be considered a late-first-round pick.
That is a massive, massive impact. One year of age is the difference in the expected value of pick #25 and pick #11. It’s bigger than the difference between pick #5 and pick #8.
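The pick comparisons above can be checked with a few lines of Python. This is only a sketch built from the coefficients quoted in the formula; the function names are made up for illustration, and it assumes Age is measured in years and PK is the overall pick number.

```python
import math

# Coefficients quoted in the article for the 1965-1996 regression.
INTERCEPT = 21.30
AGE_COEF = 1.17    # Discounted WARP lost per additional year of age
PICK_COEF = 11.14  # numerator of the pick term, PICK_COEF / sqrt(pick)


def expected_return(age_years: float, pick: int) -> float:
    """Expected Discounted WARP (assumes age is expressed in years)."""
    return INTERCEPT - AGE_COEF * age_years + PICK_COEF / math.sqrt(pick)


def pick_gap(earlier_pick: int, later_pick: int) -> float:
    """Difference in expected value between two picks for players of equal age."""
    return PICK_COEF / math.sqrt(earlier_pick) - PICK_COEF / math.sqrt(later_pick)


# Compare each gap with the 1.17 Discounted WARP attributed to one year of age.
for earlier, later in [(24, 100), (11, 25), (5, 8)]:
    print(f"#{earlier} vs #{later}: {pick_gap(earlier, later):.2f} Discounted WARP")

# A 17-year-old at #100 and an 18-year-old at #24 should come out nearly equal.
print(f"17-year-old at #100: {expected_return(17, 100):.2f}")
print(f"18-year-old at #24:  {expected_return(18, 24):.2f}")
```

The three gaps work out to roughly 1.16, 1.13, and 1.04 Discounted WARP, which is why a single year of age (1.17) matches or exceeds each of them.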
And remember, this is even after adjusting for the fact that teams, or at least some teams, may already be taking age into consideration and drafting younger players earlier than they would otherwise. They clearly don’t take age into account enough.
Even a six-month difference is meaningful. The difference in value between a player born in, say, October and in April is the difference in value between the #100 pick and the #43 pick, or the difference between the #30 pick and the #18 pick.
It’s hard to overstate the importance of this. I can’t say that major league teams have ignored age completely when drafting players, but age has clearly been subordinate to present talent, and this study argues strongly that this has been a mistake. If Player A grades out slightly better than Player B, but Player B is 6 or 12 months younger than Player A, teams have been drafting Player A first, and they should have been drafting Player B.
As this data set ends with the 1996 draft, it is quite possible that the edge towards younger players has diminished if some teams have privately done their own research and realized the bonanza to be had in younger high school hitters. In order to study whether this was true or not, I performed an abbreviated study of high school hitters drafted from 1997 to 2003. For this seven-year span, I calculated Discounted WARP in the same way as above, with the exception that I only looked at the first eight years after the draft (this way, even players drafted in 2003 had a full eight years of data through 2011). This is an incomplete measure of a player’s value, since we’re cutting off every player’s contribution after the age of 27, but it’s the best we can do at this point. As I did with the data set from 1965–1996, I used linear regression to come up with a formula to estimate a player’s DW based on his pick number.
That formula was:
XP = (6.39/SQRT(PK)) + .04
I then grouped the 176 players in this study into five groups by age, from the youngest 20 percent (those who were younger than 18 years, 15 days old) to the oldest 20 percent (those who were at least 18 years, 263 days old). Here are the results:
According to the data, it appears that the importance of a draft pick’s age has, in fact, changed over time… but not in the direction you’d expect: the advantage enjoyed by young players increased dramatically from 1997 to 2003. The average return from the youngest 20 percent of draft picks during this span was more than triple the return of the oldest 20 percent.
If those numbers are hard to wrap your mind around, let’s go back to looking at anecdotes. From 1997 to 2003, 22 high school hitters drafted in the Top 100 were at least 18 years, 293 days old. Just two of them reached the majors: Sergio Santos, who only made it after he converted from shortstop to reliever, and Jorge Padilla, who got 25 below-replacement-level at-bats for the Nationals in 2009, when he was 29 years old. None of the other 20 players sniffed the majors, including a #4 overall pick (Corey Myers).
Meanwhile, among the 22 youngest high school hitters drafted in that span were Daric Barton, Carl Crawford, Grady Sizemore, Adam Jones, and Brandon Phillips, none of whom were drafted in the top 25 picks. Crawford was taken #52, Phillips #57, and Sizemore (who, granted, got $2 million to sign) #75. Here’s the data from 1997 to 2003 expressed in chart form:
The yellow dashed line is the best-fit line for all the players in the study. The median age of the players in the study was about 18.4 years old, so the players were split into two groups: those younger than the median (represented by green squares) and those older than the median (represented by blue x’s). You don’t need linear regression to see that the green squares are floating to the top of the graph, while the blue x’s tend to hug the zero line. The best-fit green line for the younger players is dramatically higher than the best-fit (blue) line for the older players. (The blue x at the top of the chart, by the way, is David Wright, who at 18.45 years old was barely above the median age.)
Much like I did with the data from 1965 to 1996, I performed a linear regression for the 1997 to 2003 data that included a player’s draft status and his age as variables. The formula I got was this:
Expected Return = 19.96 - (1.08 * Age) + (5.97/SQRT(PK))
What you’ll notice is that the coefficient for a player’s draft pick number (5.97) is much lower than it was in the study from 1965 to 1996 (11.14). This isn’t surprising, because in the more recent data set, we’re only looking at how they played in the first eight years after they were drafted instead of the first 15, so their expected return should be lower. But by comparison, the coefficient for a player’s age (1.08) is hardly changed from the previous formula (1.17). What that means is that, relative to where the player was drafted, his age had a significantly greater impact from 1997–2003 than it did from 1965–1996.
The data from 1965 to 1996 suggested that a player drafted #100 overall could be expected to perform as well as a player one year older who was drafted #24 overall. But from 1997 to 2003, the impact of age was so great that the 17-year-old player drafted #100 was as valuable as the 18-year-old player drafted #13 overall.
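To see where #24 and #13 come from, one can invert the pick term of each formula: set the older player's pick value equal to the younger player's pick value plus the age bonus, then solve for the pick number. The sketch below does this for both eras; the class and method names are hypothetical, and only the coefficients come from the two formulas quoted in the article.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DraftModel:
    """Age and pick coefficients from one of the article's regressions."""
    age_coef: float   # Discounted WARP lost per additional year of age
    pick_coef: float  # numerator of the pick term, pick_coef / sqrt(pick)

    def equivalent_pick(self, pick: int, years_younger: float) -> float:
        """Pick at which an older player matches a younger player taken at `pick`."""
        # Value the younger player carries: his pick value plus the age bonus.
        target = self.pick_coef / math.sqrt(pick) + self.age_coef * years_younger
        # Solve pick_coef / sqrt(x) = target for x.
        return (self.pick_coef / target) ** 2


MODEL_1965_1996 = DraftModel(age_coef=1.17, pick_coef=11.14)
MODEL_1997_2003 = DraftModel(age_coef=1.08, pick_coef=5.97)

for label, model in [("1965-1996", MODEL_1965_1996), ("1997-2003", MODEL_1997_2003)]:
    pick = model.equivalent_pick(100, 1.0)
    print(f"{label}: one year younger at #100 is worth about pick #{pick:.1f}")
```

The intercepts cancel out of the comparison, so only the ratio of the age coefficient to the pick coefficient matters. That ratio is what grew between the two periods.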
The conclusion is clear: at least as recently as 2003, the baseball industry as a whole massively underrated the importance of age in drafting high school hitters and massively undervalued high school hitters who still needed their parents’ permission to sign their contract. While we simply don’t have enough data to evaluate more recent drafts, Mike Trout and Jason Heyward are two powerful data points in support of the notion that the advantage towards younger high school hitters in the draft is still there, and teams ignore it at their own peril.
Additional studies are needed to determine whether a similar edge towards younger players exists with pitchers or at the college level; if it does, it is almost certainly a smaller one. But even a smaller edge is worth exploiting. There are fewer and fewer market inefficiencies remaining in the post-Moneyball era, and they usually require a hell of a lot more research than simply finding out a player’s date of birth. Implementing this evidence into an organization’s draft preparation is free and painless and ought to have a significant impact on where players are selected.
In the 2011 draft, we had the rare circumstance where two highly touted high school players, drafted close together, were widely separated by date of birth. With the #5 overall pick, the Kansas City Royals took the first high school hitter in the draft, world-class tools goof Bubba Starling. Three picks later, the Cleveland Indians took the second high school hitter off the board, Florida shortstop Francisco Lindor. Dozens of articles were written on Starling and Lindor leading up to the draft, but to the best of my knowledge, not one of them made mention of this simple fact: while Starling was born on August 3, 1992 (he actually turned 19 before the signing deadline), Lindor was born on November 14. November 14, 1993.
Lindor is more than 15 months younger than Starling and will be younger at the end of next season than Starling was on the day he was drafted. There are many reasons to think that Starling, despite his advanced age, will meet the formidable expectations placed on him. And speaking as a Royals fan, I hope he does. But if these numbers are even close to being correct, the younger player should have been drafted first. It wouldn’t be the first time. An expanded version of this article will appear in the forthcoming book Extra Innings: More Baseball Between the Numbers from Baseball Prospectus.
Rany Jazayerli is an author of Baseball Prospectus.
39 comments have been left for this article.
philly (1628) Very interesting study and something that I've noticed anecdotally over the years, but let me pick out two statements that seem quite contradictory to me. Oct 14, 2011 04:25 AM
BurrRutledge (18981) A couple thoughts: Oct 14, 2011 05:26 AM
BurrRutledge (18981) Of course, I just magnified the effect by 10 in my caffeine-deprived state. A 2.4 actual WARP x 10 players over 15 years would naturally translate to 1.6 win difference each year. While not as bowl-you-over important as what I wrote above, that's still an important impact. Oct 14, 2011 05:36 AM
Adrian (23655) Awesome set of studies, Rany - really great stuff and a fascinating read. Oct 14, 2011 04:45 AM
SaberTJ (10045) More amazing stuff, Rany. I am now even more pumped my Indians drafted Lindor! Oct 14, 2011 05:46 AM
ScottyB (23917) It strikes me that, since most of the value we are attributing to younger players is a result of a handful of them having MASSIVE careers (Griffey, etc), that statistical methods comparing the means among age groupings may not be the best way to quantify this. Perhaps some logistic regression or some other transformation would smooth out the data (or also conducting some outlier analyses). Oct 14, 2011 05:57 AM
Shawn (17220) It's possible that I'm missing something... Oct 14, 2011 06:59 AM
PepeShady (37241) The Lindor/Starling bit will be interesting to see play out. But I wonder if we'll have another neat little case study in the four CFs drafted late 2nd round this year.... Oct 14, 2011 08:11 AM
Cromulent (32088) I love this set of articles - some of my favorite pieces ever on BP, easily. Oct 14, 2011 09:31 AM
Funny you should mention that - the whole Moustakas/Vitters choice was the seed for this hypothesis in my head 4 years ago. The Royals were going to take Vitters until the morning of the draft, and I thought it was strange that in the public discussion of which player, the fact that Vitters was almost exactly one year younger than Moustakas never came up. Oct 14, 2011 09:38 AM
timber (61526) Curious question for the Royals fan in you, Rany: If all this leads you to concerns about Bubba Starling, how does it make you feel about Eric Hosmer, who was also Very Old for his draft class? Oct 14, 2011 08:19 AM
kantsipr (1382) Help me out here, because I'm not sure I understand all the statistical interactions. First, in part 1, you wrote, "I also “zeroed out” any seasons in which a player generated negative WARP. Given that most draft picks don’t reach the major leagues at all, it would be misleading to penalize a player who was good enough to reach the majors for having a negative-WARP season, relative to a player who might never have gotten out of rookie ball." This seems to imply that anyone who didn't make the majors just got zero WARP? If I'm following what you did correctly, the conclusion of the study is more that players who were young relative to their peers and who made it to the majors performed better than those peers once they started to be productive. That's not quite as strong a statement. Did they take longer to adjust once they made it to the majors? It seems like this produces something of a systematic bias. Oct 14, 2011 08:29 AM
IvanGrushenko (45528) It amuses me that an 18 year old can be called "very old". Oh, and this is the most awesome article I've seen here in years. Oct 14, 2011 08:33 AM
RedsManRick (23592) Should we be asking how appropriate it is to use the average value approach here? That is to say, it seems important to have a good understanding of whether "very young" players tend to be better than expected across the board or if there is simply a slightly more frequent occurrence of "hitting a home run" the younger you go. Oct 14, 2011 09:08 AM
I think that's a very valid point, and a very real possibility that it's the rare outlier who is moving the needle here. Oct 14, 2011 09:40 AM
tbwhite (361) But aren't you assuming that teams draft to maximize the value of each pick, and not the overall value of their draft? And I suspect that teams may well be optimizing the overall value of their draft, and not the value of each pick. Oct 14, 2011 12:35 PM
cjrhgarmon (22748) I agree. This is my big pet peeve with most sabermetrics. There is too much emphasis on average results and not enough on the distribution of results. For instance, it is common knowledge that junk bonds and penny stocks earn more on average than T-bills and blue chips. That doesn't mean there's a massive inefficiency that hedge funds have yet to exploit. You can't just look at average return. Risk also matters a lot! How risky are the very young HS draft picks relative to the older HS draft picks? Oct 14, 2011 10:02 AM
andygamer (58966) It will be interesting to see how soon the drafting will correct itself to minimize this inefficiency, in much the same manner as College player inefficiency has been apparently corrected. Oct 14, 2011 09:39 AM
Matt Garrioch (32400) Great stuff, Rany. I wonder what the result would be if you would only use the data that doesn't include the high end outliers that are two deviations higher than average. How about if you would use signing bonus compared to overall pick? Now I will have numbers spinning in my head all day. Oct 14, 2011 09:43 AM
ScottyB (23917) BTW this is totally consistent with Gladwell's whole older-for-their-year players' advantage. Oct 14, 2011 11:26 AM
coachadams5 (34753) Awesome study. The results make sense on an intuitive level if you think about it using this scenario: a MLB team really loves two similarly talented players in big high schools in SoCal: Player A born July 1, 1993 and Player B born July 1, 1992 are both drafted in 2011. Player B, being an 18-year-old Senior, has an additional year of development and puts up monster offensive numbers as a HS OF (let's say a slash line of 14/50/.525). Player A is a 17-year-old Senior and puts up less than monster offensive numbers (8/40/.450) and plays a similar defensive OF as Player B with similar peripheral skills. Scout goes to CrossChecker who goes to Scouting Director who goes to GM and says, we can get Player B for $1M in 1st 50 picks or we can get Player A for the same. Which one do you choose if it could mean your job? Human nature would indicate it's more likely that the safer pick goes first, if for no other reason than to be able to justify it later. Most teams take Player B first and hope Player A is there next round - which drives the study you've created. I think it will take time before teams take Player A before Player B (and paying him more). I see them paying more attention to Player A but not to the point of reversing the order - yet. We'll see. Oct 14, 2011 12:57 PM
kdringg (20974) I agree with most of the other comments - great series! Nice work all around. I can't wait to see this with pitchers as I have been a huge Kershaw fan and have enjoyed following his career versus some guys who came in with him that were college starters. I've never been able to buy the whole "college players are better than HS players" debate. Oct 14, 2011 14:03 PM
mwashuc06 (47497) A good HS example is Taijuan Walker. He was drafted at 17 and he is the same age as many of the top pitching prospects out of HS from the 2011 draft. Oct 14, 2011 14:36 PM
NYYanks826 (37443) Rany, I know I'm a bit late to the party here, but I just want to say how fantastic these articles have been. Oct 14, 2011 18:16 PM
evo34 (33584) Would be interesting to see if this age effect exists at all among college draftees. Oct 14, 2011 19:34 PM
Dodger300 (3120) One important point was never addressed in the article, which is that lots of guys will never have an opportunity to be a young draft pick. Oct 16, 2011 10:04 AM
I want to address this comment because I think a lot of people are thinking along these lines, and I want to clear up any misconceptions. Oct 16, 2011 19:49 PM
adenzeno (24495) As a teacher/coach for 30 years, I can say that I find the kids who are YOUNGER and were not held back are better students and often better baseball players than those who were held back. I have no data, but it is an observation made over 30 yrs of teaching. I think it is because the younger child must work harder at the beginning to keep up, and this becomes a habit where the more mature child does not have to work as hard early and this too becomes a habit. Oct 17, 2011 02:37 AM
Schere (39923) Wow, Rany...wow! This is great stuff, and the effect is so large that these are likely to be merely quibbles. You've got the interns, though, so I'll ask you: Oct 17, 2011 13:11 PM

Bryce Harper fit the "Very Young" category when he was drafted last year. Coupled with his 'historical' talent, it sure looks like he has the chance for a very productive career.