October 13, 2011
Doctoring The Numbers
Starting Them Young, Part One
Everyone missed on Mike Trout. Don’t get me wrong: Trout was a well-regarded player headed into the 2009 draft, a certain first-round talent. But he wasn’t—yet—a phenom. Everyone liked Trout; it’s just that no one loved him. Baseball America ranked him as the 22nd-best player in the draft. No one doubted his athleticism or his work ethic; a lot of people doubted the level of competition he faced as a high school player from rural New Jersey. The Angels drafted him with the 25th pick overall, and they’ll tell you today that they knew he was destined to be a special player. What they won’t tell you is that they had back-to-back picks at #24 and #25, and they announced Randal Grichuk’s name first.
It didn’t take Trout long to prove to everyone that he had been underestimated. He hit .360 in rookie ball that summer, which was just an appetizer for his 2010 season, when he hit .341/.428/.490 with 56 steals in A-ball, and was ranked as the #2 prospect in the minor leagues by Baseball Prospectus. In 2011 he jumped to Double-A and hit .326/.414/.544, was named Baseball America’s Minor League Player of the Year, and debuted in the major leagues at the age of 19.
Any time a team misses on a player the way almost every organization in baseball missed on Trout, there’s bound to be some soul-searching: what did we miss? Many times, there’s no satisfactory answer to that question. Albert Pujols was a 13th-round pick in 1999, and less than two years later was one of the best baseball players in the world—and even today, no one has been able to adequately explain why every organization in baseball misjudged him so badly.
In Trout’s case, there’s one astoundingly obvious reason why he was underrated going into the draft. It’s one of the most basic pieces of information we have about a player, a piece of information that precisely because of its ubiquity is almost always ignored: his date of birth.
Mike Trout was born on August 7, 1991. This is relevant because, unlike most players drafted out of high school, Trout was still just 17 years old when he was picked. His performance as a high school senior came at an age when many of his fellow draftees were still in their junior year; he played as well as he did without the benefit of an extra year of development.
Baseball’s aging curve is fairly well known by now. Most hitters peak at or around the age of 27, and their performance usually follows a parabolic curve: rapid improvement in their late teens and early 20s, more gradual improvement in their mid 20s, then a gradual decline in their late 20s that accelerates in their 30s. The following chart simulates the aging curve for players in the aggregate, plotting talent as a function of age:
(Note that the chart is an approximation not based on actual data, which is hard to come by for 14-year-old major leaguers.)
The implication of the aging curve is that, the younger a player is, the more likely he is to improve over a short period of time. Take two players who are equally valuable today; if one of them is 25 and the other is 26, the difference between their long-term projections is minor. If one of them is 20 and one of them is 21, the differences can be massive, and much greater than you would intuitively expect.
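To make the slope argument concrete, here is a minimal sketch that assumes talent follows a simple inverted parabola peaking at age 27. The `talent` function, its scale, and the exact shape are illustrative assumptions and are not fitted to any data; the only point is that, on such a curve, one extra year of development is worth far more at 17 or 20 than at 25.

```python
PEAK_AGE = 27  # peak age cited in the text


def talent(age, peak=PEAK_AGE, scale=1.0):
    """Hypothetical talent level: an inverted parabola centered on the peak age."""
    return -scale * (age - peak) ** 2


def one_year_gain(age):
    """Improvement over the next year under the assumed curve."""
    return talent(age + 1) - talent(age)


for age in (17, 20, 25):
    print(age, one_year_gain(age))
```

Under this assumed curve the one-year gain shrinks steadily as the player approaches 27, which is the behavior the aging-curve argument relies on.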
Nearly a quarter-century ago, Bill James addressed this very point in the 1987 edition of his Abstract:
Suppose that you have a 20-year-old player and a 21-year-old player of the same ability as hitters; let’s say that each hits about .265 with ten home runs. How much difference is there in the expected career home run totals for the two players?
As best I can estimate, the 20-year-old player can be expected to hit about 61% more home runs in his career. That’s right—61%.
The list of 20-year-olds who perform well as everyday major-league hitters is small, and they almost all go on to have stellar careers. This is what made Jason Heyward’s rookie season so promising. His .277/.393/.456 performance isn’t particularly noteworthy for a rookie, but for a 20-year-old rookie it was almost unprecedented, which is why—despite his sophomore struggles—he has almost limitless upside.
Incidentally, Heyward was born in August, and like Trout, was also just 17 on his draft day.
As you can see from the above chart on Baseball’s Aging Curve, the younger the player, the greater the slope of the curve—meaning the greater the rate at which he improves. So if there is such a substantial difference in the expectations between 20-year-old and 21-year-old players, it stands to reason that the difference between 17-year-old and 18-year-old players would be even more massive. At such a young age, a difference of even eight or nine months—the difference between an 18-year-old born in September and an 18-year-old born in May—might move the needle.
The two best high school hitters selected with the #1 overall pick in the draft, Alex Rodriguez and Ken Griffey Jr, were both 17 on draft day. Griffey was born in November—making him one of the youngest first-round picks ever. Meanwhile, the oldest high school hitter selected #1 overall, Shawon Dunston (who was already 19 at the time), spent his entire career leaving people wanting more.
Here’s my point: I don’t think anyone would dispute that, all things equal, a 17-year-old player is likely to develop into a better player than an 18-year-old player. But I wondered if the baseball industry as a whole has underestimated the importance of age. I wondered if, given two players taken at the same slot in the draft, the younger player returned greater value. In other words, even accounting for the fact that teams took age into consideration (presumably, a player who is particularly young for his draft class might get picked earlier), I wondered if those players were still undervalued. So I decided to do a study.
So far, all I’ve presented to you are anecdotes, and the plural of anecdote is not data. For instance, the youngest hitter drafted #1 overall wasn’t Griffey, it was Tim Foli, who in 16 years in the majors hit a total of 25 home runs. We need some data.
Fortunately, this is what BP interns were created for. With the help of Bradley Ankrom, Paige Landsem, and Clark Goble, I compiled a list of every high school hitter selected in the first 100 picks of every draft from 1965 through 1996. I stopped the data set at 1996 because I wanted to look at how these players performed over the course of their careers—I defined “careers” as the 15 years after they were drafted.
I avoided college players because the impact of age, if there is one, would be much more likely to show up in evaluating players who are 17 or 18 rather than players who are 20 or 21. I avoided pitchers because the aging curve for pitchers is much less predictable than it is for hitters, as many pitchers never throw as hard again as they did in high school. I figured that, if there were an effect to be seen, it would be most obvious in high school hitters. If it turned out that age on draft day did have an impact on a high school hitter’s future prospects, we could expand the study later to look at other draft groups.
Roughly 10 percent of the players in the data set have no date-of-birth information available—most of these players flamed out of pro ball too quickly to leave an impression. Those players were eliminated from the study, as were three players who were listed as being drafted out of high school but were 21 or older on draft day. (I’m assuming these players had extenuating circumstances; perhaps they had a stint in the army before they started their pro careers.) That left us with a data set of 846 players.
What I wanted to find out is whether players who were younger than average on draft day tended to return more value than expected. To determine that, I first had to define “expected value” for each player. Fortunately, WARP is an incredibly handy tool whose express purpose is to estimate a player’s value. So I took the WARP generated by each draft pick for the 15 years after he was selected. However, I also applied a discount factor of 8% per year, meaning that 1 Win Above Replacement Player generated the year after a player was selected was worth 0.92 WARP in the year he was selected. By Year 15, 1 WARP was as valuable as only 0.29 WARP generated in his draft year. That seems fair, given that by Year 15 the player would almost certainly have become a free agent and likely as not moved on to another team.
I also “zeroed out” any seasons in which a player generated negative WARP. Given that most draft picks don’t reach the major leagues at all, it would be misleading to penalize a player who was good enough to reach the majors for having a negative-WARP season, relative to a player who might never have gotten out of rookie ball.
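The discounting and zeroing-out rules can be sketched in a few lines. This is a rough illustration, not the actual code used for the study: the function name `discounted_warp` is made up, and it assumes the input list starts with the season one year after the draft, so that season is weighted by 0.92, the next by 0.92 squared, and so on through Year 15.

```python
DISCOUNT_RATE = 0.08  # 8% per year, as described in the text
CAREER_YEARS = 15     # "career" window used in the study


def discounted_warp(warp_by_year, rate=DISCOUNT_RATE):
    """Total WARP discounted back to the draft year.

    warp_by_year[0] is assumed to be the season one year after the draft.
    Negative seasons are counted as zero, and anything past Year 15 is ignored.
    """
    total = 0.0
    for years_after_draft, warp in enumerate(warp_by_year[:CAREER_YEARS], start=1):
        total += max(warp, 0.0) * (1 - rate) ** years_after_draft
    return total


# One win in Year 1 versus one win in Year 15.
print(round(discounted_warp([1.0]), 2))
print(round(discounted_warp([0.0] * 14 + [1.0]), 2))
```

The two printed values correspond to the 0.92 and 0.29 figures quoted above.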
Using the data, I tried to determine the best formula to predict a player’s expected value in discounted WARP based on when he was picked. As I showed in my draft study from 2005, the expected value of a draft pick is highly dependent on when he was picked, and it isn’t a linear relationship: the expected value of a draft pick drops quickly from the #1 to the #2 pick in the draft and gradually levels out so that the difference in expected value between picks #99 and #100 is minuscule.
I looked at a number of different formulas to determine which would best fit the data, and the most accurate correlation I came up with was a linear relationship between “expected value” and 1/SQRT(PK). That is to say, the value of a draft pick correlates with the reciprocal of the square root of the pick number.
An easier way to look at it is this: the square root of the pick number is a measure of how much more valuable the #1 overall pick is relative to that pick. By this formula, the #1 overall pick is three times as valuable as the #9 pick, four times as valuable as the #16 pick, and so on. It also means that the #4 pick is three times as valuable as the #36 pick, and the #25 pick is twice as valuable as the #100 pick.
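Those ratios are easy to check. The sketch below assumes value is purely proportional to 1/SQRT(PK), with no constant term; `relative_value` is a hypothetical helper name used only for this illustration.

```python
import math


def relative_value(earlier_pick, later_pick):
    """How many times as valuable the earlier pick is, if value ~ 1/sqrt(pick)."""
    return math.sqrt(later_pick / earlier_pick)


print(relative_value(1, 9))     # 3.0
print(relative_value(1, 16))    # 4.0
print(relative_value(4, 36))    # 3.0
print(relative_value(25, 100))  # 2.0
```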
Performing a linear regression on the data leads to this formula:
XP = 11.21/SQRT(PK) - 0.04.
XP refers to a player’s eXPected value. By this formula, the #1 overall pick is expected to bring back 11.17 Discounted WARP (henceforth known as DW). The #10 pick has an expected value of 3.50 DW; the #100 pick would be valued at 1.08 DW. The correlation between DW and 1/SQRT(PK) is highly statistically significant; the p-value was essentially zero. And if you’ve made it to the end of this paragraph, you need to get out more.
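For anyone who wants to plug in other picks, the formula translates directly into code. The coefficients come from the regression above; the function and constant names are just labels chosen for this sketch.

```python
import math

SLOPE = 11.21      # coefficient on 1/SQRT(PK) from the regression in the text
INTERCEPT = -0.04  # constant term from the same regression


def expected_value(pick):
    """Expected Discounted WARP (XP) for a given overall pick number."""
    return SLOPE / math.sqrt(pick) + INTERCEPT


for pick in (1, 10, 100):
    print(pick, round(expected_value(pick), 2))
```

Rounded to two decimals, picks #1, #10, and #100 come out to the 11.17, 3.50, and 1.08 DW quoted above.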
Here’s a graph that plots out the value generated by every player in the study and where they were taken in the draft:
The data shows an enormous amount of variation—not surprising, given the boom-or-bust nature of the draft—but you can sense that on average, the higher a player is drafted, the more he is worth.
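The fitting step itself can be sketched as an ordinary least-squares line against the transformed variable 1/SQRT(PK). The study's player data is not reproduced here, so the example below fits synthetic, noise-free points generated from the published line; with real data the scatter would be enormous, as the graph shows. The `fit_line` helper is an illustrative stand-in, not the tool actually used.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic stand-in for the data set: one point per pick, placed exactly on
# the published line (an assumption made purely for illustration).
picks = range(1, 101)
xs = [1 / math.sqrt(pick) for pick in picks]  # the regressor is 1/SQRT(PK)
ys = [11.21 * x - 0.04 for x in xs]

slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))
```

The key design choice is the transformation: the regression is linear in 1/SQRT(PK), not in the pick number itself.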
I also looked at whether there was any correlation between a player’s expected value and the year he was drafted. My thought was that perhaps, as teams have done a better job of identifying players over time, players from more recent drafts might be expected to do better. On the other hand, it was possible that, with more and more major-league talent coming from foreign lands not subject to the draft, players from more recent drafts might be less valuable than those drafted in the 1960s and 1970s. It turns out these two factors must have canceled each other out, as there was no statistically significant relationship between draft year and DW.
Now that we have a simple formula for calculating what a particular draft pick “should” be worth, we can evaluate whether players who were particularly young or old were likely to return more or less value than expected on their investment. For example, we can look at the very first draft in 1965, when 25 high school hitters were selected among the top 100 picks. The oldest of them was a shortstop named Carl Richardson. Richardson was born on June 2nd, 1946. For the sake of standardization, we set “draft day” as occurring on June 1st for every year, so in our system, Richardson is listed as a day shy of 19 years old on his draft day.
Richardson was selected #77 overall by the Cincinnati Reds. The expected return of that draft slot was 1.32 DW. Richardson never made the major leagues, so his actual DW was zero.
On the other hand, the youngest high school hitter selected in the 1965 draft was a catcher from Oklahoma who was born on December 7th, 1947, making him more than 18 months younger than Richardson. He was also selected by the Reds, with the #36 overall pick, which has an expected value of 1.91 DW. As it turned out, that hitter, Johnny Bench, was worth considerably more than that. (Bench ranks fifth among all the players in our study with 34.05 DW, behind only Alex Rodriguez, Rickey Henderson, George Brett, and Ken Griffey Jr.)
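The age standardization is simple to reproduce. The sketch below measures each player's age on June 1 of his draft year, using the two birth dates given above; expressing age as days divided by 365.25 is a convenience assumed here, not necessarily how the study stored it.

```python
from datetime import date


def age_on_draft_day(birth_date, draft_year):
    """Age in years on June 1 of the draft year, the standardized draft day."""
    draft_day = date(draft_year, 6, 1)
    return (draft_day - birth_date).days / 365.25


richardson = age_on_draft_day(date(1946, 6, 2), 1965)
bench = age_on_draft_day(date(1947, 12, 7), 1965)

print(round(richardson, 2))                 # just under 19
print(round(bench, 2))                      # about 17 and a half
print(round((richardson - bench) * 12, 1))  # age gap in months
```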
Now that you get the idea, here’s some data. I took the five youngest players from every draft from 1965 through 1996 and compared them to the five oldest players from the same draft. Here’s that data in chart form:
“Young XP” refers to the eXPected value of the five youngest high school hitters in that year’s draft, based on where they were selected. “Young DW” refers to the total Discounted WARP those five players actually earned. “Old XP” and “Old DW” refer to the same for the five oldest high school hitters in that year’s draft. “Return” refers to the return on investment above or below expectations; +100% would mean that those five players returned, in total, 100% more than (i.e. double) what was expected from them.
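In code, “Return” is just actual value divided by expected value, minus one. The totals in the example are made-up round numbers chosen to show the scale, not figures from the study.

```python
def return_on_investment(actual_dw, expected_xp):
    """Return relative to expectation; 1.0 means +100%, i.e. double the expected value."""
    return actual_dw / expected_xp - 1.0


# Hypothetical group totals, for illustration only.
print(f"{return_on_investment(20.0, 10.0):+.0%}")  # +100%
print(f"{return_on_investment(5.0, 10.0):+.0%}")   # -50%
```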
Now that the explanations are out of the way: wow. Over the 32 years combined, the youngest players in each year’s draft were expected to produce slightly less value than the oldest players, because on average they were taken with slightly later draft selections. Despite that, the five youngest players in each year returned MORE THAN TWICE AS MUCH VALUE as the five oldest players. If you adjust for the fact that the older group had a slightly higher expected value on Draft Day, the younger group had a return that was 117% higher than the older group.
Let me repeat that: a team that drafted one of the five youngest high school hitters selected among the top 100 picks could expect MORE THAN TWICE AS MUCH VALUE from him as a team that selected one of the five oldest high school hitters. And that’s not a small sample size fluke; that’s a result derived from 32 years of the draft, looking at 160 players in each camp.
Here’s a graph that displays the data. The bars indicate the return for both young and old players in each season, while the lines measure a rolling average of the return over the previous five years:
The take-home from the graph is that the red line—which indicates the five-year return from the youngest players in the draft class—is above the yellow line (the five-year return for the oldest players in the draft class) for virtually the entire length of the study. And in most years the gap between the lines is substantial.
The other thing the chart reveals is that the five-year return for the oldest players in a draft class has been at or below 0% in every year of the study. There has never been a time when old high school hitters generated a positive return.
While the advantage enjoyed by younger players ebbs and flows from year to year, it doesn’t appear to grow or diminish over time. If we combine the draft years into four-year bins (i.e. 1965-1968, 1969-1972, etc.), we can see that:
With the exception of the four-year span from 1981-1984—thank you, Shawn Abner—the young players beat their expected return (and beat the pants off of the older players) every time. Hedge funds would kill to beat the market this consistently.
Young high school hitters are simply much more likely to develop into stars, particularly players who weren’t elite picks. I already mentioned Johnny Bench, who went from the second round to the Hall of Fame. In 1972, Chet Lemon was selected with the #22 overall pick; Lemon was 17 years, 3 months, and went on to a fantastic career.
The following year, amazingly, two of the five youngest high school hitters went on to the Hall of Fame. Maybe it’s not a surprise that Robin Yount did, given that he was the #3 overall pick and was starting at shortstop in the majors the following year—the only 18-year-old to play regularly in the majors in the last 75 years—but it was a surprise that Eddie Murray, drafted as a catcher/first baseman with the #63 overall pick, went on to find the success he had. It shouldn’t have been; Murray was two weeks younger than Lemon had been. Murray and Lemon, in fact, were both among the six youngest players in the entire study.
The youngest player in our study from 1976 was taken with the #96 overall pick. Rickey Henderson was a month younger than Mike Scioscia, drafted #19 overall that year. In 1980, the #71 overall pick was used on a young high school second baseman named Danny Tartabull. In 1986, the Brewers had the 6th overall pick and didn’t screw it up, using it to select Gary Sheffield. And in 1987, the Mariners selected Ken Griffey Jr 1st overall.
In 1992, Derek Jeter was selected #6, and Jason Kendall, born on the same day, was selected #23. And in the last year of the study, 1996, the youngest player selected in the Top 100 was Jimmy Rollins, who was drafted #46 overall. Meanwhile, the best players in the entire study selected from among the five oldest players in their draft class were Willie Wilson, Johnny Damon, and Richie Hebner.
This is, all modesty aside, quite possibly the most impressive and significant finding of my career. When it comes to the drafting of high school hitters, even slight differences in age matter. At least when it comes to high school hitters, young draft picks are a MASSIVE market inefficiency.
In The 1985 Bill James Abstract, James published the results of his study which showed that “The rate of return on players drafted out of college is essentially twice that of high school players.” That is considered to be one of James’ most important findings, and in fact it was more than a little surprising when, in 2005, I found that the advantage for college players had almost disappeared over the years.
Based on the data above, the advantage the youngest high school hitters in a draft class have on the oldest high school hitters is just as great as the advantage college hitters once enjoyed. And this advantage does not appear to be diminishing over time.
These numbers are so dramatic that they cry out for corroboration. In Part Two tomorrow, I’ll delve into the data a little more.
An expanded version of this article will appear in the forthcoming book Extra Innings: More Baseball Between the Numbers from Baseball Prospectus.