May 25, 2005
Doctoring The Numbers
The Draft, Part Three
Before we move on to a discussion of whether the balance of power between high school and college players has changed over time, I want to clear up some loose ends. The two salient conclusions from the last article were that:
The shorthand for this is that college players are both more likely to reach the majors and more likely to develop into star-caliber players once they reach the majors. However, a number of readers questioned this final summation, wondering whether, even though they trailed in both their matriculation rate and their overall value, it was possible that high-school players were more likely to achieve true stardom than their college counterparts.
The best way to tackle this question is to break down each group, college and high school players, by their 15-year WARP tally. For instance:
                         High School   College
Never reached majors:        459         281
WARP Between 0 and 1:         58          66
WARP Between 1 and 5:         63         101
WARP Between 5 and 10:        30          47
WARP Between 10 and 20:       41          56
WARP Between 20 and 30:       27          43
WARP Between 30 and 40:       16          23
WARP Between 40 and 50:        6          15
WARP Between 50 and 60:        2           8
WARP Between 60 and 80:        6          11
WARP Greater than 80:          4          13
Total:                       749         715
Despite the fact that slightly fewer players were drafted out of college than out of high school, there are more players on the college side of the ledger in every row of players that reached the major leagues. If anything, the gap widens as we look at the truly elite players: Of the 65 players who amassed 40 or more WARP, 47 of them (72%) were college draftees. Of the players who reached the majors but failed to reach 40 WARP, "only" 336 of 571 (59%) were from the college ranks.
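For readers who want to check the arithmetic, here is a minimal Python sketch that tallies the college share above and below the 40-WARP line. The counts are copied from the table above; the names `BUCKETS`, `ELITE`, and `college_share` are illustrative labels, not part of the original study.

```python
# Players who reached the majors, by 15-year WARP bucket (from the table above).
# Each entry: (bucket label, high-school count, college count).
BUCKETS = [
    ("0-1", 58, 66),
    ("1-5", 63, 101),
    ("5-10", 30, 47),
    ("10-20", 41, 56),
    ("20-30", 27, 43),
    ("30-40", 16, 23),
    ("40-50", 6, 15),
    ("50-60", 2, 8),
    ("60-80", 6, 11),
    ("80+", 4, 13),
]

# Buckets treated as "truly elite" in the text: 40 or more WARP.
ELITE = {"40-50", "50-60", "60-80", "80+"}


def college_share(rows):
    """Return (college count, combined count, college fraction) for the given rows."""
    hs = sum(row[1] for row in rows)
    col = sum(row[2] for row in rows)
    return col, hs + col, col / (hs + col)


elite = [row for row in BUCKETS if row[0] in ELITE]
rest = [row for row in BUCKETS if row[0] not in ELITE]

for label, rows in (("40+ WARP", elite), ("Under 40 WARP", rest)):
    col, total, share = college_share(rows)
    print(f"{label}: {col} of {total} ({share:.0%}) from college")
```

Splitting on the bucket label rather than on raw WARP values is a convenience here, since only the bucket counts are published.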
Or think of it another way: If we come up with PECOTA-like "percentile ranks" for how much WARP you could expect from a player drafted out of college vs. high school, here's what we end up with. (Keep in mind that the mean value for each set of players is 10.33 WARP for college players, 6.68 for high school players):
          High School   College
90th %        18.3        29.3
75th %         1.1         8.6
50th %         N/A         0.3
25th %         N/A         N/A
10th %         N/A         N/A
Since less than 50% of high-school draftees made the majors, their 50th-percentile projection (i.e. the median) is zero. This is why it's important to differentiate between the median and the mean; the presence of a few superstars in each group increases the mean WARP value, but overall the majority of draft picks in either group are near worthless.
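The pattern of N/A entries follows directly from how many draftees never reached the majors. Here is a rough Python sketch of that reasoning, using the "never reached majors" counts and group totals from the first table. It assumes that a player who never reached the majors counts as zero WARP and sits at the bottom of the ranking; the function names are hypothetical.

```python
def zero_share(never_reached, total):
    """Fraction of draftees who never reached the majors (zero WARP by assumption)."""
    return never_reached / total


def percentile_is_zero(pct, never_reached, total):
    """True when the pct-th percentile lands among players who never reached the majors."""
    return pct / 100 <= zero_share(never_reached, total)


GROUPS = {
    "High School": (459, 749),  # (never reached majors, total drafted)
    "College": (281, 715),
}

for name, (never, total) in GROUPS.items():
    empty = [p for p in (90, 75, 50, 25, 10) if percentile_is_zero(p, never, total)]
    print(f"{name}: {zero_share(never, total):.0%} never reached; empty percentiles: {empty}")
```

Under that assumption, any percentile at or below the never-reached share has no value to report, which matches which rows of the chart show N/A for each group.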
If we run this same chart, but exclude all the players who didn't reach the majors at all--an exclusion which definitely favors the high-school draftees, since so many more of them failed to reach the majors--here's what we get:
          High School   College
90th %        33.4        42.0
75th %        15.2        20.7
50th %         3.6         4.9
25th %         0.4         0.8
10th %         0.0         0.0
Even in this study, college players make out better. So, at least from 1984 to 1999 as a whole, not only are college players more likely to reach the majors than high-school players, but among those who reach the majors, college players are more likely to fashion a valuable career than high schoolers.
But the question arises: Has this advantage in favor of college players changed over time? Sixteen years is a long time, after all. Jim Callis' study for Baseball America looked at players drafted from 1991 to 1997; it is certainly possible that the advantage enjoyed by collegiate players in our study is just a residual artifact left over from the 1980s.
We can break down the data into chunks, looking at the high school vs. college data from 1984 to 1991 separately from the data from 1992 to 1999. There is one problem with this approach, however: The players in the more recent data set do not have as many years of data to analyze. Extending the data in the more recent group to Y15 is impossible because none of the players have reached Y15.
We'll work around this problem. First, here are the 15-year WARP charts for both groups:
       Y0    Y1    Y2    Y3    Y4    Y5    Y6    Y7    Y8    Y9   Y10   Y11   Y12   Y13   Y14   Y15   Total
COL  0.01  0.11  0.36  0.62  0.84  0.91  1.00  0.98  0.98  0.84  0.76  0.73  0.64  0.59  0.53  0.44   10.33
HS      -  0.00  0.02  0.15  0.31  0.46  0.61  0.69  0.74  0.71  0.73  0.66  0.51  0.52  0.34  0.24    6.68
A couple of points about this comparison. One possible objection to this study's methodology is that, even looking 15 years out, the study is biased against high-school players because, being younger than college players, they would still have more good years left in them at the conclusion of the study.
What I find interesting is that, even in the last few years of this study, the college players are still kicking the high-school guys all over the playground--even though you would think the three-year gap in age would become a particularly large advantage for the high-school draftees at that point. At Y15, the college players are about 36 years old, the high-school guys are only 33--and yet the college guys have nearly twice as much value. It seems to me that if we ran the study out until every player had retired, the advantage enjoyed by college players might increase even more. Hell, just including Barry Bonds' last few years would move the needle.
However, even though college players seem to do better at the far right end of the chart, they still derive more of their value (as you would expect) in the early years after the draft. College players tend to peak at Y6, high school players between Y8 and Y10 (which fits comfortably with the old canard that most players tend to peak around age 27). Let's look at when a typical college vs. high-school draft pick accumulates most of his value:
                    High School     College
Y0 through Y5:      0.94 (14%)     2.85 (28%)
Y6 through Y10:     3.47 (52%)     4.56 (44%)
Y11 through Y15:    2.28 (34%)     2.93 (28%)
Not only do collegiate players return 55% more value than high-school players, but they produce twice the fraction of that value within five years of being drafted. Without getting into economic details like the discount rate of future earnings, it's obvious that if you have the choice between two players of equal overall value, you'd rather have the one that will produce all that value now than one who will produce that value a decade from now. The farther off in the future a player's value is, the less likely the team that drafted him will reap that value; he might get released in the interim, file for free agency, get claimed on waivers, get traded for pennies on the dollar, whatever. Al Leiter has had more career value than Steve Avery, but I know which one I'd want my team to draft.
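The five-year windows can be rebuilt from the year-by-year chart. Here is a minimal Python sketch, with one loud assumption: the high-school row in that chart lists only 15 values, so the sketch treats them as Y1 through Y15 and sets Y0 to zero, which is the only alignment that reproduces the window totals above. The names `WINDOWS` and `window_shares` are illustrative.

```python
# Average WARP per draftee by year after the draft, Y0 through Y15.
COL = [0.01, 0.11, 0.36, 0.62, 0.84, 0.91, 1.00, 0.98,
       0.98, 0.84, 0.76, 0.73, 0.64, 0.59, 0.53, 0.44]

# ASSUMPTION: the published high-school row has 15 values; they are placed at
# Y1..Y15 here, with Y0 filled in as 0.0.
HS = [0.0, 0.00, 0.02, 0.15, 0.31, 0.46, 0.61, 0.69,
      0.74, 0.71, 0.73, 0.66, 0.51, 0.52, 0.34, 0.24]

# (label, first year, last year), both ends inclusive.
WINDOWS = [
    ("Y0 through Y5", 0, 5),
    ("Y6 through Y10", 6, 10),
    ("Y11 through Y15", 11, 15),
]


def window_shares(yearly):
    """Return (label, WARP in window, whole-number percent of total) per window."""
    total = sum(yearly)
    result = []
    for label, first, last in WINDOWS:
        value = sum(yearly[first:last + 1])
        result.append((label, round(value, 2), round(100 * value / total)))
    return result


for name, yearly in (("College", COL), ("High School", HS)):
    for label, value, pct in window_shares(yearly):
        print(f"{name:12s} {label:16s} {value:.2f} ({pct}%)")
```

Because the chart values are themselves rounded to two decimals, the high-school windows come out a hundredth off the table (3.48 and 2.27 rather than 3.47 and 2.28); the percentages agree.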
So that's another point in favor of college players.
But back to the original question. Have the scales that have historically favored college players tipped the other direction in more recent times?
To answer that, I'll start by reproducing the 15-year chart comparing high school and college players above, but using data from 1984 to 1991 only:
       Y0    Y1    Y2    Y3    Y4    Y5    Y6    Y7    Y8    Y9   Y10   Y11   Y12   Y13   Y14   Y15   Total
COL  0.01  0.16  0.47  0.71  0.92  0.99  1.05  1.05  1.04  0.87  0.81  0.83  0.68  0.59  0.53  0.44   11.15
HS      -  0.00  0.02  0.13  0.26  0.38  0.56  0.57  0.66  0.66  0.68  0.58  0.48  0.52  0.34  0.24    6.08
Relative to our entire group of players, college players drafted from 1984 to 1991 amassed more value (11.15 to 10.33 overall), while high school players amassed less (6.08 to 6.68 overall). From 1984 to 1991, collegiate players were worth 83% more than high-school players, compared to a figure of just 55% more for the entire period from 1984 to 1999.
But if the gap between collegiate and high-school players was larger in the first half of the study…you see where I'm going with this.
Here's the same chart, this time using data from 1992 to 1999. In this case I'm only taking the data out to Y10, as only three draft years (1992-94) have even reached Y10.
       Y0    Y1    Y2    Y3    Y4    Y5    Y6    Y7    Y8    Y9   Y10   Total
COL  0.00  0.06  0.23  0.52  0.74  0.81  0.91  0.86  0.85  0.76  0.57    6.33
HS      -  0.00  0.01  0.18  0.34  0.55  0.69  0.86  0.92  0.79  0.87    5.22
This is what Alan Dershowitz might call Reversal of Data. The gap between collegiate and high-school players has narrowed to just over 1 WARP; after rolling up 83% more WARP than high schoolers from 1984 to 1991, collegiate players have only maintained a 21% edge from 1992 to 1999.
Actually, it's even smaller than that. As you can see from the numbers, while high-school players lag their collegiate counterparts in the first six years after the draft, the lines cross at Y7 (when collegiate players are about 28, and high schoolers are only 25), and then high-school players have more value from Y8 through Y10, when our study runs out of numbers to crunch. It stands to reason that with the benefit of more years of data, the gap between high school and college players would narrow further.
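The crossover and the shrinking edge can be checked with a few lines of Python. This is a sketch only. The 1992-99 high-school row lists ten values against eleven year columns, so it is assumed here to start at Y1, with Y0 set to zero. The function names are made up for illustration, and the edge is computed from the published totals rather than from the rounded yearly figures.

```python
# 1992-1999 draftees: average WARP per draftee, Y0 through Y10.
COL = [0.00, 0.06, 0.23, 0.52, 0.74, 0.81, 0.91, 0.86, 0.85, 0.76, 0.57]

# ASSUMPTION: the ten published high-school values are Y1..Y10; Y0 is set to 0.0.
HS = [0.0, 0.00, 0.01, 0.18, 0.34, 0.55, 0.69, 0.86, 0.92, 0.79, 0.87]


def years_hs_ahead(col, hs):
    """Years after the draft in which high schoolers out-produce collegians."""
    return [year for year, (c, h) in enumerate(zip(col, hs)) if h > c]


def college_edge(col_total, hs_total):
    """College advantage as a fraction of the high-school total."""
    return col_total / hs_total - 1.0


print("HS ahead in years:", years_hs_ahead(COL, HS))
print(f"1984-91 edge: {college_edge(11.15, 6.08):.0%}")
print(f"1992-99 edge through Y10: {college_edge(6.33, 5.22):.0%}")
```

With that alignment the two groups are level at Y7 and the high schoolers lead in each of Y8, Y9, and Y10, which is the pattern described in the text.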
We can estimate what the final numbers might look like by extrapolating the data from our older set of players. As we showed above, about 28% of a college player's value comes between Y11 and Y15, compared to 34% for high-school players. Those are the figures for the entire data set; if we look only at players from 1984 to 1991, the numbers are 27.5% and 35.5% respectively.
If we assume that the more recent set of players will also derive about 27.5% and 35.5% of their 15-year WARP value between Y11 and Y15, we can fill in the numbers with a little algebra:
                               High School   College
Value through Y10 (Actual)        5.22         6.33
Value, Y11 - Y15 (Estimated)      2.87         2.40
Total                             8.09         8.73
I wouldn't take these numbers as pure gospel; there are a lot of reasons why these data may prove to be inaccurate in the long run. These players are all still in the middle of their careers, so the numbers could change. And by using only eight years of data, we've cut our sample size in half. In particular, the numbers for Y9 and Y10 involve very few draft classes, and a few more years of data might change the numbers significantly.
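The "little algebra" amounts to this: if a fraction s of a player's 15-year value arrives in Y11 through Y15, then the value through Y10 is (1 - s) of the total, so total = value through Y10 / (1 - s). A minimal Python sketch follows, assuming the 1984-91 shares of 35.5% and 27.5% carry over unchanged; `project_career` is a hypothetical helper name.

```python
def project_career(value_through_y10, late_share):
    """Scale actual Y0-Y10 value up to a 15-year total.

    late_share is the fraction of 15-year value assumed to arrive in Y11-Y15.
    Returns (estimated Y11-Y15 value, estimated 15-year total).
    """
    total = value_through_y10 / (1.0 - late_share)
    return total - value_through_y10, total


# ASSUMPTION: late-career shares are borrowed from the 1984-1991 draftees.
hs_late, hs_total = project_career(5.22, 0.355)
col_late, col_total = project_career(6.33, 0.275)
edge = col_total / hs_total - 1.0

print(f"High School: {hs_late:.2f} estimated, {hs_total:.2f} total")
print(f"College:     {col_late:.2f} estimated, {col_total:.2f} total")
print(f"College edge: {edge:.0%}")
```

The whole estimate hangs on the borrowed shares; nudging either one by a point or two moves the projected edge noticeably, which is one more reason to treat the totals as rough.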
But let's not beat around the bush here: According to the data, the advantage of college players over high-school players has shrunk from 83% to all of 8%. That's not just significant--it's mind-blowing.
Mind you, it's not mind-blowing simply because the gap has evaporated. When I first started compiling these data two months ago, I actually suspected that the gap between high-school and college players had closed over the years. But my rationale was that in the 1990s, teams were more and more inclined to take college players in the early rounds of the draft. After all, almost no college players were taken in the first round in the early 1970s; by the early 1980s, nearly half the players being taken in the first round were out of college. I assumed that trend had continued.
And if it had, it would certainly explain why the gap has closed: Namely, teams had sensed the overwhelming advantage enjoyed by college players in the 1970s and 1980s, and had changed their draft strategies to accommodate. In other words, they had identified the inefficiencies in the market, and by taking advantage of them as a group, they had worked to remove them.
Only that's not the case. Here's how the distribution of college vs. high school players breaks down:
                       1984 - 1991                      1992 - 1999
                High School  College  %Col       High School  College  %Col
First round          95        131     58%           109        118     52%
Second round        150        135     47%           167        125     43%
Third round         102        112     52%           126         94     43%
Overall             347        378     52%           402        337     46%
I'm man enough to admit it: I'm completely befuddled by these results. Not only did the gap between the value of college and high-school players shrink to almost nothing in the 1990s; this has occurred even though the pendulum swung back towards taking more high-school players.
In my 10 years of writing for Baseball Prospectus, this is the most surprising conclusion I have ever reached in an analytical study. I suspected that the advantage enjoyed by collegiate players had diminished, but I didn't anticipate the degree to which it has. And I certainly did not suspect that high-school players would jump in value relative to college picks even as teams were drafting more high school players, not fewer. This seems to violate the basic principles of economics. Prices don't drop when demand goes up, but in this case the "price" of high-school talent--the difference between the value of the draft pick and the return on the player drafted--has gone down even though the demand for high-school players has also increased.
Bill James was right. But it appears that Jim Callis is right, more or less. Meanwhile, we need a couple new draft rules:
Draft Rule #6: Draft Rule #5 appears to be obsolete when looking at data from after 1991. The advantage enjoyed by college players over high-school players has dropped to 8%, a margin of dubious statistical significance.
Draft Rule #7: The value of high-school players relative to their college counterparts has shot up, even though teams were more likely to use top draft picks on high-school players in the 1990s than in the 1980s.
What can explain this paradox? One factor that can cause prices on a commodity to drop even as demand increases is the advent of technology that makes the production of that commodity more efficient, and therefore increases the supply of the commodity to keep up with demand. Applying that analogy to the baseball draft, if teams somehow became more efficient at identifying high-school talent--if they somehow managed to weed out the worst mistakes of years past--they could increase their yield even as they drafted more high-school players.
It's a good theory. Of course, my last theory blew up in my face. Next time, we'll delve into this topic a little deeper.