May 13, 2005
Doctoring The Numbers
For most teams, the most important day of the year isn't Opening Day, or a day in October that ends in a dogpile, or the November day that marks the start of free-agent season. No, for most teams, the red-letter day falls on the first Tuesday of June, a day that involves sitting around a telephone, on a conference call with 29 other teams and the Commissioner's office, a day on which, if you're really, really lucky, you get to say something like this: "the Chicago Cubs select Redraft Number two-six-four-one, Mark William Prior, University of Southern California."
Sexy, it's not. Neither is it all that telegenic, although it certainly could be if MLB ditched the conference call for an amphitheatre with good lighting and tried to make a production out of it. There's no denying its importance, though. There is no source of talent that comes close to matching what's available in what is officially called the Rule 4 Draft. Moreover, there is almost no way to build a successful ballclub without some measure of success in the draft. (The Yankees are trying to prove that last sentence incorrect. They are not succeeding.)
It's one thing for the media to treat the baseball draft with far less reverence than its football counterpart. Historically, it has been taken far less seriously by the participants. Some football teams were using sophisticated, computer-aided analysis of potential draft picks as far back as the early 1970s. For the first 20 years of the baseball draft, so little effort was made by teams to hone their draft strategies that it was revolutionary when Bill James discovered, as he wrote in the 1985 Baseball Abstract, that "not only is there no basis for the prejudice against the drafting of college players, but in fact the reverse is true." He went on to state that "the rate of return on players drafted out of college is essentially twice that of high school players."
Keep in mind, in the early years of the draft, it was an inviolable concept that the best way to create a superstar was to find an athletic 18-year-old with the good face and mold him into one. Take 1971, for instance, in which every first-round pick was drafted out of high school. All those picks yielded just two stars (Jim Rice and Frank Tanana) and two more quality players (Rick Rhoden and Craig Reynolds). Take 1977, when 21 of the 26 first-round picks were high schoolers. The five college players included Paul Molitor, Bob Welch and Terry Kennedy. Of the high-school players, aside from Harold Baines and Bill Gullickson--the first two picks in the draft--the only picks that had any kind of substantial major-league careers were Rich Dotson, Wally Backman and Dave Henderson.
Teams finally started to catch on in the early 1980s. In 1981, more college players than high schoolers were taken in the first round, the first time that had occurred. For their efforts, teams selecting from the college ranks were rewarded with Mike Moore, Joe Carter and Ron Darling; the best high-school players picked that year were Dick Schofield and Daryl Boston.
Nevertheless, the inefficiencies in the baseball draft continued to vastly exceed the inefficiencies in every other method of talent acquisition, in large part because no team seemed willing or able to exploit those inefficiencies at all. They were all drafting with one eye closed. (The A's may be the exception to the rule; they started focusing on college talent soon after Charlie Finley sold the team in 1980.) Even as it became acceptable to select college players in the early rounds of the draft, the notion that teams ought to pay attention to the production of those players while in college--the idea that college statistics matter--is a remarkably recent innovation, as chronicled in Moneyball.
Still, draft patterns have changed over the years, making it unlikely that James' original study, as groundbreaking as it was in its time, still paints an accurate portrait of the draft two decades later. In particular, the age-old question of high-school vs. college players is being continually revisited, and more than one expert has arrived at the conclusion that there is no longer any substantial advantage in selecting college players over high-school players early in the draft. Two years ago, Jim Callis of Baseball America wrote the following in an article (subscription required) about a study of the first ten rounds of the draft from 1991 to 1997:
"we find that 90 college players (8.8 percent) and 77 high school players (8.4 percent) became at least major league regulars for a few seasons…Though colleges produced slightly more regulars, high schools won the race for above-average players. They came out ahead in terms of good regulars (3.2 percent vs. 1.5 percent) and stars (1.1 percent vs. 0.9 percent). Once again putting that in terms of 250 draft picks, collegians generally would yield four above-average regulars and two stars. The prep ranks would generate eight above-average regulars and three stars."

BA's study produced interesting results, but I wanted to take a look at the data myself. I couldn't do it all, so I recruited an army of one, also known as BP intern John Erhardt. This study would not have been possible without the efforts of John, who entered almost all of the data that makes up this study into a spreadsheet, mostly by hand.
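The per-250-pick figures in Callis's quote come from scaling the quoted percentages up to 250 picks and then rounding. Below is a minimal Python sketch of that arithmetic. The function name `expected_count` is hypothetical. The rates are the ones the quote reports.

```python
def expected_count(rate_percent: float, picks: int = 250) -> float:
    """Expected number of players at a given outcome level per `picks` draft picks.

    Hypothetical helper: rate_percent is a per-pick rate expressed in percent.
    """
    return rate_percent * picks / 100

# Rates quoted from the Baseball America study (first ten rounds, 1991-97).
rates = {
    ("college", "good regulars"): 1.5,
    ("college", "stars"): 0.9,
    ("high school", "good regulars"): 3.2,
    ("high school", "stars"): 1.1,
}

for (source, level), rate in rates.items():
    n = expected_count(rate)
    print(f"{source:>11} {level:<13}: {n:.2f} per 250 picks (~{round(n)})")
```

For the college players this gives 3.75 and 2.25. For the high schoolers it gives 8.00 and 2.75. Those round to the four, two, eight and three in the quote.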
This study includes the first 100 picks--roughly equivalent to the first three rounds--in every draft from 1984 to 1999. (Going forward, as shorthand I will refer to picks one through 30 as "first-round picks"; picks 31 through 70 as "second-round picks"--this would include the supplemental first round; and picks 71 through 100 as "third-round picks.")
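The round shorthand is just a mapping from overall pick number to a label. The sketch below expresses it in Python. The function name is illustrative only and does not come from the study's spreadsheet.

```python
def round_label(pick: int) -> str:
    """Map an overall pick (1-100) to the article's shorthand 'round' label.

    1-30 = first round; 31-70 = second round (including the supplemental
    first round); 71-100 = third round.
    """
    if not 1 <= pick <= 100:
        raise ValueError("the study covers only picks 1 through 100")
    if pick <= 30:
        return "first"
    if pick <= 70:
        return "second"
    return "third"
```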
I selected these endpoints for a couple of reasons. The 1984 draft was the first one in which college players really took center stage; the accomplishments of the U.S. Olympic Team gave college talent a national audience for the first time. Guys like Cory Snyder, Mark McGwire, Shane Mack and Oddibe McDowell were drafted that year, and were just an appetizer for the following year, when the first six picks of the draft featured B.J. Surhoff, Will Clark, Bobby Witt, Barry Larkin and Barry Bonds. (The only high-school player selected in the top six, catcher Kurt Brown, never reached the majors.)
Also, the draft process was streamlined in the mid-1980s with the abolishment of the "supplemental" draft in January, leaving just one draft to analyze.
I end the study in 1999 because, for one, not enough time has elapsed to adequately judge more recent drafts. Also, the 2000 and 2001 drafts appear to be unusual; the 2000 draft was historically barren, while the 2001 draft was historically fruitful. Consider this: the first five picks in 2000 were Adrian Gonzalez, Adam Johnson, Luis Montanez, Mike Stodolka and Justin Wayne. The first five picks in 2001 were Joe Mauer, Mark Prior, Dewon Brazelton, Gavin Floyd and Mark Teixeira. Slight difference there.
So we have 16 drafts' worth of data, which will come in handy when I break it up into eight-year chunks to see whether draft patterns have changed over time.
I eliminated from the study all the players who did not sign with the team that drafted them. This includes players who went back into a future draft, as well as "draft loophole" players like Travis Lee and John Patterson. There are enough variables that determine the value of a draft pick without having to account for whether he was signed or not. I also didn't want players selected twice, like J.D. Drew, mucking up the data.
(It's a lot harder than you might think to determine, 20 years after the fact, which players signed and which didn't. We think we located every player, but I wouldn't be surprised if we missed a few. In a study of this size, I don't think an omission or two alters the data significantly.) Dropping the unsigned players left us with 1,526 of the original 1,600 draft picks.
Once we had every player in our database, the next thing we--OK, John--did was to enter WARP data for every year of that player's career. WARP stands for Wins Above Replacement Player, and is an overall measure of a player's worth in the only currency that really matters, wins. The data was categorized by how many years it had been since the player was drafted. In other words, for a player drafted in 1984, "Y0" refers to the player's WARP number in 1984, "Y5" refers to his value in 1989, etc. If a player didn't play in the majors in that year, the cell was left blank. (Obviously, only about a dozen players of the 1,526 played in the majors in Y0.)
The one tweak we made to the data was that a player who managed a negative WARP value in any given year had his value zeroed out instead. Given that we're evaluating a set of players who may or may not have even reached the major leagues, I felt it was inappropriate to penalize a player for being a bad major leaguer--even a woefully bad major leaguer--relative to another player who may have peaked in A ball. (This didn't affect many players, because it's awfully difficult to muster a negative WARP. David Howard, who spent most of the 1990s driving me to the edge of homicidal rage, never had a negative WARP as a member of the Royals.)
Finally, we only looked at the first 15 years of a player's career, ending the columns at "Y15." For one thing, just six draft classes (1984 through 1989) even had completed data for Y15, and extending the data any further would have rendered sample sizes of dubious significance. Also, anyone who drafts a player based on what he does 16 years later needs to take a refresher course in baseball labor relations over the past 35 years, with a special emphasis on the abolishment of the Reserve Clause and the establishment of free agency.
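The layout described above can be sketched in Python. It has one row per player and one cell per year since the draft. Seasons spent outside the majors are blank, negative WARP is zeroed, and the columns stop at Y15. This sketch assumes each player's major-league WARP arrives as a hypothetical mapping from calendar season to WARP. The real data lived in a hand-entered spreadsheet.

```python
from typing import Optional

LAST_COLUMN = 15  # the study stops at Y15

def warp_row(draft_year: int, warp_by_season: dict[int, float]) -> list[Optional[float]]:
    """Return cells Y0..Y15 for one player.

    warp_by_season (hypothetical input) maps a calendar season to the player's
    major-league WARP; seasons he spent outside the majors are absent and
    become None (a blank cell). Negative WARP is zeroed out.
    """
    row: list[Optional[float]] = []
    for offset in range(LAST_COLUMN + 1):
        season = draft_year + offset          # Y0 is the draft year itself
        warp = warp_by_season.get(season)
        row.append(None if warp is None else max(0.0, warp))
    return row

# Hypothetical player drafted in 1984 who debuted in 1986.
example = warp_row(1984, {1986: 1.2, 1987: -0.4, 1988: 3.5})
print(example[:6])
```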
Enough prologue: let's look at the data. Before we break the data down into subsets, let's start by looking at the overall numbers. After all, before you can decide whether a certain type of player makes for a good pick, you have to know what those picks are worth.
First, let's look at the percentage of players selected who reached the majors, even for a Moonlight Graham appearance, based on their draft position. The data has been aggregated in groups of five to minimize random variation from one draft slot to the next:
As you would expect, the probability of reaching the majors drops in a fairly linear fashion from the first pick to the 100th. There is a big dropoff after pick 35, but there's also a spike upward between picks 45 and 50, which suggests random variation. Overall, the probability of reaching the majors starts at 90%, drops by about 0.9 percentage points per draft spot for the first 50 spots, then drops by about 0.3 percentage points per spot from picks 50 through 100.
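The two slopes given above can be written as a rough piecewise-linear curve. This is a hand-built approximation from the figures in the text, not a fit to the study's data. The function name is hypothetical.

```python
def approx_reach_majors(pick: int) -> float:
    """Rough probability that a pick in slots 1-100 reaches the majors.

    Hand-built from the approximate slopes in the text, not a fitted model:
    about 90% at pick 1, minus ~0.9 percentage points per slot through
    pick 50, then minus ~0.3 points per slot from pick 50 to pick 100.
    """
    if not 1 <= pick <= 100:
        raise ValueError("approximation covers picks 1 through 100 only")
    if pick <= 50:
        return 0.90 - 0.009 * (pick - 1)
    at_50 = 0.90 - 0.009 * 49  # value where the slope changes
    return at_50 - 0.003 * (pick - 50)

for pick in (1, 25, 50, 75, 100):
    print(pick, round(approx_reach_majors(pick), 3))
```

Averaged over all 100 slots, the curve works out to about 53% of picks reaching the majors. Treat it as a rule of thumb for the shape of the decline, not as an estimate for any single slot.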
I was curious as to whether teams became any more savvy in making draft picks as the years went on, so I broke the data into two periods, 1984-91 and 1992-99. Here's the chart comparing each set of eight years:
The lines are similar throughout, although the more recent drafts, surprisingly, have seen fewer major leaguers identified in the middle of the first round. They have done a slightly better job between picks 65 and 75--the early third round, basically--but not enough to compensate. Overall, 386 of 760 players taken in the first 100 picks from 1984 to 1991 reached the majors, or 50.8%. The numbers from 1992 to 1999 are 366 of 768, or 47.7%. The difference is almost entirely the result of the fact that players from the more recent drafts have had less time to reach the majors; with the benefit of a few more years, the more recent drafts should equal, if not exceed, the success rate of the earlier set.
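The success rates for the two periods are simple ratios. This short sketch reproduces them and the gap between them.

```python
def success_rate(reached: int, drafted: int) -> float:
    """Percentage of drafted (and signed) players who reached the majors."""
    return 100 * reached / drafted

early = success_rate(386, 760)  # 1984-91 drafts, first 100 picks
late = success_rate(366, 768)   # 1992-99 drafts, first 100 picks
print(f"1984-91: {early:.1f}%   1992-99: {late:.1f}%   gap: {early - late:.1f} points")
```

The gap comes to about 3.1 percentage points. The text attributes this mostly to the later classes having had less time to reach the majors.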
(There is also the question of how much major-league talent is available in the draft at all, as more and more major-league players are signed as international free agents. I don't have the data to answer that question, but intuitively it would seem that as international talent occupies more roster space, the relative amount of talent available in the draft has gone down. If so, then relative to the total amount of value available in the draft, it is possible that teams are doing a better job of identifying that talent and selecting it in the first three rounds.)
Of course, the goal of the draft isn't simply to find a player who will get a pinch-hit appearance in the majors a decade later; any draft measure which labels Alan Zinter a "success" is obviously incomplete. So let's look at a different and much more telling set of data, which is the average WARP accumulated by a draft pick in the first 15 years after he was drafted (in other words, his Y0 through Y15 cells added together).
For this exercise, if a player didn't play in the majors that year, he was treated as if he had zero WARP for that season--which, of course, is true. But if a player had not yet reached that year in his career, then he was not counted in that year of the study. For instance, since 1999 draft picks have only reached Y5 in their career, any data analysis involving Y6 onward eliminated these players from the study. In this way, we are able to get an accurate measurement of value in each year of a player's career, without penalizing recent draft picks because they haven't reached that point in their career yet.
So in the following chart, keep in mind that "Total WARP" means that the average Y15 WARP of all players who had reached Y15 was added to the average Y14 WARP of all players who had reached Y14, and so on. Years Y0 through Y5 include every player in the study, as it has been more than five years since the last players in the study were drafted.
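Here is a sketch of that calculation under a few assumptions. Each player's row is the Y0-Y15 list of cells described earlier, with None marking a blank. The last completed season is 2004, which is consistent with 1999 picks having reached Y5. A blank counts as zero WARP. Column Yk is averaged only over players whose season draft_year + k has actually been played, and the column averages are then summed. The function name and toy players are hypothetical.

```python
from typing import Optional

def total_warp(
    rows: list[list[Optional[float]]],
    draft_years: list[int],
    last_complete_season: int,
) -> float:
    """Sum over columns Y0..Yn of the average WARP in that column.

    rows[i] holds player i's Y0..Yn cells; None means he did not play in the
    majors that season and counts as 0 WARP. A player is included in column
    Yk only if season draft_years[i] + k has been completed.
    """
    total = 0.0
    for k in range(len(rows[0])):
        column = [
            row[k] if row[k] is not None else 0.0
            for row, year in zip(rows, draft_years)
            if year + k <= last_complete_season
        ]
        if column:  # no player has reached Yk yet: nothing to add
            total += sum(column) / len(column)
    return total

# Toy example (hypothetical players, three columns Y0..Y2 for brevity).
rows = [
    [None, 1.0, 2.0],   # drafted 1984: every column counts
    [None, 3.0, None],  # drafted 2003: Y2 (2005) not yet played, so excluded
]
print(total_warp(rows, [1984, 2003], last_complete_season=2004))
```

Averaging each column separately, rather than averaging each player's 16-season total, keeps recent classes from being penalized for seasons that haven't happened yet.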
Here's the 15-year WARP data for every draft position from 1 through 100:
In this case, because I didn't aggregate the data into groups of five, the data looks significantly more erratic as the smaller sample sizes introduce more randomness into the study. So in this next chart, I've clumped the data into groups of five again, with the exception that the #1 overall selection is divorced from picks 2 through 5:
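The sketch below buckets per-slot averages into blocks of five, with an option to give #1 its own bucket. It assumes each bucket is the unweighted mean of the per-slot averages. Pooling the individual players instead could give slightly different numbers, because unsigned picks were dropped and not every slot holds the same number of players. The function name and the toy values are hypothetical.

```python
from statistics import mean

def group_by_five(avg_by_pick: dict[int, float], split_first: bool = True) -> dict[str, float]:
    """Average per-slot values over blocks of five picks: 1-5, 6-10, 11-15, ...

    With split_first=True, pick #1 gets its own bucket and the first block
    becomes 2-5. Each bucket is the unweighted mean of the per-slot values.
    """
    buckets: dict[str, list[float]] = {}
    for pick in sorted(avg_by_pick):
        if split_first and pick == 1:
            label = "1"
        else:
            end = ((pick - 1) // 5) * 5 + 5   # last pick in this block of five
            start = end - 4
            if split_first and start == 1:
                start = 2                     # #1 was split off above
            label = f"{start}-{end}"
        buckets.setdefault(label, []).append(avg_by_pick[pick])
    return {label: mean(values) for label, values in buckets.items()}

# Toy values for illustration only (not data from the study).
toy = {1: 46.0, 2: 10.0, 3: 20.0, 4: 30.0, 5: 40.0,
       6: 5.0, 7: 5.0, 8: 5.0, 9: 5.0, 10: 5.0}
print(group_by_five(toy))                     # #1 on its own
print(group_by_five(toy, split_first=False))  # plain blocks of five
```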
The first thing that stands out is that the #1 overall selection is significantly more valuable than the picks that come after it, even the picks that come immediately after it. This isn't just the result of smoothing out the data; here's another chart that looks only at the first 25 picks:
The average #1 overall pick is worth more than 46 WARP in the first 15 years of his career; no other draft slot comes within even 10 wins of that total. Just as importantly, the benefits of the #1 overall pick do not extend to the #2 pick; in fact, historically, the #2 pick has been worth slightly less than the #3 and #4 picks, and from that point random variation kicks in and strongly influences the downward progression for the rest of the first round. This leads me to coin the first of many draft rules from this study:
Draft Rule #1: The greatest difference in value between consecutive draft picks is the difference between the first and second picks in a draft.

(Naturally, this is the year that Major League Baseball decided to stop arbitrarily assigning the first pick in the draft to a specific league--it would have been the AL's turn this year--and simply have all teams draft in inverse order of their winning percentage from the year before. So my Royals are picking #2 instead of #1.)
Getting back to the previous chart, you will notice that the value of a draft pick drops off rather steeply and consistently for the first 40 picks…and then flatlines. There are a pair of weird plateaus in the data, at picks 46-50 and again at 71-75, but otherwise the value of a draft pick seems to hold steady until pick 90, at which point it starts to tail off again. In fact, the average WARP for picks 86-90 (3.55) is actually greater than the average WARP for picks 41-45 (3.26).
This is in contrast to the chart which looked only at whether draft picks reached the majors. While draft picks are less likely to reach the majors when picked in the third round than in the second, in terms of real value the classes are almost identical. Overall, picks 41-65 had an average WARP of 4.51; picks 66-90 had an average WARP of 4.56. Which leads to the next rule:
Draft Rule #2: There is surprisingly little difference in value between second-round and third-round draft picks.

As an interesting sidenote, in James' original study, he wrote that "players drafted #1 have produced about 8.5 times the major-league approximate value of those drafted #50." In our study, the value of a player drafted #1 (46.37 WARP) was 7.2 times the average value of players drafted from #46 through #55 (6.44 WARP). The eras may have changed, and the metrics certainly have (the difference between Approximate Value and WARP is sort of like the difference between logarithm tables and Nate Silver's supercomputer), but the relative numbers have remained reassuringly stable.
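The comparison with James works on a ratio, and a ratio is unitless. That is presumably why it can be checked even though Approximate Value and WARP are on different scales. Using the figures quoted above:

```python
# Figures quoted in the text.
warp_pick_1 = 46.37       # average 15-year WARP of #1 overall picks
warp_picks_46_55 = 6.44   # average 15-year WARP of picks 46 through 55
james_ratio = 8.5         # James, 1985 Abstract: #1 vs #50, Approximate Value

ratio = warp_pick_1 / warp_picks_46_55
print(f"#1 vs #46-55: {ratio:.1f}x   (James, #1 vs #50: {james_ratio}x)")
```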
Next week, we'll answer the age-old question: on the whole, do college players or high school players make better picks?