August 2, 2005
Doctoring The Numbers
The Draft, Part Six
It's been a long time since we spoke. Since my last article on the draft, the fifth in this series, I've suffered two emotionally draining events: the big 3-0, and the birth of my second daughter, Jenna. (Mother and daughters are doing fine. Father is trying to figure out how to squeeze in baseball highlights between diaper changes and viewings of Dora the Explorer.)
I'm back now, and promise to get the next installment of this series out well in advance of the big 4-0.
The time off wasn't entirely wasted--I did have an epiphany of sorts while trying to figure out different ways to extract useful information from a veritable mountain of data.
Up until now, I had tried to calculate the value of a draft pick by aggregating picks into large groups--like all college players drafted in the first round--and then determining the average value that each of those picks had in Year 1, Year 2, Year 3, etc. after each player was drafted.
That was a very useful method for looking at large groups, but there were problems in extrapolating that method to look at other issues. For instance, an issue as simple as asking "which teams have done the best job of drafting?" would be poorly served by this approach. Comparing, say, the average value of every first-round pick by the Atlanta Braves with the first-round picks of the Tampa Bay Devil Rays would be useless, because--as tradition dictates--they draft on opposite sides of the round. You simply can't compare the value of the #3 overall pick with the #30 overall pick. And if you break the individual groups down into smaller parts--say, looking at Top 10 picks only--you quickly end up with sample sizes so small that they're essentially meaningless.
So I had to look for another way. Which meant I had to get over my hang-up about using discounted values.
By that, I mean the best way to compare one pick to another is to assign a precise value for each pick--to determine, as accurately as possible, what the #4 pick in 1992 was "worth" going into the draft, i.e. its expected value. But to come up with an accurate measure to determine expected values, I first had to come up with an acceptable way to determine actual values of previous draft picks.
Until now, I had used the simple approach of summing up a player's WARP value for the first 15 years after he was drafted. It was a quick-and-dirty method that yielded quick-and-dirty results. To get a more precise answer, we need a metric that, like 15-year WARP, distills a player's contribution into one tidy figure but, unlike 15-year WARP, measures only the value that player generated while still under the control of the team that drafted him.
The ideal method would have been to figure out the exact date on which every player was released or declared free agency for the first time, and to count his value only up to that point. Players who were traded would have the value of the players they were traded for added to their own, and so on. The problem with this method is that it would have taken, approximately speaking, forever.
So I tried to come up with a system that sacrificed a minimum of accuracy in exchange for a maximum of usability. The formula I came up with also had to account for the fact that high-school players tend to spend more time in the minor leagues, and therefore tend to spend more years with their original team before hitting free agency. Here's the formula:
For collegiate draftees: Full value for years Y0 through Y8, plus two-thirds of their value for Y9, plus one-third of their value for Y10.
For high school picks: Full value for years Y0 through Y9, plus three-quarters of their value for Y10, plus half their value for Y11, plus one-quarter of their value for Y12.
Are there college players who don't reach the major leagues until five years after they were drafted, and don't reach free agency until after Y11? Sure. Are there high school players who are in the major leagues 18 months after they were drafted and say goodbye to their original club (hello, A-Rod) after Y7? Absolutely. This formula is going to be inaccurate on the margins. But for large groups of players, or even small ones, it's accurate enough.
(Incidentally, we haven't addressed Junior College players up until now--that's one of the goals of this new system. For Junior College players, I decided to compromise, figuring out their value using each formula and then averaging the two.)
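The crediting rules above can be sketched as a per-year weight schedule: the share of each season's WARP, from Y0 through Y15, that counts toward the drafting team. This is a minimal sketch under our own naming (the function control_weights and the labels "college", "high_school", and "juco" are hypothetical). For junior college players it averages the two schedules year by year, which gives the same result as averaging the two final values, because each value is a weighted sum of seasons.

```python
from fractions import Fraction


def control_weights(draftee_type: str, max_year: int = 15) -> list[Fraction]:
    """Share of each season's WARP (Y0..Y<max_year>) credited to the drafting team.

    draftee_type is one of the hypothetical labels "college", "high_school", "juco".
    """
    if draftee_type == "juco":
        # Junior college: average the college and high school schedules year by year.
        college = control_weights("college", max_year)
        high_school = control_weights("high_school", max_year)
        return [(c + h) / 2 for c, h in zip(college, high_school)]
    if draftee_type == "college":
        last_full_year = 8  # full credit Y0-Y8
        partial = {9: Fraction(2, 3), 10: Fraction(1, 3)}
    elif draftee_type == "high_school":
        last_full_year = 9  # full credit Y0-Y9
        partial = {10: Fraction(3, 4), 11: Fraction(1, 2), 12: Fraction(1, 4)}
    else:
        raise ValueError(f"unknown draftee type: {draftee_type!r}")
    # Full credit through last_full_year, tapering partial credit, then zero.
    return [
        Fraction(1) if year <= last_full_year else partial.get(year, Fraction(0))
        for year in range(max_year + 1)
    ]


print(control_weights("juco")[9:13])
```

For a junior college pick this works out to 5/6 credit in Y9, 13/24 in Y10, 1/4 in Y11, and 1/8 in Y12. Exact fractions are used only so the schedules can be checked without rounding noise.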
The other adjustment I had to make was to discount future value relative to present value. All things considered, you'd much prefer an All-Star season this year to an All-Star season five years from now. And when drafting a player, you'd much rather have a player who can contribute right away than one who won't have an impact for years. For one thing, the longer you have to wait for a player to contribute, the more likely you are to have given up on his future and let him go for a fraction of his value.
In previous articles, I had suggested a fairly steep 10-15% discount rate as appropriate. I have heard from several readers with a slightly more impressive economics background than mine--at least if you consider "University Professor of Economics" to be an impressive background--who intimated that such a discount rate was too high. Among other reasons, baseball players are not as liquid as, say, Microsoft stock. You can't cash in a ballplayer at any time and get his exact fair market value in return.
If you have a thousand dollars in the bank that you don't really need, you can put that in a money market and withdraw it--with interest--at any time. If your farm system develops a terrific shortstop and you've already got Miguel Tejada under contract for the next five years, you're either going to move that shortstop prospect to another position--and lose some of his value--or put him on the trade market, where you may or may not find a good fit for another player who 1) has equal value and 2) fills a need position.
So in the end, I used a discount rate of 8%. Which means that a player's WARP value in year Y0 is counted in full; his value in year Y1 is counted at 92%, in year Y2 at (92%)^2 or 84.6%, etc. Because future value is always worth less than present value, this means that a player's "discounted value" is always going to be less than his original 15-year WARP value, which was not discounted.
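Combined with the crediting schedules above, the discounting rule gives a compact formula for Discounted Value. The symbols are our own shorthand, not the article's: WARP_t is a player's WARP in year Y_t, and w_t is the share of that season credited to the drafting team. The 8% rate enters as a flat 92% multiplier per year, as described above, rather than as division by 1.08.

```latex
\mathrm{DV} = \sum_{t=0}^{15} w_t \,(0.92)^{t}\, \mathrm{WARP}_t,
\qquad
w_t^{\mathrm{COL}} =
\begin{cases}
1 & 0 \le t \le 8\\
2/3 & t = 9\\
1/3 & t = 10\\
0 & t \ge 11
\end{cases}
\qquad
w_t^{\mathrm{HS}} =
\begin{cases}
1 & 0 \le t \le 9\\
3/4 & t = 10\\
1/2 & t = 11\\
1/4 & t = 12\\
0 & t \ge 13
\end{cases}
\qquad
w_t^{\mathrm{JuCo}} = \tfrac{1}{2}\left(w_t^{\mathrm{COL}} + w_t^{\mathrm{HS}}\right)
```

Because every w_t and every (0.92)^t is at most 1, DV can never exceed the undiscounted 15-year total when seasonal values are non-negative.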
Let's take a look at two college pitchers to give you an idea of how the new system differs from the old one:
Player           Y0   Y1   Y2   Y3   Y4   Y5   Y6   Y7   Y8   Y9  Y10  Y11  Y12  Y13  Y14  Y15  Total  DisVal
Kevin Brown       -  0.2  0.2  6.0  4.7  4.1  8.5  7.5  3.7  6.6 11.3  8.8 10.2  8.0  8.3  4.6   92.7   26.06
Ben McDonald    0.0  4.8  2.1  5.0  8.1  5.5  2.3  7.5  3.7    -    -    -    -    -    -    -   39.0   26.99
Using the old system of simply summing up each player's value for the first 15 years of his career, Kevin Brown has more than twice as much value as Ben McDonald. But if we look at the year-by-year breakdown of each player's career, it's clear that much of Brown's value came late in his career--he was 31 when he had his breakthrough season with the Marlins--which was of no value whatsoever to the team that drafted him, the Rangers. For the period of time when both pitchers were under the control of their original teams, they had essentially equal value. In fact, McDonald comes out slightly ahead, largely because he had a fine rookie season just one year after he was drafted.
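As a check on the arithmetic, the sketch below applies the college crediting schedule and the 92%-per-year discount to the two rows in the table. The function names are hypothetical. It assumes Brown's 15 published figures cover Y1 through Y15, with nothing in Y0; that alignment is our inference, chosen because it reproduces his DisVal, whereas starting his figures at Y0 would give roughly 32.5.

```python
def control_weights(draftee_type):
    """Share of each season's WARP, Y0..Y15, credited to the drafting team."""
    college = [1.0] * 9 + [2 / 3, 1 / 3] + [0.0] * 5          # Y0-Y8 full, then 2/3, 1/3
    high_school = [1.0] * 10 + [0.75, 0.5, 0.25] + [0.0] * 3  # Y0-Y9 full, then 3/4, 1/2, 1/4
    if draftee_type == "college":
        return college
    if draftee_type == "high_school":
        return high_school
    if draftee_type == "juco":
        return [(c + h) / 2 for c, h in zip(college, high_school)]
    raise ValueError(f"unknown draftee type: {draftee_type!r}")


def discounted_value(warp_by_year, draftee_type, rate=0.08):
    """Sum of credited WARP, with year Yt scaled by (1 - rate) ** t as in the article."""
    weights = control_weights(draftee_type)
    return sum(
        weight * (1 - rate) ** year * warp
        for year, (weight, warp) in enumerate(zip(weights, warp_by_year))
    )


# Seasonal WARP by year after the draft, Y0 first, taken from the table above.
# Assumption: Brown's 15 figures cover Y1-Y15, so Y0 is entered as 0.0.
BROWN = [0.0, 0.2, 0.2, 6.0, 4.7, 4.1, 8.5, 7.5, 3.7, 6.6, 11.3, 8.8, 10.2, 8.0, 8.3, 4.6]
MCDONALD = [0.0, 4.8, 2.1, 5.0, 8.1, 5.5, 2.3, 7.5, 3.7]

for name, warp in [("Kevin Brown", BROWN), ("Ben McDonald", MCDONALD)]:
    dv = discounted_value(warp, "college")
    print(f"{name}: 15-year total {sum(warp):.1f}, DV {dv:.2f}, ratio {dv / sum(warp):.3f}")
```

McDonald comes out at 26.99, matching the table; Brown comes out at about 26.05 rather than 26.06, presumably because the single-season figures in the table are rounded.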
On average, a player's Discounted Value works out to about 46% of his 15-Year Value, but for players who flamed out quickly, the ratio is much higher, whereas for players who were late bloomers the ratio is much lower. Here's a list of the five most extreme players on each end of the spectrum:
Player           15-Year WARP  DisVal  Ratio   Player            15-Year WARP  DisVal  Ratio
Ariel Prieto              7.9    7.07  0.895   Mike Remlinger            23.7    1.42  0.060
Jim Parque               11.3    9.39  0.831   Paul Abbott               10.2    0.67  0.065
Matt Anderson             7.1    5.71  0.804   Anthony Telford           14.6    1.23  0.085
Dave Fleming             14.8   11.86  0.801   Rudy Seanez                8.4    0.96  0.115
Brian Barnes              7.6    5.97  0.785   David Weathers            22.2    2.65  0.119
Ariel Prieto, the one player in our draft study who didn't fall into the "high school," "college," or "junior college" camps, was drafted in 1995 and made nine starts for the A's that year, hung around as a .500 pitcher for two more years, then fell off the face of the earth. He wasn't a particularly good draft pick, but he did provide an immediate return for the A's. Compare him to Mike Remlinger, who has appeared in nearly 600 major-league games…but only eight of those came for the team that drafted him, the Giants; he didn't permanently stick in the major leagues until ten years after he was drafted.
So…now that we have a method to determine the Discounted Value (henceforth called the DV) of every drafted player, we can come up with a formula to calculate how much DV a team can expect from a specific draft pick. Only one adjustment had to be made, which is that the value of more recent picks had to be adjusted to account for the fact that a player drafted in, say, 1999 hasn't had an opportunity to achieve his full DV. The adjustment I made was to figure out what proportion of a player's DV was earned through Y5, since a player drafted in 1999 has only played through Y5, and then give all 1999 draftees credit for future performance by assuming they would continue to amass DV at a typical rate. For 1998 draftees, I looked at the average proportion of DV earned through Y6, and so on. (Keep in mind that these proportions differ for high school vs. college picks, as college picks tend to reach the majors sooner and earn more of their DV early in their careers.)
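The recent-draft adjustment amounts to dividing the DV earned so far by the typical share of career DV earned by that point. This is a minimal sketch: the share figures below are made-up placeholders for illustration, not numbers from the study, and the function names are hypothetical. It assumes the data run through the 2004 season, so a 1999 draftee has been observed through Y5.

```python
LAST_SEASON = 2004  # assumption: data run through the 2004 season

# Made-up placeholder shares of career DV earned through a given year.
# NOT figures from the study; college picks get the larger share here only
# because the text says they tend to earn their DV earlier.
HYPOTHETICAL_SHARE_EARNED = {
    ("college", 5): 0.50,
    ("high_school", 5): 0.30,
    ("college", 6): 0.60,
    ("high_school", 6): 0.40,
}


def years_observed(draft_year, last_season=LAST_SEASON):
    """Last post-draft year (Yn) with complete data, e.g. 1999 -> Y5."""
    return last_season - draft_year


def projected_dv(dv_to_date, draftee_type, draft_year, shares=HYPOTHETICAL_SHARE_EARNED):
    """Credit future value by assuming DV keeps accruing at the typical rate."""
    share = shares[(draftee_type, years_observed(draft_year))]
    return dv_to_date / share


# Two 1999 draftees with the same DV so far: the high schooler projects higher,
# because a typical high school pick has earned less of his DV by Y5.
print(projected_dv(1.5, "college", 1999))
print(projected_dv(1.5, "high_school", 1999))
```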
That adjustment having been made, the chart below plots the average DV of every draft pick from the #1 through the #100 pick in each draft:
The shape of this chart looks very similar to the 15-Year WARP chart that ran in the first article. The key now was to construct a best-fit curve that would smooth out the random spikes and troughs in this chart. The first thing to notice was that the drop in value was not linear but roughly exponential: the value of a draft pick comes closer and closer to zero as we move later in the draft, but it never crosses zero, since it's impossible for a draft pick to have negative value. (Though some have tried.)
Without getting into the mathematical details here (in large part because my methods would mortify the mathematicians in the audience), what I found was that the best way to approximate the values in the chart above was to assume a break point around pick 38 or so. Up until that pick, the value of each pick dropped by about 4.5% from the preceding pick; after pick 38, the rate of decline fell to about 1.2% per pick. Here's the same chart, with the best-fit curve overlaying the actual data:
Now we have a tool to measure the expected value of every draft pick in our study. I can say, for instance, that the average DV of the #1 overall pick is 12.96 WARP; the average DV of the #100 overall pick is only 1.01. Again, an adjustment has to be made for more recent draft picks; the #1 overall pick in 1999 is worth only 3.97 WARP. This figure--the average DV--can also be called the Expected Value of that draft pick, or XV.
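The fitted curve described above can be sketched as a piecewise exponential decay. The parameters are the rounded figures quoted in the text (12.96 WARP for the #1 pick, a 4.5% drop per pick through pick 38, 1.2% per pick after that); the function name is hypothetical, and this is an approximation of the fit rather than the fit itself.

```python
def expected_value(pick, top_value=12.96, break_pick=38, early_drop=0.045, late_drop=0.012):
    """Approximate XV (in WARP) of an overall pick, using the rounded rates in the text.

    Each pick through break_pick is worth (1 - early_drop) times the pick before it;
    after break_pick, each pick is worth (1 - late_drop) times the pick before it.
    """
    if pick < 1:
        raise ValueError("picks are numbered from 1")
    early_steps = min(pick, break_pick) - 1  # steps taken at the steeper rate
    late_steps = max(pick - break_pick, 0)   # steps taken at the gentler rate
    return top_value * (1 - early_drop) ** early_steps * (1 - late_drop) ** late_steps


for pick in (1, 10, 38, 100):
    print(pick, round(expected_value(pick), 2))
```

With these rounded rates the #100 pick comes out near 1.12 WARP rather than the 1.01 quoted above, which suggests the actual fitted rates differ slightly from the rounded figures. The curve stays positive for every pick, as the text requires.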
Armed with this knowledge, it becomes a cinch to answer a question like this one: who were the five best draft picks in our study?
Player             DV     XV   Diff
Barry Bonds     56.34  10.40  45.93
Mike Mussina    43.09   5.47  37.62
Frank Thomas    46.24   9.95  36.28
John Olerud     36.78   1.35  35.43
Will Clark      46.75  12.40  34.35
Bonds is an easy #1, but Will Clark, who had the second-highest DV of any player in the study, falls to fifth by this measure: as the second overall pick, he could have been expected to produce more value than Frank Thomas (#7), Mike Mussina (#20), and especially John Olerud (#78).
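Ranking by surplus over expectation is just a sort on DV minus XV. A short sketch using the five rows above (the names TOP_PICKS, rank_by_surplus, and rank_by_dv are ours); a plain sort on DV would put Clark second, which is exactly what the surplus ranking corrects for.

```python
# DV and XV (both in WARP) for the five picks in the table above.
TOP_PICKS = {
    "Barry Bonds": (56.34, 10.40),
    "Mike Mussina": (43.09, 5.47),
    "Frank Thomas": (46.24, 9.95),
    "John Olerud": (36.78, 1.35),
    "Will Clark": (46.75, 12.40),
}


def rank_by_surplus(picks):
    """Sort players by how far their DV exceeded the XV of the slot they were taken in."""
    surplus = [(name, dv - xv) for name, (dv, xv) in picks.items()]
    return sorted(surplus, key=lambda item: item[1], reverse=True)


def rank_by_dv(picks):
    """Sort players by raw DV, ignoring where they were drafted."""
    return sorted(picks, key=lambda name: picks[name][0], reverse=True)


for name, diff in rank_by_surplus(TOP_PICKS):
    print(f"{name:<13} {diff:6.2f}")
print(rank_by_dv(TOP_PICKS))
```

Computing the difference from the rounded DV and XV gives 45.94 for Bonds and 36.29 for Thomas, a hundredth off the table, presumably because the table was computed from unrounded values; the ordering is unaffected.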
How about another question: which draft class (at least among the first 100 picks) had the most talent? The following table shows the DV, the XV, and the margin by which the DV exceeds or lags the XV, in percentage terms. For instance, the players selected in 1984 combined to produce 5.6% less discounted value than they would have if they had all been "average" draft picks.
Year     DV     XV   Margin
1984  299.2  317.0    -5.6%
1985  433.5  317.6   +36.5%
1986  309.2  322.5    -4.1%
1987  419.2  321.6   +30.4%
1988  363.0  323.8   +12.1%
1989  313.9  306.2    +2.5%
1990  297.9  331.1   -10.0%
1991  298.5  309.4    -3.5%
1992  289.6  322.6   -10.2%
1993  328.7  324.7    +1.3%
1994  212.4  321.1   -33.8%
1995  322.5  300.8    +7.2%
1996  223.6  242.4    -7.7%
1997  167.6  221.7   -24.4%
1998  228.4  189.8   +20.3%
1999  136.4  135.2    +0.9%
The 1985 draft has long been considered the strongest draft of all time, and as this study shows, with good reason: players drafted that year include Bonds, Clark, Barry Larkin, Rafael Palmeiro, and Randy Johnson. The 1987 draft doesn't receive as much attention, but was nearly as fruitful, with Craig Biggio, Ken Griffey Jr., Kevin Appier, Ray Lankford, and Albert Belle leading the pack.
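The margin column is (DV - XV) / XV, expressed as a percentage. A minimal sketch on a few rows of the table (margin_pct and SAMPLE_YEARS are hypothetical names):

```python
def margin_pct(dv, xv):
    """Percentage by which a group's DV exceeds (+) or lags (-) its XV."""
    return 100 * (dv - xv) / xv


# A few (DV, XV) rows from the year-by-year table.
SAMPLE_YEARS = {1984: (299.2, 317.0), 1985: (433.5, 317.6), 1999: (136.4, 135.2)}

for year, (dv, xv) in SAMPLE_YEARS.items():
    print(f"{year}: {margin_pct(dv, xv):+.1f}%")
```

This reproduces the -5.6%, +36.5%, and +0.9% entries; a few other rows can differ by a tenth of a point from the published margins, presumably because the table was built from unrounded totals.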
The 1994 draft? The first five picks were Paul Wilson, Ben Grieve, Dustin Hermanson, Antone Williamson, and Josh Booty. The scary thing is that Hermanson was actually the fifth-best pick among the top 100; only Nomar Garciaparra, Aaron Boone, A.J. Pierzynski, and Brian Meadows (Brian Meadows?!) were better.
(Keep in mind, even the 1994 draft looks like a gold mine compared to the 2000 affair. Five years later, only two first-round picks from that season are even marginal major-leaguers: Rocco Baldelli and Chase Utley.)
Now let's see what this new draft tool has to say about the issue that consumed most of the past five articles in this series: the debate between high school and college picks.
First, the comparison of the overall data from 1984 to 1999:
Class  Players       DV       XV   Margin
COL        715   2965.6   2455.7   +20.8%
HS         749   1479.5   2003.4   -26.2%
JuCo        62    191.5    137.7   +39.1%
As we would expect, college players hold a significant edge on high school players. If we compare the two groups to each other, a draft pick spent on a collegiate player is worth 64% more than the same draft pick spent on a high school player. This figure isn't far off from the 55% edge that we calculated using the older method in Part 2.
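The 64% figure comes from comparing each class's DV-to-XV ratio, not from subtracting the margins. A sketch of that arithmetic (relative_edge and PERIODS are our own names), applied to the overall table and to the two sub-period tables that follow:

```python
def relative_edge(group_a, group_b):
    """How much more value per pick group A delivered than group B.

    Each group is a (DV, XV) pair; the edge compares the two DV/XV ratios.
    """
    dv_a, xv_a = group_a
    dv_b, xv_b = group_b
    return (dv_a / xv_a) / (dv_b / xv_b) - 1


# (DV, XV) for college and high school picks, from the class tables.
PERIODS = {
    "1984-1999": {"COL": (2965.6, 2455.7), "HS": (1479.5, 2003.4)},
    "1984-1991": {"COL": (1837.1, 1349.6), "HS": (755.4, 1113.9)},
    "1992-1999": {"COL": (1128.5, 1106.1), "HS": (724.1, 889.6)},
}

for period, groups in PERIODS.items():
    edge = relative_edge(groups["COL"], groups["HS"])
    print(f"{period}: college picks worth {edge:.0%} more than high school picks")
```

This gives roughly 64%, 101%, and 25% for the three periods, in line with the figures quoted in the text.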
What is interesting is that Junior College players--an admittedly small group of draftees--have yielded significantly more value than the players taken from four-year institutions.
Looking at the data from 1984 to 1991 only:
Class  Players       DV       XV   Margin
COL        378   1837.1   1349.6   +36.1%
HS         347    755.4   1113.9   -32.2%
JuCo        35    141.8     85.9   +65.1%
Again, as we would expect, the differences between college and high school players are accentuated during this period. College players were worth almost exactly double (101% more, to be exact) what high school picks were worth. What also stands out is that Junior College players from this period were VERY valuable. Only 35 players were signed out of JuCo in those eight drafts, but among those 35 were Appier, Alex Fernandez, Ray Lankford, and Jaime Navarro. Take those four players out, and Junior College players would have performed below expectations, an example of how just a few players can swing the results when you're looking at small groups.
From 1992 to 1999, the data stacks up like this:
Class  Players       DV       XV   Margin
COL        337   1128.5   1106.1    +2.0%
HS         402    724.1    889.6   -18.6%
JuCo        27     49.7     51.8    -4.0%
Just as we discovered in Part 3, the disparity between collegiate and high school picks has been compressed dramatically. College players were 25% more valuable than high school players during this period. This number is considerably higher than the 8% figure we arrived at in Part 3, but keep in mind that the assumptions have changed quite a bit. For instance, in Part 4 we pointed out that if we simply ignored a player's value after Y10 (instead of after Y15), the advantage enjoyed by college players increased to 21%. That is a reasonable assumption, and one that our new formula employs. (Actually, high school picks do get a little credit for what they produced in Y11 and Y12, while college picks don't.)
Essentially, what I am saying is that simply using 15-Year WARP data probably understated the value of college picks a little. In each of the three tables above, college players have a more pronounced edge over high school players than they did using our original method. It's not an enormous difference, and it does not undercut our conclusion that the collegiate edge shrank dramatically in the 1990s. But there are differences in the conclusions reached by the two methods, and I strongly feel that the newer method yields a more accurate result. I am comfortable stating, then, that for the most recent years in our study, collegiate draft picks yielded approximately 25% more value than high school picks.
The value of junior college picks normalized during this period, but there is certainly no evidence to indicate that junior college players are a bad value; it's safe to say that they are at least as good a value as regular college players.
Finally, it's interesting to note that as a whole, players selected from 1992 to 1999 appear to be less valuable than picks from 1984 to 1991. I suggested this as a possibility earlier, but this data strongly supports the notion that the increasing internationalization of Major League Baseball has made the draft a less fruitful source of talent than in years past. Overall, draft picks from 1984 to 1991 were worth 15.6% more than picks from 1992 to 1999.
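The 15.6% figure can be reproduced by pooling the year-by-year table: total DV over total XV for 1984-1991, divided by the same ratio for 1992-1999. This is our reconstruction of the calculation, not a description of the author's spreadsheet; BY_YEAR and pooled_ratio are hypothetical names.

```python
# (DV, XV) by draft year for the first 100 picks, from the year-by-year table.
BY_YEAR = {
    1984: (299.2, 317.0), 1985: (433.5, 317.6), 1986: (309.2, 322.5),
    1987: (419.2, 321.6), 1988: (363.0, 323.8), 1989: (313.9, 306.2),
    1990: (297.9, 331.1), 1991: (298.5, 309.4), 1992: (289.6, 322.6),
    1993: (328.7, 324.7), 1994: (212.4, 321.1), 1995: (322.5, 300.8),
    1996: (223.6, 242.4), 1997: (167.6, 221.7), 1998: (228.4, 189.8),
    1999: (136.4, 135.2),
}


def pooled_ratio(years):
    """Total DV divided by total XV across the given draft years."""
    total_dv = sum(BY_YEAR[y][0] for y in years)
    total_xv = sum(BY_YEAR[y][1] for y in years)
    return total_dv / total_xv


early = pooled_ratio(range(1984, 1992))  # 1984-1991 drafts
late = pooled_ratio(range(1992, 2000))   # 1992-1999 drafts
print(f"1984-1991 picks worth {100 * (early / late - 1):.1f}% more than 1992-1999 picks")
```

Pooling before dividing weights each draft by its XV, so a weak year with few expected WARP (1999, say) counts for less than it would if the sixteen yearly margins were simply averaged.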
Next time--whenever that is--I'll use this new method to examine the value of draft picks at different positions on the diamond, and then wrap up by examining which teams have done the best (and worst) jobs of drafting talent over the last 20 years.