May 27, 2014
The Annual Amateur Draft Guessing Game
Next week is going to be a big one for all 30 teams in Major League Baseball. It’s draft week! The Rule 4 Draft (which is the fancy name for the amateur draft) will take place from June 5th through June 7th. There will be pageantry (which is a fancy name for people trying to make a boring administrative event into a less-boring administrative event). There will be Hall of Famers representing teams. And the end result of a year of hard work by your favorite team’s scouting staff will come to fruition in the form of 30 teams making a bunch of wild guesses.
Every general manager has to deal with at least one “you could have had this guy, but you drafted the other guy and he never made it!” complaint. I think they all involve the 2009 Draft and Mike Trout. The draft is an inexact science. Unlike the NFL and NBA, where draftees are put directly into the starting lineup, it’s going to be a while before a team sees the fruits, whether luscious or rotten, of their draft. You have to project what a guy who only recently attained the right to vote will look like at age 27. Every year, there are can’t-miss prospects who end up missing and “he’s a nice org guy” picks who turn into really good players.
But how good are teams at predicting the future? All 30 teams have a scouting department filled with people who are experts at evaluating amateur talent, many of whom measure their experience in decades. They get access to all sorts of extra information that is not public. They have cross-checkers and big secret meetings. They have every incentive to get this right because, at the end of the day, someone will be cutting checks with a lot of zeroes in them. They’d better get it right.
In a perfect world, the team that picks first should take the player who will eventually provide the most major-league value. The team picking second should take the guy who will provide the second-most value. But then, in the past, the draft was a game that was half proper draft and half auction. Teams with high picks would pass on players whom they believed to be more talented, but whom they saw as wanting too much money. For example, in 2001, the Minnesota Twins drafted Joe Mauer no. 1 overall and gave him a signing bonus of $4 million. The no. 2 overall pick was Mark Prior, who actually got more money ($4.6 million), as did no. 4 pick Gavin Floyd at $4.2 million. The Tampa Bay (Devil) Rays got Dewon Brazelton for the bargain price of $2.5 million at no. 3. (I guess you get what you pay for.) Then again, for that same $2.5M figure, the Rangers got Mark Teixeira two spots later. The market might be efficient, but the shoppers might be fools.
It’s been said that if you want to know how much a team really values a draft pick, don’t look at his overall position, but instead look at his signing bonus. Like everything else in life, follow the money. But if that price tag is the end result of months of research and high-level discussion among the best experts available, how good is that process?
Warning! Gory Mathematical Details Ahead!
I played around with a few indicators, but they all basically told the same story. I tested both overall pick position (i.e., Mike Trout was picked 25th overall) and signing bonus as predictors, but as expected, signing bonus was the stronger of the two, so I will report those findings. I normalized all signing bonuses so that they were entered as a percentage of all signing bonus money spent in the first 10 rounds of that draft. If a player got $3 million and there were $100 million spent overall, he got credit for receiving three percent of the overall draft pool.
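The normalization step can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the code behind the study; the function name and the two-entry draft class are made up so that the numbers match the $3 million out of $100 million example.

```python
def bonus_shares(bonuses):
    """Map each player to his bonus as a percentage of total spending.

    `bonuses` is a dict of player name -> signing bonus in dollars for
    one draft class (first 10 rounds only, per the text).
    """
    total = sum(bonuses.values())
    if total <= 0:
        raise ValueError("total bonus spending must be positive")
    return {name: 100.0 * amount / total for name, amount in bonuses.items()}


# Made-up draft class that mirrors the $3 million of $100 million example.
example_class = {
    "Player A": 3_000_000,
    "Everyone else (lumped together)": 97_000_000,
}
shares = bonus_shares(example_class)
print(shares["Player A"])  # 3.0
```

Expressing each bonus as a share of its own class keeps drafts from different years comparable even as overall spending changes.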
As for outcomes, I coded all draft picks for whether they ever appeared in a major-league game (at any point), whether they had appeared in 162 or more (position players) or 50 or more (pitchers) games in their career, whether they produced more than one, five, or 10 WAR during their career (many are still active, so career to date), and the number of career WAR that they have posted. The answers were fairly similar across outcomes, so I will report on three: total career WAR, appearing in a major-league game, and producing at least five career WAR. Of course, teams are most concerned about what a player will produce in his first six years of major-league service time, since that's the period when they have him cost-controlled, but using career numbers should give us the same basic results.
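For readers who want to see what that coding looks like, here is a rough Python sketch. The field names are hypothetical, and two details are assumptions on my part: the WAR cutoffs are treated as strict "more than" comparisons, and a player who never reached the majors is treated as having zero WAR for the purpose of the flags.

```python
def code_outcomes(games, career_war, is_pitcher):
    """Return the outcome flags for one drafted player.

    `games` is career major-league games (0 if he never appeared).
    `career_war` is career WAR to date (None if he never appeared).
    """
    # Pitchers need 50 games, position players 162, per the text.
    games_threshold = 50 if is_pitcher else 162
    # Assumption: no major-league time counts as 0 WAR for the flags.
    war = career_war if career_war is not None else 0.0
    return {
        "reached_majors": games > 0,
        "met_games_threshold": games >= games_threshold,
        "war_over_1": war > 1,
        "war_over_5": war > 5,
        "war_over_10": war > 10,
        "career_war": career_war,
    }


# Hypothetical players: a reliever with 60 games and a pick who never debuted.
reliever = code_outcomes(games=60, career_war=2.5, is_pitcher=True)
never_debuted = code_outcomes(games=0, career_war=None, is_pitcher=False)
```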
If all 30 teams had magic crystal balls and could somehow know exactly what each draft-eligible player would produce once he got to the big leagues, then in an efficient market, there would be a price structure that developed around those wins. We would see a very close relationship between the signing bonus that a player received and his contribution. And in some sense, that’s the whole point of the scouting system, to try to predict what’s going to happen. Maybe in 2003, people weren’t thinking in terms of WAR, but we should at least see some relationship, right?
I looked at the correlation between signing bonus (again, standardized against overall spending in that draft class) and career WAR, among those who made it to the majors. The result? A correlation of .343 (an R-squared—and that will become important in a minute—of .118). If you assume that all players who never made it posted exactly zero WAR, the correlation jumps a bit to .395 (R-squared of .156). The other two outcomes—whether or not the player ever appeared in a game, and whether or not he collected five or more career WAR (to date)—are binary, so we need to use a binary logistic regression.
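The two versions of that correlation can be sketched as follows. The data rows are invented purely to show the mechanics (they are not from the study), and `pearson_r` is a plain textbook implementation written out by hand so that nothing beyond the standard library is needed.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented rows: (bonus share in percent, career WAR or None if never in MLB).
rows = [(3.0, 25.0), (2.1, None), (1.5, 4.2), (0.9, 0.3),
        (0.6, None), (0.4, 8.1), (0.2, None)]

# Version 1: only players who reached the majors.
majors = [(share, war) for share, war in rows if war is not None]
r_majors = pearson_r([s for s, _ in majors], [w for _, w in majors])

# Version 2: players who never made it are credited with exactly zero WAR.
filled = [(share, war if war is not None else 0.0) for share, war in rows]
r_all = pearson_r([s for s, _ in filled], [w for _, w in filled])

# R-squared is just the square of the correlation.
print(round(0.343 ** 2, 3))  # 0.118
print(round(0.395 ** 2, 3))  # 0.156
```

The last two lines simply confirm that the R-squared values quoted above are the squares of the reported correlations.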
(Note, super-gory details ahead: The super-initiated know that there’s not a “real” R-squared statistic in binary logit. There’s Nagelkerke’s pseudo R-squared, which does mostly the same thing, and for our purposes we’re going to say that it’s good enough. But, since we’re already playing funny math and because we’re already using Pearson correlations above, I’m going to commit one more statistical sin and take the square root of the Nagelkerke R-squared. Now it’s Nagelkerke’s R! Yes, this is a little slapdash, but for good cause. If you read this whole paragraph and understood it, you win a cookie.)
Using Nagelkerke’s R-squared, the size of the signing bonus picks up 19.5 percent of the variance in whether or not the player even made it to the majors, for a “correlation” of 0.44. For the relationship between signing bonus and whether the player achieved five WAR or more, the R-squared is 14.8 percent, for a “correlation” of 0.38.
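For the curious, here is a minimal sketch of how a Nagelkerke R-squared and its square root can be computed for a one-predictor logit. The data are invented, and the hand-rolled Newton's method fit is my own stand-in to keep the example self-contained; a statistics package would normally do the fitting. The formula used is the standard one: the Cox and Snell R-squared divided by its maximum possible value.

```python
import math


def fit_logit(xs, ys, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton's method."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        ps = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]
        ws = [p * (1.0 - p) for p in ps]
        # Entries of the 2x2 information matrix [[a, b], [b, c]].
        a = sum(ws)
        b = sum(w * x for w, x in zip(ws, xs))
        c = sum(w * x * x for w, x in zip(ws, xs))
        # Gradient of the log-likelihood.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for y, p, x in zip(ys, ps, xs))
        det = a * c - b * b
        if abs(det) < 1e-12:
            break
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1


def log_likelihood(xs, ys, b0, b1):
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def nagelkerke_r2(ll_null, ll_model, n):
    """Cox and Snell R-squared rescaled so that its maximum is 1."""
    cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))
    max_possible = 1.0 - math.exp((2.0 / n) * ll_null)
    return cox_snell / max_possible


# Invented data: bonus share (percent) and whether the player reached MLB.
shares = [0.2, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
made_it = [0, 0, 1, 0, 0, 1, 1, 1]

n = len(shares)
base_rate = sum(made_it) / n
# The null model has an intercept only, set to the log-odds of the base rate.
ll_null = log_likelihood(shares, made_it, math.log(base_rate / (1 - base_rate)), 0.0)
b0, b1 = fit_logit(shares, made_it)
ll_model = log_likelihood(shares, made_it, b0, b1)

r2 = nagelkerke_r2(ll_null, ll_model, n)
pseudo_r = math.sqrt(r2)  # the "Nagelkerke's R" shortcut described above

print(round(math.sqrt(0.195), 2))  # 0.44
print(round(math.sqrt(0.148), 2))  # 0.38
```

The final two lines check the reported figures: the square roots of 19.5 percent and 14.8 percent are the 0.44 and 0.38 quoted above.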
The maddening thing about correlations in the mid-30s to the mid-40s is that you can’t dismiss them out of hand, but they aren’t all that impressive either. It means that teams have some clue what they’re doing, but there’s so much room for error. Maybe what we need is some semblance of a baseline. In 2013, the correlation between salary and WAR among players making more than $1,000,000 was .23. I played around with various filters, but the result was always similar. So, from that point of view, teams are doing a better job of understanding the market for wins that prospects will eventually put up (years in the future) than they are of understanding the market for actual MLB free agents.
To make sure that the numbers weren’t being spoiled by one bad draft, I looked at each draft individually, with the following correlations (or pseudo-correlations) for each indicator:
Looks like 2005 was a good year for teams matching their signing bonuses to the eventual product, although the message across time is pretty clear.
Let’s see if the old adage that college players are safer bets than high school players is true.
In general, there’s a stronger correlation between signing bonus and eventual performance for players drafted out of college, which is consistent with the fact that college players are better-known quantities. In fact, the R-squared for college players is twice that for high-school players.
What about by round? Are teams better at matching signing bonus to performance in the first round when everyone’s paying attention or are they consistent throughout?
Whoa! In the first round, we see some reasonable correlations between signing bonus (our proxy for how much teams value each player) and what they end up becoming. By the second round, teams are guessing.
Finally, pitchers vs. hitters.
Looks like a tie to me.
Did You Say Guessing?
But we did learn a few interesting things in the process about how good teams are at drafting. It is true that teams are better at pricing college players than high-school grads. That shouldn’t surprise anyone, because they have at least three more years of data to draw from with the college kids. The high-school kids are more likely to be high-volatility types, and that makes for a poor correlation. Contrary to popular belief, hitters do not end up being safer (or at least more properly priced) bets than pitchers. But the big finding comes from our correlations by round. All told, teams price things a little better in the draft than they do in the free-agent market. They have to wait three to five years before getting any returns on that investment, but maybe that’s the point. Because they don’t have to deal with the added blinder of “this could be the one piece that takes us over the top next year,” teams are actually able to behave a bit more rationally.
But then there’s the finding about how teams seem to do decently well in matching their valuations to what actually happens when it comes to the first round, but by the second round and beyond, their valuations have almost no relationship to the eventual outcomes. In other words, teams are in effect drafting grab bags. That could mean that it’s really only the top 40 or so (I included supplemental first-rounders as first-rounders) players that the league really has a grip on. Certainly, some of those fourth-round picks go on to the majors and turn into at least useful players. And given that a cost-controlled player costs about half as much per win as a free-agent player does, there’s a lot of value to be gained from hitting on those fourth-rounders.
Everyone worries about the first round and trying to figure out which of the top 10 players will have the best career. A place where there’s a lot more work to be done is figuring out which fourth-rounder might turn into a decent bullpen arm to give a team some value. If we’re to take a signing bonus as the market price for future talent, then the market is currently inefficient. Since teams have every incentive to take the player who offers them the most value for whatever budget they have, the problem must be that their best guesses, once they get into the second and third rounds, just aren’t that good.