

May 27, 2014 Baseball Therapy: The Annual Amateur Draft Guessing Game
Next week is going to be a big one for all 30 teams in Major League Baseball. It’s draft week! The Rule 4 Draft (which is the fancy name for the amateur draft) will take place from June 5th through June 7th. There will be pageantry (which is a fancy name for people trying to make a boring administrative event into a less-boring administrative event). There will be Hall of Famers representing teams. And the end result of a year of hard work by your favorite team’s scouting staff will come to fruition in the form of 30 teams making a bunch of wild guesses. Every general manager has to deal with at least one “you could have had this guy, but you drafted the other guy and he never made it!” complaint. I think they all involve the 2009 Draft and Mike Trout.
The draft is an inexact science. Unlike the NFL and NBA, where draftees are put directly into the starting lineup, it’s going to be a while before a team sees the fruits, whether luscious or rotten, of its draft. You have to project what a guy who only recently attained the right to vote will look like at age 27. Every year, there are can’t-miss prospects who end up missing and “he’s a nice org guy” picks who turn into really good players.
But how good are teams at predicting the future? All 30 teams have a scouting department filled with people who are experts at evaluating amateur talent, many of whom measure their experience in decades. They get access to all sorts of extra information that is not public. They have cross-checkers and big secret meetings. They have every incentive to get this right because, at the end of the day, someone will be cutting checks with a lot of zeroes in them. They’d better get it right.
In a perfect world, the team that picks first should take the player who will eventually provide the most major-league value. The team picking second should take the guy who will provide the second-most value. But then, in the past, the draft was a game that was half proper draft and half auction.
Teams with high picks would pass on players whom they believed to be more talented, but whom they saw as wanting too much money. For example, in 2001, the Minnesota Twins drafted Joe Mauer no. 1 overall and gave him a signing bonus of $4 million. The no. 2 overall pick was Mark Prior, who actually got more money ($4.6 million), as did no. 4 pick Gavin Floyd at $4.2 million. The Tampa Bay (Devil) Rays got Dewon Brazelton for the bargain price of $2.5 million at no. 3. (I guess you get what you pay for.) Then again, for that same $2.5 million figure, the Rangers got Mark Teixeira two spots later. The market might be efficient, but the shoppers might be fools.
It’s been said that if you want to know how much a team really values a draft pick, don’t look at his overall position, but instead look at his signing bonus. Like everything else in life, follow the money. But if that price tag is the end result of months of research and high-level discussion among the best experts available, how good is that process?
Warning! Gory Mathematical Details Ahead!
I played around with a few indicators, but they all basically told the same story. I tested both overall pick position (i.e., Mike Trout was picked 25th overall) and signing bonus as predictors, but as expected, signing bonus was the stronger of the two, so I will report those findings. I normalized all signing bonuses so that they were entered as a percentage of all signing bonus money spent in the first 10 rounds of that draft. If a player got $3 million and there were $100 million spent overall, he got credit for receiving three percent of the overall draft pool.
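The normalization just described can be sketched in a few lines of Python. This is only an illustration: the tuple layout, the function name `bonus_shares`, and the player names and dollar figures are assumptions made up for the example, not the data used in the article.

```python
from collections import defaultdict


def bonus_shares(picks):
    """Return each pick's bonus as a percentage of its draft year's total.

    `picks` is a list of (year, player, bonus_in_dollars) tuples meant to
    cover the first 10 rounds of each draft. The layout is hypothetical.
    """
    # Total bonus money spent in each draft year.
    totals = defaultdict(float)
    for year, _player, bonus in picks:
        totals[year] += bonus

    # Express every bonus as a percentage of its own year's total.
    return {
        (year, player): 100.0 * bonus / totals[year]
        for year, player, bonus in picks
    }


# Made-up draft class in which $100 million was spent overall.
picks = [
    (2003, "Player A", 3_000_000),
    (2003, "Player B", 7_000_000),
    (2003, "Rest of class", 90_000_000),
]
shares = bonus_shares(picks)
print(shares[(2003, "Player A")])  # the $3 million pick's share of the pool
```

Dividing by the year's own total is what makes bonuses from different draft classes comparable, since overall spending changes from year to year.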
As for outcomes, I coded all draft picks for whether they ever appeared in a major-league game (at any point), whether they had appeared in 162 or more (position players) or 50 or more (pitchers) games in their career, whether they produced more than one, five, or 10 WAR during their career (many are still active, so career to date), and the number of career WAR that they have posted. The answers were fairly similar across outcomes, so I will report on three: total career WAR, appearing in a major-league game, and producing at least five career WAR. Of course, teams are most concerned about what a player will produce in his first six years of major-league service time, since that's the period when they have him cost-controlled, but using career numbers should give us the same basic results.
If all 30 teams had magic crystal balls and could somehow know exactly what each draft-eligible player would produce once he got to the big leagues, then in an efficient market, there would be a price structure that developed around those wins. We would see a very close relationship between the signing bonus that a player received and his contribution. And in some sense, that’s the whole point of the scouting system, to try to predict what’s going to happen. Maybe in 2003, people weren’t thinking in terms of WAR, but we should at least see some relationship, right?
I looked at the correlation between signing bonus (again, standardized against overall spending in that draft class) and career WAR, among those who made it to the majors. The result? A correlation of .343 (an R-squared of .118, and that will become important in a minute). If you assume that all players who never made it posted exactly zero WAR, the correlation jumps a bit to .395 (R-squared of .156). The other two outcomes (whether or not the player ever appeared in a game, and whether or not he collected five or more career WAR to date) are binary, so we need to use a binary logistic regression.
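For readers who want to see the moving parts, here is a minimal Python sketch of a one-predictor binary logistic regression and of Nagelkerke's pseudo R-squared, a common stand-in for R-squared when the outcome is binary. The toy data, function names, and the Newton-Raphson fitting routine are all assumptions for illustration. The article does not say which software or settings were used.

```python
import math


def _prob(a, b, xi):
    """Logistic probability that the outcome is 1 for predictor value xi."""
    return 1.0 / (1.0 + math.exp(-(a + b * xi)))


def fit_logit(x, y, iterations=50):
    """Fit intercept a and slope b by Newton-Raphson on the log-likelihood."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _prob(a, b, xi)
            w = p * (1.0 - p)
            g0 += yi - p            # gradient with respect to a
            g1 += (yi - p) * xi     # gradient with respect to b
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


def nagelkerke_r2(x, y):
    """Nagelkerke's pseudo R-squared for a one-predictor logistic model."""
    n = len(y)
    a, b = fit_logit(x, y)
    ll_model = sum(
        math.log(_prob(a, b, xi)) if yi else math.log(1.0 - _prob(a, b, xi))
        for xi, yi in zip(x, y)
    )
    # Null model: intercept only, so every player gets the base rate.
    base = sum(y) / n
    ll_null = sum(y) * math.log(base) + (n - sum(y)) * math.log(1.0 - base)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_possible = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_possible


# Toy data (made up): bonus share in percent, and whether the player
# reached the majors (1) or not (0).
bonus_share = [0.2, 0.3, 0.5, 0.8, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0]
made_majors = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

r2 = nagelkerke_r2(bonus_share, made_majors)
pseudo_r = math.sqrt(r2)  # square root of the pseudo R-squared
print(round(r2, 3), round(pseudo_r, 3))
```

Taking the square root of the pseudo R-squared is only a convenience for putting the binary outcomes on roughly the same scale as a Pearson correlation. It is not a standard statistic.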
(Note, super-gory details ahead: The super-initiated know that there’s not a “real” R-squared statistic in binary logit. There’s Nagelkerke’s pseudo R-squared, which does mostly the same thing, and for our purposes we’re going to say that it’s good enough. But, since we’re already playing funny math and because we’re already using Pearson correlations above, I’m going to commit one more statistical sin and take the square root of the Nagelkerke R-squared. Now it’s Nagelkerke’s R! Yes, this is a little slapdash, but for good cause. If you read this whole paragraph and understood it, you win a cookie.)
Using Nagelkerke’s R-squared, the size of the signing bonus picks up 19.5 percent of the variance in whether or not the player even made it to the majors, for a “correlation” of 0.44. For the relationship between signing bonus and whether the player achieved five WAR or more, the R-squared is 14.8 percent, for a “correlation” of 0.38.
The maddening thing about correlations in the mid-.30s to the mid-.40s is that you can’t dismiss them out of hand, but they aren’t all that impressive either. It means that teams have some clue what they’re doing, but there’s so much room for error. Maybe what we need is some semblance of a baseline. In 2013, the correlation between salary and WAR among those making more than $1,000,000 was .23. I played around with various filters, but the result was always similar. So, from that point of view, teams are doing a better job of understanding the market for wins that prospects will eventually put up (years in the future) than they are of understanding the market for actual MLB free agents.
To make sure that the numbers weren’t being spoiled by one bad draft, I looked at each draft individually, with the following correlations (or pseudo-correlations) for each indicator:
Looks like 2005 was a good year for teams matching their signing bonuses to the eventual product, although the message across time is pretty clear. Let’s see if the old adage that college players are safer bets than high school players is true.
In general here, there’s a stronger correlation between signing bonus and eventual performance for players drafted out of college, which is consistent with the fact that college players are better-known quantities. In fact, the R-squared for college players is twice that for high-school players. What about by round? Are teams better at matching signing bonus to performance in the first round when everyone’s paying attention or are they consistent throughout?
Whoa! In the first round, we see some reasonable correlations between signing bonus (our proxy for how much teams value each player) and what they end up becoming. By the second round, teams are guessing. Finally, pitchers vs. hitters.
Looks like a tie to me.
Did You Say Guessing?
But we did learn a few interesting things in the process about how good teams are at drafting. It is true that teams are better at pricing college players than high-school grads. That shouldn’t surprise anyone, because they have at least three more years of data to draw from with the college kids. The high-school kids are more likely to be high-volatility types, and that makes for a poor correlation. Contrary to popular belief, hitters do not end up being safer (or at least more properly priced) bets than pitchers. But the big finding comes from our correlations by round.
All told, teams price things a little better in the draft than they do in the free agent market. They have to wait three to five years before getting any returns on that investment, but maybe that’s the point. Because they don’t have to deal with the added blinder of “this could be the one piece that takes us over the top next year,” teams are actually able to behave a bit more rationally.
But then there’s the finding about how teams seem to do decently well in matching their valuations to what actually happens when it comes to the first round, but by the second round and beyond, their valuations have almost no relationship to the eventual outcomes. In other words, teams are in effect drafting grab bags. That could mean that it’s really only the top 40 or so (I included supplemental first-rounders as first-rounders) players that the league really has a grip on. Certainly, some of those fourth-round picks go on to the majors and turn into at least useful players. And given that a cost-controlled player costs about half as much per win as a free agent player does, there’s a lot of value to be gained from hitting on those fourth-rounders. Everyone worries about the first round and trying to figure out which of the top 10 players will have the best career.
A place where there’s a lot more work to be done is figuring out which fourth-rounder might turn into a decent bullpen arm to give a team some value. If we’re to take a signing bonus as the market price for future talent, then the market is currently inefficient. Since teams have every incentive to take the player who offers them the most value for whatever budget they have, then the problem must be that their best guesses, once they get into the second and third rounds, just aren’t that good.
Russell A. Carleton is an author of Baseball Prospectus. Follow @pizzacutter4
13 comments have been left for this article.
Geoff Young (46563) "The market might be efficient, but the shoppers might be fools." This might be my new favorite phrase. May 27, 2014 08:48 AM
MGL (2121) "In 2013, the correlation between salary and WAR among those making more than $1,000,000 was .23. I played around with various filters, but the result was always similar. So, from that point of view, teams are doing a better job of understanding the market for wins that prospects will eventually put up (years in the future) than they are of understanding market for actual MLB free agents." May 27, 2014 09:07 AM
I threw in the single year (2013) MLB salary stuff as a quick benchmark. The message there is that the correlation ain't .80 even when we have MLB data to work from and we're not projecting into the future. May 27, 2014 10:13 AM
MGL (2121) Thank you for the explanation about why signing bonus is a better predictor than actual pick number, at least before "slotting." It makes sense. May 27, 2014 18:42 PM
mitchiapet (27863) Question in regards to the Gory Math: Can you provide the c statistics for the logistic regressions you ran? May 27, 2014 09:22 AM
jfranco77 (64578) Isn't there a little bit of bias built in? Teams are more likely to give their first rounder some time in the majors to justify their investment to the fans. I don't know how to account for that but I suspect it is there somewhere. May 27, 2014 09:34 AM

Just a thought, but would the high school versus college comparisons not be impacted by the fact that high school players will have had less time to accumulate WAR? For example, a high school pick from 2008 might reasonably have reached the bigs in 2012, and would have to have done pretty well from the start to have reached 5 WAR by now. A college guy might have reached MLB in 2010, and would have that much more time to start WAR collecting.
In those comparisons, I'm comparing HS to HS and college to college. So, there will be some 2003 HS grads and 2008 HS grads in there, same as there will be 2003 college and 2008 college grads. Sure, there's bias in that a 2003 HS grad would be 29 now, while a 2008 HS grad would only be 24. It's why I sliced things a few other ways. What amazed me was that the correlations stayed so consistent. And moderate.