Next week is going to be a big one for all 30 teams in Major League Baseball. It’s draft week! The Rule 4 Draft (which is the fancy name for the amateur draft) will take place from June 5th through June 7th. There will be pageantry (which is a fancy name for people trying to make a boring administrative event into a less-boring administrative event). There will be Hall of Famers representing teams. And the end result of a year of hard work by your favorite team’s scouting staff will come to fruition in the form of 30 teams making a bunch of wild guesses.
Every general manager has to deal with at least one “you could have had this guy, but you drafted the other guy and he never made it!” complaint. I think they all involve the 2009 Draft and Mike Trout. The draft is an inexact science. Unlike the NFL and NBA, where draftees are put directly into the starting lineup, it’s going to be a while before a team sees the fruits, whether luscious or rotten, of their draft. You have to project what a guy who only recently attained the right to vote will look like at age 27. Every year, there are can’t-miss prospects who end up missing and “he’s a nice org guy” picks who turn into really good players.
But how good are teams at predicting the future? All 30 teams have a scouting department filled with people who are experts at evaluating amateur talent, many of whom measure their experience in decades. They get access to all sorts of extra information that is not public. They have cross-checkers and big secret meetings. They have every incentive to get this right because, at the end of the day, someone will be cutting checks with a lot of zeroes in them. They’d better get it right.
In a perfect world, the team that picks first should take the player who will eventually provide the most major-league value. The team picking second should take the guy who will provide the second-most value. But then, in the past, the draft was a game that was half proper draft and half auction. Teams with high picks would pass on players whom they believed to be more talented, but whom they saw as wanting too much money. For example, in 2001, the Minnesota Twins drafted Joe Mauer no. 1 overall and gave him a signing bonus of $4 million. The no. 2 overall pick was Mark Prior, who actually got more money ($4.6 million), as did no. 4 pick Gavin Floyd at $4.2 million. The Tampa Bay (Devil) Rays got Dewon Brazelton for the bargain price of $2.5 million at no. 3. (I guess you get what you pay for.) Then again, for that same $2.5M figure, the Rangers got Mark Teixeira two spots later. The market might be efficient, but the shoppers might be fools.
It’s been said that if you want to know how much a team really values a draft pick, don’t look at his overall position, but instead look at his signing bonus. Like everything else in life, follow the money. But if that price tag is the end result of months of research and high-level discussion among the best experts available, how good is that process?
Warning! Gory Mathematical Details Ahead!
I obtained data on signing bonuses here. Data were fairly complete beginning in the year 2003, and there were generally data for the first 10 rounds. I examined how well teams performed in the 2003 to 2008 drafts, as the players picked in later drafts haven’t yet had a full chance to develop and make their major-league selves known. I got career data from Baseball-Reference, which had its draft data set up nicely for my purposes.
I played around with a few indicators, but they all basically told the same story. I tested both overall pick position (i.e., Mike Trout was picked 25th overall) and signing bonus as predictors, but as expected, signing bonus was the stronger of the two, so I will report those findings. I normalized all signing bonuses so that they were entered as a percentage of all signing bonus money spent in the first 10 rounds of that draft. If a player got $3 million and there were $100 million spent overall, he got credit for receiving three percent of the overall draft pool.
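The normalization step can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above: the function name `bonus_shares` and the two-entry draft class are made up for the example and are not the actual data set.

```python
def bonus_shares(bonuses):
    """Return each bonus as a percentage of all bonus money in that draft."""
    total = sum(bonuses.values())
    return {player: 100.0 * amount / total for player, amount in bonuses.items()}


# Hypothetical draft class: one $3 million bonus in a $100 million pool.
draft = {"Player A": 3_000_000, "Rest of class": 97_000_000}
shares = bonus_shares(draft)
print(shares["Player A"])  # 3.0
```

Expressing each bonus as a share of its own draft class keeps bonus inflation from one year to the next from leaking into the comparison.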
As for outcomes, I coded all draft picks for whether they ever appeared in a major-league game (at any point), whether they had appeared in 162 or more (position players) or 50 or more (pitchers) games in their career, whether they produced more than one, five, or 10 WAR during their career (many are still active, so career to date), and the number of career WAR that they have posted. The answers were fairly similar across outcomes, so I will report on three: total career WAR, appearing in a major-league game, and producing at least five career WAR. Of course, teams are most concerned about what a player will produce in his first six years of major-league service time, since that's the period when they have him cost-controlled, but using career numbers should give us the same basic results.
If all 30 teams had magic crystal balls and could somehow know exactly what each draft-eligible player would produce once he got to the big leagues, then in an efficient market, there would be a price structure that developed around those wins. We would see a very close relationship between the signing bonus that a player received and his contribution. And in some sense, that’s the whole point of the scouting system, to try to predict what’s going to happen. Maybe in 2003, people weren’t thinking in terms of WAR, but we should at least see some relationship, right?
I looked at the correlation between signing bonus (again, standardized against overall spending in that draft class) and career WAR, among those who made it to the majors. The result? A correlation of .343 (an R-squared—and that will become important in a minute—of .118). If you assume that all players who never made it posted exactly zero WAR, the correlation jumps a bit to .395 (R-squared of .156). The other two outcomes—whether or not the player ever appeared in a game, and whether or not he collected five or more career WAR (to date)—are binary, so we need to use a binary logistic regression.
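For readers who want to see the two treatments side by side, here is a minimal sketch. The `picks` rows are invented for illustration (they are not real draft data), and `pearson_r` is a plain hand-rolled Pearson correlation, not necessarily the software used for the numbers above.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical rows: (bonus share in percent, career WAR or None if never in MLB).
picks = [(3.0, 12.5), (2.1, None), (1.5, 4.0), (0.8, None), (0.5, 1.2), (0.3, -0.4)]

# Treatment 1: only the players who reached the majors.
reached = [(b, w) for b, w in picks if w is not None]
r_reached = pearson_r([b for b, _ in reached], [w for _, w in reached])

# Treatment 2: everyone, with non-major-leaguers counted as exactly zero WAR.
r_zero_filled = pearson_r(
    [b for b, _ in picks],
    [0.0 if w is None else w for _, w in picks],
)

# R-squared is just the square of the correlation reported in the text.
print(round(0.343 ** 2, 3), round(0.395 ** 2, 3))  # 0.118 0.156
```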
(Note, super-gory details ahead: The super-initiated know that there’s not a “real” R-squared statistic in binary logit. There’s Nagelkerke’s pseudo R-squared, which does mostly the same thing, and for our purposes we’re going to say that it’s good enough. But, since we’re already playing funny math and because we’re already using Pearson correlations above, I’m going to commit one more statistical sin and take the square root of the Nagelkerke R-squared. Now it’s Nagelkerke’s R! Yes, this is a little slapdash, but for good cause. If you read this whole paragraph and understood it, you win a cookie.)
Using Nagelkerke’s R-squared, the size of the signing bonus picks up 19.5 percent of the variance in whether or not the player even made it to the majors, for a “correlation” of 0.44. For the relationship between signing bonus and whether the player achieved five WAR or more, the R-squared is 14.8 percent, for a “correlation” of 0.38.
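For the curious, the usual textbook definition of Nagelkerke's pseudo R-squared can be written down from two log-likelihoods, and the "correlation" is then just its square root. The sketch below assumes you already have the log-likelihood of an intercept-only model (`ll_null`) and of the fitted model (`ll_model`) from whatever logistic regression software you use; those parameter names are mine, not from any particular package.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo R-squared from two log-likelihoods and sample size n."""
    # Cox-Snell R-squared: 1 - (L_null / L_model) ** (2 / n)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Largest value Cox-Snell can reach for this null model.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


# The square-root step applied to the two figures reported above.
print(round(math.sqrt(0.195), 2), round(math.sqrt(0.148), 2))  # 0.44 0.38
```

Dividing by the maximum is what lets the statistic run all the way up to 1, which is why it gets treated as a rough stand-in for an ordinary R-squared.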
The maddening thing about correlations in the mid-30s to the mid-40s is that you can’t dismiss them out of hand, but they aren’t all that impressive either. It means that teams have some clue what they’re doing, but there’s so much room for error. Maybe what we need is some semblance of a baseline. In 2013, the correlation between salary and WAR among those making more than $1,000,000 was .23. I played around with various filters, but the result was always similar. So, from that point of view, teams are doing a better job of understanding the market for wins that prospects will eventually put up (years in the future) than they are of understanding the market for actual MLB free agents.
To make sure that the numbers weren’t being spoiled by one bad draft, I looked at each draft individually, with the following correlations (or pseudo-correlations) for each indicator:
Year | Bonus-Total WAR | Bonus-Appeared in MLB | Bonus-Reached 5 WAR
2003 | .22 | .40 | .31
2004 | .33 | .48 | .41
2005 | .50 | .50 | .52
2006 | .35 | .45 | .32
2007 | .41 | .43 | .42
2008 | .31 | .40 | .37
Looks like 2005 was a good year for teams matching their signing bonuses to the eventual product, although the message across time is pretty clear.
Let’s see if the old adage that college players are safer bets than high school players is true.
Drafted from | Bonus-Total WAR | Bonus-Appeared in MLB | Bonus-Reached 5 WAR
High School | .24 | .44 | .31
College | .40 | .47 | .43
In general here, there’s a stronger correlation between signing bonus and eventual performance for players drafted out of college, which is consistent with the fact that college players are better-known quantities. In R-squared terms, the college figure is roughly twice the high-school figure for reaching five WAR (.43 squared is about .18, versus about .10 for .31 squared) and nearly three times as large for total WAR (.16 versus about .06).
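The R-squared comparison is simple arithmetic on the table above: square each correlation and take the ratio. A quick sketch, with the values copied from the table (the dictionary names are just for the example):

```python
# Correlations (or pseudo-correlations) copied from the table above.
high_school = {"Total WAR": 0.24, "Appeared in MLB": 0.44, "Reached 5 WAR": 0.31}
college = {"Total WAR": 0.40, "Appeared in MLB": 0.47, "Reached 5 WAR": 0.43}


def r_squared_ratio(r_a, r_b):
    """How many times larger r_a squared is than r_b squared."""
    return (r_a ** 2) / (r_b ** 2)


ratios = {k: round(r_squared_ratio(college[k], high_school[k]), 2) for k in college}
print(ratios)
# {'Total WAR': 2.78, 'Appeared in MLB': 1.14, 'Reached 5 WAR': 1.92}
```

The gap is widest for total WAR and nearly vanishes for simply appearing in the majors.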
What about by round? Are teams better at matching signing bonus to performance in the first round when everyone’s paying attention or are they consistent throughout?
Round | Bonus-Total WAR | Bonus-Appeared in MLB | Bonus-Reached 5 WAR
1st | .35 | .37 | .36
2nd | -.11 (yes, negative) | .00 (yes, zero) | .00
3rd | -.05 | .11 | .00
4th | .16 | .00 | .10
5th-10th | .05 | .15 | .10
Whoa! In the first round, we see some reasonable correlations between signing bonus (our proxy for how much teams value each player) and what they end up becoming. By the second round, teams are guessing.
Finally, pitchers vs. hitters.
Position | Bonus-Total WAR | Bonus-Appeared in MLB | Bonus-Reached 5 WAR
Pitcher | .38 | .45 | .37
Non-Pitcher | .32 | .43 | .39
Looks like a tie to me.
Did You Say Guessing?
We know just from reading a recap of old first rounds that there’s a lot of randomness in drafting. Guys who were supposed to pan out sometimes don’t. There’s no doubt that in 2020, we’ll all look back on the 2014 draft and shake our heads at what could have been for some team. Trying to figure out what an 18-year-old will look like when he’s 27 is hard. It’s half of his life away. Try to remember that.
But we did learn a few interesting things in the process about how good teams are at drafting. It is true that teams are better at pricing college players than high-school grads. That shouldn’t surprise anyone, because they have at least three more years of data to draw from with the college kids. The high-school kids are more likely to be high-volatility types, and that makes for a poor correlation. Contrary to popular belief, hitters do not end up being safer (or at least more properly priced) bets than pitchers. But the big finding comes from our correlations by round. All told, teams price things a little better in the draft than they do in the free agent market. They have to wait three to five years before getting any returns on that investment, but maybe that’s the point. Because they don’t have to deal with the added blinder of “this could be the one piece that takes us over the top next year,” teams are actually able to behave a bit more rationally.
But then there’s the finding about how teams seem to do decently well in matching their valuations to what actually happens when it comes to the first round, but by the second round and beyond, their valuations have almost no relationship to the eventual outcomes. In other words, teams are in effect drafting grab bags. That could mean that it’s really only the top 40 or so (I included supplemental first-rounders as first-rounders) players that the league really has a grip on. Certainly, some of those fourth-round picks go on to the majors and turn into at least useful players. And given that a cost-controlled player costs about half as much per win as a free agent player does, there’s a lot of value to be gained from hitting on those fourth-rounders.
Everyone worries about the first round and trying to figure out which of the top 10 players will have the best career. A place where there’s a lot more work to be done is figuring out which fourth-rounder might turn into a decent bullpen arm to give a team some value. If we’re to take a signing bonus as the market price for future talent, then the market is currently inefficient. Since teams have every incentive to take the player who offers them the most value for whatever budget they have, then the problem must be that their best guesses, once they get into the second and third round, just aren’t that good.
In one, you are using career WAR and in the other (2013 salary and WAR), you are using one-year WAR, so of course the correlation for the one-year WAR is going to be less. That does not mean that teams are doing a better job in the draft than in valuing FA.
Back to the article in general:
When you look at individual rounds and find virtually no correlation after the first round, all that tells us is that teams cannot distinguish among the talent in each of those rounds, not that they are necessarily doing a bad job in drafting in those rounds.
For example, let's say that all players in the first round accumulated an average of 20 career WAR and all players in the second round accumulated an average of 15 career WAR. That might be a good result, I don't know. Let's say that it is. Well, the correlation of zero (or negative) in the second round simply means that teams cannot distinguish between one 15 WAR player and another, which doesn't mean that they are doing a bad job at drafting in the second round. Maybe all the talent is bunched together after the first round?
In fact, if the talent is normally distributed, then more and more players are bunched together in talent after each round, so it becomes harder and harder for teams to distinguish among them. PLUS, the actual spread of talent among the 30 players in each subsequent round gets smaller and smaller, so we should expect the correlation between WAR and signing bonus to get smaller and smaller as well.
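This bunching argument can be illustrated with a toy simulation. Everything here is an assumption made for the sketch, not a model of the real draft: talent is standard normal, teams see talent plus noise, careers are talent plus independent noise, and "rounds" are 1,000 picks each so the within-round correlations are not swamped by sampling error.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_rounds(pool_size=20000, round_size=1000, n_rounds=5, seed=0):
    """Return (correlation over all drafted players, list of per-round correlations)."""
    rng = random.Random(seed)
    players = []
    for _ in range(pool_size):
        talent = rng.gauss(0.0, 1.0)
        valuation = talent + rng.gauss(0.0, 1.0)  # what teams believe (drives the bonus)
        outcome = talent + rng.gauss(0.0, 1.0)    # how the career actually turns out
        players.append((valuation, outcome))

    # Teams draft strictly in order of their own valuations.
    players.sort(key=lambda p: p[0], reverse=True)
    drafted = players[: round_size * n_rounds]

    overall = pearson_r([v for v, _ in drafted], [o for _, o in drafted])
    by_round = []
    for i in range(n_rounds):
        chunk = drafted[i * round_size:(i + 1) * round_size]
        by_round.append(pearson_r([v for v, _ in chunk], [o for _, o in chunk]))
    return overall, by_round


overall_r, round_rs = simulate_rounds()
print(round(overall_r, 2), [round(r, 2) for r in round_rs])
```

In this setup the teams are equally skilled in every round by construction, yet the correlations inside the later rounds come out well below the correlation across all drafted players, simply because the valuations inside a later round barely differ from one another.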
Even in the first round, the relatively small correlation doesn't really tell us much. What if teams are very good at identifying the best 30 or 40 players, but they are not so good at distinguishing among them? And again, what if the talent even in the first round does not have such a large spread?
All the correlations in each round tell us, from a team perspective, is that it doesn't really matter that much what position in each round you select from, and you are probably better off with a lower number, especially after the first round, since you can save yourself some money in signing bonuses.
Also, isn't there a lot of noise in the signing bonuses depending on the player's agent, his family's financial situation, his personal preferences, etc.? Why wouldn't draft pick number be a better proxy for evaluating talent? With a few exceptions, aren't teams trying to pick the best available talent for each pick, regardless of what the eventual signing bonus is?
I actually ran just about everything in here with draft position as the predictor. Signing bonus was a better predictor consistently. These drafts (03-08) are back in the era before slotting (or even "slotting") and teams made "signability" picks in the top ten, so there were likely players whom the teams believed weren't as good who got picked ahead of better players who wanted bigger bonuses.
It's true that the talent may bunch in different ways, but in an efficient market, where everyone has perfect information (obviously not actually the case), the signing bonuses should sort that out as well. What strikes me about all of this is that the market is a long way away from efficient, which I take as "prospectin' is hard."
"The message there is that the correlation ain't .80 even when we have MLB data to work from and we're not projecting into the future."
It can't be that high even with perfect information, because there is too much random variance in one season of performance. Remember that even with perfect information (if we knew every player's exact true talent and paid them fair market value for that), the smaller the time frame, the lower the correlation, because correlation includes random variance. That was MY point. Comparing correlations tells you nothing about how much information you have or how efficient you are at evaluating talent when one correlation is based on one year of performance and the other is based on a career (or at least several years).
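The time-frame point can be sketched with a toy simulation as well. The assumptions are all invented for illustration: each player's pay equals his true per-season talent exactly (perfect information), each season's WAR is talent plus independent noise, and a "career" is eight seasons.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_players=2000, n_seasons=8, noise_sd=1.5, seed=1):
    """Return (pay vs. one-season WAR, pay vs. multi-season WAR) correlations."""
    rng = random.Random(seed)
    pay, one_year, career = [], [], []
    for _ in range(n_players):
        talent = rng.gauss(0.0, 1.0)  # true per-season talent
        seasons = [talent + rng.gauss(0.0, noise_sd) for _ in range(n_seasons)]
        pay.append(talent)            # pay tracks true talent exactly
        one_year.append(seasons[0])   # a single noisy season
        career.append(sum(seasons))   # noise partly averages out over a career
    return pearson_r(pay, one_year), pearson_r(pay, career)


r_one_year, r_career = simulate()
print(round(r_one_year, 2), round(r_career, 2))
```

Even though pay is a perfect read on talent in this setup, the single-season correlation comes out well below the multi-season one, which is the reason the two kinds of correlation are not directly comparable.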
When I worked for the Cardinals in 2004, I studied the draft. I found that after the first round, using nothing but college MLEs would do better than the actual draft. In other words, after the first round, scouting is very inefficient, as your analysis suggests. Of course, drafting is probably better now than it was 10 years ago, as many teams are using sophisticated analytics to project ML performance from high school and college performance. There is also more emphasis on defense and positional adjustments now than there used to be. In the old days, teams would much rather draft the slugging outfielder or first baseman than the mediocre-hitting, excellent-fielding SS. Not so anymore, or at least it shouldn't be, although I suspect that most teams still overvalue hitting at the amateur level at the expense of defense and positional adjustments (and probably base running too). Everyone wants the next great hitter and not the next great fielder.