Potential is a funny thing. The team that grabs the most players who outperform expectations often wins its fantasy league, so every spring we hear about breakout candidates and which players stand the best chance of beating their projections. Often, these candidates are selected based on their tools and their pedigree: their potential. This kind of subjective scouting information is very important, but few people outside of Major League Baseball have a database of scouting reports covering enough players over enough years to study what it actually tells us, and such a study would bring plenty of complications of its own. There is one freely available tool, though, that I thought might make for an interesting study: Baseball America’s archive of Top 100 prospect lists, which dates back to 1990.
Today, I wanted to run a study using this archive as a proxy for pedigree to see how much pedigree matters for players who have already made it to the majors. Once a player is in the majors, does his pedigree make him more likely to break out?
Using Baseball America’s Top 100 Prospect archive and Jeff Sackmann’s infinitely useful historical Marcels database, I’ve looked at how players at each age over- or underperform their Marcels projection based on whether they had ever made the top prospect list. Herein, the two groups will be called “prospects” and “non-prospects,” though players in the “prospect” group will often no longer actually be prospects. This group will include anyone who was ever a top prospect, such as a 27-year-old Alex Gordon.
I’ve only included an age in the study if at least 200 prospects and 200 non-prospects qualified, which narrows it down to ages 23 to 30 for batters and ages 22 to 28 for pitchers. The samples are smaller for younger players because very few reach the majors that young (especially if they’re not top prospects), and they are smaller for older players because the data only goes back to 1990. Players who were top prospects in, say, 2000 haven’t yet reached their 30s, so they can’t appear in those buckets. The study includes every player since 1990 who played in the majors and had a Marcel projection for the season (so rookies are not included).
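As a rough sketch of the bucketing described above (not the code actually used for this study), the comparison could be set up along these lines. The row layout `(age, is_prospect, actual, projected)`, the function name, and the toy rows are all hypothetical; only the 200-player threshold comes from the text.

```python
from collections import defaultdict
from statistics import mean

MIN_GROUP_SIZE = 200  # threshold stated in the article


def prospect_gap_by_age(rows, min_group_size=MIN_GROUP_SIZE):
    """Return {age: prospect mean diff - non-prospect mean diff}.

    Each row is a hypothetical (age, is_prospect, actual, projected) tuple
    for one player-season, where "diff" means actual minus projected.
    An age is kept only if BOTH groups have at least min_group_size rows.
    """
    diffs = defaultdict(lambda: {True: [], False: []})
    for age, is_prospect, actual, projected in rows:
        diffs[age][bool(is_prospect)].append(actual - projected)

    gaps = {}
    for age, groups in sorted(diffs.items()):
        prospects, non_prospects = groups[True], groups[False]
        if len(prospects) >= min_group_size and len(non_prospects) >= min_group_size:
            gaps[age] = mean(prospects) - mean(non_prospects)
    return gaps


# Made-up batting averages, purely to exercise the function.
toy_rows = [
    (23, True, 0.270, 0.265),   # prospect beats projection by 5 points
    (23, False, 0.250, 0.260),  # non-prospect misses by 10 points
    (24, True, 0.280, 0.275),   # age 24 has no non-prospects, so it is dropped
]
toy_gaps = prospect_gap_by_age(toy_rows, min_group_size=1)
print(toy_gaps)
```

With the toy rows, only age 23 survives the filter, and the gap is positive because the prospect beat his projection while the non-prospect fell short of his. For a stat where lower is better, such as ERA, the sign would need to be read the other way.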
You’ll notice that in some instances, both groups tend to underperform their projections. This doesn’t seem right at first since, on the whole, players should meet their projections. But Marcel is (intentionally) a very simple projection system, and it regresses everyone to league average regardless of age. That’s somewhat incorrect, since the average 23-year-old is not the same as the average 28-year-old. On average, 23-year-olds are worse than 28-year-olds, so 23-year-olds will usually underperform their Marcel projection because Marcel’s regression treats them as if they were 28-year-olds. This doesn’t make a difference for our purposes, though, since we only care about the relative performance of the two groups. Maybe all 23-year-olds underperform their Marcel, but if prospects underperform less than non-prospects do, that’s valuable information (and implies they would overperform a more complex projection system, like PECOTA, which accounts for age when regressing but doesn’t account for pedigree).
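To make the regression point concrete, here is a simplified illustration of shrinking a player’s observed rate toward a target mean. This is a generic shrinkage formula, not Marcel’s actual calculation, and every number in it (the .270 average, the 400 plate appearances, both means, and the amount of regression) is made up for the example.

```python
def regress_to_mean(observed_rate, sample_size, target_mean, ballast):
    """Shrink an observed rate toward target_mean.

    `ballast` is the number of phantom trials at target_mean that get mixed
    in with the real sample; more ballast means heavier regression.
    """
    return (observed_rate * sample_size + target_mean * ballast) / (sample_size + ballast)


# All values below are hypothetical, chosen only to show the direction of the bias.
observed_avg, plate_appearances = 0.270, 400
league_mean = 0.260   # what a one-size-fits-all regression pulls toward
age_23_mean = 0.250   # a lower mean for the player's own age group
ballast_pa = 400

proj_to_league = regress_to_mean(observed_avg, plate_appearances, league_mean, ballast_pa)
proj_to_age = regress_to_mean(observed_avg, plate_appearances, age_23_mean, ballast_pa)
print(round(proj_to_league, 3), round(proj_to_age, 3))
```

Regressing toward the overall league mean gives this invented 23-year-old a projection about five points higher than regressing toward his own age group would. If the age-group mean is the better target, the league-mean projection is the one he will tend to fall short of.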
(Blue Shading indicates which group does better)
Sure enough, we notice top prospects outperforming their Marcel projection more than non-prospects do (or, rather, underperforming less), and the effects get smaller the older a player gets (and, on average, the farther away he gets from the time when he was a prospect).
The 23- and 24-year-old prospects outperform their projections by 13 points of batting average (.013) more than non-prospects do. That’s a large number. By ages 28 to 30, the effect has greatly diminished, and after 30 it appears not to matter at all that the player was once a top prospect. Here’s a graph to better visualize how it happens:
The table to the left displays the difference in home runs per 700 plate appearances (roughly one full season). As with batting average, we see some sizable effects here, again with a downward slope until the player turns 30, at which point the effects essentially evaporate. At 23 and 24, though, the prospects hit five to six more home runs over the course of a full season than non-prospects do, relative to their Marcel projections.
It’s also interesting to note that while non-prospects underperform their Marcel projections for almost every age, the top prospects actually manage to slightly overperform.
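For anyone who wants to reproduce the per-700-plate-appearance scaling used for home runs (and for steals), the arithmetic is just a rate conversion. The helper name and the sample totals below are hypothetical.

```python
def per_700_pa(count, plate_appearances, basis=700):
    """Scale a counting stat to a fixed number of plate appearances."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return count * basis / plate_appearances


# Hypothetical player: 15 actual HR in 350 PA, projected for 12 HR in those same 350 PA.
actual_rate = per_700_pa(15, 350)
projected_rate = per_700_pa(12, 350)
hr_diff_per_700 = actual_rate - projected_rate
print(actual_rate, projected_rate, hr_diff_per_700)
```

Putting both the actual and projected totals on the same 700 PA basis before subtracting keeps part-time players from looking like smaller over- or underperformers than full-time players just because they batted less.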
Steals don’t follow the pattern we saw with homers (again, on a per-700-plate-appearance basis). The results bounce up and down with no real pattern, and none of the effects is very large. My guess is that this is because in order to be a top prospect, you have to be able to hit, and if you can hit, you’re likely a top prospect. But you don’t need to be fast to be a top prospect, and you can be fast without being one. As a result, prospect status seems to matter little for a speed player on a rate basis. Where it may make a difference is playing time: if a top prospect has speed, he can probably also hit and will receive more playing time (and thus steal more bases) as a result.
It’s hard to see on the chart, but there is a pattern here. Scroll down to the graph to get a better visual of it. ERA, as we all know, is very unstable and difficult to predict with much accuracy, but it does appear that top prospects post better ERAs relative to their projections than their non-prospect counterparts do, with the effect fizzling out around age 30. If we were to draw a best-fit line through this graph, we’d see that the youngest players beat their projected ERAs by 0.30 runs, that the edge shrinks to 0.15-0.20 runs at age 26, and that it disappears by the time they reach 30 or 31.
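As a quick sanity check on that description, we can draw a straight line through just the two endpoints quoted above and see where it lands at age 26. This is not the actual best-fit line, which would be fit to every age in the graph. It assumes that “the youngest players” means age 22 (the youngest pitcher bucket in the study) and tries both 30 and 31 as the age where the edge reaches zero.

```python
def line_through(point_a, point_b):
    """Return (slope, intercept) of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = point_a, point_b
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


YOUNGEST_AGE = 22     # assumption: youngest pitcher bucket in the study
YOUNGEST_EDGE = 0.30  # ERA edge at the youngest age, as quoted in the text

edge_at_26 = {}
for zero_age in (30, 31):  # the text says the edge is gone by "30 or 31"
    slope, intercept = line_through((YOUNGEST_AGE, YOUNGEST_EDGE), (zero_age, 0.0))
    edge_at_26[zero_age] = slope * 26 + intercept

for zero_age, edge in edge_at_26.items():
    print(zero_age, round(edge, 3))
```

Either choice of endpoint puts the age-26 edge inside the 0.15-0.20 range quoted above, so the three figures are at least consistent with a single straight line.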
For strikeouts, we see a much neater pattern than for ERA. The 22-year-olds overperform their projections by more than half a strikeout per nine innings (K/9). By the time a pitcher reaches his age-27 season, the effects start to become negligible, and they’re gone by 29.
I think we discovered some interesting things, namely that a player’s pedigree is very important and that its effects are felt long after he has stopped being a prospect. Next week, I’ll run similar tests using a player’s draft round as a proxy for pedigree and ultimately see if the two tell us different things about a player.