Potential is a funny thing. The team that manages to grab the most players who outperform expectations often wins fantasy leagues. Every spring we hear about breakout candidates and which players stand the best chance of outperforming their projections. Often, these breakout candidates are selected based on their tools and their pedigree—their potential. While this kind of subjective and scouting data is very important, few people outside of Major League Baseball have a database with scouting reports on enough players dating back as far as we’d need to run a study to examine what these things actually tell us—not to mention all of the complications that would go into such a study. But there is one freely available tool that I thought might make for an interesting study: Baseball America’s archive of their Top 100 prospect lists dating back to 1990.

Today, I wanted to run a study using this archive as a proxy for pedigree to see how much pedigree matters for players who have already made it to the majors. Once a player is in the majors, does his pedigree make him more likely to break out?

The Study
Using Baseball America’s Top 100 Prospect archive and Jeff Sackmann’s infinitely useful historical Marcels database, I’ve looked at how players at each age over- or underperform their Marcels projection based on whether they had ever made the top prospect list. Herein, the two groups will be called “prospects” and “non-prospects,” though players in the “prospect” group will often no longer actually be prospects. This group will include anyone who was ever a top prospect, such as a 27-year-old Alex Gordon.

I’ve only included an age in the study if at least 200 prospects and 200 non-prospects qualified, which narrows it down to ages 23 to 30 for batters and ages 22 to 28 for pitchers. The samples are smaller for younger players because very few make it to the majors that young (especially if they’re not top prospects), and smaller for older players because the data only goes back to 1990: players who were top prospects in, say, 2000 haven’t reached their 30s yet and so can’t appear in those buckets. The study includes every player since 1990 who played in the majors and had a Marcel projection for the season (so rookies are not included).
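For readers who want to try something similar, here is a minimal sketch of the bucketing described above. It is not the code behind this study; the function name, the tuple layout, and the `min_group_size` parameter are my own assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def gap_by_age(seasons, min_group_size=200):
    """Average (actual - projected) by age for prospects and non-prospects.

    `seasons` is an iterable of (age, is_prospect, projected, actual) tuples,
    one per player-season. This layout is hypothetical, not the study's schema.
    """
    diffs = defaultdict(list)
    for age, is_prospect, projected, actual in seasons:
        diffs[(age, is_prospect)].append(actual - projected)

    results = {}
    for age in sorted({a for a, _ in diffs}):
        prospects = diffs.get((age, True), [])
        non_prospects = diffs.get((age, False), [])
        # Keep an age only if BOTH groups reach the minimum sample size.
        if len(prospects) < min_group_size or len(non_prospects) < min_group_size:
            continue
        p, n = mean(prospects), mean(non_prospects)
        results[age] = {"prospects": p, "non_prospects": n, "gap": p - n}
    return results
```

A positive `gap` means prospects did better relative to projection than non-prospects for a stat where higher is better; for ERA or walks the sign flips.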

You’ll notice that in some instances, both groups tend to underperform their projections. This doesn’t seem right at first since, on the whole, players should meet their projections. But Marcel is a very simple projection system (intentionally so) and regresses everyone to league average regardless of age. That’s somewhat incorrect, however, since the average 23-year-old is not the same as the average 28-year-old. On average, 23-year-olds are worse than 28-year-olds, so 23-year-olds will usually underperform their Marcel projection because Marcel’s regression is treating them as if they’re 28-year-olds. This doesn’t make a difference for our purposes, though, since we only care about relative performance between the two groups. Maybe all 23-year-olds underperform their Marcel, but if prospects underperform less than non-prospects do, that’s valuable information (and implies they would overperform a more complex projection system, like PECOTA, which accounts for age when regressing but doesn’t account for pedigree).
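To see why the regression target matters, here is a simplified shrinkage sketch. It is not Marcel’s actual formula; the function name and the `ballast_pa` default are assumptions chosen only to illustrate the idea of blending a player’s observed rate with a fixed amount of average performance.

```python
def regress_to_mean(observed, sample_pa, target_mean, ballast_pa=1200):
    """Blend an observed rate with `ballast_pa` plate appearances of `target_mean`.

    Simplified illustration of regression toward a mean, not Marcel itself.
    """
    return (observed * sample_pa + target_mean * ballast_pa) / (sample_pa + ballast_pa)
```

If the target is the overall league average, a young hitter gets pulled toward a higher number than he would if the target were the average for his own age, so his projection runs high. As long as that pull is about the same size for prospects and non-prospects of the same age (an assumption, not something this study verifies), it drops out when the two groups are compared against each other.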

Batting Average
(Blue Shading indicates which group does better)

Sure enough, we notice top prospects outperforming their Marcel projection more than non-prospects do (or, rather, underperforming less), and the effects get smaller the older a player gets (and, on average, the farther away he gets from the time when he was a prospect).

The 23- and 24-year-old prospects outperform their projections by 13 points of batting average (.013) more than non-prospects do. That’s a large gap. By ages 28 to 30, the effect has greatly diminished, and it appears that after 30, it doesn’t matter at all that the player was once a top prospect. Here’s a graph to better visualize how it happens:

Home Runs

The table to the left displays the difference in home runs per 700 plate appearances (roughly one full season). Like with batting average, we see some very significant results here, again with a downward slope until the player turns 30. Once the player turns 30, the effects essentially evaporate. At 23 and 24, though, the prospects hit 5-6 more home runs over the course of a full season than non-prospects do, relative to their Marcel projections.
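The per-700-plate-appearance scaling is plain proportional arithmetic. A small sketch, with a function name of my own choosing:

```python
def per_700_pa(count, plate_appearances):
    """Scale a counting stat (home runs, steals) to a 700-PA full season."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return count * 700 / plate_appearances
```

Applying the same scaling to both the actual and the projected totals puts part-time and full-time players on the same footing before the difference is taken.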

It’s also interesting to note that while non-prospects underperform their Marcel projections for almost every age, the top prospects actually manage to slightly overperform.

Stolen Bases

Steals (again, on a per-700 plate appearance basis) don’t show the same pattern we saw with homers. The results bounce up and down with no real trend, and none of the effects is very significant. My guess is that this is because in order to be a top prospect, you have to be able to hit. And if you can hit, you’re likely a top prospect. But you don’t need to be fast to be a top prospect, and you can be fast without being a top prospect. As a result, prospect status seems to matter little for a speed player on a rate basis. Where it may make a difference is in regard to playing time. If a top prospect has speed, he can also probably hit and will receive more playing time (and thus, steal more bases) as a result.


ERA

It’s hard to see on the chart, but there is a pattern here. Scroll down to the graph to get a better visual of it. ERA, as we all know, is very unstable and difficult to predict with much accuracy, but it does appear that top prospects have better ERAs than their non-prospect counterparts do, with the effect fizzling out around age 30. If we were to draw a best-fit line through this graph, we’d see that the youngest players beat their projected ERAs by about 0.30 runs, that the edge shrinks to 0.15-0.20 runs at age 26, and that it disappears by the time they reach 30 or 31.
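A best-fit line like the one described here can be computed with ordinary least squares. The sketch below is a generic implementation, not the exact method used for the graph; the function names are my own, and it would be fed the per-age gaps from the table.

```python
def best_fit_line(xs, ys):
    """Ordinary least-squares line through (x, y) points; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def zero_crossing(slope, intercept):
    """Age at which the fitted gap reaches zero, i.e. where the effect fades out."""
    if slope == 0:
        raise ValueError("a flat line never crosses zero")
    return -intercept / slope
```

With age on the x-axis and the prospect-minus-non-prospect ERA gap on the y-axis, the zero crossing is the age at which the fitted effect disappears.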


Strikeouts (K/9)

For strikeouts, we see a much neater pattern than for ERA. The 22-year-olds overperform their projections by over a half-point of K/9. By the time a pitcher reaches his age-27 season, the effects start to become negligible, and they’re gone by 29.
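To put a K/9 gap into counting terms, multiply by innings and divide by nine. A quick sketch (the function name is mine):

```python
def extra_strikeouts(k9_gap, innings):
    """Convert a difference in K/9 into strikeouts over a given number of innings."""
    return k9_gap * innings / 9
```

For example, a half-point of K/9 over 200 innings works out to 0.5 × 200 / 9, or roughly 11 strikeouts.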

Unintentional Walks (UIBB/9)


We see essentially no effect in WHIP and walks. This is likely because most pitchers achieve prospect status as a result of their stuff and their projectability—not their control. And if a pitcher does make the top prospect list because of his control, he’s going to need to succeed in the majors quickly because he’s not going to get as many chances as a prospect with good stuff. He just doesn’t have the same kind of projectability and upside.









Concluding Thoughts
I think we discovered some interesting things—namely, that a player’s pedigree is very important, and its effects are felt long after the player’s prospect status is revoked. Next week, I’ll run similar tests using a player’s draft round as a proxy for pedigree and ultimately see if the two tell us different things about a player.

I like the concept here, but the problem is that Marcel uses an extremely generic (regressed) projection for young players -- much more so than with 3+ year veterans. So the "projection" for most 23-year-olds, for example, is going to be nearly constant. Thus, the above data basically shows that young prospects are better than young non-prospects, not that prospects are beating any actual performance expectations. The only time Marcel truly represents an estimate of ability is after the player has played a few seasons in the majors, which is not coincidentally when the effects described above start to evaporate. This is probably shown most clearly in the K/9 data, where the prospects and non-prospects are going to receive roughly the same Marcel projection for their first couple years (regardless of minor league performance), and yet by the process of self-selection, prospects as a group will always have a much better K rate than non-prospects. I.e., not sure that this study is actually measuring breakout likelihood as much as it is measuring raw performance. Perhaps taking a look only at player seasons that came after at least ~1,000 PA/300 IP over the previous three seasons would make it more of a breakout vs. expectations measurement, though obviously you wouldn't have a lot of data from young players.
Another thought is to check if there is a correlation between prospect ranking and performance relative to PECOTA projection. PECOTA is probably the best thing we have for performance-based projections of very inexperienced players, since it tries to use minor league performance. I.e., you'd be trying to isolate (for a certain set of players with a decent-sized minor league career) whether subjective rankings provide any predictive information not already contained within the player's stats and/or the fact that he made it to the majors.
"The 22-year-olds overperform their projections by over a half-point of K/9. Over 200 innings, that translates to 120 strikeouts!" - Unless I'm entirely misinterpreting your table, you'll need to divide that by 9, no?
