Deep, But Playable: Trust, Buckets, and Prospect Evaluation Frameworks

March 28, 2018

Image credit: USA Today Sports

Given that the basic framework of our prospect lists hasn’t been altered very much since at least the Jason Parks-era format, it might seem that we don’t spend a lot of time thinking about how to best convey the information within them to our audience. I promise this is not the case. Jeffrey Paternostro, Bret Sayre, and I take time every year to create a wish list of items we’d like to see on the top-10 lists, or tweaks we’d like to make to them. We then argue over whether any of those items actually enhance or compromise our ability to communicate the evaluations, and the concepts on which those evaluations are based.

An offseason discussion with Jason Wojciechowski regarding the A’s list gave me reason to question the effectiveness of our current setup. There are two prospects who appear at the back end of the list who couldn’t be more different in terms of player type, and yet both received the same 55/45 OFP/Likely grade. While Sean Murphy (ranked 10th) earned those grades with a more standard distribution of skills and risk, James Kaprielian checked in one spot above him with a much more … eclectic mix of present talent, projection, and risk.

To refresh, here is what we had to say on each:

Kaprielian:

The Good: At his best, Kaprielian runs a big fastball into the upper-90s, pairs it with three offspeed pitches with potential to be above-average, and has strong command and a feel for pitching. The fastball is only sometimes that hot on velo, and he was a low-90s guy out of the draft, so it could be fool’s gold. On the other hand, the slider is showing up plus already and the change and curve both flash potential. Four-pitch potential with good stuff usually leads to writers bestowing their ace blessings on a young man. Why not here?

The Bad: Well, he had Tommy John surgery in April 2017 after barely pitching in 2015 and 2016 due to recurring elbow problems. He’ll return sometime in 2018 as a 24-year-old with 29 pro innings in three seasons, all in A-ball. He’s also never maintained the high-end velocity without his elbow barking nearly immediately after.

The Role:

OFP 55—Mid-rotation dude with tantalizing velocity/good Nate Eovaldi
Likely 45—Dude bouncing between roles with tantalizing velocity/normal Nate Eovaldi

The Risks: We have absolutely no idea whether he can stay healthy throwing a baseball, let alone whether it’s as a starter, a reliever, or somewhere in between. Since the improvement in stuff was so closely correlated with his elbow problems, we also have little idea what he’s going to look like as a healthy pitcher. There’s way more upside than the ranking indicates, but there’s also major concern he’ll never pitch effectively again.

Murphy:

The Good: Murphy is an excellent defender, with all of the building blocks you look for in a future above-average catcher. The arm is among the better ones currently in the minor leagues, with strong velocity that holds its line, and he maxes it out with fluid pops that will register plus-plus times on the regular. He controls the zone at the dish, demonstrating a clear plan of attack with a swing built to bring above-average power potential into games.

The Bad: It’s a strength swing that can get rigid through the zone, and he was exposed a bit by the more advanced arms of the Texas League, with loads of rolled-over ground ball contact the most frequent result. He puts the ball on the ground so often it’s unclear just how much of his power will play at higher levels, and while he’s not a black hole of running speed like some of his catching compatriots, he’s also not going to make it a habit of beating out too many of those left-side grounders, either.

The Role:

OFP 55—Above-average catcher
Likely 45—Solid second-division catcher

The Risks: The bar for catcher offense is so low, and Murphy’s defensive baseline high enough, that it’s not hard to envision him hitting enough to produce comfortably above-average value behind the dish. It’s wise to temper expectations for the offense panning out to ceiling, but the fundamentals here are such that Murphy’s a higher-likelihood prospect than most.

As you can see, Kaprielian poses a problem within our valuation framework: by all accounts, when he’s healthy he projects as a potential top-of-the-rotation starter, with an upper-90s fastball and the potential for multiple above-average breakers. That type of ceiling generally earns an OFP (Overall Future Potential) of higher than 55—even if OFP isn’t representative of a true ceiling. Conversely, the health issues he’s already endured put his realistic floor well below that of a back-end starter, especially factoring in that even if he returns to the mound, it could be with diminished stuff.

So we see that Kaprielian and his high-beta outcomes pose a particular problem that his neighbor to the south, Murphy, does not. A more honest accounting of Kaprielian’s OFP might be a 60, or even a 70, when he’s healthy and at his finest. A more honest representation of his floor might read: “60-day disabled list.” But these honest representations belie the complexities of projecting the probabilities of the various permutations of a player’s career. Those complexities are present across every OFP/Likely grade we generate, but players like Kaprielian push the framework we employ close to its breaking point.

So, we’re left with a few options: switch to a different framework—perhaps one with an all-encompassing grade, such as Future Value; create an entirely different framework that better explicates the ways in which these probabilities shake out; or live with what we’ve got and understand there are going to be some guys who just won’t fit the system well.

We’ve seen the use of a single grade that bakes in the good, the bad, the risks, and so forth. Our current system presents a strong developmental outcome as well as a likely one, hopefully showing in more than just words how much more risk some players carry than others. But there’s another way, one that I’d argue both the single Future Value figure and the OFP/Likely pairing draw upon but don’t state explicitly.

In fact, it might well be the best way for us to convey this information on a guy like Kaprielian, and arguably on every prospect—and that is clearly delineating our estimates of their likelihood to land in each grade bucket from 20-80.

For Kaprielian that could look like:

Name	20	30	40	45	50	55	60	70	80
Kaprielian	25%	5%	10%	10%	15%	15%	15%	5%	<1%

Whereas for someone like Murphy, it could look like:

Name	20	30	40	45	50	55	60	70	80
Murphy	<1%	5%	15%	35%	25%	15%	5%	<1%	<1%

The exact percentages aren’t the point so much as recognizing the way the distributions are so starkly different, and that’s without getting into the notion that Kaprielian’s likelihood of being a “20” is probably misleading because he’s probably either better than that or just not on the field (thus, not even an organizational player).
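To make that contrast concrete, here is a rough Python sketch that summarizes the two rows above as a mean grade, a spread, and the chance of landing at 55 or better. It is an illustration, not part of our process: the `summarize` helper and the `floor_grade` parameter are hypothetical names, “<1%” is entered as 0 so each row sums to 100, and treating scouting grades as numbers you can average is itself a simplifying assumption.

```python
from math import sqrt

# Grade buckets on the 20-80 scouting scale, as listed in the tables above.
GRADES = [20, 30, 40, 45, 50, 55, 60, 70, 80]

# Percentages copied from the tables. Assumption: "<1%" is entered as 0,
# which makes each row sum to exactly 100.
BUCKETS = {
    "Kaprielian": [25, 5, 10, 10, 15, 15, 15, 5, 0],
    "Murphy": [0, 5, 15, 35, 25, 15, 5, 0, 0],
}


def summarize(percentages, floor_grade=55):
    """Return (mean grade, standard deviation, chance of floor_grade or better)."""
    probs = [p / 100 for p in percentages]
    mean = sum(g * p for g, p in zip(GRADES, probs))
    variance = sum(p * (g - mean) ** 2 for g, p in zip(GRADES, probs))
    at_least = sum(p for g, p in zip(GRADES, probs) if g >= floor_grade)
    return mean, sqrt(variance), at_least


for name, percentages in BUCKETS.items():
    mean, spread, at_least_55 = summarize(percentages)
    print(f"{name}: mean {mean:.2f}, std dev {spread:.2f}, 55 or better {at_least_55:.0%}")
```

Under those assumptions, Kaprielian’s mean grade (about 43) lands below Murphy’s (47), yet his spread is more than double and his chance of reaching 55 or better is 35 percent against Murphy’s 20 percent. Much of that low mean comes from the 25 percent sitting in the “20” bucket, which is exactly the figure flagged above as probably misleading, so the summary is only as trustworthy as the guesses that feed it.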

The reason we don’t (or haven’t) presented the information in the above format to our readers isn’t because they wouldn’t get it—we have some of the smartest, sharpest readers on the planet—but rather because there’s a significant element of false precision in this framework. False precision is something I worry about a lot. There’s already a fair amount of it baked into our current model, and opting to choose something that would introduce more of it has always concerned me, even if I think the above better represents how my prospect team and I go about thinking of these players.

So, for the same reason that I shy away from using grades such as 35 and 65 (yes, this is a real point of contention within the community; some do not consider them “real” grades), I shy away from adopting this approach for every player. Sometimes the broader view (OFP/Likely) allows for better signal, as the details are obscured and the bucketing of percentages and outcomes and projection provides a cleaner presentation and a cleaner takeaway. The granularity that the percentage approach exposes is a better way to understand potential prospect outcomes, but it also allows someone to be right no matter the outcome.

Spread the possible outcomes over the different percentages and you’re never wrong: this version of reality just played out a certain way, which “you” already conceded was a possibility. The OFP/Likely approach bakes in the more granular effort, while also forcing one to a decision point: What do you believe the ceiling version of this player is, incorporating the risks involved with health, development, and so forth? Furthermore, what do you believe a realistic outcome is for that profile? How do you think those same risks are likely to play out?

I used the term “honest” before, when describing the bucketing approach. While that is the most accurate approach, I’m not sure that the OFP/Likely designations don’t force us to be more honest in our assessments of those likelihoods, or at the very least, more useful to our readers as long as they trust us to be honest in the first place.

Thank you for reading

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.


Craig Goldstein


Shaun P.

3/28

Craig, I loved this article. Thank you for sharing your thoughts on both approaches. I really appreciate the insights and the thoughtfulness you all put into the work.

I understand that the current system potentially leaves out the useful information that the buckets would provide, and that the buckets alone may appear to allow the writer to always be "right".

However, to paraphrase what was my favorite line from any BP article before Jason Parks started writing, isn't the answer to the question "Beer, or tacos?", "Both, you fool!"?

Give us the buckets when it's a player like Kaprielian, and you feel like the buckets convey useful information that otherwise isn't present with the OFP/Likely Value. If OFP/Likely Value say pretty much everything you have to say, just use that.

I imagine that you'll get some complaints about the lack of consistency, but I think if you explain it as you did above, people will get it. Keep up the great work!


Craig Goldstein

3/28

I guess my counterpoint would be that the buckets are fleshed out in the context of the write-up, but your suggestion is one we'll absolutely consider. Thanks for the kind words and the suggestion as well.


johansantana17

3/28

I totally agree with this.


Mark Majd

3/31

Yes, definitely yes


Behemoth

3/28

I think that if the metric of your choice shows players like Kaprielian and Murphy as the same, then that says something about how useful that metric is. My own opinion is that there's some helpful nuance in the OFP/Likely Value ratings that you would lose with just having a Future Value rating, but an awful lot of guys end up with a 10 point difference between their OFP and their Likely Value. If guys in AAA with few obvious questions end up with a 5 point difference, and toolsheds in short-season ball get a 15 point difference, and everyone else gets the standard 10, then the OFP/Likely Value approach tells us very little that's valuable.

For me, the question is why Kaprielian's OFP isn't 60. If his health was to hold up, then it seems a reasonably plausible outcome, and it's not that unlikely that his health will hold up. The risk would be apparent in that someone with an OFP of 60 had a Likely Value of 45 (which I think is reasonable). To sum all of that up, I think that the OFP/Likely Value approach only adds value if the two prospects featured in the article come out with different ratings somewhere - for me, it seems that Kaprielian should be a 60/45, but it may be that people who know more about prospects would see it differently.


Behemoth

3/28

On the question of the buckets, I would very much welcome that sort of information. The more information that you can give the better, and putting it in quantitative form is helpful. I always thought that recognising that readers could deal with complex information and that things didn't have to be simplified was one of Jason Parks' greatest strengths as a prospect writer.

Quantifying things gives nuance that the paragraph discussing risks doesn't. All I get from the risks paragraph is that pretty much anything could happen with someone like Kaprielian. This might be the sort of thing that some of the statistical expertise you have on staff could be put to good use on, to add some rigour and make sure the numbers aren't totally plucked out of the air.

Finally, I don't really think the point about the buckets always allowing the author to be correct is valid. The writeup on Kaprielian already says that a wide range of outcomes are possible, depending on health, velocity and other things, so the author can already claim to be correct whatever happens.


Craig Goldstein

3/28

Appreciate the feedback. Re: your last point: I don't think it gives the same out in terms of being correct in every circumstance. Acknowledging a wide range of outcomes is there and necessary, but the point of the OFP/Likely is that it forces us to choose. Sometimes that choice isn't representative of the broad range of outcomes and feels unfair (I'd argue this is the case for Kaprielian), but the choice has to be made either way. With the bucket approach it doesn't.


murrel

3/28

Yes, why not say 35/65 if that is the honest evaluation? Baseball has a real problem with all the "+" stats. They are meaningless in any year to year comparison because the game changes year to year in almost imperceptible ways that can only be seen in the stats. We should be using a mean plus standard deviation view of things where the range indicates risk. 35/65 provides more information than 45/55, it shows the risks that you speak of. If you want to use "buckets" to convey that information that is all well and good. But omitting it is just dumbing down the evaluation.


Jay Stevens

3/28

So...this may be unpopular, but I thought Parks brought too much information into the grading of prospects. The scouting reports were great -- I learned a lot about baseball reading them -- but I felt the detail let him shy away from making simple declarative statements about the overall value of a prospect.

I preferred Kevin Goldstein's rankings, with a simple one- to four-star ranking. Sure it's oversimple, perhaps. On the other hand, he carefully divided up the ranks of prospects into general tiers, which told us a lot about the overall probable value of the player. It's a great mental shortcut. And you could easily scan an organizational ranking and see how good it was by how quickly you got to the two-star prospects.

I don't find the current valuation useful. I don't even look at it anymore. I think that's because it seems like a most-probable kind of grading. Most players are 55/45s -- which doesn't tell me anything. The boldness in forecasting is found in the player description, not the overall valuation. And maybe that's as it should be.

I'd probably find the percentage buckets more interesting. Fangraphs' KATOH uses it well -- but, then, it's using computer-generated values, and the format properly depicts its precision. Percentage buckets based on guesswork implies a precision that isn't there.

If I were doing this, I'd want to show it in a way that's easy to see and doesn't imply cold calculation. A graphic element. Like using infographics sizing circles, say. Use three circles - floor, likely, ceiling, and size the circle based on your perceived outcome. Color 'em differently. Something like that, where you don't call out specific numbers, but you quickly and easily show the quality of prospect in a simple image...


Jay Stevens

3/28

Man, not having paragraph breaks makes me sound like a lunatic. Or...maybe I am a lunatic?

