February 24, 2005
On December 1, the Angels signed Cuban slugger Kendry Morales to a six-year contract. Morales would eventually find his way to an Honorable Mention in our Top 50 Prospects list. The discussion that led to that placement was perhaps the most interesting one we had during the process.
Clay Davenport: I've pulled all the data I could find on Cuban baseball. It turns out that they have complete statistics for the last five years online, and I've downloaded them all. This gives me full stat lines for Morales as well as league averages to work from.
I would call the league profile aggressive. Power numbers are a little less than the U.S. majors, but league batting averages are just under .300. Walks about the same as US, but strikeouts are down, hit-by-pitch rates are about double what they are here, stolen bases about the same, caught stealing twice as high--the '03/'04 season actually had more CS than SB. Offensive levels are high.
I don't have good data for the difficulty rating. Working as best I can from the Olympic data, I'd say roughly Carolina League level. Winter ball in the Dominican runs at about Double-A level. Using that, here's the best-guess DT I can make from what I know:
Kendry Morales   Born 6/20/1983   Age 22   Bats B   Throws R
Year     AB    H  DB  TP  HR  BB   SO   R  RBI  SB  CS    BA   OBP   SLG   EQA  EQR
2001    335   87  17   1  17  30   75  43   61   1   1  .260  .324  .469  .267   46
2002    188   62  11   1   7  30   31  32   29   0   0  .330  .428  .511  .322   37
2003    118   39  12   1   2   7   17  14   15   0   0  .331  .370  .500  .294   19
Minors  588  173  37   3  24  62  113  82   96   1   1  .293  .364  .487  .289   93
That's not a futurecast; those are direct translations. Even dropping the competition to New York-Penn levels leaves him with a line of .271/.336/.434 and a .264 EQA. I don't know if he is really 22, but he profiles as a major-league hitter using the difficulty rating suggested by the most pessimistic projection taken from the Olympics. Using the most optimistic estimate from the Olympics, calling it Double-A, would yield a line of .311/.386/.528, .307 EQA.
Note on playing time: he was suspended from the team in 2003 because the government found out he was trying to defect. I'm not sure why his playing time was reduced in 2002.
I think he has to be included, and highly.
After going through all that, I realized we did have another comp: Jose Contreras' 2000 and 2001 seasons in Cuba. The best fit between Contreras' lines came by driving the difficulty down to one of the more pessimistic sides, a little below the South Atlantic League, but not quite as far down as the NY-P. His PDTs now look like:
                H/9  HR/9  BB/9   K/9  NERA  PERA
Cuba, 2000-01   8.2   0.8   3.2   8.1  4.58  3.88
U.S., 2003-04   7.9   1.1   3.7   8.2  4.55  4.06
It's dangerous to use a one-person guide, but it's within the established range, and it does take a more pessimistic view which is in keeping with the general performance of Cubans who have defected, so I'm going to use it. Taking that as a guide, Morales becomes:
Kendry Morales   Born 6/20/1983   Age 22   Bats B   Throws R
Year     AB    H  DB  TP  HR  BB   SO   R  RBI  SB  CS    BA   OBP   SLG   EQA  EQR
2001    336   83  17   1  15  27   81  40   56   1   1  .247  .306  .438  .252   41
2002    189   58  10   1   6  28   34  29   27   0   0  .307  .402  .466  .301   33
2003    118   37  13   0   1   7   19  13   13   0   0  .314  .354  .449  .275   16
Minors  593  164  37   2  20  57  124  76   89   1   1  .277  .344  .448  .271   82
Let us work from that.
Chris Kahrl: I don't think we can claim with any real confidence that what Clay's doing in this situation works. When you're dealing with a more shallow pool of talent, you get people doing all sorts of incredible stuff, like major-league baseball before integration. Add in the uneven distribution of talent across the "teams" in Cuba, and I really don't know how much confidence we should invest in the exercise. I see it as far more likely to produce an unreliable result, and make an easy target for those unwilling to trust the concept of translating minor-league or independent league performances in the first place.
Derek Zumsteg: I agree with Chris. We might consider instead writing something about it in the chapter--outlining all of the caveats and warnings associated with it, and then saying "our best guess is that right now he might rank at about #50." It's a cop-out, but it's an entirely justified one that still gives readers a sense of his place, and how uncertain we are about it.
Nate Silver: Here's how he projects based on Clay's translations. I've got essentially all the data that I need to work with except for defensive translations; I'm assuming that he's a league-average first baseman. I'm also assuming that the 2003-04 season corresponds to 2004 and so forth; this might be a slight disadvantage to Morales since he's playing half a year younger than a U.S. minor leaguer would in the corresponding season.
The skill set we're looking at is that of a slow first baseman with slightly below-average plate discipline and strong potential in the batting average and power departments. The batting average translates better than the power out of Cuba but PECOTA thinks it was a bit of a fluke. Morales will be a young 22 this year.
These WARP and EqA numbers are really quite similar to Michael Aubrey's. Aubrey has somewhat better plate discipline, but is also a year older. I think Morales would deserve to rank somewhere toward the back end of the Top 50 if we believe that Clay's translations are reasonably reliable.
Rany Jazayerli: Gee, thanks guys, for throwing a monkey wrench in my plans this late in the game.
I originally ignored Morales for all the reasons that Chris mentioned. He doesn't have a professional track record, his age could be questioned simply because of his country of birth, and the track record of previous Cuban defectors has been, to put it mildly, disappointing.
But...if we trust in Clay's ability to translate numbers from different environments--and we do--then we have to accept that there's some validity to these numbers.
Also, Morales' date of birth is considered to be less in question than pretty much every defector who preceded him, simply because he was so young when he was thrust upon the national stage that it would have been hard to lie about his age. Passing off a 21-year-old as 17 is a lot harder to do than to pass off a 34-year-old as 30.
Then there's the fact that none of the previous Cuban prospects who flopped, other than the ill-fated Andy Morales, were hitters. It's possible that overly aggressive usage patterns in Cuba lead to more frequent injuries for pitchers which only manifest themselves after their defection. It's also possible that I don't know what the hell I'm talking about here.
There's the matter of his position. Clay and Nate are treating him as a first baseman, which is reasonable given that we don't know anything about his defensive abilities, and caution is advised. But it's at least possible that he can pick it at third base, or some other position, which might rate him even higher.
I guess what it comes down to is that I don't have a whole hell of a lot of confidence in any of the guys that we're considering for the last few spots anyway. So I'd be comfortable slotting Morales in towards the end somewhere. I think there's value in drawing attention to the fact that we have the ability to translate data from just about anywhere they play baseball; it's just a matter of raising the error bars. Even with large error bars, Morales' statistical record, in conjunction with the fact that the relatively anti-analysis Angels thought enough of his tools to give him a six-year deal, is enough to convince me to put him on the list.
Dave Pease: I favor a mention/HM.
Joe Sheehan: I trust Clay, Nate, and the process by which they arrived at their conclusions. Given that, I think Morales should be at the lower end of our Top 50. We either trust the translations/PECOTA or we don't, because the process for evaluating Morales is consistent with the ones they use to evaluate everyone else, pace sample size caveats.
Ben Murphy: I think that the converse of Chris and Derek's argument might be plausible. That is, instead of being a key point for critics or doubters to use as an example of how translations are invalid, it could be a point where Clay (and by extension, we) can flex his muscles and say "look at how useful this technology can be--all we need is the league wide data and we can adjust the numbers to get a ROUGH idea of what his major league performance would look like."
I realize that people will jump at the chance to shoot holes or find flaws in the theory or practice, so it should come with plenty of the applicable caveats. All that included, it seems very impressive to me, and outside of the doubters (who may or may not be swayed regardless), it is a very interesting technology for those who trust our work.
I think that repeat and first time readers would likely be wowed by the applications, especially if the translations play out at all accurately.
All that is to say that I favor his mention, placing him somewhere in the top 50, and using a paragraph similar to what Clay described before (the caveats and warnings) to justify his placement and introduce this application of the technology.
JS: It's not a 90/10 decision, but I see it as 60/40 towards listing Morales, maybe a bit more if non-academic issues--some of the stuff Ben touched on--are considered.
CK: Is there sufficient supporting evidence to make this data worth printing? Because once you print it, it goes beyond what is, at best, an educated guess, it becomes our truth.
I would like to think we're committed to the most rigorous standards for whether data is valuable or not. If that's the case, do we really want to do this? To me, it seems a drastic abandonment of principle for the notional entertainment value of a math-flavored doodle.
Put him in the HMs if such is the collective wish, but I think printing a guess and calling it data is a major mistake.
NS: Clay's working off the assumption that Cuban baseball is roughly comparable to the New York-Penn League. I find that to be pretty reasonable.
If you took, say, eight or ten random above-average players from the NY-P league and stuck them in the major leagues, obviously most of them would fail, and I think that's where we're at with the past Cubans. Disappointment has been the theme because the level of competition is overrated, but that shouldn't be held against Morales.
And Morales, in fact, has been one of the very best players in his league. We're treating him as a prospect, with the usual risks and caveats, and not as an established star as some of the other Cubans have been billed. A ranking in the 30-50 range seems perfectly reasonable.
CK: Talent is distributed evenly in the NY-P league; everything I've heard about Cuba makes it appear that the talent is distributed by fiat, to produce certain contenders and a few Washington Generals-type teams to fill out the schedule. Comparing Cuba to a competitive league is wishful thinking.
CD: That is a testable hypothesis; if what you say is correct, the Cuban leagues would have a larger standard deviation of winning percentages than, say, the New York-Penn league.
The standard deviations for the Cuban league, 2001-04, are 120, 119, 115, and 109, with a four-year total of 114.
The numbers for the NY-P: 108, 109, 135, 97, total 110.
Point in your favor, Chris. It's actually a little worse than it appears, because the longer (90 vs. 75 games) Cuban schedule should narrow the distribution.
I don't think the difference warrants the blanket dismissal you gave the Cuban leagues.
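Clay's schedule-length point can be made concrete with a rough sketch. Two assumptions here are not part of the discussion: the quoted figures are read as standard deviations of team winning percentage in thousandths (114 meaning .114), and the chance component for a .500 team over G games is taken to be the binomial value sqrt(0.25/G). The function names are illustrative only.

```python
import math

def luck_sd(games: int) -> float:
    """SD of winning percentage expected from chance alone for a .500 team
    (binomial assumption, not something stated in the discussion)."""
    return math.sqrt(0.25 / games)

def talent_sd(observed_sd: float, games: int) -> float:
    """Spread left after removing the chance component.

    Assumes observed variance = talent variance + luck variance.
    """
    return math.sqrt(observed_sd ** 2 - luck_sd(games) ** 2)

# Four-year figures quoted above, read as thousandths of winning percentage
# (assumption), with the 90- and 75-game schedules Clay mentions.
cuba = talent_sd(0.114, 90)
nyp = talent_sd(0.110, 75)
print(f"Cuba: {cuba:.3f}  NY-P: {nyp:.3f}")
```

Under these assumptions the longer Cuban schedule has a smaller chance component, so the gap between the two leagues comes out somewhat wider than the raw four points, which is the direction Clay describes.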
CK: I think this simply highlights my concern: if Cuba's league is less competitive while being made up of an even more broad distribution of talent across age ranges than the NY-P league's odd collection of high school and college players, it would seem to me that different people will succeed or falter in abnormal distributions.
Does Cuba favor four-man rotations, or five? Do starters stay in the game later than here? How many pitchers is Morales facing? How old are they? What's the distribution of talent on the teams he's faced? Did he never have to face the best staff in the league, because it's on his team? What are the characteristics of the parks?
Unless we know some things like this, and the answers to several more questions I couldn't think of, it seems really hasty to lump in this data with everything else that has been built over decades of work by some really smart, really careful people, particularly Clay.
By contrast, this is rushed. To borrow a term from Keith W., it's the competitive ecology of Cuba that we're basically ignorant of, and this exercise highlights that.
James Click: Despite the wide range of possible difficulties for the Cuban league, Morales' translations always come out as those of a very viable prospect--especially considering how weak this year's list is. I think the ideal solution here is listing Morales near the bottom with the comment mostly consisting of the concerns Chris has already highlighted. We're always talking about the fact that we're performance analysts; this is a great opportunity to highlight both the good and the bad of what we do. We don't know that much about the Cuban leagues--say it; Morales' projection therefore has a wider distribution than that of typical prospects, which is wide anyway. Acknowledge that, but point out that while it means he may flop, he could also be the next Mark Grace--because the Angels need LOTS of those guys.
To ignore a player for whom we have a record of performance wouldn't be consistent with what we do. Let the readers take the car and drive it to work or into the tree, but tell them what's under the hood.
Steven Goldman: I think for all the reasons that Chris mentions, Morales shouldn't get a top 50 or HM. I do think he should get an M, that is, a discussion, because what we're getting into here is useful in assessing the risks that teams assume when they invest big dough in these guys. Unless they have some insight into the statistical and structural nature of Cuban baseball that we lack, they are signing players on pure scouting, and likely not even in a game environment--the guy defects, he goes to Central America, workouts are held, bidding ensues, the player is signed.
So what we have is "pure" scouting, scouting devoid of statistical analysis and league context. So far, it hasn't worked. You get Rey Ordonez, Jorge Toca, and so on. It's a little more successful with pitchers. Why?
As for Clay's numbers, they can be part of that discussion provided they are loaded with a ton of caveats saying, in effect, this isn't our best guess. It's just a guess. Maybe it's an informed guess. We think it is, but we're not sure. Conclusion: teams are looking for cost/talent certainty. With domestic product we can forecast what they're going to get within a certain range, even if that player is just out of college. If it's a minor leaguer we can get even closer. If it's a Japanese player, we can forecast with confidence because we understand their leagues. Cuban players, we just don't know, so as a team-building exercise, it's the equivalent of taking a couple of million dollars and placing it on red 13.
Chris Schofield: Normally, I would recommend that we trust our systems. I think it is amazing that Clay and Nate can come up with the translations and projections for Morales given how little information there is on him. However, given the uncertainty regarding the context for the data (league adjustment, ballpark adjustment, age, defensive ability), he sounds like an HM to me. I would advocate listing him as an HM with a longer than normal comment describing these issues (which I think a lot of readers will find interesting). I think that one use of the HM category should be to catch the players who, with a baseline projection, are low end candidates for the list, but for some reason have larger than normal uncertainty.
JS: Changing my answer...
Various people named "Chris" have convinced me. I now say include him, but as an HM. The risk is much less with that tag, but I think he has to be in the section.
The doubts about Morales make me feel better about not even mentioning Alain Soler.
Keith Woolner: I think the relative lack of Cuban players to validate the method makes it sensible to treat Morales' PECOTA with a healthy dose of risk-aversion.
I'd go for HM, with a decent sized explanation about validation of Cuban league translations still being experimental, to at least show that we didn't ignore him completely, and that he's an intriguing question mark.
Jonah Keri: Yup, HM. Once again I find myself agreeing with Keith's suggestion on how to handle it.
Jay Jaffe: Not that I've thrown in 2¢ on anything else in this process, but yeah, HM with a good explanation seems right to me.
One thing that stands out to me in all of this is that looking over Morales' stat line, we're talking about fewer than 200 at-bats in both his 2002 and 2003 seasons. I keep reading all the cautions here about evaluating players in short-season leagues; these numbers really don't seem like a lot to go on, and I'd rather see a cautious take with an informative explanation than an overly optimistic one that leaves us hung out to dry.
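To put a rough number on the sample-size concern, here is a minimal sketch that treats each at-bat as an independent trial. That is a simplifying assumption of this example, not something the translations themselves assume, and the function name is hypothetical. It uses the 2002 line from Clay's first translation (.330 in 188 AB) and compares it with a 500 at-bat season chosen purely for illustration.

```python
import math

def batting_average_se(avg: float, at_bats: int) -> float:
    """Binomial standard error of a batting average over a number of at-bats."""
    return math.sqrt(avg * (1.0 - avg) / at_bats)

# 2002 line from the first translation: .330 in 188 AB.
se_188 = batting_average_se(0.330, 188)
# Same average over a hypothetical 500 AB season, for comparison only.
se_500 = batting_average_se(0.330, 500)
print(f"188 AB: +/- {se_188:.3f}   500 AB: +/- {se_500:.3f}")
```

On this simple model, one standard error at 188 at-bats is a bit over 30 points of batting average, and noticeably larger than it would be over a full season.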
RJ: My personal leaning would have been to put him on the list, because for all the legitimate concerns about the reliability of our translations, the sample size of his data set, the quality of Cuban baseball, his defense, etc, the fact is that his PECOTA projection doesn't look all that different than Michael Aubrey's. If Aubrey is a Top 20 prospect, even with all the question marks surrounding Morales, you could make a pretty compelling case that he's at least Top 50.
I'm certainly not going to go to bat for a player I wasn't even considering 48 hours ago. Honorable Mention sounds good to me.