April 13, 2006
When Translating College Statistics Is a Bad Idea
It's one of the great challenges of performance analysis: translating college stats. At first glance, it's an incredibly difficult project to scope. The hardest aspect would be adjusting for the level of competition, which can vary greatly not only on a team-by-team basis, but on a game-by-game basis as well. Trying to adjust for ballpark effects when splits are rarely available only adds to the confusion. I'm not saying it's impossible: with an incredible amount of work, one could come up with a series of coefficients for teams, stadiums, schedules, and so on that would balance the statistical playing field and let us better compare one player to another. However, I have one piece of advice for anyone starting to wrap their brains around the task:
Don't do it. The data will be useless.
The purpose of such an effort is what I'm calling into question here. Translating college statistics is one thing; translating college stats in an attempt to give us a better idea of what a player will become is an exercise in futility. When we look at the college numbers of top prospects and big leaguers, they are, in general, very good. This creates the illusion that translated college statistics will give us a valuable tool for projecting professional performance. The missing piece, however, is all of the outstanding college players who never make it, whose college careers we never look back at. The majority of big league players who played in college put up big numbers there. The converse, however, is anything but true: not all top statistical performers make good pros. There are definite patterns separating the ones who make it from the ones who don't. Those patterns, though, cannot be measured in raw statistics. They can, however, be measured in scouting reports.
At the big league level, performance is everything. Once we move down to the minors, we begin to split the values of performance and projection. The lower the level, the more important projection is, and the less important pure statistical performance becomes. At these lower levels, two things must be asked: What is the player doing (performance), and how is the player doing it (scouting)?
At the college level, how a player is accomplishing good offensive numbers is far more important than the raw numbers because of the presence of the metal bat. Metal bats can play a significant role in creating 'false power,' as a physically strong player can power a ball out of the park without making solid-centered contact, while that same contact off the handle or end of a wooden bat more often than not leads to an easily played fly ball.
So let's look at some numbers. During this decade, college baseball has had four power conferences: the ACC, the Big 12, the Pac-10, and the SEC. Limiting the sample to those conferences in order to somewhat mitigate the level-of-competition issue, it becomes pretty clear that the top hitters in those leagues do not necessarily make top professional prospects. Here's a list of every player in those four conferences from 2003-2005 whose raw numbers showed all four of the skills a stat line can capture: hitting for average, hitting for power, drawing walks, and making contact:
YEAR  PLAYER (*=sophomore)    SCHOOL        AVG   AB   HR  BB  SO
2003  Jeff Van Houten*        Arizona      .413  231   11  24  27
2003  Jeremy Cleveland        N. Carolina  .410  251   19  37  34
2003  Ryan Garko              Stanford     .402  259   18  28  17
2004  Jed Lowrie*             Stanford     .399  233   17  50  40
2004  Eddy Martinez-Esteve    Florida St.  .385  270   19  32  41
2005  Aaron Bates*            N.C. State   .425  214   12  37  27
2005  Ryan Braun              Miami        .388  219   18  33  39
2005  Chase Headley           Tennessee    .387  238   14  63  23
2005  Brian Pettway           Mississippi  .383  266   21  35  47
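The screen behind a table like this can be sketched in a few lines of code. The exact cutoffs aren't stated in the article, so the thresholds below are illustrative assumptions, not the actual criteria used:

```python
# (player, avg, ab, hr, bb, so) for the 2003-2005 power-conference hitters listed
hitters = [
    ("Jeff Van Houten",      .413, 231, 11, 24, 27),
    ("Jeremy Cleveland",     .410, 251, 19, 37, 34),
    ("Ryan Garko",           .402, 259, 18, 28, 17),
    ("Jed Lowrie",           .399, 233, 17, 50, 40),
    ("Eddy Martinez-Esteve", .385, 270, 19, 32, 41),
    ("Aaron Bates",          .425, 214, 12, 37, 27),
    ("Ryan Braun",           .388, 219, 18, 33, 39),
    ("Chase Headley",        .387, 238, 14, 63, 23),
    ("Brian Pettway",        .383, 266, 21, 35, 47),
]

def passes_screen(avg, ab, hr, bb, so):
    """Four skills: average, power, walks, contact. Cutoffs are illustrative."""
    return (avg >= .380           # hit for average
            and hr >= 10          # hit for power
            and bb >= 20          # draw walks
            and so <= 0.20 * ab)  # make contact: strikeouts under 20% of ABs

survivors = [name for (name, avg, ab, hr, bb, so) in hitters
             if passes_screen(avg, ab, hr, bb, so)]
print(len(survivors))  # prints 9 -- every hitter in the table clears these cutoffs
```

Note what the screen can't do: every one of these players looks identical through it, which is exactly the problem the rest of this article is about.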
It's a short list, as this is, statistically speaking, the cream of the crop. These are players whose raw numbers indicate an ability to hit for average, hit for power, draw walks, and make contact. But are they the best pro prospects? Certainly not to a man, and for some, it's not even close. Take Jeremy Cleveland, who in 2003 absolutely dominated the ACC. He won the batting title by 39 points, walked more than he struck out, and was second in home runs, trailing only Wake Forest's Jamie D'Antona. If we had just done raw translations of college stats prior to the 2003 draft, we would probably have said that Cleveland was one of the top three hitters available, maybe even the best. Scouts disagreed, seeing an unathletic body and, more importantly, a swing far better designed for aluminum than wood. So when the draft rolled around, nearly 60 college hitters went ahead of Cleveland, who was drafted by Texas in the eighth round and signed for $85,000. On an economic scale, he was worth somewhere between two and eight percent of the investment put into a first-round pick. And in the end, the scouts were proven right: Cleveland hit .322/.432/.512 in the Northwest League in his pro debut, but was released following the 2005 season after batting .253/.355/.298 at Double-A and just .263/.339/.379 after a demotion to the California League. Sure, it's an extreme example, but it's a clear exhibit of why, when evaluating college players, knowing how a player accomplished something is far more important than knowing what he actually accomplished.
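The "two and eight percent" figure is simple arithmetic. The first-round bonus range below is my assumption reverse-engineered to a plausible 2003 ballpark, not a number from the article:

```python
# Cleveland's signing bonus as a share of a first-round investment.
# The first-round range is an assumed 2003 ballpark, not a sourced figure.
cleveland_bonus = 85_000
first_round_low, first_round_high = 1_000_000, 4_000_000  # assumed range

low_pct = cleveland_bonus / first_round_high   # share vs. the richest bonuses
high_pct = cleveland_bonus / first_round_low   # share vs. the smallest bonuses
print(f"{low_pct:.1%} to {high_pct:.1%}")      # roughly the 2-8% the article cites
```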
As stated before, the illusion of power created by the metal bat is the most difficult thing to deal with on a pure statistical level. Looking at the home run leaderboards from 2003 alone, we see players who had little or no shot at a pro career--like Nebraska's Matt Hopper, who led the Big 12 with 22 home runs in 233 at-bats, Washington's Chad Boudon, who led the Pac-10 with 22 home runs in 209 at-bats, and Alabama's Beau Hearod, who led the SEC with 20 blasts in 231 at-bats. Hearod slugged 25 home runs in 2004 for Low Class A Lexington, but he was 23 years old at the time, and hit just .255 with 135 strikeouts. He retired the following spring. As for Hopper and Boudon, both are also out of baseball, having combined to hit just .193 in 187 at-bats with a grand total of three home runs in the minors. Any purely statistical translation would have placed these players among the top power hitters available, but in reality, none of the three was selected in a single-digit round that June.
Believe me when I tell you that this is not an anti-statistics column, and that many mistakes have been made on college players who did not put up good numbers but offered plenty of projection or athleticism. The point is that basing professional projections solely on amateur statistical information is futile, because without the necessary scouting information, you have only half of the puzzle. It's like watching a black-and-white movie like Psycho and then being asked to name the color of Janet Leigh's sweater. You saw the sweater, but you don't have all of the information you need to answer the question. The same is true when you have college stats but not scouting reports. We're back to beer and tacos--you want to evaluate amateur talent? Both have to be there, and both have to complement each other.
Next week I'll take a look at how college pitching statistics play into the argument, and present some ideas about making the project of translating college statistics more worthwhile.