April 20, 2004
Translating College Performance
You know that insurance commercial where the guy sleepily mumbles that he's going to skip class before his roommate reminds him that college is over, and he's going to be late for work? Now, imagine that, instead of facing some mild-mannered office manager, your boss is a graduate of the Larry Bowa School of Ballpark Dialectics who's never actually held an indoor job. I'm not sure that you can classify minor league baseball as The Real World, but it's at least a paying job of sorts, and it's hard to imagine a tougher college-to-job transition than going from college athlete to minor league bus jockey without, say, taking a Wellesley grad and plunking her into the Peace Corps.
In college, while nominally an adult, you have a coaching staff that knows that a behavioral meltdown by a player will negatively affect their job status. In the low minors, on the other hand, the coaching staff is charged with weeding out the players, especially those near the talent margins, who won't be able to handle the travel and celebrity scene if they advance. You go from living in a nice, structured dorm, usually with a bed check, to the standard short-season living arrangement--except for a few of the instant millionaires in the first dozen draft picks, that's eight guys, one house, one car, one XBox, and a lot of pizza. You go from four games a week, mostly on the weekends, to six games a week with extensive late-night bus travel between.
On top of those indignities, they take away your friend, the tool you've carried around since you were three--your aluminum bat--and replace it with this heavy wooden thing that stings your fingers every time the pitcher--who got to keep his ball--comes inside, and that shatters if you check your swing wrong. Given all of this, it's a miracle that anyone ever manages to eke out even a foul ball in short-season play, much less put up meaningful numbers. Nonetheless, they do. The question I want to look at, though, is whether those numbers bear any resemblance to the college numbers that got them drafted in the first place.
Out of all the ways that Bill James changed our world (and that's a fun topic all on its own, although there's a danger in eulogizing a guy who's not done yet), the most significant may have been the realization that minor league stats are just as predictive of future performance, properly interpreted, as major league stats. This point gets clouded somewhat by the fact that neither is a particularly reliable indicator, but they're better than nothing, and one of the dividing points even now between the smart organizations and the ones riding to the park on the short bus is the degree to which they consider minor league stats as valid diagnostic tools.
There's been a huge explosion of interest in college stats in the last couple of years--the post-Moneyball era, I suppose--but no one's shown the work to justify that interest. In particular, the question of what "properly interpreted" means in this context has not been settled. I'd like to walk through a series of numbers to show my work, so to speak, and give a reasonable set of steps to use when considering college numbers. I'll use OPS as a good crude tool while going through the steps and then give a full list of correlations once I get a full set of translation steps.
To be included in the study, a player had to be drafted from an NCAA Division I school in 2002 or 2003, have at least 100 PA in college that year, and have at least 100 PA in either a non-camp rookie league or a short-season A league (shortened as R and A from here on out), again the same year. This added up to 76 players for Rookie ball and 212 who went to short-season A.
Before we begin translating OPS, there's a zeroth step to consider that doesn't affect it--converting all the counting stats to rate stats. The college season can vary greatly in length; that length correlates fairly well with the quality of the team and can range all the way from 40 games to 65 or so. This doesn't affect rate stats like OPS, of course, but correlating HR/PA is much more accurate than correlating straight HR.
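As a minimal sketch of that zeroth step, the snippet below divides a few counting stats by plate appearances. The `SeasonLine` class, its field names, and the two sample hitters are made up for illustration; they are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class SeasonLine:
    """Raw counting stats for one player-season (hypothetical field names)."""
    pa: int  # plate appearances
    hr: int  # home runs
    bb: int  # walks
    so: int  # strikeouts


def per_pa(line: SeasonLine) -> dict:
    """Convert counting stats to per-plate-appearance rates."""
    if line.pa <= 0:
        raise ValueError("need at least one plate appearance")
    return {
        "HR/PA": line.hr / line.pa,
        "BB/PA": line.bb / line.pa,
        "SO/PA": line.so / line.pa,
    }


# Two made-up hitters: same underlying rates, very different season lengths.
short_season = SeasonLine(pa=180, hr=9, bb=18, so=36)
long_season = SeasonLine(pa=300, hr=15, bb=30, so=60)

short_rates = per_pa(short_season)
long_rates = per_pa(long_season)

# The raw HR totals differ (9 vs. 15), but the HR/PA rates match.
print(short_rates["HR/PA"], long_rates["HR/PA"])
```

The raw totals would make the second hitter look like the bigger power threat purely because his team played more games; the rates put both on the same footing.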
With that step taken, the most obvious place to start is with park factors, since that subject is well-understood. Taking the minor league park factors into account:
Next, we look at college park factors. College park factors are a bit more complicated than those for the minors because of the massively unbalanced schedule (good teams play at home a lot more than on the road), but using a park factor that considers the set of parks each team played in during a given season, we get the following:
So far, we don't appear to be progressing much (or at all, for that matter), but that'll improve as we go, and the later steps work better if park factors have been included. Next, we'll need to take into account the varying levels of competition that college hitters face. Using a multiplier for strength of schedule, we get the following improvement:
Finally, there's one more tweak we can make. As Dayn Perry pointed out recently in discussing Dodger Stadium and I discussed last year, a park factor is not a constant across stats; parks differ in how they affect each one. The data's not available to produce park factors for college for stats other than runs, but a useful shorthand is that the park factor for OBP (and, to a reasonable degree, OPS) is usually right around the square root of the run-based park factor. Replacing our park factor multipliers with square roots gives this final result:
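For anyone who wants to see the arithmetic of the adjustment steps, here is a rough sketch. It assumes a run-based park factor expressed as a ratio where 1.0 is neutral and values above 1.0 favor hitters, that the adjustment divides the raw number by the factor, and that the strength-of-schedule multiplier is applied by simple multiplication. The function name, parameter names, and sample values are all hypothetical.

```python
import math


def adjust_college_ops(raw_ops, run_park_factor, schedule_multiplier):
    """Park- and schedule-adjust a college OPS (assumed form, not the study's code).

    run_park_factor: run-based park factor, 1.0 = neutral (assumption).
    schedule_multiplier: strength-of-schedule multiplier (assumption).
    """
    if run_park_factor <= 0:
        raise ValueError("park factor must be positive")
    # Shorthand from the text: the OPS park factor is roughly the
    # square root of the run-based park factor.
    ops_park_factor = math.sqrt(run_park_factor)
    return raw_ops / ops_park_factor * schedule_multiplier


# Made-up hitter: 1.000 raw OPS, a hitter-friendly set of parks (1.21 on runs),
# and a slightly soft schedule (0.95).
adjusted = adjust_college_ops(1.000, run_park_factor=1.21, schedule_multiplier=0.95)
print(round(adjusted, 3))  # about 0.864
```

Using the full run factor of 1.21 instead of its square root would knock the same hitter down to roughly .785, which is the over-correction the square-root shorthand is meant to avoid.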
With all of that in mind, here's the full table of correlations between "properly interpreted" college and short-season minor league stats:
        ---- Short-season A ----    -------- Rookie --------
Stat    Correlation   Multiplier    Correlation   Multiplier
AVG        0.27          0.69          0.26          0.78
R          0.27          0.54          0.28          0.69
H          0.24          0.70          0.33          0.77
2B         0.13          0.64          0.39          0.77
HR         0.45          0.31          0.48          0.45
RBI        0.38          0.50          0.51          0.64
SLG        0.38          0.60          0.40          0.72
BB         0.45          0.81          0.57          0.98
SO         0.53          1.33          0.52          1.19
OBP        0.36          0.74          0.36          0.84
OPS        0.37          0.66          0.37          0.77

The multiplier columns represent the ratio of the average minor league value for each stat to the average college value. Note that they're mostly fairly low, and that's something teams have to take into account--due to the factors I talked about at the top, hitting in short-season ball is hard, so your .440 OBP college stud is actually on track at .330 in Mahoning Valley. Home run rates in particular drop way off.
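To make the multipliers concrete, here is a small sketch that applies them to a college rate stat to get a rough short-season expectation. The multiplier values are copied from the table above; the dictionary layout and the function name are my own illustration, and only the rate stats are included.

```python
# Multipliers for the rate stats, copied from the table above.
# "A" = short-season A, "R" = Rookie.
MULTIPLIERS = {
    "A": {"AVG": 0.69, "SLG": 0.60, "OBP": 0.74, "OPS": 0.66},
    "R": {"AVG": 0.78, "SLG": 0.72, "OBP": 0.84, "OPS": 0.77},
}


def expected_short_season(stat, college_value, level="A"):
    """Scale an adjusted college rate stat by the table multiplier (hypothetical helper)."""
    return college_value * MULTIPLIERS[level][stat]


# The .440 OBP college hitter from the text, sent to short-season A.
projected = expected_short_season("OBP", 0.440, level="A")
print(f"{projected:.3f}")  # prints 0.326
```

That .326 is the roughly .330 figure quoted above. The same hitter sent to Rookie ball would be on track at about .370, since the Rookie multipliers are consistently higher.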
In analyzing these for which statistics are significant, it appears that the core components are the ones that correlate best--the Three True Outcomes are most reliable. Therefore, a strategy of drafting high-walk, high-homer guys looks good, even if the raw home run numbers won't look that great at first. RBI also correlate well--if I were the kind to explain everything, I'd probably mention that RBI correlate well with batting order position, which tends to stay the same between the two contexts. Other stats that don't correlate as well are the fluffy ones like R, H, and AVG.
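For readers who want to run this kind of comparison on their own data, the sketch below computes a plain Pearson correlation between paired college and minor league rates. I'm assuming Pearson is the right measure here; the function is written out by hand, and the sample HR/PA values are made up, not taken from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up HR/PA rates for five hitters: adjusted college rate, then the
# same hitter's short-season rate that summer.
college_hr_rate = [0.060, 0.045, 0.030, 0.052, 0.020]
minors_hr_rate = [0.020, 0.010, 0.012, 0.015, 0.008]

r = pearson(college_hr_rate, minors_hr_rate)
print(round(r, 2))
```

With only five invented hitters the number itself means nothing; the point is the shape of the calculation, one pair of rates per player.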
Now, correlations at this level aren't great, but they are about the same as the correlations between low A and high A ball. In other words, similar to the minor league/major league comparison, looking at college numbers the right way is not a perfect predictor, but it's much better than nothing.
Boyd Nation is the sole author and Webmaster of Boyd's World, a Web site devoted to college baseball rankings, analysis, and opinions. In real life, he's an information security analyst with an energy company. He can be reached at firstname.lastname@example.org.