I've got a little puzzler for you – a brain teaser, if you will.
Here is a CSV file containing descriptive measures of each batter's batted-ball distribution over his first 100 plate appearances, from two separate sources, labeled Set A and Set B. Alongside those, you have that batter's results for the rest of the season, in terms of BABIP, BACON (batting average on contact, included largely because I love having a reason to say BACON in a sabermetric context) and home runs on contact. Each player-season is identified by a "hash," which provides a unique identifier without giving away the player's identity. The reason is that I'm asking you all to participate in a blind taste test of two sources of information about the distribution of a player's batted balls, and of how well each one predicts that player's future results.
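If you want a place to start, one simple approach is to correlate each source's measures with a rest-of-season outcome and see which set tracks it more closely. The sketch below is only an illustration: the column names (`a_ld`, `b_ld`, `babip_rest`) and the four toy rows are made up for the example and are not taken from the actual file, so swap in the real headers before drawing any conclusions.

```python
import csv
import io
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlations(rows, predictors, outcome):
    """Correlate each predictor column with the outcome column."""
    ys = [float(r[outcome]) for r in rows]
    return {p: pearson([float(r[p]) for r in rows], ys) for p in predictors}


# Made-up placeholder rows, NOT the real data; column names are hypothetical.
TOY_CSV = """hash,a_ld,b_ld,babip_rest
x1,0.18,0.22,0.290
x2,0.20,0.19,0.300
x3,0.22,0.21,0.310
x4,0.24,0.20,0.320
"""

rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
result = correlations(rows, ["a_ld", "b_ld"], "babip_rest")
for name, r in result.items():
    print(f"{name}: r = {r:+.3f}")
```

To run it against the real file, replace the `io.StringIO(TOY_CSV)` argument with an open file handle and list the actual Set A and Set B columns as predictors. A plain correlation is only one possible yardstick; a regression or a holdout comparison would be just as reasonable.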
Once people have had a chance to look over the data and provide their analysis, I'll go ahead and pull back the curtain, and you can see whether people preferred Pepsi or Coke. Until then, have fun!