November 8, 2011
Getting Explicit with Sample Sizes
Believe it or not, most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.
Matt Lentzner has carved out a (very) small niche in the baseball analysis world by examining the intersection of physics and biomechanics. He has presented at the PITCHf/x conference in each of the last two years and has written articles for The Hardball Times, as well as a previous article for Baseball Prospectus. When he’s not writing, Matt works on his physics-based baseball simulator, which is so awesome and all-encompassing that it will likely never actually be finished, though it does provide the inspiration for most of his articles and presentations. In real life, he’s an IT Director at a small financial consulting company in Silicon Valley and also runs a physical training gym in his backyard on the weekends.
Yes, the title sounds vaguely pornographic, but what I’m getting at is pretty serious. Well, as serious as one can get when talking about baseball stats, anyway. Let me regale you with a story that is rehashed almost daily in baseball articles. It should sound familiar.
I was reading an article by Dan Lependorf, a very sharp sabermetrically inclined writer for Athletics Nation. It was about Michael Choice, a highly touted prospect with the Oakland A’s. Choice turned in a solid performance in High-A this season and has looked like a monster in the Arizona Fall League.
A+ Stockton (2011): .285/.376/.542
I haven’t been keeping up with the AFL that closely, so being the stats-oriented guy that I am, when I saw the numbers I was thinking, “Yeah, that’s pretty awesome, but how many PAs (plate appearances) is that?”
And right at that moment it hit me.
If you want to call yourself a sabermetrician, then you know that a statistic doesn’t mean anything without a sufficient sample size. Batting .400 over 10 plate appearances doesn’t mean anything. Batting .400 over a season is something amazing. We all intuitively know that.
But here’s what I think needs to happen: every printed stat should include its sample size in PAs. Always. This is what communicates the credibility of your number. So now Michael Choice’s lines look like this:
A+ Stockton (2011): .285/.376/.542 
We know from Russell Carleton’s (AKA Pizza Cutter) seminal work (gory details here) on sample size stability that OBP and SLG stabilize at 500 PAs. In other words, the performance that Choice logged in Stockton was worth considering, while the fireworks he’s been displaying in Phoenix are not very meaningful at all (12 percent of the required PAs). In fact, there are 11 other hitters bopping with an OPS over 1.000 at this time, and Choice is in the middle of the pack. It’s not even a special performance.
But there’s more to this story.
One of the knocks on Choice is his high strikeout numbers, the worry being that he will strike out too often to be an effective major-league hitter. He already strikes out a lot in High-A, and his strikeout rate is expected to rise as he faces better pitching in the upper levels. Here’s the strikeout performance he’s put on in Stockton and Phoenix:
A+ Stockton (2011): 24.7 K% 
AFL Phoenix (2011): 15.5 K% [62/150]
This is much more meaningful. He’s over 40 percent of the way to a “real” number. Assuming Choice plays most of the games remaining on the schedule, he should pass 100 PAs. Although he won’t reach 150, whatever that K rate ends up being will be a heck of a lot more reliable than anything his OBP and SLG have to say. Even better would be a measurement of his contact rate, which stabilizes in only 100 PAs. That’s a stat that is stable in the context of AFL baseball’s short schedule.
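For anyone who wants to check the percentages quoted above, here is a minimal Python sketch of the arithmetic. The function name `stability_progress` is made up for illustration; the inputs (62 PAs, and thresholds of 500 for OBP/SLG and 150 for K%) come straight from the text.

```python
def stability_progress(pa: int, stable_pa: int) -> float:
    """Return the percentage of the stabilization threshold reached so far."""
    if stable_pa <= 0:
        raise ValueError("stable_pa must be positive")
    return 100.0 * pa / stable_pa


# Choice's 62 AFL plate appearances against the two thresholds in the text.
print(f"OBP/SLG: {stability_progress(62, 500):.1f}% of the way")  # about 12 percent
print(f"K%:      {stability_progress(62, 150):.1f}% of the way")  # just over 40 percent
```

The same 62 PAs are a sliver of what OBP and SLG need, but a decent chunk of what strikeout rate needs, which is the whole point of stating the threshold alongside the sample.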
Taking the above into account, I’d like to amend my original statement. Sample size in PAs should always be printed. The assumed stable sample size should be 500 PAs. Otherwise, the sample should appear in a [sample/stability number] format.
Here are the stable sample sizes per Mr. Carleton’s original work on the subject:
50 PA: Swing %
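As a rough sketch of how the proposed notation could be automated, the Python below formats a stat line according to the rule stated earlier. The names `format_stat`, `STABLE_PA`, and `DEFAULT_STABLE_PA` are invented for this example, the lookup table holds only the thresholds mentioned in this article, and printing a bare `[PA]` when the threshold is the assumed 500 is one reading of the rule, not an established convention.

```python
DEFAULT_STABLE_PA = 500  # the assumed stability point when none is stated

# Thresholds mentioned in this article (in PAs); the key names are illustrative.
STABLE_PA = {"OBP": 500, "SLG": 500, "K%": 150, "Contact%": 100, "Swing%": 50}


def format_stat(label: str, value: str, pa: int,
                stable_pa: int = DEFAULT_STABLE_PA) -> str:
    """Append the sample size to a stat line.

    Assumption: when the threshold is the default 500 PAs, only the sample
    is shown; otherwise the line ends in [sample/stability number].
    """
    if stable_pa == DEFAULT_STABLE_PA:
        sample = f"[{pa}]"
    else:
        sample = f"[{pa}/{stable_pa}]"
    return f"{label}: {value} {sample}"


# Reproduces the strikeout line shown earlier in the article.
print(format_stat("AFL Phoenix (2011)", "15.5 K%", 62, STABLE_PA["K%"]))
```

Keeping the thresholds in one table means a writer only has to supply the stat and the PA count; the credibility marker comes along for free.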
The warm summer months seem far away now, but before long we will have bid goodbye to the Hot Stove League and moved on to spring training and the regular season schedule. Hopefully, we won’t see too many “triple-S” or “SSS” (Small Sample Size) warnings when next season rolls around. They won’t be necessary, since the samples will be explicitly stated. And maybe we can discourage people from using stats incorrectly to try to find meaning where very little exists.