Believe it or not, most of our writers didn't enter the world sporting an address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.

Matt Lentzner has carved out a (very) small niche in the baseball analysis world by examining the intersection of physics and biomechanics. He has presented at the PITCHf/x conference in each of the last two years and has written articles for The Hardball Times, as well as previous articles for Baseball Prospectus. When he’s not writing, Matt works on his physics-based baseball simulator, which is so awesome and all-encompassing that it will likely never actually be finished, though it does provide the inspiration for most of his articles and presentations. In real life, he’s an IT Director at a small financial consulting company in Silicon Valley and also runs a physical training gym in his backyard on the weekends.

Yes, the title sounds vaguely pornographic, but what I’m getting at is pretty serious. Well, as serious as one can get when talking about baseball stats, anyway. Let me regale you with a story that is rehashed almost daily in baseball articles. It should sound familiar.

I was reading an article by Dan Lependorf, a very sharp, sabermetrically inclined writer for Athletics Nation. It was about Michael Choice, a highly touted prospect with the Oakland A’s. Choice turned in a solid performance in High-A this season and has looked like a monster in the Arizona Fall League.

A+ Stockton (2011): .285/.376/.542
AFL Phoenix (2011): .333/.424/.745

I haven’t been keeping up with the AFL that closely, so being the stats-oriented guy that I am, when I saw the numbers I was thinking, “Yeah, that’s pretty awesome, but how many PAs (plate appearances) is that?”

And right at that moment it hit me.

If you want to call yourself a sabermetrician, then you know that a statistic doesn’t mean anything without a sufficient sample size. Batting .400 over 10 plate appearances doesn’t mean anything. Batting .400 over a season is something amazing. We all intuitively know that.

But here’s what I think needs to happen: every stat printed should include its sample size in PAs. Always. This is what communicates the credibility of your number. So now Michael Choice’s lines look like this:

A+ Stockton (2011): .285/.376/.542 [542]
AFL Phoenix (2011): .333/.424/.745 [62]

We know from Russell Carleton’s (AKA Pizza Cutter) seminal work (gory details here) on sample size stability that OBP and SLG stabilize at 500 PAs. In other words, the performance that Choice logged in Stockton was worth considering, while the fireworks he’s been displaying in Phoenix are not very meaningful at all (12 percent of the required PAs). In fact, there are 11 other hitters bopping with an OPS over 1.000 at this time, and Choice is in the middle of the pack. It’s not even a special performance.
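The “12 percent” figure is simple arithmetic: plate appearances logged divided by the stabilization point. Here is a minimal Python sketch, assuming the 500-PA threshold for OBP and SLG cited above; the helper name `share_of_stability` is hypothetical, used only for illustration:

```python
def share_of_stability(pa: int, stability_pa: int) -> float:
    """Fraction of the stabilization sample a hitter has logged, capped at 1.0.

    `share_of_stability` is a hypothetical helper, not a published formula.
    """
    if stability_pa <= 0:
        raise ValueError("stability_pa must be positive")
    return min(pa / stability_pa, 1.0)


# Michael Choice, 2011, measured against the 500-PA point for OBP and SLG
stockton = share_of_stability(542, 500)  # past the threshold, so capped at 1.0
phoenix = share_of_stability(62, 500)    # 62 / 500 = 0.124
print(f"Stockton: {stockton:.0%}, Phoenix: {phoenix:.1%}")  # Stockton: 100%, Phoenix: 12.4%
```

Capping at 1.0 reflects the idea that once a stat has stabilized, extra plate appearances don’t make it any more “real” by this yardstick.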

But there’s more to this story.

One of the knocks on Choice is his high strikeout numbers, the worry being that he will strike out too often to be an effective major-league hitter. He already strikes out a lot in High-A, and his strikeout rate is expected to rise as he faces better pitching in the upper levels. Here’s the strikeout performance he’s put on in Stockton and Phoenix:

A+ Stockton (2011): 24.7 K% [542]
AFL Phoenix (2011): 15.5 K% [62]

Much, much better in Phoenix, but with that small sample size, is it meaningful? As it turns out, strikeout rate stabilizes a lot faster than OBP and SLG—in only 150 PA. Check this out:

AFL Phoenix (2011): 15.5 K% [62/150]

This is much more meaningful. He’s over 40 percent of the way to a “real” number. Assuming Choice plays most of the games remaining on the schedule, he should pass 100 PAs. Although he won’t reach 150, whatever that K rate ends up being will be a heck of a lot more reliable than anything his OBP and SLG have to say. Even better would be a measurement of his contact rate, which stabilizes in only 100 PA. That’s a stat that can actually stabilize within the AFL’s short schedule.

Taking the above into account, I’d like to amend my original statement. Sample size in PAs should always be printed. When no stability number is given, it should be assumed to be 500 PAs; otherwise, the sample should appear in [sample/stability number] format.
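The proposed notation is mechanical enough to generate automatically. A minimal Python sketch, assuming a stat whose stabilization point is the default 500 PA gets a bare [PA] tag and any other stat gets [PA/stability]; the names `format_sample` and `stat_line` are hypothetical:

```python
DEFAULT_STABILITY_PA = 500  # assumed stable sample when no number is given


def format_sample(pa: int, stability_pa: int = DEFAULT_STABILITY_PA) -> str:
    """Build the sample-size tag: [pa] at the default, [pa/stability] otherwise."""
    if stability_pa == DEFAULT_STABILITY_PA:
        return f"[{pa}]"
    return f"[{pa}/{stability_pa}]"


def stat_line(label: str, value: str, pa: int,
              stability_pa: int = DEFAULT_STABILITY_PA) -> str:
    """Append the sample-size tag to a printed stat line."""
    return f"{label}: {value} {format_sample(pa, stability_pa)}"


print(stat_line("A+ Stockton (2011)", ".285/.376/.542", 542))
print(stat_line("AFL Phoenix (2011)", "15.5 K%", 62, 150))
```

The two calls reproduce the lines printed earlier in the article: `A+ Stockton (2011): .285/.376/.542 [542]` and `AFL Phoenix (2011): 15.5 K% [62/150]`.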

Here are the stable sample sizes per Mr. Carleton’s original work on the subject:

 50 PA: Swing %
100 PA: Contact Rate
150 PA: Strikeout Rate, Line Drive Rate, Pitches/PA
200 PA: Walk Rate, Groundball Rate, GB/FB
250 PA: Flyball Rate
300 PA: Home Run Rate, HR/FB
500 PA: OBP, SLG, OPS, 1B Rate, Popup Rate
550 PA: ISO
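To check which numbers in a short sample (an AFL stint, say) are worth reading, the list can be turned into a lookup. A minimal Python sketch; the dictionary and the `stable_stats` helper are hypothetical names, and the thresholds are copied directly from the list above:

```python
# Stabilization points in PA, copied from the list above (Carleton's original work).
STABILITY_PA = {
    "Swing %": 50,
    "Contact Rate": 100,
    "Strikeout Rate": 150,
    "Line Drive Rate": 150,
    "Pitches/PA": 150,
    "Walk Rate": 200,
    "Groundball Rate": 200,
    "GB/FB": 200,
    "Flyball Rate": 250,
    "Home Run Rate": 300,
    "HR/FB": 300,
    "OBP": 500,
    "SLG": 500,
    "OPS": 500,
    "1B Rate": 500,
    "Popup Rate": 500,
    "ISO": 550,
}


def stable_stats(pa: int) -> list:
    """Return the stats whose listed stabilization point is at or below `pa`."""
    return [stat for stat, threshold in STABILITY_PA.items() if pa >= threshold]


print(stable_stats(62))   # ['Swing %']
print(stable_stats(100))  # ['Swing %', 'Contact Rate']
```

At the roughly 100 PAs Choice should reach in Phoenix, only Swing % and Contact Rate clear the bar, which is why contact rate is the number to watch in the AFL.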

The warm summer months seem far away now, but before long we will have bid goodbye to the Hot Stove League and moved on to spring training and the regular season schedule. Hopefully, we won’t see too many “triple-S” or “SSS” (Small Sample Size) warnings when next season rolls around. They won’t be necessary, since the samples will be explicitly stated. And maybe we can discourage people from using stats incorrectly to try to find meaning where very little exists.

This is one of the most important articles I have ever read in BP and I bet that it will also be one of the most overlooked. No stat should even be considered until the number of plate appearances is known.
I agree with the general premise of the article.

One other note is that the pitching in the AFL is considered inferior, so I suspect strikeout rates for all players drop. If I'm correct in this assumption, the conditions in which the improved strikeout rate is occurring should also be checked against the baseline for strikeout rate. If the baseline has moved significantly, you need to consider that factor as well.

Great article
I like the idea of showing sample size, although I guess it is something we generally have a feel for anyway, because we know the length of a season. If the stats are based on part of a season, then indicating the sample size is necessary.

Now, tacking on the number of plate appearances at which the stat becomes stable is getting a bit unnecessarily cumbersome. I think we understand that the more often the event a stat is based on occurs, the more stable that stat is over the same number of plate appearances. There is nothing magical or final about those stability numbers. They are a general guideline that I think we have a good instinct for.
Thanks for the "gory details" nod.
My pleasure. I didn't know you were still knocking around here.
He pops in periodically to remind us how much we miss him.
Line drive rate stabilizes at 150? Seems like something is wrong there.
There probably is. Those are Retrosheet data 2003-2006, and Colin Wyers has previously (and elegantly) pointed out the flaws in RS batted ball data. Also, that's LD/PA, not per BIP. That probably makes it look a little more stable.
On a related note, would it be possible/useful to have things like standard deviations and/or confidence intervals for some of our more common sabermetric statistics?
Hey, go easy on us! As someone who has to write an awful lot of triple-slash lines, the prospect of having to cite PA, SD, and CI multiple times an article is a little scary, statistically sound as it might be.
Thanks for all the comments, guys.
Reading this (and Mr. Carleton's excellent article) I started thinking about factors that might distort the rate at which particular statistics would stabilize. Though these factors might not be important for GROUPS of players, could they be important for INDIVIDUAL players? I'm thinking about things such as
--players changing teams (especially from one league to another, or one type of park environment to another) such that their statistics might be radically different before and after the change;
--players changing roles: it's frequently noted that young players don't see good results until they get a substantial amount of time in a full-time role, so would their tendencies be accurately reflected in a 50-PA to 150-PA sample? Might they show radically different results in their next 400 full-time PAs, even for stats like Strikeout Rate?;
--injuries: if you combine the statistics for a power hitter before and after a wrist injury (or, maybe more interestingly, partly after a wrist injury and then once it's had a chance to heal fully) to get a 500 PA sample for stabilizing sample size on SLG, how predictive is the result?
I'm probably missing something here, since my knowledge of statistics is laughably small, but I'd be interested in your thoughts....
And thanks for the reminder on this VERY important subject.
I'm pretty sure most people know AFL stats are going to be a tiny number of PA. The only reason to mention PAs is if the number is less than expected for the relevant league. And if so, it's just as easy to write, "in 130 PA" as it is "[130]". I.e., not sure what this article is trying to establish that couldn't have been stated in a single sentence.