April really is a great time to be a baseball fan. Even in the worst case (say, being a Cubs fan and watching Carlos Zambrano get lit up like a Christmas tree on Opening Day), having baseball is better than not having baseball. And April is truly a time when all baseball fans can have hope. Nobody’s been eliminated yet. Nobody’s even out of the race yet. Now, of course, some things are more likely than others, but that’s not what hope is about, is it?

So it’s a great time to be a fan. But it’s a horrible time to be a baseball analyst. (That's still a net win, really, as baseball analysts are generally also baseball fans.) Why do analysts suffer? Because there’s an expectation that since there’s baseball going on, and since one is a baseball analyst, one must, well, analyze that baseball. The thing is—there’s not really a whole lot one can say about a month’s worth of ballgames, at least in the way of useful analysis. There’s little we can know in April that we didn’t already know in March.

Small Sample Size

Consider it the sabermetrician’s catechism: a small sample is not very indicative of future performance.

Here’s what I mean. I took a look at batters from ’04 to ’09. I broke each batter’s season down into two parts: April, and everything after. I also took a look at how that batter performed in the prior season.

What I found was that, in comparing April stats to the rest of the season, the root mean square error was worth about .060 points of True Average: a hitter with a TAv of .260 in April would have a rest-of-season TAv between .195 and .314 about 68% of the time. When comparing the previous season instead, I found an RMSE worth about .037 points of True Average, or a TAv between .224 and .298.

What this tells us is that one month’s worth of batting results is much less predictive than the previous season’s batting results. (And in fact a single season isn’t terribly predictive, either.)

Selective Sampling

The reason we have a small sample is that we’re really engaging in a bit of selective sampling. We rarely have but one month’s batting results for any player. While it’s true that only this year’s batting results count toward this year’s games, it’s a wholly arbitrary distinction when it comes to predicting future performance.

And even if this is a batter’s first year in the majors, we still have other information. We have a wealth of minor-league batting stats. We have a sizable body of research on how those stats translate into major-league stats. We have scouting profiles, the player’s age—we have a wealth of information.

Looking simply at April stats isn’t just a case of not having a lot of information; it’s a case of us leaving out the perfectly good information we do have. That act of exclusion makes our analysis less accurate, not more.

Confirmation Bias

But what if we already have a theory about how good (or bad) a player is? And his stats in April confirm what we already suspected, that he’s on the decline or that he’s ready to have a resurgence? Surely, in those cases, a small sample can tell us something, can’t it?

That’s really the most dangerous case of all. It’s the cognitive bias known as "confirmation bias": viewing the data in a way that confirms what we already think.

We already know that one month of stats is less predictive than a whole season’s (and both are less predictive still than a good projection system, such as PECOTA). It’s dangerous to ignore that in the face of data that supports our point of view. It leads to us dismissing most findings (rightly) due to low predictive value, but cherry-picking the ones that support our predetermined conclusions.

Just Enjoy It

So what’s an analyst to do?

Watch some baseball. Enjoy it. But don’t read too much into it.

It’s really hard to sit there and constantly say, "We don’t have enough data to say anything" every time someone asks what a player’s performance in April means. But it’s the right thing to say.

Technical Notes

A note about those RMSE figures. What I did was figure a hitter’s runs per out for each of the sample periods. Root mean square error is really exactly what it says on the tin—find the error for each term in the sample, square it, find the average squared error, and take the square root.

In this case, I used a weighted average. For the weight, I took the harmonic mean of outs in each of the three sample periods. (To find the harmonic mean, take the reciprocal of each variable—in other words, divide one by each of the terms. Then find the average. Then take the reciprocal again.)
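
For readers who want to try the calculation, here is a minimal Python sketch of a weighted RMSE with harmonic-mean weights. It is one reading of the description above, not the code actually used for the article. The function names and the three sample hitters are made up for illustration, and the ordering of the three periods (previous season, April, rest of season) is an assumption.

```python
import math


def harmonic_mean(values):
    """Take the reciprocal of each value, average them, then take the reciprocal again."""
    reciprocals = [1.0 / v for v in values]
    return 1.0 / (sum(reciprocals) / len(reciprocals))


def weighted_rmse(predicted, actual, weights):
    """Square each error, take the weighted average of the squares, then the square root."""
    weighted_squares = sum(
        w * (p - a) ** 2 for p, a, w in zip(predicted, actual, weights)
    )
    return math.sqrt(weighted_squares / sum(weights))


# Made-up hitters, for illustration only: runs per out in April and over the
# rest of the season, plus outs in each of the three sample periods
# (assumed order: previous season, April, rest of season).
hitters = [
    {"april": 0.21, "rest": 0.17, "outs": [400, 60, 350]},
    {"april": 0.12, "rest": 0.16, "outs": [380, 65, 330]},
    {"april": 0.18, "rest": 0.19, "outs": [150, 40, 300]},
]

# Each hitter's weight is the harmonic mean of his outs across the periods.
weights = [harmonic_mean(h["outs"]) for h in hitters]

april_rmse = weighted_rmse(
    [h["april"] for h in hitters],
    [h["rest"] for h in hitters],
    weights,
)
print(f"Weighted RMSE, April vs. rest of season: {april_rmse:.3f} runs per out")
```

The harmonic mean is pulled toward the smallest of its inputs, so a hitter with very few outs in any one period gets a small weight even if his other periods are full-sized.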

I then converted these runs per out into True Average using the following formula:

( (LeagueRunsPerOut ± RMSE) / 5 ) ^ (.4)
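
As a rough sketch, the conversion can be written in Python as below. The function names, the league rate of 0.17 runs per out, and the RMSE of 0.05 runs per out are all hypothetical values chosen for illustration; they are not the figures behind the article's numbers.

```python
def runs_per_out_to_tav(runs_per_out):
    """Apply the conversion from the text: (runs per out / 5) ^ 0.4."""
    if runs_per_out < 0:
        # A negative base with a fractional exponent is not a real number.
        raise ValueError("runs per out must not be negative")
    return (runs_per_out / 5) ** 0.4


def tav_band(league_runs_per_out, rmse):
    """Return (low, center, high) TAv for league runs per out minus/plus one RMSE."""
    low = runs_per_out_to_tav(league_runs_per_out - rmse)
    center = runs_per_out_to_tav(league_runs_per_out)
    high = runs_per_out_to_tav(league_runs_per_out + rmse)
    return low, center, high


# Hypothetical inputs, for illustration only.
low, center, high = tav_band(league_runs_per_out=0.17, rmse=0.05)
print(f"TAv band: {low:.3f} to {high:.3f} (center {center:.3f})")
```

Because the exponent is less than one, the band is not symmetric: subtracting the RMSE moves TAv further from the center than adding it does, which is why the ranges quoted earlier are lopsided around .260.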

bsolow
4/09
Colin, isn't this just a strong argument for doing some Bayesian updating? Would the forecasts have been better incorporating both the previous years' performance and the first month's statistics?
trsinger
4/15
Not sure if that is a Counting Crows reference up there, but if it is, I like it.