April 29, 2016
Goodbye, April: You Are Not Special
Early season baseball is full of articles about “What we’ve learned so far” after a week, or two weeks, or a month of play. You can’t really blame the sportswriters and TV sports producers and podcast hosts who come up with these pieces. They have to talk about something, and there aren’t any pennant races or awards competitions to discuss in April.
As Russell Carleton has demonstrated, though, most measures of baseball performance take far longer than a week or three to stabilize. Drawing conclusions from a 10- or 20-game sample is akin to statistics problem sets involving drawing balls from an urn. A really, really big urn. With lots and lots of balls in it. When you draw a few balls from a really, really big urn with lots and lots of balls in it, you don’t get a good picture of what’s really in the urn.
But how useless are April statistics? Are they worse than those from other months? On one hand, last April Andrew McCutchen batted .194/.302/.333 and Jose Iglesias batted .377/.427/.536. Jon Lester had a 6.23 ERA while Ubaldo Jimenez’s was 1.59. Those weren't particularly durable figures. On the other hand, Dallas Keuchel’s 0.73 April ERA and Josh Donaldson’s .319/.370/.549 April batting line were.
We can look at the relevance of April numbers by correlating them to players’ full-year figures, and comparing the correlation in April to that of May, June, July, August, and September. (Throughout this analysis, April includes a few days of March play in the relevant years, and September includes a few days of October games.) To do this, I selected batting title and ERA qualifiers from each of the past 10 seasons and compared their monthly results to their full-year results. I had a sample of 1,487 batter seasons with corresponding monthly data in about 87 percent of months and 850 pitcher seasons with corresponding monthly data in 86 percent of months.
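For readers who want to try this kind of comparison themselves, here is a minimal Python sketch of the month-versus-full-year correlation. It is not the code behind the numbers in this piece: the function names `pearson` and `monthly_correlations`, the row layout, and the six sample rows are all made up for illustration. A real run would need one row per qualifying player-season and month.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def monthly_correlations(rows):
    """Group (month, monthly_stat, full_year_stat) rows by month,
    then correlate the monthly stat with the full-year stat in each group."""
    by_month = defaultdict(lambda: ([], []))
    for month, monthly_stat, full_year_stat in rows:
        by_month[month][0].append(monthly_stat)
        by_month[month][1].append(full_year_stat)
    return {month: pearson(xs, ys) for month, (xs, ys) in by_month.items()}


# Made-up illustrative rows, NOT real player data:
# (month, OPS in that month, full-year OPS)
rows = [
    ("April", 0.650, 0.780), ("April", 0.900, 0.760), ("April", 0.720, 0.700),
    ("May", 0.800, 0.780), ("May", 0.770, 0.760), ("May", 0.690, 0.700),
]

for month, r in monthly_correlations(rows).items():
    print(f"{month}: r = {r:.3f}")
```

The same function works for ERA or FIP: swap the two numeric columns and keep the grouping by month.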
Admittedly, there’s a selection bias in April data, and it applies mostly to young players. Since I’m comparing monthly data to full-year data for batting title and ERA qualifiers, I’m selecting from those players who hung around long enough to compile 502 plate appearances or 162 innings pitched. If you’re a young player who puts up a .298/.461/.596 batting line in April, as Joc Pederson did last April, you get to stick around to get your 502 plate appearances, even though 261 of your plate appearances occurred during July, August, and September, when you hit .170/.300/.284. On the other hand, if you bat .147/.284/.235 in April, as Rougned Odor did, you do get a chance to bat .352/.426/.639 in 124 plate appearances spread between May and June, but you get them in Round Rock instead of Arlington. So there’s a bias in this analysis in favor of players who perform well in April (giving them a chance to continue to play) compared to those who don’t (who may get shipped out). This shouldn’t have a big impact on the overall variability of April data, though, since the presence of early-season outperformers like Pederson who get full-time status on the strength of their April is canceled, to an extent, by early-season underperformers like Odor who don’t.
So is April more predictive than other months? Here’s a chart for batters, using OPS as the measure, comparing the correlation between batters’ full-year performance and that of each month.
We’re looking at pretty small differences here: the correlation coefficients range from 0.563 to 0.652. Even so, far from being dispositive, April turns out to be the least descriptive month of the season. Player performance in April is less correlated with full-year performance than performance in any other month.
Here’s a similar graph, looking at ERA for pitchers compiling 162 or more innings pitched:
Or, if you prefer, FIP:
The three charts concur: April is the least descriptive month of the year. We learn less in April than we learn in any other month.
“Wait,” one might say, “there are fewer games in April than in the other months, so there are fewer plate appearances and innings pitched, so sample sizes are smaller and therefore more prone to fluctuation.” And one would be right! But explaining why the correlation may be lower in April doesn’t excuse it. The premise is that as April goes, so goes the season. The fact is, April has proved to be less predictive of full-season performance than the other five months.
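The sample-size objection is easy to see in a toy simulation. The sketch below is an assumption-laden illustration, not a model of real players: every simulated hitter gets a fixed "true" on-base rate, each plate appearance is an independent coin flip, and the plate appearance counts per month (85 for April, 105 for the rest) are invented round numbers. The function name `simulate` and its parameters are hypothetical.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(pa_by_month, n_players=2000, seed=0):
    """Return {month: correlation between monthly and full-season on-base rate}.

    Assumptions (all invented for illustration): true talent is drawn from a
    normal distribution around .320, and every plate appearance is an
    independent trial with that success probability.
    """
    rng = random.Random(seed)
    monthly = {month: [] for month in pa_by_month}
    full_season = []
    for _ in range(n_players):
        talent = min(max(rng.gauss(0.320, 0.030), 0.0), 1.0)
        times_on_base = 0
        total_pa = 0
        for month, pa in pa_by_month.items():
            on_base = sum(rng.random() < talent for _ in range(pa))
            monthly[month].append(on_base / pa)
            times_on_base += on_base
            total_pa += pa
        full_season.append(times_on_base / total_pa)
    return {month: pearson(rates, full_season) for month, rates in monthly.items()}


if __name__ == "__main__":
    # Invented plate appearance counts: April is simply given fewer.
    schedule = {"April": 85, "May": 105, "June": 105,
                "July": 105, "August": 105, "September": 105}
    for month, r in simulate(schedule).items():
        print(f"{month}: r = {r:.3f}")
```

In a setup like this, the month with fewer plate appearances should tend to track the full season less closely, even though nothing about the simulated players changes from month to month. That is the objection in miniature, and it explains the lower April correlation without making April any more useful as a forecast.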
To see how much less predictive it can be, I selected the players for whom the difference between their April and full-year performance was the greatest. For pitchers, this is a somewhat uninteresting list, because as any fantasy owner can tell you, one really bad outing can totally croak a pitcher’s monthly numbers. And in terms of magnitude, a big miss to the downside—the April in which a pitcher gets shelled—is going to swamp a big miss to the upside, since there’s a lower limit on ERA and FIP but no upper limit. So the widest variations are uniformly for pitchers who had lousy months. With that in mind, four Aprils stood out:
· CC Sabathia, April 2008: This was easily the worst April for a pitcher having a good season over the past decade. In six April starts, he compiled a 7.88 ERA and 5.19 FIP. This wasn’t a case of one or two bad starts; in his first four starts of the year, he allowed five, four, nine, and nine runs, giving up five homers and 14 walks against 14 strikeouts over 18 innings. By the end of the year, he got downballot Cy Young support after being traded to Milwaukee. He had a 1.95 ERA and 2.58 FIP for the rest of the year.
· Ryan Dempster, April 2011: He didn’t have a great season (4.80 ERA, 3.90 FIP) but he really stunk in April, allowing at least four runs in all six of his starts en route to a 9.58 ERA and 6.57 FIP. He had a 3.94 ERA and 3.42 FIP after that.
· Brad Radke, April 2006: No ERA qualifier over the past 10 years had a greater disparity between April FIP (7.67) and full-year FIP (4.66). He somehow got credited with two wins over five starts in April, during which he allowed 10 homers, giving up six runs in three games and four in the other two. He had a 3.44 ERA and 4.07 FIP in the remainder of his final season.
· Clay Buchholz, April 2012: Another mediocre season (4.56 ERA, 4.65 FIP) trashed by a terrible April (8.69 ERA, 6.89 FIP) in which he allowed at least five runs in all of his starts, gave up five homers in one of them, and had a 16/15 K/W ratio in 29 innings. Batters hit .410/.554/.964 against him during the month. He had a 3.81 ERA and 4.25 FIP the rest of the way.
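The reason all four entries are bad Aprils rather than good ones comes down to arithmetic: ERA is nine times earned runs divided by innings, so it cannot go below zero but has no ceiling. A small sketch with a hypothetical pitcher (the 30-inning month and the run totals are invented, not taken from any of the pitchers above) shows the lopsided room for error:

```python
def era(earned_runs, innings):
    """Earned run average: earned runs allowed per nine innings."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9 * earned_runs / innings


# Hypothetical pitcher whose full-year level is 12 earned runs per 30 innings.
baseline = era(12, 30)    # 3.60
best_month = era(0, 30)   # 0.00, the floor: he cannot beat this
bad_month = era(30, 30)   # 9.00, and it could go higher still

print(f"largest possible good-month gap: {baseline - best_month:.2f}")
print(f"gap for this bad month:          {bad_month - baseline:.2f}")
```

However well the hypothetical pitcher throws, his month can beat a 3.60 baseline by at most 3.60 runs, while a rough month can miss it by far more.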
The list of batters whose April numbers contrasted with their full-year results is more interesting. Here is the top 10, ranked by difference between April OPS and full-season OPS:
First, let us reflect on prime Albert Pujols, when a .328/.414/.628 line over five months constitutes a letdown.
Second, these examples, while obviously the cherry-picked extremes, reinforce the point that April is only one month of a long season and, as illustrated above, the least predictive. A player having an unexpectedly good month could be breaking out. Or he could be 2009 Brandon Inge. April gives us no better clue, and, in aggregate, a worse one, than any other month.