April is the cruelest month in baseball, unless you happen to be a pitcher. Through the first three weeks of the regular season, the average major league team has scored just 4.48 runs per game, the lowest figure at this point of the season since 1992. National Leaguers have seen their home run output drop from 1.10 per game last season to 0.84 this year. The American League is hitting a collective .253, which would be its lowest mark since 1972, when the league batted just .239 in its final season before adopting the designated hitter.

Are we experiencing a shift in baseball’s perpetual tug-of-war back to pitchers? Is this an artifact of small sample size? Or is the weather to blame?

When I fielded this question in my chat last week, I said that “my gut instinct…is that the decline in offense is large enough to be material above and beyond the weather.” After running the numbers, however, I’m now not so sure.

Let’s skip right to the geeky bits. A simple unpaired t-test shows, with greater than 99 percent confidence, that the change in run scoring from 2006 to 2007 is not the result of randomness alone. In other words, small sample size can't explain away the difference by itself.
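For readers who want the mechanics, here is a minimal sketch of one common unpaired test, Welch's t-test, which does not assume the two seasons share a variance. The article doesn't say whether Welch's or Student's pooled-variance version was used, so that choice is an assumption. The inputs are hypothetical lists of per-team-game run totals, and the 99 percent check uses the normal approximation to the t distribution, which is only reasonable for large samples such as the several hundred team-games in a three-week slice of the schedule.

```python
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples (e.g., runs scored per team game)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def significant_at_99(a, b):
    """Two-sided test at the 1 percent level using the normal approximation
    (adequate only when both samples are large)."""
    t, _ = welch_t(a, b)
    critical = NormalDist().inv_cdf(0.995)  # about 2.576
    return abs(t) > critical
```

Usage would look like `significant_at_99(runs_2006, runs_2007)`, where both names are hypothetical lists holding every team's run total in each game of the comparison window. With small samples, a p-value from the actual t distribution (for example `scipy.stats.ttest_ind(a, b, equal_var=False)`) is the safer choice.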

It might also be thought that offense is routinely a little lower in April than during the rest of the season, since temperatures are lower. In fact, the opposite is the case. I surveyed run scoring in the first three weeks of the regular season for each of the fifty seasons from 1957 through 2006 and compared it against scoring for the rest of the season. On average, teams scored 4.41 runs per game in the first three weeks of the season and 4.36 runs per game the rest of the way.
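The split itself can be sketched as follows. This assumes a game log with one record per game (the `Game` type and its field names are hypothetical) and takes "the first three weeks" to mean the first 21 days after Opening Day; the article doesn't specify its exact cutoffs. Running it once per season from 1957 through 2006 and averaging the results is one way to produce a comparison like the one above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Game:
    """Hypothetical game-log record: one row per game, both teams' scores."""
    played: date
    home_score: int
    away_score: int


def runs_per_team_game(games):
    # Each game counts as two team-games, so divide total runs by twice the game count.
    return sum(g.home_score + g.away_score for g in games) / (2 * len(games))


def split_season(games, opening_day, window_days=21):
    """Return (RPG in the first `window_days` days, RPG for the rest of the season)."""
    cutoff = opening_day + timedelta(days=window_days)
    early = [g for g in games if g.played < cutoff]
    rest = [g for g in games if g.played >= cutoff]
    return runs_per_team_game(early), runs_per_team_game(rest)
```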

The explanation for this is not that weather is unimportant; cold weather does harm offense, as we’ll discuss in a moment. Rather, it appears that pitchers begin the year a little less prepared than hitters. This is particularly apparent in the data from 1995 and 2006, when spring training was abbreviated because of the strike and the World Baseball Classic, respectively. In both seasons, hitters got off to very strong starts.

What is clear, however, is that changes in run scoring during the first three weeks of the season tend to be fairly ‘sticky’. As detailed in the table below, there were seven seasons in which run scoring in the first three weeks moved at least half a run per team game (0.50 RPG) up or down from the previous season's level. (Note that 2007 is not among them: run scoring is down by 0.38 runs per team game so far this year, not quite meeting our threshold.)

Runs per team game; change from the previous season in parentheses

Year    Previous Year    First Three Weeks    Rest of Year
1959    4.30             4.97 (+.67)          4.34 (+.04)
1963    4.46             3.70 (-.76)          3.97 (-.49)
1969    3.41             4.29 (+.88)          4.05 (+.64)
1971    4.34             3.81 (-.53)          3.92 (-.42)
1977    3.96             4.57 (+.59)          4.46 (+.50)
1994    4.60             5.22 (+.62)          4.87 (+.27)
2006    4.58             5.09 (+.51)          4.84 (+.26)

In each of these seven cases, the direction of the arrow did not change as we moved from the first three weeks to the rest of the regular season. In other words, if run scoring was down in the first three weeks, it stayed down to some extent for the rest of the season, and vice versa. In all seven cases, however, the magnitude of the change shrank as the season wore on.
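Both patterns in the table (same direction, smaller magnitude) can be checked directly from its numbers. The sketch below recomputes each change from the run-scoring levels rather than from the rounded changes in parentheses; doing so gives +.61 for 1977 instead of the printed +.59, which doesn't affect the conclusion. The "retained" ratio, the share of the early-season change that persisted, is an extra statistic not in the original analysis.

```python
# (year, previous-season RPG, first-three-weeks RPG, rest-of-year RPG),
# copied from the table above.
SEASONS = [
    (1959, 4.30, 4.97, 4.34),
    (1963, 4.46, 3.70, 3.97),
    (1969, 3.41, 4.29, 4.05),
    (1971, 4.34, 3.81, 3.92),
    (1977, 3.96, 4.57, 4.46),
    (1994, 4.60, 5.22, 4.87),
    (2006, 4.58, 5.09, 4.84),
]
THRESHOLD = 0.50  # minimum early-season change, in runs per team game


def stickiness(prev, early, rest):
    """Compare the early-season and rest-of-season changes from the prior year."""
    early_change = early - prev
    rest_change = rest - prev
    same_direction = early_change * rest_change > 0
    shrank = abs(rest_change) < abs(early_change)
    retained = rest_change / early_change  # share of the early change that persisted
    return same_direction, shrank, retained


for year, prev, early, rest in SEASONS:
    same, shrank, kept = stickiness(prev, early, rest)
    print(f"{year}: same direction={same}, shrank={shrank}, retained={kept:.0%}")
```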

We can evaluate these data a bit more precisely with a regression analysis, which shows that run scoring during the first three weeks of the season is in fact quite a strong predictor of offense for the rest of the year. That variable alone explains 71 percent of the variance in rest-of-season run scoring. Adding the previous year’s run scoring to the equation increases the R-squared only to 75 percent. In other words, generally speaking, the first three weeks tell us a lot more about where run scoring is going to settle than the entire previous season does.

In fact, once we also account for a time trend (that is, a linear increase in scoring from year to year, as has generally been the case over the past half-century), the previous year’s run scoring is of questionable statistical significance at all. A regression model with these three variables (year-to-date run scoring, previous year’s run scoring, and a time trend) predicts that run output for the rest of 2007 will be 4.64 runs per team game. That would be materially down from last year’s figure of 4.86 RPG, but consistent with run scoring in recent seasons like 2005 (4.59 RPG) and 2002 (4.62 RPG). Our first-pass conclusion, then, is that the decline in run scoring is “real,” but not of earth-shattering proportions. We need to be careful, though, because this analysis remains pretty naïve.
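To make the model structure concrete, here is a minimal ordinary-least-squares sketch in plain Python. It is fitted only to the seven seasons in the table above, not to the fifty seasons the article used, so its R-squared values and 2007 forecast will not match the 71 percent, 75 percent, or 4.64 RPG figures; with four coefficients and seven data points it also has very little to work with. The time trend is coded as years since 1957 (an arbitrary choice that keeps the arithmetic well-conditioned), and the 2007 inputs are the 4.48 RPG year-to-date and 4.86 RPG 2006 figures quoted in the article.

```python
# (year, previous-season RPG, first-three-weeks RPG, rest-of-year RPG) from
# the table above. Illustrative only: seven extreme seasons, not all fifty.
SEASONS = [
    (1959, 4.30, 4.97, 4.34), (1963, 4.46, 3.70, 3.97),
    (1969, 3.41, 4.29, 4.05), (1971, 4.34, 3.81, 3.92),
    (1977, 3.96, 4.57, 4.46), (1994, 4.60, 5.22, 4.87),
    (2006, 4.58, 5.09, 4.84),
]


def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(predictors, y):
    """Fit y = b0 + b1*x1 + ... via the normal equations; return (coefs, R^2)."""
    X = [[1.0, *row] for row in predictors]  # leading 1.0 is the intercept
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * xi for b, xi in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


rest = [s[3] for s in SEASONS]
_, r2_early = ols([(s[2],) for s in SEASONS], rest)
_, r2_both = ols([(s[2], s[1]) for s in SEASONS], rest)
beta, r2_trend = ols([(s[2], s[1], s[0] - 1957) for s in SEASONS], rest)

# 2007 inputs quoted in the article: 4.48 RPG to date, 4.86 RPG in 2006.
forecast_2007 = beta[0] + beta[1] * 4.48 + beta[2] * 4.86 + beta[3] * (2007 - 1957)
print(f"R^2: {r2_early:.2f}, {r2_both:.2f}, {r2_trend:.2f}; 2007 forecast {forecast_2007:.2f}")
```

One design note: in an OLS model with an intercept, R-squared can never fall when a predictor is added, so a small rise (like 71 to 75 percent) says little by itself; the significance of the added coefficient is the more telling check.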

One reason the first three weeks can be predictive is that they sometimes reflect the early impact of structural changes in the game. If we look at the seasons in which run scoring in the first three weeks diverged most from where it finished the previous season, we find that several of them have a ready explanation attached. In 1963 the mound was raised, and offense declined predictably. In 1969 the mound was lowered, the strike zone was contracted, and both leagues expanded, yielding a very profound increase in run scoring; 1977 was also an expansion year. It shouldn’t be a surprise that changes in run scoring are sticky when these sorts of considerations come into play.

There was no expansion this year, however, nor any alteration to the strike zone, nor any especially significant change to baseball’s rules. Something was very unusual about the first three weeks of the season, though, and that was the nation’s weather.

Thanks to The Weather Underground, I’ve been able to compile the average high temperatures over the first 22 days of April at the 23 open-air ballparks (New York and Chicago are intentionally counted twice, once for each team). These temperatures are compared against the long-term average highs in each city over the same period:

City            2007      Normal
(average daily high in °F, April 1-22)

AL East
Baltimore       56.6      63.2
Boston          47.8      54.7
New York        53.4      59.3

AL Central
Chicago         53.5      56.4
Cleveland       52.9      53.3
Detroit         52.3      55.0
Kansas City     58.9      63.9

AL West
Anaheim         70.8      72.1
Arlington       69.3      74.9
Oakland         63.0      65.5

NL East
Atlanta         67.7      70.2
Miami           82.4      83.4
New York        53.4      59.4
Philadelphia    55.3      59.3
Washington      58.0      65.0

NL Central
Chicago         53.5      56.4
Cincinnati      57.8      63.4
Pittsburgh      52.8      59.3
St. Louis       59.0      65.4

NL West
Denver          57.4      59.8
Los Angeles     69.1      72.8
San Diego       63.4      68.5
San Francisco   62.0      64.0

Average         59.6      63.7

As you can see, the weather explanation is not just a lot of hot air. The departure from normal temperatures has been both deep and broad: temperatures have been below average at all 23 ballparks, and about four degrees below normal overall.
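The summary row and the "below average at all 23 ballparks" claim can be checked directly against the table. This sketch recomputes both from the figures above; it also reports the park with the largest shortfall, an extra statistic not in the original.

```python
# (ballpark city, 2007 average high, normal average high), April 1-22, in °F,
# copied from the table above. New York and Chicago appear once per team.
TEMPS = [
    ("Baltimore", 56.6, 63.2), ("Boston", 47.8, 54.7), ("New York", 53.4, 59.3),
    ("Chicago", 53.5, 56.4), ("Cleveland", 52.9, 53.3), ("Detroit", 52.3, 55.0),
    ("Kansas City", 58.9, 63.9), ("Anaheim", 70.8, 72.1), ("Arlington", 69.3, 74.9),
    ("Oakland", 63.0, 65.5), ("Atlanta", 67.7, 70.2), ("Miami", 82.4, 83.4),
    ("New York", 53.4, 59.4), ("Philadelphia", 55.3, 59.3), ("Washington", 58.0, 65.0),
    ("Chicago", 53.5, 56.4), ("Cincinnati", 57.8, 63.4), ("Pittsburgh", 52.8, 59.3),
    ("St. Louis", 59.0, 65.4), ("Denver", 57.4, 59.8), ("Los Angeles", 69.1, 72.8),
    ("San Diego", 63.4, 68.5), ("San Francisco", 62.0, 64.0),
]

actual = sum(t for _, t, _ in TEMPS) / len(TEMPS)
normal = sum(n for _, _, n in TEMPS) / len(TEMPS)
below = [city for city, t, n in TEMPS if t < n]
largest = max(TEMPS, key=lambda row: row[2] - row[1])  # biggest shortfall vs. normal

print(f"2007 {actual:.1f}°F vs. normal {normal:.1f}°F ({actual - normal:+.1f})")
print(f"{len(below)} of {len(TEMPS)} parks below normal; largest gap: {largest[0]}")
```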

Indeed, while a four-degree difference is not trivial, it probably understates the case, because temperatures have not just been below average but also wildly inconsistent from week to week. Essentially, the first two or three days of the regular season were played in decent weather, and the past five to seven days have featured above-average temperatures throughout most of the country. In between, however, the weather was absolutely brutal for 10 to 12 days, with temperatures routinely 10 to 20 degrees below average across large parts of the East Coast and the Midwest. Perhaps one-third of the schedule to date has been played in conditions utterly inhospitable to baseball. Because the effects of weather on run scoring are somewhat non-linear (really cold weather hurts offense more than really warm weather helps it), this has made a profound difference.
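Why would volatile weather hurt more than a steady four-degree chill? If cold hurts offense more than warmth helps it, scoring is a concave function of temperature, and by Jensen's inequality a schedule that swings between very cold and warm days yields fewer runs than a steady schedule with the same average temperature. The sketch below illustrates this with a made-up kinked response; the 60-degree pivot, the 4.80 RPG baseline, and both slopes are purely hypothetical, not estimates from the article or from any real data.

```python
def hypothetical_rpg(temp_f, base=4.80, pivot=60.0, cold_slope=0.03, warm_slope=0.01):
    """Hypothetical runs per team game as a function of game-time temperature.

    Below `pivot`, each degree costs `cold_slope` runs; above it, each degree
    adds only `warm_slope`. All parameter values are illustrative assumptions.
    """
    if temp_f < pivot:
        return base - cold_slope * (pivot - temp_f)
    return base + warm_slope * (temp_f - pivot)


def expected_rpg(temps):
    """Average hypothetical scoring over a list of game temperatures."""
    return sum(hypothetical_rpg(t) for t in temps) / len(temps)


steady = [60.0, 60.0, 60.0]    # every game at the average temperature
volatile = [45.0, 67.5, 67.5]  # one-third of games very cold; same 60-degree mean

print(expected_rpg(steady), expected_rpg(volatile))
```

With these invented parameters, the steady schedule averages 4.80 RPG and the volatile one 4.70 RPG, even though both average 60 degrees.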

Moreover, the structure of the decline in run scoring is highly consistent with what we’d expect from inclement weather. First and most obviously, if we break 2007 down into the three weeks of the regular season to date, we find that offense has warmed with the temperatures:

        Week 1 (April 1 - April 8)       4.16 RPG
        Week 2 (April 9 - April 15)      4.19 RPG
        Week 3 (April 16 - April 22)     4.91 RPG

Run scoring last week, the first full week of the season played under “normal” weather conditions, rebounded to 4.91 runs per team game, almost exactly in line with where we left off in 2006 (4.86 RPG).

In addition, as Chris Constancio of The Hardball Times discovered, some aspects of offense respond to cold weather differently than others. In particular, Chris’s numbers suggest that:

  • Walk rates are quite a bit higher (perhaps as much as 8-10% higher) in games played in cold weather than games played in temperate weather;
  • BABIPs are at least 10 points higher in games played in temperate weather than in cold weather;
  • Home run rates are quite strongly affected by cold weather; home runs are about 20 percent more common in games played in 65-74 degree weather than in games played in 35-54 degree weather;
  • Strikeout rates behave a bit more ambiguously. Although extremely cold weather (less than 54 degrees) increases strikeout rates by 3-5%, merely cool weather does not seem to make much difference.

Thus, if weather is the culprit for the decline in offensive output, we’d expect to see abnormally low BABIPs and home run rates, abnormally high walk rates, and perhaps a very marginal increase in strikeout rates. In fact, that is just about exactly what we have seen:

Stat        2006        2007        Change
BABIP       .304        .290         -5%
BB/9        3.17        3.54        +12%
HR/9        1.12        0.91        -19%
K/9         6.59        6.59          0%

The decline in offense is accounted for primarily by lower BABIPs and home run rates, as Chris’s findings predict. Walk rates are actually up by 12% (again, consistent with the empirical data), while strikeout rates haven’t moved at all. Simply put, the ball hasn’t been carrying very well, and that’s almost certainly the result of the weather.
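The "Change" column, and its match against Constancio's expectations, can be reproduced in a few lines. This sketch encodes the expected direction of each stat from the bullet list above; the 2 percent tolerance used to call a change "flat" is an assumption of the sketch, not something from the article.

```python
# 2006 vs. 2007 to date, from the table above: (2006, 2007).
RATES = {
    "BABIP": (0.304, 0.290),
    "BB/9": (3.17, 3.54),
    "HR/9": (1.12, 0.91),
    "K/9": (6.59, 6.59),
}
# Direction implied by Constancio's cold-weather findings:
# -1 = expected lower, +1 = expected higher, 0 = little or no change.
EXPECTED = {"BABIP": -1, "BB/9": +1, "HR/9": -1, "K/9": 0}


def pct_change(old, new):
    return 100.0 * (new - old) / old


def direction(change, tolerance=2.0):
    """Classify a percent change; the 2 percent dead zone is an arbitrary choice."""
    if change > tolerance:
        return 1
    if change < -tolerance:
        return -1
    return 0


for stat, (old, new) in RATES.items():
    change = pct_change(old, new)
    verdict = "matches" if direction(change) == EXPECTED[stat] else "differs from"
    print(f"{stat}: {change:+.1f}% ({verdict} the cold-weather expectation)")
```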

So what can we expect the rest of the way? Long-term forecasts call for this summer’s temperatures to be normal to slightly above normal throughout most of North America. I’d tend to treat our naïve model’s prediction of 4.64 RPG as a floor and last week’s output of 4.91 RPG as a ceiling, which puts us somewhere in the range of 4.75-4.80 runs per game for the rest of the season, a mere tick down from last year. Reports of offense’s death have been greatly exaggerated, and if you’re playing in a fantasy league, now is a great time to make a move for some cold hitters before the weather heats up.