We, as baseball fans and researchers, are fortunate to live in an era where the value of gathering as much data as possible about games has been recognized. As baseball data collection has become more thorough, and has now been done for many years, we’re able to investigate questions that would have been unthinkable (or at least unanswerable) a couple of decades ago. For example, Baseball Prospectus now has a nearly complete database of the pitches and ultimate result from each plate appearance in every game from 1988-2005, and while this is not a huge segment of baseball history, it may be long enough to tease out some recent trends, if any exist.

I was recently wondering about how the game has changed, even over that relatively short span of time. One of the things you commonly hear about “the old days” is how pitchers threw so many more innings, presumably because they were tougher, less coddled, or generated more testosterone. From a sabermetric perspective, the arguments against that usually include several points: selective memory focuses on the exceptional pitchers, not the dozens or hundreds who were forced to quit early due to a dead arm; pitchers nowadays have to expend a full effort on every pitch, unlike in Christy Mathewson’s day, when pitchers could coast until crucial moments of the game; and even the greats of a generation ago were pitching significantly fewer innings than their predecessors, a trend that has been almost constant since the inception of professional baseball. While there is truth in all of these arguments, I was interested in a different tack. What if plate appearances themselves require more pitches in the modern game than in years past? If batters have gotten more selective, or better at fouling off pitches, then 30 batters faced in 1950 may have required less work from a pitcher than 30 batters faced in 2005 would have.

While we don’t have pitch count data back to 1950 to examine, I wondered if a trend existed even in the 18-year span for which we do have data. Has there been an increase in the pitch length of a PA? That’s easy enough to check.

chart 1

The chart shows that around 1998, the average number of pitches per PA rose sharply, and it has stayed at that higher level ever since. And with Barry Bonds being hurt for most of 2005, we can’t just write the whole change off to him.

Of course, there are some complicating factors. Offense rose dramatically between the start and end of our time period. Perhaps rising offense levels cause longer plate appearances. Also, strikeout and walk rates have been rising: from 22.8% of all PA in 1988 to as high as 26.1% in 2000. Strikeouts and walks have higher average pitch lengths than any other offensive outcome. Has the rise in the inability to make contact over time caused the overall increase in PA length?

The second concern is easier to deal with. Rather than focus on all plate appearances, we’ll look at the length of a plate appearance by its final outcome. We can look separately at strikeouts, unintentional walks, hits, non-strikeout outs (outs in play), and hit-by-pitch PAs.
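As a minimal sketch of that breakdown, assume each plate appearance is available as a (year, outcome, pitches) record. That layout is hypothetical, not the actual Baseball Prospectus database schema, and the sample rows are invented purely to show the arithmetic.

```python
from collections import defaultdict

# Hypothetical records: (year, outcome, pitches thrown in the PA).
# Invented sample rows for illustration only, not real data.
sample_pas = [
    (1988, "SO", 5),
    (1988, "SO", 4),
    (1988, "H", 3),
    (1988, "NSO", 2),
    (2005, "SO", 5),
    (2005, "UBB", 6),
    (2005, "H", 4),
    (2005, "H", 3),
]


def pitches_per_pa_by_outcome(pas):
    """Return {(year, outcome): average pitches per PA}."""
    totals = defaultdict(lambda: [0, 0])  # key -> [pitch total, PA count]
    for year, outcome, pitches in pas:
        totals[(year, outcome)][0] += pitches
        totals[(year, outcome)][1] += 1
    return {key: pitch_sum / n for key, (pitch_sum, n) in totals.items()}


averages = pitches_per_pa_by_outcome(sample_pas)
for (year, outcome), avg in sorted(averages.items()):
    print(year, outcome, round(avg, 2))
```

Keying on the (year, outcome) pair gives one series per outcome type, so a shift in the mix of outcomes cannot by itself move any individual series.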

chart 2

Though the trend is subtle on the scale of this chart, the number of pitches per event has, in fact, risen for every individual event category. It isn’t just that more plate appearances are ending in strikeouts or walks. Even hits, outs in play, and HBP are averaging more pitches than they used to.

Another interesting observation is that the lines for hits and NSO (non-strikeout outs) are virtually identical. The number of pitches in a PA ending in a hit is the same as for a PA ending with a batted ball out. While initially surprising, it makes some sense. Given that the PA ends with the batter putting the ball into play (or over the fence), it just means that the relative likelihood of a hit versus an out doesn’t change dramatically as the number of pitches seen rises. Whether the ball ends up as a hit or out is determined after the pitch is thrown and put into play. Because of this, in later parts of this study, we’ll combine the two categories into a single “ball in play” category.

The chart as shown above may not be terribly convincing because we’re showing increases on the order of fractions of pitches while trying to plot values ranging from 2.5 to 6. The lines are going to look rather flat at that scale. However, if we normalize the series of values so that 1988 is 100% and subsequent years are expressed as a percentage of 1988, we can put all the events on the same scale.
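The normalization amounts to dividing each year’s value by the 1988 value and multiplying by 100. Here is a minimal sketch, using made-up pitches-per-PA figures (placeholders, not the values behind the charts) and a hypothetical helper name:

```python
def index_to_base_year(series, base_year=1988):
    """Express each year's value as a percentage of the base year's value."""
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in series.items()}


# Hypothetical pitches-per-PA values for one outcome type (illustrative only).
example_series = {1988: 4.0, 1993: 4.04, 2005: 4.16}

indexed = index_to_base_year(example_series)
for year in sorted(indexed):
    print(year, f"{indexed[year]:.1f}%")
```

Because every series starts at 100%, outcomes with very different raw pitch counts (a walk versus a ball in play, say) can share one axis.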

chart 3

This shows the phenomenon more clearly. Since 1993, every type of PA outcome has required more pitches than it did in 1988 (except for HBP, which joined the others two years later). For most outcomes, the number of pitches has risen 2-4% in 18 years. Strikeouts have been least affected, rising about 1.5% in pitches per SO, while balls in play have risen 4% or more. The relatively infrequent HBP has been even more volatile, rising from 0.8% lower in 1993 to 7.6% higher in 2005 (though 2005 looks like an aberration, and a 4% rise for HBP is more typical of the past decade).

Of course, offense levels jumped in 1993 and have stayed high since then, so the percentage chart doesn’t answer the question of whether the increasing PA length is due to higher run scoring or to an unrelated trend developing over the same time period. To answer that question we’ll need to turn away from charts, and towards one of the tools in our statistical toolbox.

Correlations have been used many times in baseball research, so we’ll just restate their purpose briefly for those new to the stats game. A correlation is a number that describes how closely connected two sets of numbers are. If a rise in one group tends to coincide with a rise in the other, the correlation is high (a number close to +1.00). If a rise in one group coincides with a decline in the other, the correlation is negative (a number close to -1.00). And if changes in one group don’t appear to be matched in either direction in the other group, then the values are considered to be uncorrelated (a number close to zero).
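For readers who want to see the arithmetic, here is a minimal sketch of the standard Pearson correlation coefficient. Which correlation measure was actually used for this study is not stated here, so treating it as Pearson is an assumption, and the small input lists are invented to illustrate the three cases just described.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and each series' sum of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


years = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]      # rises whenever years rises
falling = [10, 8, 6, 4, 2]     # falls whenever years rises
unrelated = [3, 1, 4, 1, 3]    # no linear relationship with years

print(pearson(years, rising))     # close to +1
print(pearson(years, falling))    # close to -1
print(pearson(years, unrelated))  # close to 0
```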

Correlations were computed between the number of pitches per PA and each of three candidate explanatory variables:

  1. Time, measured by the year, ranging from 1988 to 2005
  2. Offense, measured by RA or runs per 9 innings
  3. Noncontact rate, measured by (SO+BB)/PA
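The noncontact rate in item 3 is a simple ratio. A sketch with hypothetical season totals follows (invented round numbers, not actual league totals); whether BB here includes intentional walks is not specified, so the function simply takes whatever walk total it is given.

```python
def noncontact_rate(strikeouts, walks, plate_appearances):
    """Share of plate appearances ending in a strikeout or walk: (SO + BB) / PA."""
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    return (strikeouts + walks) / plate_appearances


# Hypothetical totals chosen only to make the arithmetic easy to follow.
rate = noncontact_rate(strikeouts=150, walks=90, plate_appearances=1000)
print(f"{rate:.1%}")
```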

Correlations were computed for each of the five types of PA outcomes we’ve been looking at: hits (H), unintentional walks (UBB), strikeouts (SO), non-strikeout outs (NSO), and hit-by-pitch (HBP). The table of results is presented below.

Correlation of # pitches per batting event with different measures

            H       HBP     NSO     SO      UBB
Time        0.8690  0.8869  0.9670  0.8597  0.8928
Offense     0.7281  0.5988  0.7283  0.8496  0.4876
Noncontact  0.7294  0.6949  0.7796  0.8631  0.6135

In four of the five categories, the strongest correlation for PA length was with time, and by a wide margin, indicating that the lengthening of plate appearances is not a direct result of the increased offense, or even of the higher strikeout and walk rates, but is a change in the way batters and pitchers are approaching their confrontation. Strikeouts were the one category where all three measures were relatively close in correlation (with noncontact rate marginally ahead of time), but it is also the category that has shown the lowest percentage increase in PA length. In the other four categories, the stronger relationship between time and increasing PA length is decisive.
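As a quick cross-check of the table, the sketch below copies the published correlations into a dictionary and reports, for each outcome, which measure correlates most strongly and how far time sits ahead of (or behind) the better of the other two. Only the numbers come from the table; the variable names are ours.

```python
# Correlations copied from the table above: measure -> {outcome: r}.
correlations = {
    "Time":       {"H": 0.8690, "HBP": 0.8869, "NSO": 0.9670, "SO": 0.8597, "UBB": 0.8928},
    "Offense":    {"H": 0.7281, "HBP": 0.5988, "NSO": 0.7283, "SO": 0.8496, "UBB": 0.4876},
    "Noncontact": {"H": 0.7294, "HBP": 0.6949, "NSO": 0.7796, "SO": 0.8631, "UBB": 0.6135},
}

strongest = {}    # outcome -> measure with the highest correlation
time_margin = {}  # outcome -> Time's r minus the best non-Time r
for outcome in ["H", "HBP", "NSO", "SO", "UBB"]:
    strongest[outcome] = max(correlations, key=lambda m: correlations[m][outcome])
    best_other = max(correlations[m][outcome] for m in correlations if m != "Time")
    time_margin[outcome] = correlations["Time"][outcome] - best_other
    print(outcome, strongest[outcome], f"{time_margin[outcome]:+.4f}")
```

Time leads by roughly 0.14 to 0.28 for hits, HBP, outs in play, and walks. For strikeouts the three measures sit within about 0.014 of one another, with noncontact rate ahead of time by 0.0034.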

In a future article, we’ll extend this analysis to cover some older pitch count data, and discuss how to build a year-dependent pitch count estimator based on the results.