**Intro**

Voros McCracken’s article here on the BP website has generated a tremendous amount of attention, including at least two columns by Rob Neyer on his ESPN.com gig. Voros’s research, as presented in the article, contains the remarkable result that “major-league pitchers don’t appear to have the ability to prevent hits on balls in play.”

In other words, when you remove defense-independent outcomes such as strikeouts, home runs, and walks from the batters a pitcher faces, the resulting batting average on “balls in play” (those handled by the defense) is not affected by the pitcher himself — **Pedro Martinez** is the same as **Jose Lima**.

Extraordinary claims require extraordinary proof, and Voros has done an exemplary job setting forth evidence for his conclusions. In the interest of putting his results under the scrutiny of peer review, I’ve been doing some work inspired by Voros’s article to see if it was possible to poke holes in it.

**Sample Sizes and Statistical Noise**

I started by considering the amount of statistical noise present in measuring the ball in play average over the course of a season.

Consider a pitcher with the following typical seasonal pitching line:

| GS | IP | H | HR | BB | SO |
|---|---|---|---|---|---|
| 33 | 200 | 200 | 24 | 78 | 122 |

Let’s estimate the balls-in-play average for this pitcher:

Total Batters Faced ≈ 3 × IP + H + BB = 3 × 200 + 200 + 78 = 878

Defense-Independent Outcomes = HR + BB + SO = 24 + 78 + 122 = 224

Non-HR Hits Allowed = H - HR = 200 - 24 = 176

Balls In Play = TBF - DIO = 878 - 224 = 654

Ball-in-play Average = Non-HR Hits / Balls In Play = 176 / 654 = .269
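The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the estimate as described: the function name `bip_average` is hypothetical, and the batters-faced figure is the same 3 × IP + H + BB approximation used in the text, not an actual count.

```python
def bip_average(ip, h, hr, bb, so):
    """Estimate the ball-in-play average from a basic pitching line."""
    # Approximate batters faced: three outs per inning, plus hits and walks.
    tbf = 3 * ip + h + bb
    # Outcomes the defense never touches: home runs, walks, strikeouts.
    dio = hr + bb + so
    non_hr_hits = h - hr
    balls_in_play = tbf - dio
    return non_hr_hits, balls_in_play, non_hr_hits / balls_in_play


hits, bip, avg = bip_average(ip=200, h=200, hr=24, bb=78, so=122)
print(hits, bip, f"{avg:.3f}")
```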

Let’s assume that .269 represents the pitcher’s (or the league’s) true ability to prevent hits on balls in play. How much variation around this number should we expect to see over the course of a typical season?

We look to a statistical concept called standard deviation to help us quantify this range. We can treat each ball in play as a trial with two possible outcomes: 1 if it is a hit, 0 if not. The probability of a hit is the ball-in-play average of .269, so the number of hits over a season follows a binomial distribution.

Though we expect to see 176 hits over the 654 balls-in-play, statistical theory allows us to calculate the variance and standard deviation associated with this expectation. In particular:

Variance = (# trials) × (probability of a hit) × (probability of an out) = 654 × .269 × (1 - .269) = 654 × .269 × .731 = 128.602

Standard Deviation = SQRT(Variance) = 11.34 hits

We expect to see a standard deviation of about +/- 11 hits over the course of a season, simply by the statistical noise of a sample size of 654 balls-in-play. What effect does this have on the observed ball-in-play averages?

If you look at hundreds of pitchers whose true ball-in-play average is .269, about two-thirds of their seasons will fall within a single standard deviation of that figure, and about 95% will fall within two standard deviations.

(176 - 11.34) / 654 = .252

(176 + 11.34) / 654 = .286

One standard deviation yields a range of [.252, .286]

(176 - 2 × 11.34) / 654 = .234

(176 + 2 × 11.34) / 654 = .304

Two standard deviations yields a range of [.234, .304]
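For readers who want to try other sample sizes, the binomial calculation can be sketched as follows. The helper name `noise_band` is hypothetical, and the sketch uses the unrounded rate 176/654 rather than .269, so the standard deviation may differ from the figures above in the third decimal place.

```python
import math


def noise_band(hits, balls_in_play, num_sd=1):
    """Return (SD in hits, low average, high average) for a binomial sample."""
    p = hits / balls_in_play
    # Binomial variance is n * p * (1 - p); the SD is its square root.
    sd_hits = math.sqrt(balls_in_play * p * (1 - p))
    low = (hits - num_sd * sd_hits) / balls_in_play
    high = (hits + num_sd * sd_hits) / balls_in_play
    return sd_hits, low, high


sd, low1, high1 = noise_band(176, 654, num_sd=1)
_, low2, high2 = noise_band(176, 654, num_sd=2)
print(f"SD = {sd:.2f} hits")
print(f"1 SD: [{low1:.3f}, {high1:.3f}]")
print(f"2 SD: [{low2:.3f}, {high2:.3f}]")
```

Because the standard deviation in hits grows with the square root of the number of balls in play, the band expressed as a batting average narrows as the sample grows, which is the motivation for looking at multi-season totals.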

Simple statistical noise coming from a single season’s worth of data will obscure true variations between pitchers over a pretty wide range of possible abilities. Failure to detect year-to-year consistency in ball-in-play average (as Voros has found) could mean either that there are no differences in ability (as he concluded) or that the range of abilities is smaller than the statistical noise, making detection difficult.
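A small simulation can illustrate why the two possibilities are hard to tell apart from single seasons. This is a sketch under stated assumptions, not a reproduction of any analysis in this article: the 10-point spread in true ability (`ability_sd=0.010`), the 1000 simulated pitchers, and all function names are hypothetical choices made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate_season(rng, true_avg, balls_in_play):
    """Observed ball-in-play average for one simulated season."""
    hits = sum(rng.random() < true_avg for _ in range(balls_in_play))
    return hits / balls_in_play


def year_to_year_correlation(ability_sd, n_pitchers=1000, bip=654,
                             mean_avg=0.269, seed=0):
    """Correlate two simulated seasons for pitchers with a given ability spread."""
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_pitchers):
        # Assumption: true abilities are normally distributed around the mean.
        true_avg = rng.gauss(mean_avg, ability_sd)
        year1.append(simulate_season(rng, true_avg, bip))
        year2.append(simulate_season(rng, true_avg, bip))
    return pearson(year1, year2)


r_no_skill = year_to_year_correlation(ability_sd=0.0)
r_small_skill = year_to_year_correlation(ability_sd=0.010)
print(f"no ability differences:    r = {r_no_skill:.2f}")
print(f"10-point spread in ability: r = {r_small_skill:.2f}")
```

Under these assumptions the year-to-year correlation is near zero when there is no ability at all, and still weak when a modest real spread exists, because the per-season noise variance is several times larger than the assumed ability variance.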

**Characteristics of Ball-in-Play Batting Average over a Career**

To see which of these possibilities is true, we can look at pitcher performance over multiple seasons to reduce the amount of statistical noise.

I looked at play-by-play data for all pitchers between 1979 and 1999, and computed actual batters faced, hits, home runs, strikeouts, and walks for each season. I wanted to divide each pitcher’s entire career into two halves, and see whether the ball-in-play hit rate from one half was correlated at all with that in the other half.

I could have simply taken the first 50% of a pitcher’s career and compared it to the latter 50%. However, if such an ability exists, it could improve or decline as a pitcher’s career evolves. Age, maturity, learning how to pitch, velocity, etc. could all change the relationship between the first and second halves of a pitcher’s career. Therefore, I chose to divide each career into even and odd halves, according to the year of each season. Thus a pitcher’s 1990 season was in the “even” group, 1991 in the “odd” group, 1992 in the “even” group, and so on. By mixing seasons from throughout a pitcher’s career into both groups, I could minimize the chance that changes in ability with age would confound the analysis.
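The even/odd split can be sketched as below. The season tuples are made-up numbers used only to show the grouping, and the function name `split_even_odd` is hypothetical.

```python
def split_even_odd(seasons):
    """Total non-HR hits and balls in play by even-year and odd-year seasons.

    seasons: iterable of (year, non_hr_hits, balls_in_play) tuples.
    """
    totals = {"even": {"hits": 0, "bip": 0}, "odd": {"hits": 0, "bip": 0}}
    for year, hits, bip in seasons:
        half = "even" if year % 2 == 0 else "odd"
        totals[half]["hits"] += hits
        totals[half]["bip"] += bip
    return totals


# Hypothetical seasons for one pitcher (not real data).
seasons = [(1990, 170, 640), (1991, 180, 660), (1992, 165, 630), (1993, 175, 650)]
halves = split_even_odd(seasons)
for half, t in halves.items():
    print(half, t["hits"], t["bip"], f"{t['hits'] / t['bip']:.3f}")
```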

The next thing I wanted to do was control for the overall quality of the pitcher’s team defense. A pitcher who spent his entire career with a poor defensive team may show a spurious correlation between portions of his career, simply because the team was allowing more hits than the league as a whole. To control for this, I expressed a pitcher’s ball-in-play average as a percentage of his team’s total defensive ball-in-play average (similar to Bill James’s defensive efficiency). Thus, a pitcher with a ball-in-play average of .270 on a team whose total ball-in-play average was .280 would have his ball-in-play average expressed as the ratio .270/.280, or about 0.96. This is similar to how Total Baseball expresses OPS as Adjusted Production (PRO+), where a value of 110 represents 10% above the league average and a value of 80 is 20% below league average.
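As a rough sketch, the team adjustment might look like the following. The single-season ratio follows the text directly. How the ratios were combined across seasons is not spelled out here, so `half_career_ratio` makes an assumption: it weights each season by the pitcher's balls in play, comparing actual hits with the hits his team's average would predict. Both function names are hypothetical.

```python
def relative_bip(pitcher_avg, team_avg):
    """Pitcher's ball-in-play average as a ratio of his team's average."""
    return pitcher_avg / team_avg


def half_career_ratio(seasons):
    """Combine several seasons into one team-adjusted ratio.

    seasons: iterable of (pitcher_non_hr_hits, pitcher_bip, team_bip_avg).
    Assumption: each season is weighted by the pitcher's balls in play.
    """
    actual_hits = sum(hits for hits, _, _ in seasons)
    expected_hits = sum(bip * team_avg for _, bip, team_avg in seasons)
    return actual_hits / expected_hits


ratio = relative_bip(0.270, 0.280)
print(f"{ratio:.3f} ({(ratio - 1) * 100:+.1f}%)")

# Hypothetical two-season half-career (not real data).
combined = half_career_ratio([(135, 500, 0.280), (140, 500, 0.280)])
print(f"{combined:.3f}")
```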

Having divided a pitcher’s career into two halves (using even/odd seasons), and controlling for his team’s defense, I totaled each pitcher’s half-career ball-in-play ratio.

For example:

| PITCHER | EVEN_AVG | EVEN_RATIO | EVEN_BIP | ODD_AVG | ODD_RATIO | ODD_BIP |
|---|---|---|---|---|---|---|
| Morris, Jack | .268 | -0.9% | 5945 | .268 | -0.6% | 5420 |
| Martinez, Dennis | .266 | -3.0% | 5410 | .273 | -0.2% | 6166 |
| Viola, Frank | .280 | -1.7% | 4450 | .287 | +1.5% | 4481 |
| Saberhagen, Bret | .288 | +1.6% | 3670 | .270 | -6.3% | 4296 |

etc.

I looked at all pitchers that had at least 3000 balls in play in both the even and odd halves of their career — roughly 5 full seasons in each half (thus at least a 10 year career). I then looked at the correlation between the ratios of their ball-in-play averages to their team’s average. A perfect correlation of +1.00 would indicate a totally predictable linear relationship between ball-in-play rates in the two halves of pitchers’ careers, whereas a value of 0.00 would indicate that there’s absolutely no statistical relationship. Values in between would indicate the degree to which knowing the value in one half would help you predict the value in the other half.

The correlation among the 70 pitchers who met the threshold was +0.53. Not a perfect correlation, but well above the level at which one could claim that there’s no relationship. Some level of innate ability seems to be appearing in our sample.
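The correlation step can be sketched as follows. The pitcher records below are invented purely to exercise the code and will not reproduce the +0.53 figure; the dictionary keys and function names are hypothetical, and only the 3000 balls-in-play threshold comes from the text.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def half_career_correlation(pitchers, min_bip=3000):
    """Correlate even-half and odd-half ratios for qualifying pitchers."""
    qualified = [p for p in pitchers
                 if p["even_bip"] >= min_bip and p["odd_bip"] >= min_bip]
    evens = [p["even_ratio"] for p in qualified]
    odds = [p["odd_ratio"] for p in qualified]
    return len(qualified), pearson(evens, odds)


# Hypothetical pitchers (not real data); the last one misses the threshold.
pitchers = [
    {"even_ratio": 0.96, "even_bip": 4000, "odd_ratio": 0.97, "odd_bip": 4100},
    {"even_ratio": 0.98, "even_bip": 3500, "odd_ratio": 0.99, "odd_bip": 3600},
    {"even_ratio": 1.00, "even_bip": 5000, "odd_ratio": 0.99, "odd_bip": 4800},
    {"even_ratio": 1.02, "even_bip": 3200, "odd_ratio": 1.03, "odd_bip": 3300},
    {"even_ratio": 1.10, "even_bip": 1200, "odd_ratio": 0.90, "odd_bip": 1500},
]
n, r = half_career_correlation(pitchers)
print(n, f"{r:+.2f}")
```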

We can further test the theory that all differences are due to chance by looking at the distribution of observed ball-in-play averages for the pitchers in the sample. Comparing the distribution of measured ball-in-play averages with what we would expect under the null hypothesis that all observed differences are due to chance, we find the following results:

- 40% exceed 1 standard deviation away from the mean (versus ~32% predicted by chance)
- 24% exceed 2 standard deviations away from the mean (versus ~5% predicted by chance)

This shows us that a significant number of pitchers are further away from the mean than we’d expect from simple randomness. With the relatively high correlation, and the number of pitchers falling outside the range attributable to chance, the evidence is starting to show that, at the very least, pitchers with long careers in the majors do have distinct abilities to prevent balls in play from becoming hits.
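One way such a comparison could be carried out is sketched below. The exact procedure behind the 40% and 24% figures is not described in this article, so this is an assumption: each pitcher's hit total is converted to a z-score using the binomial standard deviation for his number of balls in play, and the share of pitchers beyond 1 and 2 standard deviations is compared with the normal-curve expectation. The names and the sample data are hypothetical.

```python
import math


def chance_tail(k):
    """Probability that a standard normal value lies more than k SDs from 0."""
    return math.erfc(k / math.sqrt(2))


def z_score(hits, bip, baseline_avg):
    """How many binomial SDs a hit total is from the baseline expectation."""
    expected = bip * baseline_avg
    sd = math.sqrt(bip * baseline_avg * (1 - baseline_avg))
    return (hits - expected) / sd


def tail_fractions(pitchers, baseline_avg):
    """Share of pitchers more than 1 and more than 2 SDs from the baseline."""
    zs = [abs(z_score(hits, bip, baseline_avg)) for hits, bip in pitchers]
    n = len(zs)
    return sum(z > 1 for z in zs) / n, sum(z > 2 for z in zs) / n


# Hypothetical career totals as (non-HR hits, balls in play); not real data.
sample = [(2800, 10000), (2850, 10000), (2700, 10000), (2810, 10000)]
beyond1, beyond2 = tail_fractions(sample, baseline_avg=0.280)
print(f"observed: {beyond1:.0%} beyond 1 SD, {beyond2:.0%} beyond 2 SD")
print(f"chance:   {chance_tail(1):.0%} beyond 1 SD, {chance_tail(2):.0%} beyond 2 SD")
```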

We can see this more clearly by looking at a chart plotting even-year rates vs. odd-year rates:

There is a distinct linear trend showing a relationship between the distinct halves of a pitcher’s career. Though there is clearly a lot of variance around that line, some level of ability to influence ball-in-play rate is demonstrated by the chart above.

Looking at the range of ball-in-play rates for the 70 pitchers in the sample, the span from best to worst is roughly .251 (**Charlie Hough**) to .303 (**John Burkett**). If we expand the data set to include pitchers with at least 1000 balls-in-play in both halves of their careers (a total of 338 pitchers), the range is .242 (**Mike Norris**) to .317 (**Aaron Sele**). It’s reasonable to estimate that the range in ability between the best and worst pitchers is around 50-60 points of batting average on balls in play.

Consequently, though the pitcher’s previous season may not be particularly useful in predicting future ball-in-play averages, using the pitcher’s career ball-in-play average may be more accurate, especially for pitchers who have several major league seasons under their belt (though additional research will be needed to demonstrate or refute this theory).

**Analysis & Conclusions**

How, then, do we reconcile Voros’s findings with the new results presented above? There are several alternatives, including:

- As suggested by the analysis of the standard deviation over a single season, one year’s worth of data may not be sufficient to discern a pitcher’s true level of ability with regard to ball-in-play average. It’s only when many years of data are examined that the trends become clearer.
- Another possibility is that while the majority of pitchers have no such ability to affect ball-in-play average, a few special pitchers do. Those whose ability allows them to systematically reduce their ball-in-play average below the overall league average would have a survival advantage, and are therefore more likely to be the pitchers whose careers are long enough to be detected in a 10-year sample. Slightly more than half (56%) of the longer-career 70-pitcher sample had ball-in-play rates below the median rate of the 338-pitcher sample that includes shorter careers. This lends some support to the notion that the long-career pitchers are somewhat better at reducing hits on balls in play than their more typical counterparts. However, the evidence for this theory is not overwhelming.
- Another line of reasoning might go as follows: When pitchers first arrive in the majors, they are still learning their craft, and in particular have not yet learned how to tailor their pitching to the defense behind them. Over time, playing with various teammates and parks over the years, a pitcher learns more about how to maximize the benefit they get from their defense. Pitchers who’ve mastered this skill get more effective defensive play behind them than less experienced pitchers on the same staff. This wisdom that comes with experience allows veteran pitchers to systematically do better with their defense, thus a skill related to preventing balls in play from becoming hits emerges.

Additional research and analysis will be needed to determine whether the second or third explanation (or another theory) is supported by the evidence. At the moment, the simplest explanation that is consistent with the facts should be favored, and thus the idea that the statistical noise in one year’s worth of data obscures a relatively small true difference among pitchers may be the most likely explanation.

None of the preceding should be taken as diminishing Voros’s work at all. Indeed, it is his pioneering work and startling conclusion that led to widespread interest and further analysis on the topic. Even if the conclusion is not quite as radical as saying pitchers have no influence on ball-in-play batting averages, the observation that differences among pitchers in batting average on balls in play are as small as 50 points (and require several years of data to discern) is a remarkable, counterintuitive result for which we gratefully acknowledge Mr. McCracken.