While looking toward the future with our comprehensive slate of current content, we'd also like to recognize our rich past by drawing upon our extensive online archive of work dating back to 1997. In an effort to highlight the best of what's gone before, we'll be bringing you a weekly blast from BP's past, introducing or re-introducing you to some of the most informative and entertaining authors who have passed through our virtual halls. If you have fond recollections of a BP piece that you'd like to nominate for re-exposure to a wider audience, send us your suggestion.
Now that BABIP has long since hit the mainstream, join us in flashing back to the day when Voros changed how we thought about pitching and defense, just over ten years after his landmark article originally ran on January 23, 2001.
"You're insane." That's generally the response I get when I present the information you're about to read. I've been accused of being the "epitome of 'pseudo-stat fan' gibberish." I've even been accused of being Aaron Sele writing under a pseudonym. I'm not entirely sure why my little way of doing things stirs the emotions of people to such a large extent, but apparently it does.
My belief? Well, simply that hits allowed are not a particularly meaningful statistic in the evaluation of pitchers.
Now before you accuse me of being Aaron Sele, please bear with me for a few paragraphs as I explain how I reached this point, and where it led from there.
One of the basic issues in evaluating pitchers is to what extent the defense behind them is responsible for the results. In fact, in Baseball Prospectus 2000, one of Keith Woolner's "Hilbert Problems" for baseball was the issue of separating defense and pitching. As he put it, "Pitching and fielding are so intertwined that they seem impossible to separate."
Around the end of the 1999 season, I started to think about that problem. My plan was to divide a pitcher's stat line into what the defense can't affect and what it possibly can:
Defense can't affect: Walks, Strikeouts, Home Runs (essentially), Hit Batsmen, Intentional Walks.
Defense can affect: Wins, Losses, Innings, Runs, Earned Runs, Hits Allowed, Sacrifice Hits, Sacrifice Flies.
The idea was to express the things the defense can't affect in one area and check the results, then check those areas where it's possible the defense can have an effect and analyze how much of the performance is pitching and how much is defense.
The first thing I did was create something called "Defense Independent Pitching Stats." DIPS are the representation of a pitcher's stat line without any possible influence from the defense behind the pitcher. I calculated the various rates for walks, strikeouts, home runs, hit batsmen, etc. as a function of batters faced, and inserted them into the pitcher's line. Then I calculated how many batters faced were remaining, and assigned league-average rates for all of the other component stats: innings, hits, doubles, triples, etc. So for all the stats the defense could possibly affect, every pitcher was now on equal footing. The results, using Dave Burba's line in 2000, looked something like this:
As you can see, the home runs, walks, and strikeouts changed little (they changed at all only due to park effects and a few other minor factors). But hits and innings pitched changed by a decent amount, at least in this case.
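The substitution described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: it handles hits alone (not innings, doubles, or triples), skips the park adjustments mentioned above, and the names `PitcherLine` and `dips_hits` as well as every number in the example are hypothetical, not taken from any real pitcher's line.

```python
from dataclasses import dataclass


@dataclass
class PitcherLine:
    """A pared-down stat line (hypothetical structure for illustration)."""
    bfp: int   # batters faced
    bb: int    # walks
    so: int    # strikeouts
    hr: int    # home runs
    hbp: int   # hit batsmen
    hits: int  # actual hits allowed, home runs included


def dips_hits(line: PitcherLine, league_hits_per_bip: float) -> float:
    """Hits allowed if balls in play fell in at the league-average rate.

    The pitcher keeps his own walks, strikeouts, home runs, and hit
    batsmen; only the batters who put the ball in play are re-assigned.
    """
    balls_in_play = line.bfp - line.bb - line.so - line.hr - line.hbp
    return line.hr + league_hits_per_bip * balls_in_play


# Made-up season and an assumed league rate of .300 on balls in play.
example = PitcherLine(bfp=900, bb=80, so=180, hr=20, hbp=5, hits=200)
print("actual hits:", example.hits)
print("defense-independent hits:", dips_hits(example, 0.300))
```

The defense-independent categories pass through untouched, which is why they barely move between the two lines; only the ball-in-play results are replaced.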
The next step was to look at the rest of a pitcher's stat line and somehow divine how much of it was the result of the pitcher's work. To do this, I looked at the range of values for Defense Independent ERA and compared how close they were to the range of values of actual ERA. For example, if the range of Defense Independent ERA was between 4.00 and 5.00, it would be a good indication that there's a lot about pitching not covered in the stat, because ERAs have a much larger range than that.
That didn't happen. The range was virtually the same as actual ERA, with the best pitchers having DIPS ERAs near 2.40 and the worst having DIPS ERAs up near 7.00. I found this surprising, as I expected the range to narrow quite a bit more than that.
Then, I looked at the behavior of Hits Per Balls in Play [(H-HR)/(BFP-HR-BB-SO-HB)]. That's where the trouble really started. I swear to you that I did everything within my power to come to a different conclusion than the one I did. I ran every test, checked every stat, divided this by that and multiplied one thing by another. Whatever I did, it kept leading back to the same conclusion:
There is little if any difference among major-league pitchers in their ability to prevent hits on balls hit in the field of play.
It is a controversial statement, one that counters a significant portion of 110 years of pitcher evaluation. Let's go over the facts that led me to this conclusion:
1. As we discussed, the range of ERAs for pitchers is almost as large without defense-dependent statistics as it is with them. This speaks to the fact that there can be massive differences in the ability of pitchers before even considering the impact of defense.
2. The pitchers who are the best at preventing hits on balls in play one year are often the worst at it the next. In 1998, Greg Maddux had one of the best rates in baseball, then in 1999 he had one of the worst. In 2000, he had one of the better ones again. In 1999, Pedro Martinez had one of the worst; in 2000, he had the best. This happens a lot.
3. There is little correlation between what a pitcher does one year in the stat and what he will do the next. In other words, what Eric Milton's hits per balls in play was in 2000 tells us next to nothing about what it will be in 2001. This is not true in the other significant stats (walks, strikeouts, home runs). Walks and strikeouts correlate very well and homers correlate somewhat well.
This is a crucial fact. One of the more critical aspects of statistical analysis is determining how well a statistic reflects an ability. It's the test given to clutch hitting, catcher game-calling, pitcher won/loss records, and so on. One of the first things asked when addressing this is "Does the stat correlate well with itself from year to year?" One reason clutch hitting is questioned is that the "clutch hitters" change from year to year, which indicates that it probably isn't the hitter as much as it's other factors. The answer to whether hits per balls in play correlates well from year to year is a fairly solid "no."
4. You can better predict a pitcher's hits per balls in play from the rate of the rest of the pitcher's team than from the pitcher's own rate. This is pretty self-explanatory. The effects of having the same team defense and home park appear to be significant determinants in creating what little correlation there is in the stat.
5. Take pitchers with similar stats in every other component category (and other peripheral factors like age, throwing hand, team hits per balls in play rates, etc.) but large differences in hits allowed (and therefore in innings pitched). When you group the pitchers into two categories, high-hits and low-hits, the following year the high-hits pitchers do not give up significantly more hits per balls in play (.292 to .291) than the low-hits pitchers, and the groups have identical ERAs.
This is a difficult point to overcome if you want to show that preventing hits per balls in play is a significant ability of pitchers. If, when all other things are equal, there is no difference, the conclusion becomes clearer.
6. Similarly, if you take pitchers with comparable stats in every other component category, but with as large a difference as possible in strikeouts, then separate the pitchers into high-strikeout and low-strikeout categories, the high-strikeout pitchers continue to strike out more hitters, while also giving up far fewer hits and having significantly lower ERAs.
This is the natural opposite of the fifth point. If number five is true, then logically number six ought to be true as well. It is.
7. The range of career rates of hits per balls in play for pitchers with a significant number of innings is about the same as the range you would expect from random chance. This is true even though we know that some pitchers may have had consistent advantages over others, as these rates are unadjusted for park or league. The vast majority of pitchers who have pitched significant innings have career rates between .280 and .290.
8. When you adjust for environmental advantages (the DH, park effects, and so on) the range becomes even smaller. The leaders in this stat (Pete Harnisch) have had significant environmental advantages while most of the trailers (yup, Aaron Sele) have had disadvantages. After these adjustments, the range is well within the realm you could expect from chance alone.
9. A stat like Component ERA (or any similar stat that calculates ERA from the rest of a pitcher's performance), while correlating better with next-year ERA than ERA itself, does not correlate nearly as well with next-year ERA as the same calculation does when you substitute the average hits-allowed rate of the team for which he pitched. This advantage of the "team average" rate grows to rather large proportions as the number of innings pitched in the season shrinks.
Two key points here: one, there doesn't appear to be any "hidden quality" aspect to the stat. The numbers come out as they should if the above are all true: you can better predict ERA without hits allowed than you can with them. The other key point is that using a reliever's hit rate seems to be an extremely suspect way of evaluating relievers. One of my favorite examples of this is Bobby Ayala in 1998 and 1999.
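For readers who want to try the checks above on their own data, here is a minimal sketch. `hits_per_bip` follows the formula given earlier, (H-HR)/(BFP-HR-BB-SO-HB). The year-to-year test is written as a plain Pearson correlation, and the "random chance" spread is approximated with a simple binomial model; both of those choices are assumptions made for this illustration, since the article does not spell out the exact tests used. All function names are hypothetical.

```python
from math import sqrt


def hits_per_bip(h, hr, bfp, bb, so, hb):
    """Hits per ball in play: (H - HR) / (BFP - HR - BB - SO - HB)."""
    return (h - hr) / (bfp - hr - bb - so - hb)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of rates.

    xs holds each pitcher's rate in one season and ys the same
    pitchers' rates in the next, in the same order.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def chance_sd(rate, balls_in_play):
    """Standard deviation of an observed rate if every ball in play
    were an independent trial with the same hit probability
    (a binomial assumption, not necessarily the author's model)."""
    return sqrt(rate * (1 - rate) / balls_in_play)
```

A correlation near zero between consecutive seasons is what the third point describes, and comparing the observed spread of career rates against `chance_sd` is one way to approach the seventh.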
There are a few lesser and somewhat anecdotal points to be made that, while not critical, are nonetheless good concepts to understand:
- People have a hard time identifying which pitchers are very good at preventing hits on balls in play. You'll often hear people use names like Randy Johnson, Jamie Moyer and Andy Pettitte in protest of the concept, but by any definition you want to use, these guys are not particularly good in the stat.
- Pitchers like Pedro Martinez and Greg Maddux have, at times, expressed thoughts on the matter. Martinez has been quoted as saying that the batter determines what happens once he hits the ball. Maddux described his scoreless-inning streak last year as "mostly luck," as hard-hit balls that had been falling in were being caught.
- We only have 38 innings' worth of non-pitchers' pitching (like Brent Mayne). That's too small a sample on which to draw conclusions, but it is something to think about that these non-pitchers were not any worse than regular pitchers in the stat. In fact, they were a good bit better.
- Pitchers are often dubbed "unpredictable," and hits allowed is by far the most unpredictable of the component stats. In other words, it is one of the main culprits of pitcher unpredictability.
- There is no significant cross-correlation. That is, a high number of home runs allowed doesn't really mean anything in determining how many hits per balls in play the pitcher will allow. The closest is an inverse relationship with strikeouts (lots of strikeouts means fewer hits per balls in play) but that relationship is very weak and could be the result of unrelated factors. There was no significant hits-per-balls-in-play advantage found in the strikeout study above.
Many people, after reading these points, think I'm saying that all pitchers give up the same number of hits. That's not true, and of course it's not what I'm saying. Randy Johnson gives up fewer hits than Scott Karl. That's not because batters hit the ball harder off Karl than Johnson, but because they hit the ball more often off Karl than Johnson.
Aside from walks, there are two basic outcomes for a pitcher: batter hits the ball or batter strikes out. With the latter, the result is almost always an out. With the former, all sorts of things can happen, including a base hit.
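The arithmetic behind that distinction can be sketched with two made-up pitchers who allow the same rate of hits on balls in play but strike out batters at different rates. Every number below is invented for illustration, hit batsmen are ignored for simplicity, and `expected_hits` is a hypothetical helper, not a published formula.

```python
def expected_hits(batters_faced, so_rate, bb_rate, hr_rate, hits_per_bip):
    """Expected hits allowed, given per-batter rates.

    Batters who neither strike out, walk, nor homer put the ball in
    play, and a fixed share of those balls in play fall for hits.
    """
    balls_in_play = batters_faced * (1 - so_rate - bb_rate - hr_rate)
    home_runs = batters_faced * hr_rate
    return home_runs + balls_in_play * hits_per_bip


# Same walk rate, home run rate, and rate on balls in play;
# only the strikeout rate differs (all values are hypothetical).
high_strikeout = expected_hits(1000, 0.30, 0.08, 0.02, 0.300)
low_strikeout = expected_hits(1000, 0.12, 0.08, 0.02, 0.300)
print(high_strikeout, low_strikeout)
```

With identical rates on balls in play, the low-strikeout pitcher still allows more hits simply because more batters put the ball in play against him.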
So why is this all true? All I can advance are theories, some that can be checked out and some that are more difficult to verify. I'll end this article with a list of some of the more popular ones:
Scouting. The MLB scouting network is set up to sift through an enormous pool of potential players to get to the group that might be MLB pitchers. To do this, they often employ tactics that many might call unfair in an effort to reduce the pool to a manageable number. So they don't take guys under 5'10" and every pitcher has to throw a certain speed fastball and so on. One of these factors may be weeding out a subset of pitchers for which the theory is not true.
High talent level. This theory is that there's a certain limit as to how good you can get at preventing hits on balls in play, and that in order to even come close to the major leagues you have to have reached this. This theory often comes up in clutch-hitting discussions.
Too many variables. This suggests that the ability may or may not exist, but that the variables involved in the outcome of balls in play are so numerous and so difficult to control for that any ability gets lost. In other words, the noise completely masks any signal.
A misunderstanding of how the batter/pitcher dynamic works. Some people will argue that despite all the numbers, the above can't be true because it means that a screaming line drive hit into the right-center-field gap is as likely to be an out as a pop-up to the shortstop.
This point deserves further discussion. One of the critical points of misunderstanding is the issue of "blame." When a ball gets crushed into the gap in right-center, some think I'm saying that the defense deserves the blame, not the pitcher. When I counter with "Neither the pitcher nor defense is to blame, it's the batter who is to blame," I lose some people. Consider this example:
When I was a kid, we used to go to the cemetery (this was our playground) and play a game called Lob-League. The makeup of this game was mostly offense and some fielding, with little to no pitching effects. The pitcher's job was to lob the ball over the heart of the plate and let the batter hit it as hard as he wanted.
Now, let's suppose we're playing Lob-League and the pitcher lobs one right in the batter's wheelhouse, but the batter pops it up to the shortstop. Who deserves credit for the pop-up? The blame argument would indicate that the pitcher deserves credit for inducing a pop-up despite the fact that all he did was lob the ball over the plate. No credit or blame would belong to the batter who popped up the pitch.
A more relevant MLB example might be the Home Run Derby at the All-Star festivities. I encourage you to watch next year's contest, or, if you have it, a videotape of past contests. Watch for batted balls that would clearly be outs. The pitcher is trying to give up home runs, so does he deserve credit for a pop-up?
In MLB, a pitch could result in a pop-up or a line drive. It all depends on what the batter does with it. I think the conventional wisdom on the dynamic between pitcher and batter may be slightly inaccurate.
The critical thing to understand is that major-league pitchers don't appear to have the ability to prevent hits on balls in play. There are many possible reasons why this is the case, and I don't really have a concrete idea as to why it is.
But the one thing I do know is that it is the case.