
“You’re insane.” That’s generally the response I get when I
present the information you’re about to read. I’ve been accused of being
the “epitome of ‘pseudo-stat fan’ gibberish.” I’ve even been
accused of being Aaron Sele writing under a pseudonym. I’m not
entirely sure why my little way of doing things stirs the emotions of
people to such a large extent, but apparently it does.

My belief? Well, simply that hits allowed are not a particularly meaningful
statistic in the evaluation of pitchers.

Now before you accuse me of being Aaron Sele, please bear with me for a few
paragraphs as I explain how I reached this point, and where it led from there.

One of the basic issues in evaluating pitchers is to what extent the
defense behind them is responsible for the results. In fact, in Baseball
Prospectus 2000, one of Keith Woolner’s “Hilbert Problems”
for baseball was the issue of separating defense and pitching. As he put
it, “Pitching and fielding are so intertwined that they seem
impossible to separate.”

Around the end of the 1999 season, I started to think about that problem. My
plan was to go about dividing a pitcher’s stat line into what the defense
can’t affect and what it’s possible that it can:

Defense Independent:

Walks, Strikeouts, Home Runs (essentially), Hit Batsmen, Intentional Walks

Defense Dependent:

Wins, Losses, Innings, Runs, Earned Runs, Hits Allowed, Sacrifice
Hits, Sacrifice Flies.

Any stats derived from the defense-dependent ones, like OPS against or ERA,
would also be defense dependent.

The idea was to express the things the defense can’t affect in one area and
check the results, then check those areas where it’s possible the defense
can have an effect and analyze how much of the performance is pitching and
how much is defense.

The first thing I did was create something called “Defense Independent
Pitching Stats.” DIPS are the representation of a pitcher’s stat line
without any possible influence from the defense behind the pitcher. I
calculated the various rates for walks, strikeouts, home runs, hit batsmen,
etc. as a function of batters faced, and inserted them into the pitcher’s
line. Then I calculated how many batters faced were remaining, and assigned
league-average rates for all of the other component stats: innings, hits,
doubles, triples, etc. So for every stat the defense could possibly affect,
every pitcher was now on equal footing. The results, using Dave Burba’s
line in 2000, looked something like this:

                     BFP   IP    H  HR   ER   BB   SO   ERA
Actual               848  191  199  19   95   91  180  4.48
Defense Independent  848  195  185  18   89   93  179  4.13

As you can see, the home runs, walks, and strikeouts changed little (they
changed at all only due to park effects and a few other minor factors). But
hits and innings pitched changed by a decent amount, at least in this case.
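The substitution step described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual DIPS calculation: the names `PitcherLine` and `dips_hits_and_innings` are hypothetical, a single league-average hit rate on balls in play stands in for all of the league-average component rates, every ball in play that is not a hit counts as exactly one out, and park effects, errors, double plays, sacrifices, and the final ERA step are all left out.

```python
from dataclasses import dataclass


@dataclass
class PitcherLine:
    """The defense-independent part of a pitcher's line (hypothetical container)."""
    bfp: int  # batters faced
    bb: int   # walks, including intentional walks
    so: int   # strikeouts
    hr: int   # home runs allowed
    hbp: int  # hit batsmen


def dips_hits_and_innings(line: PitcherLine, league_hits_per_bip: float) -> tuple[float, float]:
    """Rebuild hits and innings from the pitcher's own BB/SO/HR/HBP.

    Assumption (simplification): every remaining batter faced puts the ball
    in play and gets the league-average hit rate; each non-hit ball in play
    is recorded as one out. No park, error, or double-play adjustments.
    """
    balls_in_play = line.bfp - line.bb - line.so - line.hr - line.hbp
    hits_on_bip = league_hits_per_bip * balls_in_play
    outs = line.so + (balls_in_play - hits_on_bip)
    hits = line.hr + hits_on_bip
    innings = outs / 3
    return hits, innings


# Hypothetical line, not a real pitcher's stats.
example = PitcherLine(bfp=800, bb=70, so=150, hr=20, hbp=10)
hits, innings = dips_hits_and_innings(example, league_hits_per_bip=0.29)
print(f"DIPS hits: {hits:.1f}, DIPS innings: {innings:.1f}")
```

Splitting the rebuilt hits into singles, doubles, and triples at league rates, as the text describes, would be a straightforward extension of the same idea.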

The next step was to look at the rest of a pitcher’s stat line and somehow
divine how much of it was the result of the pitcher’s work. To do this, I
looked at the range of values for Defense Independent ERA and compared how
close they were to the range of values of actual ERA. For example, if the
range of Defense Independent ERA was between 4.00 and 5.00, it would be a
good indication that there’s a lot about pitching not covered in the stat,
because ERAs have a much larger range than that.

That didn’t happen. The range was virtually the same as that of actual ERA, with
the best pitchers having DIPS ERAs near 2.40 and the worst having DIPS ERAs
up near 7.00. I found this surprising, as I expected the range to narrow
quite a bit more than that.

Then, I looked at the behavior of Hits Per Balls in Play
[(H-HR)/(BFP-HR-BB-SO-HB)]. That’s where the trouble really started. I
swear to you that I did everything within my power to come to a different
conclusion than the one I did. I ran every test, checked every stat,
divided this by that and multiplied one thing by another. Whatever I did,
it kept leading back to the same conclusion:

There is little if any difference among major-league pitchers in their
ability to prevent hits on balls hit in the field of play.
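The hits-per-balls-in-play formula above translates directly into code. This is a minimal sketch; the function name and the sample numbers are hypothetical. Home runs are subtracted from both the hits and the chances, since no fielder can play them.

```python
def hits_per_ball_in_play(h: int, hr: int, bfp: int, bb: int, so: int, hb: int) -> float:
    """(H - HR) / (BFP - HR - BB - SO - HB).

    Home runs come out of the numerator and the denominator because the
    defense never gets a chance to field them; walks, strikeouts, and hit
    batsmen come out of the denominator because no ball is put in play.
    """
    balls_in_play = bfp - hr - bb - so - hb
    if balls_in_play <= 0:
        raise ValueError("line contains no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical season line: 160 non-homer hits on 550 balls in play.
rate = hits_per_ball_in_play(h=180, hr=20, bfp=800, bb=70, so=150, hb=10)
print(f"{rate:.3f}")  # 0.291
```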

It is a controversial statement, one that counters a significant portion of
110 years of pitcher evaluation. Let’s go over the facts that led me to
this conclusion:

  1. As we discussed, the range of ERAs for pitchers is almost as large
    without defense-dependent statistics as it is with them. This speaks to the
    fact that there can be massive differences in the ability of pitchers
    before even considering the impact of defense.

  2. The pitchers who are the best at preventing hits on balls in play one
    year are often the worst at it the next. In 1998, Greg Maddux had
    one of the best rates in baseball, then in 1999 he had one of the worst. In
    2000, he had one of the better ones again. In 1999, Pedro Martinez
    had one of the worst; in 2000, he had the best. This happens a lot.

  3. There is little correlation between what a pitcher does in the stat one
    year and what he will do the next. In other words, Eric Milton’s hits per
    balls in play rate in 2000 tells us next to nothing about what it will be
    in 2001. This is not true of the other significant stats (walks,
    strikeouts, home runs): walks and strikeouts correlate very well from
    year to year, and homers correlate somewhat well.

    This is a crucial fact. One of the more critical aspects of statistical
    analysis is determining how well a statistic reflects an ability. It’s the
    test given to clutch hitting, catcher game-calling, pitcher won/loss
    records, and so on. One of the first things asked when addressing this is
    “Does the stat correlate well with itself from year to year?” One
    reason clutch hitting is questioned is that the “clutch hitters”
    change from year to year, which indicates that it probably isn’t the hitter
    as much as it’s other factors. The answer to whether hits per balls in play
    correlates well from year to year is a fairly solid “no.”

  4. You can better predict a pitcher’s hits per balls in play from the rate
    of the rest of the pitcher’s team than from the pitcher’s own rate. This is
    pretty self-explanatory. The effects of having the same team defense and
    home park appear to be significant determinants in creating what little
    correlation there is in the stat.

  5. Take pitchers with similar stats in every other component category (and
    in other peripheral factors like age, throwing hand, team hits per balls
    in play rates, etc.) but large differences in hits allowed (and therefore
    in innings pitched), and group them into two categories: high-hits and
    low-hits. The following year, the high-hits pitchers do not give up
    significantly more hits per balls in play than the low-hits pitchers
    (.292 to .291), and the two groups have identical ERAs.

    This is a difficult point to overcome if you want to
    show that preventing hits per balls in play is a significant ability of
    pitchers. If, when all other things are equal, there is no difference, the
    conclusion becomes clearer.

  6. Similarly, if you take pitchers with comparable stats in every other
    component category but with as large a difference in strikeouts as
    possible, then separate them into high-strikeout and low-strikeout
    categories, the high-strikeout pitchers continue to strike out more
    hitters, while also giving up far fewer hits and having significantly
    lower ERAs.

    This is the natural opposite of the fifth point. If number five is true,
    then logically number six ought to be true as well. It is.

  7. The range of career rates of hits per balls in play for pitchers with a
    significant number of innings is about the same as the range you would
    expect from random chance. This is true even though we know that some
    pitchers may have had consistent advantages over others, as these rates are
    unadjusted for park or league. The vast majority of pitchers who have
    pitched significant innings have career rates between .280 and .290.

  8. When you adjust for environmental advantages (the DH, park effects, and
    so on), the range becomes even smaller. The leaders in this stat (like
    Pete Harnisch) have had significant environmental advantages while most of
    the trailers (yup, Aaron Sele) have had disadvantages. After these
    adjustments, the range is well within the realm you could expect from
    chance alone.

  9. A stat like Component ERA (or any similar stat that calculates ERA from
    the rest of a pitcher’s performance), while correlating better with
    next-year ERA than ERA itself does, correlates much less well with
    next-year ERA than the same calculation performed with the average
    hits-allowed rate of the team the pitcher played for. This advantage of
    the “team average” rate grows larger as the number of innings pitched in
    the season shrinks.

    Two key points here: one, there doesn’t appear to be any “hidden
    quality” aspect to the stat. The numbers come out as they should if
    the above are all true: you can better predict ERA without hits allowed
    than you can with them. The other key point is that using a reliever’s hit
    rate seems to be an extremely suspect way of evaluating relievers. One of
    my favorite examples of this is Bobby Ayala in 1998 and 1999.
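Two of the tests in the list above, year-to-year correlation (point 3) and comparison with the spread expected from chance (point 7), can be sketched in a few lines. This is an illustration, not the actual study: the function names are hypothetical, the correlation is a plain Pearson coefficient, and the chance model assumes every ball in play is an independent trial with the same hit probability (a binomial model), which ignores park, league, and defense.

```python
import math


def pearson_r(year1: list[float], year2: list[float]) -> float:
    """Pearson correlation between the same pitchers' rates in two seasons.

    year1[i] and year2[i] must belong to the same pitcher. A value near 1
    means the stat is repeatable; a value near 0 means it is mostly noise.
    """
    n = len(year1)
    if n != len(year2) or n < 2:
        raise ValueError("need two equal-length lists with at least two pitchers")
    mean1 = sum(year1) / n
    mean2 = sum(year2) / n
    cov = sum((a - mean1) * (b - mean2) for a, b in zip(year1, year2))
    var1 = sum((a - mean1) ** 2 for a in year1)
    var2 = sum((b - mean2) ** 2 for b in year2)
    if var1 == 0 or var2 == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var1 * var2)


def chance_band(rate: float, balls_in_play: int, z: float = 2.0) -> tuple[float, float]:
    """Range of observed rates that luck alone would produce roughly 95% of
    the time (z = 2) if every pitcher had the same true rate (binomial model,
    normal approximation)."""
    sd = math.sqrt(rate * (1.0 - rate) / balls_in_play)
    return rate - z * sd, rate + z * sd


# The band narrows as the career sample grows (hypothetical sample sizes).
for bip in (500, 2000, 8000):
    low, high = chance_band(0.285, bip)
    print(f"{bip:5d} balls in play: {low:.3f} to {high:.3f}")
```

Comparing the actual spread of career rates with `chance_band` for pitchers with similar numbers of balls in play is the kind of check point 7 describes; a real skill would show up as a spread noticeably wider than the band.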

There are a few lesser and somewhat anecdotal points to be made that, while
not critical, are nonetheless good concepts to understand:

  1. People have a hard time identifying which pitchers are very good at
    preventing hits on balls in play. You’ll often hear people cite names
    like Randy Johnson, Jamie Moyer and Andy Pettitte in protest of the
    concept, but by any definition you want to use, these guys are not
    particularly good in the stat.

  2. Pitchers like Pedro Martinez and Greg Maddux have, at times, expressed
    thoughts on the matter. Martinez has been quoted as saying that the batter
    determines what happens once he hits the ball. Maddux described his
    scoreless-inning streak last year as “mostly luck,” because hard-hit
    balls that had been falling in were being caught.

  3. We only have 38 innings’ worth of pitching by non-pitchers (like Brent
    Mayne). That’s too small a sample on which to draw conclusions, but it is
    worth noting that these non-pitchers were not any worse than regular
    pitchers in the stat. In fact, they were a good bit better.

  4. Pitchers are often called “unpredictable,” and hits allowed is by far
    the most unpredictable of the component stats. In other words, it is one
    of the main culprits of pitcher unpredictability.

  5. There is no significant cross-correlation. That is, a high number of
    home runs allowed doesn’t really mean anything in determining how many hits
    per balls in play the pitcher will allow. The closest is an inverse
    relationship with strikeouts (lots of strikeouts means fewer hits per balls
    in play) but that relationship is very weak and could be the result of
    unrelated factors. There was no significant hits-per-balls-in-play
    advantage found in the strikeout study above.

Many people, after reading these points, think I’m saying that all pitchers
give up the same number of hits. That’s not true, and of course it’s not
what I’m saying. Randy Johnson gives up fewer hits than Scott Karl.
That’s not because batters hit the ball harder off Karl than Johnson, but
because they hit the ball more often off Karl than Johnson.

Aside from walks, there are two basic outcomes for a pitcher: batter hits
the ball or batter strikes out. With the latter, the result is almost
always an out. With the former, all sorts of things can happen, including a
base hit.
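The distinction between batters hitting the ball harder and hitting it more often can be shown with a toy calculation. The two pitchers below are made up, not Johnson’s or Karl’s real lines: both face the same number of batters and allow hits on exactly the same share of balls in play, and they differ only in strikeouts. The function name is hypothetical, and the model ignores everything except walks, strikeouts, home runs, and hit batsmen.

```python
def expected_hits(bfp: int, bb: int, so: int, hr: int, hbp: int, hits_per_bip: float) -> float:
    """Home runs plus hits on balls in play at a fixed rate (toy model)."""
    balls_in_play = bfp - bb - so - hr - hbp
    return hr + hits_per_bip * balls_in_play


SAME_RATE = 0.290  # identical hit rate on balls in play for both pitchers
power = expected_hits(bfp=900, bb=70, so=300, hr=20, hbp=10, hits_per_bip=SAME_RATE)
contact = expected_hits(bfp=900, bb=70, so=120, hr=20, hbp=10, hits_per_bip=SAME_RATE)
print(f"power pitcher: {power:.0f} hits, contact pitcher: {contact:.0f} hits")
```

The contact pitcher allows 180 more balls in play, and at the same .290 rate those extra balls in play alone account for about 52 extra hits.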

So why is this all true? All I can advance are theories, some that can be
checked out and some that are more difficult to verify. I’ll end this
article with a list of some of the more popular ones:

  1. Scouting. The MLB scouting network is set up to sift through an
    enormous pool of potential players to get to the group that might be MLB
    pitchers. To do this, they often employ tactics that many might call unfair
    in an effort to reduce the pool to a manageable number. So they don’t take
    guys under 5’10”, every pitcher has to throw a fastball of a certain
    speed, and so on. One of these factors may be weeding out a subset of
    pitchers for which the theory is not true.

  2. High talent level. This theory is that there’s a certain limit as
    to how good you can get at preventing hits on balls in play, and that in
    order to even come close to the major leagues you have to have reached
    this. This theory often comes up in clutch-hitting discussions.

  3. Too many variables. This suggests that the ability may or may not
    exist, but that the variables involved in the outcome of balls in play
    are so numerous and so difficult to control for that any ability gets
    lost. In other words, the noise completely masks any signal.

  4. A misunderstanding of how the batter/pitcher dynamic works. Some
    people will argue that despite all the numbers, the above can’t be true
    because it means that a screaming line drive hit into the
    right-center-field gap is as likely to be an out as a pop-up to the shortstop.

    This point deserves further discussion. One of the critical points of
    misunderstanding is the issue of “blame.” When a ball gets
    crushed into the gap in right-center, some think I’m saying that the
    defense deserves the blame, not the pitcher. When I counter with
    “Neither the pitcher nor defense is to blame, it’s the batter who is
    to blame,” I lose some people. Consider this example:

    When I was a kid, we used to go to the cemetery (this was our playground)
    and play a game called Lob-League. The makeup of this game was mostly
    offense and some fielding, with little to no pitching effects. The
    pitcher’s job was to lob the ball over the heart of the plate and let the
    batter hit it as hard as he wanted.

    Now, let’s suppose we’re playing Lob-League and the pitcher lobs one right
    in the batter’s wheelhouse, but the batter pops it up to the shortstop. Who
    deserves credit for the pop-up? The blame argument would indicate that the
    pitcher deserves credit for inducing a pop-up despite the fact that all he
    did was lob the ball over the plate. No credit or blame would belong to the
    batter who popped up the pitch.

    A more relevant MLB example might be the Home Run Derby at the All-Star
    festivities. I encourage you to watch next year’s contest, or, if you have
    it, a videotape of past contests. Watch for batted balls that would clearly
    be outs. The pitcher is trying to give up home runs, so does he deserve
    credit for a pop-up?

    In MLB, a pitch could result in a pop-up or a line drive. It all depends on
    what the batter does with it. I think the conventional wisdom on the
    dynamic between pitcher and batter may be slightly inaccurate.

The critical thing to understand is that major-league pitchers don’t appear
to have the ability to prevent hits on balls in play. There are many
possible reasons why this is the case, and I don’t really have a concrete
idea as to why it is.

But the one thing I do know is that it is the case.


Voros McCracken is a student
living in Chicago. He will be writing a weekly column called the
“Baseball Skeptic” in an upcoming webzine.
For more information on DIPS and McCracken’s work, check out
his Web page at http://www.baseballstuff.com/mccracken.