August 18, 2010
Out of Sequence
All right, so we’ve been looking at how to put a value on the performance of position players—in terms of hitting and fielding. Now let’s move on to pitching.
Evaluating offense is, at least compared with everything else, pretty simple, both in principle and in practice. Evaluating fielding is trickier in practice, but I don’t feel it’s any trickier in principle: everyone agrees what a fielder’s job is and what we ought to be measuring.
With pitching—ah, it’s a little more complicated. And it’s a little more complicated in no small part because fielding is more complicated.
When I DIPS, You DIPS
The first major change in how we evaluate pitchers came when someone (probably multiple someones) figured out that a pitcher was not wholly (or even mainly) responsible for how many runs his team scored while he was playing. There was also some understanding that a defense affects a pitcher’s performance, although the methods of measuring defense (errors) and removing its effects (unearned runs) were rather crude. OK, very crude. And so for a while we had ERA.
Then along came Voros McCracken with probably the most revolutionary finding in sabermetrics:
There is little if any difference among major-league pitchers in their ability to prevent hits on balls hit in the field of play.
A lot of really smart people have devoted a lot of time to trying to poke holes in the assertion; it comes down to how you want to define “little difference.” (And “major-league pitchers,” I suppose.) But the question becomes—how to apply that finding to evaluating major-league pitchers?
What Voros (and others) turned to was a riff on an old Bill James idea, Component ERA, which looked at the ERA indicated by a pitcher’s components (walks, hits, home runs). Almost any ERA estimator based upon DIPS principles (Voros’ original dERA, Tom Tango’s FIP, etc.) is a component ERA estimator that ignores a pitcher’s actual hit rate and instead assumes a hit rate based upon his strikeout rate and the league batting average on balls in play.
So on one hand, you have a record of the runs allowed while a pitcher is on the mound, which indiscriminately credits the pitcher with the performance, good or bad, of his fielders and (in the case of bequeathed runners) his bullpen. On the other hand, you have your DIPS-based estimators, which focus only on the aspects of pitching that show up in the raw totals of the three true outcomes: the walk, the strikeout, and the home run. You’re getting an incomplete view either way you look.
Out of Sequence
The biggest thing component estimators throw out is sequencing. We know, for instance, that the following sequence:
Walk, single, fly out, strikeout, single, strikeout
Isn’t necessarily the same as:
Strikeout, single, walk, strikeout, single, fly out
Depending on the breaks, the latter sequence may not score any runs at all. The first one could score two runs, again depending on the breaks. The order of events matters.
Now the pitcher (and to a lesser extent, the catcher) is unique in that he’s involved in every play when he’s in the game. A hitter can’t affect the outcome of the batters before or after him in the lineup (at least, not directly). Any one fielder may not get a chance to affect any two plays in a row, much less all of them. But the pitcher can affect his sequencing. This isn’t to say that there are (or aren’t) pitchers who can pitch especially well with men on over the course of a career. But when we look at what actually happened while a pitcher was on the mound, his sequencing is certainly a part of it.
Now, when we looked at estimating a hitter’s contribution in runs, we ignored the situation the hitter was in, because it allowed us to more easily separate his contributions from those of his teammates. But we don’t have to do it that way.
What we can do is construct a set of context-specific linear weights. Here, for instance, is the value of an unintentional walk in 2009, based upon the runners on base and the number of outs:
Notice the appreciable difference between walking a batter with first base open and walking a batter with a runner already on first. (And walking a batter with the bases loaded always results in a run.)
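For readers who want to see the mechanics, here is a minimal sketch of how a context-specific walk value can be computed from a run-expectancy table: the runs that score on the play plus the change in run expectancy. The function names, the state encoding, and every number in the toy table are my own illustrative assumptions, not the 2009 values.

```python
# Sketch only: names, state encoding and numbers are illustrative assumptions,
# not the actual 2009 run-expectancy values.

def apply_walk(bases):
    """Return (new_bases, runs_scored) after a walk.

    bases is a tuple (first, second, third) of booleans.
    Only forced runners advance.
    """
    first, second, third = bases
    if not first:
        return (True, second, third), 0
    if not second:
        return (True, True, third), 0
    if not third:
        return (True, True, True), 0
    # Bases loaded: the runner on third is forced home.
    return (True, True, True), 1


def walk_value(run_exp, bases, outs):
    """Runs scored on the walk plus the change in run expectancy."""
    new_bases, runs = apply_walk(bases)
    return runs + (run_exp[(new_bases, outs)] - run_exp[(bases, outs)])


ON_FIRST = (True, False, False)
ON_SECOND = (False, True, False)
FIRST_SECOND = (True, True, False)
LOADED = (True, True, True)

# Made-up run expectancies with zero outs (placeholders, not real data).
toy_run_exp = {
    (ON_FIRST, 0): 0.9,
    (ON_SECOND, 0): 1.1,
    (FIRST_SECOND, 0): 1.5,
    (LOADED, 0): 2.3,
}

for label, bases in [("runner on second", ON_SECOND),
                     ("runner on first", ON_FIRST),
                     ("bases loaded", LOADED)]:
    print(label, round(walk_value(toy_run_exp, bases, 0), 3))
```

With these placeholder numbers the walk with first base open costs less than the walk that pushes a runner from first to second, and the bases-loaded walk is worth exactly one run because the base-out state does not change.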
Now if we do this with every event, we will come pretty close to just duplicating a pitcher’s RA. (We also need to consider the out value of each event—for the three true outcomes, the only one we truly need to worry about is the strikeout, which is worth one out.) But what we want to do is measure the effects of a pitcher’s sequencing, without unduly crediting him for the performance of his fielders. (It should be noted that this already settles the issue of separating a pitcher from the performance of his relievers, as regards bequeathed runners, similar to our current “Fair RA” stat.)
So instead let’s turn to our three types of plays, as defined in our new fielding metric: infield ground ball, infield air ball and outfield plays made. Let’s call everything else a “hit” (even though that’s not exactly true, since some are batters who reached on an error, etc.). We’ll figure the rate of each play, the number of runs per play, and the number of outs per play:
These are not baselined to average (nor to plate appearances) but instead to runs per out. That way, we can use them the way we’re accustomed to using pitching stats (per inning and per game are just other ways of looking at the per-out relationship).
So if we want to figure the runs and outs generated by each BIP in a given situation, we can simply do something like:
And the same for the outs.
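One way to read that calculation is as a rate-weighted sum over the four play types: multiply how often each play occurs in a situation by its runs (or outs) per play, and add them up. The sketch below follows that reading. The play-type keys, the function name, and all of the numbers are hypothetical placeholders of mine, not values from the fielding metric.

```python
# Sketch only: keys and numbers are hypothetical placeholders.
PLAY_TYPES = ("infield_ground", "infield_air", "outfield", "hit")


def expected_per_bip(rates, per_play):
    """Weight a per-play quantity (runs or outs) by how often each play occurs."""
    total_rate = sum(rates[p] for p in PLAY_TYPES)
    if abs(total_rate - 1.0) > 1e-9:
        raise ValueError("play-type rates should sum to 1")
    return sum(rates[p] * per_play[p] for p in PLAY_TYPES)


# Made-up rates and per-play values for a single base-out situation.
rates = {"infield_ground": 0.35, "infield_air": 0.05,
         "outfield": 0.30, "hit": 0.30}
runs_per_play = {"infield_ground": 0.02, "infield_air": 0.00,
                 "outfield": 0.03, "hit": 0.45}
outs_per_play = {"infield_ground": 1.0, "infield_air": 1.0,
                 "outfield": 1.0, "hit": 0.0}

est_runs = expected_per_bip(rates, runs_per_play)
est_outs = expected_per_bip(rates, outs_per_play)
print(round(est_runs, 3), round(est_outs, 3))
```

The same function serves for runs and for outs; only the per-play table changes. In a real table, outs per ground ball could differ from one in some situations (double plays, for instance), which is why the outs are looked up rather than assumed.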
But what we recognize is that there are times when a ground-ball out is more beneficial than an air-ball out. What we can’t do (at least, not without compromising the point of the entire exercise) is look at a pitcher’s actual ground outs and air outs.
But—as you may recall—we have estimates of a pitcher’s ground-ball and air-ball tendencies, constructed as part of our fielding metric. So what we can do is use those to tell us which pitchers may get a beneficial ground ball more often than others. We also have an idea of which pitchers are able to induce more pop-ups than others, and we can account for that as well here.
So once we combine our estimates of runs allowed and our estimates of outs generated, we can easily come up with a measure that looks like RA:
Est_Runs / (Est_Outs/3) * 9 = Fair RA
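As a quick arithmetic check, the formula can be written as a small function. The function name and the input totals below are made up for illustration.

```python
def fair_ra(est_runs, est_outs):
    """Estimated runs per nine innings: runs per inning (three outs) times nine."""
    if est_outs <= 0:
        raise ValueError("est_outs must be positive")
    innings = est_outs / 3
    return est_runs / innings * 9


# Hypothetical pitcher: 90 estimated runs over 600 estimated outs (200 innings).
print(round(fair_ra(90, 600), 2))
```

Since three outs make an inning and nine innings make a game, this is the same as multiplying runs per out by 27.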
And, just like with hitting, a lot of people like a rate stat baselined at 100. Now, thanks to the presentation of ERA+ at Baseball Reference, people have come to intuitively expect values greater than 100 to represent better performances and values lower than 100 to represent worse performances—which is of course totally the reverse of the RA/ERA scale.
The problem (at least with the specific method used for ERA+) is that the scale gets thrown off: in terms of runs allowed, the difference between an ERA+ of 150 and one of 125 is smaller than the difference between an ERA+ of 125 and 100. (For those interested in a longer discussion of these issues, Patriot has written an invaluable summary.) So we’ll use a different method, originally suggested by GuyM, to produce Fair RA+:
(2 - (FairRA/lgFairRA)) * 100
This will produce something with the inverted values people have become accustomed to without the distortions introduced by ERA+.
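To make the difference in scales concrete, here is a small sketch comparing the two approaches. The ERA+-style function is a simplified league-over-pitcher ratio with no park adjustment, which is my simplification rather than the full Baseball Reference calculation, and the league and pitcher figures are made up.

```python
def era_plus_style(ra, league_ra):
    """Simplified ERA+-style index: league rate divided by pitcher rate."""
    return league_ra / ra * 100


def fair_ra_plus(ra, league_ra):
    """Index from the formula above: (2 - RA / lgRA) * 100."""
    return (2 - ra / league_ra) * 100


league = 4.50  # hypothetical league Fair RA
for ra in (4.50, 3.60, 2.70):  # each step is 0.90 runs better
    print(f"{ra:.2f}  {era_plus_style(ra, league):6.1f}  {fair_ra_plus(ra, league):6.1f}")
```

Each 0.90-run improvement adds the same 20 points under the second formula, while the ratio-based index rises by 25 points for the first step and by about 42 for the second, even though the gain in runs is identical.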
Obviously, I now have to discuss park factors for offense and defense. I made my life a little easier in this regard—when I estimated the likelihood of one of our four play types occurring in each situation, I adjusted them based upon the tendencies of that park. If a park tends to increase the likelihood of a hit, for instance, I increased (yes, increased) my expectation that a hit occurs.
This seems counterintuitive—after all, this makes the resulting output more park dependent than it would be if I skipped this step entirely. The payoff, however, is that we can use park factors designed for RA without having to adjust for the park’s effects on batted balls. This lets us be more consistent in the park factors we use. More on that later.
And notably this discussion only covers a pitcher’s rates of performance. It doesn’t discuss the quantity of his performance. I’ve been spending a lot of time recently working on how to properly value a workhorse starter (and on the flip side, how to properly value a short reliever). And that’s the direction we’re headed in next.