March 29, 2010
Credit Where It's Due, Part 1
I usually put a warning in my pieces that gory mathematical details are about to follow. This week, it comes at the beginning of the piece. You’ve been warned. Then again, depending on what you like, you might be titillated.
Who is to blame for a strikeout? There are several candidates, after all. Was it the pitcher’s deception and skill? Was it the batter’s inability to read the strike zone? Was it the umpire’s obscenely large strike zone? Was it dumb luck? Was it… Canada?
It’s a deceptively simple question, and the answer is probably some combination of all of the above, but what if we could go deeper than that? What if we could start putting percentages on how much credit and blame can be placed on each player and/or country? I think it’s entirely possible, and the results could be rather enlightening. It’s generally assumed that a strikeout is halfway the fault of the batter and halfway the credit of the pitcher. But that’s just based on a "that makes sense" model rather than any empirical data.
In baseball’s double-entry accounting system, we find someone on the offense to blame for the fact that the team now has one fewer out to work with (the batter makes the most sense), and someone on the pitching team to thank (the pitcher, obviously). There’s a third actor that plays into every event in life, and that’s dumb luck. Sometimes things just happen. I propose a more nuanced look at who’s really to blame for a strikeout.
Where do strikeouts come from (and more importantly, how do we know)?
In what can now safely be called an iconic sabermetric paper, Voros McCracken posited that pitchers were, for the most part, responsible for their strikeouts, walks, and home runs given up, but not for anything that happened when the ball was in play. Since that time, the DIPS theory, as it became known, has been revised and the bold certainty of the original theory has given way to a few more qualifiers, but McCracken opened up the discussion of how sabermetricians might separate out luck from skill in baseball. In mathematical terms, we’re trying to partition (or as I prefer, "chop up") the variance into its component parts.
McCracken’s paper was amazingly simple in its methodology considering its broad-reaching implications. Given a respectable minimum of innings pitched, he looked at the year-to-year correlations in a pitcher’s stats and found a high correlation from year to year for home runs, strikeouts, and walks, but not BABIP. Since the pitcher seemed able to repeat his performance (more or less) from year to year on strikeouts, it was considered that he had control. BABIP, on the other hand, was considered to be out of his control, as the year-to-year correlation was rather low, suggesting that it was random chance driving the findings. Such a conclusion is actually a logical fallacy (it has to do with a misunderstanding of null hypothesis testing), but it’s a useful one. One year of BABIP doesn’t tell you much about what next year will bring, and that’s generally how people think. This framework of year-to-year repeatability was the first major framework in figuring out credit and blame.
There’s a small problem with this framework. Year-to-year correlations are very dependent on how many plate appearances/batters faced are included in each season. Consider for a moment that if a baseball season lasted a billion games or so, we’d know exactly what each player was capable of. If a hitter goes 4-for-5 one night, we don’t assume that he’s an .800 hitter. I found that the problem with BABIP is that it takes a long time before we get a reliable number on an individual pitcher, something on the order of 5,000 or so batters faced, but eventually we can tell what a pitcher’s skill level is, or at least was, over those 5,000 batters faced. What it means isn’t that there isn't any pitcher talent involved, only that there’s a very low signal-to-noise ratio. It doesn’t mean there isn't anything interesting to find, only that we have to look a little harder to find it.
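To see how much the year-to-year correlation depends on sample size, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the 200 simulated pitchers, the true-talent spread (mean .300, standard deviation .010), and the function names are hypothetical and are not taken from any real data set.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate_season(talent, n, rng):
    """Observed rate over n trials for a pitcher with a fixed true talent."""
    return sum(rng.random() < talent for _ in range(n)) / n

def year_to_year_r(n_per_season, n_pitchers=200, seed=1):
    """Correlation between two simulated seasons for the same pitchers."""
    rng = random.Random(seed)
    # Hypothetical true-talent distribution (an assumption, not measured).
    talents = [rng.gauss(0.300, 0.010) for _ in range(n_pitchers)]
    year1 = [simulate_season(t, n_per_season, rng) for t in talents]
    year2 = [simulate_season(t, n_per_season, rng) for t in talents]
    return pearson(year1, year2)

r_small = year_to_year_r(500)    # roughly one season of balls in play
r_large = year_to_year_r(5000)   # the "5,000 or so" scale mentioned above
print(f"r at 500 per season:  {r_small:.2f}")
print(f"r at 5000 per season: {r_large:.2f}")
```

The underlying talent is identical in both runs. Only the number of trials per season changes, and with it the share of the observed spread that is noise.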
Others have used variance partitioning methods that involve looking at the variation among pitcher-seasons, in which they look at the variance that can be observed between pitchers and subtract out the random variance that would be expected given the parameters of the data set. The problem is that the most commonly used formula:
Variance (observed) = Variance (random) + Variance (actual)
... is missing a term. In addition to random variance and actual variance in pitcher abilities, there’s another type of variance that can creep in. It’s measurement bias. Suppose that the measure that we’re using favors one pitcher over another? That would bias the data, but how can that be?
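For reference, the two-term version of the formula can be sketched as below. This is a simplified illustration and not necessarily how any particular analyst computes it: it assumes the random term is the binomial variance around a league-average rate, and the pitcher-season totals are made up.

```python
from statistics import pvariance

def partition_variance(seasons):
    """Split observed variance into (observed, random, actual).

    seasons: list of (strikeouts, batters_faced) pairs, one per pitcher-season.
    Assumes the random part is binomial noise around the league-average rate.
    """
    rates = [k / bf for k, bf in seasons]
    league = sum(k for k, _ in seasons) / sum(bf for _, bf in seasons)
    observed = pvariance(rates)
    # Average binomial variance p(1 - p) / n across the pitcher-seasons.
    random_var = sum(league * (1 - league) / bf for _, bf in seasons) / len(seasons)
    return observed, random_var, observed - random_var

# Hypothetical pitcher-seasons, for illustration only.
seasons = [(100, 500), (150, 500), (50, 500)]
observed, random_var, actual = partition_variance(seasons)
print(observed, random_var, actual)
```

Note that whatever is left after subtracting the random term gets labeled "actual," so any bias in the measure ends up being counted as pitcher ability.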
Let’s go back to strikeouts. There’s one other factor that could be driving a pitcher’s (or a batter’s) strikeout rate, and I don’t think it’s been adequately controlled: the quality of the opposition batters/pitchers. Suppose for a moment that a pitcher could face Mark Reynolds all the time. What do you suppose his strikeout rate would look like? It’s a built-in assumption that a pitcher has faced a bunch of hitters who, all strung together, aggregate into a league-average profile, and that everyone faces essentially the same suite of hitters. At first blush, it’s not a bad assumption, but let’s see if it stands up to scrutiny.
For a moment, let’s pretend that a pitcher has no role in his strikeouts, and that whether a plate appearance ends in a strikeout is entirely in the hands of the batter. It’s silly to think that the batter is totally in control, but it’s also silly to assume that the batter would not be involved.
In 2009, which pitchers would have had the highest and lowest "expected" strikeout rate (minimum of 250 batters faced)?
So, if the batter was the sole determinant of whether a plate appearance ended in a strikeout, then Russ Ortiz(!) would have had the third-highest strikeout percentage in baseball. I told you it was silly to pretend that pitchers had no control over their strikeout rate.
Some of the variance from year to year in a pitcher’s strikeout rate (or any sort of stat) could simply be a change in the quality of the opposition, in addition to the usual variation that comes with non-infinite sample sizes. That spread between high and low is about two percentage points, which isn’t gigantic, but not negligible either. Overall, the expected strikeout rate in the 2009 sample had a standard deviation of .004, or 0.4 percent. (Actual observed K rate had a standard deviation of about 5.2 percent.) It may be small, but it is a bias in the data.
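The "expected" strikeout rate described above can be sketched in a few lines. This is one plausible reading of the calculation, not a confirmed reconstruction of it: it assumes the expected rate is the plate-appearance-weighted average of the seasonal strikeout rates of the batters a pitcher faced. The batter names and rates are hypothetical.

```python
def expected_k_rate(batters_faced, batter_k_rates):
    """Average seasonal K rate of the batters faced, one entry per plate appearance."""
    return sum(batter_k_rates[b] for b in batters_faced) / len(batters_faced)

# Hypothetical seasonal strikeout rates for three batters.
batter_k_rates = {"batter_a": 0.30, "batter_b": 0.15, "batter_c": 0.10}

# One entry per plate appearance; batter_a was faced twice.
faced = ["batter_a", "batter_a", "batter_b", "batter_c"]

expected = expected_k_rate(faced, batter_k_rates)
print(f"expected K rate: {expected:.4f}")
```

A pitcher who sees a lot of batter_a gets a higher expected rate than one who sees a lot of batter_c, before he has thrown a single pitch.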
A solution to all the world’s problems
We have a problem. Several of them, in fact. No, not poverty, disease, and war. More important than that. We need a method that will allow us to look for effects of variables that have a lot of noise around them and that allow for the interaction of several actors. We also need a method that can allow us to have many more data points than we can get from simply looking at a list of pitcher-seasons. The easiest way to get around the issue of variables with a lot of noise around them is to pump up the size of your data set, but there are only so many pitchers who have thrown seasons with a certain minimum of batters faced. That may not be enough to get reliable results.
This problem needs a different framework. I propose that we use a technique called binary logistic regression, and use it at the plate-appearance level. Here’s how it works with strikeouts. I took all plate appearances from 1993-2009 (excluding intentional walks and plate appearances involving the pitcher batting), and coded them as either ending in a strikeout or not. This gives us a database of 2.9 million data points with which to work (although I used only plate appearances in which a batter with at least 250 PA in that season faced a pitcher with 250 BFP, which left me with a mere two million cases). Binary logistic regression looks at outcomes that are denominated in either a yes or a no and how various independent variables affect the odds of the answer being "yes" or "no."
For each plate appearance, I took the pitcher’s seasonal strikeout rate and the batter’s as well (our best observable guesses as to their true talent levels for the year). I also took the league strikeout rate. For technical reasons, I converted all of these probabilities into odds, and then took the natural log. (For the curious: it helps to normalize the distribution.) I entered all three predictors into our logistic regression and pushed play.
The interesting output for our purposes here is which variables pick up how much variance. Binary logistic regression doesn’t give off the same R-squared as a regular linear regression, but there is something like it. For the morbidly curious: it’s the change in the -2 log likelihood if the variable were excluded; the values are listed below.
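The conversion from a rate to log-odds is short enough to show. The function names and example rates here are hypothetical, chosen only to illustrate the transformation.

```python
import math

def log_odds(p):
    """Convert a probability to odds, p / (1 - p), then take the natural log."""
    return math.log(p / (1 - p))

def predictors(pitcher_k, batter_k, league_k):
    """The three predictors for one plate appearance, on the log-odds scale."""
    return (log_odds(pitcher_k), log_odds(batter_k), log_odds(league_k))

# Hypothetical seasonal rates for one pitcher, one batter, and the league.
row = predictors(0.25, 0.20, 0.17)
print(row)
```

A rate of .500 maps to zero, and rates below .500 (which covers nearly every strikeout rate) map to negative numbers.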
Parceling that out, the batter’s K rate overall picks up about 56.0 percent of the explained variance [40289.865 / (40289.865 + 31138.462 + 490.247)], while the pitcher gets 43.3 percent, with the league mean picking up about 0.7 percent.
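The arithmetic in the brackets can be checked directly. The three values are the ones quoted above. Which number belongs to the pitcher and which to the league is inferred from the percentages given in the text.

```python
# Change in -2 log likelihood when each predictor is excluded (values from the text).
changes = {"batter": 40289.865, "pitcher": 31138.462, "league": 490.247}

total = sum(changes.values())
shares = {name: value / total for name, value in changes.items()}

for name, share in shares.items():
    print(f"{name}: {share * 100:.1f} percent")
```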
The word "explained" is important to note, in that the regression had an overall Nagelkerke R-squared value of about 6 percent. There’s a lot more that goes into a strikeout than knowing the approximate skill levels of the pitcher and batter, such as what actually happened during the plate appearance. But this gives us an idea of how much variance can be accounted for by each of the three main actors in this play relative to each other. A batter deserves 56 percent of the blame for his strikeout. The other 44 percent can be chalked up to the thought that he was facing Nolan Ryan in his prime and to the thought that everyone strikes out sometimes.
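For anyone who wants to try the procedure at home, here is a small end-to-end sketch on simulated data. It is not the code behind the numbers above. The data are made up, the true rates stand in for observed seasonal rates, the league term is left out of the fit because it is constant in this toy data set, and the hand-rolled Newton-Raphson fit is a stand-in for whatever statistics package you prefer.

```python
import math
import random

def log_odds(p):
    return math.log(p / (1 - p))

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def minus_2ll(X, y, iterations=12):
    """Fit a logistic regression by Newton-Raphson; return -2 log likelihood."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, out in zip(X, y):
            p = sigmoid(sum(b * x for b, x in zip(beta, row)))
            for i in range(k):
                grad[i] += (out - p) * row[i]
                for j in range(k):
                    hess[i][j] += p * (1 - p) * row[i] * row[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    ll = 0.0
    for row, out in zip(X, y):
        p = sigmoid(sum(b * x for b, x in zip(beta, row)))
        ll += math.log(p if out else 1 - p)
    return -2 * ll

# Simulated plate appearances (all rates are hypothetical assumptions).
rng = random.Random(0)
LEAGUE_K = 0.17
rows, outcomes = [], []
for _ in range(3000):
    pitcher = log_odds(rng.uniform(0.10, 0.28))
    batter = log_odds(rng.uniform(0.08, 0.30))
    true_p = sigmoid(pitcher + batter - log_odds(LEAGUE_K))
    rows.append([1.0, pitcher, batter])   # intercept, pitcher, batter
    outcomes.append(1 if rng.random() < true_p else 0)

full = minus_2ll(rows, outcomes)
drop_pitcher = minus_2ll([[r[0], r[2]] for r in rows], outcomes) - full
drop_batter = minus_2ll([[r[0], r[1]] for r in rows], outcomes) - full
share_batter = drop_batter / (drop_batter + drop_pitcher)
print(f"batter share of explained variance: {share_batter:.2f}")
```

The shares that come out of this toy version depend entirely on the made-up spreads of pitcher and batter rates, so they should not be compared with the real results.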
What then of BABIP? I hope to come back to balls in play later for a more fine-grained look, but I ran a similar regression on the outcome of balls in play (strikeouts, walks, hit by pitches, and home runs were treated as non-events), using the same method as above. The results:
Hits on balls in play have a lot to do with the batter’s prowess, which is expected. What might surprise people is that the number for the pitcher is so high, but one must remember that "pitcher" here includes both the pitcher and his defense. In a future article, I’ll look at how to break things down a little more specifically on that account.
I hope to continue this line of research. Next time, we’ll go over a few more common events (like home runs, walks, hit batsmen, and mascot interference), and then talk about a framework for looking into events that have more moving parts, such as the aforementioned balls in play. Eventually, I’d like to chop up stats like win probability in a little more fine-grained manner. Keep reading. This should be fun.