September 8, 2010
Solving the Mays Problem
So, we’ve been talking about revising the metrics we use here at Baseball Prospectus—I’ve described a fielding metric and a complementary batting metric. So now let’s go about discussing some of the ways they fit together.
One of the big things we need to do when we build all-encompassing metrics is adjust for position. That’s because of the way we construct our metrics—we have offensive metrics that compare players to all other players, but defensive metrics that compare players only to other players at that position.
That makes it difficult to compare two players who play vastly different positions. Baseball fans of course know this intuitively—if you have a first baseman and a shortstop with the same batting line, the shortstop is likely the better player.
Given the nature of the problem, the most straightforward solution would seem to be to compare players’ batting to their peers at the same position. And for the most part, that approach works well.
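To make the positional comparison concrete, here is a minimal sketch in Python. The player records, the field names (`pos`, `runs`, `pa`), and every number in it are made up for illustration; none of them come from the actual metrics discussed here. It computes a plate-appearance-weighted batting average for each position and then measures each hitter against his own position.

```python
from collections import defaultdict

def positional_averages(players):
    """Return PA-weighted batting runs per plate appearance for each position."""
    runs = defaultdict(float)
    pa = defaultdict(int)
    for p in players:
        runs[p["pos"]] += p["runs"]
        pa[p["pos"]] += p["pa"]
    return {pos: runs[pos] / pa[pos] for pos in pa}

def runs_above_position(player, averages):
    """Batting runs above what an average hitter at the same position
    would produce in the same number of plate appearances."""
    return player["runs"] - averages[player["pos"]] * player["pa"]

# Hypothetical data: two shortstops and two first basemen.
players = [
    {"name": "SS A", "pos": "SS", "runs": 60.0, "pa": 600},
    {"name": "SS B", "pos": "SS", "runs": 48.0, "pa": 600},
    {"name": "1B A", "pos": "1B", "runs": 60.0, "pa": 600},
    {"name": "1B B", "pos": "1B", "runs": 84.0, "pa": 600},
]

averages = positional_averages(players)
for p in players:
    print(p["name"], round(runs_above_position(p, averages), 1))
```

In this toy data, "SS A" and "1B A" have the same batting line, but the shortstop comes out above his positional average and the first baseman below his, which is the intuition described above.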
The Mays Problem
But you run into a problem in some extreme cases. Willie Mays is one of those cases.
See, the thing about Mays is, he could have walked into Cooperstown wearing a first baseman’s glove. He was simply an astonishing hitter. It’s just that he could play an excellent defensive center field as well. The problem with analyzing Willie Mays is that he was just so superlative that he moves the baseline to which he’s being compared.
And I call it the Willie Mays problem, but it wasn’t Willie Mays alone causing it. Mays had a lot of help. Looking at the top players in games played in center field in the 1950s:
(Bold indicates players in the Hall of Fame.)
That’s a really impressive list of baseball players. You have Mays and Mantle of course—Mantle, like Mays, is a guy whose bat was impressive for any position. Ashburn, Snider, and Doby were also incredible ballplayers, though, and prolific during the '50s.
See, if we use raw positional averages, we end up asserting that the average center fielder is as good as the average corner outfielder. In the '50s, that wasn’t true—the average center fielder was a better defensive outfielder than the men in the corners, but he was also a better hitter than them.
So in those cases, using positional averages for offense falls short. What else are we to do? We can’t simply increase our sample size to wash out the noise—as I said, even in a 10-year sample, we don’t see it wash out.
What Tom Tango has proposed, and what has been adopted in many of the prevalent Wins Above Replacement metrics outside of Baseball Prospectus, is a set of adjustments based upon comparing the fielding stats of players who play multiple positions.
This is going to do OK on the Mays problem—at least the one actually involving Mays—but I feel that it introduces quite a few problems of its own along the way:
In other words, what you are doing is analyzing a much smaller (and biased) population with much cruder analytical tools instead of analyzing a very large sample, all players, with very sharp analytical tools, like modern offensive metrics.
The larger problem you come to is that the distribution of defensive talent shifts over time. Comparing position switchers is a very crude way to track that—you need a lot of years of data to do that sort of analysis and you need to manually intervene in some of the analysis yourself. So it’s very hard to see where those shifts are occurring and include those in the positional baselines.
OK, but do those problems present themselves in such a way as to cause problems with our comparisons of players? I think they do, and it’s along the boundary between second and third base and between left and right field.
For second and third base, we have over 50 years of data that says that third basemen practically always outhit second basemen. And yet looking at the position switchers in terms of fielding data, what you see is they’re basically even. What are the possibilities here?
Well, the first is that there is always a Willie Mays problem at third base—always a cluster of absurdly talented Hall of Famers at the hot corner biasing our evaluation. That seems rather unlikely, given that there are no players in common between, say, the '50s and the '90s, and yet we always see the same pattern.
The next possibility is that baseball teams are just doing a poor job of allocating talent, and that they are needlessly diluting the second-base talent pool by keeping a greater portion of the good players at third base. I don’t see any hard evidence for that contention, and it doesn’t seem to be particularly reasonable.
The third possibility is that we’re simply missing something—that the model is failing to represent the underlying reality. To be perfectly blunt, I think the responsible way to do sabermetrics is to be very careful in asserting that it is our model, not reality, that is correct when the two are in conflict. And the way actual baseball teams behave, it seems like the defensive responsibilities at second are harder, and therefore teams are more likely to put better defenders there than at third.
And again you see the same thing with the corner outfield spots—although interestingly enough in the late '70s you see a shift, where before then the left fielders were generally the better hitters and after that the right fielders generally were.
So can we solve the Mays problem while still using offensive adjustments?
At first blush, the Mays problem really looks like a simple (and common) problem in determining the average of a population—outliers.
The arithmetic mean, the most common form of average, is very sensitive to outliers. There is a lot of good existing research on how to use more robust measures of central tendency—the median, truncated means, log transformations, and so on.
Imagine my frustration when I discovered that none of them were effective.
The problem? Willie Mays (and the superlative players) aren’t the only outliers. They’re not even the biggest outliers.
What you tend to find is that the biggest outliers at a position are typically the worst, not the best, players. That’s almost entirely a function of the data set: when you’re looking at your very sub-marginal players, what you’re not seeing are the guys just like them who are playing in the high minors.
And so if you’re trying to reduce the influence of outliers, you end up curtailing the effects of the below-average players as much as you do those of the exceptional players, and it washes out. (Actually, you tend to raise the positional averages more than you lower them.)
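A small sketch of why the robust measures disappoint here, using made-up runs-per-plate-appearance figures for ten hypothetical players at one position. For simplicity it treats every player equally instead of weighting by plate appearances, and the `trimmed_mean` helper is written for this example only. The sample has one star at the top and two very weak hitters at the bottom.

```python
import statistics

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k]
    return statistics.fmean(kept)

# Hypothetical rates: two sub-marginal hitters, a cluster of regulars, one star.
rates = [0.02, 0.04, 0.10, 0.11, 0.12, 0.12, 0.13, 0.13, 0.14, 0.19]

plain_mean = statistics.fmean(rates)
median = statistics.median(rates)
trimmed_10 = trimmed_mean(rates, 0.10)
trimmed_20 = trimmed_mean(rates, 0.20)

print(plain_mean, median, trimmed_10, trimmed_20)
```

With these numbers the median and both trimmed means land above the plain mean, because the low tail is heavier than the high tail. Discounting the outliers raises the positional average instead of pulling it back down from the star.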
What I’ve done is split the sample in half, above and below average, and then look at how far the mean of each half sits from that average. If there is very little skew, the halfway point between the two half means is going to be the same average I used to split the dataset. But that’s rarely what we see: we see one half with a larger distance from the average and fewer representative plate appearances. Weighted by plate appearances, the two distances cancel out and give us back the initial average, but the unweighted difference between them is an estimate of how the skew is tilting the average.
So I used the unweighted difference to “shift” the average, and then applied an average amount of skew-related difference back to each position so that the positional averages will add back up to the total league average.
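Here is one possible reading of that procedure as a Python sketch, not the actual implementation. Several details are assumptions on my part: the skew estimate is taken as the unweighted halfway point of the two half means minus the positional average, the shift subtracts that estimate (the text does not spell out the sign or the scale), and the amount added back to every position is the plate-appearance-weighted average of the estimates. The function names and the sample data are hypothetical.

```python
def weighted_mean(pairs):
    """PA-weighted mean of (runs_per_pa, pa) pairs."""
    total_pa = sum(pa for _, pa in pairs)
    return sum(rate * pa for rate, pa in pairs) / total_pa

def skew_estimate(pairs):
    """Split hitters at the positional average and compare the unweighted
    halfway point of the two half means with that average."""
    avg = weighted_mean(pairs)
    above = [(r, pa) for r, pa in pairs if r >= avg]
    below = [(r, pa) for r, pa in pairs if r < avg]
    if not above or not below:
        return avg, 0.0  # nothing to compare, so no skew estimate
    midpoint = (weighted_mean(above) + weighted_mean(below)) / 2
    return avg, midpoint - avg

def corrected_averages(by_position):
    """by_position maps position -> list of (runs_per_pa, pa) pairs."""
    stats = {pos: skew_estimate(pairs) for pos, pairs in by_position.items()}
    pa = {pos: sum(p for _, p in pairs) for pos, pairs in by_position.items()}
    total_pa = sum(pa.values())
    mean_skew = sum(stats[pos][1] * pa[pos] for pos in stats) / total_pa
    # ASSUMPTION: subtract each position's own skew, then add the league-wide
    # average skew back so the PA-weighted league average is unchanged.
    return {pos: avg - skew + mean_skew for pos, (avg, skew) in stats.items()}

# Hypothetical data: a center field group with one star, an even left field group.
by_position = {
    "CF": [(0.20, 600), (0.12, 600), (0.11, 600), (0.10, 600)],
    "LF": [(0.14, 600), (0.13, 600), (0.12, 600), (0.11, 600)],
}
raw = {pos: weighted_mean(pairs) for pos, pairs in by_position.items()}
corrected = corrected_averages(by_position)
print(raw)
print(corrected)
```

In this toy example the center field average is pulled down from its raw value, the left field average moves up by the shared add-back, and the two still average out to the same league figure as before.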
This process is, as one might imagine, rather unstable over a single season. But over a period of nine years it stabilizes quite nicely. (I chose nine because it let me focus on the four seasons before and after the season of interest, as well as the season itself.) That isn’t to say that the picture is entirely clean. Looking at corrected runs per plate appearance by position over the years:
You still do see a lot of shifts (little and big) over the years. Some of them may in fact be just noise. Others may be teams shifting talent around as conditions change. I mean, honestly, I wish it were cleaner, I do. But baseball analysis can get messy sometimes, and I think this is one of those times. Best to acknowledge the mess, rather than trying to put some throw pillows over it and act like it’s not there.