A while back, you may recall, I wrote an article about Brett Lawrie’s rating in one defensive metric, Baseball Info Solutions’ Defensive Runs Saved. My conclusion:
So let’s play around with this basic framework and see how far it takes us. It helps to have some sample data to play around with. Again, we don’t have access to the raw data underpinning DRS. For the purposes of illustrating this point, however, any zone data should do, and the most convenient source of that is from Project Scoresheet. This is what the Project Scoresheet field diagram looks like:
I have consolidated the “S” and “D” zones with the primary zone, so for instance 5D is lumped in with 5—this way we pay attention only to the horizontal spray angle of the data. So here are third baseman play made rates on ground balls for each of those zones, from 1995 (no particular reason for that year):
Shockingly, you see most of the plays made in the zone where the third baseman stands, fewer in the hole, and then almost no plays in the remaining zones. (We can talk about how range bias affects this, but for the purposes of this illustration, it shouldn’t matter much.)
So let’s say that there’s a grounder hit through the 34 zone, and Lawrie makes a play. Given these estimates, Lawrie would get credit for one play made, less the 0.0022 probability that a third baseman would make that play, or 0.9978 “Plus/Minus points.”
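The credit arithmetic in that example can be written out as a minimal sketch. The function name and signature are hypothetical, chosen only for illustration; the one figure taken from the text is the 0.0022 third-baseman play-made rate in the 34 zone. This is not BIS's code, whose details are not public.

```python
def plus_minus_credit(play_made: bool, expected_rate: float) -> float:
    """Credit for one ball in play: actual outcome minus the positional baseline.

    expected_rate is the share of such balls an average fielder at that
    position converts into a play (hypothetical parameter name).
    """
    if not 0.0 <= expected_rate <= 1.0:
        raise ValueError("expected_rate must be a probability")
    return (1.0 if play_made else 0.0) - expected_rate


# Third-baseman play-made rate on grounders in the 34 zone (from the text).
THIRD_BASE_RATE_ZONE_34 = 0.0022

credit = plus_minus_credit(True, THIRD_BASE_RATE_ZONE_34)
print(f"Credit for a play made: {credit:.4f}")
```

Because the baseline is specific to third basemen, nearly the entire play is booked as a surplus, no matter how often some other infielder would have made it.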
What this ignores is that Lawrie is not the only infielder. What’s the probability that any infielder would have made that play?
In the particular shift the Jays are playing, you may note that Lawrie can get to only balls that have already passed by the other infielders. This is true, but the other infielders are positioned based upon knowing that Lawrie is backing them up in short right field. (And on one play, I noticed the first baseman giving up on a ball and heading back to cover first because Lawrie was going to be able to field it as well). Those balls that Lawrie is fielding from short right field are not the nearly-certain hits that you would assume from comparing Lawrie to a baseline of all third basemen, most of whom aren’t fielding balls in that position not because of a lack of skill but because they’re standing by third base.
This analysis would suggest that, instead of measuring how many additional plays made at the team level the shift is contributing, it’s measuring how many additional plays are contributed to Lawrie by the shift, crediting him for some plays that other fielders would have made in a more traditional defensive alignment.
Apparently, I’m like E.F. Hutton—when I talk, people listen. Baseball Info Solutions has modified their DRS methodology:
As recently as a week ago, we were reporting that Brett Lawrie had saved 30 runs with his defense in about a half-season thus far. Given that the best third baseman in our system has never registered even as many as 30 defensive runs saved in a full season since we started keeping this stat ten years ago, this number stood out like a sore thumb. How did it get so big? Simply put, Lawrie was making plays this year that no other third baseman has made with any consistency. Ever. But it was a flaw in our system that didn't recognize the defense the Blue Jays have employed on a regular basis for the first time in the history of baseball.
While there have been a few other isolated uses of similar alignments, the Toronto Blue Jays have created a shift defense that they use against left-handed pull hitters very frequently. Call it the Lawrie Defense. As reported earlier this year, major league teams are shifting more often in 2012 than ever before. They are employing the Ted Williams Shift (three or more infielders to the pull side of second base) more than ever, and they are employing other shifts where the second baseman or shortstop is close to the pull-side of second base—but not quite on the other side—more than ever. The Blue Jays have employed the third-most defensive shifts in baseball in 2012, but the Lawrie Defense is a Toronto specialty. The Lawrie Defense is like a normal Ted Williams Shift against a lefty swinger, but the unique aspect is that the third baseman moves all the way over to where the second baseman normally plays in the Ted Williams Shift, short right field.
Brett Lawrie has been making plays in short right field. Other teams do not use this shift with any frequency and, as a result, our system was making Lawrie look incredibly good. Too good. So we adjusted our system. Our Defensive Runs Saved System (Runs Saved or DRS for short) now removes all shift plays from the calculation for individual players.
First, let us give credit where credit is due: BIS recognized a problem, and has fessed up to it and attempted to correct it. That’s good. What’s unfortunate is the way they have done so, which will give Lawrie a more palatable fielding rating but will do nothing to address the larger methodological problem that led to Lawrie’s outsized rating and continues to affect BIS’s DRS estimates in less outrageous (but probably far more vital) ways.
Let’s go back to what was causing those inflated ratings—the fact that Lawrie was making those plays in an area where the third baseman does not usually make many plays at all, and thus was receiving outsized credit for those plays. But the reason that credit was outsized should be stated explicitly—after all, these were plays that Lawrie was making which presumably helped his team record outs and thus prevent runs and win games. The reason the credit to Lawrie was outsized was because it assumes that if he hadn’t been positioned where he had been, there was almost no chance of a play being made on that ball. From the perspective of the third baseman, that’s true; from the perspective of the team, that’s absurd.
Now when you put the third baseman out in short right field, the problems with this approach become readily apparent. But that doesn’t mean that’s the only time these kinds of problems crop up. We can expect to see this methodology causing problems even on a ground ball hit in the hole between third and short.
Going back to our Retrosheet field diagram and sample data, let’s look at balls hit in the 56 zone. At the team level, the average play-made rate on a ground ball hit in that zone is 0.51. The average play-made rate for third basemen on ground balls hit in that zone is 0.35. If a ground ball is hit in the 56 zone (or I should say, if the scorer claims such) and gets through the infield for a base hit, the third baseman will be evaluated based on .35 expected outs, and the shortstop will be responsible for .16 expected outs, the way a system like DRS is set up. But if the third baseman fields the ball, he’ll get .35 expected outs—and the shortstop will get zero expected outs, since the third baseman fielded it. That’s .16 expected outs that simply vanish into thin air.
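Here is a small sketch of that bookkeeping, using the three 56-zone rates quoted above. The function names are hypothetical, and the mirror-image case (the shortstop fields the ball and the third baseman is charged nothing) is my assumption by symmetry rather than anything BIS has described.

```python
TEAM_RATE = 0.51        # team play-made rate on 56-zone grounders (from the text)
THIRD_BASE_RATE = 0.35  # third basemen's play-made rate in that zone (from the text)
SHORTSTOP_RATE = 0.16   # the remainder charged to the shortstop (from the text)


def expected_outs_charged(fielded_by=None):
    """Expected outs charged to each fielder on one 56-zone grounder.

    fielded_by is "3B", "SS", or None when the ball gets through for a hit.
    """
    charged = {"3B": THIRD_BASE_RATE, "SS": SHORTSTOP_RATE}
    if fielded_by == "3B":
        charged["SS"] = 0.0  # the shortstop is not evaluated on this ball
    elif fielded_by == "SS":
        charged["3B"] = 0.0  # assumed symmetric case, not stated in the text
    return charged


def missing_expected_outs(fielded_by=None):
    """Team-level expected outs that no individual fielder is charged with."""
    total_charged = sum(expected_outs_charged(fielded_by).values())
    # Clamp at zero so float rounding cannot produce a tiny negative number.
    return max(0.0, TEAM_RATE - total_charged)


for outcome in (None, "3B", "SS"):
    label = outcome or "hit"
    print(f"{label:>3}: missing expected outs = {missing_expected_outs(outcome):.2f}")
```

On a ball that gets through, the two charges add back up to the team rate. Whenever one fielder makes the play, the other fielder's share drops out of the ledger.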
So what you end up with, using the methodology outlined by BIS, is a lower quantity of expected outs than actual outs at the league level. Thanks to the extreme example of Lawrie, we know that BIS was not attributing these missing expected outs to the fielder who fielded the ball—otherwise Lawrie would not have had the extreme numbers he had. But since DRS (and rPM, its fielding component) zero out properly at the league level, we know those missing expected outs are being redistributed. What we don’t know is how or why.
So why do these missing expected outs matter? Because at the team level, this leads to a distorted evaluation of fielders, if the missing outs are not tracked and then reallocated at the team level. Depending on which player makes the majority of plays in areas of shared fielding responsibility, some teams may have very different distributions of missing expected outs—and if those are not being properly tracked and allocated, that leads to distorted assessments of fielding for both teams and players.
As Bill James wrote in the New Bill James Historical Baseball Abstract, “Traditional fielding analysis often fails because the fielding statistics of a good team are not very much different from the fielding statistics of a poor team. This is not true of pitching or hitting. … The effect of this is that traditional fielding analysis, starting with individual fielding statistics, will usually rate a bad team as being better defensively than a good team.” The same logic seems to be applicable to nontraditional fielding analysis that starts with individual fielding statistics as well—by not properly accounting for the missing expected outs, you can end up with distorted ratings of the best defensive teams and players.
I used to believe, when I was young and dumb… okay, younger and dumber, that once you had good batted ball data, you didn’t need to do top-down defensive analysis the way Bill James did it in Fielding Win Shares. We still don’t have good batted ball data, of course, but I’m becoming more and more persuaded that if we did, we still would have to adopt the top-down approach, and shared zones of responsibility are a great example of why.
How should we handle plays in the 56 zone, assuming our location data is good? First, we should ignore individual fielders and just count how many extra plays happen in that zone. That gives us our baseline. Then we can look at which fielders are making those plays. If the third baseman is making many more plays than expected, but the overall rate of plays in the zone is roughly what we’d expect from the average team, then we have what looks like a ball-hog. If the total rate of plays in that zone is above average, though, we can say with more confidence that the extra plays being made by the third baseman are actually contributing to his team’s fielding. But we have no way of telling the two apart unless we start with the team-level analysis.
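As a rough sketch of that ordering, team first and fielder second, consider the code below. The class, the function, the 0.02 tolerance, and both sample teams are hypothetical; only the 0.51 and 0.35 league rates come from the 56-zone example earlier.

```python
from dataclasses import dataclass


@dataclass
class ZoneSample:
    balls: int             # grounders hit into the zone
    team_plays: int        # plays made by any fielder
    third_base_plays: int  # plays made by the third baseman


def diagnose(sample, league_team_rate, league_third_base_rate, tolerance=0.02):
    """Compare the team to the league first, then the third baseman.

    tolerance is an arbitrary cutoff for "roughly average" (an assumption).
    """
    team_extra = sample.team_plays / sample.balls - league_team_rate
    third_extra = sample.third_base_plays / sample.balls - league_third_base_rate
    if third_extra <= tolerance:
        verdict = "third baseman is not making extra plays"
    elif team_extra <= tolerance:
        verdict = "ball-hog: extra plays come out of teammates' totals"
    else:
        verdict = "real contribution: the team is making extra plays too"
    return team_extra, third_extra, verdict


# Two made-up teams with identical third-base totals but different team totals.
hog = ZoneSample(balls=200, team_plays=102, third_base_plays=90)
helper = ZoneSample(balls=200, team_plays=122, third_base_plays=90)

for name, sample in (("hog", hog), ("helper", helper)):
    team_extra, third_extra, verdict = diagnose(sample, 0.51, 0.35)
    print(f"{name}: team {team_extra:+.2f}, 3B {third_extra:+.2f} -> {verdict}")
```

Both third basemen look identical in a player-first system. Only the team-level rate separates them.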
Throwing out shift plays will get rid of the most extreme examples of the player-first methodology causing problems with defensive ratings, but it doesn’t solve the underlying problem, which is sadly still present in BIS’s DRS. Without more information from BIS, it’s difficult (if not impossible) to determine how those missing expected outs are being distributed. But unless they’re being distributed based on a team-level overview of fielding performance, they’re causing very real distortions in the estimates of fielding performance. And while those distortions affect no one player as much as they did Lawrie, they affect many more players than Toronto’s shifting has.
Also the shared zone "problem" is a lot less of a problem when you divide the infield into single degree slices, 90 of them, rather than the 8 (!) in the Retrosheet diagram you display here. Also, splitting each into six velocity/timer buckets, as BIS does, helps to further segregate plays.
As for the impact of smaller zones on the shared zone problem, you'll have fewer shared zones that way, yes, but you'll still have them. The Retrosheet numbers were merely to help illustrate the point.
IMO, what we have is pretty good and worthwhile for a lot of analytic purposes. It's just not clear at what level the usefulness of the data breaks down.
We know that there are legitimate questions about fly ball / line drive classifications that I don't think have been sufficiently addressed in most of the data.
We also have seen strong evidence of range bias in the past (e.g. Shane Jensen's paper) that BIS claims to have fixed - but I'm not aware of anyone validating that claim.
I do think the combination of Pitch F/X, Hit F/X and Field F/X could probably do wonders for our knowledge of the game, but unfortunately we'll probably never see the latter two (I'm holding out hope for old data in a few years).
Anyway, my main point is that if you can't identify the biases, nor how to account for them, I don't see how you can feel comfortable using the data.
It's an (unstated) disagreement I have with Tango a lot of times, when he argues that having more data is inherently good. I'm more inclined to argue that data we can't judge the quality of is little more than noise.
I agree the batted ball data isn't perfect, and we all should continue to point that out as appropriate. We can even throw in some snark if that's our writing style.
But persistent trends can be spotted and interpreted in the data. We can have a reasonable level of confidence saying that so-and-so tends to hit more line drives. We can't be 100% confident, but so what? As long as we interpret the data correctly, we've gained something.
30 years ago, we had nothing. I can't believe how spoiled people are! ;)
It seems like it might be an issue where we would have to expand the non-existent confidence intervals around these fielding numbers more so than throw out the results like with Lawrie. If that's true, then the bigger issue is getting people to consistently incorporate those confidence intervals rather than knowing that the interval should be +/- 6 runs instead of 5 or whatever.
There are two kinds of biases:
- random variation
- systematic bias
An example of random bias would be that someone mis-entered all the fielding data for one month, and sometimes marked Beltre as the 3B, and sometimes Zimmerman, and sometimes Longoria, etc. Just no rhyme or reason to it. That data is pure junk (i.e., noise). We end up with 5/6ths of the data being valid and 1/6th of the data being junk. Even knowing that we have that situation, the overall data is still good. It simply adds a level of uncertainty.
An example of a systematic bias is if data is always recorded one way when the fielder is named Derek and another way when the fielder is named Bryan. In this case, more data is WORSE because the systematic bias will rule the results.
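A toy simulation can make the difference concrete. Everything here is an assumption for illustration: the measurements are treated as continuous values with Gaussian noise, the noise size and the 0.05 systematic shift are made up, and the 0.35 "true" rate is simply borrowed from the 56-zone example.

```python
import random

TRUE_RATE = 0.35         # borrowed from the 56-zone example; any value would do
NOISE_SD = 0.20          # hypothetical size of the random recording error
SYSTEMATIC_SHIFT = 0.05  # hypothetical constant bias applied to every record


def estimate_errors(n, seed=0):
    """Return (error with random noise only, error with noise plus a shift)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    noisy = [TRUE_RATE + rng.gauss(0.0, NOISE_SD) for _ in range(n)]
    biased = [value + SYSTEMATIC_SHIFT for value in noisy]
    noisy_error = sum(noisy) / n - TRUE_RATE
    biased_error = sum(biased) / n - TRUE_RATE
    return noisy_error, biased_error


for n in (100, 10_000, 100_000):
    noisy_error, biased_error = estimate_errors(n)
    print(f"n={n:>7}: random-only error {noisy_error:+.4f}, "
          f"systematic error {biased_error:+.4f}")
```

With more records the random-only error shrinks toward zero, while the systematic error settles at the size of the shift, which is why piling on more data does not help in the second case.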
So, the question we have is how much systematic bias is there in the data. And can we somehow account for it?
For example, Coors is a systematic bias if we look at Todd Helton's hitting stats. But, if we can account for that, if we know that Todd Helton in fact hit 50% of the time at Coors, and if we know how he did at Coors, then we can handle that.
The issue with the fielding data is that we're not entirely sure how much systematic bias there is. Does that mean that because we don't know, we should just throw all the data away? Well, you can argue that. You can also argue that you can still use the data and increase your uncertainty level.