March 7, 2003
Getting Defensive: Advanced Concepts
In Part One, I took a walk through the big fielding stats: errors and fielding percentage, Range Factor, and Defensive Average/Zone Rating. Here, we'll talk about three of the more advanced fielding statistics.
In 1984 the coolest book ever was published, The Hidden Game of Baseball. Written by Pete Palmer and John Thorn, The Hidden Game gave birth to Linear Weights, a system that translates each event on the field into a value in runs. Linear Weights would go on to become the basis of Total Baseball's signature stat, TPR, which is derived from a combination of Batting Runs (offensive performance) and Fielding Runs (yep, you guessed it, glovework).
Fielding Runs uses different formulas for different positions to come up with a number for each position that's then compared to the league average for that position. For example, a first baseman's Fielding Runs might be calculated as .20(2A-E), which should raise an eyebrow immediately: If a first baseman tosses the ball to the pitcher every time because he's too slow to get to the bag, he gets credit toward a Fielding Run. But if he takes it back himself, he doesn't. And don't think that this doesn't happen in real life--as Clay Davenport noted in Baseball Prospectus 2002, Bill Buckner is an example of someone who sees his Fielding Runs driven through the roof because of this. In contrast, if a first baseman has good range and hands, and snags a lot of line drives for outs, he would get no credit for those putouts.
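To put numbers on the toss-versus-unassisted problem, here is a minimal sketch that takes the .20(2A-E) form above at face value. The function name and the two first basemen (100 grounders and 5 errors each) are made up for illustration; they are not from Total Baseball or from any real season.

```python
def first_base_fielding_runs(assists, errors):
    """Raw first-base Fielding Runs, using the .20(2A - E) form quoted in the text."""
    return 0.20 * (2 * assists - errors)


# Hypothetical players: each fields 100 grounders near the bag and makes 5 errors.
# The "tosser" flips every ball to the pitcher covering first (100 assists).
# The "runner" takes every ball to the bag himself (100 unassisted putouts, 0 assists).
tosser = first_base_fielding_runs(assists=100, errors=5)
runner = first_base_fielding_runs(assists=0, errors=5)

print(f"tosser: {tosser:+.1f} raw runs")
print(f"runner: {runner:+.1f} raw runs")
print(f"gap:    {tosser - runner:.1f} raw runs for the same 100 outs")
```

Under this form the two players end up 40 raw runs apart, and all of it comes from who carried the ball to the bag.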
Then Fielding Runs divides up the credit for those runs (or lack thereof) through unspecified technical means involving playing time as measured in plate appearances and "each player's entire fielding record." Total Baseball offers the formula for calculating an infield player's contributions as:
FR = .20 * (PO + 2A - E + DP, by player) - LgAvg * (PO by team - K by team) * (inn. by player / inn. by team)
The other problem is that Fielding Runs doesn't work that well. We can say that a run is bad and an out is good. But say that in an average game, 4.5 runs are scored. Does that make each out worth 4.5/27? Why are the different events valued so differently? Should assists be valued so highly because, as Total Baseball puts it, "more fielding skill is generally required to get one than to record a putout"?
Fielding Runs also suffers from the problems I talked about in the first installment of this series: it is greatly affected by pitching staffs and park factors. In that essay, we talked a little about the difference in groundball staffs, and particularly how Yankees pitching produced 200 fewer ground balls than Orioles pitching did. Evenly distributing those grounders could easily mean 50 extra assists for Orioles shortstops (Bordick, etc.) over someone like, say, Derek Jeter. That's nutty.
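For a sense of scale, here is a rough sketch that plugs 50 extra assists into the .20 multiplier and the 2A term of the Total Baseball infield formula quoted earlier, ignoring the league-average and playing-time terms. The constant and function names are mine, and the 50 is the back-of-envelope figure from the paragraph above, not a measured total.

```python
RUN_SCALE = 0.20     # the ".20" multiplier in the quoted formula
ASSIST_WEIGHT = 2    # the "2A" term: each assist counts double


def raw_runs_from_assists(assists):
    """Raw Fielding Runs contributed by assists alone, before any league comparison."""
    return RUN_SCALE * ASSIST_WEIGHT * assists


extra_assists = 50   # rough staff-driven difference in chances, per the text
bonus = raw_runs_from_assists(extra_assists)
print(f"{extra_assists} extra assists -> {bonus:.0f} raw Fielding Runs")
```

On those assumptions, that is roughly 20 raw runs handed to a shortstop by his pitching staff before his glove enters into it.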
As I did in the last installment, I took a rough cut at looking at Fielding Runs as they correlated to runs allowed for last year's teams. Team Fielding Wins came out barely better than team fielding percentage (-.404 to -.398). I'm not really surprised by this. Fielding Runs are, after all, built on the same component parts as fielding percentage, even if the more complicated math puts Fielding Runs in a different context.
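For anyone who wants to repeat that rough cut, here is a sketch of the calculation, assuming the correlation in question is an ordinary Pearson coefficient (the text doesn't say which kind was used). The six team rows are placeholder numbers made up so the code runs; they are not 2002 data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Placeholder (team fielding stat, runs allowed) pairs: NOT real teams.
fielding_stat = [2.1, -1.4, 0.3, -0.8, 1.2, -2.0]
runs_allowed = [690, 760, 735, 700, 745, 720]

r = pearson(fielding_stat, runs_allowed)
print(f"r = {r:.3f}")
```

A negative r means better fielding numbers go with fewer runs allowed. Values around -.4, like the two quoted above, describe a weak relationship.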
Recent developments have taken a markedly different approach. Baseball Prospectus has for years published Clay Davenport's Fielding Translations, which begins by defining an overall value for the team and then dividing it up between fielders and pitchers. In BP 2002 there's a long, detailed explanation of all this that I'm simply not going to do justice to. Fielding Translations assigns responsibility for hits 70% to fielders and 30% to pitchers; home runs, bases on balls, hit batters, and strikeouts are all assigned to the pitcher, and then these stats are crunched to produce Equivalent Averages for the fielders and pitchers. They are calculated as:
And then, after all those individual numbers are put together, they're totaled and compared to the number you "want" to come up with, and then they're all tweaked proportionally until it comes out right.
I've always been uncomfortable with that last bit. It's a matter of taste, but any time you're trying to arrive at a conclusion, and invent a tool to reach the conclusion you want, I don't like it. I'd almost rather have the numbers not match up to the team totals and chalk it up to random variance than do that last massaging step. That said, I believe Clay's fixed that for future translations.
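The massaging step described above amounts to a proportional rescale. The sketch below is my reading of "tweaked proportionally," not Clay's actual procedure, and the group labels and run values are invented for illustration.

```python
def rescale_to_target(individual, target_total):
    """Scale every individual value by the same factor so they sum to target_total."""
    current_total = sum(individual.values())
    if current_total == 0:
        raise ValueError("cannot rescale values that sum to zero")
    factor = target_total / current_total
    return {name: value * factor for name, value in individual.items()}


# Hypothetical runs-allowed responsibility, built from the bottom up.
bottom_up = {"pitchers": 300.0, "infield": 250.0, "outfield": 200.0}
team_runs_allowed = 720.0   # the number you "want" the parts to add up to

adjusted = rescale_to_target(bottom_up, team_runs_allowed)
for name, value in adjusted.items():
    print(f"{name:9s} {bottom_up[name]:6.1f} -> {value:6.1f}")
```

Everyone's share of the total is unchanged and only the scale moves, so the 30-run gap between the bottom-up sum and the team figure disappears without anyone learning where it came from.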
Anyway, Fielding Translations are much, much more complicated, but they incorporate a number of improvements--assigning balls that two fielders can take, the ground ball/fly ball distribution, and adjustments for lefty/righty pitching staffs (which results in balls to different sides of the field). It's a seven-page explanation that easily could have run much longer. The end result is similar to Fielding Runs: a number of runs the player was worth above or below an average fielder at his position over the course of a year.
The biggest flaw I see in FTs is the 70/30 split in hit responsibility. Voros McCracken's initial proposal was that pitchers have no effect on whether balls in play go for hits or not. The debate is still going on, but I think the pitchers' share is lower than the 30% the FTs use, for starters, and shifting another 10% or 20% of the responsibility to the fielders would make a huge difference. There's always room for incremental improvements in the positional formulas, too.
How do Fielding Translations fare in terms of correlation? Well, using the raw EqA formulas as detailed in BP 2002 and no park-adjusted data, I calculated the team fielding lines and got a correlation over .900--which is OBP-to-runs-scored territory. Standard disclaimers apply here: These aren't park-adjusted team lines supplied by Clay, they're me clunking around on a spreadsheet while drinking Olympia in cans (look, if you're going to drink cheap beer, it might as well be not-that-bad cheap local beer, as long as local breweries still exist). That said, that DTs work on a team level isn't particularly instructive, because they start from the top divvying up the runs. It's the assignment on an individual level where it would break down, if it does, and because we don't have a good measure to compare it against, the only conclusion we can make for sure is that all of a team's fielding DTs will add up to a good number.
The other big hitter in this debate is Bill James' Win Shares. You may have seen the huge version in his book Win Shares, or the condensed version in the New Bill James Historical Baseball Abstract, which includes fielding shares. Right off, it starts with the same problem I had with Clay's little massage at the end of the process. Here's how this breaks down: Each team gets three Win Shares for a win in a given season, and those Win Shares are assigned to hitting and defense.
It's a similar approach to Clay Davenport's work on defense. Or, to quote James briefly, "There is no natural relationship between individual fielding statistics and team success. How do you fix that? [...] Establish the overall defensive quality of the team. Then you transfer credit (or blame) for that performance to the individuals on the team..." (James, NBJHBA, p. 354).
From there, defensive Win Shares are divided between pitching and fielding. I'm simplifying, again--there are formulas for the division and assigning of Win Shares, and limits on how many Win Shares any component can be assigned to (and, again, if you don't like the way your numbers come out, I don't think you should place arbitrary limits or fixes, you should be re-visiting how you got to the number you don't like, or thinking about whether you should like that number after all). Then the Win Shares trickle down to positions and to the fielders who played that position using Claim Points.
The Claim Points system for each position then evaluates positional defense based on their accumulation of counting stats (with some adjustments) in weighted categories, compared to what they'd expect from an average positional defense. Then these are turned into points, the points into percentages, and then re-weighted by values James chooses based on the defensive responsibility of each position. Then the Win Shares are divided among the positions, and then among fielders.
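The top-down flow can be sketched as repeated proportional splits. This shows only the shape of the process: the three-shares-per-win rule comes from the text, but the team, every weight, and every claim-point figure below are placeholders of mine, not James's values or formulas.

```python
WIN_SHARES_PER_WIN = 3   # stated in the text


def split(total, claims):
    """Divide total among claimants in proportion to their claim points."""
    claim_sum = sum(claims.values())
    if claim_sum <= 0:
        raise ValueError("claim points must sum to a positive number")
    return {name: total * points / claim_sum for name, points in claims.items()}


team_wins = 90                                   # hypothetical team
team_shares = WIN_SHARES_PER_WIN * team_wins     # 270 shares to hand out

# Every weight below is a made-up placeholder, not one of James's values.
by_side = split(team_shares, {"hitting": 50, "defense": 50})
by_unit = split(by_side["defense"], {"pitching": 65, "fielding": 35})
by_position = split(
    by_unit["fielding"],
    {"C": 10, "1B": 5, "2B": 15, "3B": 10, "SS": 20, "OF": 40},
)

handed_out = by_side["hitting"] + by_unit["pitching"] + sum(by_position.values())
print(f"shares handed out: {handed_out:.1f} of {team_shares}")
```

Whatever weights go in, the shares handed out always add back up to three per win. The individual numbers are only as good as the weights chosen at each step.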
Taken as a whole, Win Shares does things I don't want to see in a measuring statistic: It pushes wins around, and it contains numerous apparently arbitrary weightings and measures, categories, and formulas.
I don't have a correlation between team fielding Win Shares and 2002's raw runs allowed. I ran out of time, and to be totally honest, I wasn't that motivated, because at the team level it's going to have a high correlation like the DTs did. When you're divvying up actual team success, you've got to go pretty badly wrong to have something show up as a weak correlation. It's the nature of correlation. For instance, let's say that I devise a new stat called "Super Factor X" which is responsible for a small percentage of team fielding. Then I divide the runs a team allows three ways: 41% pitching, 58% fielding, and 1% Super Factor X. Because it's a percent of the overall runs allowed, it's going to have a 100% correlation with actual runs allowed--teams that allow more runs will have a high Super Factor X score, even though Super Factor X may be completely bunk.
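The Super Factor X point is easy to check. In this sketch the runs-allowed totals are invented (not 2002 data), the 41/58/1 split is the one from the paragraph above, and the correlation is assumed to be an ordinary Pearson coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical team runs-allowed totals: placeholders, not real teams.
runs_allowed = [640, 675, 700, 730, 765, 810]

# Carve each team's runs allowed into fixed percentage slices.
slices = {"pitching": 0.41, "fielding": 0.58, "super_factor_x": 0.01}
correlations = {}
for name, share in slices.items():
    values = [share * ra for ra in runs_allowed]
    correlations[name] = pearson(values, runs_allowed)
    print(f"{name:15s} r = {correlations[name]:.6f}")
```

Every slice, including the bunk 1%, correlates perfectly with runs allowed, because each one is just runs allowed multiplied by a constant.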
No one argues that we should sift through offensive statistics by comparing how different batters looked at the plate. But that's almost what we're reduced to doing with defensive statistics: To see if Win Shares, or fielding DTs, do a good job, we have to compare them to individual-level stats that aren't very good themselves, and our own flawed eyes.
Among these three advanced stats, I personally look at the Fielding Translations. I'm biased, obviously, but I think Fielding Translations are as close as we have to a complete and reasonable evaluation of defensive contributions. That doesn't mean I don't disagree with them frequently, but I feel like I'm evaluating bosses or political systems: "You're the least worst option I've ever come across."
Where do we go from here? I'm convinced that somewhere there's a stat, good enough and simple enough in the way OBP is, that the average fan will be able to use on fielders. It's why I spend a lot of time looking at how many outs are generated off fly balls by team outfields, and things like that, when I'm not tied up trying to answer my always-full email queue. The solution's got to come from somewhere.
At the same time, the debates on fielding/pitching responsibility and on how to cope with issues like the righty/lefty and groundball/flyball makeup of pitching staffs result in the constant refinement of measures like Fielding Translations. Maybe the solution is indeed in finer and finer work to combine the level of responsibility of rate measures like ZR with difficulty adjustments and team context.
For now, I believe that team scouting can match and even beat statistical analysis. Teams investing in quality video setups could conceivably compile a play-by-play database that would include precise ball location, fielder position at ball contact, type of pitch, batter handedness, time from contact to fielder, ground covered from initial positioning to ball...oh, I'm drooling just thinking about this. Even with minimal sorting, teams can see how a shortstop played going to his left over the season, for instance, or how he played line drives, and from there work with the player, or change his positioning. No stat today offers anything as good as a quality video department can.
And at a lower level, these are the things a good scout sees--the quick first step, the ability to cover ground, to take the best path to balls, to make plays on balls others don't even get to. In some respects, evaluation by scout eyeballing is farther ahead than mainstream fielding statistics. Scouts in the minor leagues will tell you that they're much more interested in how many balls a player reaches than how many errors he makes. But every Sunday in the expanded sports section, we get errors and fielding percentages.
Strides have been made, but the long road remains ahead.