November 23, 2009
Checking the Numbers
The increased relevance of defensive metrics in recent years has led to a bevy of cost-cutting activity across the league, as teams are beginning to exhibit a greater understanding of the value of saving runs with the glove. Solid defense is tantamount to success, but it does not translate to deeper bank accounts like the mighty whopping stick. Even amongst some statistically savvy fans, offense garners more value than defense for reasons like the perceived irrefutability of batting data in comparison to fielding subjectivity, how the goals of fielding metrics seem to be more abstract than the what-you-see-is-what-you-get numbers on offense, and how advanced defensive statistics are still fairly new additions to the baseball vernacular. Subjectivity creates doubt, which leads to distrust and skepticism, making it difficult for some to wrap their heads around how a relatively mid-pack player like Mike Cameron (4.8 WARP1) could actually have been more valuable than a masher like Jason Bay (4.4 WARP1).
The basic reason for this ranking reversal is that most of Cameron's value is tied up in his defense. The doubt surrounding how valuable that is, however, leads to "yeah, but…" statements issued with the intention of knocking his performance down a few pegs. Admittedly, statistics like UZR, FRAA, and Plus/Minus should not be treated as gospel, but their different, rigorous methodologies paint fairly accurate portraits of player value, and are worthy of acceptance as evidence. Changing everyone's mind on the subject will not be achieved overnight, however, so to expedite the process, it's worth exploring the reasons for doubt as well as why the data can clash with personal opinions of fielding prowess on occasion, because in the end subjective evaluations seem to comprise the leading source of skepticism thrown in the way of advanced fielding metrics.
The aforementioned statistics are not natural analogs for commonly used numbers like batting average or home runs. The outcomes behind those offensive numbers are not strictly binary, but they are much more straightforward and easy to comprehend. The defensive equivalent in this regard is fielding percentage: the play was made cleanly, or an error resulted, with no gray area in between. [Ed. note: In the official scorer's opinion.] We know that fielding percentage is practically worthless as an evaluative metric because errors are not always bad (perhaps a shortstop records an error on a ball that half the league wouldn't even reach), and because, with several other factors looming large in the nuts-and-bolts foundation of what constitutes a solid fielder, that gray area matters more than anything else. Other factors interfere as well, specifically involving the exclusion of baselines.
For instance, a .280/.380/.540 slash line looks fantastic, and it will be treated as such regardless of whether or not the league average was .260/.360/.515; evidence of this phenomenon can be found in Marc Normandin's fantasy articles involving first basemen, where the replacement level for the position is so high that hitters deemed superb in a vacuum do not rate all that highly once they're evaluated within the appropriate context. Certain statistics normalize data for these situations, but not everyone measures performance from a relative standpoint. Even if they did, it is much easier to conjure up an image of an average or replacement-level hitter than to qualitatively figure out what an average fielder looks like, making difficult the task of pinning down what we even want to measure and how to structure the results.
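To make the baseline point concrete, here is a minimal sketch of how the same batting line rates against two different league contexts. It uses a simplified, non-park-adjusted index in the style of OPS+ (100 times the sum of the OBP and SLG ratios, minus one), which is an illustration and not any particular site's published formula. The .380/.540 line and the .360/.515 league average come from the paragraph above; the lower-offense baseline and the function name `relative_index` are hypothetical.

```python
def relative_index(player_obp, player_slg, league_obp, league_slg):
    """Simplified OPS+-style index: 100 means league average.

    No park adjustment is applied; this is only a sketch.
    """
    return 100.0 * (player_obp / league_obp + player_slg / league_slg - 1.0)


# OBP and SLG from the slash line discussed in the text.
player = (0.380, 0.540)

# League average quoted in the text (a high-offense context).
high_baseline = (0.360, 0.515)

# Hypothetical lower-offense context, invented purely for contrast.
low_baseline = (0.330, 0.420)

idx_high = relative_index(*player, *high_baseline)
idx_low = relative_index(*player, *low_baseline)

print(f"vs. high-offense baseline: {idx_high:.1f}")
print(f"vs. low-offense baseline:  {idx_low:.1f}")
```

The identical raw line sits only modestly above average against the stronger baseline and well above it against the weaker one, which is the sense in which a line that looks superb in a vacuum may not rate highly in context.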
The Victorino/Beltran Conundrum
Simply put, perception of effort is directly proportional to opinions of fielding prowess. Shane Victorino is widely regarded as a very good fielder, taking home two consecutive Gold Gloves (insert your own generic anti-Gold Glove comment here) while boasting the reputation of a high-octane speedster who covers plenty of ground. Unfortunately, these characteristics overstate what he brings to the table, especially in relation to Carlos Beltran, who covers more ground (at least before last season's injuries) but in a seemingly nonchalant fashion. If each player starts in the same position and ranges to field an equidistant fly ball, Victorino's hustle will create the illusion of a better play. If Beltran appears to have had little trouble reaching the ball, the play looks routine, and our minds tend to veer off in the wrong direction, assuming that Victorino's ball was tougher to reach and that, by virtue of that, he made the better play.
In actuality, Beltran can travel greater distances with less effort, but his catch appeared mundane and no different from the dozens of fly-ball outs a fan sees each and every game, becoming easily forgettable in the process. Spatial recognition issues based on the television angles presented and the areas to which attention is drawn preclude us from grasping exactly how much distance was actually traveled. When this occurs, the backup generator in our minds reverts to assuming equal talent amongst those being compared, using effort as the tiebreaker in judging the quality of a play and player.
Things should work in the opposite fashion, with a realization that Victorino needed all of that extra effort to stand a fighting chance of recording an out, whereas Beltran put little doubt in the minds of anyone as to the play's eventual result. Had Victorino made a diving catch, his actual range would be immaterial; a diving play is perceived to be great no matter what. In relation to his own abilities, the catch could be considered great, but the fact remains that he had to work harder to glove the ball. Beltran naturally reached the spot at which the ball would descend and should be lauded for having displayed such great range.
With that in mind, when Victorino does not rate as highly in a specific advanced fielding metric, it annoys fans who swear he ran his heart out, covered ground, and made fantastic plays, losing sight of the fact that he might have made certain plays look tougher than they were. This has the compound effect of making the balls he cannot reach seem unreachable for everyone else, an inaccurate extrapolation based on a faulty initial assumption.
Victorino and Beltran were merely examples here, so try not to get too caught up in the specific names and focus on the concept itself. Overall, as our perception of effort increases, so too does the perceived fielding value of the player.
Lost in Isolation
Say that a sharp one-hopper is hit down the first-base line and Mark Teixeira reacts quickly, dives to his left and snares the ball, making a great play on a hard-hit ball. From an absolute standpoint, little doubt exists that he made a play worthy of applause. Neglected at the time of the play is the realization that several other first basemen would have delivered the same result. It becomes a tough task to dispute the merits of T-Rex's play, because fielding more than anything else tends to be evaluated from an absolute and not a relative standpoint, even though the metrics bear relative results. Whether or not Derrek Lee would have yielded a similar result becomes irrelevant because Teixeira actually did make the play, preventing an extra-base hit in the process.
This particular reason for metric distrust lends itself to choice, really, in that there is nothing wrong with evaluating fielders in the absolute, crediting a play regardless of whether or not others at the position would have repeated the act. Absolute and relative measures boil down to whether the interest lies in gauging fielding talent as a whole, or rating members of a group against one another, not a straightforward exercise since fielders can have an ample talent supply while posting below average fielding marks. If Elvis Andrus, Jimmy Rollins, Yunel Escobar, Jack Wilson, and Adam Everett were the inputs to a UZR or Plus/Minus type of metric, the relative basis of comparison would peg one or two of these terrific fielders as below average, simply due to the relativity.
Reconciling the fact that a below-average fielding mark does not always equate to a below-average fielder proves difficult given that the opposite can be applied to various walks of life; it's akin to getting a 90 on a test, a very good score, but one that pales in comparison to the 96 class average. Teixeira's below-average marks this season do not necessarily mean there's a lack of ability, but likely point to some confluence of an improved relative baseline, the inherent small sample of one year, and his own performance.
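The relativity argument can be sketched with a toy calculation. The five fielders below are anonymous placeholders with invented runs-saved totals, not real players or real UZR or Plus/Minus outputs, and the helper name `relative_to_group` is hypothetical. The sketch assumes only that a relative metric re-centers each player on the average of whoever is in the comparison group.

```python
from statistics import mean


def relative_to_group(values):
    """Re-center each value on the group average (0 means group-average)."""
    group_avg = mean(values.values())
    return {name: v - group_avg for name, v in values.items()}


# Hypothetical runs saved versus a fixed, league-wide benchmark.
# Every one of these placeholder shortstops is a strong fielder in absolute terms.
absolute_runs = {"SS_A": 15.0, "SS_B": 12.0, "SS_C": 10.0, "SS_D": 8.0, "SS_E": 5.0}

relative_runs = relative_to_group(absolute_runs)
below_average = sorted(name for name, r in relative_runs.items() if r < 0)

# The test-score analogy from the text: a 90 against a 96 class average.
test_score_vs_class = 90 - 96

print(relative_runs)
print("Below the group average:", below_average)
print("Test score relative to class:", test_score_vs_class)
```

All five placeholders are positive against the fixed benchmark, yet two of them come out negative once the group itself becomes the baseline, in the same way that a 90 looks poor next to a 96 class average.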
The Holliday Corollary
One year of fielding data is not nearly enough off of which to base definitive conclusions given the small samples of balls in play in the various fielding bins. It should go without saying that one month or one series represents a minuscule sample, right? Then why has it become commonplace to extrapolate one or two series' worth of fielding performance out over an entire season or career? Matt Holliday made a couple of boneheaded plays during the 2007 postseason, one in particular in which a playable fly ball soared over his head. He also made the heavily criticized fielding boner in the 2009 NLDS against the Dodgers, although the lighting earned a share of the blame.
For those who do not follow Holliday on a regular basis, it is almost impossible to fathom that fielding systems rate his efforts positively. I mean, we saw five games over a three-year span, and he stunk in three of them, so he must be bad, right? That's very much not the case, but while this feels so obvious to some, it can be incredibly tough to avoid and can eventually evolve into a form of diagnosis bias, which rarely produces positive results and can set things back without any realization. As we will get into in a bit with a peer of his, in more ways than one Holliday's burly physique and those blunders can make it difficult for some observers to subjectively consider him a solid glove man.
My Buddy Says…
I have never seen an episode of The Wire, but seemingly everyone on the planet who has spent time watching claims it is one of the greatest television shows of all time. When the time comes to sit down and binge my way through the DVDs, it is more than likely that I will attribute positive connotations to certain aspects of the program, aspects that might not stand out in other situations, because of the expectations and the idea that so many people in agreement could not be wrong. Either that, or my subconscious will steer me in the direction of a personal vendetta in which the opposite occurs and I purposely look for reasons to hate the show. If a fielder develops a sterling reputation, human nature chimes in with a glass-half-full approach; plays he cannot reach are considered unreachable, and those he gets to, even of the most routine variety, are lauded.
This form of diagnosis bias was definitely perpetrated in the 1980s, the golden age of San Pedro de Macoris in the Dominican Republic, which suddenly developed the reputation of being the wellspring of top-tier defensive shortstops. Suddenly, the league was overrun with the likes of Tony Fernandez, Manny Lee, Manny Alexander, Rafael Ramirez, and Jose Offerman. Regardless of their individual fielding merits, teams became interested in players of this ilk because SPdM had been diagnosed as a great provider of defensive relief. Seriously, what are the odds that five big-league starting shortstops in the '80s and early '90s all came from the same Dominican city? Essentially, my contention here is that some of our opinions on fielders are not our opinions at all, but rather a combination of what others have publicly written or stated, and that this door swings both ways, with the mind either subconsciously buying into the idea or vehemently opposing it.
An oldie but a goodie, Nichols' Law is the shorthand for the Nichols Law of Catcher Defense, a theory proposed by Sherri Nichols on rec.sports.baseball in the early '90s. The theory goes that the defensive reputation of a catcher is inversely proportional to his offensive abilities. In other words: Paul Bako has to be a great defender because that is the only feasible reason a hitter as putrid as he could enjoy an 11-year career. The law can certainly be applied to other positions; James Loney comes to mind, since he looks like a slick fielder and has taken steps backwards in terms of offensive development. If he were more of an offensive force, discussions of his fielding would not come up as frequently. Conversely, Albert Pujols is one of the best-fielding first basemen in the game today, and yet he is seldom mentioned for this facet of his game, aside from compliments in passing. Of course, that might be partly caused by the next course in our fielding skepticism meal.
Body by Bay
Jason Bay looks like a fit and athletic guy, and he doesn't make a ton of overt mistakes in the field, leaving legions of fans puzzled after referring to any one of several advanced defensive metrics. They're left swearing that the results must be confusing Jason for director Michael Bay, though the fielding ineptitude of the latter would be excused given his threats to add in CGI explosions and Shia LaBeouf whenever a ball comes his way. Regardless, it has become fairly obvious that even when accounting for the Green Monster, Jason Bay cannot field, a hard conclusion to grasp given that his below-average ability deals more with his poor range, not dropped balls. Of course, there's also the problem that he looks like he should be able to field; if he bore a resemblance to Rich Garces and played his terrible left field, nobody would blink at the suggestion that he's a bad fielder, but expectations can come from body type and perceived athletic ability. Big-yet-fit players like Holliday and Pujols generally will not get much credit, while we automatically assume that players like Joey Gathright and Chris B. Young bring the leather.
The Halo Effect
The Halo Effect is a combination of the perceived effort and the diagnosis bias, in that players who develop reputations tend to keep those reputations well past their expiration dates. Torii Hunter will never be considered a poor fielder even when he becomes a poor fielder, because he constantly shows up on SportsCenter, plus he can still move around out there and make impressive plays, and besides, he's a great ambassador for the game, and robbed Barry Bonds of a dinger in that All-Star Game several years back. Even though the numbers suggest Hunter has been steadily declining defensively for a few seasons, polls revolving around the top defensive center fielders amongst the broader fan base will undoubtedly showcase Hunter towards the top of the list. This phenomenon can also manifest itself in forms of potential, wherein sloppy fielders with oodles of raw tools are consistently given more chances to harness their talent.
As you can see, there are very real and definitive reasons why fielding metrics garner skepticism, and there is much to overcome before they grow in popularity. These reasons also don't solely apply to less statistically oriented fans, as I'll readily admit it felt odd to type the Cameron/Bay comparison at the top of this piece, even in the face of my staunch support of UZR, Plus/Minus, FRAA and various other metrics. It isn't so much that I do not trust the outputs, but rather that the tiniest bit of doubt (especially given the error bars and how there really isn't much difference between a +6 and a +10 fielder) tickles my interest bone much more than a comparison of their respective OBP/SLG rates. With the increased granularity of readily available information, defensive metrics will continue to be fine-tuned, becoming more accurate in the process, and while it will definitely take time before they garner mainstream acceptance, understanding the causes of a problem is always the first step toward fixing it.