July 21, 2010
Checking the Numbers
To Subtract or Divide
In this day and age, baseball players are defined by their statistical attributes much more than they were a few decades ago. That isn’t to say that stats rule all by any means, but rather that teams are starting to be built with more of an eye toward numbers than in the past or at least with an eye toward numbers that provide more information. We have witnessed the defensive revolution. This past offseason, not only did the Red Sox make a conscious effort to bring aboard the darlings of fielding metrics—Mike Cameron, Marco Scutaro, and Adrian Beltre—but teams shied away from the likes of Jermaine Dye, who averaged 33 home runs and a .279/.347/.528 line over the last four seasons, because his overall contributions were not in line with his asking price. And last offseason, the glut of hard-hitting but poor-fielding corner outfielders suffered financially; it’s hard to imagine players with skill sets similar to those of Adam Dunn and Bobby Abreu being offered so little even just a few years ago.
Simply put, with decisions hinging upon some of these numbers, it is imperative that users of the information not only utilize the appropriate toolkit but that they develop a solid understanding of why certain metrics are used. You don’t want to bring a knife to a gunfight, but on a more granular level, it also isn’t smart to bring a butter knife to a cleaver battle if such things exist. My favorite television show growing up was "The X-Files," so it should come as no surprise that my goal as an analyst has always been to spread the truth in whatever way possible. My goal today is to use a topic I recently wrote about as well as a couple of the ensuing comments to revisit what numbers we might use in a specific situation as well as why that specific number is used.
Over the last two weeks I have written about Cliff Lee’s strikeout-to-walk ratio, breaking down the metric itself, comparing his current rate to the single-season highs over the last few decades, and comparing his current rate to the rates of others through a similar point in the season. The articles found that nobody has ever had a K/BB ratio as high as his currently stands this deep into the season and that it would take a relative implosion—given where he is, a ratio of around 4.50 would be considered implosive—for Lee to not break the single-season K/BB record with a 150-inning minimum set by Bret Saberhagen at 11.00 in the strike-shortened 1994 campaign.
In the more recent piece, I took the 10 highest rates at a similar point in time and found their respective rates from that point forward. David Wells, at 14.50 on June 27, 2003, had the highest non-Lee rate, with Curt Schilling’s 13.64 on June 14, 2002, coming in second place. The contrast in how these two pitchers arrived at their rates exposes the shortcomings of the K/BB ratio in general, as Wells’ 14.50 consisted of 58 strikeouts and four walks, while Schilling whiffed 150 and issued just 11 free passes. Wells avoided walks, but so did Schilling, and the latter punched out more than two and a half times as many hitters. In other words, Wells posted the higher rate, but the inputs to Schilling’s rate added more value.
Value is the key, as the goal of most evaluators and analysts is to advise or discuss players in terms of the value they can add to a team. Wells issued a minuscule number of walks, but Schilling prevented many, many more hitters from doing harm to his team by preventing them from even making contact. Value can be tough to measure as well because some of the more telling numbers are not stable, meaning that they fluctuate from year to year. A pitcher with a 3.65 ERA in 2009 isn’t a lock to come anywhere near that mark in 2010, and so predictivity—a word I’ve wanted to use for a while but over which I was afraid to bypass the red squigglies—plays a major role in assessing value.
A player whose solid season is built on numbers likely to repeat is more valuable than one like Kyle Kendrick in 2007, who posted a sub-4.00 ERA despite very poor peripherals. All of which brings us back to the Wells vs. Schilling conversation that surfaced in the comments of my articles. Several readers pointed out that Schilling produced a lower rate but added more value. It was also noted that the correlation of K/BB from one half of the season to the other across the 10 pitchers tabled was negligible, while the correlation of the K-BB differential was quite high.
The differential simply subtracts walks from strikeouts, so even though Wells had the higher rate, his +54 paled in comparison to Schilling’s +139. The correlations were important because if the goal is to measure a specific aspect of performance (limiting walks and whiffing batters, in this case) and one metric proves more telling than another, we want to use the more informative one. The K/BB ratio is more commonly used because of its familiarity, but if K-BB or (K-BB)/PA is more predictive, then it is a better indicator of value because it offers more help in making decisions.
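The Wells and Schilling arithmetic is easy to check directly. The snippet below is a minimal sketch using only the strikeout and walk totals quoted in this article; the `PitcherLine` class and its method names are illustrative, not part of any existing library, and returning infinity for a walk-free line is simply one possible convention.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PitcherLine:
    """Strikeout and walk totals for one pitcher over some span (illustrative type)."""
    name: str
    strikeouts: int
    walks: int

    def k_bb_ratio(self) -> float:
        # The ratio is undefined with zero walks; infinity is an assumed convention here.
        if self.walks == 0:
            return math.inf
        return self.strikeouts / self.walks

    def k_bb_diff(self) -> int:
        # The differential is well defined even when a pitcher has issued no walks.
        return self.strikeouts - self.walks


# Totals as quoted in the article.
wells = PitcherLine("David Wells, through June 27, 2003", strikeouts=58, walks=4)
schilling = PitcherLine("Curt Schilling, through June 14, 2002", strikeouts=150, walks=11)

for p in (wells, schilling):
    print(f"{p.name}: K/BB = {p.k_bb_ratio():.2f}, K-BB = {p.k_bb_diff():+d}")
```

Wells wins on the ratio and Schilling wins comfortably on the differential, which is the whole disagreement between the two measures in miniature. Dividing the differential by plate appearances would need PA totals that are not quoted here, so that step is left out.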
Is it a better indicator of value? If so, we would expect it to correlate more strongly than the K/BB ratio with a common value-laden metric. To that end, I pooled every pitcher with 150 or more innings in a season from 2000-09 and ran a correlation measuring the relationship between (K-BB)/PA and ERA, then repeated the test for K/BB and K/UBB (strikeouts per unintentional walk) to see which bond proved stronger. Across the 960 pitcher-seasons from 2000-09 with 150 or more frames, the r for K/UBB to ERA is -0.48, for K/BB to ERA is -0.49, and for (K-BB)/PA to ERA is -0.58. For those wondering whether or not these numbers are significant, the answer is yes because, in the context of baseball, correlation coefficients with an absolute value above 0.45 tend to matter more than they would in many other fields.
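For readers who want to try this kind of comparison themselves, here is a rough sketch of the calculation. It assumes the r values are ordinary Pearson correlation coefficients. The five pitcher-seasons in `sample` are invented purely to make the code run; they are not real players and will not reproduce the figures above, which came from the full 2000-09 pool.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# HYPOTHETICAL pitcher-seasons, made up for illustration: (K, BB, PA, ERA).
sample = [
    (200, 40, 850, 3.10),
    (150, 60, 800, 4.20),
    (120, 45, 780, 4.60),
    (180, 70, 830, 3.90),
    (100, 30, 760, 4.40),
]

era = [row[3] for row in sample]
ratio = [k / bb for k, bb, pa, _ in sample]          # K/BB
diff_rate = [(k - bb) / pa for k, bb, pa, _ in sample]  # (K-BB)/PA

r_ratio = pearson_r(ratio, era)
r_diff = pearson_r(diff_rate, era)
print(f"K/BB vs ERA:      r = {r_ratio:.2f}")
print(f"(K-BB)/PA vs ERA: r = {r_diff:.2f}")
```

With a real data set, each row would be one qualifying pitcher-season, and the metric whose r sits farther from zero would be the one more tightly tied to ERA.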
The inverse relationship suggests that as one goes up the other goes down—higher differential leads to a lower ERA and vice-versa—and the strikeout and walk differential clearly wins out in this regard. Backtracking to Wells and Schilling, the comments indicated that the split-half correlation comparisons between the differential and the ratio favored the former more than the latter, which is important because the differential has a stronger relationship with a common metric used to assess value.
With that in mind, comparisons of Lee’s strikeout and walk prowess at this juncture should involve finding whether or not anyone else, at a similar point in a given year, had a differential as vast if not more. Those hypothetical pitchers would then be used to potentially determine expected values for Lee over the remainder of the season. Currently, Lee has thrown 121
And how did they do from that point forward?
OK, OK, so there isn’t much variety here, which just goes to show how rare it is for pitchers to exhibit this type of dominance in the comparison of strikeouts and walks. Additionally, the top 10 differentials and rates in the first table make Lee’s +90 and .189 look like puny, little girly-men. Though only three pitchers show up here, the aggregate rates do not fall that much over the second half, which is to be expected if we assume that there is a high correlation between first- and second-half differential. The numbers up to that point are better, for sure, but the decline from that point forward is nowhere near as significant as, say, David Wells falling from a 14.50 K/BB ratio to a 2.69 in his second half.
Anyway, the point here isn’t necessarily to showcase what might happen to Lee moving forward but rather to discuss, with the help of examples, why we use certain numbers and how we can choose the appropriate weapon from our arsenal. The goal is to determine value more often than not, and in order to do that, it is more helpful to find numbers that are likely to persist over the course of a season and into the next, and share a strong bond with numbers generally assigned to value. When discussing strikeout rates and walk rates, it is apparently more informative to use some form of the strikeout and walk differential, as it encompasses what the strikeout-to-walk ratio attempts to provide, while also accounting for the shortcomings.
Lee might still set the record for the single-season K/BB ratio with 150 or more innings, but over the last 40 years (1970-2009), the highest differential belongs to Randy Johnson’s 2001 season, at +303. Color me skeptical that Lee reaches that threshold. This does not take anything away from his fabulous season, but it lends itself to the idea that by obsessing over his ratio, we are actually asking the wrong questions. Rates and raw tallies each have their place, but hopefully this sheds light on when to use different types of rates and how to decide between them.