October 9, 2007
On Awards and Statistical Tools
I was completely intent on finishing up with our discussion of umpires today (based on the execrable officiating in last night's game, the men in blue deserve every bit of scrutiny we can focus upon them), but then I remembered something that's a bit more urgent. The Internet Baseball Awards seem to sneak up on me every year, and this season was no exception. Voting runs through this Friday, and if you haven't voted yet, you should absolutely take advantage of the opportunity to let your voice be heard in what some regard as the best year-end awards in baseball.
So what we're doing this week is looking at some tools that can help you when you're filling out your IBA ballots, or listening to the inevitable arguments that will break out next month when the baseball writers hand out their "official" awards. IBA voting functions in much the same way as the voting for the awards handed out by the Baseball Writers' Association of America, with respondents submitting ranked lists of the top candidates for each league's most valuable player, top pitcher, rookie, and manager.
One common pitfall in awards voting is excessive reliance on a single metric, particularly one that doesn't account for context. Perhaps the most disproportionately influential metric among Cy Young Award voters is wins. I blame that outsized influence on the statistic's seductive name: after all, baseball is a game, and the object of the game is to win. What, then, could give you a better impression of a pitcher's performance than a statistic called "wins"?
It sounds like unimpeachable logic to tell someone that a guy with a lot of wins is a winner, and that, therefore, the guy with the most wins is the best at winning. But if you're of a more analytical bent, you might want to look past the name to see what the wins statistic describes. The definition of the stat tells us that a starter who gets a win is someone who pitched five or more innings in a game, left that game with his team leading on the scoreboard, and saw his team go on to win the game. When you look at those criteria, particularly in the context of the modern game, where the bullpen has a greater influence on a game's outcome, shouldn't the statistic be called something a little less impressive-sounding, like leads? After all, the main thing the "wins" stat tells us that depends on the pitcher's own performance is that the person who won left the game with his team leading. It doesn't tell us whether the lead was small or large, or whether the fellow actually pitched well or not.
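To make the point concrete, here is a rough Python sketch of the three criteria as described above. The `StarterLine` class, its field names, and both pitching lines are invented for illustration, and the check is a simplification rather than the official scoring rule (which also requires that the lead never be surrendered).

```python
from dataclasses import dataclass


@dataclass
class StarterLine:
    """One start, reduced to the facts the win criteria look at (hypothetical fields)."""
    outs_recorded: int       # 15 outs = five full innings
    runs_allowed: int        # tracked here, but never consulted by the win check
    team_led_at_exit: bool   # was his team ahead when he left the game?
    team_won: bool           # did his team go on to win?


def credited_with_win(line: StarterLine) -> bool:
    """Simplified version of the three criteria described in the text."""
    return line.outs_recorded >= 15 and line.team_led_at_exit and line.team_won


# Two invented starts: a shaky one backed by plenty of offense, and a sharp one with none.
shaky = StarterLine(outs_recorded=15, runs_allowed=6, team_led_at_exit=True, team_won=True)
sharp = StarterLine(outs_recorded=24, runs_allowed=1, team_led_at_exit=False, team_won=False)

print(credited_with_win(shaky))  # True
print(credited_with_win(sharp))  # False
```

Notice that `runs_allowed` never enters the check: the six-run outing earns the win, while the eight-inning, one-run outing does not.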
Now, we have a variety of tools here that can be used to judge pitching performances. Value Over Replacement Player (VORP), which we discussed in this space a few months ago, can be used to assess the performances of starters and relievers, but there's also RA+, a statistic that places the runs allowed by each pitcher on a fixed scale, and adjusts for park effects. We've also previously discussed starter- and reliever-specific tools like SNLVAR and WXRL and their variants, which can be used for such comparisons.
One tool we haven't discussed previously that you might use to evaluate pitchers is the Pitcher's Quality of Batters Faced Report. This report gives the cumulative season stats of all the hitters a given pitcher faced. For example, it's interesting to note that among AL ERA title qualifiers (162 or more innings pitched), Cleveland ace C.C. Sabathia had the lowest quality of opposition batters by OPS: 738, while the 2007 AL average was 761.
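As a rough illustration of the idea behind such a report, the sketch below averages opposing hitters' season OBP and SLG, weighted by how often each one came to the plate against the pitcher. The weighting scheme, the function name, and all of the numbers are assumptions made up for this example; the actual report may aggregate its data differently.

```python
# Each tuple: (batter's season OBP, batter's season SLG, plate appearances vs. this pitcher).
# All values are invented for illustration.
faced = [
    (0.380, 0.520, 12),
    (0.310, 0.400, 20),
    (0.340, 0.450, 8),
]


def opponent_ops(batters):
    """PA-weighted average of opponents' season OBP and SLG, summed into an OPS."""
    total_pa = sum(pa for _, _, pa in batters)
    obp = sum(on_base * pa for on_base, _, pa in batters) / total_pa
    slg = sum(slugging * pa for _, slugging, pa in batters) / total_pa
    return obp + slg


print(f"{opponent_ops(faced):.3f}")  # 0.783
```

A pitcher whose figure comes in well below the league average drew weaker lineups than most, which is worth remembering when comparing raw run-prevention numbers.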
Another tool (not available as a report on the stats page) is a support-neutral riff on the quality start. By looking at SNLVA, we can determine whether a pitcher gave his team a better-than-even chance at winning any particular game he started. This has advantages over the regular quality start, which uses a static pitching line (six or more innings, with three or fewer runs or earned runs allowed) that doesn't adjust for home ballpark or for strength of lineup faced. Here are the top ten or so in each league in what we'll call SNLVA QS:
National League
Pitcher            GS   SNLVA QS   SNLVA QS %
Jake Peavy         34   26         76.5
John Smoltz        32   25         78.1
Tim Hudson         34   24         70.6
Adam Wainwright    32   23         71.9
Bronson Arroyo     34   22         64.7
Tom Glavine        34   22         64.7
Greg Maddux        34   22         64.7
Brad Penny         33   21         63.6
Carlos Zambrano    34   21         61.8
Joe Blanton        34   21         61.8
Ted Lilly          34   21         61.8
Brandon Webb       34   21         61.8
American League
Pitcher            GS   SNLVA QS   SNLVA QS %
C.C. Sabathia      34   25         73.5
Gil Meche          34   24         70.6
Josh Beckett       30   22         73.3
Fausto Carmona     32   22         68.8
John Lackey        33   22         66.7
Andy Pettitte      34   22         64.7
Dan Haren          34   22         64.7
Erik Bedard        28   21         75.0
Kelvim Escobar     30   21         70.0
Chien-Ming Wang    30   21         70.0
Miguel Batista     32   21         65.6
Justin Verlander   32   21         65.6
Johan Santana      33   21         63.6
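The percentage column is simply SNLVA QS divided by games started. Since the lists above are ordered by raw count, it can be worth re-sorting by rate, which favors pitchers who made fewer starts. A minimal Python sketch using four rows from the American League list (the helper name `qs_pct` is ours, not part of any report):

```python
# (pitcher, games started, SNLVA QS) taken from the American League list above.
al_rows = [
    ("C.C. Sabathia", 34, 25),
    ("Gil Meche", 34, 24),
    ("Josh Beckett", 30, 22),
    ("Erik Bedard", 28, 21),
]


def qs_pct(games_started, quality_starts):
    """Share of starts that qualified, as a percentage rounded to one decimal."""
    return round(100 * quality_starts / games_started, 1)


# Re-rank by rate instead of by raw count.
by_rate = sorted(al_rows, key=lambda row: qs_pct(row[1], row[2]), reverse=True)

for name, gs, qs in by_rate:
    print(f"{name:<15} {qs_pct(gs, qs):5.1f}")
```

Ranked this way, Bedard moves from last of the four to first, because his 21 came in only 28 starts.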
RBI tends to be the same kind of crutch for MVP voters that the wins statistic is for Cy Young voters. Now, I'm no hater of the run batted in, but the RBI certainly isn't a balanced measure of a player's offensive excellence. One way to put those RBI totals in context is to check out the RBI Opportunities Report, which will show how many ducks the batter had on the pond when he came to bat, and how efficient he was at bringing those baserunners around to score.
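One simple way to express that efficiency is the share of baserunners a hitter drove in, leaving out the runs he drove in by scoring himself on home runs. The sketch below assumes that definition, which may not match the report's exact formula, and uses two made-up hitters.

```python
def runners_driven_in_rate(rbi, home_runs, runners_on):
    """Fraction of baserunners driven in.

    Assumed definition for illustration: each home run accounts for one RBI
    that was the batter himself, so those are subtracted before dividing by
    the number of runners on base during his plate appearances.
    """
    return (rbi - home_runs) / runners_on


# Two hypothetical hitters with the same home run total.
big_total = runners_driven_in_rate(rbi=120, home_runs=30, runners_on=500)
fewer_chances = runners_driven_in_rate(rbi=100, home_runs=30, runners_on=350)

print(f"{big_total:.3f}")      # 0.180
print(f"{fewer_chances:.3f}")  # 0.200
```

The hitter with 20 fewer RBI was actually the more efficient of the two; he just came to bat with far fewer runners on base.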
Beyond RBI, other challenges when evaluating MVP candidates have to do with context. With the Rockies in the playoffs for the first time since 1995, voters might want to consider the effects of altitude, something you can do by looking at the normalized ("translated") pitching and hitting statistics on the Rockies' Davenport Translations page. While we're looking at the Davenport Translations pages, I'd be remiss not to point out EqA (a rate stat that takes all manner of offensive contributions, including baserunning, into account, and also adjusts for ballpark effects) and WARP, yet another statistic we discussed earlier this season.
WARP is one tool you can use to help you consider the quality of a player's performance on defense as well as offense, which helps in, say, a comparison of a third baseman who was historically good on offense and historically bad on defense to a shortstop who was quietly very good at both (I'll let you guess who I'm talking about, here). Now, the defensive metrics we use in WARP are based on pure statistics, not play-by-play data, which makes them better for historical comparisons than for present-day matchups. Luckily, there are a number of play-by-play based metrics out there you can try out for comparison.
The final piece of your IBA toolbox is our reports for rookies. The most easily accessible rookie reports on this site are the VORP for Rookie Pitchers/Position Players Reports. These reports function exactly like the normal VORP reports, except that they are populated only with players who meet the rookie playing time requirements. Sadly, we haven't yet incorporated service requirement data into the rookie eligibility, so the report may contain a few players who are not eligible for the Rookie of the Year award; if you see a player with good stats who is not on the IBA ballot, that's the most likely reason. Although VORP is the only pre-made rookie report, premium subscribers can compare rookies in a number of statistics by selecting the "Rookie" statistic in the "player/pitcher season" or the "player/pitcher team year" customizable categories. If you sort "DESC" on "Rookie" as your first sort option in any such customizable report, you can compare rookies in stats such as WXRL or SNLVAR, or really, anything else you like.
Internet Baseball Awards, History Page: Contains a list of awards winners and links to voting details, going back to the good old Usenet days of 1991.
Chaim Bloom, MVP Prediction System, Part 1, Part 2, and Part 3: In this study, Bloom develops and examines a system to predict the results of National League MVP voting by the BBWAA, based on statistical accomplishments and standings.
Bill James, "E = M Cy Squared" in the Neyer/James Guide to Pitchers (New York: Fireside, 2004): In this study, James devises a system designed to predict the baseball writers' picks for the annual Cy Young Awards, based on statistical factors. ESPN.com tracks the results of the Cy Young Predictor, which can be found here.