September 14, 2010
Prospectus Hit and Run
Last year, our friends at ESPN tasked me with building an MVP predictor in the spirit of a system such as Bill James' Hall of Fame Monitor, one that awards points for various accomplishments in an attempt to identify who will win as opposed to who should win. Limiting my scope to the post-strike timeframe to take advantage of the fact that none of the ensuing winners were pitchers, that all of them save for the 2003 version of Alex Rodriguez came from teams that finished above .500, and that 22 of the 28 hailed from teams that qualified for the expanded postseason, I built a carefully-gerrymandered system—Jaffe's Ugly MVP Predictor (JUMP)—that could identify ("predict") 14 winners, and put 27 out of 28 winners in its top three in points for that year.
Getting any closer on the direct hits proved impossible (at least for me), because over the 1995-2008 span, the voters were only so predictable; at some point, subjective considerations inevitably entered the discussion. Nonetheless, certain tendencies made it easier to identify the top candidates, enabling me to rack up secondary hits (identification as a top-three candidate) in all but one election. Mind you, that means a top-three candidate in this system, not a top-three vote-getter. Instead of focusing on the round-numbered benchmarks typical of a Jamesian system (30 homers, 100 RBI, etc.), I focused on league rankings among batting title qualifiers in key offensive categories, discarding anything that lacked the power to predict a player into that top three.
To a bit of my surprise and disappointment, on-base percentage fell by the wayside in the process; one simply can't do a better job of predicting voter behavior by taking it into account. The categories ultimately found to have some bearing on prediction were as follows: batting average, slugging percentage, home runs, total bases, runs, RBI, intentional walks, and stolen bases, though the last two weren't as valuable as the rest. Points were awarded in each category via a 10-7-5-3-2-1-1-1-1-1 system, recognizing a top 10 finish and increasingly rewarding top five, top three, and league leadership. Additionally, player position and team success carried significant bonuses that had a major impact on winnowing the field of candidates, with middle infielders getting a boost and designated hitters a penalty. Team success generated points for three levels of accomplishment: a .500 record, a wild card, or a division title. Playing for the Rockies carried an additional penalty, since despite the gaudy hitting stats annually accumulated in Colorado, only Larry Walker (1997) has won the award. (For more nuts and bolts, please see the original piece as well as this comment on my process.)
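For readers who think better in code, here is a minimal sketch of how the 10-7-5-3-2-1-1-1-1-1 scale could be applied to a player's league rankings. The function names and the sample rankings are hypothetical, and the sketch weights every category equally, which is a simplification: the actual system treats intentional walks and stolen bases as less valuable, and the position bonuses are not modeled here because their sizes aren't given.

```python
# Points for finishing 1st through 10th in a category (10-7-5-3-2-1-1-1-1-1).
RANK_POINTS = [10, 7, 5, 3, 2, 1, 1, 1, 1, 1]


def category_points(rank):
    """Return points for a league rank (1 = leader); outside the top 10 scores 0."""
    if 1 <= rank <= len(RANK_POINTS):
        return RANK_POINTS[rank - 1]
    return 0


def stat_points(ranks):
    """Sum points over a dict of {category: league rank}.

    Assumption: all categories count equally. The real system discounts
    intentional walks and stolen bases by an amount not stated here.
    """
    return sum(category_points(rank) for rank in ranks.values())


# Hypothetical player: leads in homers, third in RBI, 12th in batting average.
example_ranks = {"HR": 1, "RBI": 3, "AVG": 12}
example_total = stat_points(example_ranks)  # 10 + 5 + 0
```

The hypothetical player above collects 15 points from the three categories; a 12th-place finish earns nothing, which is why players sitting just outside a top 10 can gain ground quickly.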
The system lived up to its record in 2009, scoring a direct hit with Albert Pujols' win in the NL (Ryan Howard and Chase Utley rounded out the top three), and a secondary hit with Joe Mauer's win in the AL. The fact that only one catcher (Ivan Rodriguez, AL 1999) had won in the post-strike era, and had done so in a very strange and close MVP race, made it very hard to cue the system to distinguish what separated Mauer from, say, Mike Piazza at the time when the latter was putting up video game numbers for a winning Dodgers team in 1995 and 1996. Instead, the system identified Mark Teixeira as the likely AL winner, tabbing a player who led the league in homers, total bases, and RBI while playing for a team that won 103 games and the division title.
With that in mind, I turned JUMP's attention to the 2010 MVP races. With one exception, the top threes the system generated weren't tremendously surprising given the current debates being waged on barstools and websites across the country. What's more interesting at this stage is to consider the scenarios which could change those triumvirates and their relative rankings over the final three weeks of the season.
TSP stands for Team Success Points, which make up a good chunk of the point totals here; each team's win total is divided by nine, with the result multiplied by one for a team at .500 or above, by two for a wild-card team, and by three for a division winner. A 10-point penalty is incurred for playing for the Rockies, for reasons already mentioned.
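The TSP arithmetic can be sketched in a few lines. This is an illustration under stated assumptions rather than the actual JUMP code: the function name and the `finish` labels are hypothetical, a sub-.500 team is assumed to earn nothing, and the Rockies penalty is assumed to be folded into TSP (which would explain how a Rockies player can end up "in the red" in that column).

```python
# Multiplier applied to wins / 9 for each level of team success.
# Assumption: a team below .500 earns no team success points.
MULTIPLIERS = {
    "below_500": 0,
    "at_or_above_500": 1,
    "wild_card": 2,
    "division": 3,
}

ROCKIES_PENALTY = 10  # flat deduction for playing in Colorado


def team_success_points(wins, finish, is_rockies=False):
    """Return TSP: wins divided by nine, scaled by the team's finish."""
    points = wins / 9 * MULTIPLIERS[finish]
    if is_rockies:
        # Assumption: the penalty is subtracted inside TSP itself.
        points -= ROCKIES_PENALTY
    return points


# Hypothetical 90-win team under each scenario.
division_tsp = team_success_points(90, "division")       # 10 * 3
wild_card_tsp = team_success_points(90, "wild_card")     # 10 * 2
# Hypothetical 81-win Rockies team that misses the playoffs.
rockies_tsp = team_success_points(81, "at_or_above_500", is_rockies=True)  # 9 - 10
```

For the hypothetical 90-win club, the gap between a division title and a wild card is 10 points, the same as a category lead, which is why the standings can reshuffle the top three without anyone's stats changing.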
When I first ran the numbers on Saturday, Votto led the pack, with the Reds' hefty six-game lead in the NL Central standings helping him to overcome Pujols' slight superiority in the individual stat categories. Nonetheless, Pujols' two homers and five RBI over the weekend were enough to push Phat Albert into the lead. He'll need to hold those positions to have the upper hand on Votto here, and it wouldn't hurt his cause to reclaim the total bases lead.
CarGo isn't out of the race yet, however. He's in the red as far as TSP goes, but should the Rockies win the NL West—through Sunday they had a 23.2 percent chance of doing so according to our Playoff Odds Report—he would leapfrog the two slugging first basemen. Merely claiming the wild card (11.1 percent chance) wouldn't be enough in terms of the point total unless he also passed Pujols and/or Votto in the runs and/or RBI categories. It's not hard to see how the narrative of him helping the Rox reach the postseason after being nearly left for dead could carry the day, even with his Coors-infused stat line (.385/.433/.773 at home, .288/.310/.450 on the road).
Barring a slump by CarGo to knock 10 or 15 points off his batting average and slugging percentage, it doesn't appear likely that anyone else can crack the top three. I've presented the fourth- and fifth-ranked candidates to show how far behind they are in the key categories; both would have to vault from the lower reaches of the top 10 in homers to pick up even a single point, and while that might just mean one or two multi-homer games, it's still a crowded pack.
Note that the fourth- and fifth-ranked players, Adrian Gonzalez and Ryan Howard, are already benefiting from the fact that their teams were in first place at the time of this writing. The Giants, a half-game out of the wild-card spot and percentage points behind the Padres in the West, were considered as having missed the playoffs, though the Playoff Odds actually placed them ahead of the Friars, with a 42.7 percent chance at the division and a 12.9 percent chance at the wild card, compared to the Padres' 34.0 percent and 12.5 percent. If I instead credit the Giants with the division title (shudder) as the other two NL West contenders miss the cut, Aubrey Huff vaults from 26th place to sixth, not enough to become a factor here.
A more logical way to dole things out might be to use the Odds breakdowns to award Team Success Points; instead of being credited with 1.0 division title and 0.0 wild cards, a team might be at 0.51 division titles and 0.30 wild cards (like the Phillies). As it turns out, this doesn't change the top three in the NL, but it does bump Martin Prado and Brandon Phillips into the fourth and fifth spots, and litters the top 10 with several Reds with no chance at the award (Jonny Gomes? Drew Stubbs???) because crediting them for 0.96 division titles vaults them past players on teams with much smaller fractions. No, I think we'll stick to whole numbers.
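Here is one way the fractional approach could be sketched: weight each multiplier by the team's odds of landing in that tier. The function name and the 81-win total are hypothetical, and the handling of the leftover probability (crediting a team at .500 or better with the one-times multiplier when it misses the playoffs) is an assumption, since the exact blending isn't spelled out above.

```python
def expected_team_success_points(wins, p_division, p_wild_card, above_500=True):
    """Probability-weighted TSP using Playoff Odds style fractions.

    Assumption: the probability of missing the playoffs is credited at the
    .500-or-better multiplier (1) if the team has a winning record, else 0.
    """
    base = wins / 9
    p_neither = 1 - p_division - p_wild_card
    floor = 1 if above_500 else 0
    weight = 3 * p_division + 2 * p_wild_card + floor * p_neither
    return base * weight


# The 0.51 / 0.30 split cited for the Phillies, with a hypothetical 81 wins.
phillies_like = expected_team_success_points(81, 0.51, 0.30)

# A near-certain division winner (0.96) with the same hypothetical win total.
reds_like = expected_team_success_points(81, 0.96, 0.0)
```

With the same hypothetical win total, the 0.51/0.30 team lands at about 20.9 points while the 0.96 team lands at about 26.3, which illustrates how a runaway division leader's entire lineup gets lifted past players on teams in tighter races.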
Turning to the American League rankings through Sunday:
The situation here should be considered extremely fluid. Hamilton has been sidelined by bruised ribs since September 4, and he's made little progress thus far in his recovery; he could easily drop out of the top 10 in RBI and out of the top five in homers if he doesn't return soon, and could fall in the other counting categories as well. Cabrera could lose his meager allotment of Team Success Points if the Tigers fall under .500; that wouldn't be enough to knock him out of the top three without other players stepping forward, though if Jose Bautista keeps launching enough bombs to boost his slugging percentage and total bases, it's possible.
The first real surprise is the presence of Teixeira in the top three even in a down year; leading the league in runs scored for a division leader (a showing that owes to the fact that the supposedly unclutch Alex Rodriguez leads the league in OBI%) turns out to have its advantages, particularly when coupled with two other top-10 rankings. The second surprise is that Teixeira's presence in the top three comes at the expense of teammate Cano, who's put together an MVP-caliber season worthy of the narrative if nothing else by becoming the big run producer for the Yanks in a year where Rodriguez, Derek Jeter, and Jorge Posada have all shown their age. Cano is in the top 10 in six different categories, and has the middle infielder bonus going for him; alas, he's in the top five in just one of those categories, and a lesser one (intentional walks) at that.
The other situation that could have an impact on the top three here is if the Yankees wind up with the wild card and the Rays win the AL East flag. Without changing anything in the individual statistical rankings, that switch alone would drop Teixeira to 32.3 points and Cano to 25.8. It would also push the Rays' Carl Crawford into the top three with 39.4 points; Crawford ranks second in runs, third in steals and ninth in batting average thanks to a hot September showing (.444/.500/.778). Evan Longoria (33.2, thanks to five top-10 showings, albeit none above sixth) would pass Teixeira as well but would still be behind Mauer (34.8).
The take-home message here is that even as the playoff picture begins to take shape, the MVP races aren't over yet, even if the narratives have taken shape in the minds of voters. The historical "precedent" covered by JUMP suggests that the players here anointed as the most likely MVPs have 50 percent shots at the award (15 direct hits in 30 tries), with the second- and third-ranked players each having 23.3 percent chances (seven out of 30 each), and the remaining 3.3 percent (one out of 30) reserved for a dark horse. However, the rankings themselves can change quickly, just like those in our Playoff Odds Report.
And again, let me reiterate that this system—an algorithm I designed via my "Three-I" system of trial and error: intuition, iteration, and idiocy—is making no value judgments that the BBWAA voters haven't already made. To the extent that the voters have shown any consistency in applying standards for the award from year to year, they're baked into JUMP. Team performance, position, and certain individual stats matter, but other more important ones don't, including those compiled by pitchers. I'm not attempting to justify those choices, just to identify them in the hopes that some voter might scratch his head and wonder (for example), "Why do we undervalue OBP when it comes to the MVP vote?" After all, the first step in fixing a problem is admitting you have one.