August 31, 2009
Prospectus Hit and Run
Building an MVP Predictor
As the recent scrum between supporters of the candidacies of Joe Mauer and Mark Teixeira reminds us, nearly every Most Valuable Player Award is capable of producing controversy. Not only do the voters from the Baseball Writers' Association of America rarely elect the player who, via some objective formula, is worth the most wins to his team, they appear to shift their standards from year to year, constructing narratives to fit whatever loosely gathered facts are at hand. Particularly in recent years, defensive value is often minimized or entirely ignored in favor of heavy hitters with big Triple Crown stats, almost invariably from successful teams.
The question is whether the voters' behavior can be predicted. To that end, I was tasked with building an MVP predictor in the spirit of a system such as Bill James' Hall of Fame Monitor, one that awards points for various levels of achievement in an attempt to identify who will win, as opposed to who should win. My initial bursts of enthusiasm for the assignment were soon followed by endless hours of cowering in the fetal position before a massive spreadsheet, but in the end I emerged with a system, Jaffe's Ugly MVP Predictor (JUMP), which correctly identified 14 of the 28 winners during the Wild Card Era (1995 onward) and put 27 of those winners among the league's top three in its point totals.
I limited the scope of the system to that post-strike timeframe for three main reasons: none of the 28 winners were pitchers, only one played for a team that finished below .500 (Alex Rodriguez in 2003), and 22 of them played on teams that qualified for the expanded postseason. Those are extremely strong tendencies that could help separate seemingly equal candidates. Instead of focusing on round-numbered benchmarks as James did (a .300 batting average, 100 RBI), I chose to dispense with actual stat totals and rates and focus on league rankings among batting title qualifiers (3.1 plate appearances per game) in 12 key offensive categories: batting average, on-base percentage, slugging percentage, OPS, hits, homers, total bases, runs, RBI, walks, intentional walks, and steals. Through much study, trial, and error (indeed, every single step of the process involved this), I eventually settled upon a 10-7-5-3-2-1-1-1-1-1 point system in each category: 10 points for leading the league, seven for second, five for third, three for fourth, two for fifth, and one apiece for sixth through tenth. That produces a slight scoring bonus for leading the league or finishing in the top three, and some acknowledgement of a top-ten finish.
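The rank-to-points schedule just described can be sketched in a few lines of Python. This is a minimal illustration rather than the actual spreadsheet logic: the function names and the sample player are hypothetical, and the handling of ties (not discussed here) is left out.

```python
# Points for finishing 1st through 10th in a category, per the
# 10-7-5-3-2-1-1-1-1-1 schedule described in the text.
RANK_POINTS = [10, 7, 5, 3, 2, 1, 1, 1, 1, 1]


def category_points(rank):
    """Return the points for a league rank (1 = leader); 0 outside the top ten."""
    if 1 <= rank <= len(RANK_POINTS):
        return RANK_POINTS[rank - 1]
    return 0


def stat_points(ranks):
    """Sum the points across categories, given a {category: league rank} mapping."""
    return sum(category_points(rank) for rank in ranks.values())


# Hypothetical player: leads in RBI, second in homers, eighth in runs, 14th in steals.
example = {"RBI": 1, "HR": 2, "R": 8, "SB": 14}
print(stat_points(example))  # 10 + 7 + 1 + 0 = 18
```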
Surprisingly enough, it's not a strong showing in RBI or even home runs which is most common among the award winners of this era:
Category         Lead  Top 5  Top 10
Total Bases         7     19      25
Slugging Pct.      10     18      23
Runs                9     16      23
OPS                 8     17      22
Batting Avg.        3     11      21
Home Runs           7     19      21
Runs Batted In      6     18      20
Intentional BB      7     13      20
On-Base Pct.        6     11      18
Hits                2     10      16
Bases on Balls      6      8      12
Stolen Bases        1      3       5
As you'll see below, the lack of a correlation between the getting-on-base stats and the eventual hardware had consequences that needed to be taken into account.
Because team performance has such an overwhelming effect on the voters' perceptions of players' candidacies, I recorded each team's record and route to the playoffs, fixing upon a system that awarded a maximum of three "Team Success Points": one for finishing at or above .500, a second for winning the Wild Card, or two more (for three in all) for winning the division. Those points were then multiplied by the team's win total and divided by nine; a player on a 99-win division winner thus received 33 points, one on a 90-win Wild Card team received 20 points, and one on an 81-win team received nine points. These team points, which can outweigh the points from any individual category, do much to winnow the field.
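As a rough sketch, the team component works out as follows in Python. The function name and the `playoff_route` labels are hypothetical; the arithmetic is simply the one described above, checked against the three worked examples.

```python
def team_success_points(wins, losses, playoff_route=None):
    """Scaled team points: Team Success Points times wins, divided by nine.

    playoff_route is None, "wild_card", or "division" (hypothetical labels).
    """
    tsp = 0
    if wins >= losses:               # one point for finishing at or above .500
        tsp += 1
    if playoff_route == "wild_card":
        tsp += 1                     # one more for winning the Wild Card
    elif playoff_route == "division":
        tsp += 2                     # two more for winning the division
    return tsp * wins / 9


print(team_success_points(99, 63, "division"))   # 3 * 99 / 9 = 33.0
print(team_success_points(90, 72, "wild_card"))  # 2 * 90 / 9 = 20.0
print(team_success_points(81, 81))               # 1 * 81 / 9 = 9.0
```

Because the points scale with wins, a 99-win division winner is worth more than three league-leading categories' worth of stat points under the 10-7-5 schedule.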
At that point, various iterations of the system (some of which included weighting the stat categories according to the frequency with which past winners had placed in the top 10) correctly identified anywhere from nine to 12 winners out of 28, not a terribly impressive result. From there, JUMP became an exercise in careful gerrymandering, not only to increase the direct hits but to push as many winners as possible into the league's top three, a concession to the fact that at some point subjective elements take over for a number of voters. The point totals in a few categories (OBP, OPS, hits and walks) were dropped entirely from the scoring once it was determined that excluding them made no difference; simplicity was given the priority. Intentional walks were reduced to a 0.5 weight, stolen bases to 0.55. I introduced a positional adjustment, adding 3.33 points for middle infielders and penalizing 13 points for designated hitters, and an anti-Rockies adjustment, penalizing 10 points for high-altitude residence. All of these values were arrived at only after tedious trial and error.
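Pulling the pieces together, the final scoring might look something like the Python sketch below. It assumes the components are simply added, which is how the adjustments are described but is not spelled out as a formula here; the category abbreviations, function name, position labels, and sample player are all hypothetical.

```python
# Points for ranks 1 through 10 in a category.
RANK_POINTS = [10, 7, 5, 3, 2, 1, 1, 1, 1, 1]

# The eight categories left after OBP, OPS, hits, and walks were dropped.
# Abbreviations are illustrative labels, not taken from the original spreadsheet.
CATEGORY_WEIGHTS = {
    "AVG": 1.0, "SLG": 1.0, "HR": 1.0, "TB": 1.0, "R": 1.0, "RBI": 1.0,
    "IBB": 0.5,   # intentional walks at half weight
    "SB": 0.55,   # stolen bases at 0.55 weight
}

MIDDLE_INFIELD_BONUS = 3.33
DH_PENALTY = 13
ROCKIES_PENALTY = 10


def jump_score(ranks, team_points, position, plays_for_rockies=False):
    """Assumed additive total: weighted stat points + team points + adjustments."""
    total = 0.0
    for category, weight in CATEGORY_WEIGHTS.items():
        rank = ranks.get(category)
        if rank is not None and 1 <= rank <= len(RANK_POINTS):
            total += weight * RANK_POINTS[rank - 1]
    total += team_points
    if position in ("2B", "SS"):      # middle infielders
        total += MIDDLE_INFIELD_BONUS
    elif position == "DH":
        total -= DH_PENALTY
    if plays_for_rockies:
        total -= ROCKIES_PENALTY
    return total


# Hypothetical shortstop on a 90-win Wild Card team (20 team points) who leads
# the league in runs and OBP and ranks third in steals. OBP is ignored.
ranks = {"R": 1, "OBP": 1, "SB": 3}
print(round(jump_score(ranks, 20, "SS"), 2))  # 10 + 2.75 + 20 + 3.33 = 36.08
```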
Here's how the actual award winners fared in JUMP, along with the players it flagged as the likely winners in years where they differed from the voting:
Year  AL Winner          Rank  System Winner
1995  Mo Vaughn             3  Albert Belle
1996  Juan Gonzalez         2  Albert Belle
1997  Ken Griffey           1
1998  Juan Gonzalez         1
1999  Ivan Rodriguez       10  Manny Ramirez
2000  Jason Giambi          1
2001  Ichiro Suzuki         2  Bret Boone
2002  Miguel Tejada         2  Alfonso Soriano
2003  Alex Rodriguez        1
2004  Vladimir Guerrero     1
2005  Alex Rodriguez        1
2006  Justin Morneau        3  Derek Jeter
2007  Alex Rodriguez        1
2008  Dustin Pedroia        1

Year  NL Winner          Rank  System Winner
1995  Barry Larkin          3  Dante Bichette
1996  Ken Caminiti          1
1997  Larry Walker          2  Jeff Bagwell
1998  Sammy Sosa            1
1999  Chipper Jones         1
2000  Jeff Kent             3  Barry Bonds
2001  Barry Bonds           3  Sammy Sosa
2002  Barry Bonds           1
2003  Barry Bonds           1
2004  Barry Bonds           3  Albert Pujols
2005  Albert Pujols         1
2006  Ryan Howard           2  Albert Pujols
2007  Jimmy Rollins         3  Matt Holliday
2008  Albert Pujols         2  Ryan Howard
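For anyone checking the math, the hit rates quoted earlier fall straight out of the Rank column. A quick Python tally, with the ranks copied from the table above in year order (the variable names are just illustrative):

```python
# JUMP rank of each actual MVP winner, 1995 through 2008, from the table.
al_ranks = [3, 2, 1, 1, 10, 1, 2, 2, 1, 1, 1, 3, 1, 1]
nl_ranks = [3, 1, 2, 1, 1, 3, 3, 1, 1, 3, 1, 2, 3, 2]

all_ranks = al_ranks + nl_ranks
direct_hits = sum(rank == 1 for rank in all_ranks)  # winner ranked first by JUMP
top_three = sum(rank <= 3 for rank in all_ranks)    # winner in JUMP's top three

print(direct_hits, top_three, len(all_ranks))  # 14 27 28
```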
Memo to Mauer fans: the only catcher to win the award during this era is the one whose result sticks out like a sore thumb. In my gerrymandering efforts, no amount of positional bonus given to Pudge could offset the consequences of boosting Mike Piazza into the top three a few times or creating other problems. The system does correctly nail a few of the curveballs thrown by the voters, including Sosa over McGwire despite the latter's record-setting home run total in 1998 (one reason OBP was dropped), A-Rod with the 71-win Rangers in 2003, and Pedroia last year, and it gets pretty close on players like Larkin, Kent, Suzuki and Rollins, who won the award despite not finishing in the top five in homers, RBI, or slugging percentage, three of the four most common categories populated by MVP winners.
As for the various discrepancies, it doesn't take too much to recall some of the subjective elements which may have played a part in the voting, particularly in the AL. Take the writers' loathing of Belle, who in 1995 led the league in six point-accumulating categories and outdistanced Vaughn 84-47 here (Edgar Martinez was second at 52 points). Recall their fascination with the novelty of Ichiro and that 116-win, post-Griffey/Rodriguez/Randy Johnson team. Note their tendency to avoid voting for Yankees when at all possible, particularly during the Torre dynasty years; Rodriguez's win in 2005 was the first for someone in pinstripes since Don Mattingly in 1985. Remember the way the wholesome Midwestern clutch goodness of Morneau's 130 RBI carried the day over a fine all-around year from Jeter (.343/.417/.483 with 118 runs and 97 RBI), not to mention the season turned in by teammate and batting-title winner Mauer.
Jeter's 2006 plight only serves to reopen the wounds of 1999, a monster year in which he hit .349/.438/.552 with 24 homers, 134 runs and 102 RBI, all career highs; while he ranks second in JUMP, he could do no better than sixth in the actual vote. Pedro Martinez, who won the Cy Young and the pitchers' Triple Crown by going 23-4 with a 2.07 ERA and 313 strikeouts, received the most first-place votes that year but wound up a tight second, the highest finish by a pitcher during this span. JUMP leader Ramirez wound up tied with teammate Roberto Alomar (who ranked third here) for a close third in the actual voting. Pudge overcame the strong candidacy of teammate Rafael Palmeiro, who ranked fourth here and finished fifth in the voting, and leapfrogged a very strong, very unusual field.
There are actually more discrepancies here on the NL side, though the "wrong" winners are often players who wound up winning in other years-namely Bonds, Sosa, Pujols and Howard-softening the blows of those "injustices." Rollins won it as the sparkplug leadoff man who led the league in runs and finished second in total bases, and while the Phillies only made the playoffs on the regular season's final day, who knows how many already-sent votes might have turned Holliday's way given his Game 163 heroics.
As to what the system says about this year's MVP races, the names in the NL race are no surprise. Pujols, who leads the league in five of these categories and is in the top five in two others, is the overwhelming leader with 84 points, followed by Howard with 49 and Chase Utley with 40, repeating last year's third-place ranking. The system has flip-flopped on Pujols and Howard twice, getting the wrong answer both times, but neither time was the gap separating the two as wide as it is this year.
As for the AL, Teixeira leads in RBI and is second in homers, and ranks first with 55 points; he's followed by Miguel Cabrera with 47 points. Surprisingly, Chone Figgins is third with 44 points thanks to the league lead in runs and a third-place showing in steals; teammate Kendry Morales, who's second in the league in slugging, is just a fraction of a point behind him. As for Mauer, he's currently 28th, a consequence of playing for a team that through Saturday was a game under .500; Sunday's Twins win vaults him to 15th. While he leads the league in all three triple-slash categories, OBP doesn't score points here, and he currently cracks the top 10 only in intentional walks. He is 11th in homers and total bases, and 13th in RBI, so if the guy could just snap out of his August funk (.402/.462/.654 with seven homers and 22 RBI), he may yet JUMP into the fray.
A few years back, Rob Neyer and Bill James introduced a Cy Young Predictor formula in the Neyer/James Guide to Pitchers, a formula made possible by the relatively smaller number of statistical inputs which go into consideration for that award, and one that produced a much higher level of accuracy (around 80 percent) than JUMP does. In the end, a sharper mind than mine might well have produced an MVP prediction system with more direct hits, perhaps even a simpler, more elegant system altogether. Nonetheless, JUMP underscores both the wider variety of inputs that can come into play in a single MVP vote and the fact that nearly any given year produces at least a few candidates with strong enough statistical resumés and team backgrounds for a voter to attach to a narrative which rationalizes their vote. As with any season's actual voting results, I strongly suspect we haven't heard the last word on this topic.
A version of this story originally appeared on ESPN Insider.