December 26, 2001
The MVP Prediction System
Second in a Series
In the first article of this series, I identified a set of seven criteria by which "candidates" for National League MVP can be determined, and a quick tiebreaker to tell which of several candidates will be selected. You'll recall that from 1969 through 1993, the system identified the eventual winner as a candidate in all but two cases: Kirk Gibson in 1988, and Willie Stargell's half-award in 1979.
In today's article, I'll look at those two cases and the other system misses from the period 1946-1993, beginning with the times when a non-candidate won, and then examining the years in which the tiebreaker's prediction was wrong. Next week, I'll examine the recent MVPs, and describe trends that may--or may not--change the criteria by which MVPs are decided in the future.
In the past, I had believed that the system would not extend to the period before divisional play in 1969, for three reasons. First, pitchers appear eligible before 1969, but not after; second, voters might have treated pennant winners differently than division winners; third, I had believed (following comments by Bill James) that there was a period in which up-the-middle players were favored by the voters more than seemed to be the case after 1969.
I no longer believe that two of those conditions hold. I now believe the system works--that is, describes how the NL voters chose MVPs--for the entire post-war period, with one major exception. Sometimes, from 1945 through 1968, they gave the award to a pitcher.
Let's get to it.
There are 23 National League MVPs from 1946 through 1968. Four of those (1950, 1956, 1963, and 1968) went to pitchers. Of the other 19 votes, a system-identified candidate won 14 times.
This is, to be sure, a much lower success rate than during the following period, in which the system goes 23 for 25. However, the success rate is sufficient to yield the conclusion that voters were using basically the same set of standards.
What were the system's mistakes?
The first error was in 1947. The system assigns Braves third baseman Bob Elliott just two points, as he hit .317 with 113 RBI. The pennant-winning Dodgers had no better possibilities; six of the eight position players earned two points (remember, the catcher, center fielder, second baseman, and shortstop for winning teams start with two points). The real contenders, though, were on the fourth place Giants and last place Pirates. Ralph Kiner's triple-crown numbers were .313/51*/127, while Johnny Mize put up a .302/51*/138* line, leading the league in RBIs (league leaders are marked with an asterisk). If both Mize and Kiner count as home-run leaders, then Mize should be the only candidate at four points, with Kiner at three; if neither counts, then Mize still should receive the award as the only three-point candidate. Mize finished third, Kiner sixth, in the actual voting; Elliott beat a pitcher from a non-contending team, with two of the two-point Dodgers finishing fourth and fifth behind Mize.
The next system miss is 1955. The winner should be Duke Snider (.309/42/136*), a five-point candidate; instead, four-pointer Roy Campanella (.318/32/107) is chosen. Snider, by the way, also was "robbed" by a pitcher the following year. The system isn't wrong by much in 1955, however; Campy edged him 226 to 221 in the voting.
That's the only miss in the 1950s. In 1961, Frank Robinson (.323/37/124), one of two three-pointers on the pennant-winning Reds (CF Vada Pinson hit .343), won over the system's candidate, Orlando Cepeda (.311/46*/142*). Cepeda does finish second, as F. Robby wins in a landslide.
The next year is a mess. The league, of course, ends in a tie, with the Giants prevailing in a best-of-three playoff. Willie Mays is a five-pointer (.304/49*/141, CF). If voters considered the Dodgers a co-winner, then Tommy Davis (.346*/27/153) would be five points as well. The voters instead go for Maury Wills and his successful assault on the stolen-base record, despite his one-point season (.299/6/48) (or three-point season, if the voters consider the Dodgers a winner). He still isn't the system candidate, but at least he's getting close. The vote is Wills 209, Mays 202, Davis 175. (Indeed, counting the teams as tied and therefore counting all the players as winners accounts for Campanella's award in 1951.)
The last miss in the pre-division era was in 1966. Hank Aaron (.279/44*/127*) was the only three-point player in the league, but one of the nine two-pointers, Roberto Clemente, won the award. In fact, Aaron finished eighth (fifth among position players). Of the full system misses over the period 1946-1993, meaning that the winner was not even identified as a candidate, Aaron is the only should-be winner to fail to finish even as the runner-up among position players.
On to the two-division era. After a good run of over a decade, the system comes up short in 1979, sort of, as Keith Hernandez ties Dave Winfield as a three-point candidate, but ties Willie Stargell of the "We Are Family" Pirates in the voting. Stargell's point for winning the division was his only one. In 1988, the system identifies one three-point candidate, Darryl Strawberry (.269/39*/101 for the division-winning Mets). The voters ignored him in favor of one-pointer Kirk Gibson, who did not lead the league in anything. Strawberry finished second, 272-236. Remember, MVP ballots must be turned in before the playoffs begin, so this outcome was determined well before Gibson's famous shot off Dennis Eckersley.
Do these misses have anything in common? Not that I can see. Some have claimed that Clemente may have been given a "career achievement" award in 1966, and the same has been said of Stargell in 1979. That does not seem consistent with Mize losing in 1947 (after losing awards he qualified for in 1939, 1940, and 1942), or with Snider's similar failures in 1955 and 1956.
The "robbed" players (Mize, Snider, Cepeda, Mays, Aaron, and Strawberry) did not play for small-market teams while losing to New York or Los Angeles players. They did not have darker-colored skin than the winners. They did not, as a group, either burst on the scene suddenly, or have a theoretical extra burden of defending previous awards. Nor did their teams have anything in common. Robinson's 1961 Reds were a surprise winner (as were Kirk Gibson's Dodgers), but that doesn't apply to any of the others. And despite what many have claimed, splitting votes between teammates doesn't seem to affect the voting.
For each of these possible explanations, I've tested to see whether accounting for a "wrong" award could be done without causing other problems. For example, it's possible that the writers are more generous to players on miracle teams such as Robinson's Reds or (possibly) Gibson's Dodgers. However, the most famous such team of all, the 1969 Mets, did not have a player capture the MVP; only a year after Bob Gibson took home the last pitcher MVP, Tom Seaver could not win the award. Nor could Monte Irvin defeat Roy Campanella in 1951. No, the voters simply bucked their usual instincts in those years.
I am comfortable with the idea that one of the "wrong" awards--Maury Wills's honor in 1962--is explained by an extraordinary circumstance. Both Lou Brock and Rickey Henderson finished second in their record-breaking years, with far less system support (Brock was a one-pointer) than Wills had. If any number of freakish things could derail the system, it wouldn't be useful, but record-breakers are rare, and it wouldn't be at all surprising if they are rewarded. I'm willing to say the system works if the exceptions are as odd as the 1962 case.
The 1947 award could easily be explained by moving the system ahead in time. From the beginning of the award through 1944, the system only selects five of nine non-pitchers; in other words, it does not capture what is going on with the voters. That's the best I can do with that year.
That leaves 1955, 1961, 1966, 1979 (sort of), and 1988. All are simply cases where the system is wrong. That's a relatively tolerable error rate. The system isn't perfect. It captures most of what goes on in NL MVP voting, but every once in a while something else happens: a successful PR campaign, or what have you. Maybe Duke Snider had one or two quiet enemies in the press box for some reason.
Actually, all of these erroneous predictions do have one thing in common. In each case except for the 1979 tie and the 1962 mess, the system identified a single winner; that happens about half the time in the 1946-1993 era, with the other half resulting in ties. In all but one case--1962--in which the system produced two or more candidates, one of those candidates took home an MVP (although one had to share it).
A couple of notes are still required. The first is on the subject of pitchers. Simply put, up through 1968, pitchers sometimes won. Why? I have no idea. I see no logical explanation for the MVP results of 1963, 1965, 1966, and 1968. In 1963, Sandy Koufax beat a four-point candidate (Hank Aaron, again). In 1965 and 1966, he couldn't beat lone three-point candidates, even though the voters were certainly willing to dump one of them. Then in 1968, Bob Gibson was given the award, despite two reasonable three-point candidates (Curt Flood and Willie McCovey) available for the voters to choose.
The only thing I can say about pitchers is that no five-point candidate has lost to a pitcher since World War II, and that they are apparently no longer considered eligible by the voters. After Gibson's award in 1968, Tom Seaver finished a close second the next year. Since then, only two pitchers have cracked the top three, with Mike Marshall finishing a distant third in 1974 and Greg Maddux finishing a close third in 1995.
I suppose I should say something about the American League. During the period 1946-1968 in which the system misses five National League MVPs, it misses eight AL MVPs, with no Maury Wills excuses to make it look better, either. After 1968, it doesn't get any better; the system misses 1970 and 1977, and then after a pretty good streak it misses 1987, 1989, 1990, 1991, and 1993. That makes 15 misses out of the forty-two chances (1946-1993, leaving out years pitchers get it). That's too many to put any confidence in the system, since after all quite a few awards are not particularly contested and are not in need of explanation. The National League has only seven misses (counting Stargell) over that period, and I'm not losing any sleep over Maury Wills.
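For readers who want to see the league comparison as rates rather than raw counts, here is a rough Python sketch. The variable names and the `miss_rate` helper are invented for illustration. The NL total of 44 chances is an inference from the counts given earlier (19 non-pitcher awards for 1946-1968 plus 25 awards for 1969-1993), not a figure stated directly.

```python
# Counts for 1946-1993, leaving out years in which a pitcher won.
# The NL chance total is inferred from the article's counts (assumption).
nl_chances = 19 + 25   # non-pitcher NL awards before and after 1969
nl_misses = 5 + 2      # five pre-1969 misses, plus 1979 (Stargell) and 1988
al_chances = 42        # stated directly in the article
al_misses = 8 + 7      # eight misses through 1968, seven after


def miss_rate(misses: int, chances: int) -> float:
    """Fraction of awards that went to a non-candidate."""
    return misses / chances


nl_rate = miss_rate(nl_misses, nl_chances)
al_rate = miss_rate(al_misses, al_chances)

print(f"NL: {nl_misses}/{nl_chances} = {nl_rate:.1%}")
print(f"AL: {al_misses}/{al_chances} = {al_rate:.1%}")
```

On these counts the system misses roughly one NL award in six, against more than one AL award in three.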
Is it plausible that the standards are different for the two leagues? I have no real problem with that proposition; others will object. There's no way to prove anything one way or another, statistically; the question is whether the evidence that the system works is strong enough to overcome methodological doubts. I'm convinced that there is a difference between the leagues, such that the system works well enough to use in the National League, but not in the junior circuit.
Now, on to the tiebreakers. Recall that when there is more than one candidate, the predictor expects the voters to make a head-to-head comparison, adding up BA+HR+RBI and adding 15 points for up-the-middle players and for players on division or pennant winners. The first of these contests took place in the first year of the era under consideration here, and Stan Musial easily outpointed Enos Slaughter and Dixie Walker. But the next three times--1952, 1959, and 1973--the tiebreaker picked the wrong candidate. In 1973, Pete Rose (.338/5/64 +15) should have lost to Willie Stargell (.299/44/119 +15) or Tony Perez (.314/27/101 +15). But contrary to the theory that having multiple candidates on the same team hurts both players, Rose beat Stargell by a narrow margin.
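The tiebreaker arithmetic is simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the `Candidate` class, the `tiebreak_score` function, and the `bonuses` field are names made up here. Batting average is treated as its three-digit form (.338 counts as 338), and each 1973 candidate is given the single +15 shown in the text above.

```python
from dataclasses import dataclass

BONUS = 15  # points per bonus (up-the-middle, or division/pennant winner)


@dataclass
class Candidate:
    name: str
    batting_avg: float  # e.g. 0.338
    home_runs: int
    rbi: int
    bonuses: int = 0    # how many 15-point bonuses apply (assumed input)


def tiebreak_score(c: Candidate) -> int:
    """BA (as a three-digit number) + HR + RBI + 15 per bonus."""
    return round(c.batting_avg * 1000) + c.home_runs + c.rbi + BONUS * c.bonuses


# The 1973 candidates, with the figures quoted in the text.
candidates_1973 = [
    Candidate("Pete Rose", 0.338, 5, 64, bonuses=1),
    Candidate("Willie Stargell", 0.299, 44, 119, bonuses=1),
    Candidate("Tony Perez", 0.314, 27, 101, bonuses=1),
]

# The tiebreaker's pick is the candidate with the highest score.
predicted = max(candidates_1973, key=tiebreak_score)

for c in candidates_1973:
    print(f"{c.name}: {tiebreak_score(c)}")
print("Tiebreaker pick:", predicted.name)
```

Under these assumptions Stargell scores 477, Perez 457, and Rose 422, which matches the claim that Rose should have lost the head-to-head comparison.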
The previous tie was about as odd as 1962. In 1959, Ernie Banks (.304/45/143* +15) tied Hank Aaron (.355*/39/123) and Eddie Mathews (.306/46*/114) with three system points. However, had voters cast their ballots on the last day of the season, Aaron and Mathews might have been ranked higher, because the Braves were tied with the Dodgers and needed a playoff to settle the pennant. As seen above, the voters appear to consider both teams winners in these cases. So is Banks a first-round error? No--because Mathews clinched the home-run title with his 46th longball during the playoff. If the voters had ignored the playoff games, Banks would have an extra point for a tie in the league lead in home runs. Banks should lose the tiebreaker, however, regardless of whether Milwaukee was considered a pennant winner or not. The only way that the system predicts a Banks win is if the voters considered Banks the co-leader in HRs but ignored the Braves' tie for the pennant; I'd rather call this one a tiebreaker failure.
One more failure: in 1952, Hank Sauer (.270/37/121 = 428) edged Duke Snider (.303/21/92 + CF + Dodgers win = 446). I can't explain this one, other than to note that the voters had it in for Snider, or else he was just plain unlucky based on some factor the system doesn't pick up.
So after a correct selection in 1946, the tiebreaker was only invoked three more times through 1973, and it failed each time. Then it kicked in. With two division winners, ties were more frequent, and the tiebreaker correctly sorted out the winner 16 times in a row from 1974 through 1998. Finally, the tiebreaker failed again just this season, with Barry Bonds (.328/73/137 = 538) clobbering Sammy Sosa (.328/64/160 = 552) in the voting. I'm willing to accept this one as an understandable fluke, thanks to Bonds setting the home-run record.
Given the evidence, the story looks pretty clear. Before the split into divisions, voters did not have a regular procedure for settling ties. Once ties became frequent, they quickly settled on a method, and have stuck to it. Or at least, they have stuck to it up to this point. Does the Bonds award indicate that something has changed? I'll look at recent awards and the future of the NL MVP in the final part of this series.
Jonathan Bernstein has been walking around with a goofy grin on his face ever since Barry Bonds agreed to stay put with the Giants for at least one more year. He thanks the gang at rec.sport.baseball for their helpful suggestions about MVPs.