April 28, 2004
Lies, Damned Lies
Making RBIs Useful
There isn't a whole hell of a lot to do in Lansing, Michigan. There aren't any mountains, and there isn't any seacoast. The nearest amusement park is 400 miles away. There's a minor league ball team there now, but there wasn't when I grew up. There's a college there--a big, state university--with lots of college parties, and lots of college girls, and a lot of kids from Lansing start behaving like college students long before they really should. But even those with precocious synapses manage to sneak in a few years of relative innocence before learning what sororities and beer bongs are, and my synapses were late to the party. There's a big city not too far away, but to paraphrase W.C. Fields, the prevailing sense that one has when one is in Detroit is that, all things considered, one would rather be in Lansing.
So what you do a lot is drive. You drive past the cow farms and the meadows and rolling hills or whatever the hell they're called in the TripTik and the dilapidated country town with the antique store that your mother likes so much. You drive with your dad in an American-made sedan and you listen to Ernie Harwell and the Tigers. You drive at 62 m.p.h. past a shuttered-up farmhouse with peeling gray paint and a half-working windmill, and Steve Balboni stands there like a house by the side of the road and watches Frank Tanana's fastball go by, or that's what Ernie tells you. You drive and you listen and you daydream and you talk about baseball.
I can distinctly remember, on one of those lazy summertime drives with my dad, talking to him about the relative merits of batting average and RBIs. Much better, we were agreed, to have a run producer like Kirk Gibson in your lineup than a batting average specialist like Wade Boggs. Baseball, after all, is won and lost with runs; an RBI, by definition, produces a run, while a base hit doesn't. Boggs had to rely on Jim Rice or Dwight Evans to drive him in, but Gibby got things done all by himself. We felt this to be a sophisticated and almost inscrutable line of argument. A few years later, we'd use the same reasoning to argue with anyone who would listen that Cecil Fielder had been robbed by Rickey Henderson in the MVP voting.
Nowadays, of course, my father and I have developed a more nuanced view of baseball and its statistics. We could tell you, for example, that some of the impetus behind our enthusiasm for the RBI was the result of the peculiar park effects of Tiger Stadium, which tended to increase run scoring and power categories, while depressing batting average. We could also tell you that our skepticism about batting average was correct: it is a relatively weak predictor of run scoring.
Surely, the RBI has its problems, and we could tell you that too. The usual modernist argument against the RBI, a case that Dayn Perry makes eloquently, is that it's extremely dependent upon a player's context. All else being equal, a hitter who has Derek Jeter and Gary Sheffield hitting immediately in front of him is going to have more RBIs than a hitter who is batting behind Jack Wilson and Rob Mackowiak, simply because the former pairing will be on base so much more frequently.
But what if we could remove these sorts of considerations from the equation? Would the RBI then become useful? I asked Keith Woolner to provide me with the following information for each major league hitter in 2003:
Let's take a look at the leaders in the first category. Which hitters were most effective at driving in runners from third base?
Table 1: Percentage of Runners Scored from 3B (minimum 50 opportunities)
Player             OPP    R    RATE
Sheffield_Gary      84   48   57.1%
Stynes_Chris        50   28   56.0%
Stewart_Shannon     64   34   53.1%
Cintron_Alex        51   27   52.9%
Sweeney_Mike        68   36   52.9%
Helton_Todd         72   38   52.8%
Anderson_Marlon     63   33   52.4%
Randa_Joe           51   26   51.0%
Walker_Todd         63   32   50.8%
Lee_Carlos          77   39   50.7%
LEAGUE AVERAGE                38.6%
The list is dominated by what we might call good contact hitters. Power is not so important when it comes to scoring a runner on third; except in very rare instances, any base hit will score the runner. Striking out is bad, since it eliminates the possibility that the runner will score while the batter makes an out, and most of these hitters don't strike out very much. This is also one spot where having a high walk rate isn't particularly helpful; with nobody on base and nobody out, a walk is worth exactly as much as a base hit; with two outs and a runner on third, the base hit is worth substantially more. The list, of course, also includes its share of sample size anomalies; I think you'd be hard-pressed to find a manager who would rather have Chris Stynes or Marlon Anderson batting with a runner on third than, say, Vladimir Guerrero.
Moving clockwise across the diamond...
Table 2: Percentage of Runners Scored from 2B (minimum 100 opportunities)
Player             OPP    R    RATE
Stynes_Chris       100   26   26.0%
Casey_Sean         117   30   25.6%
Delgado_Carlos     174   44   25.3%
Anderson_Garret    147   37   25.2%
Abreu_Bobby        161   40   24.8%
Everett_Carl       121   29   24.0%
Sheffield_Gary     143   34   23.8%
Crawford_Carl      122   29   23.8%
Wilson_Preston     176   41   23.3%
Matsui_Hideki      153   35   22.9%
LEAGUE AVERAGE                16.6%
One thing that should jump out immediately is the sharp decline in the league average scoring rate: in any given plate appearance, a runner on third base is going to score almost two-and-a-half times as often as a runner on second. It seems apparent that some of the conventional wisdom about the importance of having runners in 'scoring position' is misguided. A base hit scores a runner from second base only about 63 percent of the time, and an out will virtually never score a runner from second unless it's combined with an error.
The character of hitters on this list is not entirely different from the one that we examined before. It includes a lot of doubles hitters, folks like Hideki Matsui and Bobby Abreu. Our friend Chris Stynes shows up again; if only Jayson Stark had known!
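The decline in the league-average scoring rate noted above is easy to check by hand. Here is a minimal sketch that divides the two league averages from Tables 1 and 2; the function and constant names are my own, chosen only for illustration.

```python
def relative_rate(rate_a: float, rate_b: float) -> float:
    """Return how many times as often outcome A happens as outcome B."""
    return rate_a / rate_b

# League-average scoring rates taken from Tables 1 and 2.
LEAGUE_RATE_FROM_THIRD = 0.386
LEAGUE_RATE_FROM_SECOND = 0.166

ratio = relative_rate(LEAGUE_RATE_FROM_THIRD, LEAGUE_RATE_FROM_SECOND)
print(f"A runner on third scores {ratio:.2f} times as often as a runner on second")
```

With the published averages the ratio comes out to roughly 2.3, which is where "almost two-and-a-half times" comes from.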
Table 3: Percentage of Runners Scored from 1B (minimum 150 opportunities)
Player             OPP    R    RATE
Pujols_Albert      215   28   13.0%
Thomas_Frank       202   24   11.9%
Edmonds_Jim        172   20   11.6%
Chavez_Eric        221   24   10.9%
Thome_Jim          240   26   10.8%
Guerrero_Vlad      161   17   10.6%
Rodriguez_Alex     231   24   10.4%
Giles_Brian        213   22   10.3%
Guillen_Jose       166   17   10.2%
Jenkins_Geoff      176   18   10.2%
LEAGUE AVERAGE                 5.6%
This list, in contrast, is a testament to the importance of isolated power. A team will have a runner on first base far more often than it will have a runner on second or a runner on third--but it requires an extra-base hit to score that runner (a double, incidentally, will score a runner from first base a little bit more than 40 percent of the time). Of course, a team can try and play station-to-station ball, advancing a runner one base at a time, but it will often find that it runs out of outs before it runs out of bases.
In the interest of completeness, we'll also provide the list of the leaders in home runs per plate appearance (it is true that a batter can sometimes score himself without hitting a home run, such as when he hits a triple and the defense makes an error, but those instances are unusual enough that it should be safe to ignore them).
Table 4: Leaders in HR/PA (minimum 400 PA)
Player             HR    PA    RATE
Lopez_Javy         43   495    8.7%
Bonds_Barry        45   550    8.2%
Edmonds_Jim        39   531    7.3%
Sosa_Sammy         40   589    6.8%
Thome_Jim          47   698    6.7%
Rodriguez_Alex     47   715    6.6%
Thomas_Frank       42   662    6.3%
Pujols_Albert      43   685    6.3%
Sexson_Richie      45   718    6.3%
Sanders_Reggie     31   498    6.2%
LEAGUE AVERAGE                 2.8%
I hope that you can see what I'm trying to do here, which is to break the RBI down into its component parts. The percentage of the time that, say, a given hitter knocks in a runner from second base is, in fact, largely a function of his ability. Given the sorts of sample sizes that we're dealing with, it is also heavily influenced by luck. But save for some negligible differences due to baserunning ability, it does not have very much to do with the abilities of his teammates. It is context-independent. Put another way, the foremost problem with the RBI is that different hitters will be faced with different baserunning states with different frequencies. David Ortiz conducted about 27 percent of his plate appearances last season with a runner on second; Bobby Higginson, just 17 percent.
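To see how much that difference in opportunity matters, here is a rough sketch that gives two otherwise identical hitters the Ortiz and Higginson shares of plate appearances with a runner on second. The 600 plate appearances are a hypothetical round number rather than either player's actual total, and both hitters are assumed to score runners at the Table 2 league-average rate; the function name is mine.

```python
def expected_runs_from_second(pa: int, share_with_runner_on_second: float,
                              scoring_rate: float) -> float:
    """Expected runners driven in from second base over a season."""
    opportunities = pa * share_with_runner_on_second
    return opportunities * scoring_rate

HYPOTHETICAL_PA = 600            # assumption: identical playing time for both hitters
LEAGUE_RATE_FROM_SECOND = 0.166  # league average from Table 2

ortiz_like = expected_runs_from_second(HYPOTHETICAL_PA, 0.27, LEAGUE_RATE_FROM_SECOND)
higginson_like = expected_runs_from_second(HYPOTHETICAL_PA, 0.17, LEAGUE_RATE_FROM_SECOND)
print(f"{ortiz_like:.1f} vs {higginson_like:.1f} runners driven in from second")
```

Under those assumptions the gap is about ten RBIs from this one base state alone, with no difference at all in how well the two hitters hit.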
By removing these sorts of variances, and replacing them with league average figures, we can go a long way toward making the RBI context-neutral. These were the relevant, league average rates in 2003:
With these averages in place, we're now ready to introduce the CIRBI--the Context-Independent RBI. (Any resemblance to the former Twins slugger is purely intentional):
CIRBI = (R3H * Lg3PA + R2H * Lg2PA + R1H * Lg1PA) * PA + HR
R3H, R2H and R1H represent the percentage of runners that a given hitter knocks in from third base, second base, and first base respectively, as we've calculated in the tables above. Lg3PA, Lg2PA and Lg1PA are the corresponding league average rates, and PA and HR are the hitter's own plate appearances and home runs. All of the components of the CIRBI are context-neutral, or at least pretty close to it. You'd probably want to build in an adjustment for park effects, but we can skip that step for now.
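For anyone who wants to play along at home, here is a minimal sketch of the formula in Python. It reads Lg3PA, Lg2PA and Lg1PA as the league-average number of runners on each base per plate appearance, which is my interpretation. The league values and the hitter below are made-up placeholders, not the 2003 figures or any real player's line, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LeagueContext:
    """League-average runners on each base per plate appearance (placeholder values)."""
    lg3pa: float
    lg2pa: float
    lg1pa: float

def cirbi(r3h: float, r2h: float, r1h: float, pa: int, hr: int,
          league: LeagueContext) -> float:
    """CIRBI = (R3H*Lg3PA + R2H*Lg2PA + R1H*Lg1PA) * PA + HR."""
    # Runners driven in per plate appearance, given league-average opportunities.
    runners_per_pa = (r3h * league.lg3pa
                      + r2h * league.lg2pa
                      + r1h * league.lg1pa)
    # The hitter always drives himself in on a home run, so HR is added as-is.
    return runners_per_pa * pa + hr

# Hypothetical league context and hypothetical hitter, for illustration only.
example_league = LeagueContext(lg3pa=0.10, lg2pa=0.20, lg1pa=0.30)
example_total = cirbi(r3h=0.50, r2h=0.25, r1h=0.10, pa=600, hr=30,
                      league=example_league)
print(round(example_total))
```

With these invented inputs the hitter lands at about 108 CIRBI: 78 runners driven in from the bases plus his 30 home runs.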
Here were the major league leaders in CIRBI in 2003, presented along with their actual RBI totals:
Table 5: 2003 CIRBI Leaderboard
Player             CIRBI   RBI
Delgado_Carlos       138   145
Pujols_Albert        131   124
Sheffield_Gary       131   132
Rodriguez_Alex       128   118
Helton_Todd          124   117
Thome_Jim            124   131
Sexson_Richie        122   124
Wilson_Preston       120   141
Wells_Vernon         120   117
Lee_Carlos           117   113
Anderson_Garret      117   116
One of the nice properties of the CIRBI is that it operates on exactly the same scale as the regular ol' RBI. A CIRBI total above 100 represents a good season, and a total above 120, an outstanding one. In most cases, indeed, the differences between CIRBI and RBI totals are relatively slight.
Nevertheless, it does seem apparent that the CIRBI is somewhat better correlated with more advanced metrics of productivity. Players like Alex Rodriguez and Albert Pujols rank higher in CIRBIs than they do in RBIs, while Preston Wilson ranks lower, even before accounting for park effects. At the same time, the CIRBI, like the RBI, is able to recognize the strengths of a hitter like Garret Anderson for what they are. While Garret gets a lot of flak around here for his poor plate discipline, he makes contact well, stays healthy, produces a lot of extra-base hits, and is a pretty good guy to have at the plate when you've got runners on base.
It should be noted that the CIRBI behaves like a counting stat, rather than a rate stat; all else equal, it will increase linearly with playing time. It should further be noted that the CIRBI, like the RBI, is subject to the vagaries of "clutch" hitting. If a hitter hits uncharacteristically well with runners on base, that will reflect positively on his CIRBI total, even though it is unlikely that it represents any sort of repeatable ability. I wouldn't expect any Prospectus authors to start using CIRBI in lieu of more robust metrics like slugging percentage and isolated power.
Nevertheless, the CIRBI does have its charms. It could provide potentially powerful ammunition in an MVP debate--we could have used it to point out, for example, that A-Rod was an exceptionally productive hitter with runners on base, even though he might not have had as many opportunities as his counterparts in Boston and New York. It is scaled in a way that makes it accessible to the more traditional sorts of folks in the baseball community; I don't think that I could persuade Jim Hendry about the merits of MLVr, but maybe I could sell him on the merits of the CIRBI. The CIRBI does a pretty good job of answering, in the most literal sense of the term, the question of which hitters were most effective at driving runners home. It does a better job of that than the original, context-dependent version of RBI.
That said, I don't know that you're going to be seeing CIRBI next to VORP and SNWL on the BP stats page any time soon. The real problem with RBI--and the real problem with CIRBI--is not so much that it is context-dependent, but that it conflates the question of driving in runs with the more important matter of producing runs. Barry Bonds doesn't do so well in terms of CIRBI, but he's the best hitter on the planet because he's going to get on base something like 60 percent of the time this year, creating unprecedented opportunities for the hitters hitting behind him (that the Giants can't do better than A.J. Pierzynski and Marquis Grissom is another matter entirely). Wade Boggs and Rickey Henderson were better hitters than Kirk Gibson and Cecil Fielder for the very same reason. Driving in runs is only half the battle, and because of the importance of avoiding outs, it's the less important half.
As Chris Kahrl argues in the case against OPS, a little bit of simplicity can be a dangerous thing. I think my dad, who has never been much of an Ockham's Razor guy, would agree. Probably not Ernie Harwell, though.