June 3, 2004
Doctoring The Numbers
Scraping the Bottom of the Barrel
It takes a lot these days to awaken me from my slumber and coerce me into penning a column for BP. Between taking care of a baby daughter at home and starting my own medical practice, the truly important things in life--like baseball analysis--have gotten short shrift of late.
But finally, I have found a topic that arouses my passion. A question so intriguing as to get my heart racing, my blood pumping, my brain thinking. Finally, a puzzle worth being solved, a code worth being cracked.
That question, of course, is: "Does Alex Sanchez have the emptiest batting average in major-league history?"
Consider the evidence. Bolstered by an obscene number of bunt hits, Sanchez was hitting .359 going into Wednesday night's game, which ranked him third in the American League. (By the way, who had the exacta on a Melvin Mora-Ken Harvey-Alex Sanchez top three at this point in the season?) But Sanchez's impressive ability to hit singles is neutered by his inability to do anything else: hit for power (eight extra-base hits), reach base by other means (four walks, no HBPs), or make effective use of his speed (11 steals, 10 caught stealings).
For the season, Sanchez is hitting .359/.371/.431. His batting average may rank third in the league, but his 802 OPS ranks just 43rd--in a tie with Jose Cruz, who's hitting .237.
Put succinctly, Sanchez's batting average is about as empty as Le Stade Olympique. But is it the emptiest ever?
There are a number of ways to frame that question. Ever since baseball analysts became aware that batting average didn't tell the whole story--which is to say, ever since baseball analysis existed--there have been attempts to measure all the different aspects of offensive performance not covered by batting average. The first commonly-used statistic that was designed to sweep the floor of all the things that batting average left behind was Secondary Average, which was originally designed (or at least propagated) by Bill James, two decades ago.
James' original formula for secondary average was (BB + (TB - H) + SB)/AB. The statistic covered the three primary elements of offense outside of average--power, walks, and offensive speed--and had the added feature that a league-average figure for secondary average would be almost the same as a league-average figure for batting average. (Using this formula, the AL's secondary average in 1985, for instance, was .260; the league batting average was .261.)
Secondary average has had more of an impact on baseball analysis than you might think. James, in his essay explaining secondary average to the world, made an off-hand comment that he thought a player's overall offensive contribution was approximately two-thirds batting average, one-third secondary average. One of his readers decided to use this rough template--2/3 BA + 1/3 SA--as the foundation for a new statistic that he hoped would quantify a player's performance in one number. That reader was Clay Davenport, and the statistic became Equivalent Average, or EqA.
There is a problem with James' original formula, which is that by counting stolen bases without counting caught stealings, the formula measures only the positive impact of offensive speed, and not its downside. While it's a useful way of quantifying a player's diversity of skill, it's not a good way to quantify the value of a player's talents outside of batting average. So we'll make an adjustment to the formula, and also add HBPs into the equation:
Secondary Average = (BB + HBP + (TB-H) + SB - (2*CS))/AB
The best secondary averages of all time?
Year  Player          SEC
-------------------------
2002  Barry Bonds    .955
2001  Barry Bonds    .941
2003  Barry Bonds    .831
2000  Mark McGwire   .797
1998  Mark McGwire   .786
Greatness can make for some pretty boring lists. A look at those numbers can help explain how skills like power and walks can be so important when secondary average is, in terms of raw numbers, only half as important as batting average. It's because secondary average has considerably more variance than batting average; no one's ever had a .955 batting average, to the best of my knowledge. It's been 63 years since anyone hit .400; generally about two dozen players a year will have a secondary average that high.
The flip-side of that is that while no one has ever made it through a full season hitting .150, plenty of players have had a secondary average that low. The lowest secondary averages of all time (min: 450 PA):
Year  Player          SEC
-------------------------
1968  Hal Lanier     .053
1915  Gus Getz       .055
1982  Doug Flynn     .056
1999  Mike Caruso    .060
1976  Duane Kuiper   .063
As a 21-year-old rookie shortstop in 1998, Mike Caruso hit .306, and had some people predicting future greatness. That secondary average is a sign that Caruso was never destined to meet those expectations.
So, what is Sanchez's secondary average? Try .044. There have been a little over 12,000 player-seasons of 450+ PA in the modern era. Alex Sanchez is on pace to have a lower secondary average than any of them.
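For anyone who wants to check the arithmetic, here is a minimal Python sketch of the adjusted secondary average formula given earlier. The hits, walks, HBPs, steals, and caught stealings come from the figures quoted in this column; the at-bat and total-base counts (181 and 78) are not stated anywhere in the column and are assumptions back-solved from the .359 average and .431 slugging average. The function and variable names are illustrative only.

```python
def secondary_average(ab, h, tb, bb, hbp, sb, cs):
    """Adjusted secondary average: (BB + HBP + (TB - H) + SB - 2*CS) / AB."""
    if ab <= 0:
        raise ValueError("at-bats must be positive")
    extra_bases = tb - h  # bases gained beyond a single on each hit
    return (bb + hbp + extra_bases + sb - 2 * cs) / ab


# H, BB, HBP, SB, and CS are quoted in the column.
# ASSUMPTION: AB=181 and TB=78 are back-solved from .359 and .431.
sanchez = dict(ab=181, h=65, tb=78, bb=4, hbp=0, sb=11, cs=10)

sec = secondary_average(**sanchez)
avg = sanchez["h"] / sanchez["ab"]
print(f"SEC {sec:.3f}  AVG {avg:.3f}  gap {avg - sec:.3f}")
```

Under those assumed totals, the numerator works out to just eight (4 walks, 13 extra bases, and 11 steals, minus 20 for the 10 caught stealings), which is how a .359 hitter ends up with a secondary average of about .044.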
Even if he raises his secondary average towards the stratospheric heights of .100, he'll still have a strong claim to the title of Emptiest Batting Average ever. That's because the players at the bottom of the secondary average rankings weren't hitting an empty .330; they weren't hitting, period. When Hal Lanier had a .053 secondary average, his batting average was just .206. That's not an overrated hitter; that's a bad hitter.
One way to measure just how overrated Sanchez's batting average is: compare it with his secondary average. First, though, let's look at the opposite list, the players whose secondary average dwarfs their batting average by the largest margin:
Year  Player          SEC    AVG   Diff
---------------------------------------
2001  Barry Bonds    .941   .328   .613
2002  Barry Bonds    .955   .370   .586
2003  Barry Bonds    .831   .341   .490
1998  Mark McGwire   .786   .299   .487
1920  Babe Ruth      .775   .376   .400
OK, so this is the same list as the first one, more or less. I just couldn't resist pointing out that Barry Bonds was underrated by his batting average more than any player in major-league history--over the course of three straight seasons in which he hit .328, .370, and .341. That's inhuman.
So where's the list of the players whose batting average towers over their secondary average by the greatest amount?
Year  Player            AVG    SEC   Diff
-----------------------------------------
2004  Alex Sanchez     .359   .044   .315
1898  Willie Keeler    .385   .135   .250
1915  Stuffy McInnis   .314   .066   .248
1914  Stuffy McInnis   .314   .071   .243
1971  Glenn Beckert    .342   .108   .234
1920  Stuffy McInnis   .297   .066   .231
It's not even close.
Wee Willie might have Hit 'Em Where They Ain't, but he wasn't hitting 'em all that far. He had 216 hits that year: one homer, two triples, seven doubles, and 206 singles. Stuffy McInnis may be the King of the Empty Batting Average; over an 11-year stretch from 1914 to 1924, the long-time first baseman for the Philadelphia A's and Red Sox hit over .290 10 times, without ever posting an OBP over .350 or a slugging average over .400.
There's one major problem with dead-ball era stats: we lack caught stealing data for many of those years, meaning that the secondary averages for many of the players in that era are artificially inflated. Keeler, for instance, stole 28 bases in 1898, and gets credit for that in the secondary average formula; we don't know how many times he was nailed trying to steal, so he doesn't get penalized for his larceny. In other words, some of the players from the dead-ball era may be even more overrated than they appear using this formula.
So let's look at the question a different way, using a different formula. A player like Sanchez, whose batting average represents most of his value, should have an OBP and slugging average only slightly higher than his batting average. We can use the ratio of OPS to batting average as a measure of this; much like a jock taking the SAT gets 400 points just for writing his name, a player starts with an OPS-to-average ratio of 2.000, because batting average is a component of both OBP and slugging average. (For you smart-alecks who want to point out that owing to sacrifice flies, OBP can be lower than batting average, I have only two words for you: shut up.)
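The ratio is simple enough to sketch in a few lines of Python. This is only an illustration: the function name is made up for the example, and it works from the rounded rate stats printed in this column rather than from raw totals, so the third decimal can differ from a figure computed off the underlying counts.

```python
def ops_to_avg_ratio(obp, slg, avg):
    """Ratio of OPS (OBP + SLG) to batting average."""
    if avg <= 0:
        raise ValueError("batting average must be positive")
    return (obp + slg) / avg


# The floor: a hitter with nothing but singles (no walks, HBPs, or
# sacrifice flies) has OBP = SLG = AVG, so the ratio is exactly 2.
floor = ops_to_avg_ratio(0.300, 0.300, 0.300)

# Sanchez's .359/.371/.431 line as quoted in the column (rounded rates).
sanchez = ops_to_avg_ratio(0.371, 0.431, 0.359)

print(f"floor {floor:.3f}  Sanchez {sanchez:.3f}")
```

The closer a hitter sits to that floor of 2.000, the less his on-base percentage and slugging average add to what his batting average already says.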
For all my comrades in the Rob Deer Fan Club, here's the list of the highest ratios of OPS to batting average in major-league history:
Year  Player           OPS    AVG   Ratio
-----------------------------------------
2001  Barry Bonds    1.379   .328   4.206
1995  Mark McGwire   1.125   .274   4.100
1998  Mark McGwire   1.222   .299   4.093
1999  Mark McGwire   1.120   .278   4.026
1991  Rob Deer        .700   .179   3.918
And the flip-side of the list:
Year  Player            OPS    AVG   Ratio
------------------------------------------
1898  Willie Keeler    .830   .385   2.156
1984  Kirby Puckett    .655   .296   2.212
1966  Don Kessinger    .608   .274   2.218
1915  Stuffy McInnis   .699   .314   2.228
1968  Horace Clarke    .512   .230   2.230
By this measure, Sanchez loses his top spot on the list, finishing just out of the Top Five at 2.239. Which makes this the perfect time to unveil his secret weapon in the Quest for Batting Average Purity: the bunt single.
Fully 20 out of his 65 hits--over 30%--have come without a full swing of the bat. Leading the universe in bunt hits might get Sanchez a lot of press, but it actually diminishes his value, because 99% of bunt hits result in runners advancing only one base. (This explains why, in situations with a man on second base, and no one at third, Sanchez has six hits--and only four RBIs.) Giving Sanchez extra credit because he has so many bunt hits is like giving Barry Bonds extra credit because so many of his walks are intentional.
How overrated is Sanchez as a leadoff hitter? Let's put it this way: he's on pace to finish with 206 hits, and only 79 runs scored. Only three players in major-league history have tallied that many hits, and scored so few runs. And none of them batted leadoff for a team that ranked fifth in the league in runs scored, in an era of high offense.
So to answer the riddle that spawned this unholy column: does Sanchez have the emptiest batting average of all time? By some measures, yes. By some measures, not quite.
But give him time. It's still early.