February 19, 2004
Baseball Prospectus Basics
Measuring Offense
Before delving into those harrowing inhabitants of the Baseball Prospectus statistics page like VORP, RARP, EqA or any other acronym that sounds like a debutante sneezing or something uttered on Castle Wolfenstein circa 1986, it's worth asking: What's wrong with those comfy traditional offensive measures like RBI, batting average and runs scored?
This Baseball Prospectus Basics column is going to address that question and, ideally, demonstrate why the traditional cabal of offensive baseball statistics tells only a piece of the story. Later, someone smarter (but shockingly less handsome) than I will take you on a tour of the more advanced and instructive metrics like the aforementioned VORP, RARP and EqA. For now, though, we'll keep our focus on why we need those things in the first place.
Many of the stats you encounter in mainstream baseball circles are what we call "counting stats." That is, they count things: 23 homers, 107 RBI, six triples, etc. This may sound painfully obvious, but the more a hitter plays in a given season, the higher his counting stats are likely to be. Some counting stats, like RBI and runs scored, are highly team- and batting-order-dependent. A cleanup hitter logging 600 plate appearances in a potent lineup must work very hard not to rack up at least 100 RBI, whereas a leadoff hitter on an otherwise weak offensive team won't crack the 100-RBI mark no matter how effective he is. If a superior player is surrounded by weak hitters, it's entirely possible that he'll cash in on a much greater percentage of his RBI opportunities and still have a lower RBI total than a lesser player in a stronger lineup.
The thing to understand about counting stats is that, absent supporting information, they're really only useful at the margins. That's to say, it's hard to rack up 140 RBI and somehow stink. Conversely, it's difficult to log a season's worth of plate appearances, total 40 RBI and somehow be any good.
The flip side of this is that it's entirely possible, especially in eras conducive to run scoring, to break the vaunted 100-RBI barrier and still be an ineffective player. It's debatable what the worst 100-RBI season is, but Ruben Sierra in 1993 may be hard to beat. More later on why he was a lousy player that season.
So, highly context-dependent counting stats like RBI and runs scored can be inflated or deflated by a panoply of factors that have nothing to do with that hitter's true abilities. One of the prevailing missions of sabermetrics is to evaluate the player in a vacuum: What's he doing independently of his teammates and environment? Using only RBI or runs scored to judge a player or to frame an argument at the tavern is a fool's errand.
Home runs, since they have almost nothing to do with a hitter's teammates, are more reliable than RBI, but they're still not an ideal metric. It's fully possible for a player with fewer home runs than another to be a far superior player. How's that? Again, it's context. Home runs (and singles, doubles, triples, etc.) aren't lineup- and teammate-dependent like RBI and runs scored, but, like any other unadjusted statistic, they are dependent upon the ballpark and, when comparing players across history, the era (more on park and league effects later in this series).
Another factor to consider when comparing hitters is the notion of positional scarcity. This is the idea that it's easier to find good hitters at the less demanding defensive positions than it is at those positions that require a great deal of skill with the glove. The less demanding positions are the corner slots: left field, right field, third base and first base. The more exacting positions are those up the middle: catcher, shortstop, second base and center field. Up-the-middle defenders handle more balls and cover more ground than corner players, or, in the case of the catcher, they have defensive duties distinct from those who man other positions.
So if a first baseman and a shortstop have identical offensive statistics and equal defensive abilities relative to their positions, who's the better player? The shortstop, because the offensive-productivity bar for shortstops is notably lower than it is for first basemen, since it's far easier to find a good-hitting first baseman than it is a good-hitting shortstop. Generally, from highest level of positional scarcity to least, the positions go shortstop, catcher, second baseman, center fielder, third baseman, right fielder, left fielder and first baseman. Those can vary from year to year, but in most seasons up-the-middle defenders who can hit will be rarer beasts than corner players who can hit. This is why Alex Rodriguez is such a special player: He hits like an All-Star first baseman, yet he plays the most challenging defensive position on the diamond, and does it well to boot. Again, many stats you'll find on this site are already adjusted to reflect the demands of the position.
And what of batting average? Well, it's a percentage stat and not a counting stat, so it has a somewhat different set of concerns and caveats. First, it's subject to sample-size errors. To provide an extreme example, a hitter who goes one for three on Opening Day and one who plays the entire season going 200 for 600 will both be hitting .333 when you check the box scores; however, it's the latter hitter whose .333 average is more legit. Why? Because it's been borne out over time, whereas the former hitter may be a banjo-hitting fringe player who had a lucky day. (As an aside, counting stats are also prone to a different kind of sample-size error. It's the dread "on pace to" statistical distraction. When some unlikely player is, say, leading the league in RBI after the first two weeks of the season, we'll hear how he's "on pace" to put up 380 RBI on the season or some such nonsense.) Basically, if a hitter is doing something that's completely out of step with the rest of his career, you should be skeptical and demand a larger sample before you buy into those reports that his stroke has been tweaked or how he's seeing the ball better since he started drinking liver smoothies. Sample size is a major principle to grasp, and you'll never look foolish by being roundly unmoved by what a player does in the first few weeks of the season.
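To put numbers on the sample-size point, here is a minimal Python sketch. The function names are made up for illustration, the 162-game season length is an assumption about the schedule, and the 33-RBI-in-14-games start is a hypothetical hot streak, not a real player's line.

```python
def batting_average(hits, at_bats):
    """Hits per at-bat; says nothing about how many at-bats back it up."""
    if at_bats == 0:
        raise ValueError("no at-bats to evaluate")
    return hits / at_bats


def on_pace(total_so_far, team_games_played, season_games=162):
    """Naive straight-line projection of a counting stat over a full season.

    season_games=162 is an assumed schedule length.
    """
    if team_games_played == 0:
        raise ValueError("no games played yet")
    return total_so_far / team_games_played * season_games


# The two hitters from the paragraph above: identical averages,
# wildly different amounts of evidence behind them.
opening_day = batting_average(1, 3)
full_season = batting_average(200, 600)
print(f"{opening_day:.3f} over 3 AB vs. {full_season:.3f} over 600 AB")

# Hypothetical hot start: 33 RBI through 14 team games.
print(f"'On pace' for {on_pace(33, 14):.0f} RBI")
```

The projection is arithmetically fine and practically meaningless, which is the point: two weeks of games is too small a sample to multiply out.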
That's not all that's wrong with batting average. As much as the .300 hitter is lionized, what does that really tell us about a player? It tells us he got a hit of some kind in 30% of his at-bats. We have no idea what kinds of hits he got, and we have no idea how he fared in terms of reaching base by other means. We don't even know how many times he came to the plate.
When dealing with percentage statistics, having at least a rough idea of the number of plate appearances is essential. And as far as batting average goes, you can tell much more about a player if his average (AVG) is presented along with his on-base percentage (OBP) and slugging percentage (SLG).
OBP is how often a player reached base via hit, walk or hit by pitch; among traditional offensive statistics, it's the most important. The higher a player's OBP, the less often he's costing his team an out at the plate. Viewed another way, 1-OBP = out %. In other words, OBP subtracted from the number 1 will yield the percentage of how often a hitter comes up to bat and uses up one of his team's 27 outs for that game. A player can play all season, rack up impressive counting stats and still be using up far too many outs.
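For the arithmetically inclined, a rough Python sketch of OBP and its flip side, the out rate. It uses the standard official formula, which also counts sacrifice flies in the denominator (a detail not spelled out above). The function names and the sample season line are hypothetical.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies=0):
    """Times on base divided by the plate appearances that count toward OBP.

    The sac_flies term follows the standard official definition; the
    paragraph above does not mention it.
    """
    chances = at_bats + walks + hit_by_pitch + sac_flies
    if chances == 0:
        raise ValueError("no plate appearances to evaluate")
    return (hits + walks + hit_by_pitch) / chances


def out_rate(obp):
    """Share of trips to the plate that use up an out: 1 - OBP."""
    return 1.0 - obp


# Hypothetical season line, for illustration only.
obp = on_base_percentage(hits=150, walks=60, hit_by_pitch=5,
                         at_bats=500, sac_flies=5)
print(f"OBP {obp:.3f}, out rate {out_rate(obp):.3f}")
```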
SLG measures a player's power, albeit not perfectly. It places more value on extra-base hits than it does on singles, and what you're looking at when you see a hitter's SLG is the total bases he averages per at-bat. For example, a player with a .500 SLG averages one-half total base per at-bat.
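As a sketch of the total-bases-per-at-bat idea, assuming the usual weights of one base for a single up through four for a home run, and a made-up stat line chosen to land on the .500 example:

```python
def total_bases(singles, doubles, triples, home_runs):
    """One base per single, two per double, three per triple, four per homer."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Total bases averaged per at-bat."""
    if at_bats == 0:
        raise ValueError("no at-bats to evaluate")
    return total_bases(singles, doubles, triples, home_runs) / at_bats


# Hypothetical hitter: 100 singles, 30 doubles, 5 triples, 25 homers in 550 AB.
slg = slugging_percentage(100, 30, 5, 25, at_bats=550)
print(f"SLG {slg:.3f}")
```

Note that a walk appears nowhere in this calculation, which is one reason SLG needs OBP alongside it.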
You'll often see AVG, OBP and SLG presented in the following format: .300/.400/.500, where .300 is the player's AVG, .400 is the player's OBP and .500 is the player's SLG. Another statistic you can glean from this "holy trinity" is Isolated SLG, which is the player's SLG minus his AVG. This expresses how much "raw" power he's producing by focusing solely on his extra-base hits. So of the trinity, AVG, which is by far the most popular and heavily relied upon, really provides you with the least important information. It's good info in the presence of OBP and SLG, but by itself it's almost as useless as RBI.
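If you want to pull Isolated SLG out of a slash line yourself, a minimal Python sketch might look like this. The helper names are hypothetical, and the parser assumes the line is always three slash-separated decimals in AVG/OBP/SLG order.

```python
def parse_slash_line(line):
    """Split an 'AVG/OBP/SLG' string into three floats, in that order."""
    parts = line.split("/")
    if len(parts) != 3:
        raise ValueError("expected AVG/OBP/SLG, got: " + line)
    avg, obp, slg = (float(part) for part in parts)
    return avg, obp, slg


def isolated_slugging(avg, slg):
    """SLG minus AVG: extra bases per at-bat beyond the first base of each hit."""
    return slg - avg


avg, obp, slg = parse_slash_line(".300/.400/.500")
print(f"Isolated SLG: {isolated_slugging(avg, slg):.3f}")
```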
What's a good OBP and SLG? Well, as we've already mentioned, offensive statistical standards depend greatly upon a player's era, home ballpark and defensive position. Generally speaking, if a player today puts up a .360 OBP and .500 SLG, he's doing his job. If he's a shortstop in Dodger Stadium with these numbers (and with an ample number of plate appearances, of course), he's an MVP candidate; if he's a first baseman in Denver with these numbers, he's nothing special. Again, context is where the rubber hits the road. (We discuss OPS, the stat that adds OBP + SLG, later in this series.)
Remember our pal Ruben Sierra and his 101 RBI from 1993? Let's go back and look at him, knowing what we know now. Yeah, there's his 101 RBI. But that season his trinity numbers were .233/.288/.390. Those are ugly, and they get even uglier when you recall that he split his time between DH and the outfield corners. That means he had little defensive value, and, hence, his offensive standard was higher than that of most players. A .288 OBP and .390 SLG are patently unacceptable for a corner defender, no matter how many RBI he racks up.
So, in summary:
And that's that. Like I said, there's a whole other world of statistics out there besides the ones that have been foisted upon you since you bought your first set of Topps. Now that you know what's wrong with traditional offensive statistics, you're ready to arm yourselves with the tools of state-of-the-art baseball analysis.