So, how good is good enough, exactly? A recent blog post on ESPN looked at how OPS fares at explaining team runs. It’s a rather oft-repeated argument, to be sure, so we can simply let it stand in for any number of articles in this vein:
Sabermetricians feel OPS does not go deep enough. They prefer to use a more in-depth breakdown, such as replacement-player analysis. Clearly, there is no right or wrong philosophy. Simplicity is always good, but so is validity. Something easy to understand works, but at what cost to the statistical significance?
The issue of simplicity vs. statistical significance is not a problem specific to baseball. The answer most have come up with is known as Occam's razor, which states "entities must not be multiplied beyond necessity."
In this case, if there is something uncomplicated that does just as good of a job explaining offense as a more complex one does, we should use the simpler stat.
He shows that OPS does a pretty good job of explaining team runs scored, with a correlation of .947. So, does that mean that everyone should forget about the work done on estimating runs since Palmer and Thorn and go back to OPS?
Certainly the sort of testing they’ve done is fairly common. But does that make it valid? And does that validity extend itself to individual hitters, which is, of course, what we’re really concerned with here? (After all, we already know team runs scored—we don’t need to look at OPS to evaluate teams as a proxy for run scoring.)
Checking the spread
The issue at hand here is that the spread of talent between teams is much, much smaller than the spread of talent between players. (When we speak of teams, what we typically mean are seasonal totals. You can aggregate at a more granular unit of run scoring, like the game or inning level.) Take, for instance, the standard deviations of various stats (per plate appearance) from ’93 to ’09:

          BB      1B      2B      3B      HR
Players   0.038   0.041   0.016   0.005   0.018
Teams     0.010   0.009   0.004   0.001   0.005
Simply put, there is much greater differentiation between players than there is between teams. So saying that OPS explains most of the variation in team run scoring at the seasonal level doesn’t tell us all that much about how good it is at predicting the variation in a player’s contribution to run scoring, because the variation in player run-scoring abilities is much, much greater than the spread at the team-season level.
Is OPS really simple?
Put that nugget aside for a little bit; we’ll return to it in a moment. Let’s swing around to the contention that OPS is simpler than other, modern run-estimation methods.
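The formula for OPS is, of course, OBP plus SLG. Written out in long form, that becomes:
OPS = (H + BB + HBP) / (AB + BB + HBP + SF) + TB / AB
This involves adding two fractions with different denominators, the bane of math teachers everywhere. We can combine the two, although it looks rather messy:
OPS = [AB × (H + BB + HBP) + TB × (AB + BB + HBP + SF)] / [AB × (AB + BB + HBP + SF)]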
OPS is simple so long as you hold to OBP+SLG—once you try to dig deeper into it, the way it’s constructed actually makes it rather difficult to figure out how it’s working under the hood. But, with a little bit of work, we can plug values into OPS and derive the value of an extra walk, single, etc., for a typical batting line in 2009. I’ve "normalized" those values so that a single is equal to one. Presented alongside are similar values based upon the observed change in run expectancy—in other words, the average number of team runs added by that event.

      OPS     RE
BB    0.51    0.65
1B    1.00    1.00
2B    1.85    1.62
3B    2.70    2.18
HR    3.55    3.02
The OPS values are close to our run expectancy values, to be sure. But there’s a rather severe bias toward extra-base hits and away from the walk.
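One way to arrive at OPS weights like these is to add a single extra event to a batting line and measure how far OPS moves. The sketch below does that in Python. The batting line is a set of round numbers chosen to resemble a league-average hitter (roughly a .333 OBP and .417 SLG), not actual 2009 totals, and the function names are illustrative rather than anything used in the original analysis.

```python
# Marginal value of one extra event under OPS, scaled so a single equals 1.
# The batting line is illustrative (round numbers near league average),
# not actual 2009 league totals.

BASE_LINE = {"AB": 890, "1B": 154, "2B": 47, "3B": 5, "HR": 27,
             "BB": 90, "HBP": 9, "SF": 8}
TOTAL_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def ops(line):
    """OBP + SLG for a batting line stored as a dict of counting stats."""
    hits = sum(line[e] for e in TOTAL_BASES)
    total_bases = sum(line[e] * bases for e, bases in TOTAL_BASES.items())
    on_base = hits + line["BB"] + line["HBP"]
    obp = on_base / (line["AB"] + line["BB"] + line["HBP"] + line["SF"])
    slg = total_bases / line["AB"]
    return obp + slg


def ops_gain(line, event):
    """Change in OPS from adding one more `event` to the line."""
    bumped = dict(line)
    bumped[event] += 1
    if event in TOTAL_BASES:  # a hit also counts as an at-bat; a walk does not
        bumped["AB"] += 1
    return ops(bumped) - ops(line)


def relative_weights(line):
    """OPS gain for each event, divided by the gain for a single."""
    single = ops_gain(line, "1B")
    return {e: ops_gain(line, e) / single
            for e in ("BB", "1B", "2B", "3B", "HR")}


weights = relative_weights(BASE_LINE)
for event, weight in weights.items():
    print(f"{event}: {weight:.2f}")
```

Because a walk adds to the OBP denominator but never touches SLG, its weight comes out at roughly half of a single, while every extra base on a hit is paid in full through SLG.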
Looking at Individuals
The reason OPS "works" at the team level in spite of this flaw is the fact that there is very little differentiation between teams when it comes to things like walk and home run rates. So in the narrow band of difference between teams, OPS "works" because the low value for a walk is offset by the high value for extra-base hits.
But no modern baseball team is ever going to walk as infrequently as Jeff Francoeur, for instance—OPS is likely to overrate him because it overvalues his power and doesn’t capture the effects of his low walk rate. On the other side, you have batters like Rafael Furcal, with good walk rates but low power numbers, that OPS will likely underrate. (This is one reason Furcal can put up a better EqA than Francoeur in ’09, despite Francoeur having the higher OPS.)
Let’s take it a step further, and look at how a player’s walk rate changes the relative values provided by OPS. Remember, walks are included in the denominator of OBP but not SLG—a change in walk rates changes the relationship of OBP to SLG.
We’ll look at how a player with otherwise average stats, but a walk rate of half or one-and-a-half of the league average, changes the relationship between inputs in OPS. This is a player whose walk rate is relatively common in MLB (just a little over one standard deviation away from average) but who is almost nonexistent at the team level, well over four SDs away from average for teams.

      0.5     Average   1.5
BB    0.52    0.51      0.49
1B    1.00    1.00      1.00
2B    1.82    1.85      1.88
3B    2.65    2.70      2.76
HR    3.47    3.55      3.64
So as the walk rate goes up, two things happen: the value of the walk drops (slightly), and the value of extra-base hits rises. The reverse happens as the walk rate drops. This isn’t by design; if you were sitting down trying to figure out the relative value of events, you wouldn’t do it this way. It’s an accident, caused by Henry Chadwick’s bizarre anti-walk feelings back in the 1800s.
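The drift can be sketched by scaling the walk total of a batting line and recomputing the weights OPS implies. The line below again uses illustrative round numbers rather than actual league totals, and holding at-bats fixed while scaling walks is an assumption of this sketch, so the output need not match the published figures to the second decimal.

```python
# How the event weights implied by OPS shift as the walk rate changes.
# Illustrative batting line; at-bats are held fixed while walks are scaled
# (an assumption of this sketch, not necessarily how the article's table
# was built).

AB, SINGLES, DOUBLES, TRIPLES, HOMERS = 890, 154, 47, 5, 27
LEAGUE_BB, HBP, SF = 90, 9, 8


def implied_weights(walks):
    """Relative OPS gain per event for a hitter with the given walk total."""
    hits = SINGLES + DOUBLES + TRIPLES + HOMERS
    total_bases = SINGLES + 2 * DOUBLES + 3 * TRIPLES + 4 * HOMERS
    pa = AB + walks + HBP + SF
    on_base = hits + walks + HBP
    base_ops = on_base / pa + total_bases / AB

    def gain(bases):
        # bases == 0 means a walk: one more time on base, but no at-bat.
        new_obp = (on_base + 1) / (pa + 1)
        if bases == 0:
            new_slg = total_bases / AB
        else:
            new_slg = (total_bases + bases) / (AB + 1)
        return new_obp + new_slg - base_ops

    single = gain(1)
    return {"BB": gain(0) / single, "1B": 1.0, "2B": gain(2) / single,
            "3B": gain(3) / single, "HR": gain(4) / single}


# Half, average, and one-and-a-half times the league walk total.
table = {mult: implied_weights(LEAGUE_BB * mult) for mult in (0.5, 1.0, 1.5)}
for mult, weights in table.items():
    print(mult, {event: round(w, 2) for event, w in weights.items()})
```

The direction is what matters here: more walks inflate the OBP denominator, which shrinks the OBP share of every event's value and leaves the SLG share, where extra bases live, relatively larger.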
And this is the sort of thing that doesn’t show up when you simply test against team runs, because teams rarely exist at extremes that are actually quite common among individual players. Consider it a case of what’s known as the ecological fallacy: assuming that things true of a group are also true of the individuals in it. For a properly designed run estimator, this isn’t a problem.
But when a run estimator is based upon assumptions that are true at the team level but not at the player level, you run into these sorts of problems. (It may be unfair to say that OPS is designed this way, but it certainly works that way; run estimators derived from a regression on team runs scored are even more guilty of this.)
Does this mean that OPS is a bad run estimator? It depends on what you compare it to. It’s an improvement over, say, batting average. Of course, at this point, that’s setting a rather low bar. If you care about comparing specific ballplayers with differing skill sets, using OPS can lead you to the wrong conclusion.
Some technical notes
One standard deviation, for those of you wondering, means that 68 percent of the sample being measured falls within that far of the average. (This is assuming a normal distribution; most baseball stats are at least a good approximation of this.) In this case, I’ve weighted the players and team-seasons by plate appearances. So if one wants to be pedantic, one should say, for instance, "68 percent of plate appearances went to players with a walk rate within 0.038 of the league average."
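Weighting by plate appearances amounts to computing a weighted standard deviation. A minimal sketch, with made-up walk rates and plate-appearance totals standing in for real player data:

```python
import math


def weighted_std(values, weights):
    """Standard deviation of `values`, each weighted by `weights` (e.g. PA)."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return math.sqrt(variance)


# Hypothetical players: walks per plate appearance, and plate appearances.
walk_rates = [0.05, 0.10, 0.15]
plate_appearances = [100, 200, 100]

spread = weighted_std(walk_rates, plate_appearances)
print(round(spread, 4))
```

A player with twice the plate appearances counts twice as much toward both the average and the spread, which is why the result describes where plate appearances went rather than where players sit.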
As for run expectancy, we can measure how many runs, on average, are scored between each plate appearance and the end of the inning. Typically, we then group plate appearances by the number of outs at the start of the plate appearance and the position of the runners on base. (This is usually referred to as the base-out state.) The average number of runs that score given a base-out state is called the run expectancy.
When we talk about the change in run expectancy for an event, we look at the change in the base-out state: we subtract the starting run expectancy from the final run expectancy and add any runs that score on the play.
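That bookkeeping can be sketched as follows. The run expectancy figures in the table are placeholders invented for the example (a real table would be computed from play-by-play data), and only a handful of base-out states are filled in.

```python
# Run value of a play = RE(end state) - RE(start state) + runs scored.
# A state is (outs, runners), where runners is a string such as "---"
# (bases empty) or "1--" (runner on first).
# The RE numbers below are made-up placeholders, not measured values.

RUN_EXPECTANCY = {
    (0, "---"): 0.50,
    (0, "1--"): 0.90,
    (0, "-2-"): 1.10,
    (1, "---"): 0.27,
}


def run_value(start, end, runs_scored=0):
    """Change in run expectancy for a play, plus runs that scored on it."""
    return RUN_EXPECTANCY[end] - RUN_EXPECTANCY[start] + runs_scored


walk = run_value((0, "---"), (0, "1--"))                  # leadoff walk
homer = run_value((0, "---"), (0, "---"), runs_scored=1)  # solo home run
out = run_value((0, "---"), (1, "---"))                   # leadoff out

print(f"walk {walk:+.2f}, homer {homer:+.2f}, out {out:+.2f}")
```

Averaging these changes over every occurrence of an event gives the average number of team runs added by that event.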
—
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.
Look at wins - there are a lot more pitchers with 0 wins than 1, more with 1 than 2, more with 2 than 3. At some point, you start getting some fluctuation - in any given year, you may have more pitchers with 13 wins than 12, for instance.
You're right that there are more players below average than above it. But it turns out that a player's ability is correlated with his playing time - better players get more playing time.
So while there are more below-average players, the amount of PAs given to below-average players is about equal to the playing time given to above-average players. So when you weight for playing time, you get an approximation of a normal distribution.
(And obviously the more data you have, the more "normal" it looks - the law of large numbers at work.)
But if you ignore playing time considerations, yeah, the distribution of talent in MLB actually resembles a bimodal distribution - you have one peak around the league average, and one (larger) peak around replacement level.
BB: 1
1B: 1
2B: 2
3B: 3
HR: 4
In that event you're overvaluing extra-base events and the walk, and rather substantially undervaluing the single. Realistically, you've taken almost all the problems of OPS and made them worse, with little to offset them.
One question. I might have misunderstood your paragraph explaining this, but are both the OPS weights and RE values for events normalized so that singles are worth 1 (i.e. are they both relative to the value of a single)?
"I’ve "normalized" those values so that a single is equal to one."
Great work Colin. I often encounter people who criticize my use of "madeup" stats while defending their love of OPS. I love how you explain how it's not really even that simple when you look at it.
True Average, Runs Created, etc., are not the result of two widely understood and reported inputs. I can't look at the scoreboard and immediately compute it. In fact, given raw numbers, I can't compute it without going and finding the formula, and probably using a spreadsheet.
That's what we're talking about, as far as complexity goes.
OPS, though - it's just two values added together.
That said, there is value to it - clearly, a .900 OPS is a lot better than a .700 OPS - so it's good for crude comparisons. It's not at all clear whether a .710 OPS is better than a .705 OPS, though, without looking at the underlying stats.
Do you have an answer to the question that I posed there? Namely, what is the correlation between TAv and runs scored on the team level?