OPS, I Did it Again: Playing with Old Tools

March 26, 2010

So, how good is good enough, exactly? A recent blog post on ESPN looked at how OPS fares at explaining team runs. It’s a rather oft-repeated argument, to be sure—we can simply let it stand in for any number of articles in this vein:

Sabermetricians feel OPS does not go deep enough. They prefer to use a more in-depth breakdown, such as replacement-player analysis. Clearly, there is no right or wrong philosophy. Simplicity is always good, but so is validity. Something easy to understand works, but at what cost to the statistical significance?

The issue of simplicity vs. statistical significance is not a problem specific to baseball. The answer most have come up with is known as Occam's razor, which states "entities must not be multiplied beyond necessity."

In this case, if there is something uncomplicated that does just as good of a job explaining offense as a more complex one does, we should use the simpler stat.

He shows that OPS does a pretty good job of explaining team runs scored, with a correlation of 0.947. So, does that mean that everyone should forget about the work done on estimating runs since Palmer and Thorn and go back to OPS?

Certainly the sort of testing they’ve done is fairly common. But does that make it valid? And does that validity extend itself to individual hitters, which is, of course, what we’re really concerned with here? (After all, we already know team runs scored—we don’t need to look at OPS to evaluate teams as a proxy for run scoring.)

Checking the spread

The issue at hand here is that the spread of talent between teams is much, much smaller than the spread of talent between players. (When we speak of teams, what we typically mean are seasonal totals. You can aggregate at a more granular unit of run scoring, like the game or inning level.) Take, for instance, the standard deviations of various stats (per plate appearance) from ’93 to ’09:

	BB	H	2B	3B	HR
Players	0.038	0.041	0.016	0.005	0.018
Teams	0.010	0.009	0.004	0.001	0.005

Simply put, there is much greater differentiation between players than there is between teams. So saying that OPS predicts the variation in team run scoring at the seasonal level very well actually doesn’t tell us all that much about how good it is at predicting the variation in a player’s contribution to run scoring, because the spread in player run-scoring abilities is much, much greater than the spread at the team-season level.
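To put a number on that gap, the ratio of player spread to team spread can be computed straight from the table. The short Python sketch below does only that arithmetic; the dictionary names are labels chosen here for the two table rows.

```python
# Standard deviations per plate appearance, 1993-2009, copied from the table.
player_sd = {"BB": 0.038, "H": 0.041, "2B": 0.016, "3B": 0.005, "HR": 0.018}
team_sd = {"BB": 0.010, "H": 0.009, "2B": 0.004, "3B": 0.001, "HR": 0.005}

# How many times wider the player spread is than the team spread, per stat.
spread_ratio = {stat: player_sd[stat] / team_sd[stat] for stat in player_sd}

for stat, ratio in spread_ratio.items():
    print(f"{stat}: players vary about {ratio:.1f}x as much as teams")
```

For every stat in the table, the player spread comes out at roughly 3.6 to 5 times the team spread.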

Is OPS really simple?

Put that nugget aside for a little bit—we’ll return to it in a moment. Let’s swing around for a moment to the contention that OPS is simpler than other, modern run-estimation methods.

The formula for OPS is, of course, OBP plus SLG. Written out in long form, that becomes:

$$\mathrm{OPS} = \frac{H + BB + HBP}{AB + BB + HBP + SF} + \frac{TB}{AB}$$

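This involves adding two fractions with different denominators, the bane of math teachers everywhere. We can combine the two, although it looks rather messy:

$$\mathrm{OPS} = \frac{AB \cdot (H + BB + HBP) + TB \cdot (AB + BB + HBP + SF)}{AB \cdot (AB + BB + HBP + SF)}$$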

OPS is simple so long as you hold to OBP+SLG—once you try to dig deeper into it, the way it’s constructed actually makes it rather difficult to figure out how it’s working under the hood. But, with a little bit of work, we can plug values into OPS and derive the value of an extra walk, single, etc., for a typical batting line in 2009. I’ve "normalized" those values so that a single is equal to one. Presented alongside are similar values based upon the observed change in run expectancy—in other words, the average number of team runs added by that event.

	OPS	RE
BB	0.51	0.65
1B	1.00	1.00
2B	1.85	1.62
3B	2.70	2.18
HR	3.55	3.02

The OPS values are close to our run expectancy values, to be sure. But there’s a rather severe bias toward extra-base hits and away from the walk.
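One way to read "plug values into OPS" is to add a single extra event to a batting line, recompute OPS, and scale each change so that a single equals one. The Python sketch below does that under loud assumptions: the batting line is a made-up, roughly league-average 600 plate appearances (not the actual 2009 line), and the one-event bump is an interpretation of the method rather than a confirmed reconstruction of it.

```python
# Hypothetical, roughly league-average batting line over 600 PA.
# These are NOT the 2009 figures used for the table above.
BASE_LINE = {"AB": 534, "H": 140, "TB": 223, "BB": 54, "HBP": 6, "SF": 6}

# What one extra event of each type adds to the counting stats.
EVENTS = {
    "BB": {"BB": 1},
    "1B": {"AB": 1, "H": 1, "TB": 1},
    "2B": {"AB": 1, "H": 1, "TB": 2},
    "3B": {"AB": 1, "H": 1, "TB": 3},
    "HR": {"AB": 1, "H": 1, "TB": 4},
}


def ops(line):
    """OBP + SLG from raw counting stats."""
    obp = (line["H"] + line["BB"] + line["HBP"]) / (
        line["AB"] + line["BB"] + line["HBP"] + line["SF"]
    )
    slg = line["TB"] / line["AB"]
    return obp + slg


def relative_ops_weights(base):
    """Change in OPS from one extra event, scaled so a single equals 1."""
    base_ops = ops(base)
    deltas = {}
    for name, change in EVENTS.items():
        bumped = dict(base)
        for stat, amount in change.items():
            bumped[stat] += amount
        deltas[name] = ops(bumped) - base_ops
    single = deltas["1B"]
    return {name: delta / single for name, delta in deltas.items()}


weights = relative_ops_weights(BASE_LINE)
for name, weight in weights.items():
    print(f"{name}: {weight:.2f}")
```

With this made-up line the walk lands near half a single and the home run near three and a half singles, in the same neighborhood as the OPS column above. The exact figures depend on the batting line chosen.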

Looking at Individuals

The reason OPS "works" at the team level in spite of this flaw is the fact that there is very little differentiation between teams when it comes to things like walk and home run rates. So in the narrow band of difference between teams, OPS "works" because the low value for a walk is offset by the high value for extra-base hits.

But no modern baseball team is ever going to walk as infrequently as Jeff Francoeur, for instance—OPS is likely to overrate him because it overvalues his power and doesn’t capture the effects of his low walk rate. On the other side, you have batters like Rafael Furcal, with good walk rates but low power numbers, that OPS will likely underrate. (This is one reason Furcal can put up a better EqA than Francoeur in ’09, despite Francoeur having the higher OPS.)

Let’s take it a step further, and look at how a player’s walk rate changes the relative values provided by OPS. Remember, walks are included in the denominator of OBP but not SLG—a change in walk rates changes the relationship of OBP to SLG.

We’ll look at how a player with otherwise average stats, but a walk rate of half or one-and-a-half times the league average, changes the relationship between inputs in OPS. Such a walk rate is relatively common among MLB players (just a little over one standard deviation away from average) but almost nonexistent at the team level, where it sits well over four SDs away from the team average.

	0.5x BB rate	Average	1.5x BB rate
BB	0.52	0.51	0.49
1B	1.00	1.00	1.00
2B	1.82	1.85	1.88
3B	2.65	2.70	2.76
HR	3.47	3.55	3.64

So as the walk rate goes up, two things happen—the value of the walk drops (slightly), and the value of extra-base hits rises. The reverse happens as the walk rate drops. This isn’t by design—if you were sitting down trying to figure out the relative value of events, you wouldn’t do it this way. It’s an accident, caused by Henry Chadwick’s bizarre anti-walk feelings back in the 1800s.
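The direction of that drift can be checked with the marginal form of OPS: one more time on base moves OBP by about (1 - OBP) / (AB + BB + HBP + SF), and a hit also moves SLG by about (bases - SLG) / AB. The Python sketch below applies those two approximations. It assumes a hypothetical, roughly average batting line and scales only the walk count while leaving at-bats and hits fixed, which may not match how the table above was built.

```python
def relative_ops_weights(ab, h, tb, bb, hbp, sf):
    """Marginal OPS value of each event, scaled so a single equals 1."""
    obp_denominator = ab + bb + hbp + sf
    obp = (h + bb + hbp) / obp_denominator
    slg = tb / ab
    # Reaching base once more nudges OBP by about (1 - OBP) / denominator.
    on_base_gain = (1 - obp) / obp_denominator
    raw = {"BB": on_base_gain}  # a walk never touches SLG
    for event, bases in (("1B", 1), ("2B", 2), ("3B", 3), ("HR", 4)):
        # A hit also nudges SLG by about (bases - SLG) / AB.
        raw[event] = on_base_gain + (bases - slg) / ab
    return {event: value / raw["1B"] for event, value in raw.items()}


AVERAGE_WALKS = 54  # hypothetical walk total for an average 600-PA line

weights_by_walk_rate = {
    scale: relative_ops_weights(
        ab=534, h=140, tb=223, bb=AVERAGE_WALKS * scale, hbp=6, sf=6
    )
    for scale in (0.5, 1.0, 1.5)
}

for scale, weights in weights_by_walk_rate.items():
    print(scale, {event: round(value, 2) for event, value in weights.items()})
```

The walk's relative value falls and the extra-base values rise as the walk count goes up, purely because walks sit in the OBP denominator and not in the SLG denominator.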

And this is the sort of thing that doesn’t show up when you simply test against team runs, because teams rarely exist at extremes that are actually quite common among individual players. Consider it a case of what’s known as the ecological fallacy: assuming that things true of a group are also true of its individual members. For a properly designed run estimator, this isn’t a problem.

But when a run estimator is based upon assumptions that are true at the team level but not at the player level, you run into these sorts of problems. (It may be unfair to say that OPS is designed this way, but it certainly works that way; run estimators derived from a regression on team runs scored are even more guilty of this.)

Does this mean that OPS is a bad run estimator? It depends on what you compare it to. It’s an improvement over, say, batting average. Of course, at this point, that’s setting a rather low bar. If you care about comparing specific ballplayers with differing skill sets, using OPS can lead you to the wrong conclusion.

Some technical notes

One standard deviation, for those of you wondering, means that 68 percent of the sample being measured falls within that distance of the average. (This is assuming a normal distribution—most baseball stats are at least a good approximation of this.) In this case, I’ve weighted the players and team-seasons by plate appearances. So if one wants to be pedantic, one should say, for instance, "68 percent of plate appearances went to players with a walk rate within 0.038 of the league average."
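A plate-appearance-weighted standard deviation can be sketched in a few lines of Python. The function name and the sample players below are hypothetical, and this is the plain population form of the weighted formula, which may differ in detail from the calculation behind the table.

```python
import math


def weighted_std(values, weights):
    """Standard deviation of `values`, with each value counted by its weight."""
    total_weight = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total_weight
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total_weight
    return math.sqrt(variance)


# Hypothetical players: (walks per plate appearance, plate appearances).
players = [(0.05, 650), (0.09, 600), (0.13, 550), (0.07, 200), (0.10, 120)]
walk_rates = [rate for rate, _ in players]
plate_appearances = [pa for _, pa in players]

print(f"PA-weighted SD of walk rate: {weighted_std(walk_rates, plate_appearances):.3f}")
```

Weighting this way means a part-time player with 120 plate appearances moves the result far less than a regular with 650.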

As for run expectancy, we can measure how many runs, on average, are scored between each plate appearance and the end of the inning. Typically, we then group plate appearances by the number of outs at the start of the plate appearance and the position of the runners on base. (This is usually referred to as the base-out state.) The average number of runs that score from a given base-out state is called the run expectancy.

When we talk about the change in run expectancy for an event, we look at the change in the base-out state: we take the run expectancy of the final state, subtract the run expectancy of the starting state, and add any runs that score on the play.
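As a rough illustration of that bookkeeping, here is a minimal Python sketch. The run-expectancy figures in the table are invented for the example, not measured values, and the `run_value` helper and the `(outs, runners)` keys are hypothetical conventions chosen here.

```python
# Hypothetical run-expectancy table: (outs, runners) -> average runs scored
# from that state to the end of the inning. Values are illustrative only.
RUN_EXPECTANCY = {
    (0, "---"): 0.50,
    (0, "1--"): 0.90,
    (1, "---"): 0.27,
    (1, "1--"): 0.53,
    (2, "---"): 0.10,
    (2, "1--"): 0.23,
}


def run_value(start, end, runs_scored=0):
    """Change in run expectancy for one play.

    `end` is None when the play makes the third out, since no more runs
    can score in the inning from that point.
    """
    end_re = 0.0 if end is None else RUN_EXPECTANCY[end]
    return end_re - RUN_EXPECTANCY[start] + runs_scored


leadoff_walk = run_value((0, "---"), (0, "1--"))
solo_homer = run_value((0, "---"), (0, "---"), runs_scored=1)
inning_ending_out = run_value((2, "1--"), None)

print(f"walk {leadoff_walk:+.2f}, homer {solo_homer:+.2f}, out {inning_ending_out:+.2f}")
```

Averaging these per-play changes over every walk, single, and so on in a season is what produces event values like those in the RE column earlier.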

—

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.

Colin Wyers

AutomatedTeller

3/26

Can you defend the statement that "This is assuming a normal distribution—most baseball stats are at least a good approximation of this"? I believe that, in fact, most baseball stats approximate one tail of a normal distribution curve, not a normal curve at all. For nearly all stats, there are more players below the average than above it.

Look at wins - there are a lot more pitchers with 0 wins than 1, more with 1 than 2, more with 2 than 3. At some point, you start getting some fluctuation - in any given year, you may have more pitchers with 13 wins than 12, for instance.

cwyers

3/26

Sure.

You're right that there are more players below average than above it. But it turns out that a player's ability is correlated with his playing time - better players get more playing time.

So while there are more below-average players, the amount of PAs given to below-average players is about equal to the playing time given to above-average players. So when you weight for playing time, you get an approximation of a normal distribution.

(And obviously the more data you have, the more "normal" it looks - the law of large numbers at work.)

But if you ignore playing time considerations, yeah, the distribution of talent in MLB actually resembles a bimodal distribution - you have one peak around the league average, and one (larger) peak around replacement level.

rawagman

3/26

How would a stat wherein slugging used PAs as a denominator help us here? I've been thinking about this a lot lately - Why not use a stat that looks at a given player's propensity to reach whatever base, by whatever means, in a given plate appearance? Would this give us a more accurate representation of his value than OPS? If not, why?

cwyers

3/26

You mean like EqA?

rawagman

3/26

Probably - but not scaled.

cwyers

3/26

Can you clarify what you mean here?

rawagman

3/26

In a nutshell, (TB+BB+HBP)/PA.

cwyers

3/26

Well in that case, the weighting looks like:

BB: 1
1B: 1
2B: 2
3B: 3
HR: 4

In that event you're overvaluing extra-base events and the walk, and rather substantially undervaluing the single. Realistically, you've taken almost all the problems of OPS and made them worse, with little to offset them.

TGisriel

3/26

Don't you mean TAv?

cwyers

3/26

Yes. I'm going to have to write that on a blackboard 100 times this afternoon.

redsoxtalk

3/26

Kudos, this is a great read for those unfamiliar with the limitations of OPS. Thanks, Colin!

SFiercex4

3/26

Classic Colin piece, thanks for the information as always.

One question. I might have misunderstood your paragraph explaining this, but are both the OPS weights and RE values for events normalized so that singles are worth 1 (i.e. are they both relative to the value of a single)?

mickeyg13

3/26

Yes, he did normalize things that way:
"I’ve "normalized" those values so that a single is equal to one."

Great work Colin. I often encounter people who criticize my use of "made-up" stats while defending their love of OPS. I love how you explain how it's not really even that simple when you look at it.

colintj

3/27

sweet

Tarakas

3/27

I think to some extent you are (probably intentionally) missing the point about the complexity of OPS. People know what On Base Percentage and Slugging Average mean, and both SLUG and OBP numbers are widely available without additional computation. If on the scoreboard I see a player has a .350 OBP and a .455 slugging, I can easily compute his OPS in my head. Given raw numbers, I can compute it without having to go find a formula or a spreadsheet.

True Average, Runs Created, etc., are not the result of two widely understood and reported inputs. I can't look at the scoreboard and immediately compute it. In fact, given raw numbers, I can't compute it without going and finding the formula, and probably using a spreadsheet.

That's what we're talking about, as far as complexity goes.

AutomatedTeller

3/27

The big problem I have with OPS is that it has no natural meaning. What does 1 point of OPS mean? OBP is simple, conceptually - the % of time a player gets on base. SLG isn't all that simple, but it's definable.

OPS, though - it's just two values added together.

That said, there is value to it - clearly, a .900 OPS is a lot better than a .700 OPS - so it's good for crude comparisons. It's not at all clear whether a .710 OPS is better than a .705 OPS, though, without looking at the underlying stats.

yadelman

3/28

I made a very similar comment to the article in ESPN (see the post from yadelman). Guys like Nick Johnson are really undervalued by OPS. Of course your response is much more rigorous and thorough.
Do you have an answer to the question that I posed there? Namely, what is the correlation between TAv and runs scored at the team level?
