The end of the season brings with it a lot of miserable things. It brings about the playoffs (and trust me, for a Cubs fan, that’s about as miserable as it gets), and soon thereafter the end of baseball altogether for the season (well, this may be more miserable). It also brings with it awards voting season.

Zounds! Awards voting season. Rarely is so much passion devoted to so little meaning—if by rarely you mean “every year, like clockwork.” It’s important to remember that, except nobody ever does. It’s as if every year September comes around and everyone is a tabula rasa—every argument starts over as if we’ve never been through this before.

Which would be fine, if these were interesting or stimulating debates, but most of them aren’t. Perhaps the most interesting thing you can learn from these discussions is the staggering number of people who can simultaneously believe “too much attention is paid to statistics in baseball” and “so-and-so is obviously the winner because of his impressive [HR | RBI | Win] totals.”

The awards themselves are of little help—most of them consist of only the vaguest of qualifications, so every voter is given wide latitude to define what the awards mean. And most of them are happy to explore every inch of that latitude.

And the focus of all this tempest in a teapot? Finding the best player, or the best pitcher, or the best rookie… and so on and so forth. But frankly, the gap between the best and the next best isn’t much of a gap at all. And it turns out that the smaller the gap, the more (not less) contentious things become. Large differences are easy to measure; small differences are hard.

This is something I’ve wanted to discuss ever since Murray Chass wrote about his ex-employer printing Wins Above Replacement in connection with the MVP and Cy Young voting. Very little has changed since the last time Chass delivered such a broadside, except that now Nate Silver has a job at the New York Times and Murray Chass runs a blog.

So in a nod to that look back at Chass’ unchanging views on baseball stats, let’s look at the top 10 position players in the NL in Value Over Replacement Player:

#    Name              Team  POS  PA   VORP
1    Albert Pujols     SLN   1b   621  68.5
2    Joey Votto        CIN   1b   585  66.2
3    Carlos Gonzalez   COL   lf   562  64.3
4    Hanley Ramirez    FLO   ss   601  55.2
5    Adrian Gonzalez   SDN   1b   611  52.6
6    Troy Tulowitzki   COL   ss   448  52.4
7    Matt Holliday     SLN   lf   589  50.9
8    Ryan Zimmerman    WAS   3b   564  47.5
9    Aubrey Huff       SFN   1b   594  45.8
10   Prince Fielder    MIL   1b   630  44.9

I think if I presented you with that list of players as the “top 10 in the National League,” you’d be inclined to say, yeah, that’s reasonable. (Of course, I can come up with similarly reasonable listings by pulling up the top 10 in WARP, or the top 10 in WAR, or the top 10 in the other WAR, depending on whether you’re visiting FanGraphs or Baseball Reference.) But in terms of awards voting, the listing of the top 10 is less important than where those players rank within it. To illustrate, here are those same players’ rankings in each of those measures:


Player            VORP  WARP  fWAR  rWAR
Albert Pujols     1     1     3     2
Joey Votto        2     3     2     5
Carlos Gonzalez   3     10    9     8
Hanley Ramirez    4     21    18    18
Adrian Gonzalez   5     2     7     1
Troy Tulowitzki   6     4     6     3
Matt Holliday     7     6     4     11
Ryan Zimmerman    8     13    1     6
Aubrey Huff       9     5     8     4
Prince Fielder    10    15    22    15

I expressly didn’t compute an average of each player’s ranking in each measure, because I think that would distract rather than enlighten. The point is that the rankings aren’t very helpful in the first place; they’re such fragile things. Add a couple of runs to Carlos Gonzalez’s VORP and he passes Votto; add five and he leaps from third to first.
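
To make that fragility concrete, here is a minimal sketch in Python, using the VORP totals from the table above; the run bumps are purely illustrative, not real adjustments:

```python
# A toy illustration of how fragile ordinal rankings are: nudge one
# player's VORP by a few runs and watch the order change.
# (VORP totals come from the table above; the bumps are hypothetical.)

vorp = {
    "Albert Pujols": 68.5,
    "Joey Votto": 66.2,
    "Carlos Gonzalez": 64.3,
    "Hanley Ramirez": 55.2,
}

def rank_of(player, values):
    """1-based rank of `player` when `values` are sorted from highest to lowest."""
    ordered = sorted(values, key=values.get, reverse=True)
    return ordered.index(player) + 1

for bump in (0, 2, 3, 5):
    adjusted = dict(vorp)
    adjusted["Carlos Gonzalez"] += bump
    print(f"+{bump} runs -> Carlos Gonzalez ranks #{rank_of('Carlos Gonzalez', adjusted)}")

# Output: +0 runs -> #3, +2 runs -> #2, +3 runs -> #2, +5 runs -> #1
```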

That’s because the differences between these players are smaller than we can reliably measure. And beyond the limits of measurement, these metrics are, at bottom, collections of assumptions about how baseball players create value. And often two very similar metrics will rest on different assumptions: about positions, about parks, about how to measure defense.

So in the sabermetric approach, there is still room for dissent. We aren’t heading for Chass’ imagined future, where a computer does all the reckoning and humans are mere observers. The computer is a tabulation machine, nothing more or less—it does the rote calculation as instructed by a thinking human being.

I think what the sabermetric approach does best is that it encourages you to think through the process of evaluating players independently of the results. That strikes me as distinct from the way a lot of awards voters do it, which is to look at the players first and then work backward to a definition of value. Much has been made of Gonzalez’s home-road splits, for instance. And we know that a player’s home park can affect scoring, and that Coors Field is certainly one of those parks. But once we adjust for that (and again, there are different assumptions you can make in doing so that can affect the outcome), does the magnitude of the home-road split still matter? Once you work through these rather subtle questions, the ordinal ranking of value falls into place rather easily.
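
For a sense of how much those adjustment assumptions can matter, here is a deliberately oversimplified sketch in Python; the park factor and run total are hypothetical round numbers, not actual Coors Field or Gonzalez figures:

```python
# A deliberately simplified park adjustment, showing how the choice of
# assumptions changes the answer. The park factor and run total below
# are round hypothetical numbers, not real data.

raw_runs_created = 120.0   # hypothetical season total for a Rockies hitter
park_factor = 1.15         # hypothetical: the park inflates scoring by 15%

# Assumption A: the hitter plays half his games at home, so only half of
# his production gets the full park inflation.
adj_half = raw_runs_created / ((park_factor + 1.0) / 2.0)

# Assumption B: apply the full park factor to everything.
adj_full = raw_runs_created / park_factor

print(f"Half-weighted adjustment: {adj_half:.1f} runs")  # ~111.6
print(f"Full adjustment:          {adj_full:.1f} runs")  # ~104.3
```

A gap of roughly seven runs between the two answers, from one modeling choice alone, is the kind of thing that quietly reshuffles an awards ballot.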

But the downside is that, in collating all these assumptions into a single number, we invite people skipping all the details and going straight to the conclusion, sometimes not caring about the operating assumptions (or understanding that they exist at all—I’m still rather astonished by people who will quote “WAR” without bothering to note what site the numbers came from). The assumptions matter—in the aggregate sometimes not a lot, but for an individual player they can make all the difference in the world. And by uncritically quoting any of these metrics without examining the assumptions you’re letting someone else do your thinking for you.

I think the worst thing that could happen is for people to start treating any “above replacement” measure the way people 50 years ago treated pitcher wins or runs batted in. That would indicate that sabermetrics won some battles but lost, so to speak, the war. Because it’s not about numbers, it’s about a way of thinking about baseball (and the world)—one that admits that there are always new things to learn and new discoveries to make.

So I implore you—get your nose out of a game and watch a spreadsheet once in a while. I don’t mean to look at the results—I mean examine the process. Don’t think about what the numbers say, think about why the numbers are what they are and what assumptions you have to make to get there. Conclusions are boring and sometimes more final than they ought to be. It’s in figuring out how we come to those conclusions that we end up learning something meaningful about baseball.

rawagman
9/14
Thanks for this - once again, brilliant.
TangoTiger1
9/14
I agree wholeheartedly with Colin. When I was developing the framework for WAR, it was all about breaking it down by components, so that we can see how it works, and, if one so chooses, replace the calculations of one or more components with other sets of calculations. WAR is a framework that is easy to follow and accept.

As an example, look at the way Fangraphs lays it out for Ryan Zimmerman.

We see that he's +31 runs above average in offense, +16 runs above position average in fielding, +19 runs for playing time, +2 runs for his position, for a total of +67 runs (rounding issues notwithstanding). The conversion to wins makes it +6.9 wins above replacement according to Fangraphs' implementation of the WAR framework (fWAR).
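
To make that arithmetic concrete, here's a minimal sketch of the same accounting in Python; the ten-runs-per-win conversion is an assumed round number (real implementations derive it from the season's run environment), and the components are the rounded figures quoted above:

```python
# A toy version of the accounting above: sum the run components, then
# convert runs to wins. The 10-runs-per-win constant is an assumption,
# not Fangraphs' exact factor.

components = {
    "batting": 31.0,      # runs above average on offense
    "fielding": 16.0,     # runs above positional average (this is where UZR vs. TZ would swap in)
    "replacement": 19.0,  # playing-time credit relative to replacement level
    "position": 2.0,      # positional adjustment
}

RUNS_PER_WIN = 10.0  # assumed round number

total_runs = sum(components.values())
print(f"{total_runs:.0f} runs -> {total_runs / RUNS_PER_WIN:.1f} WAR")
# With these rounded inputs: 68 runs -> 6.8 WAR (Fangraphs' published figure is 6.9)
```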

Now, suppose you don't like the fact that fWAR uses UZR. You are a Total Zone maven. Well, guess what, you simply move one number in, and move one number out. It doesn't invalidate the rest of the metric.

Suppose you think replacement level is set too high, or too low. Well, change that too. Suppose you think Linear Weights makes no sense, and prefer BaseRuns. Well, go ahead, knock yourself out. Suppose you think that 3B is easier to play than 2B. Change that too.

The important point is that you have a FRAMEWORK. Create that, adopt that, follow it. That's WAR. Now, once you have a framework, you need an implementation. You can be lazy and let Fangraphs (fWAR) and Baseball Reference (rWAR) figure that out for you. Or, gulp, you can do as Colin says here and think for yourself.

What you can't do is just throw your arms up and say the solution is too difficult AND THEN proceed to give us your opinion as to who is the most outstanding player! If it's too hard to find the solution, then your opinion becomes irrelevant. It's a bullsh!t opinion, because it's a summary opinion without evidence.

So, this is what sabermetrics is about, the journey, the thought process, the critical thinking. Do it, because we can never have enough people doing this.
Zebs335
9/14
I would LOVE if you made one of your spreadsheets available to look at. Absolutely freaking love it.
pferrington
9/14
Can we take this out from behind the wall? I would love to link to this on a basketball fan site where there was some heated discussion about this sort of stat. It isn't basketball related, but this is a good reminder of what stats are and aren't, and I think it would be a good choice for you all to make more widely available for linking.
lucasjthompson
9/14
"But the downside is that, in collating all these assumptions into a single number, we invite people skipping all the details and going straight to the conclusion, sometimes not caring about the operating assumptions..."

That's fine for some people. But look, I don't know how TVs work, and I don't especially want to know. But I do want to buy the best TV. It's useful if some magazine of TV nerds that I feel I can trust distills TV goodness into a single number. Even if I act on their advice, don't be too shocked if I can't tell you exactly how they do it.
joelefkowitz
9/14
But don't complain when the things that matter most to you are price, sleekness, and longevity, and the single number of "TV goodness" directed you instead to buy an expensive bulky TV that will burn out in a year because it has the best picture quality, latest features, and energy savings.
ScottBehson
9/14
Luke- right on.

I consider myself an informed non-expert in sabremetrics, and do not have the time or inclination to really become an expert, with all the work that would entail.

However, I do LOVE reading, evaluating the work of experts and knowing enough to have a not-completely-dumbass opinion on these matters. That's why I subscribe to BP rather than entering BP Idol.
beeker99
9/16
I see your point, but it's really an apples-to-oranges comparison you are making.

You don't know how a TV works, but all you need to do to know that it works is turn it on.

WAR or WARP are completely different. There is no "on/off" button that will tell you, one way or the other, whether they "work". They might be wrong, and we don't necessarily know that yet.

Not only that, you can't authoritatively measure their various characteristics, like you can with a TV. If a TV has 3 HDMI ports, there's no disputing that. There's no disputing the resolution. There's no disputing the refresh rate, the color clarity, the viewing angle. You measure certain characteristics on well-known, established scales. There are no such scales for the components of WAR/WARP. Is TZ or UZR or FRAA/FRAP better? How about the baserunning metrics? The offensive metrics? Which park factors are right? Etc. It's a far cry from being able to say, with certainty, "The resolution of this TV's screen is 1080p and it has 5 HDMI ports."

That's why, with WAR/WARP you HAVE TO know at least a little something about how they purport to work, or at least the underlying assumptions and context of their components.
studes
9/14
Exactly right, Colin. Well said.
craigburley
9/14
What many of us have been saying for years. This was terrific. I particularly liked "So in the sabermetric approach, there is still room for dissent."

I also particularly liked "But the downside is that, in collating all these assumptions into a single number, we invite people skipping all the details and going straight to the conclusion... by uncritically quoting any of these metrics without examining the assumptions you’re letting someone else do your thinking for you."

Let a thousand flowers bloom; let a thousand points of thought contend. But, you know, *really*.
beeker99
9/16
The second quote is perhaps my favorite part, too. It's something (in not quite the same context) Steven Goldman has been hammering home for years, and I think it's something that can't be said enough.

Bravo, Colin. Bravo.
Tarakas
9/14
One of the better pieces I have read this year in BP.

The longer I live with baseball statistics, the less attention I tend to pay to their exact values. I have little faith in long articles and complex formulas that conclude some player is .3 runs better than another. As you note, such small differences may well be beyond our ability to measure.

To me, statistics have reshaped my opinions of what constitutes value in baseball players (as a young Cardinals fan I was saddened to realize that Willie McGee's empty .292 average was not really very valuable). But as far as fine details go, they are not much use.
hjw099
9/14
Well-done article. I have noticed as well that OPS is quickly becoming the new RBI total for TV, and WAR the new RBI total for the internet. Sure, it takes some mathematical legwork to understand these measures, but one shouldn't quote the various sabremetric numerical conclusions any more than one should quote those of complex game-theory results without knowing why and how they exist.
TangoTiger1
9/15
Another thing that I love about WAR is that you can take all players, pitchers, non-pitchers, starters, relievers, etc., and list them on one scale, like say on the 1994 Expos. You might get a surprise here or there, but overall, it conforms to expectations. And if it does that, it's a lot easier to trust for teams you are not too familiar with.
jtrichey
9/15
Am I mistaken in thinking that WARP was a Baseball Prospectus stat first?
dcarroll
9/15
Excellent article, Colin.