April 29, 2013
Starling Marte, Bryce Harper, and the Limits of WARP
Earlier today, CBS baseball writer Jon Heyman asked a rather innocuous question on Twitter.
It's hardly a clown question, bro.
Heyman is citing the Baseball-Reference version of WAR, and as of this writing, the two men are rated as equal in their performance so far in 2013. Our own WARP has Harper at 1.9, with Marte at 0.8. That's not a knock on Marte. Harper has just been that good so far this year. The two systems use different inputs, and in this case, they spit out different results.
The answer to Heyman's question is that while Harper is clearly the superior offensive player (.361/.444/.756 for Harper; .331/.393/.444 for Marte as of Monday afternoon), Marte rates rather highly on the defensive metric used by Baseball Reference (he is tied with Norichika Aoki as the best defensive player in MLB), while Harper rates as merely average. On FRAA (Fielding Runs Above Average), which we use at BP, Marte doesn't do as well, and as such, doesn't rate as highly as Harper overall.
One advantage of the WARP framework is that it incorporates defense into the overall understanding of a player's value. Defense matters. But how good are we at measuring defense?
There are two parts to that answer. One is whether our defensive metrics have construct validity. Are they measuring what we think they are? According to Baseball Reference (not to pick on them; the same question could and should be asked of all metrics), Marte has saved 8 runs above average in the field. Has he really? That's construct validity. The truth is that fielding data are imprecise, often low resolution, subject to a lot of biases, and in some cases, not publicly available for inspection. That's not to say that all fielding metrics are garbage, but they aren't gospel either.
The other issue is whether our defensive metrics have good reliability. Even if we assume that we really are measuring fielding accurately, the next question is whether one month of a season is enough of a sample to predict what's going to happen over the rest of the season. I'd argue that the answer is no, particularly for outfielders. Starling Marte very likely has gotten lucky over a small sample size. That's what happens with metrics that have low reliability. He may very well be a good outfielder, but at this point we just can't trust that his early season performance accurately reflects his overall talent level.
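For readers who want to see the small-sample problem in action, here is a rough simulation sketch in Python. To be clear, this is not how FRAA or any other published metric is calculated. Every number in it is an assumption picked for illustration: the function names, the league-average out rate (base_rate), the spread in true talent (talent_sd), and the number of chances that stand in for "one month" and "a full season." It invents a league of fielders whose true talent we know, lets each of them field a set number of chances, and checks how closely the observed results track the true talent.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_reliability(n_players, chances, base_rate=0.70,
                         talent_sd=0.02, seed=0):
    """Correlate simulated fielders' observed out rates with their true rates.

    base_rate and talent_sd are illustrative assumptions, not estimates
    taken from real fielding data.
    """
    rng = random.Random(seed)
    true_rates = []
    observed_rates = []
    for _ in range(n_players):
        # Each fielder gets a "true talent" out rate, clamped to [0, 1].
        true_rate = min(max(rng.gauss(base_rate, talent_sd), 0.0), 1.0)
        # Each chance is an independent coin flip weighted by that talent.
        outs = sum(rng.random() < true_rate for _ in range(chances))
        true_rates.append(true_rate)
        observed_rates.append(outs / chances)
    return pearson(true_rates, observed_rates)


# Hypothetical workloads: about 60 chances for a month, 400 for a season.
one_month = simulate_reliability(n_players=500, chances=60)
full_season = simulate_reliability(n_players=500, chances=400)
print(f"one month:   r = {one_month:.2f}")
print(f"full season: r = {full_season:.2f}")
```

Under these assumptions, the one-month correlation should come out well below the full-season one. That is the point: even when the metric measures exactly what we want it to, a month of chances is mostly noise.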
Heyman points out a real issue with WARP, and with any measure. It is subject to the quality of the data that back it up, the assumptions that underlie it, and the inherent randomness in the game of baseball. WARP is a fantastically useful tool, but it has flaws. Being imperfect isn't the same thing as being pointless. It just means that, despite the fact that it's one of the best ways of measuring player value that we have available, there's a lot more work to do.