

September 5, 2013

Reworking WARP: The Uncertainty of Offense, Part Two

Last week, we talked a bit about measuring the uncertainty in our estimates of offense. I hinted at having a few additional ideas on quantifying the uncertainty involved. Let's examine two different routes we could take, both of which would offer less uncertainty than what we quantified last week.

When we did our estimates of uncertainty last week, we compared the linear weights value of an event to the actual change in run expectancy, given the base-out states before and after the event. What we can do instead is prepare linear weights values by base-out state and find the standard error of those instead. Looking at official events:
The high value of the out is somewhat misleading—that includes things like reaching on error, which we separate out in our current linear weights implementation. But here, the source of error comes from the potential differences in baserunner advancement. It makes a certain kind of sense—an Adam Dunn double and a Juan Pierre double present different opportunities for a runner at first to advance, for instance. (A Juan Pierre double is probably closer to an Adam Dunn single, and an Adam Dunn double is probably a good chance at a triple for Juan Pierre.)

So now you've reduced your estimated error without changing your run estimates! Congratulations. The downside is that you're now measuring your error against something that I suspect most people have a hard time understanding. You're getting pretty far into the weeds of hypothetical runs, rather than measuring against a good proxy for actual runs, like what we did last week.

Another thing we can do is look at the change in run expectancy for each event. This isn't a particularly new idea (Gary R. Skoog came up with it in 1987, calling it the Value Added approach[i]), although it hasn't been especially popular because of the play-by-play data needed to compute it. Let's pull up the same run expectancy chart we used last week:
To get the Value Added for a plate appearance, you take the runs scored on the play, add the ending run expectancy, and subtract the starting run expectancy. So a bases-loaded home run with no outs would have an ending run expectancy of .489, plus runs scored of 4, minus 2.262: a Value Added of 2.227. A home run with the bases empty and two outs has the same run expectancy at the beginning and the end, so you end up with a Value Added of only 1.

This approach, needless to say, does a much better job of reconciling with actual runs scored than the linear weights approach. It also comes closer to measuring performance "in the clutch," although it ignores inning and run differential (we'll talk about that at a later date). So why not use it instead of linear weights? The data needed to power it is readily available now, at least for the modern era, as is the computing power required to accomplish it.

The issue—if you'll remember back to our goals laid out in the first week—is that we want to avoid overcrediting a player for the accomplishments of his teammates. A player is not directly responsible for the base-out states he comes to bat in; that's the product of the hitters ahead of him in the lineup. But if you look, there's a substantial relationship between the average absolute Value Added of the situations a player comes to bat in, and the difference between his linear weights runs and his Value Added runs (per PA):
In other words, a player's Value Added is driven in part by the quality of opportunities he has, not simply what he does with them. The ability for a player to impact plays is greater in some situations than others, and players who get to bat in those situations more than the typical player will have more of a chance to accrue Value Added.

But we can adjust for this. Using a variation on Leverage Index called base-out Leverage Index, we can adjust the Value Added run values for the mix of base-out situations a player comes up in. We take the average absolute change in run expectancy in each base-out state and compare that to the average absolute change in run expectancy in all situations, so that 1 is an average situation and higher values mean more possible change in run expectancy. Then we divide the Value Added values by the leverage to produce an adjusted set of values that reflects a player's value in the clutch without penalizing or rewarding a player based on the quality of his opportunities.

There is still a substantial relationship between linear weights runs and adjusted Value Added—most players won't see a drastic change. But some will. Take Robinson Cano, for instance. In 2009, he had a pretty good offensive season by context-neutral stats, batting .320/.352/.520 in 674 plate appearances. That's good for 25.5 runs above average. But Cano had some pretty pronounced splits that season, hitting .376/.407/.609 with the bases empty but .255/.288/.415 with men on. Cano's "clutch" performance was so bad, he ended up being worth -1.6 adjusted Value Added runs, worse than the average hitter despite hitting well above average for the season on the whole.

It comes down to what we want to measure—do we want the context-neutral runs, which say Cano was a superb hitter in 2009? Or do we want the clutch-based Value Added runs, which say he was below average? Or should we present both, and let readers decide which they prefer? Here's your chance to weigh in.
We’re not taking a poll—it’s not an election, per se. But we’ll listen to arguments on both sides, and I promise you that this isn’t a trick question.
Colin Wyers is an author of Baseball Prospectus. Follow @cwyers
22 comments have been left for this article.

BurrRutledge (Sep 05, 2013 05:49 AM): Thanks for the opportunity to peek behind the curtain, Colin. I have more questions than opinions at this point.

Peter Benedict (Sep 05, 2013 06:50 AM): Regarding linear weights and adjusted Value Added: which has been a better predictor? I realize that's not the point of WARP, but wouldn't the answer to that question point to which measures a stable attribute more effectively?

Reply (Sep 05, 2013 06:53 AM): Linear weights, almost assuredly. As you note, though, that's not the point of WARP.

Mooser (Sep 05, 2013 08:00 AM): How many years of Value Added runs (or PAs) do you need in order for Value Added runs to become just as good an indicator as linear weights? If it's only about 3 years (rather than, say, 10), I would like to see a 1-year WARP with linear weights and a 3-year WARP for Value Added runs. Anything more than a 3-year WARP is likely useless in terms of determining true talent, as then you get into aging factors, etc.

cmaczkow (Sep 05, 2013 08:39 AM): I think this is a fascinating question, because it really boils down to "What do you want the stat to actually indicate?"

Reply (Sep 05, 2013 09:05 AM): Well, as I am fond of saying (and if you play the Colin Wyers Drinking Game, this is a Babylon 5 quote, so take a shot), understanding is a three-edged sword: your side, their side, and the truth.

TangoTiger (Sep 05, 2013 10:16 AM): I love your coin analogy.

ncsaint (Sep 05, 2013 08:44 AM): Why choose between them? It seems to me that the best reason to keep leverage out of WARP is that we already have RE24/boLI.

Mooser (Sep 05, 2013 08:51 AM): Because WARP includes defence and baserunning. That being said, if you're going to include non-context-neutral batting runs (RE24/boLI), then don't the defence and baserunning components need to include context as well? I suppose with play-by-play data we have the ability to do that, but I think it needs to be consistent.

newsense (Sep 05, 2013 12:20 PM): BRAA vs. RE24/boLI is a good (if irresolvable) discussion, but the context here is the error bars Colin is attaching to BRAA/WARP.

newsense (Sep 05, 2013 12:21 PM): Once again, as I commented on the previous article, Colin is calling "standard errors" what are really "standard deviations."

therealn0d (Sep 06, 2013 11:09 AM): I'm not certain you are correct here. Standard error and standard deviation aren't very different, unless you mean some other use of the term standard error.

newsense (Sep 06, 2013 12:37 PM): Standard deviation is a description of a distribution. If the SD of a double is .456, we can estimate that 68% of doubles produced a change in run expectancy of .723 +/- .456.

Matt (Sep 05, 2013 13:25 PM): I would appreciate your presenting any number of different statistics, if they essentially answer different questions.

eliyahu (Sep 07, 2013 21:46 PM): Late to the conversation here, but I have a separate question: if the Value-Add approach does not stabilize over time to align with the context-neutral approach, am I to understand that clutch hitting is an actual skill that is predictive? Over a career, I would expect Cano to balance his poor performance in high-leverage situations with outperformance. If, over an entire career, this does not balance out, do we really want to ignore it?

Brady Childs (Sep 17, 2013 12:39 PM): Could the extra value added be because of the pitcher's strategy with men on base? I could imagine that a home run would be more likely with the bases loaded than with the bases empty, because the pitcher is being forced to throw strikes. I don't know if this is true or not, and I don't think BR has MLB stats.

As usual, the answer is... it depends.
When I am playing fantasy, I really do not care how good the player is in theory. If he is assigned high-leverage situations, I don't want that sorted out. After all, I seek out players who are placed in high-leverage situations.
If I am asking who the best ballplayers are, who makes the most of the opportunities, then I want the deleveraged values. Comparing second basemen who bat second on one team and sixth on another really requires deleveraging.