
The following is an edited transcript of an in-house discussion among the Baseball Prospectus team about win probability added (WPA).

Colin Wyers: Sort of inspired by Will’s comments on Twitter (although I’ve actually been doing a lot of thinking about these things recently), I thought I’d share my favorite example of WPA producing questionable results.

The story so far: Mariners down a run, bottom of the ninth, two outs. Mike Sweeney hits a double to center. The WPA (at least this implementation—you can probably get slightly different answers depending on how you set up the win expectancy tables) is .092. Then Ichiro homers to win the game; WPA records .867.

I am willing, I suppose, to agree that the change in win probability at this point of the game is what WPA says it is. Where I sort of part ways is in giving the whole .867 to Ichiro. That says that Ichiro was nine times as important to winning the game as Sweeney. I, uh, disagree. But the way that WPA is being parceled out here, there’s nobody else to give the WPA to besides Ichiro—all WPA accrues to the batter on offense and the pitcher on defense.

But in a very real sense, it doesn’t matter what Ichiro does if Sweeney doesn’t double. And I mean very real: if Sweeney makes an out in that spot, the game ends before Ichiro can bat. Sweeney receives credit based upon an average performance from Ichiro, but Ichiro didn’t perform average.

That’s the problem I have with metrics that aren’t context-neutral: the context only runs one way. Sweeney provides "context" for what Ichiro does, but Ichiro’s actions don’t provide "context" for Sweeney. Sweeney’s value stays the same regardless of what Ichiro does.
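The arithmetic behind those two credits can be sketched in a few lines. This is only an illustration, assuming the win expectancy values quoted in this discussion (4.1 percent before Sweeney bats, 13.3 percent after his double); the function name `wpa` and the variable names are made up for the example, and a different win expectancy table would give slightly different numbers.

```python
# Win expectancy (WE) for the Mariners at each state, as quoted in this
# discussion. These are table-dependent values, not universal constants.
we_before_sweeney = 0.041  # down one, bottom of the ninth, two outs, bases empty
we_after_double = 0.133    # runner on second, two outs
we_after_homer = 1.0       # game over, Mariners win


def wpa(we_before, we_after):
    """WPA for one play: the change in WE, credited entirely to the batter."""
    return we_after - we_before


sweeney_wpa = wpa(we_before_sweeney, we_after_double)
ichiro_wpa = wpa(we_after_double, we_after_homer)

print(f"Sweeney: {sweeney_wpa:.3f}")               # Sweeney: 0.092
print(f"Ichiro:  {ichiro_wpa:.3f}")                # Ichiro:  0.867
print(f"Ratio:   {ichiro_wpa / sweeney_wpa:.1f}")  # Ratio:   9.4
```

Nothing in the function looks at what happens after the play, which is why Sweeney's .092 is fixed the moment he reaches second.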

Will Carroll: The issue I have with this specific example is that it's not… reversible. If Sweeney doesn't get on, Ichiro's value would go down to zero. If Ichiro walks, his value is unlikely to go down nearly as much. I just don't get how that works out.

CW: Well, on one hand, we have a "fact": the odds of the Mariners winning did in fact improve by 86.7 percentage points on that play. That part of WPA is unassailable.

Then there's a decision to accredit all WPA for a play to the batter on offense and the pitcher on defense. Honestly, I'm about as uncomfortable with that decision as you are.

WC: Let's go a bit further—let's pretend Sweeney has speed and is distracting the pitcher. The team wants to keep him out of scoring position and grooves a fastball to Ichiro on an attempted steal. Ichiro yards it, win. He still gets all the credit? No, no… that's reductionism, not accuracy.

The more I look at this, the more I realize WPA is graph-crack. It makes a nice, easy chart that appears right at first glance, but it implies a level of accuracy and granularity that it simply can't have. It's good and likely "good enough" for many applications, like a "game heartbeat," but at heart, it is the reverse of the people that want to argue intangibles. By agreeing that we can't factor in everything, it throws up its hands, does something pretty, then goes back for another hit off the stat-bong.

Ben Murphy: Mmmmm… stat bong.

Rob McQuown: Sorry, but I would debate this "unassailable" point. WPA is based on "everything being equal," in the same way that every game begins at 50/50. But, obviously, everything is not equal. What is unassailable is that historically, teams in the situation in which Ichiro came to bat went on to win .867 less often than teams in the post-event state. The dependent variable here, which is completely ignored in this calculus, is the fact that Ichiro's presence alters the percentages (as does the speed of the runner, as Will alluded to). A comparable situation would be using run expectancies, and suggesting that when Carlos Ruiz (batting eighth) gets on base to lead off an inning for the Phillies, the run expectancy is the same as when Shane Victorino gets on base to lead off an inning in front of the massive Phillies order… based on identical base-out states. We don't draw this conclusion, obviously, nor should we conclude that win expectancy is context-independent.

Put more simply, the WE after Sweeney's double might have been higher than 13.3 percent. In this case, it's hard to suggest that it actually was, since Mo is so awesome, but pretend it is Brad Lidge pitching or something. (The truth of the matter is that the original 2-out WE was probably lower than 4.1 percent given Sweeney was batting against Mo). Of course, we are almost compelled to use the 13.3 percent if we want enough sample-size-per-state to have any meaning, but I do think your original point of having variable "context" for each player is an area worth exploring.

Ken Funck: Take this contextual weakness of WPA, sprinkle in the yet-more-confounding variable of pitch sequencing, and you have that which most gives me the willies: the summing of Pitch Type Linear Weights per hundred pitches to calculate, say, the "most effective changeup in baseball." If Neftali Feliz strikes out the side on nine pitches, blowing eight straight fastballs past hitters before making the last guy flail helplessly at a changeup, by pitch-type linear weights his most effective pitch is the changeup.

Matt Swartz: Hear, hear! The pitch-type linear weights things are so incredibly ignorant of context and used so horribly inappropriately that I almost wish they didn’t exist at all. It’s a shame because they could be used so effectively at a theoretical game level to figure out which pitches are being thrown too rarely or too much, but they say nothing about "the most effective changeup in baseball" or anything like that.

This article by Sky Andrecheck is a great one. Hitter performance on pitches has very little persistence year to year. The reason is that you throw pitches that are effective more often and pitches that are less effective less often until you reach equilibrium. Any time your fastball is more effective than your changeup in a given count, you’re probably throwing the changeup too much. For an example no one is talking about lately, take Ryan Howard: his pitch-type linear weights say that in 2009, the slider was the pitch he hit best. No, it isn’t. Pitchers just threw it to him so often that he looked for it so often that he hit it enough. But it lowered his opportunities to see fastballs, so he capitalized on them less and did worse on them than he would if he saw them often enough to expect them. It’s a mixed-strategy Nash equilibrium, basically. But people misuse these stats.

WPA is a story stat. Like RBI, it’s fun to know if you like to have a number tell a tale. It’s not a moral. It’s an anecdote. It tells the story of the game. It’s limited in that sense, and should be taken as such.

Russell Carleton: This doesn't solve the leverage problem, but you can plug in a good set of context-neutered linear weights (Colin?) and parse out the credit that way, no?

MS: Does WPA weight certain statistics too much or too little? Is it biased toward certain players, or just imperfect in attributing credit in a randomly distributed way?

CW: I don't know the answer to that yet. Right now I'm building my own set of win expectancy tables so I can look into some of these questions.

Here’s more fun with WPA—the 1996 Yankees bullpen. Mariano Rivera, the setup man that season, has a 5.4 WPA. John Wetteland, the Yankees closer, has a 4.2 WPA.

At first blush, it makes no sense. Rivera pitched close to twice as many innings as Wetteland, had a lower ERA, higher K, drastically lower HR rate, a practical tie in walk rate—Rivera was much better than Wetteland, to an extent that WPA isn't capturing.

The answer is leverage. Wetteland had a 2.37 aLI, compared to 1.54 for Rivera. Now, if we swapped out Rivera with an average setup man, Wetteland's aLI, and thus his WPA, would drop. Wetteland, and not Rivera, gets all the credit for the extra leverage above average he is being provided by Rivera.

Clay Davenport: There's nothing wrong with the win expectancy values—they are what they are. The problem is that the end of a game is a boundary condition, and you are dealing with a model that doesn't really handle boundaries. The phrase that comes to mind (my mind, anyway) is "collapse of the wave-function"—a problem in quantum mechanics that occurs when you take an actual measurement, replacing a probabilistic distribution with a discrete point. Similarly, here we have a system that has been continuously considering all possibilities suddenly reducing to one real answer, and an inordinate amount of credit flows to the person who's there at the end.

CW: Yeah, I’m not suggesting I’m about to set the world on fire with new win-ex tables. I could probably use someone else’s for this; I just have a bad case of wanting to do things myself.

The point about a boundary condition is a good one. But if it’s "breaking" at the end of the game, how far back in the game is it at least bending?

CD: That's a very interesting question. Thinking about it after going to bed, it seemed that the win probability is essentially a function of (run differential)/(time remaining), and time remaining becomes zero at the end. Think of how the value of 1/N changes as you change N from 100 to 0 in steps of -1, almost imperceptible differences through most of the range, accelerating rapidly as you approach zero.
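Clay's 1/N picture is easy to check numerically. The sketch below is only that picture, not a win expectancy model: `step_change` is a made-up helper, N stands in for "time remaining," and the loop stops at N = 1 because 1/N is undefined at zero.

```python
def step_change(n):
    """Change in 1/N when N drops from n to n - 1 (requires n >= 2)."""
    return 1 / (n - 1) - 1 / n


# Early steps barely move 1/N; the last few steps dominate.
for n in (100, 50, 10, 5, 2):
    print(f"N {n:>3} -> {n - 1:>3}: 1/N rises by {step_change(n):.4f}")
```

The step from 100 to 99 moves 1/N by about 0.0001, while the step from 2 to 1 moves it by 0.5. That is the same shape as the concentration of WPA in the final plate appearances.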

WC: Couldn't we take someone like Ichiro—who has a lot of PAs—and see what situations he's in most, then compare him to other players in similar situations to see if the WE is different? I'm curious to see what the range of possible player-adjusted WE might be. 

 

oneofthem
5/02
the problem with wpa can be illustrated by calculating the "wpa" of a series of coin tosses. let's say you make 4 tosses, and for each side, it takes 3 to win.

if you take wpa like a measure of player ability naively, you'd believe in the unbelievable clutchness of the 4th coin.
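A rough sketch of the coin-toss version, under one reading of the setup above: a fair coin, and the first side to reach 3 wins. That rule is an assumption, since the comment does not spell it out, and the names `TARGET`, `win_prob` and `toss_wpa` are invented for the example.

```python
from functools import lru_cache

TARGET = 3  # assumption: the first side to reach 3 wins


@lru_cache(maxsize=None)
def win_prob(heads, tails):
    """Probability that heads reaches TARGET first, given a fair coin."""
    if heads == TARGET:
        return 1.0
    if tails == TARGET:
        return 0.0
    return 0.5 * win_prob(heads + 1, tails) + 0.5 * win_prob(heads, tails + 1)


def toss_wpa(sequence):
    """Per-toss change in heads' win probability for a sequence like 'HHTH'."""
    heads = tails = 0
    credits = []
    for toss in sequence:
        before = win_prob(heads, tails)
        if toss == "H":
            heads += 1
        else:
            tails += 1
        credits.append(win_prob(heads, tails) - before)
    return credits


print(toss_wpa("HHTH"))  # [0.1875, 0.1875, -0.125, 0.25]
```

All three heads are identical events from an identical coin, yet the fourth toss gets the largest credit simply because it came last. In a game that goes the distance (`"HTHTH"`), the deciding toss collects 0.5 on its own.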
DrDave
5/02
Fascinating topic, guys. I've been worrying about this since I first read Lindsey's work from 1960, back in the 80s.

1. Clay nailed the fundamental problem -- the collapse of uncertainty when the game is over.

2. How you address the problem requires you to declare in advance whether or not game-level clutch hitting (as opposed to inning-level) is something you want to give credit for. (If you don't want to give credit for clutch hitting at all, stick to VORP/WAR/etc.)

3. I think Rob is on the wrong track, regarding player differences. You don't want a method that discounts Chase Utley's performance because he's Chase Utley and we expect more from him -- that leads down the rabbit hole.

If you want to discount his performance based on Victorino being on and Howard coming up next, that's better -- it starts to correct the problem that hitters on bad offensive teams have less total run leverage to work with. But the plain truth about value-added methods is that the only accurate correction for that is to switch back to context-neutral methods entirely. As above, it becomes a question of which contextual inequalities you wish to give personal credit for, and which you don't. That's philosophy, not math.

In the specific Sweeney/Ichiro example, Sweeney is penalized for batting in a nearly hopeless position. How do the numbers work out if you give him credit not for the delta win probability, but for the factor by which he increased the win prob? What was the win prob when Sweeney came to the plate? I'm guessing that he increased it by, if not a factor of 9, at least something a lot closer to that...
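The question can be answered with the win expectancy figures quoted elsewhere in this discussion (4.1 percent when Sweeney came up, 13.3 percent after his double). This is a quick sketch under those table-dependent values; the variable names are made up for the example.

```python
# Win expectancy values quoted elsewhere in this discussion (table-dependent).
we_start = 0.041         # when Sweeney came to the plate
we_after_double = 0.133  # when Ichiro came to the plate
we_final = 1.0           # after the home run

# Credit by multiplicative factor instead of by difference.
sweeney_factor = we_after_double / we_start
ichiro_factor = we_final / we_after_double

print(f"Sweeney multiplied the win probability by {sweeney_factor:.2f}")
print(f"Ichiro multiplied the win probability by {ichiro_factor:.2f}")
```

On these figures Sweeney's factor comes out around 3.2 and Ichiro's around 7.5. By ratio, Ichiro's play is a bit more than twice Sweeney's, rather than the roughly ninefold gap in raw WPA.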
BurrRutledge
5/02
Dr. Dave, I am in complete agreement. As the end of the game approaches, the delta in WP is less important than the ratio. Perhaps the same is true at any time of the game, as I think about it.

This is why Clay is on to the solution with regard to outs remaining.

With two outs in the ninth, a player who gets on base has improved his team's chances of winning infinitely when compared to the alternative (getting out and ending the game).

This is also why there is widespread appeal for the concept of clutch hitting.


sethwick23
5/02
Maybe I'm being naive here, but couldn't you do something along the lines of WPA/aLI? Shouldn't leverage increase rapidly at the end of the game in pretty much the same way that WPA does?

For Colin's Rivera vs. Wetteland example, WPA/aLI is 3.52 for Rivera and 1.77 for Wetteland, which seems (at least more) intuitively correct.

I'd also assume that Ichiro's plate appearance had a higher leverage score than Sweeney's, so it'd bring them closer together in that case as well.
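For reference, here is that division worked from the rounded WPA and aLI figures given in the article. It is a sketch only: the dictionary layout and the `leverage_neutral` name are invented for the example, and dividing season WPA by season aLI is the simple version proposed here, which need not match a stat that divides play by play.

```python
# Rounded 1996 figures as quoted in the article.
relievers = {
    "Rivera": {"wpa": 5.4, "ali": 1.54},
    "Wetteland": {"wpa": 4.2, "ali": 2.37},
}


def leverage_neutral(wpa, ali):
    """Season WPA divided by average leverage index (aLI)."""
    return wpa / ali


results = {
    name: leverage_neutral(stats["wpa"], stats["ali"])
    for name, stats in relievers.items()
}

for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

With these inputs Rivera comes out near 3.5 and Wetteland near 1.8. The small gap from the 3.52 quoted above presumably reflects rounding in the inputs.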
Kinanik
5/02
I'm not sure you can say, a priori, that replacing Mariano with an average set-up man decreases Wetteland's aLI. Let's say in one world, he has a set up man who never allows a run. He'll come into a game with a lead of either 0, 1, 2, or 3. Replace that first set up man with one who always gives up a single run. Wetteland now enters in the exact same situations - 0, 1, 2, or 3 runs up, but those were previously 1, 2, 3, and 4 run leads. So the bad set up man decreases the aLI at 0 runs (as those Wetteland no longer enters) and increases the aLI of the other situations. It seems like an empirical question regarding a team's distribution of runs scored and allowed whether not having Mariano hurts Wetteland's WPA. Maybe I'm missing something.
AutomatedTeller
5/02
How big a problem do you guys see this as being? Just from a naive point of view, 87% seems in the ballpark. I mean, Mariners needed 2 runs. Sweeney did half the job of scoring one (getting on base), Ichiro did the other half of scoring it and all the job of scoring the other one. Are you guys worried about the 12% difference? or are you guys thinking it should be more like 50/50?
ils4O1
5/02
12% Difference? Sweeney got 0.092 and Ichiro was credited with 0.867.
thegeneral13
5/03
12% is the difference between the 87% Ichiro was credited with by WPA and the 75% AutomatedTeller is saying kind of makes sense, based on Ichiro doing about 3/4 of the work needed to turn the win expectancy from near zero to 100%.
ils4O1
5/02
I understand WPA is based on years of data and thousands of games, but why were the Mariners' chances only 13.3% down 1 run with a man on 2nd and 2 outs? A single ties it and the chances of that happening are higher than 13.3%, plus the inning can continue. A tie game at that point is essentially 50/50, so does that mean the chance of tying is 26.6%?
dougcox
5/03
If you want Sweeney and Ichiro to get equal credit, then Sweeney should have homered.

With the Mariners down by 1 with 2 outs and no one on, it looks like his double was 10% of a win. That sounds right to me. If he'd homered and tied the game, his WPA would have been 50% or so, and Ichiro would have been about 45%.

I don't see the issue here.
Mountainhawk
5/03
Agreed. This makes total sense to me. Sweeney got on base and kept the game alive. If he doesn't, he gets a -.041 WPA ... instead, he got .092, or a difference of .133. Ichiro HR to win the game gives him .867, if he gets out, it's -.133, a difference of 1.000 (Obviously, Ichiro could do other things, but these are the extremes).

So is Ichiro's hit worth 7x as much? Well, he's responsible for 6 bases to 2, he's also responsible for Sweeney being able to advance to 3rd and home. I don't think 7x is all that far outside of the range of reasonableness.
airmondii
5/03
1st point: WPA is not meant to tell us how the true probability of winning a game changes over the course of a game. Otherwise, as Rob McQuown said, the game wouldn't start at 50/50.

Like all metrics, WPA is flawed. However, pulling it from a base-out state table is valid if your aim is to value the contribution of a player's at bat in a context-neutral way.

2nd point: distributing 9.2% to Sweeney and 86.7% to Ichiro is totally fair. WPA is influenced by leverage, but only on a superficial level. It makes intuitive sense to value a play (HR, K, etc) by its WPA. And in the long run, Sweeney will be presented with the same frequency of high-leverage situations as Ichiro. Ditto for low-leverage, and everything in between.

I think we'll just end up outsmarting ourselves if we try to change Sweeney's 9.2% WPA after-the-fact, based on what Ichiro does. Besides, if you want to argue Sweeney's contribution is undervalued if Ichiro goes yard, then you'd have to believe Sweeney's contribution is overvalued if Ichiro makes the 3rd out. Better to just keep it independent altogether.
studes
5/03
Seems to me that WPA is what it is. That's its strength and weakness. As a story stat, I think WPA is superb. When it comes to giving "credit" to players, you're really quantifying their role in the game story. As Matt says, you're not passing some sort of moral judgment on them, and I don't know many people who are using WPA to pass moral judgment.

Regarding Colin's specific example, Sweeney did get credit for extending the inning to Ichiro, but you're right that he didn't get credit for what Ichiro subsequently did. Should he?

Well, if you give him that credit, that might be "fairer," but it also disrupts the game story. It becomes a "post hoc" story instead of an in-game story. To me, that's not worth it.

Isn't leverage index the way to solve Clay's boundary issue?
instersting
5/03
The thing that's always bugged me about WPA is that if the Sweeney/Ichiro sequence happens in the first inning (instead of the 9th), the WPA assignment to the players is completely changed. The hits somehow become less valuable, the ratio between the 2 WPA's gets much closer to 1, and yet the outcome of the game remains the same.

I believe that WPA is interesting from the standpoint of tracking the state of the game, but the way that it assigns credit/blame to the players seems forced.
escapeNihlism
5/04
could we approach WPA like we approach the PECOTA-adjusted playoff odds? every night the season is simmed a million times and weighted back to what should have been expected at the beginning of the year, according to the system.


so if we were to sim an individual game a million times, or a thousand, or whatever, we could say that the Reds have .55 chance of beating the Mets tonight, and start the graph there. and then weight each individual outcome over the course of the game back to what should be expected.

this does penalize good players, as if the Yankees start a game with a .7 win probability there is more (-) that their players can achieve than positive, but I'm not certain that WPA can be adapted into a meaningful individual player stat.