April 6, 2006
Wins and the Quantum
"I'm reminded a bit of the principle of superposition--each player in the game produces a contribution that has an effect on the probability of winning, somewhat analogous to a wave function. Add up these "wave functions" for each team, and you get a result that expresses how likely the team is to win with these particular sets of contributions, yet at this point it's still unknown whether the team actually wins (much like the fate of Schrödinger's cat inside the box). However, the wave function only collapses to the actual result when the game is played (or the box containing the cat is opened)."
When Woolner wrote that over four years ago, this column wasn’t even a twinkle in BP’s collective eye. I do love the analogy, though, and before moving on to this week’s topic--which connects to Woolner’s quote--I wanted to take a minute to explain the title of this column and how, other than the play on words, it relates to baseball. (If quantum physics doesn't interest you, skip ahead to the baseball part of the article.)
Erwin Schrödinger was an Austrian physicist who, in 1926, formulated the fundamental equation of quantum mechanics. His equation described a world where properties of a particle (such as the location of an electron) at a specified time can be pinpointed only probabilistically. In other words, the particle may have a greater chance of being in one place than in another, but its location is described by a wave where the peak represents the position with the greatest probability. The quantum world appeared to be a fuzzy one governed by probability, unlike the clockwork, deterministic world of our everyday experience.
By 1935, many physicists, including Niels Bohr (although famously not Albert Einstein), had interpreted this waveform equation to mean that particles do not in fact possess specified properties (such as location) before measurements are taken; they are in a spread out and fuzzy state of superposition (literally beyond position) until a measurement is taken and causes the waveform to “collapse” to a particular value.
As a response to this view, Schrödinger devised a thought experiment that came to be known as “Schrödinger’s Cat”; he believed it showed that Bohr’s interpretation of quantum theory was, at the very least, incomplete.
In short, the experiment involves a cat in a box with a trigger that releases poison. That trigger is tied to a device that measures a property of a particle. According to the waveform equation there is a 50% chance that the particle will be in state A and a 50% chance of it being in state B. If the device measures it in state A the poison is released and the cat dies.
The core question for Schrödinger was simply this: if the particle’s property is not determined until it is measured, under Bohr’s interpretation isn’t the cat--through its connection with the particle via the device--also left in an undetermined state and therefore neither dead nor alive until the box is opened? Bohr’s interpretation didn’t really answer the question since it didn’t define any rules about the nature of measurement and observation.
To make a long story short, which is told in wonderfully accessible prose by Brian Greene in The Fabric of the Cosmos, the questions raised in this thought experiment baffled physicists for years, but now have been mostly resolved by applying a concept known as decoherence. That concept holds that long before the box is opened the influence of the environment (from photons to air molecules and other particles) has nudged the waveform function into taking on a specific value, meaning that the cat is in fact really dead or alive and not caught in some state of limbo.
What I like about this episode in the history of science is that Schrödinger devised a clever experiment used to test a common perception in his own field of quantum mechanics. That experiment made people think deeply about what they knew or thought they knew about the nature of reality itself. And while I’m not pretending that baseball has anything profound to say about such matters (it is, after all, just entertainment), I do hope that through this column, at least now and then, we can devise clever experiments that put to the test both conventional and sabermetric wisdom and help us think more deeply about our shared distraction.
Before moving on I should also mention that reader John MacKenzie noted that he’s been using the moniker Schrödinger’s Bat (with a different spelling) for his fantasy league team for several years. We were of course unaware of that usage, so please don’t give John a hard time thinking that he lifted it from us.
And now on to your regularly scheduled programming…
The concept of Win Expectancy (or Win Probability Added) is now an old one in performance analysis circles. Simply put, Win Expectancy is the probability of winning a game given the inning, score, and base/out situation. Using the Expected Win Matrix here at BP you can see, for example, that in 2005 when the visiting team was behind by a run in the top of the 6th inning with runners on first and third and nobody out, their probability of winning was exactly 50%.
Changes in that probability throughout a game can be tracked and then applied to a host of questions both strategic (when to sacrifice, when to steal, when to issue an intentional walk, when to bring in a reliever) and reflective (who contributed most or least to increasing their team’s chances of winning in 2005--their aggregate contribution to the waveform function for each game in which they played, to use Woolner’s analogy).
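For readers who like to see the bookkeeping, here is a minimal sketch in Python of how those changes might be tallied. The play list, the batter names, and the win expectancy figures are made up for illustration; in practice the before and after values would come from a lookup such as the Expected Win Matrix.

```python
from collections import defaultdict


def win_probability_added(plays):
    """Sum the change in win expectancy credited to each batter.

    `plays` is an iterable of (batter, we_before, we_after) tuples, where the
    two win expectancy values are for the batter's own team.
    """
    totals = defaultdict(float)
    for batter, we_before, we_after in plays:
        totals[batter] += we_after - we_before
    return dict(totals)


# Hypothetical plate appearances (illustrative numbers only).
plays = [
    ("Batter A", 0.500, 0.560),  # single raises his team's chances
    ("Batter B", 0.560, 0.520),  # strikeout lowers them
    ("Batter A", 0.520, 0.610),  # later extra-base hit
]

wpa = win_probability_added(plays)
for batter, total in sorted(wpa.items()):
    print(f"{batter}: {total:+.3f}")
```

Summed over a full season, totals like these are what produce a "wins above average" style figure for each hitter.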
Those readers who’ve treated themselves to The Numbers Game by Alan Schwarz or Curve Ball: Baseball, Statistics, and the Role of Chance in the Game by Jim Albert and Jay Bennett know all about the Mills Brothers and their computation of “Player Win Averages” (PWA) for the 1969 season published in their 1970 book Player Win Averages: A Computer Guide to Winning Baseball Players. There they devised a system where changes in win expectancy were assigned to players and multiplied by a point system to compute Win and Loss points. The ratio of those became the PWA, their goal being to formulate a statistic like batting average used to discover clutch performers.
But simply because it’s a topic with some legs doesn’t mean there aren’t new applications and refinements that can be made. Woolner himself contributed to this endeavor through the publication of the Win Expectancy Framework (WX), first discussed in the 2005 Baseball Prospectus and again in the 2006 version as well as in Baseball Between the Numbers, where it is applied to topics ranging from relief pitching to stolen bases.
For those unfamiliar, the framework allows for the computation of the probability of winning a game given the current inning, score, base/out state, run environment (both home and visiting teams), and run differential. It does so by calculating all the permutations of possible outcomes from that point forward to determine the probability of each team winning.
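To give a flavor of what such a calculation involves, here is a deliberately simplified sketch in Python. It is not Woolner's framework: it ignores the base/out state and the separate run environments for each team, it only evaluates the start of a half-inning, and the run distribution RUN_DIST is a made-up placeholder. It simply walks through every possible run total for each remaining half-inning and adds up the probability that the home team ends up ahead.

```python
from functools import lru_cache

# Hypothetical probability of scoring 0, 1, 2, 3 or 4 runs in a half-inning.
# Placeholder values that sum to 1; both teams share the same distribution.
RUN_DIST = {0: 0.73, 1: 0.15, 2: 0.07, 3: 0.03, 4: 0.02}


@lru_cache(maxsize=None)
def home_win_prob(inning, top, home_lead):
    """Probability the home team wins from the start of a half-inning.

    inning: 1-based inning number; top: True if the visitors are about to bat;
    home_lead: home runs minus visitor runs (negative if the home team trails).
    """
    if inning > 9:
        # Regulation is over. A tie goes to extra innings, which is a coin
        # flip here because both teams use the same run distribution.
        if home_lead > 0:
            return 1.0
        if home_lead < 0:
            return 0.0
        return 0.5
    if inning == 9 and not top and home_lead > 0:
        return 1.0  # home team is ahead and does not need to bat
    total = 0.0
    for runs, p in RUN_DIST.items():
        if top:
            total += p * home_win_prob(inning, False, home_lead - runs)
        else:
            total += p * home_win_prob(inning + 1, True, home_lead + runs)
    return total


# Visitors trail by a run entering the top of the 6th.
print(round(1 - home_win_prob(6, True, 1), 3))
```

A real implementation would also carry the base/out state and a separate run environment for each team, which is where most of the work lies.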
The key difference between the framework and matrices such as the one referenced previously, is that the probabilities produced are theoretical and the situations from which they derive needn’t have occurred in real life. This has a twofold advantage:
For example, in the scenario described above, the visiting team had a 50% chance of winning as revealed in the table. However, in the intuitively less favorable situation where the visitors had a runner only on first, their probability of winning in 2005 was 52.4%. The inherent nature of WX eliminates these problems. From that perspective, WX is more similar to the approach used by the Mills brothers in computing PWA where they used computer simulation to derive the probabilities.
Leveling the Playing Field?
In any case, in his 2006 Baseball Prospectus article “Adventures in Win Expectancy” Woolner applied WX to hitter seasons using play by play data extending from 1960 through 2005. In other words he calculated and then summed the change in win expectancy across all plate appearances for each hitter using the WX framework to produce a kind of “number of wins above average” contributed by each player. The results were then shown in two tables that reveal the 15 highest and lowest seasonal Batting WX and the 20 highest and lowest career Batting WX for the time period. The two tables below show the top and bottom five for each.
Seasonal Batting WX
Year  Name             PA      WX
2004  Barry Bonds     617   12.07
2001  Barry Bonds     664   11.71
2002  Barry Bonds     612   10.45
1969  Willie McCovey  623   10.02
1998  Mark McGwire    681    9.65
---------------------------------
2003  Royce Clayton   543   -4.28
1970  Larry Bowa      577   -4.29
1968  Hal Lanier      518   -4.45
1997  Gary DiSarcina  583   -4.76
2002  Neifi Perez     585   -6.69

Career Batting WX
Name                 WX
Barry Bonds      115.71
Willie McCovey    74.11
Hank Aaron        71.15
Willie Mays       63.41
Frank Robinson    63.04
-----------------------
Tim Foli         -24.61
Doug Flynn       -25.69
Royce Clayton    -27.29
Alfredo Griffin  -28.90
Larry Bowa       -31.50
Obviously the gap between
Fortunately, we can augment these lists to stretch back through time by applying a formula and a table of slopes and intercepts Woolner provided to estimate the win value of an offensive event given any offensive environment.
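As a rough illustration of how a slope and intercept table can be used, the sketch below assumes the win value of an event is a straight-line function of league runs per game. That functional form is my reading of "slopes and intercepts", and the coefficients in COEFFS are invented placeholders, not the values Woolner published.

```python
def win_value(slope, intercept, runs_per_game):
    """Estimated win value of one offensive event in a given run environment.

    Assumes a linear relationship: intercept + slope * runs_per_game.
    """
    return intercept + slope * runs_per_game


# Placeholder (slope, intercept) pairs for illustration only.
COEFFS = {
    "single": (-0.004, 0.065),
    "home_run": (-0.014, 0.210),
}

# Run environments quoted in the article for the NL in 1908 and 1930.
for label, rpg in (("1908", 3.32), ("1930", 5.68)):
    single = win_value(*COEFFS["single"], rpg)
    homer = win_value(*COEFFS["home_run"], rpg)
    print(f"{label}: single {single:.3f}, home run {homer:.3f}")
```

With negative slopes, as assumed here, each event is worth more wins when runs are scarce.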
First, by applying the formula to the league run environment over time for the National League we can produce the following two graphs:
There is an interesting aspect of the first graph, as noted by Woolner. In eras of higher run scoring--such as when the average NL team scored 7.36 runs per game in 1894, 5.68 runs per game in 1930, and around 5.00 runs per game in 1999-2000--each offensive event contributes less to a win than in lower run scoring environments such as 1908 (with 3.32 runs per game), and 1968 (at 3.42).
In other words, contrary to the notion that home runs during the dead-ball era weren’t as important as small ball tactics, they were in fact even more important, since each extra base hit--especially one that plates a run--has a larger relative impact on winning the game. Looking closer you’ll notice that as the number of bases gained by the event increases the relative value also increases as the run environment decreases. So in 1908 a home run is worth 3.14 times more than a single while in 1930 it’s worth exactly three times as much.
It is in that context that the following quote from the supreme hitter of the dead-ball era is relevant:
"If I had set out to be a homerun hitter, I am confident in a good season I would have made between twenty and thirty homers...I would naturally have sacrificed place hitting, which, to my way of thinking, is the supreme pinnacle of batting art."
If Cobb could indeed have hit 25 home runs a season in the days before 1920, as he is also purported to have contended in the oft-recited anecdote in which he hit three home runs to prove the point, then he would have been well served to do so.
In the second graph the values of various kinds of outs are shown, and what is revealing is that the win values of strikeouts and other kinds of outs don’t change very much over time. The graph also shows how much more costly getting caught stealing is than other kinds of outs, and that the cost of a caught stealing fluctuates more with the run environment. In higher scoring eras getting thrown out doesn’t cost as much as in lower run scoring eras, since when runs are scarce and runners are hard to come by, losing a baserunner has a larger relative impact on winning or losing. The long and short of it, as illustrated by Woolner in the original article and discussed by James Click in Baseball Between the Numbers and shown in the following graph, is that you have to steal successfully at a higher rate in low run scoring environments than you do when runs are more plentiful.
Attentive readers will note that the break even percentages shown here vary somewhat and are lower than those shown in the original article. The reason is that these are based on the overall win expectancies calculated using Woolner’s formula and not on specific situations in various run environments.
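The break even arithmetic itself is simple, and a short Python sketch may help. A steal attempt breaks even when the expected gain from succeeding equals the expected loss from being thrown out. The win values used below are hypothetical placeholders chosen only to show the direction of the effect, not figures from Woolner's table.

```python
def break_even_rate(sb_win_value, cs_win_value):
    """Success rate at which stolen base attempts neither help nor hurt.

    sb_win_value: wins gained by a successful steal (positive).
    cs_win_value: wins lost by a caught stealing (negative).
    Solves p * gain = (1 - p) * cost for p.
    """
    gain = sb_win_value
    cost = -cs_win_value
    return cost / (gain + cost)


# Hypothetical win values: the caught stealing costs more when runs are scarce.
low_scoring = break_even_rate(sb_win_value=0.020, cs_win_value=-0.045)
high_scoring = break_even_rate(sb_win_value=0.020, cs_win_value=-0.040)
print(f"low run environment:  {low_scoring:.1%}")
print(f"high run environment: {high_scoring:.1%}")
```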
So by joining lower win values for offensive events in higher run scoring environments and very similar win values for most outs in lower run scoring environments you get something rather counterintuitive. But both of those statements have their roots in the fact that the basic structure of the game hasn’t changed much. Despite styles of play that come in and out of vogue, you still get just three outs per inning and 27 outs per regulation game and a home run has always been the most efficient way to score runs.
So let’s apply the formula to individual batter seasons and adjust for both the league run environment as well as the ballpark using three-year park factors. After all, just as an extra base hit increases the win probability in the low run scoring environment of 1968 more so than in 2001, it does so to a greater degree at Dodger Stadium in 2001 than it does in Coors Field.
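In outline, the season calculation might look like the Python sketch below. Two things here are assumptions rather than details from the article: the park adjustment is modeled by multiplying the league run environment by the park factor, and every (slope, intercept) pair and event count is an invented placeholder.

```python
def season_wx1(event_counts, coeffs, league_rpg, park_factor=1.0):
    """Estimate a batter's season win contribution from event totals.

    event_counts: mapping of event name to number of occurrences.
    coeffs: mapping of event name to a (slope, intercept) pair.
    league_rpg: league runs per game; park_factor: 1.0 is a neutral park.
    """
    environment = league_rpg * park_factor  # assumed form of the adjustment
    total = 0.0
    for event, count in event_counts.items():
        slope, intercept = coeffs[event]
        total += count * (intercept + slope * environment)
    return total


# Placeholder coefficients and a made-up batting line, for illustration only.
COEFFS = {
    "single": (-0.004, 0.065),
    "home_run": (-0.014, 0.210),
    "out": (0.0, -0.025),
}
line = {"single": 100, "home_run": 30, "out": 400}

pitchers_park = season_wx1(line, COEFFS, league_rpg=4.5, park_factor=0.9)
hitters_park = season_wx1(line, COEFFS, league_rpg=4.5, park_factor=1.1)
print(f"{pitchers_park:+.2f} vs {hitters_park:+.2f}")
```

The same batting line is worth more in the pitcher's park under these assumptions, which is the point of making the adjustment.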
After making the calculation for 83,733 player seasons (starting in 1876 in the NL and 1901 in the AL) we find the following top and bottom 15 seasons. Note that there are two tables of the bottom performers, since the bottom performers were dominated by pre-1900 players.
Year  Team  Name               PA     WX1
2001  SFN   Barry Bonds       664   11.59
2002  SFN   Barry Bonds       612   10.85
1923  NYA   Babe Ruth         699    9.83
1921  NYA   Babe Ruth         693    9.61
1920  NYA   Babe Ruth         615    9.44
1927  NYA   Babe Ruth         691    9.17
1926  NYA   Babe Ruth         652    9.10
1927  NYA   Lou Gehrig        717    9.09
1941  BOS   Ted Williams      606    8.94
1946  BOS   Ted Williams      672    8.94
2004  SFN   Barry Bonds       617    8.79
1957  NYA   Mickey Mantle     623    8.74
1917  DET   Ty Cobb           669    8.66
1924  SLN   Rogers Hornsby    640    8.55
1942  BOS   Ted Williams      671    8.49

Year  Team  Name               PA     WX1
1894  CHN   Jiggs Parrott     536   -6.02
1933  SLA   Jim Levey         567   -5.84
1886  KCN   Jim Lillie        427   -5.52
1893  SLN   Joe Quinn         584   -5.36
1894  NY1   John Ward         575   -5.24
1894  CL4   Chippy McGarr     554   -5.07
1885  NY1   Joe Gerhardt      423   -5.05
1895  PHI   Jack Boyle        625   -5.03
2002  KCA   Neifi Perez       585   -5.03
1890  CL4   Bob Gilks         582   -5.02
1884  BFN   Jim Lillie        476   -4.81
1891  CIN   Germany Smith     551   -4.81
1890  BRO   Germany Smith     526   -4.67
1879  CN1   Will White        300   -4.64
1892  BSN   Joe Quinn         574   -4.64

Post 1900 Only
Year  Team  Name               PA     WX1
1933  SLA   Jim Levey         567   -5.84
2002  KCA   Neifi Perez       585   -5.03
1933  SLA   Art Scharein      522   -4.62
1953  SLA   Billy Hunter      604   -4.54
1934  SLA   Ski Melillo       589   -4.53
1909  BRO   Bill Bergen       372   -4.48
1999  COL   Neifi Perez       732   -4.47
1932  SLA   Ski Melillo       659   -4.45
1931  SLA   Jim Levey         540   -4.42
1936  PHA   Skeeter Newsome   508   -4.33
1937  CHA   Jackie Hayes      631   -4.30
1977  OAK   Rob Picciolo      446   -4.09
1902  CLE   John Gochnauer    506   -4.08
1970  CIN   Tommy Helms       605   -4.07
2000  COL   Neifi Perez       699   -4.07
What stands out of course is that the WX values for Bonds from the tables shown previously don’t match the WX1 values in the first table here. The reason is that the formula applied to calculate these values is more of an approximation and doesn’t put into complete context each individual plate appearance. As a result you would expect to see more variability when play-by-play data is used since a player may find himself more or less frequently used in highly leveraged situations through both chance and managerial decision.
In other words, the price we pay for being able to reach back before play-by-play data was available is a loss in precision. However, given that the effect of a clutch hitting ability--if it exists at all--is likely quite small, some might argue that WX1 has the advantage of removing the effect of randomness and in that way actually provides a more “pure” technique for comparison.
Bonds’ 2001 and 2002 seasons still come out on top, but Ruth makes his mark with five consecutive entries on the list which is rounded out by appearances by
Cubs fans will no doubt be disheartened to see
We can then sum the WX1 values for entire careers and provide the following top and bottom 20 career performers with the bottom performers list being duplicated once again for post 1900.
Name                 PA     WX1
Babe Ruth         10616  117.37
Barry Bonds       11636  108.73
Ty Cobb           13072  105.14
Ted Williams       9791   98.37
Hank Aaron        13940   90.41
Stan Musial       12712   87.85
Willie Mays       12493   85.29
Mickey Mantle      9909   82.43
Lou Gehrig         9660   79.31
Rogers Hornsby     9475   78.44
Tris Speaker      11988   77.15
Frank Robinson    11743   73.09
Mel Ott           11337   72.06
Honus Wagner      11739   65.08
Eddie Collins     12037   65.03
Rickey Henderson  13346   58.73
Jimmie Foxx        9670   57.57
Jeff Bagwell       9431   56.65
Joe Morgan        11329   55.48
Frank Thomas       8602   53.22

Name                 PA     WX1
Tommy Corcoran     8275  -41.25
Joe Quinn          6341  -38.43
Germany Smith      4652  -34.46
Alfredo Griffin    7330  -34.24
John Ward          7470  -34.14
Bobby Lowe         7741  -33.95
Bill Bergen        3228  -33.22
Kid Gleason        8198  -32.70
Malachi Kittridg   4446  -32.18
Ozzie Guillen      7133  -31.80
Bones Ely          5000  -30.73
Davy Force         3081  -30.04
Fred Pfeffer       6563  -29.65
Ed Brinkman        6640  -29.40
Don Kessinger      8529  -29.13
Ski Melillo        5536  -29.03
Herman Long        7845  -28.97
Everett Scott      6373  -28.77
Larry Bowa         9103  -28.64
Tim Foli           6573  -28.49

Post 1900 Only
Name                 PA     WX1
Alfredo Griffin    7330  -34.24
Bill Bergen        3228  -33.22
Ozzie Guillen      7133  -31.80
Ed Brinkman        6640  -29.40
Don Kessinger      8529  -29.13
Ski Melillo        5536  -29.03
Everett Scott      6373  -28.77
Larry Bowa         9103  -28.64
Tim Foli           6573  -28.49
George McBride     6235  -27.02
Tommy Thevenow     4484  -26.57
Neifi Perez        5123  -25.53
Aurelio Rodrigue   7078  -25.16
Hal Lanier         3940  -24.77
Leo Durocher       5827  -24.60
Mark Belanger      6602  -24.50
Luke Sewell        6041  -23.92
Roy McMillan       7653  -23.74
Wally Gerber       5816  -23.18
Rabbit Warstler    4611  -22.97
You’ll notice that the total WX1 here for Bonds is about seven wins less than in the table shown earlier, while Ruth overtakes him at 117.37. Of course, Ruth’s contribution to winning here does not include his pitching performance, which would further distance him from Bonds. Nor do these values include fielding, which would help Bonds close the gap a bit.
Aaron and Mays also add 19 and 22 wins, respectively, by including their entire careers; Ty Cobb comes out very well, and both Ted Williams and
Perhaps the most interesting thing about the top performers list is that
Clearing the Bases
To wrap up, there are also a couple issues I wanted to address from last week’s column regarding platoon splits.
I mentioned last week that The Book notes that right-handers need about 2,000 plate appearances against lefties before their measured platoon split can be considered reliable. I received several comments on this to the effect that since 2,000 plate appearances is the equivalent of 10 to 12 years of playing time, that seems like an awfully long time to wait before you can say anything about a player’s split.
I agree. The point is not that you can’t know anything about the player’s split ability in fewer plate appearances. The point is that if you had only two pieces of information--a hitter's platoon split and the average split for right-handed batters--and you had to choose which was more accurate, you would choose the average split.
That doesn't mean that you couldn't get a better estimate by regressing the player's split to the mean using a weighted value, which the authors also discuss. So you certainly don't need to ignore the measured platoon split of players like
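A weighted regression of this kind can be sketched in a few lines of Python. The weighting below is one common approach, not necessarily the exact method in The Book, and treating the 2,000 plate appearance threshold as the point where the measured split and the league average get equal weight is my assumption. The split values in the example are hypothetical.

```python
def regressed_split(observed_split, plate_appearances, league_split,
                    regression_pa=2000):
    """Blend a hitter's measured platoon split with the league average.

    The measured split gets weight PA / (PA + regression_pa), so with no data
    the estimate is the league average, and at `regression_pa` plate
    appearances the two are weighted equally.
    """
    weight = plate_appearances / (plate_appearances + regression_pa)
    return weight * observed_split + (1 - weight) * league_split


# Hypothetical right-handed hitter with a large measured split in 500 PA
# against lefties, versus a smaller league-average split (same units).
estimate = regressed_split(observed_split=0.100, plate_appearances=500,
                           league_split=0.020)
print(f"{estimate:.3f}")
```

The measured split still moves the estimate with a few hundred plate appearances; it just moves it less than the raw number would suggest.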
Second, because in this case the statistical threshold is so high, teams can and do combine both scouting information and statistical data to make predictions about future performance. So Epstein’s comments about Pena’s ability to perhaps contribute immediately because of his platoon split hopefully also reflect their scouting of his swing mechanics and pitch recognition, among other attributes.
And finally, I’d like to thank all the regular BP readers who have so kindly welcomed me into the fold. Your support is appreciated and your feedback encouraged anytime.