March 23, 2006
Rethinking Normalized League Values
There's something that's been bothering me all winter, ever since Keith Woolner sent some of us a copy of the article he was working on for BP 2006, and I finally figured out what it was. But first, let me digress about twenty years…
The book that turned me into a hardcore sabermetrician wasn't one of Bill James' Abstracts, although I bought all that I could (including a fairly rare 1980 edition; I have a friend who managed to get a 1977). No, for me it was The Hidden Game of Baseball, with Pete Palmer's stats and John Thorn's writing.
Palmer's system for rating players was called Linear Weights. It was a simple linear equation, something like
.46 H + 0.80 DB + 1.02 TP + 1.40 HR + .33 BB + .3 SB - .6 CS - .25 (AB-H)
In a general sense, you could say it was about adding up the values of a bunch of good things (hits, walks) and subtracting the bad things (outs). The way Palmer did it, however, the value of the out was not "-0.25." Palmer wanted the equation to come out to zero for the league as a whole (the system was supposed to show how many runs above or below average a player was, not how many total runs he generated), and he accomplished that by making the value of the out be whatever was necessary to offset the value of all the good things. In 2005, for instance, he would have assigned the NL an out value of -0.273 and the AL a value of -0.277. The value given to "outs" was in fact a variable, which was generally in the neighborhood of -0.25.
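The zero-sum constraint can be written out as a small calculation. What follows is a minimal sketch, not Palmer's published procedure: it uses the equation exactly as written above, and the function names and league totals are hypothetical, chosen only to show that the out value is whatever offsets the positive terms.

```python
# Weights for everything except the out term, as written in the text.
WEIGHTS = {"H": 0.46, "DB": 0.80, "TP": 1.02, "HR": 1.40,
           "BB": 0.33, "SB": 0.30, "CS": -0.60}


def non_out_runs(stats):
    """Sum of every term in the equation except the out term."""
    return sum(weight * stats[key] for key, weight in WEIGHTS.items())


def zero_sum_out_value(league):
    """Out value that makes the league total come out to zero."""
    outs = league["AB"] - league["H"]
    return -non_out_runs(league) / outs


def linear_weights(stats, out_value):
    """Runs above or below average for a player, team, or league."""
    return non_out_runs(stats) + out_value * (stats["AB"] - stats["H"])


# Hypothetical league totals, for illustration only (not real data).
league = {"AB": 88000, "H": 23000, "DB": 4600, "TP": 500, "HR": 2500,
          "BB": 8500, "SB": 1400, "CS": 600}

out_value = zero_sum_out_value(league)
print(out_value)                          # negative: outs cost runs
print(linear_weights(league, out_value))  # league total, zero up to rounding
```

A player or team is then rated by calling linear_weights with that league's out value, so the rating reads directly as runs above or below the league average.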
It was very easy to turn linear weights into a routine that estimated total runs; you simply had to add back in the average number of runs per out. Effectively, this meant that the value of an out changed from something like -0.25 to something like -0.10. The logic of Palmer's initial system would direct you to change the value of the out to whatever was needed to get the right answer for the total runs scored by the league; looking back on 2005 by way of example, the values would be -.096 in the NL and -.088 in the AL. If you do that for every team in history, you get an estimator for team runs that has an average error (root-mean-square error, to be exact) of 27.23 runs. Call this--where all the interannual variability is carried by the out term--Case A.
Just because Pete Palmer did it that way, though, doesn't mean it's the way you or I have to do it. You could just as easily leave the value of the out at a fixed value, like -0.09 (pretty close to the historical average), and apply the multiplier needed to get total runs to the entire equation, like this:
1.032 * [.46 H + 0.80 DB + 1.02 TP +1.40 HR + .33 BB + .3 SB - .6 CS - .09 (AB-H)]
1.032 is the actual value for the 2005 NL; the AL would use 0.992. There are a couple of good reasons for doing it this way. The first is that the formula works a lot better; the average error drops from 27.23 to 25.12, an improvement of a little more than two runs. The second reason is that the multiplier is a lot more stable. When all of the variation was shoved into the out term, over all of the 254 major league-seasons in history (I count the NA), you had an average out value of -.073 with a standard deviation of 0.030; that's 41% of the size of the average. The number actually turned over in the 1870s, with outs counting as a positive, thanks to all of the errors committed in those days. But that was all with Case A. Now, with what we'll call Case B, where we've modified the entire equation, you get the variability down to 14% (a mean of 0.92, SD 0.13). That's a desirable thing, to minimize the change of those league-specific terms; the less those interannual multipliers change, the easier it is to just pick up the formula, apply it to whatever player you're worried about, and not have to worry about how the league affects the calculation.
For years, that's where I stood, and that's how I used linear weights on those occasions when I wanted to use them, generally for comparisons among different run estimator systems. However, as I mentioned earlier, Keith Woolner's article in this year's Baseball Prospectus made me reconsider this. Keith was looking at win expectancies--using the statistics to get a direct measure of wins added to the team, without going through the middleman step of figuring out how many runs were involved. As you would expect, when you played in a low run environment, the value of everything went up, and in a high run environment everything went down--except for the outs. Outs stayed pretty much constant in all of the run environments, while everything else moved around--a pattern that was the exact opposite of Pete Palmer's original model.
You can, then, change the linear weights model to reflect Keith's findings, creating a version where the value of the out stays fixed, and the multiplier to get the right total of league runs is only applied to everything else (and I really should take the CS away from everything else, but that's beside the point I'm trying to make right now). That equation, Case C, would look like this:
1.021 * [.46 H + 0.80 DB + 1.02 TP +1.40 HR + .33 BB + .3 SB - .6 CS] - .09 (AB-H),
with the out term now outside the big brackets. 1.021 is the necessary multiplier for the NL; 0.995 is the AL value. You'll notice that there is only a 2.6% difference between the two, whereas it was 3.9% (for Case B, when the NL/AL terms were 1.032 and 0.992), and before that, 9.1% (Case A, when they were -.096 and -.088). That is a difference that holds up over all of history, as the ratio between the new standard deviation (0.09) and the new mean (0.94) drops to 9.5%. We also get another (small) gain in accuracy, cutting the RMSE down from 25.12 to 24.91. The actual correlation between linear weights and runs supports the improvement as well; correlation is .913 for Case A, improves to .919 when you apply the difference to everything in Case B, and increases again, to .929, when the difference is applied to everything but outs, as in Case C. I'm probably overstating the case from a purely mathematical perspective, but I see this as a strong confirmation of Keith's work. When outs are treated as a constant, the model improves.
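The three cases differ only in where the league-specific term sits. Here is a minimal sketch of solving for that term in each case, assuming a league's totals and runs scored are known. The function names, the sample totals, and the runs figure are all hypothetical; the weights and the fixed -.09 out value are the ones written in the equations above.

```python
FIXED_OUT = -0.09  # fixed out value used in Cases B and C in the text


def positive_part(s):
    """Everything in the linear weights equation except the out term."""
    return (0.46 * s["H"] + 0.80 * s["DB"] + 1.02 * s["TP"] + 1.40 * s["HR"]
            + 0.33 * s["BB"] + 0.30 * s["SB"] - 0.60 * s["CS"])


def outs(s):
    """Outs as the equations define them: AB - H."""
    return s["AB"] - s["H"]


def calibrate(league, runs):
    """League term that reproduces league runs under each case."""
    p, o = positive_part(league), outs(league)
    return {
        "A": (runs - p) / o,              # the out value itself varies
        "B": runs / (p + FIXED_OUT * o),  # multiplier on the whole equation
        "C": (runs - FIXED_OUT * o) / p,  # multiplier on everything but outs
    }


def estimate(s, case, term):
    """Estimated runs for a team or player, given the league term."""
    p, o = positive_part(s), outs(s)
    if case == "A":
        return p + term * o
    if case == "B":
        return term * (p + FIXED_OUT * o)
    return term * p + FIXED_OUT * o       # Case C


# Hypothetical league totals and runs, for illustration only (not real data).
league = {"AB": 88000, "H": 23000, "DB": 4600, "TP": 500, "HR": 2500,
          "BB": 8500, "SB": 1400, "CS": 600}
league_runs = 15000

terms = calibrate(league, league_runs)
for case in "ABC":
    print(case, terms[case], estimate(league, case, terms[case]))
```

Each case reproduces the league's runs exactly by construction; the cases only differ when the calibrated term is applied to individual teams, which is where the RMSE figures quoted above come from.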
Which leads to the scary part, the part that's been nagging at me for months. If the value of an out is roughly constant, then it means, quite simply, that every statistical system that uses normalized league values is wrong. I mean EqA, I mean NOPS, I mean PRO+, I mean the updated linear weights in the Baseball Encyclopedia, I mean adjusted RC/27. Every single one of them, as part of the normalization process, is moving the value of the outs along with the values of everything else. When they adjust Honus Wagner's 1908 season, they adjust the value of everything, including the outs, upwards--which means that the outs end up being counted too strongly and that his performance is being underrated. The opposite is true of players from hitters' leagues, like Hack Wilson in 1930, or Hugh Duffy in 1894, or even Brandon Wood from the 2005 California League--these systems will all systematically overrate these performances.
I tried to find a way to modify EqA so that it would be able to avoid, or at least overcome, this problem. In essence, I needed to get more (marginal) wins out of seasons from low offensive environments, and fewer from the high environments. By the nature of the math that creates WARP, that is equivalent to saying that I need to get more marginal runs from the low run environments. Equivalent runs are built from a formula that looks like this:
EqR = (PA) * (league RPPA) * (2 * REqA/LgeREqA - 1)
RPPA is runs per plate appearance, and REqA is the "raw EqA" formula. The 2x-1 portion has always just been an approximation of the data. You could run various regression tests, and depending on exactly what years you choose to evaluate the system and exactly what variables you use in the raw equivalent average equation, you can get values anywhere between 1.9 and 2.1--"2" was just a typical value that was also very easy to remember. The trailing -1 goes up and down with the multiplier, to keep the whole thing at 1 for a league average; so 2x-1, or 2.2x-1.2, or 1.8x-0.8; that's the form of the equation.
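The general form of that line can be sketched in a few lines. This is an illustration only: the function name, the slope parameter, and the sample inputs are hypothetical, and the formula is the EqR equation above with the 2 and the -1 replaced by a slope and its matching intercept.

```python
def equivalent_runs(pa, lg_rppa, raw_eqa, lg_raw_eqa, slope=2.0):
    """EqR = PA * league RPPA * (slope * x - (slope - 1)), x = REqA / LgeREqA."""
    x = raw_eqa / lg_raw_eqa
    return pa * lg_rppa * (slope * x - (slope - 1.0))


# Hypothetical inputs: 600 PA in a league scoring 0.12 runs per PA.
for slope in (1.8, 2.0, 2.2):
    average_hitter = equivalent_runs(600, 0.12, 0.260, 0.260, slope)
    good_hitter = equivalent_runs(600, 0.12, 0.300, 0.260, slope)
    print(slope, average_hitter, good_hitter)
```

Whatever the slope, the league-average hitter gets the same total (PA times league RPPA); only the spread between good and bad hitters changes.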
I have checked that value many times over the years, but I don't know that I ever thought about checking to see if there was a correlation between that multiplier and league offense…to see if one could improve the overall accuracy of EqA by letting that specific number vary in a defined way. Suppose (cutting through a lot of boring preliminary work) the lead number isn't really 2, but this:
2 * (.125/LgRPPA)^X
For the last 15 years, I have been treating X as if it were zero. 0.125 is just a nice round number (1/8 is pretty round, and I like round numbers; they're easier to remember) that happens to be pretty close to the historical average of runs per plate appearance (0.119); as luck would have it, it also minimizes the errors in the following experiment. Since a direct measure of the needed multiplier between relative runs and relative EqA was very difficult (and is occasionally undefined), I used an iterative method on the value X to minimize my RMS errors. Translation: I dumped it into Excel and used Solver. Using just a simple "2x-1" equation, I have an RMSE of 24.30 runs over all time (linear weights, you'll recall, was at 24.91; both are using just the stats listed in the LW equations above). Solver says that with an exponent of 0.332, which my simple mind is going to call 1/3, the RMSE is reduced to 24.08 runs. That is about the same improvement you get by including HBP in the equation (adding HBP in all the right places cuts the RMSE from 24.30 to 24.06), so it is hardly insignificant. Playing around with other variables, like walk rate and batting average, that I wouldn't expect to have any effect (just in case Solver is doing something really funky) shows that they don't have any effect; simple randomness seems to allow gains of .03-.05, but nothing like .22.
This means that, for someone like Wagner in 1908, playing in a run environment of about .090 (league RPPA=.092, PF=.983), his runs should be calculated based on an equation of 2.231x-1.231, not 2x-1. A league average player will still get the same number of runs, but the range is expanded. Hugh Duffy, playing in a 1.075 park and a 0.180 league, has a run environment of .194, yielding an equation of 1.73x-.73. Let me put that in more familiar terms, carrying those values through the entire EqA process:
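The arithmetic behind those two lines is easy to check. This is a minimal sketch with hypothetical function names. It assumes the exponent is taken as exactly 1/3 and that the run environment is passed in already rounded (.090 and .194); those figures appear to be league RPPA times park factor.

```python
BASELINE_RPPA = 0.125  # the round-number baseline from the text


def lead_multiplier(run_env, exponent=1.0 / 3.0):
    """Slope of the EqR line: 2 * (.125 / run environment) ** X."""
    return 2.0 * (BASELINE_RPPA / run_env) ** exponent


def modified_line(run_env, exponent=1.0 / 3.0):
    """Return (slope, intercept) so a league-average hitter still maps to 1."""
    slope = lead_multiplier(run_env, exponent)
    return slope, -(slope - 1.0)


# Run environments as quoted in the text.
for label, env in (("Wagner 1908", 0.090), ("Duffy 1894", 0.194)):
    slope, intercept = modified_line(env)
    print(f"{label}: {slope:.3f}x{intercept:+.3f}")
```

Setting the exponent to zero recovers the plain 2x-1 form, and a run environment of exactly .125 gives a slope of 2 for any exponent.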
Player, season              Existing EqA-1, EqR-1   With modifier   Gain
Honus Wagner, 1908          .354, 148               .361, 157       +.007, +9
Hugh Duffy, 1894            .352, 121               .343, 113       -.009, -8
Hack Wilson, 1930           .352, 146               .349, 143       -.003, -3
Carl Yastrzemski, 1967      .349, 146               .353, 149       +.004, +3

Wagner and Yaz come from notably low offensive years, while Wilson and Duffy represent two of the highest. The first column shows the figures as they are on today's player cards, where I have always used the 2x-1 form; the second column shows what would happen if I added the modifier with the 1/3 exponent. Even in these most extreme cases, this amounts to less than 10 runs a year, which is a fair reason why it has never really been noticed before now. I am probably going to make these modifications to the player cards when the new season gets underway, although I still have to think about some of the consequences. The case looks compelling, especially since it seems to address several problems I've noticed before, like why the best EqAs still come from high-offense environments after you adjust them, and why high-offense minor leagues like the Pacific Coast and California leagues get lower difficulty ratings in the DTs than their supposed equals, the International and Florida Leagues. I still want to check a few more things before making that decision final.