If you have ever tried to explain the concept of Pythagorean Record to a baseball novice, you have probably had to answer the following criticism: “That counts the extra runs at the end of a blowout as much as other runs, even though it does not matter whether you win 10-0 or 15-0.” The answer that we give to that criticism is that teams that can take advantage of blowouts have better offenses, and those types of teams will be more likely to win close games in the future. That is the reason that we have a thousand run estimators that try to approximate how many runs a team will score on average, and why we evaluate players with statistics like VORP, which is measured in runs over replacement player. Runs are the building blocks of wins, and you win by scoring more runs than your opponent. We cringe when we hear offenses evaluated by batting average because we know that the goal of offenses is to score runs, not get hits.
The Inning
However, with all of these run estimators that sabermetricians have developed, we often forget the context in which runs are scored: by innings. Teams get to score as many runs as they can before their opponents record three outs; then they get to try again eight more times. That environment (how much you can score before three outs) is the one to keep in mind when we talk about winning games. Nearly a decade ago, Keith Woolner wrote about the link between runs per inning and runs per game, and how well you can predict the frequency of zero-run innings, one-run innings, two-run innings, etc. by looking at how many runs teams score per game.
It is certainly true that the rate of scoring a certain number of runs in an inning and the average number of runs per game are related. In fact, teams that have more variance in their run-scoring per inning also have more variance in their run-scoring per game. This is tricky to show because teams that score more runs also have more variance in the number of runs they score per game. That makes sense: they have a lot of eight-run games, 10-run games, and 15-run games, and they needed enough big innings to put up those run totals. Simply checking the variance of runs per game against how frequently those teams have big innings would obviously yield a positive correlation. Instead, I needed some way to neutralize the effect of scoring level on the variance of runs per game. I initially tried dividing by the number of runs per game, but that statistic still had a positive correlation with runs per game. I tweaked things until I found a measure of the variance of runs per game that had essentially no correlation (-0.0025) with runs per game. I call it “Adjusted Variance,” or “AdjVar,” and it is this:
AdjVar = (Variance of Runs/Game) / ((Runs/Game)^1.30)
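As a rough illustration of the AdjVar formula, here is a minimal Python sketch. The function name adj_var and the two game logs are hypothetical, and the use of population variance is an assumption, since the article does not say which variance estimator was used.

```python
from statistics import mean, pvariance


def adj_var(runs_by_game, exponent=1.30):
    """AdjVar = variance of runs per game / (runs per game ** exponent)."""
    runs_per_game = mean(runs_by_game)
    # Assumption: population variance; the article does not specify
    # population versus sample variance.
    return pvariance(runs_by_game) / runs_per_game ** exponent


# Hypothetical game logs. Both teams average exactly 5 runs per game,
# but the second one scores in bunches.
steady = [4, 5, 6, 5, 4, 6, 5, 5]
streaky = [0, 12, 1, 9, 2, 11, 0, 5]

print("steady :", round(adj_var(steady), 3))
print("streaky:", round(adj_var(streaky), 3))
```

Because both hypothetical teams score the same number of runs per game, the denominator is identical and the streaky team's higher AdjVar comes entirely from the spread of its run totals.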
Looking at 1998-2008 data for each team (330 team seasons total), I found that this number was slightly positively correlated with the frequency of scoring zero runs in an inning (correlation = 0.097, two-sided p-stat = 0.075), highly correlated with the odds of scoring four or more runs in an inning (correlation = 0.258, two-sided p-stat = 0.000), and highly correlated with the odds of scoring five or more runs in an inning (correlation = 0.292, two-sided p-stat = 0.000). That much should not come as a surprise; we predicted that teams that have more variance in their runs-per-inning scoring would have more variance in their runs-per-game scoring.
Run-Scoring Variance and Pythagorean Record
The next step is to check whether teams with more variance in their runs per game tend to underperform their Pythagorean records. In fact, they do: the difference between actual wins and Pythagorean expected wins is negatively correlated with the AdjVar statistic above (correlation = -0.303, two-sided p-stat = 0.000). Teams that are more volatile in their rate of scoring runs are going to lose more often than other teams that score a similar number of runs but are not as volatile.
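For readers who want to see the quantity being correlated here, the following is a minimal sketch of "actual wins minus Pythagorean expected wins." It assumes the classic exponent of 2 in the Pythagorean formula; the article does not say which exponent was used, and the function names and the sample team are hypothetical.

```python
def pythagorean_wins(runs_scored, runs_allowed, games, exponent=2.0):
    """Expected wins from runs scored and allowed.

    Assumption: exponent=2.0 is the classic Pythagorean form; the
    article may have used a different exponent.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


def wins_above_pythagorean(actual_wins, runs_scored, runs_allowed, games):
    """Positive means the team outperformed its Pythagorean record."""
    return actual_wins - pythagorean_wins(runs_scored, runs_allowed, games)


# Hypothetical team: 800 runs scored, 700 allowed, 88 wins in 162 games.
print(round(pythagorean_wins(800, 700, 162), 2))
print(round(wins_above_pythagorean(88, 800, 700, 162), 2))
```

A negative result for a team-season means it won fewer games than its run totals suggest, which is the direction the article associates with high AdjVar.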
Now we know that teams that have high variance in their run-scoring by inning have more variance in their run-scoring per game. We also know that teams that have more variance in their run-scoring by game are not as likely to win as teams that put up the same number of runs but without as much of a spread. The next step is to figure out if there is any way to predict which offenses will have less variance in their runs per inning.
Which Offenses Spread Their Runs Around Better
Three years ago, Sal Baxamusa looked at 2006 team-scoring data and used the Weibull Distribution to predict how often they would score a certain number of runs. The Weibull Distribution does a pretty good job at predicting the number of times teams will put up certain run totals, but tends to underestimate how often teams are shut out. This is likely due to the fact that the talent level of pitchers is different, so analyzing how a team scores in general will not take this into account. You face Johan Santana sometimes, and you face Livan Hernandez at others, and Santana might shut you out more often than a model of hitting alone would predict. Baxamusa demonstrated that slugging teams were shut out less often, and also were more likely to score at least three runs in a game than their season run total and the Weibull Distribution would predict. This was useful information, but given the difficulties with the Weibull Distribution and the small sample size of just thirty data points, he was unable to check this in much detail.
By looking at runs per inning, we can look at a much larger sample: there were 477,884 half-innings from 1998-2008. Using this, we can check which types of offenses are more likely to spread their runs around and win more games as a result. The correlations between the odds of scoring at least a given number of runs in an inning and a number of common offensive rate statistics offer even more evidence for Baxamusa’s suspicion that teams that score with power are more likely to win than other teams that score similar numbers of runs.
For reference, note that the average team from 1998-2008 only scored in 29 percent of the innings that they played, but they scored two or more 14 percent of the time, they scored three or more six percent of the time, they scored four or more three percent of the time, and they scored five or more one percent of the time.
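The "at least X runs" frequencies above can be turned into "exactly X runs" frequencies by subtracting neighboring values. The sketch below does that with the rounded percentages quoted in the text, so the results are only approximate; the function name is hypothetical.

```python
# Rounded league-average frequencies from the text, 1998-2008.
AT_LEAST = {1: 0.29, 2: 0.14, 3: 0.06, 4: 0.03, 5: 0.01}


def exact_from_at_least(at_least):
    """P(exactly k) = P(at least k) - P(at least k + 1).

    Assumes the keys are consecutive run totals starting at 1.
    """
    ks = sorted(at_least)
    exact = {0: 1.0 - at_least[ks[0]]}
    for k, next_k in zip(ks, ks[1:]):
        exact[k] = at_least[k] - at_least[next_k]
    # The top bucket stays open-ended because no higher threshold is given.
    exact[f"{ks[-1]}+"] = at_least[ks[-1]]
    return exact


for runs, prob in exact_from_at_least(AT_LEAST).items():
    print(runs, round(prob, 2))
```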
Below I list the correlations between on-base percentage and slugging percentage and the frequency of scoring at least a certain number of runs in an inning. Note that OBP and SLG each have a 0.887 correlation with runs per game. You will notice an interesting trend:
At least X Runs/inning   OBP    SLG
1                        .822   .872
2                        .741   .723
3                        .603   .573
4                        .746   .716
5                        .667   .611
The trend that you probably noticed is that high-slugging teams are more likely to pick up at least a run in an inning, but high-OBP teams are more likely to have big innings. The reason that this is so important is that we have shown that being able to spread your runs around different innings is more valuable than scoring a lot of runs in one inning, in terms of wins and losses, since high variance in run scoring tends to be correlated with underperforming your team’s Pythagorean Record. This means that all of our standard measures of run-scoring are overweighting the contribution of OBP towards winning and underestimating the contribution of SLG towards winning.
The connection can be highlighted even further by using regression analysis to predict the probability that a team scores at least X runs in an inning. I regressed the probability of scoring at least one, two, three, four, and five runs in an inning on on-base percentage and slugging percentage and found the following formulas:
Prob(Scoring at least 1 run) = -0.154 + 0.659*OBP + 0.526*SLG
Prob(Scoring at least 2 runs) = -0.224 + 0.686*OBP + 0.307*SLG
Prob(Scoring at least 3 runs) = -0.164 + 0.462*OBP + 0.171*SLG
Prob(Scoring at least 4 runs) = -0.090 + 0.235*OBP + 0.094*SLG
Prob(Scoring at least 5 runs) = -0.050 + 0.138*OBP + 0.039*SLG
The important thing to realize when looking at these formulas is that the coefficient on SLG gets smaller relative to the coefficient on OBP as you increase the number of runs per inning. Teams that string together a lot of baserunners are more likely to score by putting up big innings than teams that swing for the fences, who will spread their runs around better.
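To make the comparison of coefficients concrete, this short sketch plugs the per-inning formulas into Python and prints the SLG coefficient as a fraction of the OBP coefficient at each threshold. The coefficients are the ones reported above; the function name and the sample team (.340 OBP, .430 SLG) are hypothetical.

```python
# (intercept, OBP coefficient, SLG coefficient) for scoring at least
# X runs in an inning, copied from the per-inning regressions above.
INNING_MODELS = {
    1: (-0.154, 0.659, 0.526),
    2: (-0.224, 0.686, 0.307),
    3: (-0.164, 0.462, 0.171),
    4: (-0.090, 0.235, 0.094),
    5: (-0.050, 0.138, 0.039),
}


def prob_at_least(x, obp, slg):
    """Fitted probability of scoring at least x runs in an inning."""
    intercept, b_obp, b_slg = INNING_MODELS[x]
    return intercept + b_obp * obp + b_slg * slg


# Relative weight of SLG versus OBP at each threshold.
for x, (_, b_obp, b_slg) in INNING_MODELS.items():
    print(f"at least {x}: SLG/OBP coefficient ratio = {b_slg / b_obp:.2f}")

# Hypothetical team with a .340 OBP and a .430 SLG.
print(round(prob_at_least(1, 0.340, 0.430), 3))
```

The ratio falls from about 0.80 at one run to about 0.28 at five, though not in a perfectly straight line: the value at four runs is slightly higher than the value at three.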
The link remains strong when you look at similar statistics for scoring at least a certain number of runs in a game:
Prob(Scoring at least 1 run) = 0.673 + 0.336*OBP + 0.387*SLG
Prob(Scoring at least 2 runs) = 0.238 + 0.815*OBP + 0.814*SLG
Prob(Scoring at least 3 runs) = -0.205 + 1.46 *OBP + 1.06 *SLG
Prob(Scoring at least 4 runs) = -0.584 + 2.00 *OBP + 1.21 *SLG
Prob(Scoring at least 5 runs) = -0.889 + 2.65 *OBP + 1.11 *SLG
Prob(Scoring at least 6 runs) = -0.973 + 2.51 *OBP + 1.14 *SLG
Prob(Scoring at least 7 runs) = -0.909 + 2.26 *OBP + 0.974*SLG
Conclusion
It is clear that power helps you score frequently, and on-base skill helps you pile on when you do score. In fact, a team’s home runs per at-bat has a 0.15 correlation with the number of wins the team gets beyond what its Pythagorean record predicts. Teams that hit more home runs do better than their Pythagorean Record suggests.
What this means is that power hitters are even more valuable than their VORP suggests. Power hitters not only change the scoreboard, but they change the scoreboard when it matters. The next time somebody tells you that a team is falling short because they rely too much on the long ball, you can reply that they may not rely on it enough.
Great job on this, Matt.
Are you writing for BP regularly now? I've really enjoyed your articles the last few weeks.
I was also curious about the adjusted variance metric that you created. You mention that it is uncorrelated with runs scored, but is there any pattern to the residuals? If so, it could introduce some bias. It seems like it would be better to regress variance in runs scored against runs scored and use the residuals as the predictors of the difference between wins and pythag predicted wins. Again, just a thought.
Like the article overall. Thought-provoking = good.
I played around a little with dividing the sample into two cohorts as you suggest, but the problem then is that I have biased variables-- teams that were outscored already have positive run differentials, so they are statistically more likely to fall short of their pythagorean record because we have eliminated teams above .500 with negative run differentials from our subsample but not the opposite.
I think I understand what you mean about the adjusted variance metric, but the results were strong enough that alternative specifications didn't really change anything. I like the approach, though. I think in general the distribution of runs scored with the long right tail probably explains the issue that you are looking at anyway, maybe?
I suppose runs scored would follow a lognormal rather than normal distribution, since they are a product of multiple independent events. If you charted log(runs scored) I bet you'd get a normal distribution (long right tail would be gone), and the variance of that distribution would be more indicative of true volatility in run scoring for a team (wouldn't be skewed by blowouts). That would probably be the best method to come up with an adjusted variance metric. Then I'd regress that against the difference between wins and pythag expected wins. The analysis becomes a little more esoteric (what is a logrun?), but the conclusion would be stronger, I think.
If you take out the skew with that methodology I'd be interested to see what happens. My intuitive guess is that you'd see variance have little predictive value for the overall population, but significant predictive value for the individual cohorts, i.e. variance is good for the bad teams, and bad for the good teams. But I could be wrong. In fact I hope I'm wrong because that would be more interesting.
I'm not sure I understand the second paragraph of your response. I'm saying divide the population into "+ run diff" and "- run diff" and redo the regression of "wins minus expected wins" vs. "adjusted variance" for each of those cohorts. If you divide them based on run differential (as opposed to actual winning percentage) you shouldn't have any bias. In other words, it shouldn't be any easier or harder for a team in the "+" cohort to outperform or underperform its pythag than for a team in the "-" cohort.
Thoughts? Thanks again for the reply.
I'm having trouble interpreting a "logrun", as well, and so I'm not sure what the point would be of transforming the variance. Instead, I think it's best to highlight the fact that some teams score a bunch of extra runs as blowouts, and ignoring that will be an issue if this tendency is more likely for certain offenses.
I'm pretty sure that the bias would come in from the fact that there are four subsamples of the population-- those over .500 with + run diff, those under .500 with - run diff, those over .500 with - run diff, and those under .500 with + run diff. By subsetting the groups into + run diff and - run diff, you create a + run diff group with teams over .500 and under .500, and so you are including teams with a + run diff under .500 but not teams with a - run diff and over .500. The residuals will be biased, because some negative residuals will have been eliminated.
Teams that exceed their Pythagorean projection are doing well in close games--I believe that is a given. Most close games are low-scoring games. In low scoring games, as Bill James realized 20-30 years ago writing about the World Series, long-sequence offenses (with high OBA) do poorly and slugging teams do better. So yes, slugging teams would be expected to outperform long-sequence teams in close games.
Comments?
David Kaiser
One thing I'd be curious about is how the run environment affects the big inning-->W% correlation. I.e. in a very low-run environment, does that still hold? Same with a very high-run environment. And does OBP still lead to bigger innings in these different envts?
The next step would be to take RPG and variance of RPG for different style offenses and defenses and see which ones maximize winning games. In general, consistent offenses and inconsistent defenses make for better games.
I'd also like to see things broken down a bit more. Instead of OBP and SLG, how about AVG, ISO, and BB%?
correlation BB% AVG ISO OBP SLG
%inng w/1+R .47 .75 .73 .82 .87
%inng w/2+R .45 .65 .59 .74 .73
%inng w/3+R .37 .53 .46 .60 .57
%inng w/4+R .42 .68 .57 .75 .72
%inng w/5+R .40 .60 .47 .67 .61
It seems that AVG tends to follow a similar pattern as OBP, and ISO tends to follow a similar pattern as SLG. BB% tends to look like SLG because of their correlation, I think (0.36).
Another tangent to explore would be to see if "good bullpens" tend to be those that tend to allow less home runs/extra base hits, or those that tend to allow less baserunners.
Also, I wonder how well park factors correlate with this kind of concept, since some parks have low effects on OBP and higher effects on SLG, and vice versa.
To the extent that bullpens can control hits on BIP, it would be interesting to test their effects, but the effects would be pretty muted.
That's why I was wondering how much SLG affected W-L record predictions and if an increased understanding of this principle would refine W-L predictions. Also, whether bullpens that reduced SLG led to an increased chance of winning. Also, home teams that play in parks with high SLG park factors might have more variance in their W-L records.
Am I off the wall?
The park factor thing should not matter because all games are played by two teams that are in the same stadium at the same time. The park would spread out or condense their run scoring equivalently.
As an example, assume you are managing the visiting team in extra innings. If the home team scores a run, you lose immediately. If you had two relievers (ignore handedness and platoon) to choose from, with similar VORP/WXLR/etc., one of which has a slightly better rate at reducing OBP (but gives up more home runs) and the other has a slightly better rate at suppressing SLG (but gives up more walks), would it be better to use the pitcher with the better SLG-suppressing skills or the OBP-suppressing skills? My instinct is it would be better to use the one who suppresses SLG, since a pitcher could give up multiple walks/hits but still get the three outs.
As far as park factor goes, it might not matter much within the context of that game, but if your team is in a park that allows more SLG, your team's W-L record might vary more from expected W-L based on Pythagorean Method since the increased chance of a "big inning" might cause more fluctuation. Perhaps, along those lines, those who play in so-called pitcher's parks have teams with records closer to their expected win-loss record. This might also have the added advantage of being better able to evaluate individual player performance.
The Angels rank 10th in the AL in HR rate, and only 7th in ISO, but 3rd in SLG.
The article was interesting. Questions I had: Is there any multicollinearity between runs per inning and runs per game, stemming perhaps from the correlation between high variance in runs scored (i.e. lots of high-run innings) and additional high-run innings coming in the same game? As you pointed out, to miss the context of the runs within the game is somewhat problematic. An easy way I can think of to get started would be testing for the correlation between scoring x runs in an inning and scoring in more than 29 percent of the remaining innings. After all, pitch counts, poor relievers, etc. Not sure what the implications of this might be.
Other things: aren't slugging and OBP really highly correlated? (A really quick attempt to prove that led me to realize that I have no interns or data sets. Shoot.) Can you still do that last trick and compare the probability of scoring x+ runs per inning over many x's and say things about the coefficients relative to one another? What are the p-values on those regressions like? Could you also run those regressions on the probability of scoring x runs per game, not x plus? That way ranges of runs scored could be aggregated, and the correlation for scoring one run in a game and slugging would give something negative perhaps.
Anyway, happy to respond to your questions...
--I'm not sure what you mean about multicollinearity with R/Inn and R/G. I didn't use the two variables in the same regression at any point, so I'm not sure the problem.
--I don't have the data to compare the probability of scoring a lot of runs in an inning and of scoring a lot of runs in subsequent innings. I'm sure it's positively correlated, though, just because pitchers' ERAs show some persistence and park effects exist too. I'm not sure if that would complicate the results at all, though.
--OBP and SLG are very highly correlated, actually about 0.75 on a team level. But that's okay, because they aren't collinear. It's okay to run a regression on both.
--Running the regression for individual runs per inning would yield the following coefficients:
obp coef/slg coef
0 R: -404/-1259
1 R: -14/291
2 R: 310/188
3 R: 323/107
4 R: 137/77
That actually seems to strengthen my case, since the OBP has a negative (statistically insignificant) coefficient on the probability of scoring one run. The OBP coefficient for 0 R is also insignificant, and the SLG coefficient on 3 R is only weakly significant (p=.073). Everything else is strongly statistically significant (p<.004).
--The p-values on OBP and SLG coefficients for each of the x+ inning regressions are all 0.000.
I used the wrong word; I didn't mean multicollinearity exactly. I meant to ask specifically about the relationship between the variance of runs scored in an inning and the variance of runs scored in a game, or predictions about total runs scored in the game. This would be important to understand in the debate on OBP/SLG. If a team is set up to be as likely as possible to score a run in an inning, can that help them score significantly more later in the game? Does scoring 2 runs in an inning have a greater effect on scoring at other points in the game than 1? I don't think pitcher ERAs or team offensive levels are grounds for tossing it out necessarily. Certainly for the team offense you could normalize with some success against the year's average results. Obviously it could turn up meaningless results, but if they were significant in some way, there are a number of different narratives that could be spun.
-That's cool regarding individual runs per inning. Thanks for doing that. That coefficient on slugging for one run is huge, and is a complex mathematical way of describing why you walk Albert Pujols in close games (not last night though).
-As for the OBP/SLG correlation, what happens if you drop one of the variables from the regression? Isn't that a good test for multicollinearity even though the variables aren't collinear? Apologies to Professor Tayon if I fail to grasp these concepts fully.
There certainly is a positive correlation with being able to score a lot and being able to score at all. That's why I normalized the variance of runs/game to find a scoring-level-neutral way of talking about more variance in runs/innings and runs/game. The key was only to link those two, so that I could justify looking at things on a per inning basis.
For example, someone with a .350 OBP would get on base 35% of the time and thus advance a runner. However, they might have a 60-65% chance (or higher) to advance the runner with a bunt (costing an out). Let's flesh out his stats and say he is a .250 hitter with a .400 SLG. Does that make him a good enough hitter where, playing the probabilities, he has a better chance of advancing (or scoring) the runner if he swings than if he bunts?