June 15, 2005
Is Small Ball Also Smart Ball?
Hitting Approach and Run-Scoring Consistency
The Chicago White Sox' apparent off-season change in philosophy has generated considerable buzz. The traditional baseball community has largely lauded the decision to play "small ball," while the performance analysis community has largely questioned, if not ridiculed, it. In particular, the decision to trade Carlos Lee for Scott Podsednik has been considerably scrutinized and attacked.
The performance analysis community's opposition to small ball is long documented: sacrifice hits are, almost always, wasted outs while stolen bases are only useful when the player can steal with a high enough success rate. One of the most basic tenets of the performance analysis community is that playing small ball should decrease the number of runs scored by giving up precious outs.
The defense of small ball by the traditional baseball community seems a little more nebulous. A lot of the commentators and writers who speak and write glowingly of small ball seem to be yearning for what they think of as the golden days, before expansion, before bandbox ballparks, and before steroids. Or they seem to believe small ball emphasizes the fundamentals, or that it's more beautiful. It is pointless to argue with nostalgia and aesthetics (though that does not always stop me), but some have defended small ball as not only more enjoyable baseball, but as winning baseball, with the Sox' fast start taken as evidence to prove the theory.
The crux of this defense is that small ball will lead to more consistent run scoring than the take-and-rake approach of walks and homers. Waiting for a home run, it is argued, only leads to runs when the home runs come. On the other hand, runs can always be manufactured if you have people good at manufacturing runs. Willie Harris, a good small-baller himself who was forced out of a starting job by the off-season moves to bring in better small-ballers, nicely summed up the argument in a St. Paul Pioneer Press story about how the White Sox are trying to emulate the Twins' approach that defeated them year after year. "To me," he said, "a small-ball team is more consistent. You're going to have small ball every day as opposed to the home run, which won't be there every time."
This argument certainly has intuitive appeal: get the leadoff guy on first, have him steal second, bunt him to third, make contact, bring him home. Of course, the White Sox might struggle getting the guy on first, especially if their on-base woes continue (although they are no longer on a record-setting pace for not walking, they currently rank in the bottom third of the league in OBP and total walks), and their stolen base efficiency is far from outstanding (currently less than 70%), but if you put together a good small ball team, they should be able to reliably push across a few runs a game. On the other hand, even a good home run hitter is only going to hit one out once every four games or so. Even the best have not averaged one home run every other game.
The second part of the argument, that consistent run scoring should lead to more victories than inconsistent run scoring, also makes intuitive sense: if two teams average five runs a game, but one scores exactly five a game and one alternates scoring 10 and none, the first team, unless they reside in Coors Field or have the Yankees' pitching staff, should be in every game but the second team will definitely lose at least half of theirs.
However intuitively appealing the arguments are, though, they have not been tested. Should they prove true, it would represent at least some validation for the White Sox' decisions; should they prove false, it would suggest the White Sox' off-season moves might not continue to bear fruit. In this article, I test the first part of the argument, namely that small ball leads to increased consistency in run scoring.
To accomplish this, I need measures of consistent run scoring, the small ball approach, and the take-and-rake approach. To measure run-scoring consistency (or, rather, inconsistency), I simply use the variance in runs scored for a team over the course of a season. Variance is a commonly used measure to describe the spread of data and measures how much each data point deviates from the average of the data. In this context, it measures how much the number of runs scored in each game for a team differs from the seasonal average number of runs scored per game. The higher the variance, the less consistent the team is in scoring runs.
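As a minimal sketch of this measure (not the code used for this study), the variance can be computed from a list of a team's runs scored in each game. The function name is hypothetical, and the choice of population variance rather than sample variance is an assumption; over a full season the two are nearly identical. The two score lists are the stylized five-runs-a-game teams described earlier, not real data.

```python
from statistics import mean, pvariance


def run_scoring_variance(runs_per_game):
    """Population variance of runs scored per game (higher = less consistent)."""
    return pvariance(runs_per_game)


# Two stylized teams that both average five runs a game.
steady = [5, 5, 5, 5]      # scores exactly five every game
streaky = [10, 0, 10, 0]   # alternates between ten runs and a shutout

for name, scores in (("steady", steady), ("streaky", streaky)):
    print(name, "mean:", mean(scores), "variance:", run_scoring_variance(scores))
```

Both teams have the same average, but only the second has any variance, which is exactly the inconsistency the measure is meant to capture.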
To measure small ball, I add together the total number of stolen base attempts and sacrifice hits to create the Small-Ball Index (SBI). Admittedly, there is more to small ball than stolen bases and bunts, like hitting and running or moving the runner over, but SBI is easy to measure and should capture the essence of the approach. In 2003, for instance, the Marlins lapped the Majors with an SBI of 306 with Anaheim coming in second with an SBI of 240. Oakland and Toronto brought up the rear with SBIs of 84 and 73 respectively. This shouldn't be surprising, given the GMs in Oakland and Toronto and the emphasis on speed and putting the ball in play in Florida and Anaheim.
The essence of the take-and-rake approach is drawing walks and hitting home runs. I thus add isolated power (slugging percentage minus batting average) and "isolated patience" (on base percentage minus batting average) to create the Take-and-Rake Index (TRI). Boston and the Yankees led the Majors in TRI in 2003, with scores of .273 and .268 respectively, while the Dodgers and Tampa Bay came in last. Again, none of this should be surprising, although coming in last in SBI seems to be by design while coming in last in TRI might be more through overall offensive incompetence.
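The two indices are simple enough to sketch in a few lines. The function and parameter names below are hypothetical, and counting stolen base attempts as stolen bases plus times caught stealing is my assumption about how attempts are tallied. The small-ball counts are made-up illustrative inputs; the slash line of .289/.360/.491 is supplied for illustration as Boston's 2003 team line and does not come from the text above.

```python
def small_ball_index(stolen_bases, caught_stealing, sacrifice_hits):
    """SBI: stolen base attempts (assumed to be SB + CS) plus sacrifice hits."""
    return stolen_bases + caught_stealing + sacrifice_hits


def take_and_rake_index(batting_avg, on_base_pct, slugging_pct):
    """TRI: isolated power plus isolated patience."""
    isolated_power = slugging_pct - batting_avg
    isolated_patience = on_base_pct - batting_avg
    return isolated_power + isolated_patience


# Hypothetical counting stats for a fairly aggressive running team.
print(small_ball_index(stolen_bases=100, caught_stealing=40, sacrifice_hits=60))

# A .289/.360/.491 team line, rounded to three places as the indices are quoted.
print(round(take_and_rake_index(0.289, 0.360, 0.491), 3))
```

Note that batting average is subtracted twice in TRI, so the index rewards walks and extra-base power while ignoring singles entirely.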
Looking at the relationship between the two indices and variance in runs scored provides some evidence in support of the small ball defenders. The data in this study covers all teams from 1998-2003 and comes from the Retrosheet performance logs for those seasons. Figure 1 shows the relationship between TRI and variance, with TRI on the vertical axis and variance on the horizontal. As can be seen, the relationship is strongly positive: the higher TRI is, the more variance in runs scored. With the exception of a handful of outliers (the team with the variance above 17 is the 2000 Rockies), the points all tightly follow a steeply positive line. In fact, the correlation is .6, fairly strong substantively and highly significant statistically (p < .001, suggesting that a relationship this strong is very unlikely to be due to chance).
The relationship between SBI and variance is weaker, though, as can be seen in Figure 2. There is virtually no pattern here: the correlation is, in fact, only -.07. While this is negative, as expected by small-ballers, it is not at all statistically significant. Figures 1 and 2 suggest that the traditionalists are partly right: waiting for a home run does seem to lead to much more inconsistent run scoring while playing small ball has little influence on increasing consistency.
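For readers who want to reproduce this kind of comparison, the correlations quoted above are ordinary Pearson correlations between an index and the variance, computed across team-seasons. The sketch below is a plain implementation under that assumption; the function name and the tiny input lists are hypothetical and are not the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and the two sums of squares.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical index values and run-scoring variances for four team-seasons.
index_values = [0.20, 0.22, 0.25, 0.27]
variances = [8.5, 9.4, 9.1, 10.8]
print(pearson_r(index_values, variances))
```

A value near 1 or -1 means the points fall close to a straight line, while a value near zero, like the -.07 for SBI, means there is essentially no linear pattern.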
The above analysis ignores a crucial factor, though. Variance is likely to increase as the number of runs increases, because the number of runs scored is a bounded variable: no matter how hard the 2003 Dodgers may have tried, you cannot score fewer than zero runs in a game. A team averaging zero runs a game, therefore, must have a variance of zero, while a team scoring an average of one run a game will likely have a positive variance because they will be shut out in some of the games, while in others they will score more than one, thus enabling them to have an average of one. But while the variance will be positive, it cannot be very large, as they can never score more than one run below their average. On the other hand, a team averaging 100 runs a game could still be shut out and will need to score 200 runs the next game to maintain their average, which is a huge variation. More realistically, a team averaging three-and-a-half runs a game (the lowest limit in the sample) can score at most three and a half runs below their average in any given game while a team averaging six runs a game (the upper limit) can score as many as six runs below their average. This alone can increase the variance for the latter team substantially. Figure 3 demonstrates that just such a relationship exists between the mean and variance of runs scored per game.
This relationship is even stronger than that between TRI and variance: the correlation is nearly .8 and, again with the exception of the 2000 Rockies, the line is nearly perfectly linear.
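The floor at zero can be illustrated with a deliberately extreme scoring pattern: a team that is shut out in one game and scores double its average in the next. This is a stylized sketch, not a model of real scoring, and the function name is hypothetical. Under this pattern the variance works out to the square of the average, so it grows quickly as the average rises.

```python
from statistics import mean, pvariance


def shutout_then_double(avg_runs, games=4):
    """Alternate a shutout with a game of twice the average (games must be even)."""
    return [0 if g % 2 == 0 else 2 * avg_runs for g in range(games)]


# Averages taken from the examples in the text: 1, 3.5, 6 and 100 runs a game.
for avg_runs in (1, 3.5, 6, 100):
    scores = shutout_then_double(avg_runs)
    print("average:", mean(scores), "variance:", pvariance(scores))
```

The team averaging one run a game cannot get further than one run below its average, so even this extreme pattern yields a small variance, while the same pattern for a six-run team yields a much larger one.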
If the performance analysts are right that walks and home runs are the road to runs scored, then teams with a high TRI should score more and should, for that reason, have higher variance in runs scored. Looking only at TRI and SBI, therefore, may be giving credit to the manner in which the runs are scored when the effect is really driven by the number of runs scored. Therefore, I turn to a multiple regression framework to assess the effect.
A multiple regression framework enables you to see the effects of one variable while controlling for the effects of other variables. In other words, it enables you to hold constant the number of runs scored while examining the influence of hitting approach on variance in runs scored. With this framework, I can answer the following question: if two teams scored an identical number of runs, would the team playing small-ball have lower variance than the team taking and raking?
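To make "controlling for" concrete, here is a minimal ordinary least squares sketch in plain Python. It is not the software used for this study, it does not compute p-values, and every name and number in it is hypothetical. The synthetic data are built so that variance is exactly twice runs per game, while TRI merely tends to rise with runs; a simple correlation would credit TRI, but the regression should assign it a coefficient of essentially zero.

```python
def ols(rows, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend a constant column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Synthetic team-seasons: (SBI, TRI, runs per game). Not real data.
teams = [(100, 0.20, 4.0), (250, 0.22, 4.5), (150, 0.26, 5.0),
         (300, 0.24, 5.5), (90, 0.27, 6.0), (200, 0.21, 4.2)]
# By construction, variance depends only on runs per game.
variance = [2.0 * runs for _, _, runs in teams]

intercept, b_sbi, b_tri, b_runs = ols(teams, variance)
print([round(c, 4) for c in (intercept, b_sbi, b_tri, b_runs)])
```

The normal equations are used here only to keep the sketch free of dependencies; a statistics package would be the more usual choice for real work and would also report the p-values discussed below.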
The results strongly suggest that the answer to this question is no. The following equation shows the results of the regression:
Variance = -3.07 - .001*SBI - 3.26*TRI + 2.98*Runs Scored + ε
p-values: intercept (.005), SBI (.682), TRI (.603), Runs Scored (.000)
The numbers in parentheses below the equation are the p-values for the intercept and for the coefficients on the batting approach indices and the number of runs scored. Loosely speaking, the coefficients show the direction and size of the relationship between these variables and variance of runs scored, while the p-values show how likely it is that coefficients this large would arise from random variation alone if the true relationship were zero. In general, one wants a p-value that is at least less than .1 before concluding that the relationship between the variables is real.
As can be seen in the above equation, the argument in favor of small ball has no support once the number of runs scored is taken into consideration. Each additional run scored per game increases the variance in runs scored by about 3, an effect that is highly statistically significant. Neither the small-ball nor the take-and-rake index is even close to significant, with p-values above .6. Hitting approach has no effect on the consistency of runs scored. In fact, once the number of runs scored is controlled for, it seems nothing else affects the variance in runs scored: I repeated the above regression including batting average, on-base percentage, slugging percentage, and any number of other measures of offensive performance and not one of them was significant.
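To get a feel for the sizes involved, the reported coefficients can be plugged into a small function. This is only arithmetic on the published point estimates, ignoring the error term; the function name and the example team are hypothetical, and I assume "Runs Scored" is measured in runs per game, as the discussion above implies.

```python
def predicted_variance(sbi, tri, runs_per_game):
    """Fitted variance from the reported regression (point estimates only)."""
    return -3.07 - 0.001 * sbi - 3.26 * tri + 2.98 * runs_per_game


# A hypothetical team: SBI of 200, TRI of .250, five runs a game.
base = predicted_variance(sbi=200, tri=0.250, runs_per_game=5.0)

# Change one input at a time to compare the size of each effect.
more_small_ball = predicted_variance(sbi=300, tri=0.250, runs_per_game=5.0)
one_more_run = predicted_variance(sbi=200, tri=0.250, runs_per_game=6.0)

print(round(base, 3))
print(round(more_small_ball - base, 3))  # effect of 100 extra steals and bunts
print(round(one_more_run - base, 3))     # effect of one extra run per game
```

Even taken at face value, adding 100 stolen base attempts and sacrifices moves the fitted variance by about a tenth of a run squared, while one extra run per game moves it by nearly three; and the SBI coefficient is not statistically distinguishable from zero in the first place.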
A small-baller might argue that they are willing to trade off fewer runs if those fewer runs are scored more consistently. This might be a smart trade-off and it would be interesting to discover just how many runs one should be willing to trade for a given increase in scoring consistency. However, one of the benefits of the multiple regression approach is that it enables us to conclude that hitting approach is not involved in this trade-off. A team can gain run-scoring consistency just as easily by substituting a bad slugger for a good slugger as by trading away their slow slugger for a speedster. Trading Carlos Lee for Ruben Sierra would have about as much effect on scoring consistency as trading Carlos Lee for Scott Podsednik will. (PECOTA projects Sierra and Podsednik to have nearly identical EQMLVRs of -.037 and -.035 respectively, so they should have similar effects on run-scoring if both were used full-time.) I doubt the White Sox would have made the former trade, but, looking only at offense, this analysis suggests they should have been just as hesitant to make the latter trade.
Sean Ehrlich is a contributor to Baseball Prospectus.