May 13, 2008
I feel the need to re-introduce myself, or at least my column, since this space has been pretty quiet in 2008. I've spent a lot of time so far running around explaining legal concepts and covering live events for Baseball Prospectus, and I'm hoping this will be the beginning of a more regular schedule. Prospectus Toolbox is a column dedicated to new readers, or veteran readers who might not be familiar with all the acronyms and numbers that we tend to throw around rather casually on this website. The focus is meant to be high on simple language and explanations and low on any form of math you'd have learned after middle school, because, really, you shouldn't need to be Stephen Hawking to comprehend the work that we do here, or to have it increase your enjoyment of baseball.
Since the best way to figure out what people want to learn more about is to listen to their questions, you'll find that this column, by design, leans more on reader mail than anyone else's. You're encouraged to use the links at the bottom of the page to send me questions or comments, particularly if you feel that there is something that isn't covered in our glossary. Sadly, I don't answer every email I receive, but I do read all of them, and a fair number wind up getting published.
Anyway, on to this week's question, from reader J.B.:
Does the 'Left on Base' Statistic have any correlation to a team's offensive success or failure?
Some statistics carry more of an emotional charge than others, and baserunners left on base (LOB or BLOB, as it's referred to in our glossary) incites emotions on the level of the home run or the strikeout. It's hard to find a more frustrating (or, if you're rooting for the team on defense, exhilarating) moment in a game than when a batter comes to the plate with the bases loaded, and fails to bring any of those runners home. Keeping track of how many runners an individual batter strands on base is a popular enough hobby that many scorecards dedicate a space just for keeping a running tally. At the same time, games in which either or both teams strand a lot of runners tend to be long, frustrating bores.
Before we take on J.B.'s question, a quick explanation of what he's asking for. Correlation is a very basic statistical tool that examines the relationship between two sets of numbers. You can have positive correlation between the two sets: if the values in one set go up, the values in the other also rise, and if they go down, the values in the other set also fall (for example, your age and the date). Negative correlation means that as one value rises, the other falls (for example, the number of miles you've driven between trips to the gas station and the amount of fuel left in your tank). The correlation coefficient runs on a spectrum between 1 and -1, with the strongest relationships at either end of the spectrum and weaker ones as you approach the middle. A correlation coefficient of 0 means that the two sets of numbers show no straight-line relationship, and are basically random with respect to each other.
Now, it's important to remember that correlation is a what, not a why. Just because two sets of numbers are correlated, it doesn't mean that one necessarily causes the other. Correlation sometimes suggests the nature of a relationship, but it never proves it.
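For readers who like to see the arithmetic, here is a minimal sketch of how a correlation coefficient can be computed. It assumes the standard Pearson formula, which is the usual meaning of "correlation coefficient." The function name `pearson` and the toy numbers for the age and gas-tank examples are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two sets move together, measured from their averages.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each set moves on its own.
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one set never changes")
    return co_movement / sqrt(spread_x * spread_y)

# Hypothetical toy data: your age goes up by one every calendar year.
years = [2004, 2005, 2006, 2007, 2008]
ages = [30, 31, 32, 33, 34]

# Hypothetical toy data: fuel in the tank drops steadily as miles pile up.
miles_driven = [0, 50, 100, 150, 200]
gallons_left = [12, 10, 8, 6, 4]

print(pearson(years, ages))                # very close to 1.0 (perfect positive)
print(pearson(miles_driven, gallons_left)) # very close to -1.0 (perfect negative)
```

Real-world data is never this tidy, which is why most coefficients you'll run into land somewhere between the two extremes.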
J.B.'s question is interesting because LOB has such negative connotations that you'd probably expect a strong negative relationship to run scoring. After all, if you're leaving a lot of runners on base, by definition your team is losing out on scoring opportunities. To get an idea of whether this expectation holds water, let's look at the Team LOB leaderboard for last year, along with a couple of other statistics we usually relate to a team's offensive success or failure: Runs and Times on Base (TOB).
Rank  Team    R   TOB   LOB
   1  PHI   892  2289  1295
   2  BOS   867  2314  1291
   3  OAK   741  2143  1258
   4  COL   860  2271  1250
   5  NYA   968  2371  1249
   6  CLE   811  2174  1216
   7  ATL   810  2145  1205
   8  LAN   735  2096  1200
   9  NYN   804  2146  1196
  10  FLO   790  2107  1192
  11  CHN   752  2070  1190
  12  HOU   723  2058  1181
  13  CIN   783  2098  1170
  14  SLN   725  2073  1168
  15  TBA   782  2098  1166
  16  WAS   673  1992  1163
  17  SDN   741  2014  1153
  18  BAL   756  2076  1152
  19  DET   887  2182  1148
  20  SFN   683  1978  1141
  21  SEA   794  2080  1127
  22  MIN   718  2020  1120
  23  PIT   724  1997  1119
  24  MIL   801  2032  1117
  25  TOR   753  2014  1112
  26  ANA   822  2125  1100
  27  TEX   816  2019  1092
  28  ARI   712  1939  1090
  29  KCA   706  1964  1089
  30  CHA   693  1925  1074
Looking at this list, you see some evidence against the idea that leaving runners on base hurts your offense. Among the top five teams at leaving men stranded on the bases, you find four of the five top scoring offenses of 2007… and you find the Oakland A's, who were tied for the nineteenth-best offense in the game. Just as you're tempted to draw the opposite conclusion, there are odd little bits of data that cut the other way, such as the worst offense in the game (the Nationals) producing more LOB than the third-best offense (the Tigers). Is this all just random?
To answer the question, we'll want to look at a larger sample of data. So I asked the amazing Bil Burke to give me the LOB totals for each team since 1971, and from that sample I omitted strike-shortened seasons such as 1981, 1994, and 1995, as well as the current season. That leaves us with some 918 data points from which to calculate the correlation between team Runs scored and team LOB. The coefficient of correlation was 0.52, which is a pretty strong positive correlation between these two statistics. Two other correlations between the statistics presented above suggest a reason that we see this relationship between LOB and Runs scored. The coefficient of correlation between LOB and TOB was 0.72, and between TOB and runs it was an extremely high 0.91. Each of these relationships makes a bit of sense: the more runners you put on base, the more likely some are to get stranded there, and the more runners you put on base, the more likely you are to score runs. Put it all together and the data suggests that a high number of runners left on base, while frustrating and annoying, may not be a sure sign that your team's offense is broken or even in bad shape. It might actually be a good thing.
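If you'd like to try the same three calculations at home, the sketch below runs them on the 2007 table alone. It again assumes the standard Pearson formula, and the names `TEAMS_2007` and `pearson` are mine, made up for illustration. One season of 30 teams is a much smaller sample than the 918 team-seasons described above, so don't expect the results to match those figures exactly.

```python
from math import sqrt

# 2007 totals copied from the leaderboard: (team, runs, times on base, left on base)
TEAMS_2007 = [
    ("PHI", 892, 2289, 1295), ("BOS", 867, 2314, 1291), ("OAK", 741, 2143, 1258),
    ("COL", 860, 2271, 1250), ("NYA", 968, 2371, 1249), ("CLE", 811, 2174, 1216),
    ("ATL", 810, 2145, 1205), ("LAN", 735, 2096, 1200), ("NYN", 804, 2146, 1196),
    ("FLO", 790, 2107, 1192), ("CHN", 752, 2070, 1190), ("HOU", 723, 2058, 1181),
    ("CIN", 783, 2098, 1170), ("SLN", 725, 2073, 1168), ("TBA", 782, 2098, 1166),
    ("WAS", 673, 1992, 1163), ("SDN", 741, 2014, 1153), ("BAL", 756, 2076, 1152),
    ("DET", 887, 2182, 1148), ("SFN", 683, 1978, 1141), ("SEA", 794, 2080, 1127),
    ("MIN", 718, 2020, 1120), ("PIT", 724, 1997, 1119), ("MIL", 801, 2032, 1117),
    ("TOR", 753, 2014, 1112), ("ANA", 822, 2125, 1100), ("TEX", 816, 2019, 1092),
    ("ARI", 712, 1939, 1090), ("KCA", 706, 1964, 1089), ("CHA", 693, 1925, 1074),
]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / sqrt(spread_x * spread_y)

runs = [team[1] for team in TEAMS_2007]
tob = [team[2] for team in TEAMS_2007]
lob = [team[3] for team in TEAMS_2007]

r_lob_runs = pearson(lob, runs)
r_lob_tob = pearson(lob, tob)
r_tob_runs = pearson(tob, runs)

print(f"LOB vs. Runs: {r_lob_runs:.2f}")
print(f"LOB vs. TOB:  {r_lob_tob:.2f}")
print(f"TOB vs. Runs: {r_tob_runs:.2f}")
```

A positive number on the first line would point the same direction as the larger sample: teams that strand more runners also tend to score more, most likely because they put more runners on base in the first place.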