May 20, 2008
I was ready to move on, but last week's column generated a lot of comment, so we're sticking with runners left on base for today's column. Last week we examined the correlation between leaving runners on base (the Team LOB statistic) and run scoring, using team totals of those stats going back to 1971. I kept the conversation limited to raw totals of runners left on base, times on base, and runs scored for two reasons: first, because that was the question that had been asked, and second, because raw totals are the way the left on base stat is most often used and discussed. Once the strike-shortened seasons of 1981, 1994, and 1995 were omitted (although I failed to take the first strike year, 1972, out of the sample), the teams were on more or less equal footing in terms of opportunities to put men on base, score runs, or strand them.
As several readers noted, using raw counting stats left a number of questions unanswered. For example, reader S.B. pointed out:
The better stat to look at is probably Percentage of Baserunners who are Stranded (or a very-pleasing acronym POBWAS). This is analogous to the RBI/RBI opportunity stat for an individual batter being a better indicator than total RBI.
Statheads often favor rate statistics over counting stats, because rate statistics provide us with valuable context, usually by contrasting a counting stat against opportunities. Reader M.P. took things a bit further than S.B. did, crunching some numbers while using the stats I provided in last week's column:
It seems to me that the key stat for LOB would actually be the percentage of runners left on base, a sort of offensive strand rate, OSR perhaps? Just using the data from 2007 that you presented, the coefficient of correlation between runs scored and OSR would be -.62, which shows a pretty strong negative correlation between OSR and runs scored. That would explain the Tigers outscoring the Nationals despite the Nationals stranding fewer runners. The Nationals stranded over 58 percent of their runners while Detroit stranded just under 53 percent. It would be interesting to see if home runs are a factor. I would think Detroit hit a good number more homers than Washington, making their offense much more efficient.
The rate stat described by both S.B. and M.P., the ratio of runners left on base to times reached base safely, or LOB/TOB, would seem to be a pretty good (albeit inverse) measure of a team's offensive efficiency, and a negative correlation between that stat and run-scoring would undercut the conclusion of last week's study, that a high number of LOB isn't necessarily bad for your offense. M.P.'s point about Detroit's home runs makes some intuitive sense, and is bolstered by the fact that the Yankees, another power-hitting ballclub, featured a similarly low LOB/TOB ratio. However, the most efficient offense last year by this measure, that of the Los Anaheim Angels of Angeles, had the exact same number of homers (123) as the Washington Nationals, who had the second-highest LOB/TOB. We'll shelve further investigation of that theory for the moment.
M.P.'s other point, about how much double plays might throw off the LOB/TOB ratio, was interesting, but when I followed up on it, it seemed incomplete. If double plays should be added to LOB, why not triple plays? That's easy enough, but then there were other questions ("Why not caught stealing? Why not pickoffs?") which were a bit of a headache. So, with the assistance of William Burke, I took a look at another statistic, Team OBI percentage (OBI%), that includes double plays and takes the batter scoring on a home run out of the equation.
As it happens, I wasn't the only person who thought that maybe LOB/TOB wasn't the end of the discussion. Reader L.H. chimed in with his own preferred metric:
You missed one factor of interest: the ratio of Runs to LOB. The more Runs a team scored, the higher the ratio. What was very interesting was that the ratio for the majors last year was .666. Those that fell below, did worse. To put it another way, the lower-scoring teams got 33-35 percent of their TOB around to score, the higher-scoring teams 40 percent. The number 15 team, Tampa, got 36.4 percent across. Everyone above them had a better rate. Only three of the remaining 15 teams below them had a better rate, and none by more than one percent.
Another reader, M.C., suggested the remaining combination among the stats I listed last week:
A couple of years ago (when the Red Sox were leading the league in runs scored and OBA annually), I heard a lot of Sox fans complaining about all the runners left on base, so I decided to see what percentage of base runners scored (R/TOB). As it turned out, despite leading the league in LOB, the Sox plated the second-highest percentage of base runners, which I felt indicated that they were pretty efficient in driving runners home.
So, we have a quandary: one concept (find a statistic that describes a team's "efficiency" on offense) and four different proposed metrics, with the usual alphabet soup of acronyms in tow. Confronted with that challenge, I decided to calculate each metric's correlations to run scoring, again going back to 1971. Since mixing counting stats and rate stats is a bit like mixing acid and water, I decided to correlate the various metrics to another rate stat, runs per game (R/G). Another 2007 leaderboard follows:
Year  Team   R/G   LOB/TOB   R/LOB    OBI%    R/TOB
2007  NYA    5.98   .5268    .7750    15.96   .4083
2007  PHI    5.51   .5657    .6888    14.35   .3897
2007  DET    5.48   .5261    .7726    16.56   .4065
2007  BOS    5.35   .5579    .6718    14.78   .3747
2007  COL    5.28   .5504    .6880    14.79   .3787
2007  ANA    5.07   .5176    .7473    16.11   .3868
2007  TEX    5.04   .5409    .7473    15.27   .4042
2007  CLE    5.01   .5593    .6669    14.46   .3730
2007  ATL    5.00   .5618    .6722    14.57   .3776
2007  NYN    4.96   .5573    .6722    14.64   .3746
2007  MIL    4.94   .5497    .7171    14.74   .3942
2007  SEA    4.90   .5418    .7045    14.81   .3817
2007  FLO    4.88   .5657    .6628    13.85   .3749
2007  CIN    4.83   .5576    .6692    13.60   .3732
2007  TBA    4.83   .5558    .6707    14.35   .3727
2007  BAL    4.67   .5549    .6563    14.46   .3642
2007  TOR    4.65   .5521    .6772    14.62   .3739
2007  CHN    4.64   .5749    .6319    14.17   .3633
2007  OAK    4.57   .5870    .5890    12.79   .3458
2007  SDN    4.55   .5725    .6427    14.01   .3679
2007  LAN    4.54   .5725    .6125    14.15   .3507
2007  SLN    4.48   .5634    .6207    13.67   .3497
2007  PIT    4.47   .5603    .6470    14.39   .3625
2007  HOU    4.46   .5739    .6122    13.34   .3513
2007  MIN    4.43   .5545    .6411    14.36   .3554
2007  ARI    4.40   .5621    .6532    14.11   .3672
2007  KCA    4.36   .5545    .6483    14.80   .3595
2007  CHA    4.28   .5579    .6453    13.34   .3600
2007  SFN    4.22   .5768    .5986    13.43   .3453
2007  WAS    4.15   .5838    .5787    13.64   .3379
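A side note on how three of these columns relate: since LOB/TOB, R/LOB, and R/TOB are all built from the same three raw totals, any one of them can be recovered from the other two, because R/LOB is simply R/TOB divided by LOB/TOB. The sketch below checks that against a few rows copied from the leaderboard. The function name `r_per_lob` and the choice of sample teams are mine, purely for illustration, and the published figures are rounded to four places, so expect agreement only to about the third decimal.

```python
# Rows copied from the 2007 leaderboard: (team, LOB/TOB, R/LOB, R/TOB).
SAMPLE = [
    ("NYA", 0.5268, 0.7750, 0.4083),
    ("DET", 0.5261, 0.7726, 0.4065),
    ("WAS", 0.5838, 0.5787, 0.3379),
]


def r_per_lob(r_per_tob, lob_per_tob):
    """Recover R/LOB from the other two ratios: (R/TOB) / (LOB/TOB)."""
    return r_per_tob / lob_per_tob


for team, lob_tob, r_lob, r_tob in SAMPLE:
    implied = r_per_lob(r_tob, lob_tob)
    # Print the implied ratio next to the published one for comparison.
    print(f"{team}  implied R/LOB = {implied:.4f}  published = {r_lob:.4f}")
```

OBI% is the odd one out here: it brings in double plays and removes the batter scoring on his own home run, so it cannot be rebuilt from the other columns.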
Another benefit of using runs per game was that I could run the correlations against all years going back to 1971, including the strike years (since the strike cut down on totals, not ratios).
The results? M.P. was correct about the negative correlation between LOB/TOB and run-scoring: the correlation I found (-0.52) was almost exactly as strong as the positive correlation between runs scored and LOB last week. OBI Percentage had a strong positive correlation to run scoring (0.82), but not as strong as R/LOB (0.87). That isn't surprising, since having runs in the numerator of your metric increases the chances of positive correlation to run-scoring, and neither is the rock-solid correlation between R/G and R/TOB (0.93).
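For readers who want to try this at home, here is a minimal sketch of the same kind of calculation, run only on the 30 rows of the 2007 leaderboard. The figures quoted above come from every team-season back to 1971, so a single-season sample should not be expected to reproduce them; only the general pattern ought to carry over. The helper names (`pearson`, `correlations`) and the data layout are my own assumptions, not the method actually used for the study.

```python
from math import sqrt

# 2007 leaderboard rows: (team, R/G, LOB/TOB, R/LOB, OBI%, R/TOB)
ROWS = [
    ("NYA", 5.98, .5268, .7750, 15.96, .4083),
    ("PHI", 5.51, .5657, .6888, 14.35, .3897),
    ("DET", 5.48, .5261, .7726, 16.56, .4065),
    ("BOS", 5.35, .5579, .6718, 14.78, .3747),
    ("COL", 5.28, .5504, .6880, 14.79, .3787),
    ("ANA", 5.07, .5176, .7473, 16.11, .3868),
    ("TEX", 5.04, .5409, .7473, 15.27, .4042),
    ("CLE", 5.01, .5593, .6669, 14.46, .3730),
    ("ATL", 5.00, .5618, .6722, 14.57, .3776),
    ("NYN", 4.96, .5573, .6722, 14.64, .3746),
    ("MIL", 4.94, .5497, .7171, 14.74, .3942),
    ("SEA", 4.90, .5418, .7045, 14.81, .3817),
    ("FLO", 4.88, .5657, .6628, 13.85, .3749),
    ("CIN", 4.83, .5576, .6692, 13.60, .3732),
    ("TBA", 4.83, .5558, .6707, 14.35, .3727),
    ("BAL", 4.67, .5549, .6563, 14.46, .3642),
    ("TOR", 4.65, .5521, .6772, 14.62, .3739),
    ("CHN", 4.64, .5749, .6319, 14.17, .3633),
    ("OAK", 4.57, .5870, .5890, 12.79, .3458),
    ("SDN", 4.55, .5725, .6427, 14.01, .3679),
    ("LAN", 4.54, .5725, .6125, 14.15, .3507),
    ("SLN", 4.48, .5634, .6207, 13.67, .3497),
    ("PIT", 4.47, .5603, .6470, 14.39, .3625),
    ("HOU", 4.46, .5739, .6122, 13.34, .3513),
    ("MIN", 4.43, .5545, .6411, 14.36, .3554),
    ("ARI", 4.40, .5621, .6532, 14.11, .3672),
    ("KCA", 4.36, .5545, .6483, 14.80, .3595),
    ("CHA", 4.28, .5579, .6453, 13.34, .3600),
    ("SFN", 4.22, .5768, .5986, 13.43, .3453),
    ("WAS", 4.15, .5838, .5787, 13.64, .3379),
]

# Column index of each efficiency metric within a row.
METRICS = {"LOB/TOB": 2, "R/LOB": 3, "OBI%": 4, "R/TOB": 5}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlations(rows):
    """Correlate each metric with runs per game across the given rows."""
    rpg = [row[1] for row in rows]
    return {
        name: pearson(rpg, [row[idx] for row in rows])
        for name, idx in METRICS.items()
    }


for name, r in correlations(ROWS).items():
    print(f"{name:8s} {r:+.2f}")
```

Feeding in a longer list of team-seasons in the same layout would extend the sample; nothing in the functions is specific to 2007.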
As it turns out, in the end, it's best to cut out the middle man (LOB) and attack the question directly: how often do the guys who reach base come around to score?