May 5, 2005
Crooked Numbers: Do Not Pass Go
As a kid, when the paper arrived I would immediately dive for the sports section, a move closely followed by flipping past all the articles and heading right for the "Scoreboard" page, where I could scan the box scores over a bowl of Chocolate Frosted Sugar Bombs. This frantic ritual had its pros (a quick summary of yesterday's action) and cons (I missed lots of headlines; for example, I didn't realize for two days that Buster Douglas had actually beaten Mike Tyson because that somehow didn't make the scoreboard page), but it did make me intimately familiar with the old newspaper box score format.

Box scores are disappearing. While we still find them in their traditional format in newspapers and across the Web, the ability to read several articles about each game, daily updated stat reports, and play-by-play logs largely nullifies the need to manually keep track of how many home runs your favorite player has or to discern the events of each game from a very limited set of numbers. The days of trying to figure out how a player scored a run without an AB, or why another player has one fewer plate appearance than everyone else despite being in the middle of the order, are (for the most part) gone.

As the box score loses its place, certain stats become a little harder to find. The section just below the player-by-player lines always included a quick summary of various game events like errors, doubles, triples, home runs, and double plays. While those stats are meticulously and popularly maintained, there hasn't been as much discussion lately of our old friend LOB (left on base). Unless, that is, you're an A's fan, a group that seems to be on the verge of having to add a counter to the outfield bleachers to keep track of all the green and gold left stranded on the bases. Through Tuesday's games, the A's had scored 95 runs this season, good for second to last in the majors, tied with Cleveland and ahead of Pittsburgh. During that same time, they've left 209 men on base.
That sounds like quite a lot, but let's see how it compares to the rest of the league. Pittsburgh, also down in the basement with the A's in run scoring, is at 178 LOB and 79 runs scored. Cleveland has 173 and 95. So the A's aren't terribly outside the norm when compared with their basement compatriots.

Since offense can be thought of in two component parts (getting men on base and driving them in), LOB can be thought of as potential runs: runners who fulfilled the first half of that equation but not the second. Thus, looking at runs scored as a percentage of LOB + R (called "runner scoring percentage" for now) can give an idea of whether a team is lacking in one department or the other. While this ignores events like double plays and runners caught stealing, it should still give us a rough idea of which offenses are good at one component or the other.

In 2005, the Pirates have plated 30.74% of their potential runs. The A's have scored 31.25%, the Indians 35.45%. Just as before, the A's look to be in the middle of the pack when compared with the other two meager offenses this year. But compared with every team since 1990, the Pirates and A's come in dead last. In the past 16 years, no two teams have been as bad at driving in the runners they put on base as the A's and Pirates.
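
The runner scoring percentage described above is just R / (LOB + R). As a minimal sketch, using only the run and LOB totals quoted in this article (the function and variable names are illustrative, not part of the original analysis):

```python
def runner_scoring_pct(runs: int, lob: int) -> float:
    """Runs scored as a percentage of runs plus runners left on base."""
    return 100.0 * runs / (runs + lob)

# (runs, LOB) through Tuesday's games, as quoted in the article.
teams = {"PIT": (79, 178), "OAK": (95, 209), "CLE": (95, 173)}

for team, (runs, lob) in teams.items():
    print(f"{team}: {runner_scoring_pct(runs, lob):.2f}%")
```

Run as written, this reproduces the 30.74%, 31.25%, and 35.45% figures cited for the Pirates, A's, and Indians.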
For comparison's sake, here are the top and bottom ten teams since 1990 in runner scoring percentage:

YEAR  TEAM   LOB     R   AVG   OBP   SLG     R%
1994  CLE    762   679  .290  .348  .484  47.12
2000  CHA   1127   978  .286  .352  .470  46.46
1996  COL   1108   961  .287  .350  .472  46.45
2004  CHA   1031   865  .268  .330  .457  45.62
1995  CLE   1018   840  .291  .358  .479  45.21
1996  BAL   1154   949  .274  .348  .472  45.13
1997  COL   1124   923  .288  .353  .478  45.09
2005  DET    161   132  .273  .331  .433  45.05
1999  CLE   1234  1009  .289  .370  .467  44.98
2000  COL   1198   968  .294  .358  .455  44.69

1990  PHI   1242   646  .255  .324  .363  34.22
2003  LAN   1108   574  .243  .299  .368  34.13
1992  CHN   1148   593  .254  .303  .364  34.06
1990  SLN   1164   599  .256  .316  .358  33.98
1990  HOU   1132   573  .242  .309  .345  33.61
1992  BOS   1215   599  .246  .318  .347  33.02
1993  FLO   1189   582  .248  .311  .346  32.86
1992  LAN   1138   548  .248  .308  .339  32.50
2005  OAK    209    95  .237  .311  .338  31.25
2005  PIT    178    79  .230  .299  .359  30.74

When Will Carroll asked me about this on BP radio two weeks ago, I quickly responded that the reason for the A's struggles is their reliance on players with high on-base percentages (OBP) and not necessarily high slugging percentages (SLG). The A's in particular have sought out players with deceptively high OBPs built mostly on walks rather than batting average (AVG), and a team built on walks rather than slugging would seem to strand more runners than one built on batting average. The reason for this is quite simple: It's hard to take the extra base on a walk.

But the Pirates don't necessarily fall into that category. Their team line of .230/.299/.359 is objectively terrible, but their SLG and isolated power (ISO) are both higher than the A's marks (.237/.311/.338) by margins of .021 and .028, and the Pirates have a higher ISO than any other team in the bottom 10 in runner scoring percentage.
If anything, Pittsburgh's better power numbers should mean that they would score a higher percentage of their runners on base than the A's, but that's not the case.

Before jumping to any conclusions based on limited amounts of data, let's expand things to the full 15+ years' worth of data we've got on hand. Of the three major rate stats, SLG has the highest correlation to runner scoring percentage, meaning that we can expect a team's slugging percentage to account for most of the changes in runner scoring percentage. In this case the correlation is positive, meaning the higher the slugging percentage, the higher the runner scoring percentage. Doing the same analysis for AVG and OBP reveals that all three stats have solid positive correlations; so as offense increases overall, the percentage of runners on base who score increases as well. Again, this is just a logical extension of the fact that there are only three bases where runners can be stranded, so as teams put more runners on those bases, more of them have to score. Essentially, teams can only strand up to three runners per inning, but they can hypothetically score an infinite number of runs.

So if all three major rate stats have positive correlations to runner scoring percentage, we cannot say that teams with high OBP and low SLG will have a lower runner scoring percentage. Not exactly, anyway. Because both OBP and SLG encompass AVG to some degree, the positive correlation of AVG may be overshadowing what we're really looking for. Instead, we can run a multivariable regression using all three major rate stats against runner scoring percentage. Doing so yields the following equation:

Runner Scoring Percentage = 0.40*AVG - 0.21*OBP + 0.69*SLG + 0.07 (+/- 0.01)

Note that when AVG and SLG are included in the regression, OBP actually has a negative effect on runner scoring percentage.
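
To make the sign of the OBP term concrete, here is a minimal sketch that evaluates the regression equation above. It assumes the coefficients exactly as printed (rounded to two decimals) and takes the OBP coefficient as negative, as the accompanying note states; the function and constant names are hypothetical.

```python
# Coefficients as printed in the article (rounded to two decimals).
# Assumption: the OBP term is negative, per the article's own note.
COEF_AVG, COEF_OBP, COEF_SLG, INTERCEPT = 0.40, -0.21, 0.69, 0.07

def predicted_rsp(avg: float, obp: float, slg: float) -> float:
    """Predicted runner scoring percentage as a fraction (0.35 = 35%)."""
    return COEF_AVG * avg + COEF_OBP * obp + COEF_SLG * slg + INTERCEPT

# Team lines (AVG/OBP/SLG) quoted in the article.
pirates = predicted_rsp(0.230, 0.299, 0.359)
athletics = predicted_rsp(0.237, 0.311, 0.338)
print(f"PIT: {pirates:.3f}  OAK: {athletics:.3f}")

# Hold AVG and SLG fixed and add ten points of OBP (walks only):
# the predicted percentage changes by COEF_OBP * 0.010.
more_walks = predicted_rsp(0.237, 0.321, 0.338)
print(f"change from +.010 OBP: {more_walks - athletics:+.4f}")
```

Because the published coefficients are rounded, a sketch like this will land close to, but not exactly on, figures computed from the full regression.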
The negative OBP effect is exactly what we suspected: if AVG and SLG are held steady, increasing OBP (in this case only in the form of walks, because AVG is constant) results in more baserunners, but not nearly as many runs as we'd expect if those baserunners reached on hits instead of walks. This doesn't mean that walks are a bad thing; it just means that teams with a disproportionate percentage of their baserunners coming on walks will have a higher percentage of their baserunners left on base than teams whose baserunners come from hits.

Believe it or not, there's actually some hope here for A's and Pirates fans (and even Cleveland fans). Instead of using AVG, OBP, and SLG in the regression, we can try to remove the AVG component of OBP and SLG. For SLG, this is simply ISO. For OBP, it can be a little trickier because the denominators for the two stats are different, but to keep things simple, we'll just use OBP - AVG and call it ISO_BB for now. It's not technically correct, but it still gives us a good idea of teams whose OBP is built more on walks than hits. Running things again, we now get this formula with a virtually identical correlation:

Runner Scoring Percentage = 0.88*AVG - 0.21*ISO_BB + 0.69*ISO + 0.07 (+/- 0.01)

Once again, the walks component of offense results in more baserunners but not the corresponding number of runs based on runner scoring percentage. Note that the coefficient for AVG has gone way up while the other two have remained very similar. That is what we should expect: with AVG pulled out of the other two stats, its coefficient absorbs their weight, and 0.40 - 0.21 + 0.69 = 0.88.

As mentioned above, the Pirates (.230), A's (.237), and Indians (.226) have struggled mightily in the batting average department. All three teams are likely to see significant improvements in those numbers as the season moves along. As their batting averages increase, all three teams will start to score a higher percentage of the runners they put on base.

(The other major point made frequently by the mainstream media is a team's performance with runners in scoring position.
On the whole, teams tend to bat about the same with runners in scoring position as without, and there don't appear to be any team characteristics that indicate a group will bat better or worse than expected with runners in scoring position. Part of the A's and Pirates' struggles is their ineptitude with runners on second or third, but those numbers aren't far out of line with their overall offensive performance and lend little to no additional information about runner scoring percentages.)

As with any regression formula, forecasts for outliers are going to involve some extreme regression to the mean. In this case, the Pirates, instead of scoring 30.74% of their runners, would be expected to score 35.07%. The A's increase from 31.25% to 33.64%. (Note that because of their higher-OBP, lower-SLG numbers compared to the Pirates, the A's don't increase nearly as much. Scoring 33.64% of their runners would still rank them fifth worst since 1990.) Applying those numbers to their actual run totals, the Pirates would be forecast to score 90 runs instead of their actual 79; the A's jump to 102 instead of 95. The Indians, however, are already scoring 35.45% of their runners, very close to their predicted average of 36.37%, a net of only two more runs. In Cleveland's case, it isn't that the offense can't get men home, it's that it can't get them on base in the first place.

The A's and Pirates are better offenses than they've shown so far this year, and expecting them to maintain both their poor overall offensive pace and their poor ability to score runners on base is like expecting Brian Roberts to hit 48 home runs. Both teams should see a rebound, both because their team batting averages will increase and because they've been underperforming their runner scoring percentage so far this year. For now, they've both dug themselves quite a hole, and it may be a while before they climb out of it.
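
The run forecasts quoted above can be reproduced with a short sketch. It assumes one reading of the method: hold each team's total of LOB + R fixed and apply the predicted runner scoring percentage to it. The predicted percentages are the ones quoted in the article; the function name is hypothetical.

```python
def forecast_runs(runs: int, lob: int, predicted_pct: float) -> float:
    """Runs expected if the team scored predicted_pct of its LOB + R."""
    opportunities = runs + lob  # assumed to stay fixed
    return predicted_pct / 100.0 * opportunities

# (actual runs, LOB, predicted runner scoring %) as quoted in the article.
teams = {
    "PIT": (79, 178, 35.07),
    "OAK": (95, 209, 33.64),
    "CLE": (95, 173, 36.37),
}

for team, (runs, lob, pct) in teams.items():
    expected = forecast_runs(runs, lob, pct)
    print(f"{team}: actual {runs}, forecast {expected:.1f}")
```

Rounded to whole runs, this gives 90 for the Pirates, 102 for the A's, and 97 for the Indians, matching the figures in the text.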
 