Baserunning is the neglected stepsister of offense. It tends not to be well correlated with metrics that evaluate hitting performance, and indeed is often more likely to be associated with defense. The common intuition that teams that play good defense tend to run the bases well to boot is corroborated by the data, which makes sense, given that speed plays an important role in both pursuits. The top ten baserunners in baseball by BRR are all considered above average defenders, and each plays—or is capable of playing—an up-the-middle position. At the same time, only Carl Crawford among them is considered a great hitter, and a couple of them (Emilio Bonifacio, I’m looking at you) are downright dreadful. In other words, baserunning is one of those dusty backwaters of analysis that continually confounds expectations.

**A Baserunning Koan**

What is great about baserunning is that it can be so deeply strange. For example, here’s a puzzle. In our baserunning metric, teams’ collective efforts on the bases are a zero-sum game. That is to say, over the last fifty-plus seasons, teams break even in terms of runs generated in each component category. When you add up all the runs produced against average (via run expectancy methodology) on grounders by all teams since 1954, you get a whopping 0.01 runs. Over 1,408 team-seasons, in other words, it was basically all a wash. That makes sense, since the comparison is against *average* performance: the sum of performances compared to average should net out to zero. Do the same for runs gained on balls hit in the air, and the number comes out to an effective rounding error of 39.59 (or about 0.03 runs per team-season). For out advancement runs? The sum is just 0.21. So far, so good.
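For anyone who wants to see why a metric measured against average has to wash out, here is a minimal sketch in Python. The four run totals are made up purely for illustration and `raw_runs` is a hypothetical name. The only real figures are the 39.59 runs and 1,408 team-seasons quoted above.

```python
from statistics import fmean

# Hypothetical raw run totals for four team-seasons (made-up numbers).
raw_runs = [12.5, -3.0, 4.25, -1.75]

# Express each total relative to the group average.
league_avg = fmean(raw_runs)
runs_vs_avg = [r - league_avg for r in raw_runs]

# Deviations from the mean cancel out by construction.
total_vs_avg = sum(runs_vs_avg)

# Figures quoted in the text: a 39.59-run residual over 1,408 team-seasons.
air_ball_residual_per_team_season = 39.59 / 1408

print(total_vs_avg, round(air_ball_residual_per_team_season, 2))
```

Whatever numbers you feed in, the deviations cancel. A leftover like 39.59 over half a century is the kind of residue you would expect from rounding and bookkeeping, not a real effect.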

But what about stolen base runs? This is where things get interesting. In Dan Fox’s original article introducing EqSBR, the stolen base component of BRR, there was no adjustment made against average. The method was all run expectancy, no adjustment. The result is that the average team since 1954 has posted -9.2 stolen base runs per season. The very best teams (the 2007 Philadelphia Phillies being the leader) have accrued fewer than 15 runs per season. The worst teams, by contrast, have cracked the -30 run barrier. The 1978 Athletics, led by the brutal-in-almost-every-way Mike Edwards, were worse than the best teams were good. The following histograms show the way in which EqSBR is thus different in kind from its BRR-component brethren:

So why is this? What sense does it make? One plausible reading is that teams are, and have almost always been, clumsy on the basepaths. It has long been a sabermetric criticism of managers that they run too often and in suboptimal situations. After all, Torii Hunter still gets caught stealing at third base with no outs down by two runs. This is the idea behind not normalizing stolen base runs, because it allows for some external criticism of player and team decisions. But is it necessarily the right way to approach the analysis? Another way to ask the question would be to wonder whether we might not be better off re-centering the stolen base runs so that average were equal to zero. According to this view, stolen base runs—expressed positively or negatively—would always be a comparison to how the league did as a whole. We’d lose the objective guidepost, but what if the guidepost were never objective to begin with?
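Re-centering itself is mechanically trivial, which is part of why the choice is philosophical rather than technical. A rough sketch, assuming we already have each team's raw stolen base runs in hand: the `recenter` helper and the three team totals are hypothetical, chosen only so that the league mean lands on the -9.2 figure.

```python
from statistics import fmean

def recenter(sb_runs_by_team):
    """Subtract the league mean so that the average team sits at zero."""
    league_avg = fmean(list(sb_runs_by_team.values()))
    return {team: runs - league_avg for team, runs in sb_runs_by_team.items()}

# Hypothetical team totals, picked so the league mean is -9.2.
raw = {"Team A": 5.0, "Team B": -9.2, "Team C": -23.4}
centered = recenter(raw)

for team, runs in centered.items():
    print(f"{team}: {runs:+.1f}")
```

Every team moves by the same amount, so the rankings are untouched. All that changes is where zero sits, and with it the claim that the league as a whole is costing itself runs.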

**Over the Columns and Through the Rows . . .**

First, let’s understand the run expectancy methodology. The first step is to create an empirical run expectancy table. This is a concept undoubtedly familiar to many readers, so I’ll just explain it briefly here. A run expectancy table tells us how many runs were scored, on average, after a certain situation came about. For example, if there were a runner on first and no outs, the run expectancy might be something like 0.9 runs. We can calculate this data by going through play-by-play databases and figuring out what happened after every instance in which a runner was on first with no outs, and then averaging the results. This takes a lot of saying “please” to Colin Wyers, and also patience (because even today’s computers take time to generate results to broad queries like this), but the results are worth it.
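In miniature, the averaging step looks something like the sketch below. It assumes each play has already been boiled down to a base state, an out count, and the runs that scored from that point to the end of the inning. The `"1--"` notation for the bases, the function name, and the six sample plays are all my own inventions for illustration, not a real database schema.

```python
from collections import defaultdict

def run_expectancy_table(plays):
    """Average the runs scored through the end of the inning for each (bases, outs) state."""
    totals = defaultdict(lambda: [0.0, 0])  # state -> [runs summed, plays counted]
    for bases, outs, runs_after in plays:
        cell = totals[(bases, outs)]
        cell[0] += runs_after
        cell[1] += 1
    return {state: total / count for state, (total, count) in totals.items()}

# Hypothetical plays: (base state, outs, runs scored in the rest of the inning).
plays = [
    ("1--", 0, 0), ("1--", 0, 2), ("1--", 0, 1), ("1--", 0, 1),
    ("---", 1, 0), ("---", 1, 1),
]
re_table = run_expectancy_table(plays)
print(re_table[("1--", 0)])
```

The real thing runs over decades of play-by-play data, which is where the patience comes in.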

For the question we want to answer, we need to restrict our focus to those situations in which there was a stolen base attempt. We would have to look at all attempts to steal second base, broken down both by number of outs and possible baserunner states (note that we wouldn’t have, say, runners on second and third, since then no one could steal second). In those attempts that were successful, we would calculate the number of runs that scored in the rest of the inning. In those that failed, we’d do the same. Each of those numbers would be compared against the run expectancy before the stolen base attempt was made. Sounds pretty good, right?
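Here is a small sketch of that comparison for a single base-out state, runner on first with no outs. The 0.9 baseline is the "something like" figure from the run expectancy example, and the eight attempts are invented. The function name and the tuple layout are assumptions of mine, not anything from the actual EqSBR code.

```python
from statistics import fmean

def steal_values(attempts, re_before):
    """Return (value of a successful steal, value of a caught stealing) in runs.

    attempts: list of (succeeded, runs scored in the rest of the inning).
    re_before: run expectancy of the base-out state before the attempt.
    """
    after_success = [runs for ok, runs in attempts if ok]
    after_failure = [runs for ok, runs in attempts if not ok]
    return fmean(after_success) - re_before, fmean(after_failure) - re_before

# Hypothetical attempts to steal second with a runner on first and no outs.
attempts = [
    (True, 1), (True, 2), (True, 0), (True, 1),
    (False, 0), (False, 1), (False, 0), (False, 0),
]
sb_value, cs_value = steal_values(attempts, re_before=0.9)
print(sb_value, cs_value)
```

With these made-up numbers a successful steal gains a fraction of a run and a caught stealing costs several times that, which is the general shape you would expect even if the magnitudes here mean nothing.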

**“Aha!” Says the Man With the Database-Fu**

Unfortunately, two problems appear with this methodology, one of which is easier to fix than the other. The first is that the run expectancy from before the stolen base attempt was made is going to include lots of information about what happened in subsequent stolen base attempts. In other words, the fact that a runner might steal a base when on first with no outs affects the expected number of runs scored when a runner is on first with no outs. But what we really want to know is the run expectancy when compared against the situation where the runner didn’t attempt to steal at all.

“Aha!” says the man with the database-fu, “we can solve that!” We’ll just run a new query and look only at those situations in which there was no steal attempt. That sounds great, except that there’s something fishy about those situations where no steal was attempted. Like, *why* wasn’t a steal attempted? Maybe the situation didn’t call for it, or the pitcher was quick to the plate, or the catcher had a good arm. But it’s also possible that the runner on first was a crummy baserunner. If that’s true, then the run expectancy will be artificially low in that circumstance, because good baserunners are more likely to score than bad ones. (I hope this at least is an uncontroversial statement, but I’m growing increasingly unsure.)
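To make the worry concrete, here is a toy version of the two queries for one state, runner on first with no outs. The eight plays are invented and deliberately exaggerated, with the runners who attempted a steal scoring far more often. The tuple layout is a hypothetical stand-in for whatever the real database returns.

```python
from statistics import fmean

# Hypothetical plays: (steal attempted?, runs scored in the rest of the inning).
plays = [
    (False, 0), (False, 1), (False, 0), (False, 1),
    (True, 2), (True, 0), (True, 2), (True, 2),
]

# Baseline built from every play, including the ones with steal attempts.
re_all = fmean([runs for _, runs in plays])

# Baseline built only from plays where nobody ran.
re_no_attempt = fmean([runs for attempted, runs in plays if not attempted])

print(re_all, re_no_attempt)
```

The filtered baseline comes out lower, but the sketch cannot say why. Some of the gap would be the steals themselves, and some would be that the runners who stayed put were slower to begin with. The query has no way of separating the two.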

The second problem has to do with the source of the data itself—play-by-play records. Caught stealing events include two types of outcomes: cases where the runner ran and was thrown out by the catcher and cases where a hit-and-run was called and went awry. In the latter case, there is a caught stealing recorded (which would thus show up in our run expectancy calculations) but in an important sense there was no stolen base attempt. The effect of this phenomenon is definitely in the negative direction, meaning that teams have artificially low baserunning totals when expressed in total runs and not normalized. And although the effect is likely spread across teams more or less equally, it affects the league as a whole each year. Even once recognized, this is a very difficult problem to quantify because the data to do so simply isn’t available.

**Question of the Day**

Given these limitations, which is the superior approach to expressing stolen base runs? Do the potential systematic errors counsel in favor of centering everything on zero? Wasn’t that Torii Hunter caught stealing just a disaster?

*Thanks to Colin Wyers for research assistance.*