“It has truly impacted my life. People are often remembered for one thing in their career, whether it’s good or bad. Fortunately for me, that stolen base is embedded in people’s minds.”
Dave Roberts, on the most famous stolen base in Red Sox history

We’ve covered quite a bit of ground over the last month or two in this series on baserunning. For those just joining us, here’s a quick recap. Baserunning is an aspect of the game that draws a lot of attention. After all, who doesn’t like the drama of the stolen base, or the excitement of the relay home as the runner narrowly beats the throw and avoids the tag with a nifty slide? Unfortunately, many of the things that players actually do on the bases go unaccounted for. No offense to Henry Chadwick, mind you, but this is the historical result of the way we keep records. Aside from the inferences we can draw from runs scored and today’s topic (stolen bases and caught stealing), even the most ardent fans don’t necessarily have a gut feel for the contributions most of their favorite players make on the basepaths.

With that in mind, our goal in this series is to quantify as many of the aspects of baserunning as we can in order to more accurately put this aspect of the game into perspective, using data from 2000-2005. We’ve already added to the work previously done to quantify advancement on hits by developing a framework and metrics for quantifying runner advancement on ground outs and runner advancement on outs in the air. Along the way, we’ve looked at how ballparks influence air advancement and how both can be looked at from the team perspective. Last week, I mentioned that it’s important to keep in mind that what these metrics measure is not really the actual number of runs a team gained or lost by an advancement event, but theoretically how the aggregated decisions made throughout the course of the season put the individual–and hence his team–in more or less advantageous situations measured in terms of runs. This is why the two metrics we created use the word “Equivalent” and not “Actual” in their names.

This week, we’ll delve into an area well known to most folks in the performance analysis community by taking a look at base stealing.

More Methodology

For those who’ve read the previous articles in this series, the method based on the Run Expectancy matrix (actually, the average of the matrices for the 2000 through 2005 seasons) we’ll use to quantify this area of the game should come as no surprise. Simply put, we create a derived matrix that allows us to assign credit to each of the 24,253 total opportunities that we’ll use for this analysis.

These opportunities include not only stolen bases and caught stealing (a total of 23,210 attempts), but also plays in which the runner was picked off (1,043 pickoffs). However, we don’t give any credit to trailing runners in double-steal attempts, nor do we give credit for pickoff attempts that resulted in an error by the defense that allowed the runner to advance, since it is impossible to know from the play-by-play data whether the runner would have been out had the pitcher’s throw not been errant or had the fielder not dropped it. What that leaves us to work with is 90% of the 16,505 stolen bases during the six-year period, as well as the more than 1,000 pickoffs.

The matrix looks as follows:

                                  Safe           Out
  Steal     Outs  Bases      Run Value     Run Value
    2nd        0    1x3         0.1857       -0.8660
    2nd        0    1xx         0.2396       -0.6339
    2nd        1    1x3         0.2162       -0.8424
    2nd        1    1xx         0.1519       -0.4405
    2nd        2    1x3         0.0894       -0.5099
    2nd        2    1xx         0.0930       -0.2422
    3rd        0    12x         0.3174       -0.9745
    3rd        0    x2x         0.2808       -0.8735
    3rd        1    12x         0.2780       -0.6937
    3rd        1    x2x         0.2740       -0.5924
    3rd        2    12x         0.0548       -0.4551
    3rd        2    x2x         0.0363       -0.3352
   Home        0    1x3         0.0767       -1.2919
   Home        0    x23         0.1305       -1.3258
   Home        0    xx3         0.0885       -1.1544
   Home        1    1x3         0.3385       -0.9717
   Home        1    xx3         0.3088       -0.8664
   Home        1    x23         0.2741       -1.0950
   Home        2    123         0.6663       -0.7888
   Home        2    1x3         0.7323       -0.5099
   Home        2    x23         0.7359       -0.5993
   Home        2    xx3         0.7404       -0.3715

There are a couple of points regarding this matrix you should keep in mind. First, both the safe and out run values are calculated on the assumption that other runners will not advance. In other words, the run values assigned to the situation where the runner attempts to steal second with nobody out and runners on first and third assume that the runner on third will remain at third regardless of the outcome of the stolen base attempt. In that scenario, the run values are calculated as

Safe Run Value = (RunExp for x23/0) - (RunExp for 1x3/0)
  0.1857       =       2.0300       -     1.8443

Out Run Value = (RunExp for xx3/1) - (RunExp for 1x3/0)
  -0.8660     =       0.9783       -     1.8443
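The same subtraction can be sketched in a few lines of Python. Only the three Run Expectancy values quoted above are used; the dictionary layout and the function name are illustrative assumptions, not the actual tooling behind this series.

```python
# Run Expectancy values quoted in the text, keyed by (bases, outs).
# Only these three entries appear in the article; a full matrix has 24 states.
RUN_EXP = {
    ("1x3", 0): 1.8443,  # runners on first and third, nobody out
    ("x23", 0): 2.0300,  # after a successful steal of second
    ("xx3", 1): 0.9783,  # after the runner is thrown out at second
}

def run_value(start, end):
    """Change in run expectancy from the start state to the end state."""
    return round(RUN_EXP[end] - RUN_EXP[start], 4)

safe_value = run_value(("1x3", 0), ("x23", 0))
out_value = run_value(("1x3", 0), ("xx3", 1))
print(safe_value, out_value)
```

The two printed numbers should match the Safe and Out run values in the first row of the matrix.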

The ending base/out states of x23/0 and xx3/1 do not cover all of the possibilities. Throwing errors by catchers, dropped balls by fielders, passed balls, and wild pitches all change the ending state on particular plays.

For example, in the data set used for this article, there were 11 plays out of 261 opportunities in this particular scenario where an error or wild pitch allowed the runner on third to score. That extra advancement is not credited to the offense, since it was the product of a defensive lapse that the offense essentially had no control over. As mentioned previously, we’re attempting to quantify the theoretical number, not the actual number of runs contributed. In the same way, the runner is not debited when the batter strikes out during a stolen base attempt or, as happened to the Phillies’ Placido Polanco on September 3, 2002, when the batter (Marlon Anderson) strikes out and the runner on third (Travis Lee) is subsequently thrown out at home as part of a triple play.

If what we were attempting to measure was the strategic value of stolen base attempts, we would factor in all the actual outcomes, and even try to assess the influence of errors for particular runners.

Secondly, what’s obvious from the table is that the cost of getting caught stealing far outweighs the benefit of succeeding in all cases except one: an attempted steal of home with two outs and runners on first and third, second and third, or third only. Unfortunately, that’s a pretty low-percentage play, and it was successful just 42 times in 121 attempts from 2000 through 2005.
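One way to see this imbalance is to compute the break-even success rate implied by a row of the matrix, meaning the rate p at which p × safe + (1 − p) × out = 0. The sketch below uses a handful of rows copied from the matrix; the function name break_even is a hypothetical one chosen for illustration.

```python
def break_even(safe, out):
    """Success rate p at which p*safe + (1 - p)*out equals zero."""
    return -out / (safe - out)

# (steal, outs, bases): (safe run value, out run value), copied from the matrix
rows = {
    ("2nd", 0, "1xx"): (0.2396, -0.6339),
    ("2nd", 2, "1xx"): (0.0930, -0.2422),
    ("3rd", 2, "x2x"): (0.0363, -0.3352),
    ("Home", 2, "xx3"): (0.7404, -0.3715),
}

rates = {key: break_even(safe, out) for key, (safe, out) in rows.items()}
for (steal, outs, bases), rate in rates.items():
    print(f"{steal:>4} {outs} {bases}  break-even {rate:.1%}")
```

Under the matrix’s assumptions, a steal of second with only a runner on first needs to succeed a bit more than 72% of the time, a steal of third with two outs needs about nine successes in ten, and a two-out steal of home with only a runner on third breaks even at roughly one success in three.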

This latter point is what usually leads statheads like me to decry the running game as overvalued, and therefore at best an inefficient way to score runs, and at worst a real detriment to doing so depending on your success rate. The case was very well made by Dayn Perry in his book Winners. The average run value for all stolen base attempts (not pickoffs) over the entire data set was actually -0.041662 runs. In other words, the average stolen base attempt in the major leagues actually cost the team runs, since the success rate of exactly 67% was a shade under the rate needed to break even. Breaking it down further in the following two tables, we can see that attempted steals of home are especially costly, while steals of second with two outs are the least costly.

2000-2005 Avg Run Value By Stolen Base Attempted
SB Att    Steal      Run Value
 20058   Second       -0.03534
  2838    Third       -0.03587
   314     Home       -0.49763

2000-2005 Avg Run Value By Number of Outs and Stolen Base Attempted
  SB Att    Steal   Outs      Run Value
    5154    Second      0       -0.0582
    7213    Second      1       -0.0549
    7691    Second      2       -0.0017
     424     Third      0       -0.1215
    1624     Third      1       -0.0230
     790     Third      2       -0.0164
      10      Home      0       -1.0317
     179      Home      1       -0.7646
     125      Home      2       -0.0726
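
As a quick consistency check, the overall figure of -0.041662 runs per attempt can be recovered as the attempt-weighted average of the first table. This is a minimal sketch using only the numbers printed above; the variable names are illustrative.

```python
# (attempts, average run value) per base, copied from the first table above
by_base = {
    "Second": (20058, -0.03534),
    "Third": (2838, -0.03587),
    "Home": (314, -0.49763),
}

total_attempts = sum(att for att, _ in by_base.values())
total_runs = sum(att * avg for att, avg in by_base.values())
overall_avg = total_runs / total_attempts

print(total_attempts)  # 23210, matching the attempt count cited earlier
print(round(overall_avg, 5))
```

The result agrees with the quoted average to within the rounding of the table entries.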

Keep in mind, this doesn’t even consider the disruption that stolen base attempts have on the hitter at the plate, as discussed in The Book.

What this analysis misses is the differentiation between the stolen base as a general purpose weapon versus a tactical gambit. Although it’s been a tough week for Red Sox fans–as documented by Joe Sheehan and Jim Baker–one need only think back to Game 4 of the 2004 American League Championship Series. It’s the bottom of the ninth, with the Sox trailing 4-3 with Mariano Rivera on the mound. After a Kevin Millar walk, Dave Roberts pinch-ran, and everyone watching knew what the plan was. Roberts stole second on the first pitch, and the rest is history.

In that case the strategic situation (including the pitcher, the score, and the need to plate a single run rather than maximize the number of runs scored in the remainder of the inning) suggested that a stolen base attempt was called for. Two ways to capture this other dimension are to use a scoring probability matrix rather than a Run Expectancy matrix, or to use Keith Woolner’s Win Expectancy Matrix, as Keith himself did in an essay for Baseball Prospectus 2006. The point to take away here is that the metric we’re creating for this analysis, like the others, measures this aspect of baserunning in terms of its general benefit to run scoring. (Remember DePodesta’s mantra: “be the house.”)

Although not shown in the matrix, there are a few additional rows used in the calculations to deal with the times when runners are picked off in various base/out combinations. For these rows, we make the same assumptions as we do for caught stealings, namely that other runners would not have advanced. With the matrix established, we can assign the values to each of the opportunities. All that’s left is to sum them up to derive an Equivalent Stolen Base Runs (EqSBR) metric that we can add to our toolbox.
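
The assignment-and-sum step might look something like the following sketch. The matrix here holds only three rows copied from the table above, and the event list is made up purely for illustration; the tuple layout and the function name eqsbr are assumptions, not the actual implementation behind the metric.

```python
# (steal, outs, bases) -> (safe run value, out run value); three rows from the matrix
MATRIX = {
    ("2nd", 0, "1xx"): (0.2396, -0.6339),
    ("2nd", 2, "1xx"): (0.0930, -0.2422),
    ("3rd", 1, "x2x"): (0.2740, -0.5924),
}

def eqsbr(events):
    """Sum the run value of each (steal, outs, bases, safe) opportunity."""
    total = 0.0
    for steal, outs, bases, safe in events:
        safe_value, out_value = MATRIX[(steal, outs, bases)]
        total += safe_value if safe else out_value
    return round(total, 4)

# Hypothetical runner: three successful steals and one caught stealing
events = [
    ("2nd", 0, "1xx", True),
    ("2nd", 2, "1xx", True),
    ("3rd", 1, "x2x", True),
    ("2nd", 0, "1xx", False),
]
print(eqsbr(events))
```

In this made-up example the runner succeeds on three of four attempts and still comes out slightly below zero, because the one out with nobody out costs more than the three successes gain.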

If you’ve hung on this long you’re probably ready for some numbers, so let’s start by taking a look at the top and bottom ten in EqSBR for 2005.

Top and Bottom 10 for 2005
Name              SB Att      PO      CS   EqSBR
Alfonso Soriano       32       0       2    4.92
Johnny Damon          19       0       1    2.84
Kenny Lofton          24       0       3    2.64
Jason Bay             22       0       1    2.39
Rafael Furcal         56       1      10    2.36
Jimmy Rollins         47       3       6    2.24
Torii Hunter          30       0       7    2.18
Marcus Giles          18       0       3    2.04
Willie Bloomquist     15       1       1    1.74
Jose Reyes            75       2      15    1.73

Brad Wilkerson        17       2      10   -6.23
Oscar Robles           8       0       8   -4.26
Brady Clark           23       0      13   -4.20
Jeromy Burnitz         9       3       4   -4.02
Juan Rivera           10       0       9   -3.76
Jeremy Reed           22       1      11   -3.72
Randy Winn            29       1      11   -3.69
Nick Johnson          11       1       8   -3.43
Jerry Hairston        17       0       9   -3.32
Luis Matos            26       2       9   -3.06

As you can see, the range here is on the order of roughly +5 to -5 runs, or the equivalent of about a win’s difference between the best and worst. Also keep in mind that although Jason Bay was 21 for 22 and Johnny Damon was 18 for 19, and neither was picked off, Damon scores higher for two reasons. First, when Bay did get caught, it came with a runner on third, which cost him a bit more. Second, three of Bay’s stolen bases were of third with two outs, which are worth relatively little from a Run Expectancy standpoint because a runner on second with two outs is worth almost as much as a runner on third with two outs, Joe Morgan’s oft-heard commentary about the many additional ways to score from third rather than from second notwithstanding.

Next let’s take a look at the seasonal leaders and trailers during the 2000-2005 period.

Seasonal Leaders and Trailers for 2000-2005
Year    Name              SB Att      PO      CS   EqSBR
2000    Eric Young            59       1       7    5.00
2001    Derek Jeter           29       0       3    3.21
2002    Derek Jeter           35       0       3    4.20
2003    Carl Crawford         64       2      10    4.76
2004    Dave Roberts          41       1       3    4.21

2000    Vladimir Guerrero     19       3      10   -4.87
2001    Vladimir Guerrero     50       2      16   -5.40
2002    Cristian Guzman       23       0      13   -4.44
2003    Luis Castillo         39       1      19   -5.77
2004    Juan Pierre           69       4      24   -6.61

One of the interesting aspects to these lists is that although Vladimir Guerrero had a close to break-even success rate of 68% in 2001 (34 of 50) he was credited with a whopping -5.4 runs. A breakdown of the events in which he was negatively credited helps explain why.

Next Base Outs   Bases        Play Code    Run Value
Home          1    1x3        POCSH(12)      -0.9717
Home          1    xx3     K+PO3(25)/DP      -0.8664
2nd           1    1x3          CS2(24)      -0.8424
3rd           1    12x      CS3(25).1-2      -0.6937
3rd           1    12x      CS3(25).1-2      -0.6937
2nd           0    1xx        POCS2(16)      -0.6339
2nd           0    1xx          CS2(26)      -0.6339
2nd           0    1xx          CS2(24)      -0.6339
2nd           1    12x      CS2(24).2-3      -0.6007
2nd           2    1x3          PO1(13)      -0.5099
Home          2    1x3       POCSH(132)      -0.5099
3rd           2    12x          CS3(25)      -0.4551
2nd           1    1xx          CS2(24)      -0.4405
2nd           1    1xx       POCS2(136)      -0.4405
2nd           1    1xx      POCS2(1363)      -0.4405
2nd           2    1xx       POCS2(136)      -0.2422
2nd           2    1xx          CS2(26)      -0.2422
2nd           2    1xx       POCS2(136)      -0.2422

As you can see, he was caught stealing home twice, and picked off of third another time, in addition to being thrown out at third twice with a runner on first. All told, those 18 negative events put him over ten runs in a hole that his 34 successful stolen bases couldn’t make up for.
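
The arithmetic behind that claim is easy to check by summing the Run Value column of the table above. The second figure in this sketch is only an inference: it assumes the -5.40 EqSBR listed for Guerrero in 2001 consists of exactly these 18 negative events plus his positive events.

```python
# Run values of Guerrero's 18 negative events in 2001, copied from the table
negative_events = [
    -0.9717, -0.8664, -0.8424, -0.6937, -0.6937, -0.6339,
    -0.6339, -0.6339, -0.6007, -0.5099, -0.5099, -0.4551,
    -0.4405, -0.4405, -0.4405, -0.2422, -0.2422, -0.2422,
]

total_negative = round(sum(negative_events), 4)
eqsbr_2001 = -5.40  # season total from the seasonal trailers table

# Assumption: season total = positive events + negative events
implied_positive = round(eqsbr_2001 - total_negative, 2)

print(len(negative_events), total_negative, implied_positive)
```

Under that assumption, the 34 successful steals were worth roughly 4.7 runs, or about 0.14 runs apiece, against a little more than 10 runs lost on the failures.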

This also shows that although the Marlins featured a combination known for its speed at the top of the order, that speed in the aggregate often cost the team runs. Juan Pierre was picked off 15 times from 2000 through 2005, more than any other runner. In total, Pierre came in with -8.85 EqSBR, while Luis Castillo came in at -8.21.

Finally, keep in mind that in the play-by-play data, the only evidence of a failed hit-and-run is most often the caught stealing that accompanies it. As a result, players who are on the front end of these scenarios more frequently than others will tend to fare a little worse than they otherwise deserve.

Let’s round out this column with the leaders and trailers in EqSBR for the entire six-year period.

Leaders and Trailers for 2000-2005
Name              SB Att      PO      CS   EqSBR
Carlos Beltran       195       9      21   11.75
Derek Jeter          149       0      24   10.59
Johnny Damon         212       5      42    9.03
Carl Crawford        204       9      38    8.53
Tom Goodwin          140       7      25    5.15
Doug Glanville       104       1      17    5.00
Scott Podsednik      218       9      46    4.52
Darin Erstad         134       2      26    4.10
Craig Biggio          76       1      15    3.40
Roberto Alomar       109       3      18    3.30

Brad Wilkerson        77       4      35  -15.80
Vladimir Guerrero    170       9      55  -15.05
Jeromy Burnitz        60       5      29  -14.20
Jason Kendall        125       1      52  -13.42
Alex Sanchez         180       7      58  -11.31
Neifi Perez           59       4      28  -11.03
Fernando Vina         80       5      31  -10.37
Ray Durham           138       9      44  -10.24
Jose Hernandez        35       1      21  -10.18
Cristian Guzman      146       3      49   -9.96

Now that we’ve covered advancing on outs and the running game, we can make a final push by including advancing on hits to come up with our “total baserunning metric.” Stay tuned.


Dan Fox

