“It has truly impacted my life. People are often remembered for one thing in their career, whether it’s good or bad. Fortunately for me, that stolen base is embedded in people’s minds.”—Dave Roberts on the most famous stolen base in Red Sox history
We’ve covered quite a bit of ground over the last month or two in this series on baserunning. For those just joining us, here’s a quick recap. Baserunning is an aspect of the game that draws a lot of attention. After all, who doesn’t like the drama of the stolen base, or the excitement of the relay home as the runner narrowly beats the throw and avoids the tag with a nifty slide? Unfortunately, many of the things that players actually do on the bases go unaccounted for. No offense to Henry Chadwick, mind you, but this is the historical result of the way we keep records. Aside from the inferences we can draw from runs scored and today’s topic (stolen bases and caught stealing), even the most ardent fans don’t necessarily have a gut feel for the contributions most of their favorite players make on the basepaths.
With that in mind, our goal in this series is to quantify as many of the aspects of baserunning as we can in order to more accurately put this aspect of the game into perspective, using data from 2000-2005. We’ve already added to the work previously done to quantify advancement on hits by developing a framework and metrics for quantifying runner advancement on ground outs and runner advancement on outs in the air. Along the way, we’ve looked at how ballparks influence air advancement and how both can be looked at from the team perspective. Last week, I mentioned that it’s important to keep in mind that what these metrics measure is not really the actual number of runs a team gained or lost by an advancement event, but theoretically how the aggregated decisions made throughout the course of the season put the individual–and hence his team–in more or less advantageous situations measured in terms of runs. This is why the two metrics we created use the word “Equivalent” and not “Actual” in their names.
This week, we’ll delve into an area well-known by most folks in the performance analysis community by taking a look at base stealing.
For those who’ve read the previous articles in this series, the method based on the Run Expectancy matrix (actually, the average of the matrices for the 2000 through 2005 seasons) we’ll use to quantify this area of the game should come as no surprise. Simply put, we create a derived matrix that allows us to assign credit to each of the 24,253 total opportunities that we’ll use for this analysis.
These opportunities include not only stolen bases and caught stealing (a total of 23,210 attempts), but also plays in which the runner was picked off (1,043 pickoffs). However, we don’t give any credit to trailing runners in double-steal attempts, nor do we give credit for pickoff attempts that resulted in an error by the defense that allowed the runner to advance, since it is impossible to know from the play-by-play data whether the runner would have been out had the pitcher’s throw not been errant or had the fielder not dropped it. What that leaves us to work with is 90% of the 16,505 stolen bases during the six-year period, as well as the more than 1,000 pickoffs.
The matrix looks as follows:
                        Safe        Out
Steal   Outs   Bases    Run Value   Run Value
2nd     0      1x3       0.1857     -0.8660
2nd     0      1xx       0.2396     -0.6339
2nd     1      1x3       0.2162     -0.8424
2nd     1      1xx       0.1519     -0.4405
2nd     2      1x3       0.0894     -0.5099
2nd     2      1xx       0.0930     -0.2422
----------------------------------------------------
3rd     0      12x       0.3174     -0.9745
3rd     0      x2x       0.2808     -0.8735
3rd     1      12x       0.2780     -0.6937
3rd     1      x2x       0.2740     -0.5924
3rd     2      12x       0.0548     -0.4551
3rd     2      x2x       0.0363     -0.3352
----------------------------------------------------
Home    0      1x3       0.0767     -1.2919
Home    0      x23       0.1305     -1.3258
Home    0      xx3       0.0885     -1.1544
Home    1      1x3       0.3385     -0.9717
Home    1      xx3       0.3088     -0.8664
Home    1      x23       0.2741     -1.0950
Home    2      123       0.6663     -0.7888
Home    2      1x3       0.7323     -0.5099
Home    2      x23       0.7359     -0.5993
Home    2      xx3       0.7404     -0.3715
There are a couple of points regarding this matrix you should keep in mind. First, both the safe and out run values are calculated on the assumption that other runners will not advance. In other words, the run values assigned to the situation where the runner attempts to steal second with nobody out and runners on first and third assume that the runner on third will remain at third regardless of the outcome of the stolen base attempt. In that scenario, the run values are calculated as
Safe Run Value = (RunExp for x23/0) - (RunExp for 1x3/0)
        0.1857 = 2.0300 - 1.8443

Out Run Value = (RunExp for xx3/1) - (RunExp for 1x3/0)
      -0.8660 = 0.9783 - 1.8443
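For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of that subtraction. The three Run Expectancy values are the ones quoted in the example; the dictionary layout and the function name `steal_run_values` are my own illustrative choices, not part of any published tool.

```python
# Run Expectancy values quoted in the text (2000-2005 average matrix).
# Only the three base/out states used in the worked example are included.
RUN_EXP = {
    ("1x3", 0): 1.8443,  # runners on first and third, nobody out
    ("x23", 0): 2.0300,  # runner from first is safe at second
    ("xx3", 1): 0.9783,  # runner from first is out, runner on third holds
}


def steal_run_values(start, safe_end, out_end, run_exp=RUN_EXP):
    """Return (safe_value, out_value) for one steal situation.

    Each argument is a (bases, outs) key. Other runners are assumed
    to hold their bases, as described in the text.
    """
    base = run_exp[start]
    return run_exp[safe_end] - base, run_exp[out_end] - base


safe_value, out_value = steal_run_values(("1x3", 0), ("x23", 0), ("xx3", 1))
print(f"safe {safe_value:+.4f}, out {out_value:+.4f}")
```

Run as written, this reproduces the 0.1857 and -0.8660 shown for a steal of second with runners on first and third and nobody out.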
The ending base/out states of x23/0 and xx3/1 do not cover all of the possibilities. Throwing errors by catchers, dropped balls by fielders, passed balls, and wild pitches all change the ending state on particular plays.
For example, in the data set used for this article, there were 11 plays out of 261 opportunities in this particular scenario where an error or wild pitch allowed the runner on third to score. That extra advancement is not credited to the offense, since it was the product of a defensive lapse that the offense essentially had no control over. As mentioned previously, we’re attempting to quantify the theoretical number, not the actual number of runs contributed. In the same way the runner is not debited when the batter strikes out during a stolen base attempt or–in the case of the Phillies’ Placido Polanco on September 3, 2002–when both the batter (Marlon Anderson) strikes out and the runner on third (Travis Lee) is subsequently thrown out at home as part of a triple play.
If what we were attempting to measure was the strategic value of stolen base attempts, we would factor in all the actual outcomes, and even try to assess the influence of errors for particular runners.
Second, what’s obvious from the table is that the cost of getting caught stealing far outweighs the benefit of a successful steal in all cases except one: attempting to steal home with two outs and runners on first and third, second and third, or just third. Unfortunately, that’s a pretty low-percentage play, and it was successful just 42 times in 121 attempts from 2000 through 2005.
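One way to put the cost-versus-benefit comparison on a single scale is to ask what success rate a runner needs in a given situation just to break even. The sketch below assumes the simple model in which a success rate p breaks even when p times the safe value plus (1 - p) times the out value equals zero. The function name and the three situations picked from the matrix are illustrative choices.

```python
def break_even_rate(safe_value, out_value):
    """Success rate p at which p*safe_value + (1 - p)*out_value equals zero."""
    cost = -out_value  # out_value is negative, so cost is positive
    return cost / (safe_value + cost)


# (safe, out) run values copied from three rows of the matrix.
SITUATIONS = {
    "2nd, 0 out, 1xx": (0.2396, -0.6339),
    "3rd, 1 out, x2x": (0.2740, -0.5924),
    "Home, 2 out, xx3": (0.7404, -0.3715),
}

for label, (safe, out) in SITUATIONS.items():
    print(f"{label}: break-even {break_even_rate(safe, out):.1%}")
```

Under that model, stealing second with nobody out and the bases otherwise empty needs a success rate of roughly 73%, while the two-out steal of home with only a runner on third needs roughly 33%.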
This latter point is what usually leads statheads like me to decry the running game as overvalued, and therefore at best an inefficient way to score runs, and at worst a real detriment to doing so depending on your success rate. The case was very well made by Dayn Perry in his book Winners. The average run value for all stolen base attempts (not pickoffs) over the entire data set was actually -0.041662 runs. In other words, the average stolen base attempt in the major leagues actually cost the team runs, since the success rate of exactly 67% was a shade under the rate needed to break even. Breaking it down further in the following two tables, we can see that attempted steals of home are especially costly, while steals of second with two outs are the least costly.
2000-2005 Avg Run Value By Stolen Base Attempted

                   Avg
SB Att   Steal     Run Value
20058    Second    -0.03534
 2838    Third     -0.03587
  314    Home      -0.49763
2000-2005 Avg Run Value By Number of Outs and Stolen Base Attempted

                          Avg
SB Att   Steal     Outs   Run Value
 5154    Second    0      -0.0582
 7213    Second    1      -0.0549
 7691    Second    2      -0.0017
  424    Third     0      -0.1215
 1624    Third     1      -0.0230
  790    Third     2      -0.0164
   10    Home      0      -1.0317
  179    Home      1      -0.7646
  125    Home      2      -0.0726
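As a sanity check, the per-base averages in the first table can be weighted by their attempt counts to recover the overall average quoted earlier. This is a minimal sketch using only the published, rounded figures, so the result can differ from -0.041662 in the last digit or two.

```python
# (attempts, average run value) by base attempted, copied from the first table.
BY_BASE = {
    "Second": (20058, -0.03534),
    "Third": (2838, -0.03587),
    "Home": (314, -0.49763),
}

total_attempts = sum(n for n, _ in BY_BASE.values())
total_runs = sum(n * avg for n, avg in BY_BASE.values())
overall_avg = total_runs / total_attempts

print(total_attempts)          # matches the 23,210 attempts cited earlier
print(round(total_runs, 1))    # total runs lost across all attempts
print(round(overall_avg, 4))   # weighted average run value per attempt
```

The weighted average comes out to about -0.0417 runs per attempt, which works out to roughly 967 runs lost across the six seasons.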
Keep in mind, this doesn’t even consider the disruption that stolen base attempts have on the hitter at the plate, as discussed in The Book.
What this analysis misses is the differentiation between the stolen base as a general purpose weapon versus a tactical gambit. Although it’s been a tough week for Red Sox fans–as documented by Joe Sheehan and Jim Baker–one need only think back to Game 4 of the 2004 American League Championship Series. It’s the bottom of the ninth, with the Sox trailing 4-3 with Mariano Rivera on the mound. After a Kevin Millar walk, Dave Roberts pinch-ran, and everyone watching knew what the plan was. Roberts stole second on the first pitch, and the rest is history.
In that case, the strategic situation (the pitcher, the score, and the need to plate a single run rather than maximize the number of runs scored in the remainder of the inning) suggested that a stolen base attempt was called for. Two ways to attempt to capture this other dimension are to use a scoring probability matrix rather than a Run Expectancy matrix, and to use Keith Woolner’s Win Expectancy Matrix, as Keith himself did in an essay for Baseball Prospectus 2006. The point to take away here is that the metric we’re creating for this analysis, like the others, measures this aspect of baserunning in terms of its general benefit to run scoring. (Remember, think DePodesta’s mantra: “be the house.”)
Although not shown in the matrix, there are a few additional rows used in the calculations to deal with the times when runners are picked off in various base/out combinations. For these rows, we make the same assumptions as we do for caught stealings, namely that other runners would not have advanced. With the matrix established, we can assign the values to each of the opportunities. All that’s left is to sum them up to derive an Equivalent Stolen Base Runs (EqSBR) metric that we can add to our toolbox.
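The summing step is simple enough to sketch in a few lines of Python. Everything about the data layout here is an assumption made for illustration: the event tuples, the function name `eqsbr`, and the invented "Runner A" are hypothetical, and only three rows of the matrix are included. The pickoff rows are left out because their values aren't shown in the matrix.

```python
from collections import defaultdict

# A few rows of the matrix: (base attempted, outs, bases) -> (safe, out).
RUN_VALUES = {
    ("2nd", 0, "1xx"): (0.2396, -0.6339),
    ("2nd", 2, "1xx"): (0.0930, -0.2422),
    ("3rd", 1, "x2x"): (0.2740, -0.5924),
}


def eqsbr(events, run_values=RUN_VALUES):
    """Sum run values per runner.

    Each event is a tuple (runner, base, outs, bases, safe), where
    safe is True for a stolen base and False for a caught stealing.
    """
    totals = defaultdict(float)
    for runner, base, outs, bases, safe in events:
        safe_value, out_value = run_values[(base, outs, bases)]
        totals[runner] += safe_value if safe else out_value
    return dict(totals)


# Hypothetical events for an invented runner, purely for illustration.
events = [
    ("Runner A", "2nd", 0, "1xx", True),
    ("Runner A", "2nd", 2, "1xx", True),
    ("Runner A", "3rd", 1, "x2x", False),
]
print(eqsbr(events))
```

Note how a single caught stealing at third wipes out both successful steals of second: the invented runner ends up at about -0.26 runs despite going two for three.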
If you’ve hung on this long you’re probably ready for some numbers, so let’s start by taking a look at the top and bottom ten in EqSBR for 2005.
Top and Bottom 10 for 2005

Name                 SB Att   PO   CS   EqSBR
Alfonso Soriano        32      0    2    4.92
Johnny Damon           19      0    1    2.84
Kenny Lofton           24      0    3    2.64
Jason Bay              22      0    1    2.39
Rafael Furcal          56      1   10    2.36
Jimmy Rollins          47      3    6    2.24
Torii Hunter           30      0    7    2.18
Marcus Giles           18      0    3    2.04
Willie Bloomquist      15      1    1    1.74
Jose Reyes             75      2   15    1.73

Brad Wilkerson         17      2   10   -6.23
Oscar Robles            8      0    8   -4.26
Brady Clark            23      0   13   -4.20
Jeromy Burnitz          9      3    4   -4.02
Juan Rivera            10      0    9   -3.76
Jeremy Reed            22      1   11   -3.72
Randy Winn             29      1   11   -3.69
Nick Johnson           11      1    8   -3.43
Jerry Hairston         17      0    9   -3.32
Luis Matos             26      2    9   -3.06
As you can see, the range here is on the order of roughly +5 to -5 runs, or the equivalent of about a win’s difference between the best and worst. Also keep in mind that although Jason Bay was 21 for 22 and Johnny Damon was 18 for 19, and neither was picked off, Damon scores higher for two reasons. First, when Bay did get caught, it came with a runner on third, which cost him a bit more. Second, three of Bay’s stolen bases were of third with two outs, which are worth relatively little from a Run Expectancy standpoint because a runner on second with two outs is worth almost as much as a runner on third with two outs, Joe Morgan’s oft-heard commentary about the many additional ways to score from third aside.
Next let’s take a look at the seasonal leaders and trailers during the 2000-2005 period.
Seasonal Leaders and Trailers for 2000-2005

Year   Name                 SB Att   PO   CS   EqSBR
2000   Eric Young             59      1    7    5.00
2001   Derek Jeter            29      0    3    3.21
2002   Derek Jeter            35      0    3    4.20
2003   Carl Crawford          64      2   10    4.76
2004   Dave Roberts           41      1    3    4.21

2000   Vladimir Guerrero      19      3   10   -4.87
2001   Vladimir Guerrero      50      2   16   -5.40
2002   Cristian Guzman        23      0   13   -4.44
2003   Luis Castillo          39      1   19   -5.77
2004   Juan Pierre            69      4   24   -6.61
One of the interesting aspects to these lists is that although Vladimir Guerrero had a close to break-even success rate of 68% in 2001 (34 of 50) he was credited with a whopping -5.4 runs. A breakdown of the events in which he was negatively credited helps explain why.
Next Base   Outs   Bases   Play Code       Run Value
Home        1      1x3     POCSH(12)       -0.9717
Home        1      xx3     K+PO3(25)/DP    -0.8664
2nd         1      1x3     CS2(24)         -0.8424
3rd         1      12x     CS3(25).1-2     -0.6937
3rd         1      12x     CS3(25).1-2     -0.6937
2nd         0      1xx     POCS2(16)       -0.6339
2nd         0      1xx     CS2(26)         -0.6339
2nd         0      1xx     CS2(24)         -0.6339
2nd         1      12x     CS2(24).2-3     -0.6007
2nd         2      1x3     PO1(13)         -0.5099
Home        2      1x3     POCSH(132)      -0.5099
3rd         2      12x     CS3(25)         -0.4551
2nd         1      1xx     CS2(24)         -0.4405
2nd         1      1xx     POCS2(136)      -0.4405
2nd         1      1xx     POCS2(1363)     -0.4405
2nd         2      1xx     POCS2(136)      -0.2422
2nd         2      1xx     CS2(26)         -0.2422
2nd         2      1xx     POCS2(136)      -0.2422
As you can see, he was caught stealing home twice, and picked off of third another time, in addition to being thrown out at third twice with a runner on first. All told, those 18 negative events put him over ten runs in a hole that his 34 successful stolen bases couldn’t make up for.
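To check the "over ten runs" figure, we can simply add up the 18 run values from the breakdown. The sketch below also backs out what Guerrero's successful steals must have been worth, under the assumption (mine, for illustration) that his season total is just the positive events plus these negative ones.

```python
# Run values of the 18 negative events listed in the breakdown.
NEGATIVE_EVENTS = [
    -0.9717, -0.8664, -0.8424, -0.6937, -0.6937, -0.6339,
    -0.6339, -0.6339, -0.6007, -0.5099, -0.5099, -0.4551,
    -0.4405, -0.4405, -0.4405, -0.2422, -0.2422, -0.2422,
]
SEASON_EQSBR = -5.40  # Guerrero's 2001 total from the seasonal table

negative_total = sum(NEGATIVE_EVENTS)
# Assumption: the season total is just positives plus negatives.
implied_positive_total = SEASON_EQSBR - negative_total

print(len(NEGATIVE_EVENTS))
print(round(negative_total, 2))
print(round(implied_positive_total, 2))
```

The negatives sum to about -10.09 runs, which implies that the 34 successful steals were worth only about 4.69 runs in total, or a bit under 0.14 runs apiece.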
This also shows that although the Marlins featured a combination known for their speed at the top of the order, that speed in the aggregate often cost the team runs. Juan Pierre was the leader in getting picked off, 15 times, from 2000 through 2005. In total, Pierre came in with -8.85 EqSBR, while Luis Castillo came in at -8.21.
Finally, keep in mind that in the play-by-play data, the only evidence of a failed hit-and-run is most often the caught stealing that accompanies it. As a result, players who are on the front end of these scenarios more frequently than others will tend to fare a little worse than they otherwise deserve.
Let’s round out this column with the leaders and trailers in EqSBR for the entire six-year period.
Leaders and Trailers for 2000-2005

Name                 SB Att   PO   CS   EqSBR
Carlos Beltran        195      9   21   11.75
Derek Jeter           149      0   24   10.59
Johnny Damon          212      5   42    9.03
Carl Crawford         204      9   38    8.53
Tom Goodwin           140      7   25    5.15
Doug Glanville        104      1   17    5.00
Scott Podsednik       218      9   46    4.52
Darin Erstad          134      2   26    4.10
Craig Biggio           76      1   15    3.40
Roberto Alomar        109      3   18    3.30

Brad Wilkerson         77      4   35  -15.80
Vladimir Guerrero     170      9   55  -15.05
Jeromy Burnitz         60      5   29  -14.20
Jason Kendall         125      1   52  -13.42
Alex Sanchez          180      7   58  -11.31
Neifi Perez            59      4   28  -11.03
Fernando Vina          80      5   31  -10.37
Ray Durham            138      9   44  -10.24
Jose Hernandez         35      1   21  -10.18
Cristian Guzman       146      3   49   -9.96
Now that we’ve covered advancing on outs and the running game, we can make a final push by including advancing on hits to come up with our “total baserunning metric.” Stay tuned.