A lot of sabermetric research has gone into understanding when to steal, and when not to. Most of this research focuses on the number of outs in the inning and the current position of the runners, calculating the increased run expectancy of a successful attempt versus the decreased run expectancy of an unsuccessful one. I direct the new reader to Joe Sheehan's article in the Baseball Prospectus Basics series as a great primer on this concept. Almost all of the research ends up with statements like, "If the success rate for a stolen base is at least X percent in situation Y, then it's a good tactic to steal in this situation." However, very little research has been devoted to understanding what the success rate X is likely to be, or it has been assumed that the runner's historical success rate alone is a good enough measure for that.

Understanding the Context

In The Book, the authors state that everything about baseball is about context. Context can include every single aspect of any given situation: the pitcher, the catcher, the baserunner, the batter, the home park, the inning, the score differential, and whether Glenn Close is standing up in the bleachers wearing white as the sun streams through her hat to create an angelic vision for inspiration. However, our mind can go numb thinking of all the possible variables, so as a first step, it’s important to hypothesize about the likely key factors, and then go from there. To start, let’s assume that the stolen base is largely dependent on three players: the pitcher, the catcher, and the baserunner.

The Approach

I pulled each situation from 2004-2008 that met the following criteria:

  • A steal of second base was attempted (defensive indifference excluded) and no pickoff occurred (i.e., a pitch was delivered to the catcher);
  • There were no other runners on base except the runner on first.

This filter resulted in 12,095 stolen-base attempts. My next step was to put those attempts into two sets: those that occurred from 2004-07 (9,700 attempts) in one set called a “training set,” and those from 2008 (2,395 attempts) in a second set, called a “test” set. My goal was to see how accurately we could predict stolen-base success rates in 2008 from the previous four years of data, by developing a model that says when runner A is on first and pitcher B and catcher C are the battery in game situation Y, the likely success rate of an attempted steal will be X percent.

To predict the success rate, we could take one of three simple approaches:

  • The success rate against the catcher is the best predictor of future stolen-base success;
  • The success rate against the pitcher is the best predictor of future stolen-base success;
  • The success rate of the baserunner is the best predictor of future stolen-base success.

For a more sophisticated method, I'm going to use log5 methodology to combine the pitcher's, catcher's, and baserunner's previous statistics into another estimation. However, before we build any of these models, we need to handle the issue of small sample sizes.
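The log5 combination can be sketched as follows. This is the generic odds-ratio form of log5, not the author's actual code; the function name and the league-average parameter are my own choices.

```python
def log5_combine(p_runner, p_catcher, p_pitcher, p_league):
    """Combine the runner's, catcher's, and pitcher's historical
    stolen-base success rates into one expected rate by multiplying
    their odds ratios relative to the league average."""
    def odds(p):
        return p / (1.0 - p)
    combined = (odds(p_runner) * odds(p_catcher) * odds(p_pitcher)
                / odds(p_league) ** 2)
    return combined / (1.0 + combined)
```

A sanity check on this form: if every party exactly matches the league average, the prediction is simply the league average, while a fast runner facing a weak battery pushes the estimate above it.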

Dealing with Small Sample Sizes: Regression to the Mean

To handle small sample sizes, we employ a technique called regression to the mean, which is based on "true score theory." Basically, we are saying that a player's true ability is some mixture of his observed performance and the average performance of some smartly selected population to which that player belongs. The more data we have about the individual player, the more heavily we weight the player's observed performance. In a situation where the catcher has had only one stolen-base attempt against him, we will likely use a value that is pretty close to the general population's caught-stealing rate, while for a catcher like Jorge Posada, who had 321 attempts in the training set, we will use a value that is pretty close to his historical caught-stealing rate.
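A minimal sketch of this shrinkage, using a simple "phantom attempts" formulation: the estimate behaves as if the population had contributed some extra attempts at its own rate. The number of phantom attempts (here 100) is a hypothetical constant for illustration; the article does not state the value it used.

```python
def regress_to_mean(successes, attempts, pop_rate, phantom_attempts=100):
    """Estimate a player's true rate by blending his observed rate with
    the population rate, weighting the population as if it contributed
    `phantom_attempts` extra attempts at `pop_rate`."""
    return (successes + pop_rate * phantom_attempts) / (attempts + phantom_attempts)
```

With one real attempt, the estimate barely moves off the population rate; with hundreds of attempts, it sits much closer to the observed rate, which is exactly the behavior described above.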

One of the keys to properly doing regression to the mean is the base population to which we regress. In my regression to the mean, I compared players to the set of players who were involved in a similar number of attempts, instead of the overall population. To give an idea of what the success rates were, the tables below present the breakouts based on the number of attempts that the player was involved in from 2004 to 2007. For catchers, there was a relatively narrow spread of success rates, while for runners the spread is rather large.

For pitchers and catchers, there's a similar pattern. The players who had many attempts against them had a slightly worse caught-stealing rate than the overall population, likely caused by opponents realizing a weakness and exploiting it as much as possible. Players with a middling number of attempts had the best caught-stealing rate. For catchers this is likely a function of lots of playing time but fewer attempts, due to their throwing prowess.

# of SB
Attempts      # of       Total    Success
2004-07      Catchers   Attempts    Rate
 150+          16        3,671     77.6%
50-149         47        4,574     73.9%
 1-49          84        1,455     75.3%
Total         147        9,700     74.6%

When we look at pitchers, we see a similar pattern as with catchers, but the success rates have a bit more spread.

# of SB
Attempts     # of        Total     Success
2004-07    Pitchers    Attempts      Rate
  50+          32        2,122      79.7%
20-49         114        3,278      72.8%
1-19          700        4,300      75.4%
Total         846        9,700      74.6%

On the other side of the ball (no pun intended), a much starker pattern is clear with the basestealers: for players who had only a few attempts (some of which were undoubtedly botched hit-and-runs), the success rates were in the low sixties, while runners who had many attempts (50+) had success rates in the low eighties.

# of SB
Attempts    # of       Total     Success
2004-07    Runners   Attempts      Rate
 50+         37        3,482      80.7%
25-49        77        2,657      76.5%
15-24        69        1,316      73.5%
 5-14       193        1,645      69.1%
 1-4        304          600      62.7%
Total       680        9,700      74.6%

After regressing to the mean and applying log5, we can predict a "base" success rate for each stolen-base attempt. To evaluate the efficacy of this model, we would like to see a few things. First, in the aggregate, we want to see whether the predicted stolen-base success rate matches up with what was observed. Second, we want a good amount of separation in the expected stolen-base success rates, i.e., the attempts don't all clump into a central bucket.

In the table below, I grouped stolen-base attempts into five percent buckets based on the predicted success rates. For example, the 80 percent bucket includes all stolen-base attempts where we predict the success rate to be between 77.5 and 82.5 percent. I capped the buckets (but not the success rates of the individual attempts) to 90 percent on the high side and 60 percent on the low. So, if the model predicts a success rate of only 42.1 percent, then that attempt is still put into the 60 percent bucket.
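The bucketing rule can be written as a small helper. This is a sketch; how exact boundaries like 77.5 percent are rounded is my choice, not something the article specifies.

```python
def success_bucket(p):
    """Round a predicted success rate (a fraction) to the nearest
    5-percent bucket, then cap the bucket at 60 on the low side
    and 90 on the high side."""
    bucket = int(round(p * 100 / 5.0)) * 5
    return min(90, max(60, bucket))
```

So a predicted rate of 42.1 percent lands in the 60 percent bucket, just as described above, even though the underlying prediction is left uncapped.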

Success        # of      Predicted    Actual
Rate Bucket   Attempts    Success    Success
  90%          1,830        91.7%      93.8%
  85%          1,543        84.5%      88.4%
  80%          1,450        80.0%      84.1%
  75%          1,319        75.2%      74.3%
  70%          1,049        70.1%      71.9%
  65%            800        65.1%      62.1%
  60%          1,709        52.5%      46.1%
Total          9,700        75.2%      74.6%

It seems that the actual rates have a little more spread than our regressed log5 model predicts. For the attempts most likely to succeed, our model predicts a slightly lower success rate than observed, while at the lower success levels, the model slightly overestimates the expected success rate. This could mean that the regression to the mean is too strong, or that there are other situational factors at work that this base model does not capture. With that said, one nice thing is that the attempts are distributed relatively evenly across many buckets, rather than clumping into the central 75 percent bucket. We're going to look at a number of situations to determine whether we can improve this model with a better understanding of the context.

Determining Situational Effects

We're going to look at four possible aspects, and for each give a hypothesis on why it may affect the situation:

  • Batter's hitting prowess: With a better hitter at the plate, the pitcher and catcher are more focused on getting the hitter out, allowing the runner a better jump. Our expectation would be that the better the hitter, the higher the success rate.
  • Game score differential: If the score is lopsided, the defense may put more emphasis on getting the batter out than on the slight change in run expectancy from the steal.
  • Count: When the pitcher is ahead in the count, his focus shifts to retiring the batter; thus, in pitcher's counts we would expect the success rate to improve.
  • Outs: With two outs, the inning ends by simply retiring the batter, so less emphasis is placed on the runner, leading to better success rates.

Batter Quality

In the BP Idol competition, I suggested that the batter (or at least the batter's handedness) is largely irrelevant to stolen-base success rates. However, I did not look at the mere menacing presence of the hitter in the box. To model the batter's hitting prowess, I used the batter's OPS over the previous 365 calendar days as a proxy for the perceived danger to the defense. For players with fewer than 200 PAs in the past 365 days, OPS was regressed toward that of an average replacement player. I then grouped the hitters into three buckets: Bad (.720 OPS or below), Good (.830 OPS and above), and Average (everyone else). I selected these breakpoints because they put roughly 50 percent of the attempts in the Average bucket, 25 percent in the Bad bucket, and 25 percent in the Good bucket.
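The bucketing just described can be sketched like this. The .720/.830 breakpoints and the 200-PA cutoff come from the text; the replacement-level OPS of .700 and the linear blend are my assumptions for illustration.

```python
def hitter_bucket(ops, pa, repl_ops=0.700, min_pa=200):
    """Classify a hitter as Bad/Average/Good by trailing-365-day OPS,
    first shrinking low-PA hitters toward a replacement-level OPS
    (repl_ops is a hypothetical value, not the article's)."""
    if pa < min_pa:
        # blend toward replacement level in proportion to missing PAs
        ops = (ops * pa + repl_ops * (min_pa - pa)) / min_pa
    if ops <= 0.720:
        return "Bad"
    if ops >= 0.830:
        return "Good"
    return "Average"
```

Note how the shrinkage matters: a hitter with a 1.000 OPS in only 50 PAs is pulled down into the Average bucket rather than being treated as Good.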

Hitter        # of        Predicted   Actual
Bucket       Attempts     Success     Success
Bad            2,740        73.1%      71.1%
Average        4,412        75.1%      75.4%
Good           2,548        77.4%      80.3%

There are two things to note here. The standard logic of not sending a runner with a good hitter up plays out in which situations teams choose: when a good hitter is up, teams seem to send the runner only when the likely success rate is high anyway (77.4 percent, compared to 75.1 percent). More importantly, even with this increased likelihood of success, the actual success rate is higher still, suggesting that the battery is focusing more on the hitter than on the runner. The decreased success rate relative to predicted with a bad hitter up is technically not statistically significant, but it comes close.

Game Score Differential

I put the game score differential into one of six buckets to help identify any high-level trends:

Score Bucket          Definition
Blowout               Either team up by 5 or more runs
Offense Comfortable   Team on offense is leading by 3 or 4 runs
Offense Slim          Team on offense is leading by 1 or 2 runs
Defense Comfortable   Team on offense is trailing by 3 or 4 runs
Defense Slim          Team on offense is trailing by 1 or 2 runs
Tied                  Game is Tied
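These definitions translate directly into a small classifier; the function name and signature are mine, not the article's.

```python
def score_bucket(offense_runs, defense_runs):
    """Map the current score, viewed from the offense's perspective,
    to the six buckets defined in the table above."""
    diff = offense_runs - defense_runs
    if abs(diff) >= 5:
        return "Blowout"
    if diff == 0:
        return "Tied"
    if diff > 0:
        return "Offense Comfortable" if diff >= 3 else "Offense Slim"
    return "Defense Comfortable" if diff <= -3 else "Defense Slim"
```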

The comparison of the predicted success rates and the actual success rates is as follows:

Score                  # of     Predicted   Actual
Bucket               Attempts    Success    Success
Blowout                 445        76.7%       84.3%
Offense Comfortable   1,307        74.2%       75.7%
Offense Slim          2,511        74.2%       72.6%
Defense Comfortable     512        78.0%       83.8%
Defense Slim          1,749        75.5%       76.7%
Tied                  3,176        75.6%       74.5%

The only two situations where there is any statistical significance are with the blowout or when the defense has a comfortable lead. In both situations, the defense is likely focusing on just getting outs, and therefore has more of its attention tuned to the batter.

The Count

In a great blog post by Joe Posnanski, an old baseball scout told him that all the secrets and intrigues of baseball can be anticipated by following the count. Is this true for the stolen base as well?

I grouped counts into one of three buckets. Essentially, all early counts and the full count are neutral, all two-strike counts (except full) are pitcher's counts, and all two- and three-ball counts (except those with two strikes) are hitter's counts. As you will see, there's a slight decrease in success rate relative to predicted in hitter's counts (likely because the defense is expecting the stolen base), while in pitcher's counts there is a significant bump in success rates, likely because the battery is focusing on getting the hitter out.

Count                          # of       Predicted   Actual
Bucket    Definition           Attempts    Success     Success
Neutral   0-0,0-1,1-0,1-1,3-2   6,718       75.6%      75.2%
Hitter    2-0,2-1,3-0,3-1       1,335       74.8%      73.0%
Pitcher   0-2,1-2,2-2           1,647       73.9%      78.4%
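The grouping rule amounts to a few comparisons. A sketch, treating the full count as neutral per the definitions above:

```python
def count_bucket(balls, strikes):
    """Classify a ball-strike count: two-strike counts (except full)
    favor the pitcher, two- and three-ball counts (except with two
    strikes) favor the hitter, and everything else is neutral."""
    if balls == 3 and strikes == 2:
        return "Neutral"   # full count
    if strikes == 2:
        return "Pitcher"
    if balls >= 2:
        return "Hitter"
    return "Neutral"
```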

The Number of Outs

Lastly, I was considering how the number of outs affected the success rate. The table below shows the differences in predicted stolen-base success and actual, based on the number of outs in the inning:

         # of    Predicted    Actual
Outs   Attempts   Success    Success
0        2,538      76.0%      73.7%
1        3,220      75.0%      74.4%
2        2,942      74.8%      77.4%

With zero outs, the defense pays more attention to the runner and the success rate is lower than predicted, while with two outs the defense is more focused on getting the last out from the hitter, and the success rate is higher than predicted.


Conclusions

  1. The catcher, pitcher, and runner each play a significant part in the success rate of a stolen base.

  2. The quality of the hitter at the plate also affects the likely success rate: the better the hitter, the more likely a successful steal will occur.

  3. In blowouts and when the defense has a comfortable lead, success rates are much higher than predicted.

  4. Success rates are higher with two outs in the inning or two strikes on the batter, likely because the defense is focused on getting an out via the hitter.

When we have all the data laid out, it seems that we have been able to determine what is driving stolen-base success rates. However, the true test is predicting future success rates. In the next part, using only the data from 2004-2007, we will see how our model does at predicting the success rates in 2008, and similarly what we would have predicted for 2009 given 2004-08 data.

Tim Kniker is a contributor to Baseball Prospectus.

Comments
Great job! Quite often, in-depth articles of this nature tend to fall a bit flat and I lose focus on them. Your article managed to avoid that. I have only one quibble with it: I dispute your handling of the count. I think you should have had four buckets instead of three, for a very good reason: the 3-2 count is not normally a count where hitters steal bases, but it is a time when many runners are going on the pitch, even if they aren't trying to steal.

Indeed, with a 3-2 count and two outs, it's impossible to steal the base, as a strike ends the inning, a ball forces a walk, and anything in play either ends the inning or puts the batter on base. On a 3-2 count with less than two outs, whether a runner goes or not has very little to do with stealing and much more to do with avoiding double plays. The only way to either steal or be caught stealing on a 3-2 count is for the batter to strike out, which is already a negative outcome.

So I would separate it into neutral counts: 0-0, 1-0, 1-1, and 2-2;
hitters' counts: 2-0, 2-1, 3-0, 3-1;
pitchers' counts: 0-2, 1-2, 0-1.
In some ways it doesn't really matter, since from 2004-2007 there was only one instance of a recorded stolen-base attempt on a 3-2 count, and none in 2008. So yes, I probably should not have included it, but it doesn't change the results significantly.
Nice to see your column in here, Tim. I like it so far.
Your results showing success rates rising in pitchers' counts are probably as much a reflection of pitch selection as of "focus". Runners will prefer to go on counts where offspeed pitches are more likely to be thrown. It is not just that the defence is more focussed on the hitter; it is that the kinds of pitches thrown to strike a hitter out, or get him to chase at 0-1 (a curve breaking out of the zone, for example), are much better to steal on than, say, a 3-1 fastball that the hitter swings through.

Good point on that. It would be interesting to line this up with a PITCHf/x data set so that we could see whether pitch type/speed/location is the driver.
Good stuff and great to see you writing a series Tim :)

One thought... I don't think many pitchers are aware of what a batter's OPS is when determining whether someone is a good or bad hitter. Maybe a better way to analyze the "menace" is to break buckets based on the number of home runs hit, or straight batting average, or something...
My opinion was that while pitchers/managers don't necessarily know exactly what OPS is, my feeling would be that if you had pitchers divide players into good/average/poor hitter buckets, they would likely fall pretty closely along the lines of OPS.