Forecasting Stolen Base Success Rates: Part One
September 16, 2009
A lot of sabermetric research has gone into understanding when to steal and when not to. Most of this research focuses on the number of outs in the inning and the current position of the runners, calculating the increased run expectancy of a successful attempt versus the decreased run expectancy of an unsuccessful one. I direct the new reader to Joe Sheehan's article in the Baseball Prospectus Basics series as a great primer on this concept. Almost all of the research ends up with statements like, "If the success rate for a stolen base is at least X percent in situation Y, then it's a good tactic to steal in this situation." However, very little research has been devoted to understanding what the success rate X is likely to be; it is usually assumed that simply looking at the runner's historical success rate is a good enough measure.

Understanding the Context

In The Book, the authors state that everything about baseball is about context. Context can include every single aspect of any given situation: the pitcher, the catcher, the baserunner, the batter, the home park, the inning, the score differential, and whether Glenn Close is standing up in the bleachers wearing white as the sun streams through her hat to create an angelic vision for inspiration. However, our minds can go numb thinking of all the possible variables, so as a first step it's important to hypothesize about the likely key factors, and then go from there. To start, let's assume that the stolen base is largely dependent on three players: the pitcher, the catcher, and the baserunner.

The Approach

I pulled each situation from 2004-2008 that met the following criteria:
This filter resulted in 12,095 stolen-base attempts. My next step was to split those attempts into two sets: those that occurred from 2004-07 (9,700 attempts) in a "training" set, and those from 2008 (2,395 attempts) in a "test" set. My goal was to see how accurately we could predict stolen-base success rates in 2008 from the previous four years of data, by developing a model that says when runner A is on first and pitcher B and catcher C are the battery in game situation Y, the likely success rate of an attempted steal will be X percent. To predict the success rate, we could have done one of three simple things:
For a more sophisticated method, I'm going to use log5 methodology to combine the pitcher's, catcher's, and runner's previous statistics into another estimation. However, before we build any of these models, we need to handle the issue of small sample sizes.

Dealing with Small Sample Sizes: Regression to the Mean

To handle small sample sizes, we employ a technique called regression to the mean, which is based on "true score theory." Basically, we are saying that a player's true ability is some mixture of his observed performance and the average performance of some smartly selected population to which that player belongs. The more data we have about an individual player, the more heavily we weight that player's observed performance. In a situation where a catcher has had only one stolen-base attempt against him, we will likely use a value that is pretty close to the general population's caught-stealing rate, while for a catcher like Jorge Posada (who had 321 attempts in the training set) we will use a value that is pretty close to his historical caught-stealing rate.

One of the keys to properly doing regression to the mean is the base population to which we regress. In my regression to the mean, I compared players to the set of players who were involved in a similar number of attempts, rather than to the overall population. To give an idea of what the success rates were, the tables below present the breakouts based on the number of attempts that the player was involved in from 2004 to 2007. For catchers, there was a relatively narrow spread of success rates, while for runners the spread is rather large. For pitchers and catchers, there's a similar pattern: the players who had many attempts against them had a slightly worse caught-stealing rate than the overall population, likely because opponents realized a weakness and exploited it as much as possible, while players in the middle range of attempts had the best caught-stealing rate.
For catchers this is likely a function of lots of playing time, but fewer attempts due to their throwing prowess.

# of SB Attempts   # of      Total     Success
2004-07            Catchers  Attempts  Rate
150+                16       3,671     77.6%
50-149              47       4,574     73.9%
1-24                84       1,455     75.3%
Total              147       9,700     74.6%

When we look at pitchers, we see a similar pattern as with catchers, but the success rates have a bit more spread.

# of SB Attempts   # of      Total     Success
2004-07            Pitchers  Attempts  Rate
50+                 32       2,122     79.7%
20-49              114       3,278     72.8%
1-19               700       4,300     75.4%

On the other side of the ball (no pun intended), a much starker pattern is abundantly clear with the basestealers: for players who had only a few attempts (some of which were undoubtedly botched hit-and-runs), the success rates were in the low sixties, while runners who had many attempts (50+) had success rates in the low eighties.

# of SB Attempts   # of      Total     Success
2004-07            Runners   Attempts  Rate
50+                 37       3,482     80.7%
25-49               77       2,657     76.5%
15-24               69       1,316     73.5%
5-14               193       1,645     69.1%
1-4                304         600     62.7%

Regressing to the mean and using log5, we can predict a "base" success rate for each stolen-base attempt. To evaluate the efficacy of this model, we would like to see a few things. First, in the aggregate, we want to see whether the predicted stolen-base success rate matches up with what was observed. Second, we want a good amount of separation in the expected stolen-base success rates, i.e., the attempts shouldn't all clump into a central bucket. In the table below, I grouped stolen-base attempts into five-percent buckets based on the predicted success rates. For example, the 80 percent bucket includes all stolen-base attempts where we predict the success rate to be between 77.5 and 82.5 percent. I capped the buckets (but not the success rates of the individual attempts) at 90 percent on the high side and 60 percent on the low side. So, if the model predicts a success rate of only 42.1 percent, that attempt is still put into the 60 percent bucket.
Success # of Predicted Actual
Rate Bucket Attempts Success Success
90% 1,830 91.7% 93.8%
85% 1,543 84.5% 88.4%
80% 1,450 80.0% 84.1%
75% 1,319 75.2% 74.3%
70% 1,049 70.1% 71.9%
65% 800 65.1% 62.1%
60% 1,709 52.5% 46.1%
Total 9,700 75.2% 74.6%
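As an illustration of the mechanics, a minimal sketch of regression to the mean plus an odds-ratio (log5) combination might look like the following. The regression weight of 25 phantom attempts and the example inputs are illustrative assumptions, not the article's actual parameters:

```python
def regress_to_mean(successes, attempts, pop_rate, regression_weight):
    """Blend an observed rate toward a population rate by adding
    'regression_weight' phantom attempts at the population rate."""
    return (successes + pop_rate * regression_weight) / (attempts + regression_weight)

def log5_success(runner_rate, pitcher_rate, catcher_rate, league_rate):
    """Combine the three regressed rates via the odds-ratio (log5) method:
    multiply each party's odds, divide out the league odds twice, and
    convert back to a probability."""
    odds = lambda p: p / (1.0 - p)
    combined = (odds(runner_rate) * odds(pitcher_rate) *
                odds(catcher_rate) / odds(league_rate) ** 2)
    return combined / (1.0 + combined)

# Hypothetical example: a good basestealer (40-for-48) facing an average
# battery, with his record regressed toward the 74.6% league rate.
runner = regress_to_mean(40, 48, 0.746, 25)
prediction = log5_success(runner, 0.746, 0.746, 0.746)
```

A useful sanity check for any log5-style combination: when every input equals the league rate, the prediction collapses back to the league rate.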
It seems that the actual rates have a little more spread than what our regressed log5 model predicts. For the attempts most likely to succeed, our model predicts a slightly lower success rate than observed, while at the lower success levels, the model slightly overestimates the expected success rate. This could mean that the regression to the mean is too strong, or that there are other situational factors at work that this base model does not capture. With that said, one nice thing is that there is a relatively even distribution of attempts across many buckets, rather than a clumping into the central 75 percent bucket. We're going to look at a number of situations to determine if we can improve this model with a better understanding of the situation.

Determining Situational Effects

We're going to look at four possible aspects, and give a hypothesis on why each may affect the situation:
Batter Quality

In the BP Idol competition, I suggested that the batter (or at least the batter's handedness) is largely irrelevant to stolen-base success rates. However, I did not look at the mere menacing presence of the hitter in the box. To model the batter's hitting prowess, I used the batter's OPS over the previous 365 calendar days as a proxy for the perceived danger to the defense. For those players who had fewer than 200 PAs in the past 365 days, OPS was regressed toward an average replacement player. I then grouped the hitters into three buckets: Bad (.720 OPS or below), Good (.830 OPS and above), and Average (everyone else). I selected these breakpoints because they put roughly 50 percent of the attempts in the average bucket, 25 percent in the bad bucket, and 25 percent in the good bucket.

Hitter    # of      Predicted  Actual
Bucket    Attempts  Success    Success
Bad       2,740     73.1%      71.1%
Average   4,412     75.1%      75.4%
Good      2,548     77.4%      80.3%

There are two things to note here. First, the standard logic of not sending a runner with a good hitter at the plate plays out in the data: when a good hitter is up, teams seem to send the runner only when the likely success rate is going to be high anyway (77.4 percent, compared to 75.1 percent). Second, and more importantly, even with this increased likelihood of success, the actual success rate is higher still, suggesting that the battery is likely focusing more on the hitter than on the runner. The decreased success rate relative to predicted with a bad hitter up is technically not statistically significant, but it is really close.
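Significance claims like the one above can be checked with a simple one-sample binomial z-test, comparing the observed success count in a bucket against what the model predicts. This is a simplified stand-in for whatever test the article actually used, since a proper treatment would account for each attempt's individual predicted rate:

```python
import math

def z_vs_predicted(successes, attempts, predicted_rate):
    """Z-score of the observed success count against a binomial model
    at the predicted rate; |z| > ~1.96 suggests significance at the
    5% level."""
    expected = attempts * predicted_rate
    sd = math.sqrt(attempts * predicted_rate * (1.0 - predicted_rate))
    return (successes - expected) / sd
```

Because the real model assigns a different predicted rate to every attempt, the pooled variance here is only approximate; the exact variance would be a sum over the individual attempts, which is likely why a pooled check can disagree slightly with the article's significance calls.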
Game Score Differential

I put the game score differential into one of six buckets to help identify any high-level trends:

Score Bucket          Definition
Blowout               Either team up by 5 or more runs
Offense Comfortable   Team on offense is leading by 3 or 4 runs
Offense Slim          Team on offense is leading by 1 or 2 runs
Defense Comfortable   Team on offense is trailing by 3 or 4 runs
Defense Slim          Team on offense is trailing by 1 or 2 runs
Tied                  Game is tied

The comparison of the predicted success rates and the actual success rates is as follows:

Score                 # of      Predicted  Actual
Bucket                Attempts  Success    Success
Blowout               445       76.7%      84.3%
Offense Comfortable   1,307     74.2%      75.7%
Offense Slim          2,511     74.2%      72.6%
Defense Comfortable   512       78.0%      83.8%
Defense Slim          1,749     75.5%      76.7%
Tied                  3,176     75.6%      74.5%

The only two situations showing any statistical significance are the blowout and the defense holding a comfortable lead. In both situations, the defense is likely focusing on just getting outs, and therefore has more of its attention tuned to the batter.

The Count

In a great blog post by Joe Posnanski, an old baseball scout told him that all the secrets and intrigues of baseball can be anticipated by following the count. Is this true for the stolen base as well? I grouped counts into one of three buckets. Essentially, all early counts and the full count are neutral, all two-strike counts (except full) are pitcher's counts, and all two- and three-ball counts (except those with two strikes) are hitter's counts. As you will see, there's a slight decrease in success rate compared to predicted in hitter's counts (likely because the defense is expecting the stolen base), while in pitcher's counts there is a significant bump in success rates, likely because the battery is focusing on getting the hitter out.
Count                            # of      Predicted  Actual
Bucket    Definition             Attempts  Success    Success
Neutral   0-0, 1-0, 1-1,         6,718     75.6%      75.2%
          0-1, 3-2
Hitter    2-0, 2-1, 3-0, 3-1     1,335     74.8%      73.0%
Pitcher   0-2, 1-2, 2-2          1,647     73.9%      78.4%
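The count grouping in the table above can be expressed as a small classifier. The bucket names and logic follow the article's definitions, though the function itself is just an illustrative sketch:

```python
def count_bucket(balls, strikes):
    """Classify a ball-strike count per the article's grouping:
    the full count is neutral, other two-strike counts are pitcher's
    counts, other two- and three-ball counts are hitter's counts,
    and everything else is neutral."""
    if balls == 3 and strikes == 2:
        return "Neutral"
    if strikes == 2:
        return "Pitcher"
    if balls >= 2:
        return "Hitter"
    return "Neutral"
```

The order of the checks matters: the full count must be caught before the generic two-strike rule, and the two-strike rule before the two-or-more-balls rule, so that 2-2 lands in the pitcher's bucket rather than the hitter's.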
The Number of Outs

Lastly, I considered how the number of outs affected the success rate. The table below shows the differences between predicted and actual stolen-base success, based on the number of outs in the inning:

# of   # of      Predicted  Actual
Outs   Attempts  Success    Success
0      2,538     76.0%      73.7%
1      3,220     75.0%      74.4%
2      2,942     74.8%      77.4%

With zero outs, the defense focuses more on the runner and the success rate is lower than predicted, but with two outs the defense is more focused on getting the last out from the hitter, and the success rate is higher than predicted.

Takeaways
When we lay out all the data, it seems that we have been able to determine what is driving stolen-base success rates. However, the true test is predicting future success rates. In the next part, using only the data from 2004-2007, we will see how our model does at predicting what the success rates were in 2008, and similarly what we would have predicted about 2009 given 2004-08 data.

Tim Kniker is a contributor to Baseball Prospectus.
8 comments have been left for this article.

Great job!!!! Quite often, in-depth articles of this nature tend to fall a bit flat and I lose focus on them. Your article managed to avoid that. I have only one quibble with it: I dispute your handling of the count. I think you should have had four buckets instead of three, for a very good reason: the 3-2 count is not normally a count where hitters steal bases, but it is a time when many runners are going on the pitch, even if they aren't trying to steal.
Indeed, with a 3-2 count and 2 outs, it's impossible to steal the base, as a strike ends the inning, a ball forces a walk, and anything in play either ends the inning or puts the batter on base. On a 3-2 count with fewer than 2 outs, whether a runner goes or not has very little to do with stealing, and everything to do with avoiding double plays. The only way to either steal or be caught stealing on a 3-2 count is for the batter to strike out, which is already a negative outcome.
So I would separate it into neutral counts: 0-0, 1-0, 1-1, and 2-2
hitter's counts: 2-0, 2-1, 3-0, 3-1
pitcher's counts: 0-2, 1-2, 0-1
In some ways it doesn't really matter, since from 2004 to 2007 there was only one instance of a recorded stolen-base attempt on a 3-2 count, and none in 2008. So yes, I probably should not have included it, but it doesn't change the results significantly.