September 16, 2009
Forecasting Stolen Base Success Rates
A lot of sabermetric research has gone into understanding when to steal, and when not to. Most of this research takes the number of outs in the inning and the current position of the runners, and weighs the increased run expectancy of a successful attempt against the decreased run expectancy of an unsuccessful one. I direct the new reader to Joe Sheehan's article in the Baseball Prospectus Basics series as a great primer on this concept. Almost all of the research ends up with statements like, 'If the success rate for a stolen base is at least X percent in situation Y, then it's a good tactic to steal in this situation.' However, very little research has been devoted to understanding what the success rate X is likely to be; instead, it has been assumed that simply looking at the runner's success rate is a good enough measure.
Understanding the Context
In The Book, the authors state that everything about baseball is about context. Context can include every single aspect of any given situation: the pitcher, the catcher, the baserunner, the batter, the home park, the inning, the score differential, and whether Glenn Close is standing up in the bleachers wearing white as the sun streams through her hat to create an angelic vision for inspiration. However, our mind can go numb thinking of all the possible variables, so as a first step, it's important to hypothesize about the likely key factors, and then go from there. To start, let's assume that the stolen base is largely dependent on three players: the pitcher, the catcher, and the baserunner.
I pulled each situation from 2004-2008 that met the following criteria:
This filter resulted in 12,095 stolen-base attempts. My next step was to put those attempts into two sets: those that occurred from 2004-07 (9,700 attempts) in one set called a "training set," and those from 2008 (2,395 attempts) in a second set, called a "test" set. My goal was to see how accurately we could predict stolen-base success rates in 2008 from the previous four years of data, by developing a model that says when runner A is on first and pitcher B and catcher C are the battery in game situation Y, the likely success rate of an attempted steal will be X percent.
To predict the success rate, we could have done one of three simple things:
For a more sophisticated method, I'm going to use the log5 methodology to combine the pitcher's, catcher's, and baserunner's previous statistics into another estimate. However, before we build any of these models, we need to handle the issue of small sample sizes.
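As a rough illustration of how log5 can combine three players, here is a minimal sketch using the odds-ratio form of the method. The article does not spell out its exact formula, so the three-way generalization, the function names, and the choice to express every input as a stolen-base success rate (for pitchers and catchers, the success rate of runners against them) are assumptions made for this example.

```python
def to_odds(p: float) -> float:
    """Convert a probability strictly between 0 and 1 into odds."""
    return p / (1.0 - p)


def log5_success_rate(runner: float, pitcher: float, catcher: float,
                      league: float) -> float:
    """Combine three success rates with the odds-ratio form of log5.

    Assumption: each player's odds are taken relative to the league odds,
    and the three ratios are multiplied onto the league odds. This is a
    common generalization of two-way log5, not necessarily the author's.
    """
    league_odds = to_odds(league)
    combined_odds = league_odds
    for rate in (runner, pitcher, catcher):
        combined_odds *= to_odds(rate) / league_odds
    # Convert the combined odds back into a probability.
    return combined_odds / (1.0 + combined_odds)
```

Under this form, a league-average pitcher and catcher leave the runner's own rate unchanged, while a battery that is easy to run on pushes the estimate above the runner's rate.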
Dealing with Small Sample Sizes: Regression to the Mean
To handle small sample sizes, we employ a technique called regression to the mean, which is based on 'true score theory.' Basically, we are saying that a player's true ability is some mixture of his observed performance and the average performance of some smartly selected population to which that player belongs. The more data we have about the individual player, the more heavily we weight the player's observed performance. In a situation where the catcher has only had one stolen-base attempt against him, we will likely use a value that is pretty close to the general population's caught-stealing rate, while for a catcher like Jorge Posada, who had 321 attempts in the training set, we will use a value that is pretty close to his historical caught-stealing rate.
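The weighting idea can be sketched with a simple shrinkage estimate. The article does not give its regression formula, so the "ballast" form below and the constant of 50 phantom attempts are hypothetical choices for illustration only; the example also works in success rates rather than caught-stealing rates.

```python
def regress_to_mean(successes: int, attempts: int,
                    population_rate: float, ballast: float = 50.0) -> float:
    """Blend an observed rate with a population rate.

    `ballast` (hypothetical value) is the number of population-average
    attempts mixed into the player's record. With few real attempts the
    estimate stays near `population_rate`; with many it approaches the
    player's observed rate.
    """
    return (successes + ballast * population_rate) / (attempts + ballast)


# One attempt against: the estimate barely moves from the population rate.
one_attempt = regress_to_mean(successes=1, attempts=1, population_rate=0.746)

# 321 attempts against (the success count of 225 is made up for this example):
# the estimate lands close to the observed 225 / 321.
many_attempts = regress_to_mean(successes=225, attempts=321, population_rate=0.746)
```

A larger ballast regresses more heavily; choosing it well depends on how much true talent actually varies within the chosen population.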
One of the keys to doing regression to the mean properly is choosing the base population to which we regress. In my regression to the mean, I compared players to the set of players who were involved in a similar number of attempts instead of the overall population. To give an idea of what the success rates were, the tables below present the breakouts based on the number of attempts that the player was involved in from 2004 to 2007. For catchers, there was a relatively narrow spread of success rates, while for runners the spread is rather large.
For pitchers and catchers, there's a similar pattern. The players who had many attempts against them had a slightly worse caught-stealing rate than the overall population, likely caused by opponents realizing a weakness and exploiting it as much as possible. Players with a middling number of attempts against them had the best caught-stealing rate. For catchers this is likely a function of lots of playing time, but fewer attempts due to their throwing prowess.
# of SB Attempts,
2004-07             # of Catchers    Total Attempts    Success Rate
150+                16               3,671             77.6%
50-149              47               4,574             73.9%
1-24                84               1,455             75.3%
Total               147              9,700             74.6%

When we look at pitchers, we see a similar pattern as with catchers, but the success rates have a bit more spread.

# of SB Attempts,
2004-07             # of Pitchers    Total Attempts    Success Rate
50+                 32               2,122             79.7%
20-49               114              3,278             72.8%
1-19                700              4,300             75.4%
On the other side of the ball (no pun intended), a much starker pattern is clear with the basestealers: for players who had only a few attempts (some of which were undoubtedly botched hit-and-runs), the success rates were in the low sixties, while runners who had many attempts (50+) had success rates in the low eighties.

# of SB Attempts,
2004-07             # of Runners     Total Attempts    Success Rate
50+                 37               3,482             80.7%
25-49               77               2,657             76.5%
15-24               69               1,316             73.5%
5-14                193              1,645             69.1%
1-4                 304              600               62.7%
By regressing to the mean and applying log5, we can predict a "base" success rate for each stolen-base attempt. To evaluate the efficacy of this model, we would like to see a few things. First, in the aggregate, we want to see whether the predicted stolen-base success rate matches up with what was observed. Second, we want to see whether there is a good amount of separation in the expected stolen-base success rates, i.e., that the attempts don't all clump into a central bucket.
In the table below, I grouped stolen-base attempts into five percent buckets based on the predicted success rates. For example, the 80 percent bucket includes all stolen-base attempts where we predict the success rate to be between 77.5 and 82.5 percent. I capped the buckets (but not the success rates of the individual attempts) to 90 percent on the high side and 60 percent on the low. So, if the model predicts a success rate of only 42.1 percent, then that attempt is still put into the 60 percent bucket.
Success Rate Bucket    # of Attempts    Predicted Success    Actual Success
90%                    1,830            91.7%                93.8%
85%                    1,543            84.5%                88.4%
80%                    1,450            80.0%                84.1%
75%                    1,319            75.2%                74.3%
70%                    1,049            70.1%                71.9%
65%                    800              65.1%                62.1%
60%                    1,709            52.5%                46.1%
Total                  9,700            75.2%                74.6%
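The bucketing rule and the predicted-versus-actual comparison can be sketched as follows. The function names and the input layout (pairs of predicted percentage and outcome) are hypothetical, and the treatment of exact boundaries such as 77.5 (placed in the higher bucket here) is an assumption, since the article does not say which side they fall on.

```python
import math
from collections import defaultdict


def rate_bucket(predicted_pct: float) -> int:
    """Map a predicted success rate (in percent) to the nearest 5-point
    bucket, capped at 60 on the low side and 90 on the high side."""
    nearest = 5 * math.floor(predicted_pct / 5 + 0.5)  # round half up
    return max(60, min(90, nearest))


def calibration_table(attempts):
    """Summarize predictions by bucket.

    `attempts` is an iterable of (predicted_pct, was_successful) pairs.
    Returns {bucket: (count, mean predicted pct, actual success pct)}.
    Only the bucket label is capped; the individual predictions are
    averaged uncapped, as in the table above.
    """
    grouped = defaultdict(list)
    for predicted_pct, was_successful in attempts:
        grouped[rate_bucket(predicted_pct)].append((predicted_pct, was_successful))

    table = {}
    for bucket in sorted(grouped, reverse=True):
        rows = grouped[bucket]
        count = len(rows)
        mean_predicted = sum(pct for pct, _ in rows) / count
        actual = 100.0 * sum(1 for _, ok in rows if ok) / count
        table[bucket] = (count, mean_predicted, actual)
    return table
```

Because the cap applies only to the label, the 60 percent bucket can show an average predicted rate well below 60 percent.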
It seems that the actual rates have a little more spread than what our regressed log5 model predicts. For the attempts most likely to succeed, our model predicts a slightly lower success rate than was observed, while at the lower success levels, the model slightly overestimates the success rate. This could be because the regression to the mean is too strong, or because there are other situational factors at work that this base model does not capture. With that said, one nice thing is that the attempts are distributed relatively evenly across many buckets rather than clumping into the central 75 percent bucket. We're going to look at a number of situations to determine whether we can improve this model based on a better understanding of the situation.
Determining Situational Effects
We're going to look at four possible aspects, and give a hypothesis on why this may affect the situation:
In the BP Idol competition, I suggested that the batter (or at least the batter's handedness) is largely irrelevant to stolen-base success rates. However, I did not look at the mere menacing presence of the hitter in the box. To model the batter's hitting prowess, I used the batter's OPS over the previous 365 calendar days as a proxy for how dangerous the defense perceives him to be. For those players who had fewer than 200 PAs in the past 365 days, their OPS was regressed to that of an average replacement player. I then grouped the hitters into three buckets: Bad (720 OPS or below), Good (830 OPS and above), and Average (everyone else). I selected these breakpoints because they put roughly 50 percent of the attempts in the average bucket, 25 percent in the bad, and 25 percent in the good bucket.

Hitter Bucket    # of Attempts    Predicted Success    Actual Success
Bad              2,740            73.1%                71.1%
Average          4,412            75.1%                75.4%
Good             2,548            77.4%                80.3%
There are two things to note here. First, the standard logic of not sending a runner with a good hitter up plays out in these situations: when a good hitter is up, the team seems to send the runner only when the likely success rate is going to be high anyway (77.4 percent, compared to 75.1 percent with an average hitter up). Second, and more importantly, even with this increased likelihood of success, the actual success rate is even higher, suggesting that the battery is likely focusing more on the hitter than the runner. The decreased success rate compared to predicted with a bad hitter up is technically not statistically significant, but it is really close.
Game Score Differential
I put the game score differential into one of six buckets to help identify any high-level trends:
Score Bucket           Definition
Blowout                Either team up by 5 or more runs
Offense Comfortable    Team on offense is leading by 3 or 4 runs
Offense Slim           Team on offense is leading by 1 or 2 runs
Defense Comfortable    Team on offense is trailing by 3 or 4 runs
Defense Slim           Team on offense is trailing by 1 or 2 runs
Tied                   Game is tied
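The bucket definitions translate directly into a small classifier. The function name and its two run-total parameters are hypothetical; only the thresholds come from the definitions above.

```python
def score_bucket(offense_runs: int, defense_runs: int) -> str:
    """Classify the score differential from the batting team's viewpoint."""
    lead = offense_runs - defense_runs
    if abs(lead) >= 5:
        return "Blowout"              # either team up by 5 or more
    if lead >= 3:
        return "Offense Comfortable"  # offense leading by 3 or 4
    if lead >= 1:
        return "Offense Slim"         # offense leading by 1 or 2
    if lead == 0:
        return "Tied"
    if lead >= -2:
        return "Defense Slim"         # offense trailing by 1 or 2
    return "Defense Comfortable"      # offense trailing by 3 or 4
```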
The comparison of the predicted success rates and the actual success rates is as follows:

Score Bucket           # of Attempts    Predicted Success    Actual Success
Blowout                445              76.7%                84.3%
Offense Comfortable    1,307            74.2%                75.7%
Offense Slim           2,511            74.2%                72.6%
Defense Comfortable    512              78.0%                83.8%
Defense Slim           1,749            75.5%                76.7%
Tied                   3,176            75.6%                74.5%
The only two situations where there is any statistical significance are with the blowout or when the defense has a comfortable lead. In both situations, the defense is likely focusing on just getting outs, and therefore has more of its attention tuned to the batter.
In a great blog post by Joe Posnanski, an old baseball scout told him that all the secrets and intrigues of baseball can be anticipated by following the count. Is this true for the stolen base as well?
I grouped counts into one of three buckets. Essentially, all early counts and the full count are neutral, all two-strike counts (except the full count) are pitcher's counts, and all two- and three-ball counts (except those with two strikes) are hitter's counts. As you will see, there's a slight decrease in success rate compared to predicted on hitter's counts (likely because the defense is expecting the stolen base), while on pitcher's counts there is a significant bump in success rates, likely because the battery is focusing on getting the hitter out.

Count Bucket    Definition                 # of Attempts    Predicted Success    Actual Success
Neutral         0-0, 1-0, 1-1, 0-1, 3-2    6,718            75.6%                75.2%
Hitter          2-0, 2-1, 3-0, 3-1         1,335            74.8%                73.0%
Pitcher         0-2, 1-2, 2-2              1,647            73.9%                78.4%
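The three count buckets can be expressed as a short rule. The function name is hypothetical; the mapping follows the definitions in the table above.

```python
def count_bucket(balls: int, strikes: int) -> str:
    """Classify a ball-strike count as Pitcher, Hitter, or Neutral."""
    if strikes == 2 and balls < 3:
        return "Pitcher"  # 0-2, 1-2, 2-2
    if balls >= 2 and strikes < 2:
        return "Hitter"   # 2-0, 2-1, 3-0, 3-1
    return "Neutral"      # 0-0, 1-0, 0-1, 1-1, and the full count
```

Checking the two-strike rule first keeps 2-2 in the pitcher bucket, and excluding three balls from it leaves the full count neutral.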
The Number of Outs
Lastly, I considered how the number of outs affected the success rate. The table below shows the differences between predicted and actual stolen-base success, based on the number of outs in the inning:

Outs    # of Attempts    Predicted Success    Actual Success
0       2,538            76.0%                73.7%
1       3,220            75.0%                74.4%
2       2,942            74.8%                77.4%

With zero outs, the defense is more focused on the runner and the success rate is lower than predicted, but with two outs the defense is more focused on getting the last out from the hitter, and the success rate is higher than predicted.
With all the data laid out, it seems that we have been able to identify what is driving stolen-base success rates. However, the true test is predicting future success rates. In the next part, using only the data from 2004-07, we will see how well our model predicts the 2008 success rates, and similarly what we would have predicted about 2009 given the 2004-08 data.
Tim Kniker is a contributor to Baseball Prospectus.