July 13, 2006
Hit the Ground Running
"Baserunning arrogance is just like pitching arrogance or hitting arrogance. You are a force, and you have to instill that you are a force to the opposition. You have to have utter confidence."
For those of you who watched The Science Channel's program "Baseball's Secret Formula" earlier this week, you'll recall that near the end of the show the focus turned to fielding. The narrator described the analysis and creation of fielding metrics as the "last bastion" for sabermetrics while highlighting the fine work of Baseball Info Solutions and John Dewan. For those of you who haven't caught the program yet, you have one more chance: July 15th at 5pm EST.
While I wouldn't disagree with that notion, and have written as much in the past, there are other areas of the game that perhaps can yield smaller insights but have not been quantified to the degree that we might like. As George Will put it in a recent column in the context of baseball, "the rage to quantify--to reduce reality to measurable units--is an impulse in modern societies."
This week, we'll satisfy some element of that impulse on the subject of baserunning.
A Look Back and a Step Forward
As some readers may recall, both James Click and I have done some work previously on quantifying the benefits and costs of baserunning. If you want to read all about how the methodology I developed works, you can take a look at a series of articles from last summer, or a short summary and how it applies to an individual team in a column I wrote earlier this season.
The key point, however, is that the methodology I employed focuses only on the following scenarios:
In other words, we only credited baserunners with Incremental Runs (IR) when they were on base and a batter following the runner in the lineup got a hit. As you can well imagine, this view of a player's contribution on the bases leaves a little something to be desired. In particular, it ignores the times when the batter did not get a hit and instead hit a ground ball or fly ball that still allowed the runner to use his speed to try to advance. Several other things were not factored in either: stolen bases, caught stealings, and pickoffs, along with aspects like avoiding grounding into double plays and getting thrown out while attempting to stretch a hit. Today, we'll take a small step towards rectifying those inadequacies by quantifying runner advancement on ground balls in the infield.
Winning the Ground War
Baseball is often a game played 90 feet at a time. Advancing a runner into scoring position, particularly to third base with fewer than two outs, can be the difference between winning and losing. We've all seen excellent baserunners like Juan Pierre and Carlos Beltran advance from second to third on a grounder to shortstop, beat a force attempt, or leg out a fielder's choice at third. Clearly, these runners are helping their team by taking those 90 feet when other runners would simply remain anchored to their base, or perhaps foolishly attempt to advance, only to get thrown out. But before we can properly credit runners with such plays, we need to create a baseline for how often runners really advance on plays like this so we can form some expectations.
In order to do so, we took a look at the advancement frequencies in the following situations:
By using only these three scenarios (note they do not overlap with the scenarios mentioned above) we're able to isolate the effects of the runner: there are no other baserunners who either impede the runner from taking an extra base or alter the behavior of the fielders in ways that would keep us from properly crediting the runner. For those wondering, these three scenarios account for just over 80% of the groundballs or bunt grounders to the infield that are not hits or errors, so this allows us to catch most runner advancement effects.
It turns out that we also need to break even these scenarios down a bit. For example, there is a vast difference between how often a runner advances from second to third on a groundball to the third baseman (45%) versus how often he does so when the ball is hit to the first baseman (95%). As a result, we would want to give more credit to the runner for the former result rather than the latter, since the runner was clearly more likely to make it regardless of his baserunning ability. By controlling for the position of the fielder who fields the ball we are also intrinsically controlling for the handedness of the batter.
Although the effect is smaller in most instances, we also break the data down by the number of outs, since an infielder clearly might be more prone to simply take an out in some situations, rather than attempt to nab a speedy runner trying to advance. However, we don't control for the score, which you can imagine will also play a role in decisions made by fielders.
The result is a matrix that records each possible advancement in our three scenarios by the position of the fielder and the number of outs. That matrix consists of 36 rows: 3 scenarios times 6 infield spots times 2 out states. In the interests of space, the following table shows just the typical advancement with a runner on second and nobody out.
Advancement When a Non-Hit Grounder is Fielded in the Infield

Base  Outs  Pos  To3rd   Scores  Stay    OA
x2x   0     P    69.91%  0.00%   18.61%  11.48%
x2x   0     C    79.11%  0.89%   12.44%   8.44%
x2x   0     1B   95.15%  0.00%    1.80%   3.05%
x2x   0     2B   97.13%  0.00%    1.60%   1.28%
x2x   0     3B   45.83%  0.09%   52.95%   1.22%
x2x   0     SS   43.61%  0.00%   48.24%   8.15%
Because the sample sizes for any single season across some of these 36 scenarios are not very large (for example, there were only 49 instances of a runner on second with nobody out and a grounder hit to the catcher in 2005), the percentages here are calculated from aggregating all of the data from 2000-2005, which includes 53,500 ground balls.
Now, using this matrix as a baseline, we can do two things. First, we can assign run expectancy outcomes to each of the advancements in the table above. For those of you unfamiliar with the concept, a Run Expectancy table simply tells us how many runs a team will score on average given each of the 24 possible (8 base states times 3 out states) base/out situations. The Run Expectancy Matrix for 2000-2005 (using a non-weighted average for each cell) is as follows:
Base/Out  0      1      2
xxx       0.530  0.287  0.112
1xx       0.921  0.552  0.242
x2x       1.160  0.704  0.335
xx3       1.441  0.978  0.371
12x       1.523  0.935  0.445
1x3       1.844  1.214  0.510
x23       2.030  1.430  0.599
123       2.364  1.579  0.789
Using the table, we can create derivative run expectancy values for each scenario and advancement. For example, in the first row of the scenario table above, where the runner advances to third with nobody out, the play leaves a runner on third with one out in the inning (0.978). We'll credit the runner with that value minus the run expectancy had he stayed put (0.704, for a runner on second with one out). The runner making it to third is therefore worth .274, or about a quarter of a run. On the other hand, if the runner is thrown out at third, we'll ding him for reducing the run expectancy versus simply staying put, in this case the full value of a runner on second with one out (-0.704). Note that we're not assuming the defense would have turned a double play.
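The arithmetic for the runner-on-second, nobody-out example can be sketched in a few lines of Python. This is only an illustration of the subtraction described above, not the code used to build the metric. The dictionary and function names are made up for the example, and only the run expectancy cells the example needs are copied in.

```python
# Run expectancy cells (2000-2005) copied from the matrix above.
# Keys are (base state, outs). Names here are illustrative assumptions.
RUN_EXPECTANCY = {
    ("x2x", 1): 0.704,  # runner on second, one out (the "stay put" result)
    ("xx3", 1): 0.978,  # runner on third, one out (runner advanced)
}


def advancement_value(after: float, baseline: float) -> float:
    """Run value of an outcome relative to the runner simply staying put."""
    return after - baseline


stay_put = RUN_EXPECTANCY[("x2x", 1)]

# Runner takes third on the grounder: credited with the gain over staying put.
to_third = advancement_value(RUN_EXPECTANCY[("xx3", 1)], stay_put)

# Runner is thrown out advancing: per the article's convention he is charged
# the full stay-put value, so the "after" value is treated as zero here.
out_advancing = advancement_value(0.0, stay_put)

print(f"to third:      {to_third:+.3f}")
print(f"out advancing: {out_advancing:+.3f}")
```

The same subtraction applies to any other row of the matrix once the relevant before and after cells are looked up.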
It should also be noted that the derivatives calculated don't penalize a player for getting forced at second, nor for simply staying put when unforced. From a pure change in run expectancy standpoint, there are of course arguments to be made for including both. The thought was not to penalize a runner for a situation which he did not create (the force) or for at least avoiding an out (staying put) since he wasn't the guy who hit the ball. Our sense of justice tells us that the batter and not the runner should rightly be dinged in both instances. Had both of these been included, the totals you'll see near the end of the article would certainly be lower.
The matrix can then be augmented with the run values associated with each scenario, which of course are the same for each row in the above table since the starting base and out situation are the same.
Base  Outs  Pos  To3rd   RunVal  Scores  RunVal  Stay    RunVal  OA      RunVal
x2x   0     P    69.91%  0.2740  0.00%   0.5828  18.61%  0.000   11.48%  -0.7043
x2x   0     C    79.11%  0.2740  0.89%   0.5828  12.44%  0.000    8.44%  -0.7043
x2x   0     1B   95.15%  0.2740  0.00%   0.5828   1.80%  0.000    3.05%  -0.7043
x2x   0     2B   97.13%  0.2740  0.00%   0.5828   1.60%  0.000    1.28%  -0.7043
x2x   0     3B   45.83%  0.2740  0.09%   0.5828  52.95%  0.000    1.22%  -0.7043
x2x   0     SS   43.61%  0.2740  0.00%   0.5828  48.24%  0.000    8.15%  -0.7043
Second, we can create an expected number of runs contributed in each scenario by multiplying the frequency by the run value, and then summing across each outcome. So in the scenario of the runner on second with nobody out and the ball hit to the catcher we would calculate the following:
RunExp = (.7911 * .274) + (.0089 * .5828) + (.1244 * 0) + (.0844 * -.7043)
The calculation above yields an expected run value of .1625. This value then is what we would expect a baserunner to contribute each time he's on second with nobody out and a ground ball is fielded by the catcher. The table of scenarios with the run values filled in follows:
Base  Outs  Pos  To3rd   RunVal  Scores  RunVal  Stay    RunVal  OA      RunVal   RunExp
x2x   0     P    69.91%  0.2740  0.00%   0.5828  18.61%  0.000   11.48%  -0.7043  0.1107
x2x   0     C    79.11%  0.2740  0.89%   0.5828  12.44%  0.000    8.44%  -0.7043  0.1625
x2x   0     1B   95.15%  0.2740  0.00%   0.5828   1.80%  0.000    3.05%  -0.7043  0.2392
x2x   0     2B   97.13%  0.2740  0.00%   0.5828   1.60%  0.000    1.28%  -0.7043  0.2571
x2x   0     3B   45.83%  0.2740  0.09%   0.5828  52.95%  0.000    1.22%  -0.7043  0.1175
x2x   0     SS   43.61%  0.2740  0.00%   0.5828  48.24%  0.000    8.15%  -0.7043  0.0621
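For readers who want to check the RunExp column, here is a minimal Python sketch of the frequency-times-run-value sum for the six runner-on-second, nobody-out rows. The frequencies come from the advancement table and the run values from the augmented table. The variable and function names are assumptions made for the example, and the published figures are included only for comparison.

```python
# Run values for each outcome with a runner on second and nobody out,
# in the order: to third, scores, stays, out advancing.
RUN_VALUES = (0.2740, 0.5828, 0.0, -0.7043)

# Outcome frequencies by fielder, in the same order as RUN_VALUES.
FREQUENCIES = {
    "P":  (0.6991, 0.0000, 0.1861, 0.1148),
    "C":  (0.7911, 0.0089, 0.1244, 0.0844),
    "1B": (0.9515, 0.0000, 0.0180, 0.0305),
    "2B": (0.9713, 0.0000, 0.0160, 0.0128),
    "3B": (0.4583, 0.0009, 0.5295, 0.0122),
    "SS": (0.4361, 0.0000, 0.4824, 0.0815),
}

# RunExp column as published, used only to compare against the computed sums.
PUBLISHED = {"P": 0.1107, "C": 0.1625, "1B": 0.2392,
             "2B": 0.2571, "3B": 0.1175, "SS": 0.0621}


def expected_runs(freqs, values=RUN_VALUES):
    """Sum of frequency * run value across the four outcomes."""
    return sum(f * v for f, v in zip(freqs, values))


for pos, freqs in FREQUENCIES.items():
    print(f"{pos:>2}  computed {expected_runs(freqs):.4f}  "
          f"published {PUBLISHED[pos]:.4f}")
```

Because the inputs are themselves rounded, the computed sums should be read as agreeing with the published column only to within rounding.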
At its core, this methodology compares each player to the aggregate behavior of all players. That means that it's possible that the behavior of the aggregate actually results in a negative run value. For example, with a runner on second with one out when a ball is hit to the pitcher, runners (ostensibly running on contact) were thrown out at home 13.6% of the time. That relatively high percentage drives down the run expectancy to -0.0295, so on average we would expect a runner to contribute negative runs in that situation. A runner who is able to advance safely to third therefore will be credited not only with the run value of making it to third (.0363), but also the magnitude of the expected run value, and so will be credited with 0.0658 runs.
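The credit for a single play is simply the actual run value minus the expected run value for that situation. A small Python sketch using the two figures quoted above follows. The function name is hypothetical, and the "stays put" line is an inference from the same rule rather than a figure given in the article.

```python
def credit(actual_value: float, expected_value: float) -> float:
    """Runs credited on one play: actual run value minus the baseline."""
    return actual_value - expected_value


EXPECTED = -0.0295   # aggregate expectation quoted in the text
TO_THIRD = 0.0363    # run value of safely reaching third, quoted in the text

# Advancing safely is worth its own run value plus the magnitude of the
# negative baseline.
print(f"advances to third: {credit(TO_THIRD, EXPECTED):+.4f}")

# By the same rule, a runner who stays put (run value 0) would still come
# out slightly ahead of a negative baseline. This case is an inference.
print(f"stays put:         {credit(0.0, EXPECTED):+.4f}")
```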
Applying the Method
Now that we have the framework set up, we can total all of the opportunities individual players had in these scenarios, along with both the actual run value we attribute to them for their baserunning exploits and the run value we would expect. The difference between the total and the expected therefore equates to the runs each player contributed in these scenarios above and beyond what would have been expected. So may I have a drum roll please…
The top and bottom ten for 2005 in this new metric I'm christening (at least until I can think of a better name) "Equivalent Ground Advancement Runs" or EqGAR are:
Top 10
Name             Opp  Total GAR  Ex EqGAR  EqGAR
Juan Pierre      54   16.67       9.15      7.52
Willy Taveras    40   12.33       6.83      5.50
Chone Figgins    53   13.89       8.99      4.90
Jason Ellison    33   12.14       7.75      4.40
Brady Clark      65   15.60      11.23      4.38
Cory Sullivan    28   10.44       6.33      4.11
Jimmy Rollins    51   14.24      10.34      3.89
Craig Counsell   47   14.17      10.39      3.77
Cristian Guzman  45   13.56      10.49      3.07
Jose Reyes       52   12.19       9.39      2.80

Bottom 10
Name             Opp  Total GAR  Ex EqGAR  EqGAR
Joe Randa        26    0.99       4.53     -3.54
Emil Brown       31    1.49       4.95     -3.46
Jason Varitek    30    0.56       3.74     -3.18
Morgan Ensberg   22    0.67       3.80     -3.13
Cliff Floyd      21    0.15       3.22     -3.07
Robinson Cano    24    2.26       5.10     -2.84
David Ortiz      22   -0.24       2.54     -2.77
Rafael Palmeiro  22    0.96       3.52     -2.56
Mike Lowell      21    1.82       4.32     -2.50
Travis Hafner    29    1.50       3.94     -2.44
As you can see from this list, there is clearly some correlation between players we anecdotally know to be good baserunners and those who do well by this metric. You should also take notice that there are very few players who score better than +3 runs, so most players are somewhere in the middle.
On the other side of the coin, most of those who do poorly are players we might have guessed at, although I found it a bit surprising that Joe Randa and Robinson Cano make the list. You can also see from the list that by subtracting the expected EqGAR from the total, we can rightly rank players like Brady Clark, who scored very well in total GAR but also had more opportunities.
However, because more opportunities means a greater EqGAR, much in the same way that more runners on base means more RBIs, we can create a ratio of total to expected ground advancement runs in order to create a rate stat useful for comparing players. The top and bottom ten in "GA Rate" with 20 or more opportunities are as follows:
Top 10
Name             Opps  EqGAR  GA Rate
Juan Pierre      54     7.52   1.82
Willy Taveras    40     5.50   1.80
Jim Edmonds      26     2.41   1.72
So Taguchi       30     2.75   1.67
Cory Sullivan    28     4.11   1.65
Jamey Carroll    23     2.58   1.60
Jason Ellison    33     4.40   1.57
Chone Figgins    53     4.90   1.55
Nick Punto       20     1.71   1.52
Kevin Mench      25     2.06   1.50

Bottom 10
Name             Opps  EqGAR  GA Rate
David Ortiz      22    -2.77  -0.09
Cliff Floyd      21    -3.07   0.05
Jason Varitek    30    -3.18   0.15
Morgan Ensberg   22    -3.13   0.18
Carlos Delgado   21    -2.11   0.20
Joe Randa        26    -3.54   0.22
Oscar Robles     20    -1.64   0.25
Rafael Palmeiro  22    -2.56   0.27
Emil Brown       31    -3.46   0.30
Melvin Mora      24    -2.28   0.31
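To make the two summary numbers concrete, the sketch below computes EqGAR (total minus expected) and GA Rate (total divided by expected) for four players, using the rounded totals from the 2005 top and bottom ten lists. The class and field names are hypothetical. Since the inputs are already rounded to two decimals, the results can differ from the published figures by a hundredth or so.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunnerSeason:
    """One player's season totals. Field names are illustrative only."""
    name: str
    opportunities: int
    total_gar: float      # run value actually credited
    expected_gar: float   # run value expected from the baseline matrix

    @property
    def eqgar(self) -> float:
        # Runs above or below expectation.
        return self.total_gar - self.expected_gar

    @property
    def ga_rate(self) -> float:
        # Ratio of actual to expected; 1.0 means exactly as expected.
        return self.total_gar / self.expected_gar


SEASONS = [
    RunnerSeason("Juan Pierre", 54, 16.67, 9.15),
    RunnerSeason("Willy Taveras", 40, 12.33, 6.83),
    RunnerSeason("David Ortiz", 22, -0.24, 2.54),
    RunnerSeason("Cliff Floyd", 21, 0.15, 3.22),
]

MIN_OPPORTUNITIES = 20  # qualification cutoff used for the rate lists

qualified = [s for s in SEASONS if s.opportunities >= MIN_OPPORTUNITIES]
ranked = sorted(qualified, key=lambda s: s.ga_rate, reverse=True)

for s in ranked:
    print(f"{s.name:<14} {s.opportunities:>3} "
          f"{s.eqgar:>6.2f} {s.ga_rate:>6.2f}")
```

A rate below zero, as with Ortiz, simply means the player's actual total was negative against a positive expectation.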
Kevin Mench? Well, in his 25 opportunities he did advance an amazing 15 times, and was never thrown out advancing. On the other hand, David Ortiz scored very poorly by being thrown out at the plate with nobody out and generally not advancing in another 17 of his 22 opportunities.
Since we calculated EqGAR back to 2000, let's take a quick look at the seasonal leaders and trailers in those six seasons.
Season Leader
Year  Name            Opps  EqGAR
2005  Juan Pierre     54     7.52
2004  Aaron Miles     53     5.95
2003  Kenny Lofton    42     5.16
2002  Adam Kennedy    38     5.28
2001  Luis Castillo   49     4.44
2000  Johnny Damon    76     5.08

Season Trailer
Year  Name            Opps  EqGAR
2005  Joe Randa       26    -3.54
2004  Chipper Jones   22    -3.93
2003  Moises Alou     28    -4.02
2002  Mo Vaughn       26    -3.76
2001  Paul Lo Duca    38    -4.10
2000  Edgar Martinez  24    -3.57
As you can see, the leaders are usually a little over +5 runs per season (with the exception of Pierre's 2005), while the trailers are around -4 runs. That's a span of nine to ten runs of difference between the best and worst. Interestingly, the span is about the same as that in the Incremental Runs framework, meaning that an elite baserunner could add about 10 runs or about one win to his team by advancing on hits and grounders in the infield.
Finally, let's take a look at the cumulative leaders and trailers in the 2000-2005 time period:
Top 10
Name             Opps  EqGAR
Juan Pierre      279   20.64
Adam Kennedy     206   14.89
Tony Womack      201   14.27
Kenny Lofton     210   12.21
Ray Durham       196   11.65
Rafael Furcal    224   10.85
Fernando Vina    229   10.14
Craig Counsell   184   10.11
Jimmy Rollins    212    9.98
Mike Matheny     183    9.16

Bottom 10
Name             Opps  EqGAR
Paul Lo Duca     154   -11.75
Rafael Palmeiro  129   -11.18
Paul Konerko     142    -9.91
J.T. Snow        138    -9.62
Luis Gonzalez    133    -9.13
Carlos Delgado   162    -9.02
Manny Ramirez    132    -8.83
Chipper Jones    175    -8.41
Jorge Posada     158    -8.21
Edgar Martinez   113    -8.03
Ahead of the Tag
As just mentioned, one can imagine that through the collection and aggregation of metrics like this we could start to form a better picture of the contribution runners make on the bases. Look for refinements to EqGAR and the development of other metrics as we move in that direction.