July 27, 2006
An Air of Advancement
"Baserunning is perfectly measurable; it can be easily defined and, given properly maintained scoresheets, easily researched. Our lack of knowledge on the subject is attributable entirely to record-keeping decisions that were made a little over a century ago and have never been intelligently or systematically reviewed."--Bill James, the 1984 Baseball Abstract
With one out in the top of the sixth inning last Sunday, Juan Pierre of the Cubs tripled to right-center off of the Nationals' Tony Armas Jr. Neifi Perez followed by lofting a shallow fly ball to right field, where Austin Kearns drifted in to make the catch. Despite the minimal distance between Kearns and the plate, Pierre didn't hesitate and took off for home. Kearns made a strong throw, but one that was up the line and cut off by Armas, allowing Pierre to score easily.
Despite being on a tear during which he blasted five home runs in three days, the next batter, Aramis Ramirez, grounded out routinely to end the inning. That was the only run the Cubs would score in what proved to be a 7-1 loss and a sweep at the hands of the Nats.
For our purposes today, let's not dwell (please, oh please) on the Cubs' dismal season, but instead take note of the fact that had Pierre not been so aggressive on the bases, the Cubs wouldn't have scored at all, so in no small measure that run should be credited to Pierre. This is precisely what we'll embark on today: crediting runners for advancement (or a lack thereof) on fly balls caught by outfielders. Keep in mind that this is just one part of working toward a single metric that encompasses all aspects of baserunning, not simply stolen bases or advancing on hits.
The Ground (or actually Air) Rules
For those who read this space two weeks ago, you'll know we laid out a methodology and ran some calculations for crediting runners for advancing on ground balls (including bunts) in the infield. It'll be no surprise, then, that we wish to do the same thing for balls hit in the air. To that end we'll take a look at the following scenarios:
The basic idea is that we want to look at plays where the runner was the lead runner and, with fewer than two outs, a ball in the air was caught by an outfielder. Although we could have included the scenario where the runner was on first with second unoccupied and third occupied, the thought was that the runner on third would have the attention of the defense, and so advancement by the runner on first would, for all intents and purposes, be out of his control. Note that we did include scenarios where the runner was on third with first and second perhaps occupied but again, the assumption was that most of what happens and how the defense reacts is dependent on the lead runner. We are also crediting runners with advancing regardless of whether subsequent errors occurred during the play. While this clearly gives the benefit of the doubt to the runner, the data is not granular enough to easily disqualify these plays and appropriately credit the runner. It's a minor thing, though, since this affects about a dozen plays per season.
Given these scenarios, we first calculate the percentage of time runners advance in each scenario, so we can properly credit them when they exceed those thresholds. As with advancement on ground balls, the criteria we take into account include the starting base for the runner, the other runners (if any) on base, the number of outs and the position of the fielder who made the catch. To that we'll also add whether the hit was a fly ball, pop-up, or line drive.
This allows us to create a matrix of 107 combinations using data from the 2000-2005 period, encompassing 48,957 fly balls. Because the matrix is so large, we'll reproduce just the section where the runner is on second with nobody out.
Base  Outs  Pos  HitType   To3rd  Scores    Stay      OA
x2x      0  LF   Fly       11.8%    0.1%   86.5%    1.6%
x2x      0  CF   Fly       46.7%    0.3%   51.0%    1.9%
x2x      0  RF   Fly       55.3%    0.3%   42.1%    2.3%
x2x      0  LF   Line       0.0%    0.0%  100.0%    0.0%
x2x      0  CF   Line      22.1%    0.0%   76.0%    1.9%
x2x      0  RF   Line      39.0%    0.0%   56.9%    4.1%
x2x      0  LF   Pop        0.0%    0.0%  100.0%    0.0%
x2x      0  CF   Pop        0.0%    0.0%  100.0%    0.0%
x2x      0  RF   Pop        0.0%    0.0%  100.0%    0.0%
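For readers who want to try this at home, the tabulation behind a matrix like this can be sketched in a few lines of Python. This is only a rough illustration, not the code used for the column: the record layout, the outcome labels and the handful of sample plays are all made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical play records: (base state, outs, fielder, hit type, outcome).
# Outcome labels are invented here: "to3rd", "scores", "stay", "out".
plays = [
    ("x2x", 0, "RF", "Fly", "to3rd"),
    ("x2x", 0, "RF", "Fly", "stay"),
    ("x2x", 0, "RF", "Fly", "to3rd"),
    ("x2x", 0, "LF", "Fly", "stay"),
    ("x2x", 0, "LF", "Fly", "stay"),
    ("x2x", 0, "LF", "Fly", "out"),
]


def advancement_matrix(plays):
    """Return {scenario: {outcome: share of plays}} for each scenario."""
    counts = defaultdict(Counter)
    for base, outs, pos, hit_type, outcome in plays:
        counts[(base, outs, pos, hit_type)][outcome] += 1

    matrix = {}
    for scenario, outcome_counts in counts.items():
        total = sum(outcome_counts.values())
        matrix[scenario] = {o: n / total for o, n in outcome_counts.items()}
    return matrix


matrix = advancement_matrix(plays)
for scenario, shares in sorted(matrix.items()):
    print(scenario, {o: f"{s:.1%}" for o, s in shares.items()})
```

Each distinct combination of base state, outs, fielder and hit type becomes one row of the matrix, which is how a modest number of criteria can produce over a hundred combinations.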
As our intuition would tell us, runners have a more difficult time advancing from second to third on a ball hit to left field than when hit to center and especially right. Over half the time the runner advances to third when the right fielder makes the catch, as opposed to just 12% of the time when the left fielder does so. As a result, runners are also thrown out more often (2.3% vs. 1.6%) when the ball is caught in right. Runners do occasionally score on fly balls like this, but it happens on the order of one in every 400 to 600 opportunities.
The table also reveals that it is more difficult to advance on line drives than on fly balls and it is effectively impossible to do so on pop-ups. Here, though, we run into both sample-size and data-collection issues that deserve a bit more discussion.
The data we're using for this analysis doesn't necessarily include a consistent definition of just what is a pop-up versus a fly ball versus a line drive. As a result, there are perhaps more fly balls represented in the data (which would typically be the default as you can imagine) than there actually were. In fact, 92% of the batted balls used for this analysis were recorded as fly balls, 8% line drives and less than 1% pop-ups. The consequence is that the advancement percentages for line drives should be taken with a grain of salt. It turns out that fortunately we don't really have the same problem for pop-ups, since we wouldn't imagine that runners could advance, and in reviewing the table we see that none did. Had we included infielders in the analysis, we perhaps would have had more to go on by including plays where runners tagged on pop-ups down the foul lines where the second baseman or shortstop has to go near or into the stands.
The other data collection issue here is that we're using batted-ball type (Fly, Line, Pop) as a proxy for batted-ball distance. In the example used to kick off this column, a fly ball to a right fielder with a runner on third and one out, runners score 76% of the time. But because it was really a shallow fly ball (as opposed to a pop-up) we should really be giving Pierre more credit than a runner who tags on a fly ball of medium or greater depth. Alas the data doesn't include batted-ball distances, so we'll have to accept our limitations for the time being. What we need is distance data like that used by the authors of this study in the Journal of Quantitative Analysis in Sports.
This does bring up one other issue, however. One way to help account for the differences in batted ball distance is to apply a park factor. The inductive argument runs like this.
I have no qualms at all with this reasoning, and in fact have applied park factors to runner advancement in my previous work. At this point, the work hasn't been done to calculate the park factors in these scenarios to determine whether they are in fact different than those for advancement or whether a single "baserunning advancement park factor" (BAPF?) could be created. A task for another day.
With the matrix developed, we can now create derivative matrices that assign credit in terms of runs for advancement. This is the same basic procedure we applied in the ground advancement framework, and here we never assume that trailing runners would advance. By multiplying the assigned credit for the advancement by its frequency we can calculate the number of runs we expect the player to garner in each scenario. The final result for the same set of scenarios shown earlier follows.
Base  Outs  Pos  HitType   To3rd   RunVal  Scores   RunVal    Stay      OA   RunVal  RunExp
x2x      0  LF   Fly       11.8%   0.2740    0.1%   0.5828   86.5%    1.6%  -0.5924  0.0238
x2x      0  CF   Fly       46.7%   0.2740    0.3%   0.5828   51.0%    1.9%  -0.5924  0.1184
x2x      0  RF   Fly       55.3%   0.2740    0.3%   0.5828   42.1%    2.3%  -0.5924  0.1401
x2x      0  LF   Line       0.0%   0.2740    0.0%   0.5828  100.0%    0.0%  -0.5924  0.0000
x2x      0  CF   Line      22.1%   0.2740    0.0%   0.5828   76.0%    1.9%  -0.5924  0.0492
x2x      0  RF   Line      39.0%   0.2740    0.0%   0.5828   56.9%    4.1%  -0.5924  0.0828
x2x      0  LF   Pop        0.0%   0.2740    0.0%   0.5828  100.0%    0.0%  -0.5924  0.0000
x2x      0  CF   Pop        0.0%   0.2740    0.0%   0.5828  100.0%    0.0%  -0.5924  0.0000
x2x      0  RF   Pop        0.0%   0.2740    0.0%   0.5828  100.0%    0.0%  -0.5924  0.0000
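The RunExp column can be reproduced, give or take some rounding, by weighting each outcome's run value by its frequency. The sketch below does that for the center-field fly ball row. Treating "Stay" as worth zero runs is an assumption on my part, suggested by the fact that the table lists no run value for it; the dictionary keys are invented labels.

```python
# Run values for a runner on second with nobody out, taken from the table.
# "stay" = 0.0 is an assumption: the table gives no run value for staying put.
RUN_VALUES = {"to3rd": 0.2740, "scores": 0.5828, "stay": 0.0, "out": -0.5924}


def expected_runs(freqs, run_values=RUN_VALUES):
    """Sum of (frequency x run value) over every outcome in the scenario."""
    return sum(freqs[outcome] * run_values[outcome] for outcome in freqs)


# Frequencies for the x2x / 0 outs / CF / Fly row.
cf_fly = {"to3rd": 0.467, "scores": 0.003, "stay": 0.510, "out": 0.019}
print(round(expected_runs(cf_fly), 4))
```

The result lands within a rounding error of the 0.1184 shown in the table; the small gap presumably comes from the percentages being rounded to one decimal place before publication.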
It makes sense that the runner will be expected to contribute fewer runs when the ball is caught by the left fielder than when caught by the right fielder because of the differences in the frequency of advancement. This fact is what allows us to give more credit to the runner when he advances on a ball caught by the left fielder.
The Incremental Value of Tagging
With the preliminaries out of the way, we can now apply the matrices to individual players to produce Equivalent Air Advancement Runs (EqAAR). First, the top and bottom ten from 2005.
Top 10 for 2005
                            Total      Ex
Player                Opp      AA   EqAAR   EqAAR
Alex Rios              19    5.68    3.80    1.88
Chone Figgins          39    7.83    6.04    1.79
Marcus Giles           41    7.69    5.94    1.75
Johnny Damon           54    8.33    6.66    1.66
Tony Graffanino        23    6.11    4.46    1.65
Ichiro Suzuki          38    5.91    4.42    1.49
Reed Johnson           24    4.98    3.49    1.49
Luis Gonzalez (ARI)    32    6.71    5.26    1.44
Jimmy Rollins          38    7.10    5.68    1.42
Michael Young          32    3.67    2.33    1.35

Bottom 10 for 2005
                            Total      Ex
Player                Opp      AA   EqAAR   EqAAR
Vladimir Guerrero      28    1.64    3.99   -2.35
Miguel Olivo           14   -0.09    2.00   -2.09
Joe Mauer              30    0.71    2.74   -2.02
Tadahito Iguchi        33    1.32    3.31   -2.00
Victor Diaz             9   -1.37    0.48   -1.85
Shannon Stewart        25    0.59    2.34   -1.74
Marlon Byrd            14   -0.18    1.53   -1.70
Luis Castillo          37    0.74    2.42   -1.68
Michael Cuddyer        15    0.57    2.12   -1.55
Derrek Lee             35    1.84    3.31   -1.47
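Reading the columns as opportunities, total advancement runs, expected runs and EqAAR, the final column is simply the gap between what a runner earned and what an average runner would have earned in the same opportunities. The arithmetic in the tables is consistent with that reading, although the function and variable names below are mine rather than anything official.

```python
def eqaar(total_runs, expected_runs):
    """Runs above (or below) what an average runner would have produced."""
    return total_runs - expected_runs


# (total advancement runs, expected runs) from the 2005 tables.
seasons_2005 = {
    "Alex Rios": (5.68, 3.80),
    "Vladimir Guerrero": (1.64, 3.99),
}

for player, (total, expected) in seasons_2005.items():
    print(f"{player}: {eqaar(total, expected):+.2f}")
```

A few rows in the tables differ from this subtraction by 0.01, which is what you would expect if the published figures were rounded after the difference was taken.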
Probably the main point to notice here is that incremental runs are more difficult to pick up via advancing on air outs, largely because runners advance so frequently. This is especially true from third base, where runners tag and score 78% of the time overall and are thrown out at a rate of less than 3%. That means there simply isn't enough room even for really good baserunners to take advantage of the difference between themselves and the average runner. The result is that even if you do everything right, you'll pick up at most around two runs more than the average player given the same opportunities.
But it is also possible (and in fact easier) to cost your team a couple of runs, as Vladimir Guerrero proved in 2005. In his 28 opportunities he was thrown out just once at home, by Ichiro Suzuki on May 5, costing him about a run and a half (-0.5099 for getting thrown out against a typical gain of 0.5594 runs). However, he also failed to advance another eight times, four of them when he was on third, when it would have been expected for him to pick up positive runs. In his other opportunities he simply did about what would be expected and never took an unexpected base.
The reason it's easier to end up on the negative side is that the relative cost of getting thrown out far outweighs the benefit of taking the extra base. You can also see this from the matrix above, where getting thrown out at third costs more than twice as much (0.5924 runs) as advancing to third gains (0.2740 runs). (Yes, scoring from second on a fly ball yields about as many runs as the out costs, but that outcome occurs much less frequently.)
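One way to put a number on that asymmetry is to ask how often a runner has to be safe for the attempt to break even. The back-of-the-envelope sketch below uses the run values for a runner on second with nobody out, and it assumes that staying put is worth zero runs and that every attempt ends with the runner either safe at third or out. Both are simplifications of mine, not part of the framework above.

```python
GAIN_TO_THIRD = 0.2740      # run value of advancing to third (from the matrix)
COST_OUT_AT_THIRD = 0.5924  # magnitude of the run value of being thrown out


def break_even_rate(gain, cost):
    """Success rate at which attempting the extra base nets zero runs.

    Solves p * gain - (1 - p) * cost = 0 for p, assuming staying put is
    worth 0 runs (an assumption made for this illustration).
    """
    return cost / (gain + cost)


rate = break_even_rate(GAIN_TO_THIRD, COST_OUT_AT_THIRD)
print(f"{rate:.1%}")
```

Under those assumptions a runner needs to be safe a little more than two times in three just to come out even, which helps explain why a single out on the bases can wipe out several successful tags.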
Let's now expand the criteria and take a look at the league leaders and trailers for the 2000-2005 time frame.
Leaders for 2000-2005
                                Total      Ex
Year  Player              Opp      AA   EqAAR   EqAAR
2000  Derek Jeter          51    8.47    6.39    2.08
2001  Jeff Conine          36    5.99    4.27    1.72
2002  Raul Mondesi         30    7.12    4.87    2.25
2003  Roger Cedeno         33    6.54    4.59    1.95
2004  Royce Clayton        37    5.72    3.94    1.78
2005  Alex Rios            19    5.68    3.80    1.88

Trailers for 2000-2005
                                Total      Ex
Year  Player              Opp      AA   EqAAR   EqAAR
2000  Tim Salmon           29   -0.27    2.57   -2.84
2001  Benito Santiago      27   -0.21    3.52   -3.73
2002  Jim Edmonds          32    3.05    5.31   -2.26
2003  D'Angelo Jimenez     24   -0.98    2.53   -3.51
2004  Jim Thome            28    1.30    4.34   -3.04
2005  Vladimir Guerrero    28    1.64    3.99   -2.35
In 2001, Benito Santiago distinguished himself by posting the lowest EqAAR in the past six years, doing so by getting thrown out at the plate once and at third twice, in addition to failing to tag up on three balls hit to center and right field among his 27 opportunities.
In perusing these lists you can see that while speed is certainly a factor in who does well and who does not, there is more variability in these rankings because of the smaller margins mentioned above. As a result, it is certainly possible for an otherwise average runner like Jim Edmonds to do poorly in a specific season by virtue of being thrown out once or twice (again because of the huge cost associated with getting thrown out). Overall, Edmonds came in with a positive EqAAR in four of the six seasons and ranked 25th in 2001 with an EqAAR of 1.00. Guerrero is also an interesting case: he ranks among the worst in baseball overall, yet in 2001 he recorded an EqAAR of 1.52, which ranked him third.
Next, let's take a look at the cumulative leaders.
Top 10 for 2000-2005
                            Total      Ex
Player                Opp      AA   EqAAR   EqAAR
Derek Jeter           252   29.72   22.79    6.93
Ray Durham            230   31.10   26.62    4.49
Gary Sheffield        188   17.72   13.57    4.15
Tom Goodwin            83   16.27   12.54    3.74
Reed Johnson           88   14.26   10.61    3.65
Kevin Millar          105   17.37   13.79    3.58
Albert Pujols         151   19.56   16.00    3.56
Jimmy Rollins         180   29.46   26.19    3.28
Carlos Guillen        137   19.13   15.88    3.26
Jose Valentin         104   19.04   15.79    3.25

Bottom 10 for 2000-2005
                            Total      Ex
Player                Opp      AA   EqAAR   EqAAR
Moises Alou           153    9.09   14.34   -5.25
D'Angelo Jimenez      110    8.54   13.10   -4.56
Benito Santiago        78    8.15   12.72   -4.56
Jim Thome             135    7.69   12.18   -4.49
Vladimir Guerrero     162   15.23   19.60   -4.38
Jose Offerman          96    7.39   11.46   -4.07
Timo Perez             87    4.47    8.16   -3.69
Tim Salmon            138   11.31   14.87   -3.55
Matt Lawton           198   16.30   19.78   -3.48
Richard Hidalgo       112   10.45   13.89   -3.45
As with the column two weeks ago, where Kevin Mench scored well, some of you are probably wondering how in the world Kevin Millar could do likewise here. The short answer is that Millar had a positive EqAAR in every season, unspectacular in all of them but 2001, when he ran up a 1.57. He did this by a) never being thrown out, and b) failing to contribute positively in only about a quarter of his opportunities. This once again points out that conservatism over the long haul is also a way to do well in this measure.
Finally, let's calculate a rate statistic to measure those who do best and worst given their opportunities (well, those with at least 50 anyway).
Top 10 for 2000-2005 by Rate
                            Total      Ex
Player                Opp      AA   EqAAR   EqAAR   Rate
Reed Johnson           88   14.26   10.61    3.65   1.34
Jolbert Cabrera        50   10.47    7.89    2.58   1.33
Craig Monroe           65   12.78    9.67    3.11   1.32
Gary Sheffield        188   17.72   13.57    4.15   1.31
Derek Jeter           252   29.72   22.79    6.93   1.30
Tom Goodwin            83   16.27   12.54    3.74   1.30
Jose Guillen          110    9.89    7.64    2.25   1.29
Dustan Mohr            55   10.46    8.16    2.30   1.28
Gabe Kapler            94   14.89   11.67    3.22   1.28
Tony Graffanino        97   14.52   11.43    3.09   1.27

Bottom 10 for 2000-2005 by Rate
                            Total      Ex
Player                Opp      AA   EqAAR   EqAAR   Rate
Tony Clark             69    2.61    6.04   -3.43   0.43
Andres Galarraga       73    3.68    6.79   -3.11   0.54
Timo Perez             87    4.47    8.16   -3.69   0.55
Charles Johnson        82    4.53    7.53   -3.00   0.60
Lou Merloni            51    3.05    4.91   -1.86   0.62
Jim Thome             135    7.69   12.18   -4.49   0.63
Coco Crisp             95    3.29    5.20   -1.90   0.63
Moises Alou           153    9.09   14.34   -5.25   0.63
Victor Martinez        57    2.48    3.87   -1.39   0.64
Benito Santiago        78    8.15   12.72   -4.56   0.64
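The Rate column appears to be total advancement runs divided by expected runs, so that 1.00 is an average runner; the published figures match that reading to two decimals. A short sketch of the filter-and-rank step follows, using three rows from the tables. The tuple layout, the function name and the parameter name are illustrative assumptions.

```python
MIN_OPPORTUNITIES = 50  # the cutoff mentioned in the text

# (player, opportunities, total advancement runs, expected runs), 2000-2005.
players = [
    ("Reed Johnson", 88, 14.26, 10.61),
    ("Derek Jeter", 252, 29.72, 22.79),
    ("Tony Clark", 69, 2.61, 6.04),
]


def rank_by_rate(players, min_opp=MIN_OPPORTUNITIES):
    """Return (player, rate) pairs, best first, for qualifying players."""
    rated = [
        (name, total / expected)
        for name, opp, total, expected in players
        if opp >= min_opp
    ]
    return sorted(rated, key=lambda pair: pair[1], reverse=True)


for name, rate in rank_by_rate(players):
    print(f"{name}: {rate:.2f}")
```

Dividing rather than subtracting puts a part-timer like Johnson on the same footing as an everyday player like Jeter, which is why the two lists are ordered differently.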
Once again, the relatively low rate values serve to underscore the lack of space between the top and bottom performers and those in the middle.
A Final Note About Variability
Before we sign off, we should briefly discuss one other concept related to measures like these. In the tables above, you'll notice that those who rate the best for a single season are generally in the range of +2 runs while those who rate the poorest are closer to -3 runs. However, when you look at the leaders and trailers for the six seasons from 2000 to 2005, you'll notice that the leaders are generally in the range of +3 to +5 while the trailers are -4 to -5.
If one were to extrapolate the +2 for the leaders across six seasons, one would end up with +12 instead of +5, and likewise for the trailers. There are two reasons the totals fall short of that. First and most obviously, the same players are not the leaders every season, so you wouldn't expect that kind of linear extrapolation. Second and more importantly, the extreme values on each end don't represent skill alone, but instead a combination of skill and random variation, which has the effect of pushing some values to the extremes. Of course that applies to all kinds of metrics, including both traditional ones like batting average and new ones like these.
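A toy simulation makes the point concrete. In the sketch below each player has a fixed "true" skill plus a fresh dose of luck every season; the number of players and the spread of skill and luck are invented for illustration and are not estimated from the data in this column.

```python
import random


def simulate(n_players=300, n_seasons=6, skill_sd=0.3, noise_sd=1.0, seed=1):
    """Simulate seasonal run totals as fixed skill plus seasonal luck.

    All parameter values are made up for this illustration.
    Returns (list of each season's leading value, best multi-season total).
    """
    rng = random.Random(seed)
    skills = [rng.gauss(0, skill_sd) for _ in range(n_players)]
    seasons = [
        [skill + rng.gauss(0, noise_sd) for skill in skills]
        for _ in range(n_seasons)
    ]
    yearly_leaders = [max(season) for season in seasons]
    totals = [sum(season[i] for season in seasons) for i in range(n_players)]
    return yearly_leaders, max(totals)


yearly_leaders, best_total = simulate()
print("sum of the six season-leading values:", round(sum(yearly_leaders), 2))
print("best six-season total by one player: ", round(best_total, 2))
```

The best cumulative total can never exceed the sum of the season-leading values, and when luck swamps skill it will usually fall well short of it, because a different player tends to catch the breaks each year.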