Schrodinger's Bat: An Air of Advancement

“Baserunning is perfectly measurable; it can be easily defined and, given properly maintained scoresheets, easily researched. Our lack of knowledge on the subject is attributable entirely to record-keeping decisions that were made a little over a century ago and have never been intelligently or systematically reviewed.”–Bill James, the 1984 Baseball Abstract

With one out in the top of the sixth inning last Sunday, Juan Pierre of the Cubs tripled to right-center off of the Nationals’ Tony Armas Jr. Neifi Perez followed by lofting a shallow fly ball to right field, where Austin Kearns drifted in to make the catch. Despite the minimal distance between Kearns and the plate, Pierre didn’t hesitate and took off for home. Kearns made a strong throw, but one that was up the line and cut off by Armas, allowing Pierre to score easily.

Despite being on a tear during which he blasted five home runs in three days, the next batter, Aramis Ramirez, grounded out routinely to end the inning. That was the only run the Cubs would score in what proved to be a 7-1 loss and a sweep at the hands of the Nats.

For our purposes today, let’s not dwell (please, oh please) on the Cubs’ dismal season, but instead take note of the fact that had Pierre not been so aggressive on the bases, the Cubs wouldn’t have scored at all, so in no small measure that run should be credited to Pierre. This is precisely what we’ll embark on today: crediting runners for advancement (or a lack thereof) on fly balls caught by outfielders. Keep in mind that this is just one part of working toward a single metric that encompasses all aspects of baserunning, not simply stolen bases or advancing on hits.

The Ground (or actually Air) Rules

Those who read this space two weeks ago will know that we laid out a methodology and ran some calculations for crediting runners for advancing on ground balls (including bunts) in the infield. It’ll be no surprise, then, that we wish to do the same thing for balls hit in the air. To that end we’ll take a look at the following scenarios:

Runner on first with second and third unoccupied, less than two outs, a line drive, pop-up, or fly ball is caught by an outfielder;
Runner on second but not third, less than two outs, a line drive, pop-up, or fly ball is caught by an outfielder;
Runner on third with other bases optionally occupied, less than two outs, a line drive, pop-up, or fly ball is caught by an outfielder.

The basic idea is that we want to look at plays where the runner in question was the lead runner and, with less than two outs, a ball in the air was caught by an outfielder. Although we could have included the scenario where the runner was on first with second unoccupied and third occupied, the thought was that the runner on third would have the attention of the defense, and so advancement by the runner on first would, for all intents and purposes, be out of his control. Note that we did include scenarios where the runner was on third with first and second perhaps occupied, but again, the assumption was that most of what happens, including how the defense reacts, depends on the lead runner. We are also giving runners the benefit of the doubt and crediting them with advancing regardless of whether an error occurred later in the play. While this clearly favors the runner, the data is not granular enough to easily separate out these plays and credit the runner appropriately. It’s a minor thing, though, since this affects about a dozen plays per season.

Given these scenarios, we first calculate the percentage of the time runners advance in each scenario, so we can properly credit them when they exceed those thresholds. As with advancement on ground balls, the criteria we take into account include the starting base for the runner, the other runners (if any) on base, the number of outs, and the position of the fielder who made the catch. To that we’ll also add whether the hit was a fly ball, pop-up, or line drive.

This allows us to create a matrix of 107 combinations using data from the 2000-2005 period, encompassing 48,957 fly balls. Because the matrix is so large, we’ll reproduce just the section where the runner is on second with nobody out.

Base  Outs  Pos HitType    To3rd  Scores    Stay      OA
x2x      0   LF     Fly    11.8%    0.1%   86.5%    1.6%
x2x      0   CF     Fly    46.7%    0.3%   51.0%    1.9%
x2x      0   RF     Fly    55.3%    0.3%   42.1%    2.3%
x2x      0   LF    Line     0.0%    0.0%  100.0%    0.0%
x2x      0   CF    Line    22.1%    0.0%   76.0%    1.9%
x2x      0   RF    Line    39.0%    0.0%   56.9%    4.1%
x2x      0   LF     Pop     0.0%    0.0%  100.0%    0.0%
x2x      0   CF     Pop     0.0%    0.0%  100.0%    0.0%
x2x      0   RF     Pop     0.0%    0.0%  100.0%    0.0%

As our intuition would tell us, runners have a more difficult time advancing from second to third on a ball hit to left field than when hit to center and especially right. Over half the time the runner advances to third when the right fielder makes the catch, as opposed to just 12% of the time when the left fielder does so. As a result, runners are also thrown out more often (2.3% vs. 1.6%) when the ball is caught in right. Runners do occasionally score on fly balls like this, but it happens on the order of one in every 400 to 600 opportunities.
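For readers who want to see the mechanics, the matrix amounts to grouping plays by scenario and tallying how each one ended. What follows is only a minimal sketch: the record layout and the six sample plays are hypothetical stand-ins, not the actual 2000-2005 play-by-play data.

```python
from collections import Counter, defaultdict

# Hypothetical play records: (bases, outs, fielder, hit_type, outcome).
# Both the layout and these sample rows are illustrative assumptions.
plays = [
    ("x2x", 0, "RF", "Fly", "To3rd"),
    ("x2x", 0, "RF", "Fly", "Stay"),
    ("x2x", 0, "RF", "Fly", "To3rd"),
    ("x2x", 0, "RF", "Fly", "OA"),
    ("x2x", 0, "LF", "Fly", "Stay"),
    ("x2x", 0, "LF", "Fly", "Stay"),
]

def advancement_matrix(plays):
    """Return {scenario: {outcome: share of plays}} for every scenario seen."""
    counts = defaultdict(Counter)
    for bases, outs, fielder, hit_type, outcome in plays:
        counts[(bases, outs, fielder, hit_type)][outcome] += 1
    matrix = {}
    for scenario, outcomes in counts.items():
        total = sum(outcomes.values())
        matrix[scenario] = {o: n / total for o, n in outcomes.items()}
    return matrix

matrix = advancement_matrix(plays)
for scenario, shares in matrix.items():
    print(scenario, shares)
```

Each key corresponds to one row of the table, and the shares within a row sum to one.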

The table also reveals that it is more difficult to advance on line drives than on fly balls and it is effectively impossible to do so on pop-ups. Here, though, we run into both sample-size and data-collection issues that deserve a bit more discussion.

The data we’re using for this analysis doesn’t necessarily include a consistent definition of just what is a pop-up versus a fly ball versus a line drive. As a result, there are perhaps more fly balls represented in the data (which would typically be the default as you can imagine) than there actually were. In fact, 92% of the batted balls used for this analysis were recorded as fly balls, 8% line drives and less than 1% pop-ups. The consequence is that the advancement percentages for line drives should be taken with a grain of salt. It turns out that fortunately we don’t really have the same problem for pop-ups, since we wouldn’t imagine that runners could advance, and in reviewing the table we see that none did. Had we included infielders in the analysis, we perhaps would have had more to go on by including plays where runners tagged on pop-ups down the foul lines where the second baseman or shortstop has to go near or into the stands.

The other data collection issue here is that we’re using batted-ball type (Fly, Line, Pop) as a proxy for batted-ball distance. In the example used to kick off this column, a fly ball to a right fielder with a runner on third and one out, runners score 76% of the time. But because it was really a shallow fly ball (as opposed to a pop-up) we should really be giving Pierre more credit than a runner who tags on a fly ball of medium or greater depth. Alas the data doesn’t include batted-ball distances, so we’ll have to accept our limitations for the time being. What we need is distance data like that used by the authors of this study in the Journal of Quantitative Analysis in Sports.

This does bring up one other issue, however. One way to help account for the differences in batted ball distance is to apply a park factor. The inductive argument runs like this.

Different parks have different outfield configurations;
Those outfield configurations influence where a team positions its fielders;
That positioning influences where outfielders catch batted balls;
Where the batted balls are caught influences how often a runner can be thrown out.

I have no qualms at all with this reasoning, and in fact have applied park factors to runner advancement in my previous work. At this point, the work hasn’t been done to calculate the park factors in these scenarios to determine whether they are in fact different than those for advancement or whether a single “baserunning advancement park factor” (BAPF?) could be created. A task for another day.

With the matrix developed, we can now create derivative matrices that assign credit in terms of runs for advancement. This is the same basic procedure we applied in the ground advancement framework, and here we never assume that trailing runners would advance. By multiplying the assigned credit for the advancement by its frequency we can calculate the number of runs we expect the player to garner in each scenario. The final result for the same set of scenarios shown earlier follows.

Base  Outs  Pos  HitType   To3rd  RunVal  Scores  RunVal    Stay      OA  RunVal  RunExp
x2x      0   LF     Fly    11.8%  0.2740    0.1%  0.5828   86.5%    1.6% -0.5924  0.0238
x2x      0   CF     Fly    46.7%  0.2740    0.3%  0.5828   51.0%    1.9% -0.5924  0.1184
x2x      0   RF     Fly    55.3%  0.2740    0.3%  0.5828   42.1%    2.3% -0.5924  0.1401
x2x      0   LF    Line     0.0%  0.2740    0.0%  0.5828  100.0%    0.0% -0.5924  0.0000
x2x      0   CF    Line    22.1%  0.2740    0.0%  0.5828   76.0%    1.9% -0.5924  0.0492
x2x      0   RF    Line    39.0%  0.2740    0.0%  0.5828   56.9%    4.1% -0.5924  0.0828
x2x      0   LF     Pop     0.0%  0.2740    0.0%  0.5828  100.0%    0.0% -0.5924  0.0000
x2x      0   CF     Pop     0.0%  0.2740    0.0%  0.5828  100.0%    0.0% -0.5924  0.0000
x2x      0   RF     Pop     0.0%  0.2740    0.0%  0.5828  100.0%    0.0% -0.5924  0.0000

It makes sense that the runner will be expected to contribute fewer runs when the ball is caught by the left fielder than when caught by the right fielder because of the differences in the frequency of advancement. This fact is what allows us to give more credit to the runner when he advances on a ball caught by the left fielder.
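The RunExp column can be approximated by weighting each outcome’s run value by its frequency. The sketch below uses the center-field fly ball row; it assumes that staying put is worth zero runs (the table lists no run value for Stay), and because the published percentages are rounded, the result matches the table only to about three decimal places.

```python
# Run values from the table above (runner on second, nobody out).
# The 0.0 for "Stay" is an assumption; the table gives no value for it.
RUN_VALUE = {"To3rd": 0.2740, "Scores": 0.5828, "Stay": 0.0, "OA": -0.5924}

def expected_runs(freqs, run_values=RUN_VALUE):
    """Sum of (outcome frequency x outcome run value) over all outcomes."""
    return sum(freqs[outcome] * run_values[outcome] for outcome in freqs)

# Outcome frequencies for a fly ball caught by the center fielder.
cf_fly = {"To3rd": 0.467, "Scores": 0.003, "Stay": 0.510, "OA": 0.019}

cf_expected = expected_runs(cf_fly)
print(f"Expected runs, CF fly: {cf_expected:.3f}")  # table lists 0.1184
```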

The Incremental Value of Tagging

With the preliminaries out of the way, we can now apply the matrices to individual players to produce Equivalent Air Advancement Runs (EqAAR). First, the top and bottom ten from 2005.

Top 10 for 2005
                            Total        Ex
                      Opp      AA     EqAAR   EqAAR
Alex Rios              19    5.68      3.80    1.88
Chone Figgins          39    7.83      6.04    1.79
Marcus Giles           41    7.69      5.94    1.75
Johnny Damon           54    8.33      6.66    1.66
Tony Graffanino        23    6.11      4.46    1.65
Ichiro Suzuki          38    5.91      4.42    1.49
Reed Johnson           24    4.98      3.49    1.49
Luis Gonzalez (ARI)    32    6.71      5.26    1.44
Jimmy Rollins          38    7.10      5.68    1.42
Michael Young          32    3.67      2.33    1.35

Bottom 10 for 2005
                            Total        Ex
                      Opp      AA     EqAAR   EqAAR
Vladimir Guerrero      28    1.64      3.99   -2.35
Miguel Olivo           14   -0.09      2.00   -2.09
Joe Mauer              30    0.71      2.74   -2.02
Tadahito Iguchi        33    1.32      3.31   -2.00
Victor Diaz             9   -1.37      0.48   -1.85
Shannon Stewart        25    0.59      2.34   -1.74
Marlon Byrd            14   -0.18      1.53   -1.70
Luis Castillo          37    0.74      2.42   -1.68
Michael Cuddyer        15    0.57      2.12   -1.55
Derrek Lee             35    1.84      3.31   -1.47

Probably the main point to notice here is that incremental runs are more difficult to pick up via advancing on air outs, largely because runners advance so frequently. This is especially true from third base, where runners tag and score 78% of the time overall and are thrown out at a rate of less than 3%. That means there simply isn’t enough room even for really good baserunners to take advantage of the difference between themselves and the average runner. The result is that even if you do everything right, you’ll pick up at most around two runs more than the average player given the same opportunities.
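Reading the columns, EqAAR appears to be Total AA (the run value of what the runner actually did) minus Ex EqAAR (what an average runner would be expected to produce in the same opportunities); Alex Rios’s 5.68 - 3.80 = 1.88 fits that reading. A minimal sketch under that assumption, using three hypothetical opportunities built from the runner-on-second matrix shown earlier:

```python
def eqaar(opportunities):
    """opportunities: list of (actual_run_value, expected_run_value) pairs.

    Returns (total AA, expected total, EqAAR), where EqAAR is taken to be
    actual minus expected. That definition is inferred from the tables.
    """
    total_aa = sum(actual for actual, _ in opportunities)
    expected = sum(exp for _, exp in opportunities)
    return total_aa, expected, total_aa - expected

# Hypothetical runner on second with nobody out, three fly balls:
sample = [
    (0.2740, 0.1401),   # tagged and took third on a ball caught in right
    (0.0, 0.0238),      # stayed put on a ball caught in left
    (-0.5924, 0.1184),  # thrown out advancing on a ball caught in center
]

total_aa, expected, diff = eqaar(sample)
print(f"Total AA {total_aa:.4f}, expected {expected:.4f}, EqAAR {diff:.4f}")
```

Note how a single out on the bases more than wipes out the credit for the successful tag.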

But it is also possible (and in fact easier) to cost your team a couple of runs, as Vladimir Guerrero proved in 2005. In his 28 opportunities he was thrown out just once at home, by Ichiro Suzuki on May 5, costing him about a run and a half (-0.5099 for getting thrown out against a typical gain of 0.5594 runs). However, he also failed to advance another eight times, four of them when he was on third, when it would have been expected for him to pick up positive runs. In his other opportunities he simply did about what would be expected and never took an unexpected base.

The reason it’s easier to end up on the negative side is that the relative cost of getting thrown out far outweighs the benefit of taking the extra base. You can also see this from the matrix above, where the cost of getting thrown out at third is more than twice the benefit of advancing to third (yes, scoring from second on a fly ball yields a gain about equal in size to that cost, but that outcome occurs much less frequently).

Let’s now expand the criteria and take a look at the league leaders and trailers for the 2000-2005 time frame.

Leaders for 2000-2005
                                    Total        Ex
                              Opp      AA     EqAAR   EqAAR
2000    Derek Jeter            51    8.47      6.39    2.08
2001    Jeff Conine            36    5.99      4.27    1.72
2002    Raul Mondesi           30    7.12      4.87    2.25
2003    Roger Cedeno           33    6.54      4.59    1.95
2004    Royce Clayton          37    5.72      3.94    1.78
2005    Alex Rios              19    5.68      3.80    1.88

Trailers for 2000-2005
                                    Total        Ex
                              Opp      AA     EqAAR   EqAAR
2000    Tim Salmon             29   -0.27      2.57   -2.84
2001    Benito Santiago        27   -0.21      3.52   -3.73
2002    Jim Edmonds            32    3.05      5.31   -2.26
2003    D'Angelo Jimenez       24   -0.98      2.53   -3.51
2004    Jim Thome              28    1.30      4.34   -3.04
2005    Vladimir Guerrero      28    1.64      3.99   -2.35

In 2001, Benito Santiago distinguished himself by posting the lowest EqAAR in the past six years, doing so by getting thrown out at the plate once and at third twice, in addition to failing to tag up on three balls hit to center and right field among his 27 opportunities.

In perusing these lists you can see that while speed is certainly a factor in who does well and who does not, there is more variability in these rankings because of the smaller margins mentioned above. As a result, it is certainly possible for an otherwise average runner like Jim Edmonds to do poorly in a specific season by virtue of being thrown out once or twice (again because of the huge cost associated with getting thrown out). Overall, Edmonds came in with a positive EqAAR in four of the six seasons and ranked 25th in 2001 with an EqAAR of 1.00. Guerrero is also an interesting case, since he ranks among the worst in baseball overall, except for his 2001 season, when he recorded an EqAAR of 1.52, ranking him third.

Next, let’s take a look at the cumulative leaders.

Top 10 for 2000-2005
                            Total        Ex
                      Opp      AA     EqAAR   EqAAR
Derek Jeter           252   29.72     22.79    6.93
Ray Durham            230   31.10     26.62    4.49
Gary Sheffield        188   17.72     13.57    4.15
Tom Goodwin            83   16.27     12.54    3.74
Reed Johnson           88   14.26     10.61    3.65
Kevin Millar          105   17.37     13.79    3.58
Albert Pujols         151   19.56     16.00    3.56
Jimmy Rollins         180   29.46     26.19    3.28
Carlos Guillen        137   19.13     15.88    3.26
Jose Valentin         104   19.04     15.79    3.25

Bottom 10 for 2000-2005
                            Total        Ex
                      Opp      AA     EqAAR   EqAAR
Moises Alou           153    9.09     14.34   -5.25
D'Angelo Jimenez      110    8.54     13.10   -4.56
Benito Santiago        78    8.15     12.72   -4.56
Jim Thome             135    7.69     12.18   -4.49
Vladimir Guerrero     162   15.23     19.60   -4.38
Jose Offerman          96    7.39     11.46   -4.07
Timo Perez             87    4.47      8.16   -3.69
Tim Salmon            138   11.31     14.87   -3.55
Matt Lawton           198   16.30     19.78   -3.48
Richard Hidalgo       112   10.45     13.89   -3.45

As with the column two weeks ago, where Kevin Mench scored well, some of you are probably wondering how in the world Kevin Millar could do likewise here. The short answer is that Millar had a positive EqAAR in every season, unspectacular in all but 2001, when he ran up a 1.57. He did this by a) never being thrown out, and b) failing to contribute positively in only about a quarter of his opportunities. This once again points out that conservatism over the long haul is also a way to do well in this measure.

Finally, let’s calculate a rate statistic to measure those who do best and worst given their opportunities (well, those with at least 50 opportunities anyway).

Top 10 for 2000-2005 by Rate
                            Total        Ex
                      Opp      AA     EqAAR   EqAAR    Rate
Reed Johnson           88   14.26     10.61    3.65    1.34
Jolbert Cabrera        50   10.47      7.89    2.58    1.33
Craig Monroe           65   12.78      9.67    3.11    1.32
Gary Sheffield        188   17.72     13.57    4.15    1.31
Derek Jeter           252   29.72     22.79    6.93    1.30
Tom Goodwin            83   16.27     12.54    3.74    1.30
Jose Guillen          110    9.89      7.64    2.25    1.29
Dustan Mohr            55   10.46      8.16    2.30    1.28
Gabe Kapler            94   14.89     11.67    3.22    1.28
Tony Graffanino        97   14.52     11.43    3.09    1.27

Bottom 10 for 2000-2005 by Rate
                            Total        Ex
                      Opp      AA     EqAAR   EqAAR    Rate
Tony Clark             69    2.61      6.04   -3.43    0.43
Andres Galarraga       73    3.68      6.79   -3.11    0.54
Timo Perez             87    4.47      8.16   -3.69    0.55
Charles Johnson        82    4.53      7.53   -3.00    0.60
Lou Merloni            51    3.05      4.91   -1.86    0.62
Jim Thome             135    7.69     12.18   -4.49    0.63
Coco Crisp             95    3.29      5.20   -1.90    0.63
Moises Alou           153    9.09     14.34   -5.25    0.63
Victor Martinez        57    2.48      3.87   -1.39    0.64
Benito Santiago        78    8.15     12.72   -4.56    0.64

Once again, the relatively narrow spread of rate values serves to underscore the lack of space between the top and bottom performers and those in the middle.
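The column does not spell out the rate formula, but dividing Total AA by Ex EqAAR reproduces the published figures, so the sketch below assumes that definition. The three rows are copied from the tables above; the guard against a non-positive expected value is my own addition.

```python
# (player, opportunities, total AA, expected) from the 2000-2005 tables.
ROWS = [
    ("Reed Johnson", 88, 14.26, 10.61),
    ("Jolbert Cabrera", 50, 10.47, 7.89),
    ("Tony Clark", 69, 2.61, 6.04),
]

def rate_table(rows, min_opp=50):
    """Return (player, rate) pairs, best first, for players with enough chances.

    Rate is assumed to be total AA divided by expected runs.
    """
    rated = [
        (name, total_aa / expected)
        for name, opp, total_aa, expected in rows
        if opp >= min_opp and expected > 0  # skip undefined or odd ratios
    ]
    return sorted(rated, key=lambda pair: pair[1], reverse=True)

for name, rate in rate_table(ROWS):
    print(f"{name:<18}{rate:5.2f}")
```

A rate of 1.00 would mean a runner produced exactly what an average runner would have in the same opportunities.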

A Final Note About Variability

Before we sign off, we should briefly discuss one other concept related to measures like these. In the tables above, you’ll notice that those who rate the best for a single season are generally in the range of +2 runs while those who rate the poorest are closer to -3 runs. However, when you look at the leaders and trailers for the six seasons from 2000 to 2005, you’ll notice that the leaders are generally in the range of +3 to +5 while the trailers are -4 to -5.

If you were to extrapolate the +2 for the leaders across six seasons, you would end up with +12 instead of +5, and likewise for the trailers. There are two reasons the actual totals fall short of that. First and most obviously, the same players are not the leaders every season, so you wouldn’t expect that kind of linear extrapolation to hold. Second and more importantly, the extreme values on each end don’t represent skill alone, but instead a combination of skill and random variation, which has the effect of pushing some values to the extremes. Of course that applies to all kinds of metrics, including both traditional ones like batting average and new ones like these.
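A toy simulation illustrates the point. In this sketch every player has exactly the same (zero) true skill, so all the spread is random variation; the player count, the noise level, and the seed are arbitrary assumptions, not values fitted to the data above.

```python
import random

def simulate(n_players=300, n_seasons=6, sd=1.0, seed=0):
    """Compare summed single-season bests with the best multi-season total."""
    rng = random.Random(seed)
    # seasons[s][p] is player p's result in season s: pure noise, no skill.
    seasons = [
        [rng.gauss(0.0, sd) for _ in range(n_players)]
        for _ in range(n_seasons)
    ]
    # "Extrapolated" leader: add up each season's best figure.
    sum_of_season_bests = sum(max(season) for season in seasons)
    # Actual leader: the best total any one player put together.
    totals = [sum(season[p] for season in seasons) for p in range(n_players)]
    return sum_of_season_bests, max(totals)

extrapolated, best_total = simulate()
print(f"Sum of season bests: {extrapolated:.2f}")
print(f"Best six-year total: {best_total:.2f}")
```

The first figure can never be smaller than the second, since no player can beat the season’s best in any given year, and with noise alone it will typically be far larger, which is the same pattern as the +12 versus +5 gap.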

Dan Fox