“Who is the best baserunner in the major leagues?…Last year we began an effort to better document baserunning. We have added some things to the record this year, and I have actual answers to the questions above…I’m not just teasing you with the possibility that I could come up with answers if I wanted to.”

– Bill James, “Baserunning”, The Bill James Handbook 2007

Here we go again. Despite fear of being typecast as “that baserunning guy”, this week’s column will take a look at the baserunning metrics discussed this summer as applied to the 2006 season. Before we do, I’ll take a short digression into the baserunning work Bill James discussed in a short essay in the excellent The Bill James Handbook 2007. Brace yourselves for one more trip around the bases here in 2006.

Context Matters

When I discovered that the 2007 Handbook would include a new essay and expanded baserunning numbers I was as giddy as a schoolgirl. And when I read the opening paragraphs quoted above, I’ll have to admit that the tension in the room was palpable.

In the six-page essay titled “Baserunning”, James details his point system for rating baserunners, which takes into consideration six categories aimed at capturing baserunning skill. Those categories are:

  • Runners going from first to third on a single
  • Scoring from second on a single
  • Scoring from first on a double
  • Bases Taken (including advancing on wild pitches, passed balls, balks, sacrifice flies, and defensive indifference)
  • Baserunning Outs
  • Runs Scored as a Percentage of Times on Base

Baseball Info Solutions calculates how many bases the average runner would have gained in each situation given the same number of opportunities and then credits or debits each runner based on his difference from the average. Four of the six categories assign one point per base of difference; the baserunning outs category receives triple weighting since outs on the bases are so costly, and the runs scored percentage category is divided by three since it is so heavily context dependent. Total them all up and you can derive a rating that can answer the question, “who were the best and worst baserunners of 2006?” Simple.

For example, as discussed in the essay, Carlos Beltran advanced from first to third 9 times in 20 opportunities (45%), whereas the average runner advanced 28% of the time, and so Beltran is credited with +3.4 bases or ‘points’. Likewise, Beltran took 23 bases from the defense in 226 times on base, which is 6.6 more than expected. Beltran then gets +3.3 points for not making as many outs on the bases as the average runner, -0.2 points for advancing from second on a single, +0.48 for advancing on doubles when he was on first, and +7 points for his run scoring percentage, having scored 86 runs in his times on base. Add it all up and Beltran is given a rating of +21. It turns out that’s very good, and it does crack the top ten under this methodology, as shown in the following list.

1.  Chone Figgins     +28
2.  Chase Utley       +27
3.  Mark Ellis        +23
4.  Orlando Cabrera   +23
5.  David DeJesus     +23
6.  Jose Reyes        +22
7.  Mark Teahen       +22
8.  Willy Taveras     +21
9.  Carlos Beltran    +21
10. Hanley Ramirez    +21

Similarly the bottom ten are:

1.  Josh Willingham   -30
2.  Adrian Gonzalez   -25
3.  Mike Piazza       -25
4.  Frank Thomas      -23
5.  Jason Giambi      -23
6.  Ryan Howard       -21
7.  Pat Burrell       -21
8.  Travis Hafner     -21
9.  Victor Martinez   -20
10. Juan Rivera       -20

In addition, all players who were on base 50 or more times in 2006 are included in a table sorted alphabetically that has their advancement percentages, bases taken, outs advancing, times doubled off, and total points in this system.
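To make the arithmetic of the point system concrete, here is a minimal Python sketch. It is not Baseball Info Solutions' code; the function names and category keys are my own, and I am assuming that the per-category points quoted for Beltran are already weighted, since they sum to his published rating.

```python
# Category weights as described in the essay: four categories count one
# point per base, baserunning outs count triple, and the run-scoring
# percentage is divided by three. The key names here are hypothetical.
WEIGHTS = {
    "first_to_third": 1.0,
    "second_to_home": 1.0,
    "first_to_home": 1.0,
    "bases_taken": 1.0,
    "outs_avoided": 3.0,
    "run_scoring": 1.0 / 3.0,
}


def bases_vs_average(successes, opportunities, league_rate):
    """Bases gained beyond what an average runner gets in the same chances."""
    return successes - opportunities * league_rate


def weighted_points(raw_differences):
    """Apply the category weights to raw differences from average."""
    return sum(WEIGHTS[name] * diff for name, diff in raw_differences.items())


# Beltran went first to third 9 times in 20 chances against a 28% average.
beltran_first_to_third = bases_vs_average(9, 20, 0.28)

# Per-category points quoted for Beltran (assumed to be already weighted).
beltran_points = [3.4, 6.6, 3.3, -0.2, 0.48, 7.0]
beltran_rating = round(sum(beltran_points))

print(f"first to third: {beltran_first_to_third:+.1f}")
print(f"rating: {beltran_rating:+d}")
```

The first-to-third figure reproduces the +3.4 cited in the essay, and the six quoted components total 20.58, which rounds to the published +21.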

Clearly this metric captures a good deal of the essence of baserunning by virtue of the fact that the runners in the top and bottom lists are eminently reasonable. Where I quibble with the methodology, however, is along the following lines:

  • The first three categories fall under the heading of advancing on hits. In my research what I’ve found is that the field to which the ball was hit and the number of outs, not to mention whether there are runners on in front of our man, very heavily influence the advancement percentage. It is often the case that a runner is at least twice as likely to advance in one scenario as in another. As a result, a runner followed in the order by a left-handed pull hitter would be expected to advance more frequently from first to third on a single than a runner typically followed by a right-handed pull hitter. Likewise, a runner who is on base more frequently with two outs (say a seventh place hitter) as opposed to nobody out (a leadoff hitter) would be expected to advance more frequently since he needn’t wait to see if the ball is caught. By not taking this additional context into account, the system is essentially saying the context evens out in the long run. This might be true over a longer period, but a season does not a long run make. In many cases, as documented for Beltran, a runner will get fewer than 30 opportunities in a particular scenario in a season and those can be heavily influenced by context. Equivalent Hit Advancement Runs (EqHAR), which includes all three hit advancement scenarios, takes this into consideration.
  • An additional contextual item not taken into account in the first three measures, nor in the sacrifice flies component of the bases taken category, is park effect. Parks do indeed influence how frequently runners can advance on both hits and sacrifice flies, as noted by James in the essay in discussing the difference between Derek Jeter and David Ortiz. EqHAR and Equivalent Air Advancement Runs (EqAAR), which includes both sac flies and advancing on other fly balls, both take this into account.
  • The bases taken category includes a couple of items that I had not considered but, James convinced me, are worthy of consideration. A moment’s thought is enough to realize that advancing on wild pitches and passed balls and perhaps even balks are categories that should be included in a total measure of baserunning. This is the flip side of the excellent work done by Sean Forman on catchers. They were omitted from the metrics I developed but will be included in the future. However, I’m a little hesitant to include advancing on defensive indifference. While it probably is correlated with speed and therefore baserunning, the questionable value of advancing in those scenarios and the opportunities to do so, coupled with the difficulty of judging when a runner is in such a situation, add up to my conclusion that including those occasions makes little sense.
  • These categories do not include advancing on ground outs, which is clearly a skill that good baserunners possess and in my estimation accounts for about a fifth of the total baserunning picture. Equivalent Ground Advancement Runs (EqGAR) takes this into account.
  • As James discusses, the idea of crediting runners for scoring a higher percentage of times they are on base is very context dependent. This is the reason he gives it a one-third weighting. However, it seems to me that many of the other categories (if he had also included advancing on ground outs) already include the components that lead to a higher run scoring percentage without introducing the problem of runners who score more frequently on home runs and extra base hits by subsequent hitters in the order. The fact that in Beltran’s case fully one-third of his final score was based on run scoring hints at this weakness.
  • The baserunning outs category includes getting thrown out advancing on hits, getting doubled off on balls in the air, and getting thrown out trying to score on a sac fly. All of these are included in EqHAR, EqAAR, and EqGAR with the advantage that, being based on the Run Expectancy matrix, the weightings are more precise and take into consideration the number of outs and baserunners. Simply triple weighting them is a valid approximation but one that is less accurate. That said, this category also includes getting thrown out attempting to advance on passed balls and wild pitches, which none of my metrics currently include, but certainly should.
  • Stolen bases and getting picked off are not included in any of the categories. Clearly an argument can be made for excluding them since they are not always thought of as reflecting ‘pure baserunning’. Still, I think including them in Equivalent Stolen Base Runs (EqSBR) helps to complete the picture.
  • Finally, although the methodology that underlies metrics like EqHAR is based not on actual runs but merely theoretical ones (hence the term “equivalent” in the titles), converting baserunning events into this currency provides a clearer look at the magnitude of the impact of good and poor baserunners.
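The first quibble, that context changes what an average runner would be expected to do, can be illustrated with a small Python sketch. This is not the actual EqHAR calculation, which is built on the Run Expectancy matrix. The per-context advancement rates below are invented purely for illustration, as are the two runners; only the flat 28% first-to-third rate comes from the essay.

```python
# HYPOTHETICAL league rates of going first to third on a single, keyed by
# (field the ball was hit to, number of outs). Invented for illustration.
LEAGUE_RATE = {
    ("left", 0): 0.15,
    ("left", 2): 0.25,
    ("right", 0): 0.40,
    ("right", 2): 0.55,
}


def flat_credit(opportunities, flat_rate):
    """Bases above average when every chance gets the same expectation."""
    advanced = sum(1 for _, did_advance in opportunities if did_advance)
    return advanced - flat_rate * len(opportunities)


def context_credit(opportunities, rates):
    """Bases above average when each chance is judged against its context."""
    advanced = sum(1 for _, did_advance in opportunities if did_advance)
    expected = sum(rates[context] for context, _ in opportunities)
    return advanced - expected


def make_opportunities(context, chances, advances):
    """Build a list of (context, advanced?) pairs for a made-up runner."""
    return [(context, i < advances) for i in range(chances)]


# Two hypothetical runners with identical 9-for-20 records.
easy_chances = make_opportunities(("right", 2), 20, 9)  # singles to right, two outs
hard_chances = make_opportunities(("left", 0), 20, 9)   # singles to left, nobody out

for label, opps in (("easy", easy_chances), ("hard", hard_chances)):
    print(label,
          f"flat {flat_credit(opps, 0.28):+.1f}",
          f"context {context_credit(opps, LEAGUE_RATE):+.1f}")
```

Under the flat rate both runners receive the same +3.4, while under these made-up context rates one comes out at -2.0 and the other at +6.0. Real splits are never this lopsided, but with fewer than 30 chances a season the mix of contexts need not even out.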

Put Up or Shut Up

Don’t get me wrong. I love that this kind of data is being published and even more that it is beginning to be analyzed. Quantifying a previously purely subjective concept like baserunning helps us to put some mental parameters on how much we should value such skills as we weigh them alongside others. And as mentioned, I’ve learned from the essay that there is at least one other category of baserunning data that I need to integrate into my framework (stretching hits is another although far more difficult).

But I’d be less than honest if I didn’t admit that I believe the combination of my metrics is in some sense better for the reasons mentioned above. So without further ado, here are the top and bottom ten baserunners for 2006.

2006 Total Baserunning

Name              Opp EqGAR   Opp EqAAR  Opp EqSBR   Opp EqHAR Total
Chone Figgins      38  2.20    50  1.76   66  0.30    56  4.93  9.19
Hanley Ramirez     44  3.08    26  1.57   66  1.45    50  2.78  8.89
Orlando Cabrera    36  1.27    36  0.83   30  2.95    67  2.64  7.69
Jimmy Rollins      45  1.50    46  0.59   40  3.99    71  1.49  7.57
Chris Duffy        25  0.62    14  0.37   29  3.24    33  1.88  6.10
Ichiro Suzuki      60  1.71    57  0.51   47  6.03    72 -2.19  6.07
Willy Taveras      39  0.51    39  0.24   44  0.97    52  4.21  5.92
Chase Utley        24  0.96    18  0.56   19  0.34    61  3.86  5.72
Jose Reyes         54  2.93    42  1.36   84 -0.42    51  1.52  5.39
Dave Roberts       30  0.23    40  0.60   58  4.00    55  0.39  5.21
Victor Martinez    29 -0.97    24  0.13    0  0.00    62 -4.90 -5.73
Javy Lopez         11 -0.42    22 -1.93    0  0.00    27 -3.22 -5.56
Magglio Ordonez    32 -0.26    28 -2.59    5 -2.67    44 -0.03 -5.54
Jason Giambi       18 -0.68    19 -1.41    2  0.19    44 -3.50 -5.40
Pat Burrell        11 -0.18    17 -1.27    1 -0.63    41 -3.16 -5.24
Paul Konerko       17 -0.35    41 -1.01    1  0.10    50 -3.86 -5.12
Jorge Posada       20 -0.79    31 -0.21    3  0.28    46 -4.21 -4.94
Josh Willingham    14 -0.48    14 -0.31    5 -1.31    26 -2.61 -4.72
Bill Hall          22  0.29    23 -0.99   17 -4.19    32  0.32 -4.56
Bengie Molina      19 -0.58    18 -0.82    2 -0.40    27 -2.74 -4.53

When we compare the lists here with those built under James’ system, what we find is that six of the top ten and four of the bottom ten are in common. Clearly the major difference is the inclusion of EqSBR, which vaults Ichiro Suzuki (whom James calls out for his lackluster performance) and Dave Roberts into the top ten by virtue of their +6.03 and +4.00 scores respectively. Likewise, Magglio Ordonez and Bill Hall make the bottom ten because of their -2.67 and -4.19 EqSBR values. If EqSBR is excluded, David DeJesus, included in James’ list, also makes the top ten, as do Johnny Damon, Maicer Izturis, and Grady Sizemore. On the other end of the spectrum, Frank Thomas comes in 9th from the bottom and Ryan Howard barely misses the bottom ten. When EqSBR is taken out of the picture we have twelve of the twenty players in common.
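The overlap between the two sets of lists is easy to check. The short Python sketch below uses only the names printed in the lists above; the variable names are my own. The lists with EqSBR excluded are not reproduced in full here, so the twelve-of-twenty figure is not rechecked.

```python
# Top and bottom ten under James' system, as listed earlier.
james_top = {
    "Chone Figgins", "Chase Utley", "Mark Ellis", "Orlando Cabrera",
    "David DeJesus", "Jose Reyes", "Mark Teahen", "Willy Taveras",
    "Carlos Beltran", "Hanley Ramirez",
}
james_bottom = {
    "Josh Willingham", "Adrian Gonzalez", "Mike Piazza", "Frank Thomas",
    "Jason Giambi", "Ryan Howard", "Pat Burrell", "Travis Hafner",
    "Victor Martinez", "Juan Rivera",
}

# Top and bottom ten from the 2006 Total Baserunning table.
total_top = {
    "Chone Figgins", "Hanley Ramirez", "Orlando Cabrera", "Jimmy Rollins",
    "Chris Duffy", "Ichiro Suzuki", "Willy Taveras", "Chase Utley",
    "Jose Reyes", "Dave Roberts",
}
total_bottom = {
    "Victor Martinez", "Javy Lopez", "Magglio Ordonez", "Jason Giambi",
    "Pat Burrell", "Paul Konerko", "Jorge Posada", "Josh Willingham",
    "Bill Hall", "Bengie Molina",
}

# Set intersection gives the players who appear on both versions of a list.
common_top = sorted(james_top & total_top)
common_bottom = sorted(james_bottom & total_bottom)

print(len(common_top), common_top)
print(len(common_bottom), common_bottom)
```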

So these lists are in substantial agreement, although Chone Figgins and Hanley Ramirez look to be separated from the pack a bit in my rankings, whereas they are not under James’ system. It should be noted that Figgins led in 2005 as well with a total of 8.29, a run and a half better than Jose Reyes in second place. From both analyses it’s clear that a very strong case can be made for Figgins being the best all-around baserunner in the game today, with Ramirez probably a close second.

But what of Beltran? James’ system ranks him ninth, while in my system he ranks 19th at +3.86 by recording an EqGAR of +0.08, an EqAAR of +1.26, an EqSBR of +1.40 and an EqHAR of +1.11. He was not spectacular in any category but clearly above average in all respects. Over the course of the last ten years, the Mets center fielder is indeed probably the best baserunner the majors have seen. In a previous column, Beltran came out on top from 2000-2005 at +25.44 runs.

James also has an interesting discussion of Ortiz (+0 in James’ system) and Jeter (+3) with the point being that although Jeter is often regarded as being superior in all aspects, he’s probably overrated as a baserunner while Ortiz is underrated because of his perceived slowness. Under my system Jeter ranks 185th at +0.22, doing especially poorly in EqHAR at -1.81. Ortiz, on the other hand, ranks 584th (67th from the bottom, clustered among the likes of Jermaine Dye, Bernie Williams, and Michael Barrett) at -2.22. Ortiz scored so poorly based on his -2.48 in EqHAR, a category in which James praises him because he scored from second on a single 8 times in 17 opportunities and from first on a double twice in five opportunities. However, given the context of those opportunities, Ortiz comes out negative in both, as he also did in his 26 opportunities to advance from first to third on a single. Where he was especially hurt was in his two opportunities when he was on second with the batter singling to right field with two outs. Historically, however, Jeter had done very well from 2000-2005, ranking second in the majors behind Beltran at +25.09, and so his reputation as a good baserunner seems to be well deserved. But as regards 2006, James is certainly correct that Ortiz was not as poor a baserunner as some might imagine and Jeter not as good as some give him credit for.

Interestingly, I participated in a Royals Roundtable recently on Baseball Think Factory where Kansas City Star columnist Joe Posnanski mentioned that Mark Teahen is “one of the best baserunners I’ve ever seen.” My rankings back that observation up, as Teahen placed 17th overall at +3.91 with a positive value in all four categories. Score one for Joe.

Moving the Chains

If I’ve been overly subtle, let me reiterate that in my view the system that James introduces in the 2007 Handbook both offers a good approximate answer to the question of who is the best baserunner and includes aspects that I hadn’t considered. In those respects it certainly moves the conversation forward. There are, however, some weaknesses in the system, related particularly to context, that can be, and have been, addressed. James also includes an interesting essay on Manufactured Runs…but let’s save that for another day.