“Let him hit it, you’ve got fielders behind you.”
–Alexander Cartwright, attributed by Bob Chieger in Voices of Baseball

When it comes to fielding analysis, there really is no such thing as simple. Be that as it may, in this space during the last month and a half we’ve been exploring a nascent fielding system I developed based on play-by-play data: “Simple Fielding Runs,” or SFR, for those who aren’t totally nauseated by yet another TLA (three-letter acronym).

Simply put, the main advantage of such a system is that it claims the middle ground between systems based on traditional fielding statistics (such as the Davenport Translations, or the subsequently developed fielding component of Bill James’ Win Shares) and those based on zone data tracked for every ball put in play, notably Mitchel Lichtman’s Ultimate Zone Rating (UZR) and John Dewan’s Plus/Minus system (as described in the initial 2006 edition of The Fielding Bible). As we’ll see, each approach has its advantages, and we’ll exploit one of SFR’s in the final section of today’s column.

While the idea of creating a system based on play-by-play records turned out not to be very original, the implementation of SFR is unique, and it has been interesting to see how both play-by-play systems fare when compared to those based on more granular data. To that end, and in the formulaic threefold manner that defines so many of these columns, this week we’ll briefly recount the refinements that have been made in SFR over the last month, then make a few comparisons between SFR, UZR, and the Plus/Minus system, and wrap up by applying SFR to the 2007 minor leagues to take a look at who is flashing the leather down on the farm.

Evolution Not Revolution

We’ll lead off with a quick recap of where we’ve been with SFR and how it has evolved. Regular readers will recall that the original version discussed in early December had three core components.

First, a baseline for each position is created that specifies how frequently and at what cost balls of different types (line drives, groundballs, popups) hit “near” a fielder were turned into outs. The second step is to make the same calculations for each fielder and compare their results to the matrix to determine how many balls above or below average the fielder either turned into outs or let past him (assuming all other things being equal). The final step is to convert the difference into a run value, which we did using values derived from linear weights. The rub in all of this is in determining just what a ball hit “near” a fielder actually means, since we don’t have hit location data broken down into zones. In the first iteration very simple partitioning rules were used to come up with the “virtual area of responsibility” for each position; and by “simple” I mean static, as in “for shortstops, all ground balls fielded by the shortstop and left fielder are considered, as well as half the groundballs fielded by the center fielder.”
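For readers who think in code, here is a minimal Python sketch of those three steps. The names (`Bucket`, `plays_above_average`, `simple_fielding_runs`) and every number in it are hypothetical, chosen only to show the shape of the calculation; the real baseline rates and linear-weights run values are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """Balls of one type in a fielder's virtual area of responsibility."""
    ball_type: str            # e.g. "groundball", "line drive", "popup"
    chances: float            # balls assigned to the fielder (can be fractional)
    outs: float               # how many of those chances became outs
    baseline_out_rate: float  # step 1: out rate for the position as a whole
    runs_per_play: float      # step 3: assumed run value of one extra out


def plays_above_average(bucket: Bucket) -> float:
    """Step 2: outs made minus the outs the baseline expects on the same chances."""
    return bucket.outs - bucket.chances * bucket.baseline_out_rate


def simple_fielding_runs(buckets) -> float:
    """Step 3: weight each bucket's plays above average by its run value and sum."""
    return sum(plays_above_average(b) * b.runs_per_play for b in buckets)


# Illustrative numbers only, not taken from the SFR data set.
shortstop = [
    Bucket("groundball", chances=400, outs=300, baseline_out_rate=0.70, runs_per_play=0.75),
    Bucket("line drive", chances=50, outs=10, baseline_out_rate=0.25, runs_per_play=0.75),
]
print(round(simple_fielding_runs(shortstop), 1))
```

In this made-up case the fielder is 20 plays better than the baseline on grounders and 2.5 worse on liners, and the run values turn that into a single number.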

We then compared SFR to UZR for 2005 and 2006 data and computed a correlation coefficient of 0.75, which was high enough to justify continued development of the system.

Our next step was taken a week later and primarily incorporated two refinements that most readers likely thought obvious: batter handedness and bunting. In taking the first into consideration, we’re controlling for fielders facing a disproportionate number of batters from one side that may in turn skew the overall difficulty of turning batted balls into outs. In the latter case, we’re affecting corner infielders who usually end up fielding bunts, since treating bunts like normal ground balls is clearly not sufficient. In addition, this second attempt eliminated from consideration for middle infielders all line drives that resulted in extra-base hits, since the odds of a line drive catchable by an infielder resulting in a double or triple are negligible. One final refinement in the second beta version of the system involved introducing more sophisticated partitioning rules. These rules were based on the proportion of batted balls that we find “in the wild” for balls we know were fielded by the positions participating in the split (i.e. between third and short, and second and first) with the split between short and second handled as a 50/50 division. The new partitioning rules also put an end to the double-counting of batted balls where, for example, the same grounder to the outfield was assigned to both the third baseman and shortstop. Instead, each ball was assigned using the proportion calculated in the partitioning rules.
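The proportional partitioning can be sketched roughly as follows. The function names and the fielded-ball counts are assumptions made up for illustration; the point is only that the shares come from balls the two positions actually fielded, and that they sum to one so no ball is counted twice.

```python
def partition_shares(fielded: dict) -> dict:
    """Turn counts of balls actually fielded into each position's share of the split."""
    total = sum(fielded.values())
    if total <= 0:
        raise ValueError("need at least one fielded ball to compute shares")
    return {pos: count / total for pos, count in fielded.items()}


def assign_unfielded(balls: float, shares: dict) -> dict:
    """Split balls that got through so that each one is counted exactly once."""
    return {pos: balls * share for pos, share in shares.items()}


# Hypothetical counts of grounders fielded on the left side of the infield.
left_side = partition_shares({"3B": 300, "SS": 500})

# 40 grounders that reached left field are divided using those shares.
through_to_left = assign_unfielded(40, left_side)
print(left_side, through_to_left)
```

With these invented counts the third baseman is charged with 15 of the 40 balls and the shortstop with 25, rather than both being charged with all 40.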

After applying these changes we once again ran our comparison with UZR and found that our correlation coefficient rose slightly, from 0.75 to 0.78, in addition to bringing SFR in line with UZR in terms of range and standard deviation. It should be noted that while I’m effectively using UZR as a baseline for comparison, that doesn’t mean that UZR is a perfect system. My use of it is based on the principle that all other things being equal, a system like UZR based on more detailed data should, in theory, give us results that are closer to reality.

While those changes were certainly beneficial, I quickly discovered (and have not yet discussed in this space) a few more tweaks that could possibly make a substantive difference.

First, I considered the additional context for all infielders depending on whether first base was occupied, and added this to the baseline used for comparison. It turns out that for first basemen, on groundballs hit by lefties, the percentage of batters who reach via hit or error goes from 16 percent to 24 percent when first is occupied, and from 20 percent to 29 percent for right-handers. For second basemen in the same scenarios it goes (respectively) from 27 percent to 29, and from 30 percent to 34. For shortstops, interestingly, the trend is the opposite, as it declines from 34 percent to 30, and 32 percent to 31. Finally, for third basemen it’s pretty steady–an unchanged 27 percent on lefties, and from 24 percent to 26 against right-handers. Bunts make up a far smaller percentage of fielded balls, but the differences for them are significant when first is occupied, since the vast majority of such attempts are sacrifice bunts rather than attempts to bunt for a hit, which have a higher success rate.
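To see why this context matters, here is a small sketch that stores the groundball percentages quoted above in a lookup keyed by position, batter hand, and whether first base is occupied. The table layout and the `expected_outs` helper are my own illustration, not the structure of the actual baseline.

```python
# Percent of batters reaching via hit or error on groundballs, as quoted above.
# Key: (position, batter hand, first base occupied?)
REACH_PCT = {
    ("1B", "L", False): 16, ("1B", "L", True): 24,
    ("1B", "R", False): 20, ("1B", "R", True): 29,
    ("2B", "L", False): 27, ("2B", "L", True): 29,
    ("2B", "R", False): 30, ("2B", "R", True): 34,
    ("SS", "L", False): 34, ("SS", "L", True): 30,
    ("SS", "R", False): 32, ("SS", "R", True): 31,
    ("3B", "L", False): 27, ("3B", "L", True): 27,
    ("3B", "R", False): 24, ("3B", "R", True): 26,
}


def expected_outs(position: str, bats: str, first_occupied: bool, groundballs: float) -> float:
    """Outs an average fielder would record on these groundballs in this context."""
    reach = REACH_PCT[(position, bats, first_occupied)]
    return groundballs * (100 - reach) / 100


# The same 100 grounders from lefties imply a different baseline for a first baseman.
print(expected_outs("1B", "L", False, 100), expected_outs("1B", "L", True, 100))
```

A first baseman who sees an unusual number of grounders with a runner on first would look worse than he is if the baseline ignored that split.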

Secondly, the partitioning rules for determining which balls are assigned to first and third basemen were once again altered. In the previous attempts, all balls in the shared areas of responsibility–except as noted above for middle infielders when a line drive resulted in an extra-base hit–were partitioned according to the split percentages. This was changed to exclude bunts from the calculation of the partitioning percentage and, more importantly, all extra-base hits on groundballs to left field are now assigned to the third baseman, and all extra-base hits on grounders to right are assigned to the first baseman. This is done based on the assumption that these are hits down the line that naturally would fall within the corner infielder’s area of responsibility.
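A rough sketch of the revised corner rule might look like this. The function name and the `split` argument are hypothetical; `split` stands for the corner infielder's share, which I am assuming has already been computed from non-bunt balls as described above.

```python
def corner_shares(outfield: str, is_extra_base_hit: bool, split: float) -> dict:
    """Return {position: fraction} for a groundball that reached left or right field.

    `split` is the corner infielder's share of ordinary grounders on that side
    (assumed here to be computed with bunts excluded).
    """
    corner = {"LF": "3B", "RF": "1B"}[outfield]
    middle = {"LF": "SS", "RF": "2B"}[outfield]
    if is_extra_base_hit:
        # Treated as a ball down the line: entirely the corner infielder's.
        return {corner: 1.0, middle: 0.0}
    return {corner: split, middle: 1.0 - split}


print(corner_shares("LF", True, 0.4))    # double down the left-field line
print(corner_shares("RF", False, 0.25))  # ordinary single through the right side
```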

Based on some correlation results shown in the second article, we also no longer partition any popups, fly balls, or line drives to the outfield, since by doing so we didn’t seem to be adding much information to what we already knew. Fielders still get credit, however, for balls of these other types that they field. And as noted above, we were using a 50/50 split on groundballs up the middle. Shortly after publication, this was changed to partition these balls based on the percentages of grounders actually fielded by the shortstop and second baseman. Finally, in just the last few days, a small error in attributing popups, line drives, and fly balls for shortstops was corrected.

The end result of all of these changes and refinements is that we’re ready to–in the words of my field of software development–ship the code and stamp the current version for infielders as version 1.0 of SFR. What that means is that you can now download a spreadsheet containing the 2005 through 2007 data for all infielders. Enjoy.

Like Two Peas in a Pod?

In order to provide just a little more context for this first version, let’s run a few more comparisons to UZR and the Plus/Minus system. As we did before, we can compute correlation coefficients for SFR and UZR and break them down by position. This time, though, we’ll use all seasonal data for 2003 through 2006 for players who played in 50 or more games at their position. The results are shown in Table 1:

Table 1. Correlation Coefficients for SFR vs. UZR, Seasonal 2003-2006 for >=50 Games Played
Pos  Seasons    r
All     549   0.80
1B      143   0.68
2B      132   0.81
SS      141   0.81
3B      133   0.82

From this it appears that our latest set of changes bought us a slight rise in correlation coefficient, increasing the similarity slightly for first basemen, second basemen, and shortstops. It’s still interesting to note that first basemen lag behind the other positions, although I still have no solid reason why this should be the case. It certainly could be that the style of play varies more for first basemen in terms of positioning and allowing or not allowing a second baseman to field certain types of balls that a more granular system can take into account. It should also be noted that in this version of SFR the context of whether first base is occupied is taken into account, although the state of second base is not. This may have a larger effect than one might think. This whole question is obviously one ripe for further research.

In addition to running correlations based on seasonal data, we can also see how our metric performs across a span of a few seasons for players with significant playing time. Sean Smith reported his findings for TotalZone+ (the analogous system referenced in the introduction) recently, so we’ll take a similar tack here. In Table 2 you’ll find the correlations by position for all players who played in 162 or more games (Sean filters by 500 or more chances) at their positions from 2003 through 2006:

Table 2. Correlation Coefficients for SFR vs. UZR, Aggregated 2003-2006 for >=162 Games
Pos  Players   r
All     156   0.87
1B       38   0.78
2B       40   0.88
SS       41   0.86
3B       37   0.89

To give you a feel for what this looks like graphically, Figure 1 shows the scatter plot colored by position (and yes, that blue dot in the upper right-hand corner of the graph is new Twins shortstop Adam Everett, whom SFR has at +81 runs in 481 games and UZR has at +104):

Figure 1. SFR vs. UZR for players with >=162 G from 2003 through 2006


In addition to comparing SFR to UZR we can also compare it, at least in the aggregate, to the Plus/Minus system developed by Baseball Info Solutions. I used the caveat “at least” since full data for 2006 and 2007 is not available, although summary data as well as leaders and trailers have been published in the Hardball Times’ annual and The Bill James Handbook 2008.

In THT’s published work, we find the aggregated team totals for 2007 broken down by middle and corner infielders, which we can compare to the SFR totals for those positions, as shown in Table 3 ordered by total SFR:

Table 3. Comparison of SFR to Plus/Minus, 2007 Team Totals
           Middle          Corner          Total
Team    SFR     +/-     SFR     +/-     SFR     +/-
TOR      42      56       5      25      47      81
COL      35      46       9     -16      44      30
SDN      30       0       8      -6      38      -6
SFN      19       1      18      46      37      47
BOS      15       3      15      14      30      17
OAK      21      14       8       8      30      22
CHN      11       3       3      13      14      16
SLN     -15     -17      28      58      13      41
BAL      17       3      -5      -5      12      -2
ATL      11      -5       1       4      12      -1
NYN      15      15      -3      11      11      26
KCA       6      25       5      24      11      49
PHI      17      25      -6      -2      11      23
MIN      -6      14      11      16       5      30
ARI       2      19       3     -18       5       1
ANA       0       8       1      27       1      35
TEX       4      -9      -3     -14       0     -23
NYA      -8     -20       3      17      -6      -3
DET     -26       0      16      29     -10      29
WAS     -11     -33       0      -8     -11     -41
LAN      -5      -8      -8     -20     -13     -28
CLE     -10       5      -5      -9     -16      -4
SEA      -7     -11     -10     -14     -17     -25
HOU      -3      -5     -15      17     -18      12
PIT     -23      -4       2      18     -22      14
CIN     -13       7     -13     -29     -26     -22
CHA     -19     -38      -9     -15     -28     -53
TBA     -39     -51      -4     -17     -43     -68
MIL     -16      -5     -33     -41     -49     -46
FLO     -43     -54     -20     -44     -64     -98

Keep in mind that while SFR is denominated in runs, Plus/Minus simply counts the difference from expected in the number of balls fielded. (Plus/Minus does have a concept termed “Enhanced +/-” that gives added weight to balls hit down the line for corner infielders, but I can find no evidence that the numbers presented here are “Enhanced.”) This means that the Plus/Minus numbers will have a larger magnitude. A quick translation would multiply the Plus/Minus number by something like 0.44 for middle infielders and 0.50 for corner infielders to convert it to runs.
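As a quick worked example of that translation, the sketch below applies the rough factors to Toronto's line in Table 3. The helper name is hypothetical, and the factors are only the "something like" values given above, so treat the result as a ballpark figure.

```python
# Approximate runs per play, using the rough factors quoted in the text.
RUNS_PER_PLAY = {"middle": 0.44, "corner": 0.50}


def plus_minus_to_runs(plus_minus: float, group: str) -> float:
    """Convert a Plus/Minus play count to approximate runs for an infield group."""
    return plus_minus * RUNS_PER_PLAY[group]


# Toronto's 2007 Plus/Minus from Table 3: +56 middle, +25 corner.
toronto_runs = plus_minus_to_runs(56, "middle") + plus_minus_to_runs(25, "corner")
print(round(toronto_runs))
```

That comes to roughly 37 runs, against the 47 that SFR credits to the Toronto infield.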

The lists agree substantially, with 13 of the 16 teams that SFR pegs as above-average also rating as such in Plus/Minus. Overall, a regression between the totals results in a correlation coefficient of 0.79, with no discernible difference between middle infielders (at 0.78) and corner infielders (at 0.78). Once again, to give you a visual, the following graph depicts how the two systems see the teams overall in terms of their infield defense:

Figure 2. SFR vs. Plus/Minus for 2007 Teams


There are clearly differences between the systems–for example, the cluster of Detroit, Pittsburgh, and Houston, which SFR sees as below average, while Plus/Minus has them in the black–but the similarities and the fairly tight correlation of SFR with both UZR and Plus/Minus should give us some confidence that the system is indeed measuring fielding prowess to a substantial degree.
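Anyone who wants to check the team-level agreement can do so from the Total columns of Table 3. The sketch below counts the teams that both systems rate above average and computes a plain Pearson correlation on the raw totals. I am assuming the Plus/Minus totals are used as published, without converting them to runs; the `pearson_r` helper is written out by hand for illustration, and the result should land close to the 0.79 quoted above.

```python
import math

# (SFR total, Plus/Minus total) per team, copied from the Total columns of Table 3.
TOTALS = {
    "TOR": (47, 81), "COL": (44, 30), "SDN": (38, -6), "SFN": (37, 47),
    "BOS": (30, 17), "OAK": (30, 22), "CHN": (14, 16), "SLN": (13, 41),
    "BAL": (12, -2), "ATL": (12, -1), "NYN": (11, 26), "KCA": (11, 49),
    "PHI": (11, 23), "MIN": (5, 30), "ARI": (5, 1), "ANA": (1, 35),
    "TEX": (0, -23), "NYA": (-6, -3), "DET": (-10, 29), "WAS": (-11, -41),
    "LAN": (-13, -28), "CLE": (-16, -4), "SEA": (-17, -25), "HOU": (-18, 12),
    "PIT": (-22, 14), "CIN": (-26, -22), "CHA": (-28, -53), "TBA": (-43, -68),
    "MIL": (-49, -46), "FLO": (-64, -98),
}


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Teams SFR rates above average, and the subset Plus/Minus also rates above average.
above_in_sfr = [team for team, (sfr, _) in TOTALS.items() if sfr > 0]
also_above_in_pm = [team for team in above_in_sfr if TOTALS[team][1] > 0]

sfr_totals = [sfr for sfr, _ in TOTALS.values()]
pm_totals = [pm for _, pm in TOTALS.values()]
r = pearson_r(sfr_totals, pm_totals)

print(f"{len(also_above_in_pm)} of {len(above_in_sfr)} above-average SFR teams are also positive in Plus/Minus")
print(f"r = {r:.2f}")
```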

We can’t perform the same kinds of correlations for Plus/Minus using player seasons as we can for UZR, since the data is not available, but we can sample the leaders and trailers for 2005 through 2007 as published in the Handbook. What follows is a series of tables that shows the top and bottom five in Plus/Minus, along with the SFR value computed for the player. I’ve also added a few notable players who may have done well in one metric but not the other (blank values for Plus/Minus indicate that the player did not appear in the leaders or trailers lists).

Table 4. Leaders and Trailers in Plus/Minus Compared to SFR, 2005-2007
     First Base                          Second Base
Player             SFR       +/-    Player              SFR       +/-
Albert Pujols       22        72    Chase Utley          29        64
Casey Kotchman       7        31    Orlando Hudson       19        53
Doug Mientkiewicz    8        31    Aaron Hill           36        48
Lyle Overbay        14        24    Mark Ellis           42        43
Kevin Youkilis      10        19    Mark Grudzielanek    27        36
--------------------------------    ---------------------------------
Mike Jacobs         -7       -25    Jorge Cantu         -29       -29
Richie Sexson       -4       -25    Jose Vidro          -13       -31
Carlos Delgado     -11       -26    Craig Biggio         -3       -33
Adam LaRoche        -7       -28    Jeff Kent           -13       -36
Prince Fielder     -21       -33    Rickie Weeks        -45       -41

     Notables                            Notables
Player             SFR       +/-    Player              SFR       +/-
Mark Teixeira       16        15    Jamey Carroll        32        17
Nick Johnson         7         6    Placido Polanco      10        28
Justin Morneau       5        14    Brian Roberts         8        25
Olmedo Saenz        -8              Jose Castillo       -22
Jason Giambi       -11              Brandon Phillips    -24
                                    Dan Uggla           -30

     Shortstop                           Third Base
Player             SFR       +/-    Player               SFR      +/-
Adam Everett        64        92    Pedro Feliz          30        64
Jason Bartlett      22        45    Brandon Inge         33        61
Clint Barmes        17        43    Scott Rolen          27        50
Jimmy Rollins       22        42    Joe Crede            24        44
Jack Wilson         11        41    Adrian Beltre         7        42
--------------------------------    ---------------------------------
Marco Scutaro      -15       -33    Garrett Atkins        6       -21
Felipe Lopez       -20       -34    Edwin Encarnacion   -26       -25
Hanley Ramirez     -23       -43    Hank Blalock         -8       -28
Michael Young      -26       -64    Mark Teahen         -22       -30
Derek Jeter        -37       -90    Miguel Cabrera      -16       -37

     Notables                            Notables
Player             SFR       +/-    Player              SFR       +/-
Omar Vizquel        45        31    Nick Punto           19        28
Jose Reyes          30              Eric Chavez          30        27
John McDonald       20        33    Ryan Zimmerman        9        24
Rafael Furcal       18        36    Morgan Ensberg       16        16
Troy Tulowitzki     10        30    Ryan Braun          -28
Angel M. Berroa    -27       -33    Alex Rodriguez      -19
Carlos Guillen     -21

Defense in a Minor Key

Because of the confidence we’ve gained through the comparisons to UZR and Plus/Minus, we can now begin to take the next step and apply the SFR methodology to data sets where we do not have numbers generated from a more granular system. To wrap up today, let’s review the leaders and trailers for all of the minor leagues (except the Mexican League) for 2007 by position.

First, let’s tackle the first basemen:

Table 5. Minor League SFR Leaders and Trailers at First Base, 2007
Player             League   Team       SFR
Brandon Snyder       SAL     DEL         9
Daric Barton         PCL     SRC         9
Yurendell De Caster  INT     IND         8
Larry Broadway       INT     COH         8
Todd Self            TXS     COR         7
------------------------------------------
Lars Anderson        SAL     CAP        -6
Jeffrey Cunningham   PIO     CAS        -8
Logan Morrison       SAL     GBO        -9

Our top slots go to Orioles farmhand and converted catcher Brandon Snyder playing in A-ball and Daric Barton playing at Triple-A for the A’s organization. Snyder just missed Kevin Goldstein’s list of top Orioles prospects, while Barton (another converted catcher), after being named the top prospect before the 2007 season, didn’t disappoint with the bat and remains at the top of the Oakland stack.

On the flip side, 20-year-old left-handed slugger Logan Morrison, toiling in A-ball for the Marlins, had a good year with the bat by smacking 24 home runs despite struggling against southpaws. However, he did not turn in such a good season with the glove. Another big slugger in the Rockies system, Jeffrey Cunningham, got his first taste of professional baseball in Casper and rated at -8 runs in just 59 games.

Now it’s on to the second basemen:

Table 6. Minor League SFR Leaders and Trailers at Second Base, 2007
Player             League   Team       SFR
Adam Davis           SAL     LCO        17
Jayson Nix           PCL     CSP        16
Jose Vallejo         MDW     CLI        15
Luis Valbuena        SOU     WTD        15
Miguel Abreu         SAL     DEL        14
------------------------------------------
Chih-Hsien Chiang    SAL     CAP       -14
Brooks Conrad        PCL     RRE       -14
Chase Fontaine       SAL     ROM       -15

Grabbing the top spot at +17 runs is Adam Davis playing in Low-A for Cleveland. Not exactly a prospect at 22 years old, he split time between second (104 games) and third base (25 games), where he also rated +2. In the runner-up slot we find the Rockies’ 2001 first-round pick, Jayson Nix, shining afield at Colorado Springs. After delivering his best season with the bat since 2003, he’s in the running to win the second base job, where he’ll be competing with Marcus Giles, Ian Stewart, Clint Barmes, and Omar Quintanilla, among others. He will have a leg up defensively.

Chase Fontaine was promoted from the Sally League to the Carolina League, and in both stops was tried at short, third, second, and the outfield. While I haven’t run the outfield numbers, it’s clear that the Braves are trying to find him a position where he can do the least damage. When you add it all up, in his two stops his infield “contribution” totaled -27 runs.

Brooks Conrad has never been touted for his glove work, but his 2007 season was a disaster on both sides of the ball. As profiled by Marc Normandin, his .218/.305/.420 performance at Round Rock for the Astros will be an impediment now that he’s a minor league free agent. His -14 SFR at second base to go along with a -2 at third base in just 13 games won’t help either.

On to the shortstops…

Table 7. Minor League SFR Leaders and Trailers at Shortstop, 2007
Player             League   Team       SFR
Hainley Statia       CLF     RCQ        21
Ramon Santiago       INT     TOL        21
Clint Barmes         PCL     CSP        19
Jonathan Herrera     TXS     TUL        16
Juan Sanchez         DSL     DTW        15
------------------------------------------
Jeffrey Dominguez    CLF     HDM       -16
Dylan Johnston       NWN     BOI       -16
Neil Walton          FSL     VBD       -17

At shortstop, our leader is Hainley Statia playing his first full season at High-A. According to Baseball America, Statia’s “an instinctual middle infielder with plus defensive skills.” SFR agrees, as he was +14 at two stops in 2006 and +16 at two stops in 2005. In second and third places, we find a couple of Triple-A veterans, Detroit’s Ramon Santiago and Colorado’s Clint Barmes, both of whom will be competing for utility roles on their respective clubs in 2008.

Prior to the 2006 season, Neil Walton was ranked by Baseball America as the top defensive middle infielder and the player with the best infield arm in the Rays’ system. His 2006 SFR of +5 seems to support that, but in 2007 he committed 25 errors in 88 games and showed decreased range on his way to a -17 SFR. Dylan Johnston was drafted by the Cubs in 2005, has played nothing but shortstop in his three seasons, and reportedly has average range with a quick release. In 2007 he was promoted from Boise to Peoria, but before moving east he committed 28 errors in 56 games leading to an SFR of -16. He wasn’t done, however, and after moving to the Midwest League he totaled -8 by committing 18 more miscues for a 2007 SFR total of -24.

Finally, at the hot corners of the minors, the best and worst…

Table 8. Minor League SFR Leaders and Trailers at Third Base, 2007
Player             League   Team       SFR
Ryan Rohlinger       SAL     AUG        35
Mario Lisson         CRL     WIL        16
John Contreras       DSL     DCU        15
Andrew Davis         NWN     SKV        15
Mike Hessman         INT     TOL        14
------------------------------------------
Michael Grace        SAL     KAN       -11
Matthew Sweeney      MDW     CED       -17
Mat Gamel            FSL     BRE       -24

As a 24-year-old who hit .235 in Low-A, Ryan Rohlinger is not exactly high on most people’s radar, but his SFR of +35 ranked the highest of any player. Although he did play second and short in college, one wonders whether there is a data problem (although in looking at the other infielders at Augusta, nothing jumps out) or if this is just something of a fluke in the system. We’ll give Ryan the benefit of the doubt for now, and crown him with the title of best infield defender in the minors in 2007.

In the runner-up slot we find Royals third sacker Mario Lisson, who played at High-A Wilmington as a 23-year-old. Lisson is acknowledged as a good defender–as you might expect of a converted shortstop–and he’s on the 40-man roster, since the Royals opted to protect him from the Rule 5 draft.

On the bottom we find Brewers farmhand Mat Gamel, who purportedly has plus arm strength but poor footwork that leads to poor throws. He made an astonishing 53 errors in 113 games in 2007, and since 2005 has committed 91 errors in 229 professional games at third base, good for -34 runs on his career. Can you spell DH?

Finally, we have 19-year-old Angels prospect Matthew Sweeney, who in his first full season in Low-A committed 28 errors in 85 games at third leading to his -17 SFR. Although he has decent arm strength, a move to first base seems to be in his future.