<< Previous Article
Wait 'Til Next Year: W... (04/03)
|
<< Previous Column
Schrodinger's Bat: Rem... (03/27)
|
Next Column >>
Schrodinger's Bat: Def... (04/10)
|
Next Article >>
Prospectus Preview: Th... (04/03)
|
April 3, 2008
Schrodinger's Bat
Reminiscing with SFR, the Sequel
by Dan Fox
"Aw, how could he lose the ball in the sun, he's from Mexico."
--Harry Caray, on a Jorge Orta feat in the field.
Baseball season is finally underway, and that means multiple games every day to choose from, along with a never-ending supply of story lines, intrigue, and just plain excitement from now to October. At the same time, our quest for enlightenment never ceases, so this week we'll take a second look at applying Simple Fielding Runs (SFR) to historical play-by-play data.
Regular readers will recall that last week we applied the algorithms for Simple Fielding Runs (at least for infielders) to the 1988 through 1998 seasons, and spent a few enjoyable moments thinking about the accomplishments of the defenders of yesteryear. I noted that SFR could be run in its original form against that particular data set because the two primary pieces of information it requires, fielder position and hit type (line drive, groundball, fly ball, popup), are relatively intact and appear in the same ratios as in the 2003 through 2007 data, as well as in the minor league data to which SFR has been applied. Sadly, that is not the case for earlier seasons, for which hit-type information is in notably shorter supply. However, the record of which fielder fielded the ball is essentially complete for the vast majority of seasons stretching back to 1957. So, just as the scientists at Jurassic Park filled in the missing dinosaur genetic code with that of a frog, we can adjust the SFR algorithm to do likewise when hit-type data is absent. Let's just hope our results are a little more positive.
To be more precise, two main adjustments were made to SFR this week.
- Pitcher Handedness. I'm sure some readers have wondered why pitcher handedness had not been considered in the basic contextual framework, which already includes batter handedness, hit type, hit value (single, double, etc.), first base occupied, and bunt. My intuition was that it probably didn't add a great deal to what we already knew about batted-ball distribution, but since I was already modifying the software, I threw it into the mix, so you can wonder no longer. I have not yet had a chance, however, to see how its inclusion affects SFR's correlation with UZR or the Plus/Minus system.
- Missing Links. As mentioned above, the algorithm has now been tweaked to account for events for which the hit type is blank. For the most part, the missing data involves singles and doubles fielded by outfielders, although a few infield hits go unrecorded in this way as well. To account for this, the basic idea is to treat a percentage of those unrecorded events as groundballs, since in measuring infield defense we're primarily interested in grounders that get through the infield. To do this, I create a matrix of ground-ball percentages from the seasons for which full data exists, broken down by the various pieces of context mentioned above, and apply those percentages to the aggregated events for which no hit type is recorded. This allows the system to treat an appropriate subset of these events as groundballs, in proportion to the best information we have (data from other seasons). Although I had considered creating a matrix for each pitcher in order to account for ground-ball versus fly-ball tendencies, I did not do so, both because of time constraints and because, in looking at the data, I was uncomfortable with how drastically an individual-pitcher approach would shrink the sample sizes. Another idea that I did not follow through on, but that would likely be effective, is to adjust the aggregate percentages across the full range of context by the ground-ball tendency of the pitcher. In any case, with the unrecorded hit types now safely accounted for, the system can run as normal.
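The missing-link step might look something like the following minimal Python sketch. It is not the actual SFR code. The `Event` record and its field names are hypothetical. The context key uses the pieces of context listed above (batter and pitcher handedness, hit value, first base occupied, bunt), with hit type left out because that is the value being estimated. Adding fielder position to the key is an assumption of this sketch, and so is the `default_pct` fallback for contexts that never occur in the full-data seasons.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical play-by-play record; field names are illustrative only.
    batter_hand: str         # "L" or "R"
    pitcher_hand: str        # "L" or "R" (the newly added context)
    hit_value: int           # 1 = single, 2 = double, ...
    first_occupied: bool     # runner on first base
    bunt: bool
    fielder_pos: int         # assumption: fielder position is part of the key
    hit_type: Optional[str]  # "G", "L", "F", "P", or None when unrecorded


def context_key(e: Event):
    # Every piece of context except hit type, which is what we are estimating.
    return (e.batter_hand, e.pitcher_hand, e.hit_value,
            e.first_occupied, e.bunt, e.fielder_pos)


def build_gb_matrix(full_events):
    """Ground-ball share per context, from seasons where hit type is recorded."""
    counts = defaultdict(lambda: [0, 0])  # key -> [groundballs, recorded events]
    for e in full_events:
        if e.hit_type is None:
            continue  # skip the rare blanks even in "complete" seasons
        c = counts[context_key(e)]
        if e.hit_type == "G":
            c[0] += 1
        c[1] += 1
    return {k: gb / n for k, (gb, n) in counts.items()}


def impute_groundballs(sparse_events, gb_matrix, default_pct):
    """Expected groundballs among unrecorded events, aggregated by context."""
    blanks = Counter(context_key(e) for e in sparse_events if e.hit_type is None)
    # Each context's blank count is scaled by that context's ground-ball share;
    # contexts unseen in the full-data seasons fall back to default_pct.
    return {k: n * gb_matrix.get(k, default_pct) for k, n in blanks.items()}
```

Because the imputed values are fractional, later steps would treat them as expected groundballs rather than individual plays. The per-pitcher refinement mentioned above could, in principle, scale each context's percentage by a pitcher's ground-ball tendency instead of building a separate, much thinner matrix for each pitcher.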