October 11, 2012
Is Joe Saunders a Double Play Machine?
On last Friday's episode of Effectively Wild, the daily podcast from Baseball Prospectus, our own Ben Lindbergh and Sam Miller (and guest Marc Normandin) discussed Buck Showalter's decision to start Joe Saunders in the AL wild card play-in game against the Texas Rangers. They noted that Saunders—who'll get the call again tonight in Game Four of the Yankees-Orioles ALDS—does not have amazing stuff and allows a lot of runners to reach base, and also that he does not have an exceedingly high groundball rate. Still, he seems to induce more groundballs at opportune times, and as a result, he gets a lot of double plays to bail him out of some major jams. Perhaps Saunders changes his approach with a runner on first and no one out in an intentional bid to get a groundball. It would make complete sense that he would do so.
Sure enough, Saunders pitched 5 2/3 innings and induced three twin-killings on the way to an Orioles win. Ben and Sam (and Marc) are the smartest human beings on the face of the earth.
But maybe this is just another case of selective memory and a fortunate example. Do pitchers really induce more grounders in double-play situations? The issue was preliminarily addressed by James Gentile over at Beyond the Boxscore last month, and he found that there was little evidence to suggest that groundball rates increased when the fields were ripe for a double play. I decided to take a second look.
Warning! Gory Mathematical Details Ahead!
I started with my trusty 2008-2011 Retrosheet database. I coded all plate appearances for whether they ended in a groundball. I also coded all balls in play for whether they were groundballs (this will become important in a moment). Finally, I coded each plate appearance for whether it represented a potential double-play situation: that is, a runner on first and one or zero outs. Only plate appearances in which a pitcher who faced at least 250 batters opposed a batter who had at least 250 PA were welcome. Pitchers batting were kindly excused.
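For readers who want to follow along, here is a minimal sketch of that coding step in Python. It is not the code I actually ran. The field names (`pitcher`, `batter`, `batter_is_pitcher`, `runner_on_first`, `outs`, `batted_ball`) are hypothetical stand-ins rather than real Retrosheet column names, and the sketch assumes the 250 thresholds are counted over whatever rows you hand it.

```python
from collections import Counter


def is_dp_situation(runner_on_first, outs):
    """A potential double-play situation: runner on first, zero or one out."""
    return bool(runner_on_first) and outs in (0, 1)


def code_plate_appearances(pas, min_pa=250):
    """Filter plate appearances and attach the three 0/1 flags.

    `pas` is a list of dicts with hypothetical keys:
      pitcher, batter      -- player identifiers
      batter_is_pitcher    -- True when a pitcher is batting
      runner_on_first      -- True when first base is occupied
      outs                 -- outs before the plate appearance (0, 1, or 2)
      batted_ball          -- "G" for a groundball, another code for other
                              balls in play, None when no ball was put in play
    """
    # Assumption: thresholds are counted over all rows supplied.
    pitcher_bf = Counter(pa["pitcher"] for pa in pas)
    batter_pa = Counter(pa["batter"] for pa in pas)

    coded = []
    for pa in pas:
        if pa["batter_is_pitcher"]:
            continue  # pitchers batting are excused
        if pitcher_bf[pa["pitcher"]] < min_pa:
            continue
        if batter_pa[pa["batter"]] < min_pa:
            continue
        coded.append({
            **pa,
            "gb": int(pa["batted_ball"] == "G"),
            "in_play": int(pa["batted_ball"] is not None),
            "dp_situation": int(is_dp_situation(pa["runner_on_first"], pa["outs"])),
        })
    return coded
```

Keeping `gb` and `in_play` as separate flags lets the same rows serve both questions: groundballs per plate appearance and groundballs per ball in play.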
This set up perfectly for a binary logistic regression, using a framework I've used elsewhere. I calculated the batter's groundball percentage, the pitcher's groundball percentage, and the league groundball percentage. All percentages were converted into odds ratios (that is, rate / (1 - rate)). Through the formula (batter OR * pitcher OR / league OR), one can determine an "expected" rate for a groundball to happen given this batter/pitcher matchup. I converted the resulting expected odds ratio into a logged odds ratio, primarily because I was about to shove it into a binary logistic regression, which works on logged odds ratios anyway. This serves as our control for batter and pitcher matchup effects. If an event is randomly distributed (i.e., the result is, Strat-O-Matic style, simply a function of the average probabilities of the batter and pitcher), then other variables will have no predictive power once this control is in the model.
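The arithmetic behind that control variable is short enough to sketch in Python. The function names here are my own for illustration, and the rates in the usage note are made up, not taken from the data.

```python
import math


def odds(rate):
    """Convert a rate strictly between 0 and 1 into odds: rate / (1 - rate)."""
    return rate / (1.0 - rate)


def expected_log_odds(batter_rate, pitcher_rate, league_rate):
    """Expected groundball log odds for one batter/pitcher matchup.

    Implements (batter OR * pitcher OR / league OR), then takes the
    natural log so the value can be used as a regression predictor.
    """
    expected_odds = odds(batter_rate) * odds(pitcher_rate) / odds(league_rate)
    return math.log(expected_odds)


def log_odds_to_rate(log_odds):
    """Invert the transformation to get back an expected rate."""
    return 1.0 / (1.0 + math.exp(-log_odds))
```

A quick sanity check: if the batter and pitcher both match the league rate, say `expected_log_odds(0.45, 0.45, 0.45)`, converting the result back with `log_odds_to_rate` returns the league rate. A groundball-heavy batter facing a groundball-heavy pitcher comes out above either player's individual rate, which is the behavior you want from a matchup control.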