September 8, 2011
Resident Fantasy Genius
The Infield Fly Rules
Yesterday, Jason Collette penned an article about infield flies that ties in with a discussion I’ve been having over the past few weeks with one of my former writers at The Hardball Times Fantasy, Jeff Gross, and one of his readers, Alex Hambrick. Additionally, BP readers JoshC77 and kcshankd wondered in the comments section of Jason’s article whether the ability to induce infield flies was a repeatable skill for pitchers. Today, I thought I’d try to answer that question and present some of the research I’ve conducted in my conversations with Jeff and Alex.

Earlier this year, I wrote an article entitled “When Pitchers' Stats Stabilize,” in which I looked at how “stable” (or how “repeatable,” in terms of being a “skill”) a number of stats were, infield flies among them. In the article, I found that infield fly balls, as a rate of total batted balls, took roughly 0.6 years to “stabilize.” In other words, this initial research suggested that infield flyball rate was indeed a very repeatable skill. But in my discussions with Alex, I’ve come to suspect this may not necessarily be true, or at least not in the way my previous research suggests. In one email, Alex wrote:

“I have known for quite some time that IFFB is a strong function of FB, much like HR is a strong function of FB for pitchers. (The league average IFFB/FB hovers around 10%). In other words, you can, with reasonable accuracy, predict IFFB by simply multiplying FB by .10. (.10*FB correlates to IFFB with an r^2 of .70).”
While we’ve certainly known that there is a correlation between total fly balls and infield fly balls (due in large part to the vertical location and movement of a pitcher’s pitches, forcing batters to put the ball in the air), this presented an interesting question: “Is infield fly rate a skill above and beyond a pitcher’s skill in simply allowing fly balls in general?”

To help answer this question, I decided to pull out my old friend, the split-half correlation (if you’re interested, you can read all about the methodology in the “When Pitchers' Stats Stabilize” article). What I’ve done is run two separate analyses. In the first, I’ve run a split-half correlation using infield flies per contacted ball (henceforth referred to as IF FB%) in both halves. In the second, I put IF FB% in one half and put all flies multiplied by .07 (roughly seven percent of flies are of the infield variety) per contacted ball (henceforth referred to as FB*.07) in the other half. So essentially, I’m first correlating IF FB% with itself and then correlating it with flyball rate. This gives us the following results:
That’s very interesting. While the differences are small, IF FB% actually seems to be less stable than simply using a league-average percentage of infield flies per total flies. Since multiplying by .07 doesn’t affect the correlation (I’ve just included it to better illustrate the point that this is an estimate of infield flies), what this essentially tells us is that flyball percentage predicts infield-fly percentage better than infield-fly percentage predicts itself.

Let’s check out one more way of looking at infield flies. We know that total flies stabilize very quickly, so perhaps infield flies per total flies (henceforth referred to as IF/FB) will prove useful.
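For readers who want to see the mechanics, here is a minimal sketch of the two split-half comparisons described above. It assumes each pitcher's contacted balls are randomly divided into two halves, which may differ from the exact procedure in “When Pitchers' Stats Stabilize,” and it runs on made-up synthetic pitchers, not real data. The helper names (`pearson_r`, `rate`, `split_half`) and the "IF"/"OF"/"OTHER" labels are invented for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rate(balls, kinds):
    """Share of contacted balls whose label is in `kinds`."""
    return sum(b in kinds for b in balls) / len(balls)

def split_half(pitchers, rng):
    """Randomly split each pitcher's contacted balls in two and return
    (IF FB% in half A, IF FB% in half B, FB*.07 in half B)."""
    if_a, if_b, fb07_b = [], [], []
    for balls in pitchers:
        shuffled = balls[:]
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        a, b = shuffled[:mid], shuffled[mid:]
        if_a.append(rate(a, {"IF"}))
        if_b.append(rate(b, {"IF"}))
        # Total flies include infield flies, so count both labels.
        fb07_b.append(0.07 * rate(b, {"IF", "OF"}))
    return if_a, if_b, fb07_b

# Synthetic pitchers with made-up talent levels (NOT real data).
rng = random.Random(2011)
pitchers = []
for _ in range(200):
    fb_rate = rng.uniform(0.25, 0.50)   # all flies per contacted ball
    if_share = rng.uniform(0.05, 0.09)  # infield flies per fly
    balls = []
    for _ in range(400):
        u = rng.random()
        if u < fb_rate * if_share:
            balls.append("IF")
        elif u < fb_rate:
            balls.append("OF")
        else:
            balls.append("OTHER")
    pitchers.append(balls)

if_a, if_b, fb07_b = split_half(pitchers, rng)
r_self = pearson_r(if_a, if_b)      # IF FB% correlated with itself
r_proxy = pearson_r(if_a, fb07_b)   # IF FB% correlated with FB*.07
# Undo the .07 scaling to confirm it leaves the correlation unchanged.
r_unscaled = pearson_r(if_a, [v / 0.07 for v in fb07_b])
print(round(r_self, 3), round(r_proxy, 3), round(r_unscaled, 3))
```

The last two printed values match, which is the point about the .07 multiplier being cosmetic. Finding a stabilization point would mean repeating this at different sample sizes until the correlation reaches 0.50; that step is not shown here.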
Nope. While the percentage of total flies (FB%) stabilizes extremely quickly (as quickly as groundballs do, for what it’s worth), it takes the average pitcher two and a half years for his IF/FB rate to stabilize. That means we can throw IF/FB rate out the window entirely; we’d be far better off using either IF FB% or FB*.07.

We could stop here and have learned something useful, but we can do a little better yet. What these split-half correlation tests give us is the point at which the stat produces an R of 0.50. Using this data, we can create a regression-to-the-mean equation and estimate a player’s true talent level. While the easiest mean to regress to is always the league average, it’s rarely the best. As Alex posited, and as has been known for a while now, flyball rate and infield-fly rate are correlated. As such, we can re-examine this relationship and then use our results as the mean we regress to.
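As a rough illustration of that idea, the sketch below uses the common shrinkage form in which an observation gets half the weight once the sample reaches the point where R = 0.50. The article does not spell out its exact equation, so treat this form as an assumption. The slope and intercept of the flyball-to-infield-fly line are hypothetical placeholders, the 0.6-year stabilization point is borrowed from the earlier article's IF FB% estimate, and the example pitcher is invented.

```python
def regress_to_mean(observed, n, stabilization_n, mean):
    """Shrink an observed rate toward `mean`.

    `stabilization_n` is the sample size at which the split-half R is 0.50,
    so when n == stabilization_n the observation and the mean are weighted
    equally. `n` and `stabilization_n` must share the same unit.
    """
    weight = n / (n + stabilization_n)
    return weight * observed + (1.0 - weight) * mean

def personal_mean(fb_rate, slope, intercept):
    """Expected IF FB% for a pitcher given his flyball rate (linear fit)."""
    return intercept + slope * fb_rate

# HYPOTHETICAL fit coefficients, chosen only for illustration; the real
# values would come from fitting IF FB% against FB% on pitcher seasons.
SLOPE, INTERCEPT = 0.10, -0.01
# Stabilization point in years, assumed from the earlier article's estimate.
STABILIZATION_YEARS = 0.6

# Invented pitcher: 45% flyball rate, 5.0% IF FB% over one season.
fb_rate_2011 = 0.45
if_rate_2011 = 0.050

mean = personal_mean(fb_rate_2011, SLOPE, INTERCEPT)
r_if = regress_to_mean(if_rate_2011, 1.0, STABILIZATION_YEARS, mean)
print(f"personal mean: {mean:.4f}, regressed IF FB%: {r_if:.4f}")
```

Swapping `mean` for a single league-average rate gives the simpler version; the personal mean just moves the target up for flyball pitchers and down for groundball pitchers.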
(The graph of this relationship includes all pitcher seasons with at least 100 innings pitched in a year from 2005 to 2010.) As you can see, the relationship is very strong (r-squared of 0.68). Pitchers who give up a lot of total flies also manage to induce a lot of infield flies. The most extreme flyball pitchers actually manage to convert nearly 10 percent of all contacted balls into popups. So instead of regressing each pitcher to the league-average infield-fly rate, we can regress each pitcher to his own unique rate based upon this relationship.

Now we get to the fun part: applying all of this to actual players. Ideally, we’d use multiple years, incorporate aging, weighting, etc., but I’m just going to do it simply. What I’ve done is use a pitcher’s actual, unregressed 2011 flyball rate to create his personal mean based on the fitted relationship above. I’ve then regressed his 2011 infield-fly rate toward this mean based upon the split-half correlation tests we ran at the beginning of the article. When I do this and focus on all pitchers who made at least 10 starts this season, here are the pitchers with the best infield fly ball “true talent” levels (rIF%):
That’s an interesting name at the top of the list. When we think about top pitchers, Guillermo Moscoso isn’t the first guy that comes to mind. While his strikeout and walk rates are underwhelming, he’s an extreme flyball pitcher, which will allow him to induce a lot of popups. There are some other not-so-terrific players on this list (Collmenter, Matusz, Tillman, Cecil, Hughes), but since they all have posted high fly rates, they’ll at least be useful in terms of inducing popups and keeping a lower-than-normal BABIP. Jered Weaver, sort of the poster boy for using infield flies to beat his FIP, ranks second on the list, and longtime popup artist Clayton Kershaw also ranks highly. Now let’s take a look at our trailers:
Extreme groundball pitchers rule this list. Notorious sinkerballer Derek Lowe trails everyone in terms of regressed infield-fly rate, followed by 2011 breakout pitcher Charlie Morton. Romero and Greinke have been excellent this season in terms of strikeouts and walks, but as groundball pitchers, they shouldn’t be expected to induce many popups. They remain elite pitchers, though, since these kinds of pitchers can afford to give up a few more hits as they allow fewer homers.

39 comments have been left for this article.

studes (280) Apologies for not knowing the answer to this, but how is infield fly defined? I believe you use the MLBAM version, which is the same as PITCHf/x? Is that right? How do they define it? Sep 08, 2011 14:16 PM

Yes, Dave, I used the MLBAM classifications, which are what's displayed in the PITCHf/x feeds. Subjectively is how they're defined :) Sep 08, 2011 15:00 PM

studes (280) Thanks, Derek. That's what I thought. One observation is that BIS is more rigorous about identifying infield flies. It doesn't depend on who caught the ball; it's based on where the ball lands. I know Colin has issues with how well they measure it, but I'd have more faith in the BIS classifications being meaningful than MLBAM's. Sep 08, 2011 15:12 PM

I have not, generally speaking, been able to see a quality difference between BIS and MLBAM data. I would say that for popup data, too, to the extent that I've compared the two in that. I would also say that the MLBAM definition makes a lot more baseball sense to me than the BIS definition, even if the BIS definition is somewhat more carefully applied (and I wouldn't actually assume that it is). Sep 08, 2011 16:49 PM

studes (280) Why would you say that, Mike? Cataloging flies according to the person who caught them is dependent on the range of the infielders in question. Why would that make more "baseball sense" than where the ball actually lands?
Sep 08, 2011 18:17 PM

Because a little bloop or a sky-high, well, I don't know what you'd call it besides a popup, that happens to land or be caught a few feet on the outfield grass does not seem to me to be like a high can of corn out to an outfielder. It seems much more like a little bloop or a sky-high popup that is caught on the infield dirt. Sep 08, 2011 19:52 PM

I should clarify that my understanding is that the BIS definition is that anything on the outfield grass is an outfield fly (or line drive, of course). If that's not right, please correct me. Sep 08, 2011 19:58 PM

studes (280) I'm confused. Derek said that any fly caught by an infielder is an infield fly. You're saying that MLBAM uses distance instead? That's exactly what BIS does too (and they don't just use the infield parameters). Sep 09, 2011 04:11 AM

studes (280) That is, they don't use "outfield grass," which obviously can't be used in some ballparks anyway. Sep 09, 2011 04:20 AM

I don't have a written guideline for either MLBAM or BIS. What I have is my observation of the data, which is from a fairly large pool for MLBAM (several seasons) and a smaller pool for BIS (partial season). Sep 09, 2011 06:22 AM

studes (280) You know, I think THT started this whole infield fly thing when we requested infield fly data from BIS back in 2005. We thought it was a new angle that would be interesting. Guess that's why I have such a strong interest in it. Sep 09, 2011 07:09 AM

studes (280) Quick question #2: other studies have found that IF/FB goes up as FB% goes up. It appears that your graph might support that theory too. Any thoughts? Sep 08, 2011 14:20 PM

Yes, I'd agree with that. I ran those numbers when putting the article together too but they didn't make it in. The relationship was weak, but it was there, at least in whatever cursory tests I ran. 0.05 r-squared, significant at the 1% level IIRC.
I think at the extremes, it ended up being something like the most extreme flyball pitchers would be expected to post an 8.5% IF/FB and the most extreme non-flyball pitchers would be 5.5% (league average is about 7%). Sep 08, 2011 15:01 PM

Actually, I think I'm remembering something different there. The r-squared was actually 0.21, super significant p-value. Sep 08, 2011 15:10 PM

studes (280) Wow. That is significant. I'm having problems, then, reconciling that fact with the notion of just using straight flyball rates as a proxy for infield fly rates. Wouldn't the rate (the 7% you used in your first table) go up as the overall flyball rate goes up? Sep 08, 2011 15:20 PM

Well, yes, it's certainly significant. But I think it comes down to what's *more* significant. If I run a correlation using the same set of pitchers (I should have mentioned before it's all pitchers from 2005-2011 with 100 IP in a season) but using IF/CON and FB/CON, the r-squared is 0.68, which is much more significant. It's not as if we can't predict infield flies using IF/FB, it's just that there are better ways to do it. I mean, 2.5 years isn't insignificant. HR/FB is close to 10 years, for the sake of comparison. It's just not optimal. Sep 08, 2011 15:44 PM

studes (280) I guess when you throw off r-squareds like that, I don't know what you're saying. I often have that problem with BPro articles (and some of my own too!). What is the point of that correlation? Sep 08, 2011 18:20 PM

Well, of course it depends on what you want to get out of the data. I think we have two different goals. As such, either way is viable, but if we're looking for accuracy (what I'm looking for and what really matters for fantasy), then one seems to be preferable. Sep 08, 2011 19:14 PM

studes (280) I get confused when people start flipping off r-squareds as to what is being correlated with what, and how that relates to other correlations. Sep 08, 2011 19:40 PM

Sorry, I should clarify. I wasn't referring to the split-half correlation.
The 0.21 comes from IF/FB and FB%, which you originally asked about. The 0.68 comes from IF% and FB%, which I advocated in my article, but which I ran a separate correlation for in the comments so that it could be compared to the first one I did in response to your comment. Sep 08, 2011 19:44 PM

studes (280) Got it. Thanks. It makes sense to me that IF% and FB% would be correlated, since IFs are included in FBs. At least, I think it makes sense. :) Sep 08, 2011 19:48 PM

Yes, that's probably a big reason for the correlation (though the IF/FB-to-FB% connection plays a part too, plus any additional skill that may be getting captured). Sep 08, 2011 19:52 PM

studes (280) Okay, I think I see. By using a linear formula that isn't based on zero, you're approximating what might be a curvilinear relationship. Sep 09, 2011 04:27 AM

studes (280) By the way, I am just totally confused about these points. What does HR/FB have to do with things? Are you referring to Colin's previous analysis, which I also didn't get? Sep 08, 2011 18:32 PM

Dave, what confuses you? Sep 08, 2011 19:16 PM

studes (280) Last comment, I promise. When you say this... Sep 08, 2011 18:41 PM

Maybe we're referring to different things here. Now I'm confused. The graph in the article shows the relationship between IF% and FB%, which is what I thought you were talking about. League average IF% is roughly 7%, and as FB% goes up, so does IF%. "The graph shows that relationship." Sep 08, 2011 19:19 PM

studes (280) I'm going to attempt to recap what I think I found here. Sorry for taking up the space, but I'd be interested if you think I got it right. Sep 09, 2011 08:13 AM

studes (280) I need an editor. This... Sep 09, 2011 08:17 AM

studes (280) BTW, now that I've wasted your afternoon with many comments, I should say nice job, Derek. Regressing against pitchers with similar flyball rates is an elegant solution. Sep 09, 2011 10:52 AM

I think this series is great but also misses the point. Are we not trying to identify those pitchers that xFIP will routinely underestimate/overestimate due to a unique skill: the ability to minimize flyball distance? Or, pitchers whose skill is not accurately reflected in the amount of flyballs they cede but in the distance of those flyballs. Infield fly balls are an aid in measuring the distance of those flyballs, and therefore what is really relevant is the IFF/FB (just looking at the list, the better pitchers have better IFF/FB rates than just IFF rates).
But as you showed, that alone does not have much predictive value. Would it not be more effective if you combined it with HR/FB? A pitcher with a low HR/FB and high IFF/FB should indicate a pitcher with a skill for minimizing flyball distance, and therefore one likely to be underestimated by xFIP. Whereas a pitcher with a high HR/FB and low IFF/FB should indicate a pitcher with the reverse skill, and therefore one likely to be overestimated by xFIP. And for the pitchers in between, FB% is good enough.
I'm not sure the ability to induce popups and the "ability to minimize flyball distance" are the same thing. Popups are valuable in and of themselves because they become outs something like 98-99% of the time. Minimizing flyball distance (if it indeed is a skill pitchers can possess to any significant degree) would help in preventing home runs.
"just looking at the list, the better pitchers have better IFF/FB rates than just IFF rates." I can check this. Since 2003, pitchers with xFIPs better than 3.5 have an aggregate 10.3% IF/FB. Everyone else is 10.4%.
"A pitcher with a low HR/FB and high IFF/FB should indicate a pitcher with a skill for minimizing flyball distance therefore to be underestimated by xFIP." I agree in principle that minimizing the distance of fly balls would be an important skill for a pitcher (if it's something they can control to any reasonable degree), but the problem here is that HR/FB is highly unstable. Over a long period of time it will be able to serve as a fair proxy for fly ball distance, but for the kind of samples we usually deal with, it just isn't going to be useful for that purpose. My previous article showed that it takes 10 years' worth of data before we can account for just half of the variation in the stat. And that ignores issues of weighting and aging, which are huge over a period that long.
I can, however, see your reasoning that a pitcher who induces a lot of infield flies may also induce shorter outfield flies and, therefore, allow fewer homers. As one way of checking this, if I look at pitchers since 2003 with an IF/FB greater than 15%, they post a HR/OF of 11.7 percent. League average over that time has been 11.5 percent. A good thought, but a cursory glance shows there to be no relationship.