
Yesterday, Jason Collette penned an article about infield flies that ties in with a discussion I’ve been having over the past few weeks with one of my former writers at The Hardball Times Fantasy, Jeff Gross, and one of his readers, Alex Hambrick. Additionally, BP readers JoshC77 and kcshankd wondered in the comments section of Jason’s article whether the ability to induce infield flies was a repeatable skill for pitchers. Today, I thought I’d try to answer that question and present some of the research I’ve conducted in my conversations with Jeff and Alex.

Earlier this year, I wrote an article entitled “When Pitchers' Stats Stabilize,” in which I looked at how “stable” (or how “repeatable,” in terms of being a “skill”) a number of stats were—infield flies among them. In the article, I found that infield fly balls, as a rate of total batted balls, took roughly 0.6 years to “stabilize.” In other words, this initial research suggested that infield fly-ball rate was indeed a very repeatable skill. But in my discussions with Alex, I’ve come to suspect this may not necessarily be true—or at least not in the way my previous research suggests.

In one e-mail, Alex wrote:

I have known for quite some time that IFFB is a strong function of FB, much like HR is a strong function of FB for pitchers. (The league average IFFB/FB hovers around 10%). In other words, you can, with reasonable accuracy, predict IFFB by simply multiplying FB by .10. (.10*FB correlates to IFFB with an r^2 of .70).

While we’ve certainly known that there is a correlation between total fly balls and infield fly balls (due in large part to the vertical location and movement of a pitcher’s pitches, forcing batters to put the ball in the air), this presented an interesting question: “Is infield fly rate a skill above and beyond a pitcher’s skill in simply allowing fly balls in general?”

To help answer this question, I decided to pull out my old friend, the split-half correlation (if you’re interested, you can read all about the methodology in the “When Pitchers' Stats Stabilize” article). What I’ve done is run two separate analyses. In the first, I’ve run a split-half correlation using infield flies per contacted balls (henceforth referred to as IF FB%) in both halves. In the second, I put IF FB% in one half and put all flies multiplied by .07 (roughly seven percent of flies are of the infield variety) per contacted balls (henceforth referred to as FB*.07) in the other half. So essentially, I’m first correlating IF FB% with itself and then correlating it with fly-ball rate. This gives us the following results:

| Stat | Denominator | Correlated With | Stabilizes (contacted balls) | Years |
|---|---|---|---|---|
| IF FB | GB+OF+IF+LD | IF FB% | 288 | 0.6 |
| FB*.07 | GB+OF+IF+LD | IF FB% | 216 | 0.5 |

That’s very interesting. While the differences are small, IF FB% actually seems to be less stable than simply using a league-average percentage of infield flies per total flies. Since multiplying by .07 doesn’t affect the correlation (I’ve included it only to better illustrate that this is an estimate of infield flies), what this essentially tells us is that fly-ball percentage predicts infield-fly percentage better than infield-fly percentage predicts itself.
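For readers who want to see the mechanics, here is a minimal Python sketch of a split-half correlation. It runs on simulated pitchers, not the real data behind the table; the talent spread, the 20 percent infield-fly share of flies, and all function names are assumptions made purely for illustration. It also checks the point about the .07 multiplier: rescaling one variable leaves the correlation unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_pitcher(n_balls, fb_rate, if_per_fb):
    """Simulate one pitcher's batted balls (hypothetical data, not real results).

    Returns two boolean arrays with one entry per batted ball:
    whether it was a fly ball, and whether it was an infield fly.
    """
    is_fly = rng.random(n_balls) < fb_rate
    is_infield_fly = is_fly & (rng.random(n_balls) < if_per_fb)
    return is_fly, is_infield_fly

def split_half_rates(flags):
    """Split batted balls into alternating halves and return the rate in each."""
    return flags[0::2].mean(), flags[1::2].mean()

# Assumed inputs: 300 pitchers, 288 batted balls each, made-up talent spread.
n_pitchers, n_balls = 300, 288
fb_talent = rng.uniform(0.25, 0.55, n_pitchers)

if_h1, if_h2, fb_h2 = [], [], []
for fb_rate in fb_talent:
    is_fly, is_if = simulate_pitcher(n_balls, fb_rate, if_per_fb=0.20)
    first, second = split_half_rates(is_if)
    _, fb_second = split_half_rates(is_fly)
    if_h1.append(first)
    if_h2.append(second)
    fb_h2.append(fb_second)

if_h1, if_h2, fb_h2 = map(np.array, (if_h1, if_h2, fb_h2))

# Analysis 1: infield-fly rate in one half vs. itself in the other half.
r_self = np.corrcoef(if_h1, if_h2)[0, 1]
# Analysis 2: infield-fly rate in one half vs. fly-ball rate in the other half.
r_fb = np.corrcoef(if_h1, fb_h2)[0, 1]
# Multiplying by a positive constant does not change a Pearson correlation.
r_fb_scaled = np.corrcoef(if_h1, 0.07 * fb_h2)[0, 1]

print(f"IF% vs IF%:     r = {r_self:.3f}")
print(f"IF% vs FB%:     r = {r_fb:.3f}")
print(f"IF% vs .07*FB%: r = {r_fb_scaled:.3f}")
```

The printed values depend entirely on the simulated inputs and should not be read as a reproduction of the table. In the actual analysis, the exercise would be repeated at different sample sizes to find where the correlation reaches the stabilization threshold.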

Let’s check out one more way of looking at infield flies. We know that total flies stabilize very quickly, so perhaps infield flies per total flies (henceforth referred to as IF/FB) will prove useful.

| Stat | Denominator | Stabilizes (denominator events) | Years |
|---|---|---|---|
| OF+IF | GB+OF+IF+LD | 109 | 0.2 |
| IF FB | OF+IF | 414 | 2.5 |

Nope. While the percentage of total flies (FB%) stabilizes extremely quickly (as quickly as groundballs do, for what it’s worth), it takes the average pitcher two and a half years for his IF/FB rate to stabilize. That means we can throw IF/FB rate out the window entirely; we’d be far better off using either IF FB% or FB*.07.

We could stop here and have learned something useful, but we can do a little better yet. What these split-half correlation tests give us is the point at which the stat produces an R of 0.50. Using this data, we can create a regression to the mean equation and estimate a player’s true talent level. While the easiest mean to regress to is always the league average, it’s rarely the best. As Alex posited, and as has been known for a while now, fly-ball rate and infield-fly rate are correlated. As such, we can reexamine this relationship and then use our results as the mean we regress to.
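One common way to turn a stabilization point into a regression-to-the-mean estimate is to weight the observed rate by n / (n + k), where n is the sample size and k is the sample size at which the split-half correlation reaches 0.50. The sketch below assumes that formulation; the function name and the example inputs are hypothetical, and it is not necessarily the exact equation used for the tables in this article.

```python
def regress_to_mean(observed_rate, sample_size, stabilization_point, mean_rate):
    """Shrink an observed rate toward a mean.

    The observed rate gets weight n / (n + k), so at n == k (the point
    where the split-half correlation is 0.50) the estimate sits exactly
    halfway between the observed rate and the mean.
    """
    weight = sample_size / (sample_size + stabilization_point)
    return weight * observed_rate + (1 - weight) * mean_rate


# Hypothetical pitcher: 10% infield-fly rate over 288 contacted balls,
# regressed toward a 7% mean using the 288-ball stabilization point.
estimate = regress_to_mean(0.10, 288, 288, 0.07)
print(f"{estimate:.3f}")  # halfway between 0.10 and 0.07, about 0.085
```

With fewer contacted balls the estimate moves closer to the mean, and with more it moves closer to the observed rate, which is why the choice of mean matters most for small samples.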

(The above graph includes all pitcher seasons with at least 100 innings pitched in a year from 2005 to 2010)

As you can see, the relationship is very strong (r-squared of 0.68). Pitchers who give up a lot of total flies also manage to induce a lot of infield flies. The most extreme fly-ball pitchers actually manage to convert nearly 10 percent of all contacted balls into popups. So instead of regressing each pitcher to the league-average infield-fly rate, we can regress each pitcher to his own expected infield-fly rate based upon this relationship.
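To make the idea of a pitcher-specific mean concrete, here is a small Python sketch. The fitted equation from the graph is not reproduced in the text, so the line below is a stand-in built from two approximate points the author gives in the comment thread (a 15% FB% maps to about a 2% IF%, and a 60% FB% to about a 15% IF%). The constants and the function name are assumptions for illustration, not the actual regression coefficients.

```python
# Two approximate points on the FB% -> IF% line, taken from the author's
# description in the comments. These are NOT the fitted coefficients.
FB_LOW, IF_LOW = 0.15, 0.02
FB_HIGH, IF_HIGH = 0.60, 0.15

# Slope and intercept of the straight line through those two points.
SLOPE = (IF_HIGH - IF_LOW) / (FB_HIGH - FB_LOW)
INTERCEPT = IF_LOW - SLOPE * FB_LOW


def personal_if_mean(fb_rate):
    """Expected infield-fly rate (per contacted ball) for a given fly-ball rate."""
    return INTERCEPT + SLOPE * fb_rate


# Hypothetical fly-ball rates spanning ground-ball to fly-ball pitchers.
for fb_rate in (0.25, 0.40, 0.55):
    print(f"FB% {fb_rate:.0%} -> expected IF% {personal_if_mean(fb_rate):.1%}")
```

A value computed this way would replace the league-average rate as the mean in the regression step, so an extreme fly-ball pitcher is pulled toward a higher popup rate than a sinkerballer is.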

Now we get to the fun part: applying all of this to actual players. Ideally, we’d use multiple years, incorporate aging, weighting, etc., but I’m just going to do it simply. What I’ve done is use a pitcher’s actual, unregressed 2011 fly-ball rate to create his personal mean based on the formula above. I’ve then regressed his 2011 infield-fly rate toward this mean based upon the split-half correlation tests we ran at the beginning of the article. When I do this and focus on all pitchers who made at least 10 starts this season, here are the pitchers with the best infield fly ball “true talent” levels (rIF%):

| Pitcher | FB% | rIF% |
|---|---|---|
| Guillermo Moscoso | 58% | 15.0% |
| Jered Weaver | 50% | 14.8% |
| Jeremy Hellickson | 47% | 13.2% |
| Josh Collmenter | 48% | 12.5% |
| Brian Matusz | 54% | 12.4% |
| Alexi Ogando | 44% | 12.1% |
| Rich Harden | 49% | 11.8% |
| Brandon Beachy | 48% | 11.6% |
| Ted Lilly | 46% | 11.3% |
| Colby Lewis | 50% | 11.2% |
| Scott Baker | 46% | 11.0% |
| Michael Pineda | 44% | 11.0% |
| Chris Tillman | 43% | 10.9% |
| Shaun Marcum | 44% | 10.6% |
| Brett Cecil | 46% | 10.5% |
| Clayton Kershaw | 39% | 10.3% |
| Phil Hughes | 43% | 10.2% |

That’s an interesting name at the top of the list. When we think about top pitchers, Guillermo Moscoso isn’t the first guy that comes to mind. While his strikeout and walk rates are underwhelming, he’s an extreme fly-ball pitcher, which will allow him to induce a lot of popups. There are some other not-so-terrific players on this list (Collmenter, Matusz, Tillman, Cecil, Hughes), but since they all have posted high fly rates, they’ll at least be useful in terms of inducing popups and keeping a lower-than-normal BABIP. Jered Weaver—sort of the poster boy for using infield flies to beat his FIP—ranks second on the list, and long-time pop-up artist Clayton Kershaw also ranks highly.

Now let’s take a look at our trailers:

| Pitcher | FB% | rIF% |
|---|---|---|
| Derek Lowe | 22% | 2.7% |
| Charlie Morton | 19% | 3.2% |
| Chris Volstad | 26% | 3.5% |
| Jake Westbrook | 22% | 3.6% |
| Trevor Cahill | 25% | 3.7% |
| Jason Marquis | 25% | 3.9% |
| Zach Britton | 28% | 4.1% |
| Ivan Nova | 28% | 4.1% |
| John Lannan | 24% | 4.2% |
| Dontrelle Willis | 24% | 4.2% |
| Ricky Romero | 30% | 4.3% |
| Zack Greinke | 29% | 4.3% |
| Fausto Carmona | 24% | 4.3% |
| Nick Blackburn | 27% | 4.3% |
| Edinson Volquez | 29% | 4.5% |
| Jaime Garcia | 26% | 4.6% |
| Aaron Cook | 24% | 4.7% |
| Joel Pineiro | 32% | 4.7% |

Extreme ground-ball pitchers rule this list. Notorious sinkerballer Derek Lowe trails everyone in terms of regressed infield-fly rate, followed by 2011 breakout pitcher Charlie Morton. Romero and Greinke have been excellent this season in terms of strikeouts and walks, but as ground-ball pitchers, they shouldn’t be expected to induce many popups. They remain elite pitchers, though, since these kinds of pitchers can afford to give up a few more hits as they allow fewer homers.  

nicholj
9/08
I think this series is great but also misses the point. Are we not trying to identify those pitchers that xFIP will routinely underestimate/overestimate due to a unique skill - ability to minimize flyball distance? Or, pitchers whose skill is not accurately reflected in the amount of flyballs they cede but the distance of those flyballs. Infield Fly balls is an aid in measuring the distance of those flyballs and therefore what is really relevant is the IFF/FB (just looking at the list, the better pitchers have better IFF/FB rates than just IFF rates).

But as you showed, that alone does not have much predictive value. Would it not be more effective if you combined it with HR/FB? A pitcher with a low HR/FB and high IFF/FB should indicate a pitcher with a skill for minimizing flyball distance, and therefore one likely to be underestimated by xFIP. Whereas a pitcher with a high HR/FB and low IFF/FB should indicate a pitcher with the reverse skill and therefore likely to be overestimated by xFIP. And for the pitchers in between, FB% is good enough.
derekcarty
9/08
I'm not sure the ability to induce popups and the "ability to minimize flyball distance" are the same thing. Popups are valuable in and of themselves because they become outs something like 98-99% of the time. Minimizing flyball distance (if it indeed is a skill pitchers can possess to any significant degree) would help in preventing home runs.

"just looking at the list, the better pitchers have better IFF/FB rates than just IFF rates." I can check this. Since 2003, pitchers with xFIPs better than 3.5 have an aggregate 10.3% IF/FB. Everyone else is 10.4%.

"A pitcher with a low HR/FB and high IFF/FB should indicate a pitcher with a skill for minimizing flyball distance therefore to be underestimated by xFIP." I agree in principle that minimizing the distance of fly balls would be an important skill for a pitcher (if it's something they can control to any reasonable degree), but the problem here is that HR/FB is highly unstable. Over a long period of time it will be able to serve as a fair proxy for fly ball distance, but for the kind of samples we usually deal with, it just isn't going to be useful for that purpose. My previous article showed that it takes 10 years worth of data before we can account for just half of the variation in the stat. And that ignores issues of weighting and aging which are huge over a period that long.

I can, however, see your reasoning that a pitcher who induces a lot of infield flies may also induce shorter outfield flies and, therefore, allow fewer homers. As one way of checking this, if I look at pitchers since 2003 with an IF/FB greater than 15%, they post a HR/OF of 11.7 percent. League average over that time has been 11.5 percent. A good thought, but a cursory glance shows there to be no relationship.
studes
9/08
Apologies for not knowing the answer to this, but how is infield fly defined? I believe you use the MLBAM version, which is the same as Pitchf/x? Is that right? How do they define it?
derekcarty
9/08
Yes, Dave, I used the MLBAM classifications, which are what's displayed in the PITCHf/x feeds. Subjectively is how they're defined :)

Actually, I wasn't exactly sure, so I asked Mike Fast. He wasn't 100% sure either, but said that any ball that was classified as a pop-up and caught was fielded by an infielder. So basically, for any ball fielded by an infielder, it becomes a subjective judgment about how high the ball went. There were a few pop-up singles to outfielders, though. So I guess we're not entirely sure what directions, if any, stringers are given. Maybe a good question for someone at MLBAM, like a Cory Schwartz. Or a stringer.
studes
9/08
Thanks, Derek. That's what I thought. One observation is that BIS is more rigorous about identifying infield flies. It doesn't depend who caught the ball--it's based on where the ball lands. I know Colin has issues with how well they measure it, but I'd have more faith in the BIS classifications being meaningful than MLBAM's.

Don't know if that would affect your analysis at all--just an observation.
mikefast
9/08
I have not, generally speaking, been able to see a quality difference between BIS and MLBAM data. I would say that for popup data, too, to the extent that I've compared the two in that. I would also say that the MLBAM definition makes a lot more baseball sense to me than the BIS definition, even if the BIS definition is somewhat more carefully applied (and I wouldn't actually assume that it is).

studes
9/09
Why would you say that, Mike? Cataloging flies according to the person who caught them is dependent on the range of the infielders in question. Why would that make more "baseball sense" than where the ball actually lands?
mikefast
9/09
Because a little bloop or a sky-high, well, I don't know what you'd call it besides a pop-up, that happens to land or be caught a few feet on the outfield grass does not seem to me to be like a high can of corn out to an outfielder. It seems much more like a little bloop or a sky-high pop up that is caught on the infield dirt.

The MLBAM definition is not really about who caught it, it's about distance from the plate, and 160-180 feet or so from the plate makes more sense to me as a popup boundary than the varying 127.6-155.5 feet boundary that BIS uses.

Ultimately, of course, I think a lot of things about the GB-LD-FB-PU division are screwy, but the MLBAM definition just seems a little less screwy here than the BIS definition.
mikefast
9/09
I should clarify that my understanding is that the BIS definition is that anything on the outfield grass is an outfield fly (or line drive, of course). If that's not right, please correct me.
studes
9/09
I'm confused. Derek said that any fly caught by an infielder is an infield fly. You're saying that MLBAM uses distance instead? That's exactly what BIS does too (and they don't just use the infield parameters).
studes
9/09
That is, they don't use "outfield grass," which obviously can't be used in some ballparks anyway.
mikefast
9/09
I don't have a written guideline for either MLBAM or BIS. What I have is my observation of the data, which is from a fairly large pool for MLBAM (several seasons) and a smaller pool for BIS (partial season).

What I see is that, leaving line drives aside here, MLBAM basically codes anything in the air that is of a depth that could be reasonably caught by an infielder at some position to be a popup. It's not about whether the infielder actually catches it or lets it fall (ideally). My impression is that MLBAM drew a line that was basically along the boundary where an outfielder racing in and an infielder racing back would meet.

On the other hand, I have seen balls just on the outfield grass be coded by BIS as outfield flies. Because I have much less BIS data to go on, I don't have as firm of an impression as to where or how they drew the boundary. I had thought it was at the edge of the outfield grass (or the equivalent line painted on the turf), but you seem to indicate that's not the case, so I don't know.
studes
9/09
You know, I think THT started this whole infield fly thing when we requested infield fly data from BIS back in 2005. We thought it was a new angle that would be interesting. Guess that's why I have such a strong interest in it.

They mark infield flies based on distance from home plate. The latest I heard is 140 to 150 feet.
studes
9/08
Quick question #2: other studies have found that IF/FB goes up as FB% goes up. It appears that your graph might support that theory too. Any thoughts?
derekcarty
9/08
Yes, I'd agree with that. I ran those numbers when putting the article together too but they didn't make it in. The relationship was weak, but it was there, at least in whatever cursory tests I ran. 0.05 r-squared, significant at the 1% level IIRC. I think at the extremes, it ended up being something like the most extreme fly ball pitchers would be expected to post an 8.5% IF/FB and the most extreme non-fly ball pitchers would be 5.5% (league average is about 7%).
derekcarty
9/08
Actually, I think I'm remembering something different there. The r-squared was actually 0.21, super significant p-value.

League average is about 21%, the most extreme are 28%, the less extreme are 15%.
studes
9/08
Wow. That is significant. I'm having problems, then, reconciling that fact with the notion of just using straight flyball rates as a proxy for infield fly rates. Wouldn't the rate (the 7% you used in your first table) go up as the overall flyball rate goes up?

I have to admit that I'm still a fan of IF/OF, though I can't adequately explain why. It feels like it captures a nuance that IF/Contact misses.
derekcarty
9/08
Well, yes, it's certainly significant. But I think it comes down to what's *more* significant. If I run a correlation using the same set of pitchers (I should have mentioned before it's all pitchers from 2005-2011 with 100 IP in a season) but using IF/CON and FB/CON, the r-squared is 0.68 - much more significant. It's not as if we can't predict infield flies using IF/FB, it's just that there are better ways to do it. I mean, 2.5 years isn't insignificant. HR/FB is close to 10 years, for the sake of comparison. It's just not optimal.

"Wouldn't the rate (the 7% you used in your first table) go up as the overall flyball rate goes up." Yes, it does go up. The graph shows that relationship. Those with a 15% FB% will have about a 2% IF% and those with a 60% FB% will have about a 15% IF%.
studes
9/09
I guess when you throw off r-squareds like that, I don't know what you're saying. I often have that problem with BPro articles (and some of my own too!). What is the point of that correlation?

Also, is correlating something the only point? How about having a better handle on what makes pitchers unique and interesting?
derekcarty
9/09
Well, of course it depends on what you want to get out of the data. I think we have two different goals. As such, either way is viable, but if we're looking for accuracy (what I'm looking for and what really matters for fantasy), then one seems to be preferable.

Not sure where the r-squared confusion comes from. If we're testing for accuracy, the r-squareds provide evidence that one way is more accurate than the other.
studes
9/09
I get confused when people start flipping off r-squareds as to what is being correlated with what, and how that relates to other correlations.

For instance, you say that your r-squared of .68 is "much more significant." But than what? The .21? But those were two different regressions--one regressing IF/FB rate vs FB/Contact rate and the other comparing the IF/Contact rate in two halves. Why compare them?
derekcarty
9/09
Sorry, I should clarify. I wasn't referring to the split-half correlation. The 0.21 comes from IF/FB and FB%, which you originally asked about. The 0.68 comes from IF% and FB%, which I advocated in my article, but which I ran a separate correlation for in the comments so that it could be compared to the first one I did in response to your comment.
studes
9/09
Got it. Thanks. It makes sense to me that IF% and FB% would be correlated, since IFs are included in FBs. At least, I think it makes sense. :)

So, could you improve your model by varying the 7%, according to the pitcher's FB%? Or would that not be worth it?
derekcarty
9/09
Yes, that's probably a big reason for the correlation (though the IF/FB-to-FB% connection plays a part too, plus any additional skill that may be getting captured).

Yes, this is what I did in the second half of the article. Based upon the correlation I ran between IF% and FB% (the equation for which is shown in the graph), I created each pitcher's own, unique, flyball-based IF% mean. That's what I regressed their actual IF% to.
studes
9/09
Okay, I think I see. By using a linear formula that isn't based on zero, you're approximating what might be a curvilinear relationship.

So, how quickly does this formula stabilize? Is it an improvement over the straight 7%?
derekcarty
9/10
Not sure what you mean by curvilinear. It seems pretty linear to me, but yeah, it should be an improvement to the straight 7%.
studes
9/09
By the way, I am just totally confused about these points. What does HR/FB have to do with things? Are you referring to Colin's previous analysis, which I also didn't get?

Should I just give up on this stuff?
derekcarty
9/09
Dave, what confuses you?

The HR/FB thing was just mentioned as a point of reference, to say that the "stabilization point" for IF/FB is pretty modest, that IF/FB does have a fair amount of predictive value in terms of predicting IF FBs. Everyone knows that HR/FB is incredibly unstable, so I listed that to show that IF/FB isn't anywhere near that and does have some utility.
studes
9/09
Oh, okay. Thanks for the clarification. I thought you were referring to the HR/OF vs. HR/Contact analysis Colin had a couple of months ago. I left several confused comments in that one, too.
studes
9/09
Last comment, I promise. When you say this...

"Yes, it does go up. The graph shows that relationship."

...are you saying that you didn't use a standard 7% of flyballs in your first table? You increased it as the pitcher's flyball rate increased?
derekcarty
9/09
Maybe we're referring to different things here. Now I'm confused. The graph in the article shows the relationship between IF% and FB% - which is I thought what you were talking about. League average IF% is roughly 7%, and as FB% goes up, so does IF%. "The graph shows that relationship."

Were you referring to something different?
derekcarty
9/09
In the first table, I used FB*.07 for everything, regardless of FB%.
studes
9/09
OK, I think we weren't quite addressing each other. That helps clarify. Thanks.
derekcarty
9/08
I think it's almost akin to saying "yes, we can predict ERA with itself with reasonable accuracy, but we can predict it better with FIP."
studes
9/09
I'm going to attempt to recap what I think I found here. Sorry for taking up the space, but I'd be interested if you think I got it right.

FB/BIP "stabilizes" in about 100 BIP, while IF/BIP "stabilizes" in 288 BIP. I assume this is almost entirely due to the fact that there are many less FB than IF in a group of BIP.

0.07*FB/BIP "stabilizes" against IF/BIP more quickly than IF/BIP itself does (216 BIP). I assume it has something to do with the relative frequency of FB and the tight relationship between FB and IF. If you take anything that has a strong relationship with another thing--and that second thing happens a lot more often than the first--you will naturally have an equation that "stabilizes" more quickly. This is probably natural mathematics.

IF/FB takes longer to stabilize (414 flyballs). I assume this is due to two things. One is again the low rate of infield flies, but exacerbated by the fact that we don't know the pitcher's flyball rate. The rate at which flyballs are actually infield flies will partially depend on how many flyballs that pitcher gives up per ball in play. I wonder how quickly IF/FB would stabilize if you included the pitcher's flyball tendencies?

So, if you want to predict future flyballs, you need to base your calculations off something that includes "information" about his contact rate and his flyball rate. Due to the correlation between infield flies and all flies, previous IF rate does that, but previous FB rate stabilizes more quickly because there are a lot more of them.

Does this make sense to people?
studes
9/09
I need an editor. This...

FB/BIP "stabilizes" in about 100 BIP, while IF/BIP "stabilizes" in 288 BIP. I assume this is almost entirely due to the fact that there are many less FB than IF in a group of BIP.

should say this...

FB/BIP "stabilizes" in about 100 BIP, while IF/BIP "stabilizes" in 288 BIP. I assume this is almost entirely due to the fact that there are many less IF than FB in a group of BIP.
derekcarty
9/10
I think that's probably about right, Dave.
studes
9/09
BTW, now that I've wasted your afternoon with many comments, I should say nice job, Derek. Regressing against pitchers with similar flyball rates is an elegant solution.
derekcarty
9/10
Thanks, and no worries about wasting my time. I'm happy to respond to comments.