


January 3, 2011
Between The Numbers
Groundball Rates in the Minors and Majors

The idea behind SIERA, as opposed to, say, FIP, is that the key factors in a pitcher's profile are his walk rate, strikeout rate, and groundball rate. Those factors become the prime shapers of the other statistics (hits, home runs, and ultimately runs) that go into a pitcher's value. I've been adjusting the translation process (as applied to pitchers) to reflect those ideas. There are a couple of key points to make about groundball rates and pitcher development.

1. Groundball rates decline as you rise through the minors.

A simple look at groundball rates compiled for leagues makes this plain. In 2010, the groundball out percentage (ground outs divided by the sum of ground outs and air outs) was around .55 to .56 for the short-season leagues. In the full-season A leagues, it was more like .53 to .54, dropping to .52 in Double-A, to .51 in Triple-A, and finally to .50 in the majors. The cause isn't a mystery. Ascent through the minors is also an ascent through age brackets, with each step of the minors averaging roughly a year older than the next level down. These older players have “filled out” their bodies more, not the way mine has filled out through my 40s, but by gaining upper-body strength. More strength leads to more ability to hit home runs, which leads to altering your swing to take advantage of that increased ability, so there are more uppercuts and more fly balls. A pitcher who does nothing but keep pace with his league will lose 5 to 6 points off his groundball rate as he climbs the minor-league ladder.

2. Even relative to their leagues, groundball rates decline as you rise through the minors.

One of my favorite tools to pull out is a script that will compare the stats of every player who meets a condition, say, someone who played in Triple-A one year and in the majors the next year.
This program will pull all the stats for all the players who meet the criterion, scale each player's two stat lines to the lesser of his two plate-appearance totals, and then sum them. This gives me a dataset where each pitcher carries the same weight toward both totals and the total plate appearances are identical, which allows a direct comparison of the change between the leagues. I can run this with real stats, translated stats, or what I call nodif stats; these are translations where I don't make any adjustment for league difficulty, but only worry about resetting the offensive context to a standard value. If I were to run a nodif set for players who go from Triple-A to the majors in the same year, I'd get results like this:
Note: All pitchers from 2005-2010; we don't have good play-by-play info for the minor leagues before 2005. Rates are calibrated to a standard context in which the league-average strikeout rate equals 6.0, walk rate 3.0, ERA 4.50, hits 9.0, home runs 1.0, and groundball rate 50 percent.

It should come as no surprise to find that strikeout rates fall and walk rates rise as you move to the majors, and that those numbers should move from a player who was above average for his league (or else why would he get promoted?) to one who is below average in the majors. It is not so obvious that groundball rates would behave the same way. Remember, these have been normalized to the league: this drop in ground balls is over and above the drop that comes from the overall league average falling. Pitchers who get promoted from Triple-A tend to have groundball rates above 50 percent in Triple-A, and become below average after promotion to the majors. This pattern asserts itself repeatedly, across different levels and across differences in years:
*i.e., Triple-A in 2008 and majors in 2009
**i.e., Triple-A in 2008 and majors in 2010

The difference is always present, and it always takes a score that is above league average to one that is below league average (and so cannot be written off as simple regression to the mean). For equally timed transitions, the effect gets larger as the difficulty gap between leagues increases: Double-A pitchers take a bigger hit going to the majors than Triple-A pitchers, A-ball pitchers a bigger hit still, and so on. This is exactly the way strikeout and walk rates behave; it needs to be accounted for in the translation process in a manner similar to the way strikeouts are handled. Just to emphasize those points, it also works in reverse:
Players who are demoted from the majors to Triple-A in the next year tend to be pitchers whose groundball rates were below average, and they improve those numbers in Triple-A. If we look at transitions within the majors from one year to the next, what do we see?
Those are for the same 2005-2010 period as the minor-league data already shown; here it is for the much broader 1954-2010 time frame we have available for the major leagues:
These numbers do look like regression to the mean, albeit with a clear preference for starting with pitchers with above-average groundball rates. There is also an apparent trend toward longevity, with longer "survival" times in the majors equaling higher groundball rates. That is a preference that will also show up if I look at pitchers who, for whatever reason, did not pitch in the majors in the following years:
Pitchers who are headed out of the league have below-average rates, and the lower they are, the sooner they'll be gone. So the reason to prefer minor-league pitchers with high groundball rates, first noted by Nate Silver several years ago, is getting a little clearer. Teams have a preference for pitchers with high groundball rates at the major-league level, for good reasons that I'll examine at a later time. That isn't to say that flyball pitchers don't exist; there are also plenty of pitchers in the majors with consistently low strikeout rates or high walk rates, but they have to be able to make up for it in other areas. But groundball rates, while they are highly correlated from one year to the next, are not conserved as pitchers face increasingly better hitters while rising through the minors; the rates go down. And the only way to end up with an average major-league pitcher, in this measure, is to start from a level that is high enough to weather that erosion.

Note: When I refer to groundball rates throughout this article, I am actually referring to groundball outs: times when a man batted, hit a ground ball, and either he or another runner was put out before reaching the next base. Air outs refer to any batted ball, be it a line drive, fly, or pop, that is caught. Like Colin, I am deeply suspicious and distrustful of the distinctions between flies and line drives in particular in the stats we have. There is a large middle ground between them, and scorers, for whatever reason, differ clearly, consistently, and persistently in which way they call it. There is also a large amount of the Retrosheet play-by-play data where that information is not known for hits. The groundout and airout data has the advantage of being precisely known for a much wider set of data, and being almost completely free of scorer biases. Where the full ground/fly data is known, the correlation between it and the ground/airout data is on the order of 90 percent.
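
For readers who want to try the paired comparison described earlier, here is a minimal Python sketch of the idea. It is not the script used for this article. The `Line` record, the function names, and the sample numbers are all hypothetical, and it tracks only plate appearances, ground outs, and air outs. Each pitcher's two stat lines are scaled to the lesser of his two plate-appearance totals, the scaled lines are summed, and the groundball out percentage (ground outs divided by ground outs plus air outs) is computed for each side.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One pitcher's stat line at one level (hypothetical, minimal record)."""
    pa: float  # plate appearances (batters faced)
    go: float  # ground outs
    ao: float  # air outs


def scale(line: Line, target_pa: float) -> Line:
    """Scale every counting stat so the line covers target_pa plate appearances."""
    factor = target_pa / line.pa
    return Line(line.pa * factor, line.go * factor, line.ao * factor)


def paired_totals(pairs):
    """Sum (before, after) lines after scaling each pair to its lesser PA total.

    pairs: iterable of (before, after) Line tuples, one tuple per pitcher.
    Returns two Line totals that cover identical plate appearances.
    """
    before_total = Line(0.0, 0.0, 0.0)
    after_total = Line(0.0, 0.0, 0.0)
    for before, after in pairs:
        if before.pa <= 0 or after.pa <= 0:
            continue  # a pitcher with no playing time on one side cannot be paired
        target = min(before.pa, after.pa)
        for total, line in ((before_total, scale(before, target)),
                            (after_total, scale(after, target))):
            total.pa += line.pa
            total.go += line.go
            total.ao += line.ao
    return before_total, after_total


def go_pct(line: Line) -> float:
    """Groundball out percentage: ground outs / (ground outs + air outs)."""
    return line.go / (line.go + line.ao)


# Made-up sample: two pitchers, each with a lower-level line and a majors line.
sample = [
    (Line(400, 120, 80), Line(200, 55, 45)),
    (Line(100, 30, 30), Line(300, 80, 100)),
]
before, after = paired_totals(sample)
print(before.pa, after.pa)
print(round(go_pct(before), 4), round(go_pct(after), 4))
```

Because both sides of each pair are cut to the same number of plate appearances, a pitcher with a long stint at one level and a brief one at the other cannot dominate one total and not the other. League normalization (the nodif step) is deliberately left out of this sketch.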
2 comments have been left for this article.

Where/when would we see the results? (Presumably not in the Annual's printed projections of Julio Teheran, for example, right?)
What would be the difference in major league projected SIERA for a Julio Teheran, for example, before/after the change in the translations? (Sizable?)