
Let’s talk about batted balls.

I’m sure we’re all familiar with the category labels that we use to describe batted balls—ground balls, line drives, fly balls, and popups. Precise definitions vary, but David Cortesi gives a succinct set of criteria:

A ground ball is a batted ball that touches the ground short of the outfield grass. The line drive, the fly ball and the popup are all balls that are hit into the air and are caught before they hit the ground, or if they aren’t caught, fall to earth in the outfield.

As I have discussed previously, there is evidence for park biases in the way batted balls are assigned these category labels. The question becomes—what do we do about it? It turns out that sabermetricians have a handy tool to use in handling these issues—park factors. But how best to apply this tool to the problem at hand?

Prior Art

There have been efforts to park adjust batted ball rates in the past, of course, and it would be remiss of me not to acknowledge them.

There may be others, but I can’t locate them—if you know of any, please drop me a line or leave a comment. 

So, what do we know about batted balls, and how can we use that in the construction of park factors?

Hold Steady

The key thing to keep in mind is that when we say a park causes more ground balls, either in fact or in the perception of the scorer, there has to be less of something else. The total number of events is fixed. And when looking at rates of batted-ball types, there are constraints on what there can be less of: a higher grounder-per-batted-ball rate doesn’t directly affect the number of strikeouts or walks. (And if we’re talking about a scorer-bias effect, rather than an atmospheric effect on the type of batted balls, there isn’t even an indirect effect.)

So if we see a park effect creating more ground balls, those ground balls have to be coming at the expense of other batted balls. The epiphany I had is that most of this “theft” has to be coming from the neighboring batted-ball type.

Think about it—for any particular batted ball that is “borderline,” there are two categories it can be placed in. And the park effects have to act primarily upon these borderline batted balls, don’t they? It doesn’t matter if it’s scorer bias caused by parallax or an actual change of the trajectory of the batted ball due to atmospheric effects.

The Method

Here’s how I did the park adjustments (all numbers are for illustrative purposes). First, I calculated the ground-ball rate (per batted ball) for each team and its opponents, both home and away, for each year from 2003 to 2009. (This includes the batting and pitching sides for each team.) Each of those rates was then regressed to the mean, to try to cut out the effect of random variance.

Dividing the regressed home GB rate by the regressed road GB rate gives a one-season park factor. Averaging three of those gives a three-year, regressed park factor.
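The article doesn’t say how much each rate is regressed, so the sketch below assumes one common approach: add `k` phantom batted balls at the league-average rate. The constant `k`, the league rate, the function names and the counts in the usage lines are all hypothetical; only the structure (regressed home rate over regressed road rate, then an average of several seasons) follows the text.

```python
from statistics import mean


def regress_rate(events: float, opportunities: float, league_rate: float, k: float) -> float:
    """Shrink an observed rate toward the league rate by adding k phantom
    opportunities at the league-average rate. k is a hypothetical
    regression constant; the article does not specify one."""
    return (events + k * league_rate) / (opportunities + k)


def one_season_pf(home_events, home_opps, road_events, road_opps, league_rate, k):
    """One-season park factor: regressed home rate over regressed road rate."""
    home_rate = regress_rate(home_events, home_opps, league_rate, k)
    road_rate = regress_rate(road_events, road_opps, league_rate, k)
    return home_rate / road_rate


def multi_year_pf(season_pfs):
    """Average several one-season factors (three seasons in the article)."""
    return mean(season_pfs)


if __name__ == "__main__":
    # Illustrative counts only: ground balls and batted balls for a team plus
    # its opponents, at home and on the road.
    pf = one_season_pf(1000, 2200, 950, 2200, league_rate=0.44, k=500)
    print(f"one-season GB park factor: {pf:.3f}")  # 1.043
```

Regressing both the home and road rates before dividing keeps a small-sample fluke on either side from producing an extreme factor.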

So let’s say a team has 2,000 ground balls at home, cumulative, and a park factor of 1.05. We take:

2,000 – 2,000/1.05 ≈ 95

That’s 95 more ground balls than the team “should” or “would” have hit in a neutral park. What do we do with those 95 ground balls? We take them out of the ground-ball total and add them to line drives, to get our first set of adjusted line drives.
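The arithmetic of this step can be sketched as follows. Function names and the dictionary keys are hypothetical, and the line-drive count in the usage lines is made up for illustration. Note that subtracting the surplus leaves exactly observed / PF in the ground-ball column, while the line-drive column absorbs the difference, so the total number of batted balls doesn’t change.

```python
def park_surplus(observed: float, park_factor: float) -> float:
    """How many more (positive) or fewer (negative) events the park produced
    than a neutral park would have: observed - observed / park_factor."""
    return observed - observed / park_factor


def shift_to_neighbor(counts: dict, from_type: str, to_type: str, park_factor: float) -> dict:
    """Move the park surplus out of from_type and into the adjacent to_type.
    The total number of batted balls is unchanged."""
    surplus = park_surplus(counts[from_type], park_factor)
    adjusted = dict(counts)
    adjusted[from_type] -= surplus  # leaves observed / park_factor behind
    adjusted[to_type] += surplus
    return adjusted


if __name__ == "__main__":
    print(round(park_surplus(2000, 1.05)))  # 95
    # Hypothetical line-drive count, purely for illustration.
    print(shift_to_neighbor({"GB": 2000, "LD": 800}, "GB", "LD", 1.05))
```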

Now, we start the process over again. We use our ground-ball-adjusted line drives to figure the home LD rate (this time looking only at air balls, that is, batted balls minus ground balls), and regress that. We also regress the observed road LD rate. From there, we derive a park factor for the ground-ball-adjusted liners.

Now, we adjust line drives a second time. Say we have 900 LDs after adjusting for GB rate, and an LD park factor of 0.90. We take:

900 – 900/0.90 = -100

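That means the park produced 100 fewer line drives than a neutral park would have, so we add 100 line drives and subtract 100 fly balls from our team totals.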

The process repeats one more time, as we adjust fly balls as a share of fly balls plus popups.
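Putting the three passes together, a minimal sketch of the cascade might look like this. It assumes each surplus moves only to the next type up the chain (GB to LD, LD to FB, FB to PU), per the adjacency argument above. In the described method each later factor is derived from the already-adjusted counts; this sketch simply takes the three factors as given inputs. The names and the illustrative numbers are hypothetical (805 line drives is chosen so that the GB step lands near the 900 used in the example).

```python
# Adjacent batted-ball types, ordered from lowest to highest trajectory.
# Each step moves the park surplus of the first type into the second.
CHAIN = (("GB", "LD"), ("LD", "FB"), ("FB", "PU"))


def cascade_adjust(counts: dict, park_factors: dict) -> dict:
    """Apply the GB, LD and FB park factors in order. park_factors maps each
    'from' type to its factor; those factors are taken as given here."""
    adjusted = dict(counts)
    for from_type, to_type in CHAIN:
        observed = adjusted[from_type]
        surplus = observed - observed / park_factors[from_type]
        adjusted[from_type] -= surplus  # a negative surplus adds to from_type
        adjusted[to_type] += surplus    # ...and takes from its neighbor
    return adjusted


if __name__ == "__main__":
    # Illustrative counts and factors only.
    counts = {"GB": 2000, "LD": 805, "FB": 900, "PU": 200}
    factors = {"GB": 1.05, "LD": 0.90, "FB": 1.00}
    for kind, n in cascade_adjust(counts, factors).items():
        print(f"{kind}: {n:.1f}")
```

Because every step only moves batted balls between neighbors, the total is preserved, which matches the constraint that park effects on batted-ball type can’t change the number of events.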

To give you a sense of what these park factors look like, here is the complete 2003 set:

Year  Team  GB_PF  LD_PF  FB_PF
2003  ANA   1.00   1.01   0.99
2003  ARI   0.96   0.98   1.02
2003  ATL   0.98   0.92   1.00
2003  BAL   0.97   0.95   1.00
2003  BOS   1.01   0.84   1.02
2003  CHA   0.97   0.95   0.97
2003  CHN   1.00   0.92   0.99
2003  CIN   0.95   1.16   0.99
2003  CLE   1.09   1.15   1.02
2003  COL   0.95   1.01   1.02
2003  DET   1.01   1.03   0.99
2003  FLO   0.98   0.87   0.98
2003  HOU   1.06   0.90   1.00
2003  KCA   1.00   1.07   1.00
2003  LAN   1.02   1.09   0.98
2003  MIL   0.98   1.09   0.98
2003  MIN   1.03   0.77   1.01
2003  MON   1.06   1.14   1.02
2003  NYA   1.00   0.96   1.00
2003  NYN   1.03   0.94   0.99
2003  OAK   0.99   1.07   0.98
2003  PHI   0.99   1.09   1.00
2003  PIT   0.99   0.90   1.01
2003  SDN   1.06   1.04   1.02
2003  SEA   0.97   0.91   0.97
2003  SFN   1.03   1.19   1.01
2003  SLN   0.98   1.08   1.00
2003  TBA   0.95   0.90   0.98
2003  TEX   0.99   1.14   1.00
2003  TOR   1.02   1.03   1.03

The “line-drive” factors (really a misnomer, since they account for far more of our fly-ball adjustment than the factor I’m calling “FB” here) have the greatest variability. In other words, the fly ball/line drive boundary is the one most subject to park effects. That isn’t to say that the ground ball/line drive boundary is “stable,” or at least as stable as we may have thought. (Presenting all years here would take an egregious amount of space; the full set of park factors is available here.)
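As a quick check of the variability claim, the sketch below computes the spread of each factor across the 30 parks, using only the 2003 values from the table (the multi-year set isn’t reproduced here). Measuring spread as a population standard deviation is an assumption of this sketch, not something the article specifies.

```python
from statistics import pstdev

# 2003 park factors, copied from the table: team -> (GB_PF, LD_PF, FB_PF)
PF_2003 = {
    "ANA": (1.00, 1.01, 0.99), "ARI": (0.96, 0.98, 1.02), "ATL": (0.98, 0.92, 1.00),
    "BAL": (0.97, 0.95, 1.00), "BOS": (1.01, 0.84, 1.02), "CHA": (0.97, 0.95, 0.97),
    "CHN": (1.00, 0.92, 0.99), "CIN": (0.95, 1.16, 0.99), "CLE": (1.09, 1.15, 1.02),
    "COL": (0.95, 1.01, 1.02), "DET": (1.01, 1.03, 0.99), "FLO": (0.98, 0.87, 0.98),
    "HOU": (1.06, 0.90, 1.00), "KCA": (1.00, 1.07, 1.00), "LAN": (1.02, 1.09, 0.98),
    "MIL": (0.98, 1.09, 0.98), "MIN": (1.03, 0.77, 1.01), "MON": (1.06, 1.14, 1.02),
    "NYA": (1.00, 0.96, 1.00), "NYN": (1.03, 0.94, 0.99), "OAK": (0.99, 1.07, 0.98),
    "PHI": (0.99, 1.09, 1.00), "PIT": (0.99, 0.90, 1.01), "SDN": (1.06, 1.04, 1.02),
    "SEA": (0.97, 0.91, 0.97), "SFN": (1.03, 1.19, 1.01), "SLN": (0.98, 1.08, 1.00),
    "TBA": (0.95, 0.90, 0.98), "TEX": (0.99, 1.14, 1.00), "TOR": (1.02, 1.03, 1.03),
}

FACTOR_NAMES = ("GB_PF", "LD_PF", "FB_PF")


def spread_by_factor(table: dict) -> dict:
    """Population standard deviation of each factor across parks."""
    columns = zip(*table.values())  # transpose rows of tuples into columns
    return {name: pstdev(col) for name, col in zip(FACTOR_NAMES, columns)}


if __name__ == "__main__":
    for name, sd in spread_by_factor(PF_2003).items():
        print(f"{name}: {sd:.3f}")
```

On the 2003 numbers alone, the LD factors spread far more widely than either of the other two, consistent with the text; whether that holds every year would need the full set.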

The Next Step

The catch is that we’ve park-adjusted the batted balls, but we haven’t park-adjusted the batted-ball outcomes. Say we want to apply this to ground-ball BABIP: how would we go about doing that? I don’t know.

Let’s say, again, that 95 balls “shift” from GB to LD when we do our park adjustment. The question is, how many of those are hits?

Well, those are “really” line drives, and line drives are more likely to be hits than grounders are. So, all else being equal, they’re likely to have a higher hit rate than your typical GB (but perhaps lower than your typical LD).

But is all else equal? In other words, is a scorer equally likely to have trouble classifying a batted ball whether it’s a hit or an out? Or does the very act of catching a ball affect how it’s scored?

Consider: for the ground-ball/line-drive boundary, what matters is where the ball lands or, for a ball that is caught before it lands, where it would have landed had the fielder not caught it.

So you’re essentially presenting the scorer with two different tasks, depending on whether the batted ball was a hit or an out: for a hit, the scorer can see where it landed; for an out, the scorer has to judge where it would have landed. My supposition, then, is that you will see a disproportionate number of outs among these borderline batted balls. But that’s only a supposition; I can’t tell you how many there would be.


This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.

baseballben
6/23
"It doesn’t matter if it’s scorer bias ... or an actual change of the trajectory of the batted ball due to atmospheric effects."

Nice. I can agree with that, Colin. No matter how much (or how little) scorer bias exists, the net park factor is real. It's next to impossible to isolate one from the other with just one data source.

Seeing Colorado near the top almost annually suggests there's a lot of natural park influence in there.

-Ben