
Last week, ESPN's Rob Neyer picked up on my series regarding this year's dip in scoring and increase in strikeout rates. Echoing a familiar refrain from the feedback I've received on both pieces—online and over the airwaves, where I've discussed the two articles at least half a dozen times—Rob suggested that one reason for both trends may be that teams are focusing more on defense than they have in previous years.

Anecdotally, this certainly rings a bell. In the wake of Moneyball a few years ago, we all heard about A's general manager Billy Beane shifting attention from on-base percentage and plate discipline to defense because that's where the new inefficiencies were in the market for talent. Two years ago, the Rays made a stunning turnaround thanks to an historic 54-point improvement in their Defensive Efficiency. Last year, the Rangers followed that template and emerged as contenders thanks to a 29-point turnaround in that department. Over the winter, GMs of both the Mariners (Jack Zduriencik) and the Red Sox (Theo Epstein) emphasized defense in their off-season plans. Apparently, all the cool kids are doing it.

If it's the case that teams are placing an increasing emphasis on defense, the first place we'd expect it to show up isn't in errors, which are entirely based upon the subjective decisions of official scorers, nor in FRAA, UZR, or some other newfangled defensive metric with its own issues regarding subjectivity. No, the first place we'd expect a change in defensive performance to show up is in the frequency with which teams convert batted balls into outs, measured either via batting average on balls in play (BABIP) or Defensive Efficiency (DE). While the five aforementioned teams rank among the top six in the current AL Defensive Efficiency rankings, the evidence that such a shift in philosophy is driving scoring rates down this year is faint at best:

Year   R/G   Change   BABIP   DE
1990   4.26           .2867   .7012
1991   4.31    1.2%   .2849   .7040
1992   4.12   -4.4%   .2849   .7053
1993   4.60   11.7%   .2939   .6940
1994   4.92    7.1%   .2998   .6884
1995   4.85   -1.5%   .2984   .6908
1996   5.04    3.9%   .3014   .6866
1997   4.77   -5.3%   .3013   .6875
1998   4.79    0.5%   .2997   .6893
1999   5.08    6.1%   .3016   .6868
2000   5.14    1.1%   .3002   .6878
2001   4.78   -7.1%   .2960   .6924
2002   4.62   -3.3%   .2928   .6962
2003   4.73    2.4%   .2942   .6950
2004   4.81    1.8%   .2973   .6922
2005   4.59   -4.6%   .2953   .6947
2006   4.86    5.8%   .3013   .6886
2007   4.80   -1.3%   .3030   .6872
2008   4.65   -3.0%   .3000   .6902
2009   4.61   -0.8%   .2993   .6913
2010   4.45   -3.5%   .2990   .6910
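For readers who want to reproduce the two rate stats, here's a minimal sketch of one common way to compute them from standard counting stats. The exact treatment of sacrifice flies, bunts, and reached-on-error varies by source, so treat these as approximations of the figures above rather than the precise formulas behind them; the reached-on-error term is also what allows BABIP and Defensive Efficiency to fall in tandem, as discussed below.

```python
def babip(h, hr, ab, so, sf):
    """Batting average on balls in play: non-HR hits divided by
    at-bats that ended with the ball in play (plus sac flies)."""
    return (h - hr) / (ab - so - hr + sf)

def defensive_efficiency(h, hr, roe, pa, so, bb, hbp):
    """Share of balls in play converted into outs. Plays on which the
    batter reached on an error (roe) are balls in play that were NOT
    converted, which is why DE can dip even while BABIP dips."""
    balls_in_play = pa - so - bb - hbp - hr
    return 1.0 - (h - hr + roe) / balls_in_play
```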

Though it takes a hike out to the fourth decimal place to find it, this year's BABIP and DE were actually both down from 2009 levels through Sunday, a combination that's possible only because the rate of hitters reaching on error is higher than last year, rising from 0.93 percent of all plate appearances to 1.04 percent. Not exactly what we'd expect if defense were actually being, y'know, emphasized. Shifting our gaze to recent seasons, the year-to-year changes in BABIP and DE don't line up tremendously well with the recent changes in scoring rates. The DE for all 30 teams was four points higher in 2003, the year Moneyball was published, yet scoring was about three-tenths of a run higher. More or less the same can be said for the entire 2001-05 span.

To put it another way, in the grand scheme, DE and BABIP correlate very well with scoring levels over the full range of years shown above, at -.87 and .82, respectively, while for strikeouts the correlation is just .38. But if we narrow our focus to ditch the pre-expansion, pre-strike, pre-juiced-ball years and use only complete seasons, the relationship becomes less clear. For the range 1996-2009, the correlations fall to -.71 and .54; for the range from 2004 (the year after Moneyball was published) through 2009, they fall to -.62 and .52.
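Those figures are easy enough to check against the table itself. Here's a minimal sketch (numpy assumed) that recomputes the DE and BABIP correlations with scoring over the three windows used in this piece; since the published numbers were presumably computed from unrounded underlying data, the output may differ slightly in the final digit.

```python
import numpy as np

# Per-season values transcribed from the table above (1990-2010; 2010 is partial).
years = np.arange(1990, 2011)
rpg = np.array([4.26, 4.31, 4.12, 4.60, 4.92, 4.85, 5.04, 4.77, 4.79, 5.08,
                5.14, 4.78, 4.62, 4.73, 4.81, 4.59, 4.86, 4.80, 4.65, 4.61, 4.45])
babip = np.array([.2867, .2849, .2849, .2939, .2998, .2984, .3014, .3013, .2997,
                  .3016, .3002, .2960, .2928, .2942, .2973, .2953, .3013, .3030,
                  .3000, .2993, .2990])
de = np.array([.7012, .7040, .7053, .6940, .6884, .6908, .6866, .6875, .6893,
               .6868, .6878, .6924, .6962, .6950, .6922, .6947, .6886, .6872,
               .6902, .6913, .6910])

def corr_over(x, y, start, end):
    """Pearson correlation of two per-season series over [start, end], inclusive."""
    m = (years >= start) & (years <= end)
    return np.corrcoef(x[m], y[m])[0, 1]

for lo, hi in [(1990, 2010), (1996, 2009), (2004, 2009)]:
    print(f"{lo}-{hi}: DE {corr_over(de, rpg, lo, hi):+.2f}, "
          f"BABIP {corr_over(babip, rpg, lo, hi):+.2f}")
```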

But that's not the end of the story. As Rob astutely pointed out, "If you're selecting for defense, you're taking runs away from your opponents and you're taking runs away from yourself. It's a two-fer." Which again makes sense, as there are plenty of slick-fielding glove men out there who can't hit their hat size. Think of the lineup hit the Mariners took for employing Casey Kotchman (.208 TAv), or the Astros for Pedro Feliz (.209), or the Orioles for Cesar Izturis (.197), not to mention plenty of other players who are above replacement level but carry bats a bit light for their respective positions because they're perceived as top-notch defenders.

If teams are actually choosing glove men more often, we might expect the level of offense supplied by the more bat-friendly positions (first base, third base, left field and right field) to drop over time relative to the level supplied by the more glove-friendly positions (catcher, second base, shortstop and center field). Checking in on things like BABIP, home run and strikeout rates, we find that the two classes of players have paralleled each other in rather striking fashion over the past two decades:

Putting all three measures on the same scale, you can see how little they have varied relative to one another. The offense-first positions have shown higher strikeout and homer rates than the defense-first positions in every single year, and only three times have they fallen behind in BABIP, twice by less than one point and not at all since 2003.

Looking at even a crude aggregate measure of offense such as OPS is more helpful, as is another stat Baseball-Reference.com publishes called tOPS+, which is defined as "the OPS+ of this split relative to the player or team's overall OPS," except that in this case we're looking at MLB as a whole instead of an individual player or team. The formula is 100 * ((split OBP / total OBP) + (split SLG / total SLG) - 1); a quick code sketch of the arithmetic follows the table. At first glance, the data is hardly a slam dunk:

Year   Off tOPS+   Def tOPS+   Dif   Off OPS   Def OPS   Dif    R/G
1990      111          97       14    .751      .697    .054    4.26
1991      110          96       14    .743      .693    .050    4.31
1992      110          97       13    .735      .687    .048    4.12
1993      112          96       16    .780      .719    .061    4.60
1994      112          94       18    .811      .736    .075    4.92
1995      114          92       22    .809      .722    .087    4.85
1996      113          93       20    .820      .738    .082    5.04
1997      112          95       17    .805      .736    .069    4.77
1998      113          94       19    .808      .731    .077    4.79
1999      112          95       17    .826      .759    .067    5.08
2000      113          94       19    .835      .760    .075    5.14
2001      115          94       21    .815      .734    .081    4.78
2002      114          94       20    .800      .724    .076    4.62
2003      112          95       17    .803      .736    .067    4.73
2004      112          95       17    .812      .744    .068    4.81
2005      112          96       16    .795      .733    .062    4.59
2006      113          95       18    .820      .748    .072    4.86
2007      111          96       15    .802      .742    .060    4.80
2008      111          96       15    .793      .734    .059    4.65
2009      111          96       15    .794      .735    .059    4.61
2010      111          96       15    .776      .718    .058    4.45
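As promised, the tOPS+ definition quoted above is simple enough to encode directly. This is just a sketch of the arithmetic; the sample call uses made-up OBP and SLG figures purely for illustration, since the table reports only the composite OPS values.

```python
def tops_plus(split_obp, split_slg, total_obp, total_slg):
    """Baseball-Reference's tOPS+: the OPS of a split relative to the OPS
    of the pool it's drawn from (here, MLB as a whole). A value of 100
    means the split performed exactly at the overall level."""
    return 100 * (split_obp / total_obp + split_slg / total_slg - 1)

# Made-up illustrative figures for the offense-first positions vs. MLB overall:
print(round(tops_plus(0.340, 0.436, 0.330, 0.420)))  # ~107 with these inputs
```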

Whether measured by tOPS+ or by the raw OPS gap between the two classes, the gap is narrower than it was five or 10 years ago, but still wider than it was prior to the sweeping changes which began taking hold in 1993. Over the 21-season range, there's a definite correlation with scoring (.69 for tOPS+, .76 for raw OPS), though if we narrow the focus to the post-strike years, the correlations drop to .36 and .45, respectively. Limiting the focus to the post-Moneyball era full seasons (2004-09)—in case any other front offices were taking note, backlash be damned—they climb back to .62 and .72. So perhaps there is something to Rob's hypothesis.

For the purposes of comparison, I've built a table summarizing the correlations between these various elements and run scoring over the full 1990-2010 span as well as the two full-season windows used above, 1996-2009 and 2004-2009:

Measure     1990-2010   1996-2009   2004-2009
DE            -0.87       -0.71       -0.62
BABIP          0.82        0.54        0.52
HR             0.88        0.70        0.69
SO             0.38       -0.52       -0.32
TB/H           0.80        0.36        0.52
tOPS+ dif      0.69        0.36        0.65
OPS dif        0.76        0.45        0.74

TB/H is the rate of total bases per hit, the Power Factor which paleo-sabermetrician Eric Walker has invoked as evidence that the ball itself has changed. Note that in just about every case, the correlations are largest for the sample that includes the pre-strike era, the divide across which most of these rates changed substantially.

As it turns out, we can use linear regression on the 21 seasons of data discussed above and in my pieces of recent weeks to construct a fairly accurate model of scoring levels. Using just home run rate and Defensive Efficiency, we can build a model which produces a correlation of .93 (an r-squared of .87) and a standard error of 0.10 runs per game. Adding Power Factor to the mix gets us to a correlation of .978 (r-squared of .957) and a standard error of 0.059 runs per game. We can push it even further by adding strikeout rate, to a correlation of .984 (r-squared of .969) and a standard error of 0.052 runs per game. Adding the bat-man/glove-man differential in either form doesn't advance the cause, however, and actually increases the standard error a hair, so we'll leave it aside.
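For the curious, here's a rough sketch of how such a model can be fit with ordinary least squares in numpy. The predictor arrays below are placeholders, since the per-season HR-rate, TB/H, and strikeout-rate series aren't reproduced in this article; it illustrates the method rather than recovering the exact coefficients.

```python
import numpy as np

# Placeholder per-season series for 1990-2010 (21 seasons); swap in the
# real league-wide values before trusting any output.
rng = np.random.default_rng(0)
n = 21
hr_rate  = rng.uniform(0.020, 0.030, n)   # HR per plate appearance
de       = rng.uniform(0.686, 0.706, n)   # Defensive Efficiency
tb_per_h = rng.uniform(1.50, 1.65, n)     # total bases per hit (Power Factor)
so_rate  = rng.uniform(0.150, 0.185, n)   # SO per plate appearance
rpg      = rng.uniform(4.1, 5.2, n)       # runs per team per game (the target)

# Ordinary least squares: stack the predictors plus an intercept column.
X = np.column_stack([hr_rate, de, tb_per_h, so_rate, np.ones(n)])
coefs, *_ = np.linalg.lstsq(X, rpg, rcond=None)

est  = X @ coefs                              # fitted runs per game
r    = np.corrcoef(est, rpg)[0, 1]            # correlation of fit vs. actual
rmse = np.sqrt(np.mean((est - rpg) ** 2))     # RMS error of the fit
print(coefs, r, rmse)
```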

The nasty-looking but nonetheless robust formula we're left with is:

Runs/Game = 26.98 + HRr * (92.41) + DE * (-27.39) + TB/H * (-2.67) + SOr * (-9.33)
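Encoded as a function, with the home run and strikeout rates read as per-plate-appearance rates (my interpretation; the units aren't spelled out above), the formula looks like this:

```python
def estimated_runs_per_game(hr_rate, de, tb_per_h, so_rate):
    """Apply the fitted formula above. hr_rate and so_rate are assumed
    to be per plate appearance; de is Defensive Efficiency; tb_per_h is
    total bases per hit."""
    return 26.98 + 92.41 * hr_rate - 27.39 * de - 2.67 * tb_per_h - 9.33 * so_rate

# Illustrative call with made-up (not actual) 2010-ish inputs:
print(round(estimated_runs_per_game(0.025, 0.691, 1.58, 0.181), 2))
```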

That formula will get you to within one-tenth of a run per game for any year from 1990 onward, and it's a whole lot closer than that over the last few years:

Year   R/G   Est R/G    Dif
1990   4.26    4.32    0.06
1991   4.31    4.22   -0.09
1992   4.12    4.11    0.00
1993   4.60    4.65    0.05
1994   4.92    4.91   -0.02
1995   4.85    4.78   -0.06
1996   5.04    5.01   -0.03
1997   4.77    4.82    0.05
1998   4.79    4.80    0.01
1999   5.08    5.07   -0.02
2000   5.14    5.06   -0.08
2001   4.78    4.79    0.02
2002   4.62    4.60   -0.02
2003   4.73    4.72    0.00
2004   4.81    4.83    0.02
2005   4.59    4.69    0.09
2006   4.86    4.92    0.07
2007   4.80    4.79   -0.01
2008   4.65    4.64   -0.01
2009   4.61    4.62    0.01
2010   4.45    4.42   -0.03

Not too shabby, eh? What's interesting is that the model's estimate comes in a bit below the actual scoring rate to date, suggesting that 2010 scoring levels could actually drop further based upon the more elemental measures we've observed thus far.

None of which answers once and for all whether an increased emphasis on defense has played any role in the change in scoring levels. But over the long-term period covered here, such an emphasis doesn't appear to be a driving factor the way that rising home run and strikeout rates—themselves reflective of physical, philosophical, and technological changes in the game—have been. Over the shorter term, the effect appears to be small at best, the stuff of a few good stories about enlightened front offices zigging while the rest of the field zagged rather than of a trend readily apparent from at least one person's archaeological dig through the stat sheets.

jjaffe
7/07
Oops, meant 2004-2009 when referring to the post-Moneyball era full seasons just above the correlation table.
tdees40
7/07
Is it just me, or is that some serious overfitting? Four variables for 19 datapoints?
markpadden
7/07
"Last week, ESPN's Rob Neyer picked up on my series regarding this year's dip in scoring and increase in strikeout rates."

This is misleading, as your own article shows that K rates this season were not significantly higher than last season (18.1% vs. 18.0%).

More importantly, I do not get the point of running a regression to create "a fairly accurate model to predict scoring levels." Unless I am completely misreading the data presented (possible), you are using current-season data as input to estimate current (the same) season run scoring. Where is the evidence that this model predicts future scoring better than simply using past run scoring? It seems that these underlying stats are likely to be just as noisy as the stat (runs) you are trying to predict. Optimizing four such stats over a very limited set of training data has almost no chance of working well as a predictive model going forward.
jjaffe
7/07
Regarding the opening sentence, it would have been better worded as "declining scoring rates and rising strikeout rates of the past few years." Neither of us was talking solely about 2009 vs. 2010 rates.

As to the regression, the point was to show that there are other factors which are having a much greater impact on the variations in scoring than some fuzzily defined emphasis on defense. If I've gone about it in a ham-fisted way, rest assured that I'll leave most of the regression equations to the folks in BP's math wing and return to the liberal arts wing where I spend more of my time.
BrewersTT
7/07
It's a good question about overfitting. I was surprised that no variables reflecting getting on base made the cut for the model, and this may be why. The good R-squareds don't add up to much: adding any variable at all to an already overfitted model will raise R-squared, but it doesn't mean much in such a case. Simple regression on a time variable also has its issues.

But Jay has not made any big claims about the model (though he did use the word "robust") and hasn't based a lot of further work on it. I get the impression it's meant as a first step at this stage.

Back-of-the-envelope, a difference of 0.005 in league DE (about the biggest difference from 2010 in the table above) equates to about 0.015 more plays made by each team per game. So if a team accomplishes this with defensive aces at a significant expense to the offense at, say, two positions, it could afford to lose up to .0075 hits per game from each of them before the trade-off hurt the team. It would be better, of course, to translate both sides of the question into runs. It would be interesting to see this calculation done rigorously.