
Sometime this past March, I was talking about the Mariners with a friend. Robinson Cano came up, and my friend made the comment that Cano wasn’t that great of a hitter in 2015, because he only hit about .280 (the actual figure is .287, but that’s not the point). Setting aside advanced versus non-advanced stats, we talked a little about what sort of line a “good” hitter has, what it takes to lead the league in average, that sort of thing. He was shocked to learn that only six people in the AL hit .300 or better last year, and that .320 led the league; his impression was that the old standard of .300/.400/.500 was about average for a star hitter—in fact, only four qualified batters reached that threshold last year.

Fast-forward to about a month ago. Fellow BP writer Jeff Long and I were talking about article ideas, and he suggested that I write about what the “average” player looks like (statistically) these days, because “the steroid era screwed everyone's perception.” This was pretty much confirmed by Jeff via Twitter (note: I’m not trying to call anyone out, but I am using you all to make a point).

What I’ve done for today, then, is take a look at a range of offensive stats over the past 66 years to see how their respective averages and ranges have changed over time, and which direction, if any, they’re headed. Notably, I excluded BP’s own True Average, because the average TAv is fixed at .260 every year. I also limited my sample within each season to batters with 200 PA or more.

First up: batting average.

Okay, let’s take a minute to go over what you’re seeing here, then we’ll talk about what it means. The dark line in the center of the graph is the average batting average (note: not the league’s batting average) for each year in the sample, run through a LOWESS smoothing process in R[i]. The darkest blue band, in the middle, represents the 40th to 60th percentile range; the medium blue is the 20th to 80th percentile, and the light blue is 5th to 95th. The boundaries of all regions have been LOWESS-smoothed as well. The idea is to give the reader not just an idea of the average within a statistic, but the range as well. If you want to know if someone has a “good” batting average (or whichever statistic you’re talking about), you don’t just want to know what the average is; you also want to know how tightly the whole data set is distributed around that average. Said another way (and using made-up numbers), if someone’s hitting .302 for the year, you don’t just want to know that the average batting average is .260; you also want to know whether most batters fall within the range of .250-.270 or .220-.320.
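To make the banding concrete, here is a minimal sketch of how the per-season average and percentile boundaries could be computed before any smoothing. It is written in Python rather than the R used for the graphs, the `(year, PA, AVG)` rows are made up, and the function names are hypothetical; it only assumes the 200 PA cutoff and the percentile cut points described above.

```python
def percentile(values, p):
    """Linearly interpolated percentile of a list, with p in [0, 100]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    k = (len(xs) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def season_bands(rows, min_pa=200, cuts=(5, 20, 40, 60, 80, 95)):
    """Group (year, pa, value) rows by year and summarize qualifying batters."""
    by_year = {}
    for year, pa, value in rows:
        if pa >= min_pa:  # same playing-time filter as the article
            by_year.setdefault(year, []).append(value)
    bands = {}
    for year in sorted(by_year):
        vals = by_year[year]
        # The mean here is the average of player averages, not the league's AVG.
        summary = {"mean": sum(vals) / len(vals)}
        for p in cuts:
            summary[f"p{p:02d}"] = percentile(vals, p)
        bands[year] = summary
    return bands


# Made-up (year, PA, AVG) rows, purely for illustration.
rows = [
    (2015, 650, 0.302), (2015, 500, 0.270), (2015, 410, 0.255),
    (2015, 320, 0.240), (2015, 250, 0.221),
    (2015, 90, 0.400),  # dropped: fewer than 200 PA
]
bands = season_bands(rows)
print(bands[2015])
```

Each boundary (and the mean) would then be smoothed across years to get the lines and shaded regions in the graphs.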

So, what you’re seeing here for batting average is a fairly consistent range, always about 100 total points, and a generally flat average line, varying by only about 15-20 points over the whole sample. You’re an average hitter (at least in terms of AVG) if you hit about .260, and you’re in the top 20 percent if you hit better than just under .300. The part I find most interesting about this particular graph is the shape of the 80th- and 95th-percentile lines. There are peaks (or maybe plateaus) and valleys (anti-plateaus? I don’t know the word for that), separated almost generationally. So, for example, someone who first became a baseball fan in the mid-70s would’ve been influenced by their observations into thinking slightly-higher-than-average batting averages were the norm; if that person then had a kid who became a baseball fan roughly 20-30 years later, the kid would have also caught an above-normal period, which would correspondingly add to the influence on the parent (assuming they’re paying attention along with the kid). So you get families who are used to high batting averages (relatively speaking; it’s still only a 15-20 point swing) and families used to low ones. It even looks like it’s continuing, as all lines are trending up in the last few years. That’s an outrageously speculative take on this, but the cyclical nature of the graph jumped out at me.

On-base percentage is similarly smooth overall; the late 90s-early 2000s bump (visible in the batting average graph but not explicitly mentioned) is once again obvious, and there’s a recent-year trend upward after a bit of a valley. Average is generally around .335-.350; top 20 percent is, roughly speaking, better than .375. What I take away from this is the steep decline through the 50s and 60s leading up to the rule changes in 1968, after which OBP almost immediately rebounded.

Next up is slugging; here, there’s been a real persistent change ever since the 90s, shifting all lines upward by at least 20 points. Recent years have even gotten as high as the 90s/2000s peak. It’s almost reached the point where you’d have to slug .500 to be in the top 20 percent. This is the first instance of what I think is a visibly skewed distribution, too, where the 80th-95th range is notably wider than the 5th-20th range.

Unsurprisingly, HR% (that is, HRs per PA) shows a lot of similarity to SLG. Amazingly, the average figure is rising even above that of the “steroid” era. I’d also like to point out the same skew as seen for SLG, the separation between average HR/PA and median (visible by the trend of the average line to be above the middle of the darkest blue region, which would be the median), and the 50s/60s hump in all lines, though especially the 95th-percentile trace—which was also effectively killed off by the 1968 rule changes.
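A quick way to see why the average line sits above the median when the distribution has a long upper tail is to try it on a handful of numbers. The HR/PA rates below are made up for illustration, with one big slugger pulling the top end out.

```python
from statistics import mean, median

# Made-up HR/PA rates for seven hitters (illustrative only, not real data).
hr_rates = [0.010, 0.015, 0.020, 0.022, 0.025, 0.030, 0.070]

avg = mean(hr_rates)
mid = median(hr_rates)

# One extreme value drags the mean upward but leaves the median alone.
print(f"mean HR/PA   = {avg:.4f}")
print(f"median HR/PA = {mid:.4f}")
print("mean above median:", avg > mid)
```

With a symmetric distribution the two would land in roughly the same place; the gap between them is a rough signal of the skew described above.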

I wish there was more to say about walk rate, but aside from a sharp descent leading up to the late 1960s (sensing a trend there?), it’s remained highly consistent.

Again, there’s nothing here in the strikeout percentage graph that hasn’t been discussed to death already. Strikeouts have been steadily increasing since always, with the exception of a short hiatus in the 1970s. If you have a conception about how often the average batter strikes out, and it’s based on a rate you knew in the past, you’re almost certainly too low.

Lastly, in the process of researching this piece I thought of another question that relates to this whole perception-of-average thing, and it turned out to be fun to answer: Which player’s current 2016 batting line best matches the league-average batting line of years past?

I found this in three different ways, and I’ll show you the results of all of them. I used the same stats as I discussed above, and threw in a batted-ball profile stat (GB%) for good measure. This excludes, then, both baserunning and defense; all I can show you is who’s the best match for past league average *batting* lines. I also limited the sample to players with 250 PA or more this year.

I used three distance measurements, two legitimate and one of my own creation (probably not legitimate). I used the ‘ecodist’ package in R to measure both the Euclidean and Mahalanobis distances between each year’s league averages and current player stat lines, and then also figured the total relative error (treating the league figures as the true values and the player figures as the estimates). Euclidean distance is the same as the Pythagorean formula (the mathematical one, not the baseball one) extended to further dimensions, while the Mahalanobis distance is nearly the same but first places all values on a standard deviation scale (and therefore reflects the scale/distribution of stats, while Euclidean distance does not). The chart that follows shows all results. In case my meaning isn’t clear, here’s how to read the chart, using 1954 and relative error as the example: it would be accurate to say that “by this measure (summed relative error), the 2016 player whose batting line most resembles league average in 1954 is Joe Panik.” Some definitely do NOT pass the smell test, but then again, this article has all been about misconceptions about what average means, so maybe it’s my skepticism that’s wrong.
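For readers who want to see the three measures side by side, here is a rough Python sketch (the article's numbers came from R and ecodist, not from this code). The player names, stat lines, and standard deviations are all made up. The third function is a simplified stand-in for Mahalanobis distance that only rescales each stat by its standard deviation, as described above; the full Mahalanobis distance also accounts for correlations between stats through the inverse covariance matrix.

```python
import math


def relative_error(league, player):
    """Summed relative error, treating the league values as the true values."""
    return sum(abs(p - l) / abs(l) for l, p in zip(league, player))


def euclidean(league, player):
    """Pythagorean distance extended to any number of stats."""
    return math.sqrt(sum((p - l) ** 2 for l, p in zip(league, player)))


def standardized_distance(league, player, sds):
    """Euclidean distance after dividing each difference by that stat's SD.

    Simplification: this ignores correlations between stats, so it matches
    true Mahalanobis distance only when the stats are uncorrelated.
    """
    return math.sqrt(
        sum(((p - l) / s) ** 2 for l, p, s in zip(league, player, sds))
    )


def best_match(league, players, distance):
    """Name of the player whose line is closest to the league line."""
    return min(players, key=lambda name: distance(league, players[name]))


# Hypothetical two-stat lines: (AVG, GB%). All numbers are made up.
league = (0.260, 45.0)
players = {
    "Player A": (0.262, 48.0),  # AVG nearly identical, GB% a bit high
    "Player B": (0.320, 45.5),  # AVG 60 points off, GB% nearly identical
}
sds = (0.030, 6.0)  # assumed standard deviations for AVG and GB%

print(best_match(league, players, relative_error))
print(best_match(league, players, euclidean))
print(best_match(league, players, lambda l, p: standardized_distance(l, p, sds)))
```

Because GB% lives on a much larger numeric scale than AVG, plain Euclidean distance is dominated by it in this example and prefers Player B, while the other two measures prefer Player A. That scale sensitivity is one plausible reason the three columns in the chart disagree as often as they do.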

| Year | Player (Rel. Error) | Player (Euclidean) | Player (Mahalanobis) |
|------|---------------------|--------------------|----------------------|
| 1954 | Joe Panik | Yadier Molina | Buster Posey |
| 1955 | Joe Panik | Yadier Molina | Buster Posey |
| 1956 | Aaron Hill | Elvis Andrus | Buster Posey |
| 1957 | Joe Panik | Jacoby Ellsbury | Jose Iglesias |
| 1958 | Coco Crisp | Jacoby Ellsbury | Matt Duffy |
| 1959 | Coco Crisp | Jacoby Ellsbury | Jose Iglesias |
| 1960 | Coco Crisp | Jacoby Ellsbury | Matt Duffy |
| 1961 | Coco Crisp | Jordy Mercer | Denard Span |
| 1962 | Aaron Hill | Jordy Mercer | Denard Span |
| 1963 | Tucker Barnhart | Matt Duffy | Matt Duffy |
| 1964 | Tucker Barnhart | Matt Duffy | Matt Duffy |
| 1965 | Tucker Barnhart | Matt Duffy | Matt Duffy |
| 1966 | Tucker Barnhart | Matt Duffy | Matt Duffy |
| 1967 | Tucker Barnhart | Matt Duffy | Matt Duffy |
| 1968 | Matt Duffy | Matt Duffy | Matt Duffy |
| 1969 | Tucker Barnhart | John Jaso | Denard Span |
| 1970 | Aaron Hill | John Jaso | Denard Span |
| 1971 | Tucker Barnhart | Matt Duffy | Denard Span |
| 1972 | Tucker Barnhart | Matt Duffy | Denard Span |
| 1973 | Jordy Mercer | Matt Duffy | Denard Span |
| 1974 | Jordy Mercer | Matt Duffy | Denard Span |
| 1975 | Jordy Mercer | Jacoby Ellsbury | Denard Span |
| 1976 | Matt Duffy | Jacoby Ellsbury | Denard Span |
| 1977 | Aaron Hill | Jordy Mercer | Denard Span |
| 1978 | Chase Utley | Jacoby Ellsbury | Denard Span |
| 1979 | Joe Panik | Yadier Molina | Jose Iglesias |
| 1980 | Jordy Mercer | Yadier Molina | Jose Iglesias |
| 1981 | Matt Duffy | Jacoby Ellsbury | Denard Span |
| 1982 | Joe Panik | Jacoby Ellsbury | Jose Iglesias |
| 1983 | Jordy Mercer | Jacoby Ellsbury | Denard Span |
| 1984 | Chase Utley | Jacoby Ellsbury | Denard Span |
| 1985 | Aaron Hill | Matt Duffy | Denard Span |
| 1986 | Aaron Hill | Jordy Mercer | Matt Duffy |
| 1987 | Coco Crisp | Nomar Mazara | Buster Posey |
| 1988 | Chase Utley | Chase Utley | Yonder Alonso |
| 1989 | Chase Utley | Nick Markakis | Yonder Alonso |
| 1990 | Chase Utley | Nick Markakis | Yonder Alonso |
| 1991 | Chase Utley | Chase Utley | Yonder Alonso |
| 1992 | Chase Utley | Chase Utley | Yonder Alonso |
| 1993 | Aaron Hill | Nomar Mazara | Matt Duffy |
| 1994 | Coco Crisp | Nomar Mazara | Matt Duffy |
| 1995 | Coco Crisp | Nomar Mazara | Matt Duffy |
| 1996 | Nomar Mazara | Nomar Mazara | Rajai Davis |
| 1997 | Brandon Crawford | Nomar Mazara | Rajai Davis |
| 1998 | Nomar Mazara | Nomar Mazara | Rajai Davis |
| 1999 | Hanley Ramirez | Odubel Herrera | Anthony Rendon |
| 2000 | Hanley Ramirez | Hanley Ramirez | Hanley Ramirez |
| 2001 | Nomar Mazara | Hanley Ramirez | Tucker Barnhart |
| 2002 | Scooter Gennett | Scooter Gennett | Tucker Barnhart |
| 2003 | Nomar Mazara | Nomar Mazara | Matt Duffy |
| 2004 | Nomar Mazara | Nomar Mazara | Stephen Piscotty |
| 2005 | Scooter Gennett | Nomar Mazara | Matt Duffy |
| 2006 | Nomar Mazara | Nomar Mazara | Stephen Piscotty |
| 2007 | Brandon Crawford | Nomar Mazara | Rajai Davis |
| 2008 | Chase Utley | Chase Utley | Rajai Davis |
| 2009 | Brandon Crawford | Chase Utley | Rajai Davis |
| 2010 | Chase Utley | Chase Utley | Rajai Davis |
| 2011 | Rajai Davis | Chase Utley | Rajai Davis |
| 2012 | Rajai Davis | Rajai Davis | Rajai Davis |
| 2013 | Rajai Davis | Rajai Davis | Rajai Davis |
| 2014 | Yasiel Puig | Yasiel Puig | Rajai Davis |
| 2015 | Rajai Davis | Rajai Davis | Rajai Davis |
| 2016 | Rajai Davis | Rajai Davis | Rajai Davis |



[i] This was done using the LOWESS function, with the utterly insane overkill of allowing up to 100,000 iterations. Full R code is available upon request, though data may or may not be.
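For anyone curious what the smoothing step does, below is a stripped-down Python sketch of the core of LOWESS: a tricube-weighted straight-line fit around each point. It is not the R code used for the graphs. It leaves out the robustness iterations (presumably what the iteration cap above refers to), and the function name, the `frac` default, and the sample data are assumptions for illustration.

```python
import math


def local_linear_smooth(xs, ys, frac=2 / 3):
    """LOWESS-style smoother without robustness passes.

    For each x, fit a weighted straight line to the nearest `frac` share of
    the points (tricube weights) and return the fitted value at that x.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    k = max(2, min(n, math.ceil(frac * n)))  # points per local window
    fitted = []
    for x0 in xs:
        dists = [abs(x - x0) for x in xs]
        h = sorted(dists)[k - 1]  # distance to the k-th nearest point
        # Tricube weights; points at zero distance always get full weight.
        weights = [
            1.0 if d == 0 else ((1 - (d / h) ** 3) ** 3 if d < h else 0.0)
            for d in dists
        ]
        sw = sum(weights)
        mx = sum(w * x for w, x in zip(weights, xs)) / sw
        my = sum(w * y for w, y in zip(weights, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Made-up yearly averages with a small zig-zag around a rising trend.
years = list(range(2000, 2010))
avgs = [0.260 + 0.001 * i + (0.004 if i % 2 else -0.004) for i in range(10)]
smoothed = local_linear_smooth(years, avgs, frac=0.5)
for year, raw, fit in zip(years, avgs, smoothed):
    print(year, f"{raw:.3f}", f"{fit:.3f}")
```

A smaller `frac` follows the raw series more closely, while a larger one gives a flatter line.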

ErikBFlom
8/03
So Rajai Davis is the very model of a modern major league baseball player. He's information vegetable, animal and mineral.
lipitorkid
8/03
I know this sounds lame. But an actual slash line at the end of the article would feel satisfying.
JohnChoiniere
8/03
Yeah, in retrospect I definitely would/should have. On the other hand, just check out Rajai Davis's line and you'll get pretty close!
lipitorkid
8/03
So, for those who don't want to go look, the average line is: .256/.317/.403/.719 Cause that's Rajai's 2016 line.
gregarakaki
8/03
This is great info! Is it possible to do something similar for pitching, both starters and relievers?
JohnChoiniere
8/03
Yes! It would've made this article too long, but I'll do that in a subsequent piece.
bhacking
8/03
This seems like a pretty average article, but I think that's because BP articles are generally so good they "screwed my perception".
mhmckay
8/03
Feel like you waltzed past the BB decline. After 20 years of increase 1980-2000, we've had general decline since the steroids era ended. Especially interesting given that HR trends are back above steroids era peaks. Combined with the K rates, would suggest that pitchers structurally have regained the upper hand.
JoshC77
8/05
Late to the party here....but great stuff. You do any in-depth look at a correlation between the "league average player" and the various league expansions?