Sometime this past March, I was talking about the Mariners with a friend. Robinson Cano came up, and my friend made the comment that Cano wasn’t that great of a hitter in 2015, because he only hit about .280 (the actual figure is .287, but that’s not the point). Setting aside advanced versus nonadvanced stats, we talked a little about what sort of line a “good” hitter has, what it takes to lead the league in average, that sort of thing. He was shocked to learn that only six people in the AL hit .300 or better last year, and that .320 led the league; his impression was that the old standard of .300/.400/.500 was about average for a star hitter—in fact, only four qualified batters reached that threshold last year.
Fast-forward to about a month ago. Fellow BP writer Jeff Long and I were talking about article ideas, and he suggested that I write about what the “average” player looks like (statistically) these days, because “the steroid era screwed everyone's perception.” This was pretty much confirmed by Jeff via Twitter (note: I’m not trying to call anyone out, but I am using you all to make a point).
@JeffLongBP My insticts say .280 would be "good", but I don't even know what the league average is.
— Ryan Tamanini (@rmthawk64) June 26, 2016
@JeffLongBP I would guess not. .300 is the magic number
— Admiral Buzzkill (@TheOriginalBull) June 26, 2016
@JeffLongBP I don't even think I know.
— August Fagerstrom (@AugustFG_) June 26, 2016
What I’ve done for today, then, is take a look at a range of offensive stats over the past 66 years to see how their respective averages and ranges have changed over time, and which direction, if any, they’re headed. Notably, I excluded BP’s own True Average, because the average TAv is fixed at .260 every year. I also limited my sample within each season to batters with 200 PA or more.
First up: batting average.
Okay, let’s take a minute to go over what you’re seeing here, then we’ll talk about what it means. The dark line in the center of the graph is the average batting average (note: the mean of the individual batters’ averages, not the league’s batting average) for each year in the sample, run through a LOWESS smoothing process in R[i]. The darkest blue band, in the middle, represents the 40^{th} to 60^{th} percentile range; the medium blue is the 20^{th} to 80^{th} percentile, and the light blue is 5^{th} to 95^{th}. The boundaries of all regions have been LOWESS-smoothed as well. The idea is to give the reader not just an idea of the average within a statistic but the range as well: if you want to know if someone has a “good” batting average (or whichever statistic you’re talking about), you don’t just want to know what the average is, you also want to know how tightly distributed around that average the whole data set is. Said another way (and using made-up numbers), if someone’s hitting .302 for the year, you don’t just want to know that the average batting average is .260, you also want to know whether most batters fall within the range of .250-.270 or .220-.320.
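For readers who want to build the same kind of summary themselves, here is a minimal Python sketch of the per-season calculation just described: filter to batters with 200 PA or more, then take the mean and the percentile boundaries of the shaded bands. The graphs in this article were made in R, so this is a stand-in and not the code behind them. The records are invented, the names `season_summary`, `MIN_PA`, and `BAND_EDGES` are hypothetical, and the smoothing step is left out.

```python
from statistics import mean, quantiles

# Hypothetical player-season records: (season, plate appearances, batting average).
# These numbers are invented purely for illustration.
records = [
    (2015, 650, 0.287), (2015, 610, 0.320), (2015, 540, 0.262),
    (2015, 480, 0.245), (2015, 150, 0.410), (2015, 320, 0.231),
    (2015, 590, 0.301), (2015, 205, 0.254), (2015, 400, 0.276),
]

MIN_PA = 200                          # sample cutoff stated in the article
BAND_EDGES = (5, 20, 40, 60, 80, 95)  # percentile boundaries of the shaded bands


def season_summary(rows, min_pa=MIN_PA):
    """Return the mean and band-edge percentiles of one season's qualifying values."""
    values = [avg for _, pa, avg in rows if pa >= min_pa]
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    return {
        "n": len(values),
        "mean": mean(values),  # every batter counts equally, regardless of PA
        "percentiles": {p: cuts[p - 1] for p in BAND_EDGES},
    }


summary = season_summary(records)
```

Because every qualifying batter counts equally in `mean(values)`, the result is the average of individual batting averages rather than the league's overall batting average, which would weight each batter by at-bats.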
So, what you’re seeing here for batting average is a fairly consistent range, always about 100 total points, and a generally flat average line, only ranging by about 15-20 points over the whole sample. You’re an average hitter (at least in terms of AVG) if you hit about .260, and you’re in the top 20 percent if you hit better than just under .300. The part I find most interesting about this particular graph is the shape of the 80^{th}- and 95^{th}-percentile lines. There are peaks (or maybe plateaus) and valleys (anti-plateaus? I don’t know the word for that), separated almost generationally. So, for example, someone who first became a baseball fan in the mid-70s would’ve been influenced by their observations into thinking slightly-higher-than-average batting averages were the norm; if that person then had a kid who became a baseball fan roughly 20-30 years later, they’d have also caught an above-normal period, which would correspondingly add to the influence on the parent (assuming they’re paying attention along with the kid). So you get families who are used to high batting averages (relatively speaking; it’s still only a 15-20 point swing) and families used to low ones. It even looks like it’s continuing, as all lines are trending up in the last few years. That’s an outrageously speculative take on this, but the cyclical nature of the graph jumped out at me.
On-base percentage is similarly smooth overall; the late 90s-early 2000s bump (visible in the batting average graph but not explicitly mentioned) is once again obvious, and there’s a recent-year trend upward after a bit of a valley. Average is generally around .335-.350; top 20 percent is, roughly speaking, better than .375. What I take away from this is the steep decline through the 50s and 60s leading up to the rule changes in 1968, after which OBP almost immediately rebounded.
Next up is slugging; here, there’s been a real persistent change ever since the 90s, shifting all lines upward by at least 20 points. Recent years have even gotten as high as the 90s/2000s peak. It’s almost reached the point where you’d have to slug .500 to be in the top 20 percent. This is the first instance of what I think is a visibly skewed distribution, too, where the 80^{th}-95^{th} range is notably wider than the 5^{th}-20^{th} range.
Unsurprisingly, HR% (that is, HRs per PA) shows a lot of similarity to SLG. Amazingly, the average figure is rising even above that of the “steroid” era. I’d also like to point out the same skew as seen for SLG, the separation between average HR/PA and median (visible in the tendency of the average line to sit above the middle of the darkest blue region, which is where the median would be), and the 50s/60s hump in all lines, though especially the 95^{th}-percentile trace, which was also effectively killed off by the 1968 rule changes.
I wish there were more to say about walk rate, but aside from a sharp descent leading up to the late 1960s (sensing a trend there?), it’s remained highly consistent.
Again, there’s nothing here in the strikeout percentage graph that hasn’t been discussed to death already. Strikeouts have been steadily increasing since always, with the exception of a short hiatus in the 1970s. If you have a conception about how often the average batter strikes out, and it’s based on a rate you knew in the past, you’re almost certainly too low.
Lastly, in the process of researching this piece I thought of another question that relates to this whole perception-of-average thing, and it turned out to be fun to answer: Which player’s current 2016 batting line best matches the league-average batting line of years past?
I found this in three different ways, and I’ll show you the results of all of them. I used the same stats as I discussed above, and threw in a batted-ball profile stat (GB%) for good measure. This excludes, then, both baserunning and defense; all I can show you is who’s the best match for past league average *batting* lines. I also limited the sample to players with 250 PA or more this year.
I used three distance measurements, two legitimate and one of my own creation (probably not legitimate). I used the ‘ecodist’ package in R to measure both the Euclidean and Mahalanobis distances between each year’s league averages and current player stat lines, and then also figured the total relative error (treating the league figures as the true values and the player figures as the estimates). Euclidean distance is the same as the Pythagorean formula (the mathematical one, not the baseball one) extended to further dimensions, while the Mahalanobis distance is nearly the same but first places all values on a standard deviation scale (and therefore reflects the scale/distribution of stats, while Euclidean distance does not). The chart that follows shows all results. In case my meaning isn’t clear, here’s how to read the chart, using 1954 and relative error as the example: it would be accurate to say that “by this measure (summed relative error), the 2016 player whose batting line most resembles league average in 1954 is Joe Panik.” Some definitely do NOT pass the smell test, but then again, this article has all been about misconceptions in what average means, so maybe it’s my skepticism that’s wrong.
Year: Player by Rel. Error; Euclidean; Mahalanobis
1954: Joe Panik; (blank)
1955: Joe Panik; Yadier Molina; Buster Posey
1956: Buster Posey; (blank)
1957: Joe Panik; (blank)
1958: Jacoby Ellsbury; (blank)
1959: Coco Crisp; Jacoby Ellsbury; Jose Iglesias
1960: Coco Crisp; Jacoby Ellsbury; Matt Duffy
1961: Coco Crisp; (blank)
1962: Aaron Hill; Jordy Mercer; Denard Span
1963: Matt Duffy; Matt Duffy; (blank)
1964: Tucker Barnhart; Matt Duffy; Matt Duffy
1965: Tucker Barnhart; Matt Duffy; Matt Duffy
1966: Tucker Barnhart; Matt Duffy; Matt Duffy
1967: Tucker Barnhart; Matt Duffy; Matt Duffy
1968: Matt Duffy; Matt Duffy; Matt Duffy
1969: Tucker Barnhart; Denard Span; (blank)
1970: Aaron Hill; John Jaso; Denard Span
1971: Tucker Barnhart; Matt Duffy; Denard Span
1972: Tucker Barnhart; Matt Duffy; Denard Span
1973: Jordy Mercer; Matt Duffy; Denard Span
1974: Jordy Mercer; Matt Duffy; Denard Span
1975: Jordy Mercer; Jacoby Ellsbury; Denard Span
1976: Matt Duffy; Jacoby Ellsbury; Denard Span
1977: Aaron Hill; Jordy Mercer; Denard Span
1978: Jacoby Ellsbury; Denard Span; (blank)
1979: Joe Panik; Yadier Molina; Jose Iglesias
1980: Jordy Mercer; Yadier Molina; Jose Iglesias
1981: Matt Duffy; Jacoby Ellsbury; Denard Span
1982: Joe Panik; Jacoby Ellsbury; Jose Iglesias
1983: Jordy Mercer; Jacoby Ellsbury; Denard Span
1984: Chase Utley; Jacoby Ellsbury; Denard Span
1985: Aaron Hill; Matt Duffy; Denard Span
1986: Aaron Hill; Jordy Mercer; Matt Duffy
1987: Coco Crisp; Buster Posey; (blank)
1988: Chase Utley; Chase Utley; (blank)
1989: Chase Utley; Yonder Alonso; (blank)
1990: Chase Utley; Nick Markakis; Yonder Alonso
1991: Chase Utley; Chase Utley; Yonder Alonso
1992: Chase Utley; Chase Utley; Yonder Alonso
1993: Aaron Hill; Nomar Mazara; Matt Duffy
1994: Coco Crisp; Nomar Mazara; Matt Duffy
1995: Coco Crisp; Nomar Mazara; Matt Duffy
1996: Nomar Mazara; Nomar Mazara; (blank)
1997: Nomar Mazara; Rajai Davis; (blank)
1998: Nomar Mazara; Nomar Mazara; Rajai Davis
1999: (blank)
2000: Hanley Ramirez; Hanley Ramirez; Hanley Ramirez
2001: Nomar Mazara; Hanley Ramirez; Tucker Barnhart
2002: Scooter Gennett; Tucker Barnhart; (blank)
2003: Nomar Mazara; Nomar Mazara; Matt Duffy
2004: Nomar Mazara; Nomar Mazara; (blank)
2005: Scooter Gennett; Nomar Mazara; Matt Duffy
2006: Nomar Mazara; Nomar Mazara; Stephen Piscotty
2007: Brandon Crawford; Nomar Mazara; Rajai Davis
2008: Chase Utley; Chase Utley; Rajai Davis
2009: Brandon Crawford; Chase Utley; Rajai Davis
2010: Chase Utley; Chase Utley; Rajai Davis
2011: Rajai Davis; Chase Utley; Rajai Davis
2012: Rajai Davis; Rajai Davis; Rajai Davis
2013: Rajai Davis; Rajai Davis; Rajai Davis
2014: Yasiel Puig; Rajai Davis; (blank)
2015: Rajai Davis; Rajai Davis; Rajai Davis
2016: Rajai Davis; Rajai Davis; Rajai Davis
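To make the three distance measures a little more concrete, here is a rough Python sketch of how a "closest player to a league-average line" search could work. Everything in it is an assumption for illustration: the stat lines are invented, the function names are hypothetical, and the relative-error formula is one reading of the description above. The actual analysis used the ecodist package in R. In particular, `standardized_euclidean` only rescales each stat by its standard deviation, which matches the simplified description given earlier; a full Mahalanobis distance also accounts for correlations between stats, and this sketch does not.

```python
from math import sqrt
from statistics import pstdev

STATS = ("AVG", "OBP", "SLG", "HR%", "BB%", "K%", "GB%")

# All numbers below are invented for illustration; they are not real stat lines.
league_line = {"AVG": 0.260, "OBP": 0.330, "SLG": 0.400, "HR%": 0.025,
               "BB%": 0.085, "K%": 0.150, "GB%": 0.45}
players = {
    "Player A": {"AVG": 0.262, "OBP": 0.331, "SLG": 0.402, "HR%": 0.026,
                 "BB%": 0.084, "K%": 0.152, "GB%": 0.45},
    "Player B": {"AVG": 0.300, "OBP": 0.380, "SLG": 0.520, "HR%": 0.050,
                 "BB%": 0.110, "K%": 0.220, "GB%": 0.38},
    "Player C": {"AVG": 0.230, "OBP": 0.290, "SLG": 0.340, "HR%": 0.010,
                 "BB%": 0.060, "K%": 0.100, "GB%": 0.52},
}


def summed_relative_error(player, league):
    """Sum of |player - league| / league, treating the league value as 'true'."""
    return sum(abs(player[s] - league[s]) / league[s] for s in STATS)


def euclidean(player, league):
    """Straight-line distance in raw stat units."""
    return sqrt(sum((player[s] - league[s]) ** 2 for s in STATS))


def standardized_euclidean(player, league, spreads):
    """Euclidean distance after dividing each gap by that stat's standard deviation.
    This ignores correlations between stats, so it is NOT a full Mahalanobis distance."""
    return sqrt(sum(((player[s] - league[s]) / spreads[s]) ** 2 for s in STATS))


# Spread of each stat across the (hypothetical) player pool.
spreads = {s: pstdev([p[s] for p in players.values()]) for s in STATS}


def best_match(measure, *extra):
    """Name of the player whose line is closest to the league line under `measure`."""
    return min(players, key=lambda name: measure(players[name], league_line, *extra))


closest_by_rel_error = best_match(summed_relative_error)
closest_by_euclidean = best_match(euclidean)
closest_by_standardized = best_match(standardized_euclidean, spreads)
```

The raw Euclidean version is dominated by whichever stats have the largest numeric gaps, while the standardized version puts every stat on a comparable footing, which is one reason the columns in the chart can disagree.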
[i] This was done using the LOWESS function, with the utterly insane overkill of allowing up to 100,000 iterations. Full R code is available upon request, though data may or may not be.
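For the curious, here is a from-scratch Python sketch of the general idea behind LOWESS: a local straight-line fit around each point with tricube distance weights, followed by a few passes that downweight outliers. It is a simplified stand-in, not R's implementation and not the code used for the graphs. The function name `lowess_sketch`, its parameters, and the sample values are all hypothetical, and it omits refinements such as R's `delta` shortcut.

```python
from math import ceil
from statistics import median


def lowess_sketch(xs, ys, frac=2 / 3, iterations=3):
    """Simplified LOWESS: tricube-weighted local linear fits plus bisquare robustness passes."""
    n = len(xs)
    k = min(n, max(2, ceil(frac * n)))  # number of neighbours defining each local window
    robust = [1.0] * n                  # robustness weights, all equal on the first pass
    fitted = list(ys)

    for _ in range(iterations + 1):
        new_fit = []
        for i, x0 in enumerate(xs):
            dists = [abs(x - x0) for x in xs]
            h = sorted(dists)[k - 1]    # bandwidth: distance to the k-th nearest point
            w = [
                r * (1 - (d / h) ** 3) ** 3 if h > 0 and d < h else 0.0
                for r, d in zip(robust, dists)
            ]
            sw = sum(w)
            if sw == 0:                 # no usable neighbours: keep the previous value
                new_fit.append(fitted[i])
                continue
            # Weighted least-squares line through the neighbourhood, evaluated at x0.
            mx = sum(wi * x for wi, x in zip(w, xs)) / sw
            my = sum(wi * y for wi, y in zip(w, ys)) / sw
            sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
            sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
            slope = sxy / sxx if sxx > 0 else 0.0
            new_fit.append(my + slope * (x0 - mx))
        fitted = new_fit

        # Bisquare weights: points with large residuals count less on the next pass.
        resid = [y - f for y, f in zip(ys, fitted)]
        s = median([abs(r) for r in resid])
        if s <= 1e-12:                  # essentially a perfect fit: nothing to downweight
            break
        robust = [
            (1 - (r / (6 * s)) ** 2) ** 2 if abs(r) < 6 * s else 0.0
            for r in resid
        ]
    return fitted


# Invented yearly values, purely to show the call.
years = list(range(2000, 2010))
yearly_values = [0.270, 0.268, 0.265, 0.266, 0.269, 0.267, 0.271, 0.270, 0.266, 0.264]
smoothed = lowess_sketch(years, yearly_values)
```

The loop stops early once the fit is essentially perfect, so a very large iteration cap mostly costs nothing once the fit has settled.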