Today, I'll be doing a little follow-up to my last piece, which looked at hitter performance against fast pitching (spoiler alert if you haven't read it yet: it hasn't improved). I stand by my methodology, and will in fact use it again here today, but in discussions I've had both on- and off-line various people have suggested improvements. So here we are.
As a reminder, I'm using four different measures to assess hitter performance against 97 mph pitches since 2008. They were chosen to reflect the various parts of what goes into being able to hit an incredibly fast pitch. The first is swing rate, which I intended to measure comfort level with seeing fast pitching at all. The second is contact rate, intended to be a proxy for how well hitters see the ball. The third is fair-ball rate on contact, intended to stand in for hitter ability to time the pitch. The last is linear weights runs per plate appearance, to measure quality of contact.
I'm further measuring everything not in an absolute way, but rather relative to performance against 92 mph pitching. This attempts to control for any overall change in talent level. As before, the data sample is restricted to only include fastballs. For full details you can read my last piece here at BP, and if there are still things that don't make sense you can e-mail me.
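For readers who want to try something similar, here is a rough sketch of how the three rate measures and the 97-versus-92 comparison could be computed. It is not the code behind this piece. The record layout (`mph`, `swing`, `contact`, `fair`) is hypothetical, and expressing "relative" as a simple ratio is an assumption; a difference would work just as well. Linear weights runs per plate appearance is left out because it needs run values for each outcome.

```python
import math

def rates(pitches):
    """Swing, contact, and fair-ball rates for a list of pitch records.

    Each record is a dict with hypothetical keys: 'swing', 'contact', 'fair'.
    """
    swings = [p for p in pitches if p["swing"]]
    contacts = [p for p in swings if p["contact"]]
    fairs = [p for p in contacts if p["fair"]]

    def safe_div(num, den):
        # Avoid dividing by zero when a bucket is empty.
        return num / den if den else math.nan

    return {
        "swing_rate": safe_div(len(swings), len(pitches)),     # swings per pitch
        "contact_rate": safe_div(len(contacts), len(swings)),  # contact per swing
        "fair_rate": safe_div(len(fairs), len(contacts)),      # fair balls per contact
    }

def relative_rates(pitches, fast=97, baseline=92):
    """Ratio of each rate at the fast speed to the same rate at the baseline.

    Assumption: 'relative' is taken as fast / baseline.
    """
    fast_r = rates([p for p in pitches if p["mph"] == fast])
    base_r = rates([p for p in pitches if p["mph"] == baseline])
    return {
        k: (fast_r[k] / base_r[k] if base_r[k] else math.nan)
        for k in fast_r
    }
```

A ratio below 1.0 would mean hitters do worse on that measure against the faster pitch than against the baseline. The speed pair is a parameter, so the same function covers other 5-mph gaps.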
The first idea I'll mention came from the comments section. A reader (jfranco77) asked if it was possible to look at NCAA data for the same purpose, looking for velocity-related improvement before players reach the major-league level. This was really intriguing to me, especially if it could be used as a predictor for major-league success, but unfortunately the data is simply not available to me, and so I couldn't do it. I also wonder if the sheer number of NCAA teams relative to MLB teams would lead to sample size issues; certainly that would be the case with minor-league data, which is more available but not necessarily complete.
The next point that was raised to me is one I'm disappointed I didn't think of in the first place; namely, that there might be something to be learned by further restricting data to the same players across all the years included. This makes an incredible amount of sense, for a bunch of reasons, but isn't perfect either. Keeping the player pool consistent across a nine-season data set guarantees that the numbers will reflect batting performance of players seeing more and more very fast fastballs, which is what I want to have in order to look for any sort of learning, but there's also an enormous selection bias that enters the system: these are the players who performed well enough to actually stick around for nine years, which is very clearly a minority of players.
It also inherently becomes difficult (well, more difficult) to separate out the potential learning effect of seeing more fast pitches versus seeing more pitches in general. I’m somewhat concerned about sample size, too: there are only 91 qualifying players in this group, a half-dozen or so of whom are pitchers. By pitch count, this is only about one-fifth to one-third (depending on the season) of the total sample from before. All told, though, I would have chosen this data set for the first article if I'd thought of it, so let's see how the data looks.
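The all-years restriction can be sketched roughly as follows. This is an illustration under stated assumptions rather than the actual filter used here: the `batter` and `season` keys are hypothetical, and so is the `min_pitches` threshold, since what counts as "qualifying" in a given season isn't spelled out above.

```python
from collections import defaultdict

def all_years_cohort(pitches, seasons, min_pitches=1):
    """Return the set of batters who appear in every listed season.

    `pitches` is a list of dicts with hypothetical keys 'batter' and 'season'.
    `min_pitches` is an assumed per-season qualifying threshold.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for p in pitches:
        counts[p["batter"]][p["season"]] += 1

    return {
        batter
        for batter, by_season in counts.items()
        if all(by_season.get(s, 0) >= min_pitches for s in seasons)
    }

def restrict_to_cohort(pitches, cohort):
    """Keep only pitches seen by batters in the cohort."""
    return [p for p in pitches if p["batter"] in cohort]
```

Raising `min_pitches` shrinks the cohort quickly, which is the sample-size trade-off described above.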
Starting here with swing rate, there’s not a huge difference between the all-years players and the original data set; in both cases, swing rate stayed pretty much level throughout. It appears more volatile in the data here, but that’s at least somewhat expected given the reduced population.
As was the case with the original data, contact rate is an area where some improvement is seen. Notably, the 2008 rates of ~87 percent for the probable-strike category and ~89.5 percent for all pitches are a bit lower here in the all-years players. I don’t have an explanation for that, though my guess is that it’s related to age: in 2008 this cohort was likely younger than the players who were removed, or at least in the younger half, whereas by 2015 they had all aged seven years and the removed players were more likely to be the younger ones.
Here, in what I called last time the “least informative” measurement of the four, we see a break from the prior data set. Whereas in the full complement of players the data was too up-and-down to learn anything from, here we can see, to my eye at least, a steady rise. I suggested last time this would show an improvement in the ability to time swings on fast pitches correctly, but that’s hardly more than speculation, so I’ll leave interpretation to the reader.
Lastly, we come to the value stat–LWTS-Runs per PA. The prior data set showed a very gradual decline, whereas here, at least since 2011, things are either level or slightly rising. This fits with the “learning” idea–which really is just intuition–that players will improve at this skill with practice. However, as before, there isn’t really enough data here to draw hard conclusions. Last time around I included a graph with 2016 data that showed an upward trend; I can’t replicate that with this set of all-years players, since only 23 of the 91 made it to 2016, but in the full data set that fact hasn’t changed with the addition of the last two weeks of games.
It was suggested as well that there was something particular about 97 mph pitching, so I looked at the same 5-mph difference but using 98 and 93 instead of 97 and 92 mph. For the first three measures I observed the same trend as with 97 mph, but the upward trend in the value metric was even more pronounced (admittedly following a deeper valley in 2011). See for yourself below.
Changing gears, another suggestion was that splitting by player on an individual basis, not just averaging everyone, could reveal even more (particularly if I focus on the all-nine-years players). I love the idea, but I’d run the risk of living out the old cliché used by anti-saber folks: slicing the data sample so thin that I can make it say whatever I want. More to the point, the sample sizes that result from doing so become so small as to approach meaninglessness.
A compromise that I do want to show, though, is splitting the samples by division. Although the distribution of 97 mph pitches has been evening out, there were seasons within the dataset where one division saw 33 percent more of them than another did. The unbalanced schedule means a team is more likely to be exposed to a pitcher within its own division, and therefore there are divisions where hitters are, at least in theory, getting more practice with this fast pitching than in others.
Unfortunately (or maybe you’re breathing a sigh of relief here), I’m not going to show you 24 graphs of all four measures broken down by division. What I will do is summarize the findings, and say that I’m happy to share the graphs upon request. In both leagues, the West division showed the most upward-facing trend. Strangely, these were the two divisions that averaged the fewest total pitches seen in the 97 mph category, so maybe that blows the whole “learning” idea completely out of the water.
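For anyone who wants to check the division exposure gap on their own data, a minimal sketch might look like the one below. The `division` and `season` keys are hypothetical, and it assumes the input has already been limited to 97 mph fastballs.

```python
from collections import Counter

def division_counts(pitches, season):
    """Count pitches seen by each division's hitters in one season.

    Assumes `pitches` already contains only 97 mph fastballs and that each
    record carries hypothetical 'division' and 'season' keys.
    """
    return Counter(p["division"] for p in pitches if p["season"] == season)

def exposure_gap(counts):
    """Fractional gap between the most- and least-exposed divisions.

    A return value of 0.33 means the top division saw 33 percent more
    pitches than the bottom one.
    """
    most, least = max(counts.values()), min(counts.values())
    return most / least - 1.0
```

Raw counts are used here for simplicity; dividing by plate appearances per division would be a reasonable refinement if the divisions differ much in total playing time.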
All of this coalesces around two main points. First, the data remains sparse enough that I’m not willing to draw any hard-and-fast conclusions. It’ll be extremely interesting, to me at least, to revisit this five years down the road. Second, there’s still a ton of work out there for an interested person to do on this subject. It was beyond my scope here, but I think a reasonable next step would be to look at age cohorts and see if any patterns can be found there. For now, I’ll leave that to someone else.