Note: Baseball Prospectus has removed the leaderboards mentioned in this article. Thank you for your interest in our work and for your patience as we attempt to resolve this issue.

Last year, the folks at MLB Advanced Media started publishing what is commonly described as “exit velocity”: the pace at which the baseball is traveling off the bat of the hitter, as measured by the new Statcast system.

As a statistic, exit velocity is attractive for several reasons. For one thing, it is new and fresh, and that’s always exciting. It also makes analysts feel like they are traveling inside the hitting process, and getting a more fundamental look at a hitter’s or pitcher’s ability to control the results of balls in play.

However, we’ve seen many people take the raw average of a player’s exit velocities and assume it to be a meaningful indication, in and of itself, of pitcher or batter productivity. This is not entirely wrong: Raw exit velocity can correlate reasonably well with a batter’s performance.

But this use of raw averages also creates some problems. First, if you use exit velocity as a proxy for player ability, then you must also accept that one player’s exit velocity is a function of his opponents, whether batters or pitchers. Put more bluntly, a player’s average exit velocity is biased by the schedule of the player’s team.

Second, and much more importantly, we have concluded that Statcast exit velocity readings, as currently published, are themselves biased by the ballpark in which the event occurs. This goes beyond mere differences in temperature and park scoring tendencies. In fact, it appears that the same hit by the same player will have its velocity rated differently from stadium to stadium, even if you control for other confounding factors.

Third, and this admittedly is a technical point, raw averages are virtually always an inaccurate estimate of the player’s probable contribution to each play. This principle, which follows from the James-Stein estimator, underlies our shift to mixed modeling for all of our new metrics at Baseball Prospectus. The spread of players’ most likely contributions to average exit velocity is narrower than the raw averages make it appear. By using a mixed model, we shrink these raw averages toward each player’s most likely contribution while also controlling for these other factors.
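To make the shrinkage idea concrete, here is a minimal sketch of pulling raw averages toward the grand mean. It is not the mixed model behind the leaderboards. It is a simple partial-pooling rule in which the weight n / (n + prior_weight), the prior_weight value, the player labels, and the exit velocities are all assumptions made up for illustration.

```python
from statistics import mean


def shrink_averages(samples, prior_weight=20.0):
    """Shrink each player's raw mean toward the grand mean.

    samples: dict mapping a player label to a list of exit velocities (mph).
    prior_weight: hypothetical tuning constant (an assumption, not a value
        from the article); larger values pull harder toward the grand mean.
    """
    all_values = [v for values in samples.values() for v in values]
    grand_mean = mean(all_values)
    shrunk = {}
    for player, values in samples.items():
        n = len(values)
        raw = mean(values)
        # Players with few batted balls get a small weight on their own average.
        weight = n / (n + prior_weight)
        shrunk[player] = grand_mean + weight * (raw - grand_mean)
    return grand_mean, shrunk


# Made-up data: one hitter with 3 batted balls, one with 100.
samples = {
    "small_sample_hitter": [98.0, 101.0, 99.5],
    "regular_hitter": [88.0, 92.0, 90.0, 91.0, 89.0] * 20,
}
grand_mean, shrunk = shrink_averages(samples)

for player, values in samples.items():
    print(f"{player}: raw {mean(values):.2f} mph, shrunk {shrunk[player]:.2f} mph")
```

In this sketch the hitter with only three batted balls is moved most of the way back toward the overall average, while the hitter with 100 batted balls barely moves. That is the behavior the paragraph above describes.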

Our new Statcast leaderboards attempt to address these biases in a variety of ways. Our “adjusted exit velocity” metric uses a linear mixed model to control for opposing pitcher / batter and stadium, while also incorporating shrinkage principles to make the new averages a better fit to player performance overall. There are separate leaderboards for pitchers and batters, and the relevant column for adjusted exit velocity in both is Adj_Exit_Vel. We advise checking these leaderboards before making any sweeping claims about the significance of a player’s associated exit velocities.

In terms of ballpark bias, how much of an effect can a particular park have? As it turns out, a fair amount. Here is the table of generated intercepts for each stadium, showing its effect on average adjusted exit velocity during the 2015 season:

Stadium    Park Effect (mph)
ARI         1.17
ATL         0.02
BAL         0.98
BOS        -0.02
CHC         0.18
CIN        -0.96
CLE         0.32
COL        -0.37
CWS        -0.21
DET         0.85
HOU        -0.96
KC          0.46
LAA        -0.19
LAD         0.26
MIA         0.09
MIL        -0.02
MIN         0.21
NYM        -0.94
NYY        -0.14
OAK        -0.02
PHI        -0.02
PIT         0.11
SD         -0.45
SEA         0.05
SF          0.39
STL        -0.60
TB          0.10
TEX        -0.04
TOR        -0.05
WSH        -0.20

As you can see, this amounts to a difference of over 2 mph from the fastest to the slowest stadium reading for what should be essentially the same hit. This is less significant for batters, but for pitchers—who have an adjusted exit velocity range of only about 3.5 mph total—we are talking about a potentially significant impact.
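The arithmetic behind that claim can be checked directly from the table. The short sketch below copies the 2015 park intercepts from the table above, finds the highest and lowest readings, and compares the gap with the roughly 3.5 mph pitcher range quoted in the text. The variable names are ours.

```python
# Park intercepts (mph) copied from the 2015 table above.
park_effects = {
    "ARI": 1.17, "ATL": 0.02, "BAL": 0.98, "BOS": -0.02, "CHC": 0.18,
    "CIN": -0.96, "CLE": 0.32, "COL": -0.37, "CWS": -0.21, "DET": 0.85,
    "HOU": -0.96, "KC": 0.46, "LAA": -0.19, "LAD": 0.26, "MIA": 0.09,
    "MIL": -0.02, "MIN": 0.21, "NYM": -0.94, "NYY": -0.14, "OAK": -0.02,
    "PHI": -0.02, "PIT": 0.11, "SD": -0.45, "SEA": 0.05, "SF": 0.39,
    "STL": -0.60, "TB": 0.10, "TEX": -0.04, "TOR": -0.05, "WSH": -0.20,
}

high = max(park_effects.values())
low = min(park_effects.values())

# Several parks can share the same extreme value, so collect all of them.
fastest = sorted(p for p, v in park_effects.items() if v == high)
slowest = sorted(p for p, v in park_effects.items() if v == low)

spread = high - low

# Approximate range of pitcher adjusted exit velocity quoted in the text.
pitcher_range = 3.5
share = spread / pitcher_range

print(f"fastest: {fastest} at {high:+.2f} mph")
print(f"slowest: {slowest} at {low:+.2f} mph")
print(f"spread: {spread:.2f} mph, about {share:.0%} of the pitcher range")
```

The gap runs from Arizona at the top to Cincinnati and Houston, tied at the bottom, and it covers well over half of the pitcher range quoted above.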

The existence of this stadium bias is not really that surprising: While each team does regularly calibrate its Trackman radar systems for consistency, a number of factors conspire against them. First, ballparks have different geometries, which means the radar is not placed in exactly the same place in every city. In other words, exit velocity is probably not measured at precisely the same “point” at each ballpark. Moreover, the equipment has inherent variability of its own, in a manner that is unique to each ballpark.[i] A radar installed in Philadelphia will not lose calibration in the same way, nor at the same time, as one installed in Los Angeles. This is due both to the environment and park-unique installation.

How do we know these differences are a function of equipment bias, rather than just park-scoring tendencies? Well, first, any experienced eye can tell that the intercepts above do not correlate with known park factors. Cincinnati is not a place where hits go to die, and San Francisco is not a ballpark known for inflating offense. Naturally, though, we also tested this statistically. To control for stadium run-scoring, and to make sure this wasn’t just some internal BP thing, we tested our hypothesis using the pitcher park factors calculated for the 2015 season by our friends at Baseball Reference. The result? Even controlling for temperature and inherent park scoring factor, and even shrinking each stadium factor toward the grand mean, the further effect of having exit velocity measured at different stadiums was still statistically significant (p<.05). Our leaderboards therefore control for this important bias.

Finally, our leaderboards also go one step further, and translate exit velocity and launch angle into estimated runs generated/prevented by the player. These run estimates account for both adjusted exit velocity and launch angle, as it is the combination that really matters. To our knowledge, no other public leaderboard does this. We’ve also made the descriptions a bit less cryptic. On our Statcast leaderboards, these calculations are now described as follows:

· Pred_Runs: the number of runs we would expect the batter / pitcher to generate / prevent based on the adjusted exit velocity and launch angle of the ball off the bat.

· Pred_Runs_Rate: this is the column we sort by, and it tells you the average effect, per batted ball, that the player’s presence has on run-scoring, per their adjusted exit velocity and launch angles.

· Act_Runs: the raw number of runs generated while the batter / pitcher was involved so far this season.

· Act_Runs_Rate: the raw average run effect, per batted ball, while the batter / pitcher was involved so far this season.

· Act – Pred: the differential between the outcome of batted balls with the batter / pitcher involved and what our models would have predicted. The differential suggests how lucky / unlucky a player has been so far this year. A positive value means that the batter/pitcher was predicted to create/allow fewer runs than they actually did.

· BIP+HR: the number of batted balls at issue, comprising balls in play (non-HR, fair balls) and home runs.
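As a rough illustration of how these columns relate, here is a small sketch with made-up numbers. It rests on two assumptions the descriptions above do not confirm: that each rate is simply the run total divided by BIP+HR, and that Act – Pred is a difference in run totals rather than in rates. The function name and the values are hypothetical.

```python
def leaderboard_row(pred_runs, act_runs, batted_balls):
    """Build one illustrative leaderboard row.

    Assumption: each *_Rate column is the run total divided by BIP+HR.
    Assumption: "Act - Pred" is actual runs minus predicted runs.
    Neither formula is confirmed by the article.
    """
    return {
        "Pred_Runs": pred_runs,
        "Pred_Runs_Rate": pred_runs / batted_balls,
        "Act_Runs": act_runs,
        "Act_Runs_Rate": act_runs / batted_balls,
        "Act - Pred": act_runs - pred_runs,
        "BIP+HR": batted_balls,
    }


# Made-up values for a hypothetical batter.
row = leaderboard_row(pred_runs=4.0, act_runs=6.5, batted_balls=100)

for column, value in row.items():
    print(f"{column}: {value}")
```

Here Act – Pred is positive, so under the sign convention described above this hypothetical batter created more runs than the model predicted from his contact.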

We hope you find these new leaderboards useful, and also want to thank MLBAM for making this exit velocity data publicly available.

Bibliography

Bates, D., Maechler, M., Bolker, B., and Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.

R Core Team (2016). R: A language and environment for statistical computing. Version 3.3.0. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Wood, S.N. (2006). Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC.

All Statcast data courtesy of Baseball Savant, https://baseballsavant.mlb.com/

Models:

For those who enjoy the technical aspects, here are the details on our models.

Our adjusted exit velocity model is as follows:

Our linear weights model incorporating measured hit speed and launch angle is as follows:

Our raw run rate model is as follows:

Finally, our expected run rate model is as follows:

All models are fit with the lme4 package in the R computing environment, except for the GAM (generalized additive model), which is fit with the mgcv package.

All variables were tested on the 2015 season. Each variable was tested with a likelihood ratio test as compared to a reduced model, and then was further checked with 10-fold cross-validation, from which we took the mean absolute error over all runs. If the variable passed at least the 10-fold-cross-validation test in terms of reducing error, it was added; otherwise, it was rejected.
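The cross-validation rule can be sketched in a few lines. What follows is a minimal illustration, not the models used for the leaderboards. The data are synthetic, with deliberately large park offsets so the effect shows up in a small sample. Plain training-fold averages stand in for the lme4 mixed models, and every function name is hypothetical.

```python
import random
from statistics import mean


def k_fold_mae(rows, fit, k=10, seed=0):
    """Return the mean absolute error averaged over k held-out folds."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    fold_errors = []
    for i, test in enumerate(folds):
        # Train on every fold except the one being held out.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        predict = fit(train)
        fold_errors.append(mean(abs(y - predict(park)) for park, y in test))
    return mean(fold_errors)


def fit_reduced(train):
    """Reduced model: predict the grand mean for every batted ball."""
    grand = mean(y for _, y in train)
    return lambda park: grand


def fit_with_stadium(train):
    """Candidate model: predict the training-fold average for each park."""
    grand = mean(y for _, y in train)
    by_park = {}
    for park, y in train:
        by_park.setdefault(park, []).append(y)
    park_means = {p: mean(ys) for p, ys in by_park.items()}
    # Fall back to the grand mean for a park absent from the training fold.
    return lambda park: park_means.get(park, grand)


# Synthetic exit velocities (mph): three made-up parks with exaggerated offsets.
rng = random.Random(42)
offsets = {"PARK_A": 3.0, "PARK_B": 0.0, "PARK_C": -3.0}
rows = [
    (park, 89.0 + offset + rng.gauss(0.0, 1.0))
    for park, offset in offsets.items()
    for _ in range(200)
]

mae_reduced = k_fold_mae(rows, fit_reduced)
mae_with_stadium = k_fold_mae(rows, fit_with_stadium)

# Keep the stadium variable only if it lowers the cross-validated error.
add_variable = mae_with_stadium < mae_reduced
print(f"reduced MAE: {mae_reduced:.3f}, with stadium MAE: {mae_with_stadium:.3f}")
print(f"add stadium variable: {add_variable}")
```

The decision rule is the same one described above: the candidate variable is kept only when it reduces the held-out error. The likelihood ratio test, the other check mentioned above, is not shown here.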



[i] Most of this can be attributed to the radar's limited ability to eliminate “clutter,” or non-baseball objects like the ground, the umpire, a passing bird, or a stray D-cell battery. Artifacts like this can slow or inhibit the system's ability to identify the moment when bat hits ball.

ggdowd
5/17
"Finally, our leaderboards also go one step further, and translate exit velocity and launch angle into estimated runs generated/prevented by the player." Finally, indeed! I've been waiting for this since Statcast data started flowing in. Now hopefully people will stop looking at deviations of BABIP from league average and throwing up their hands. This is very exciting stuff; great work.
dgalloway15fish
5/17
Articles like this make me wish I paid more attention in AP Statistics. P values, man
bachlaw
5/17
P values are overrated. The cross-validation is the key!
bachlaw
5/17
Because I expect people will ask, the likelihood ratio tests confirm that stadium remains a statistically-significant factor in 2016.
a-nathan
5/17
Do the park effects in 2016 correlate with those in 2015? If you do a random split of the 2015 season, how do park effects correlate? Or a 1st half/2nd half split?
bachlaw
5/17
Alan, at the moment I can speak to 2015 versus 2016. So far, the Pearson correlation between the park intercepts for each season is .6, with the caveat that there is a lot less 2016 data than 2015 data, obviously.
a-nathan
5/17
Thanks, Jonathan. I'm still trying to wrap my brain around the park effects associated with Trackman. I have asked the brain trust at Trackman for their thoughts on the subject. Will let you know what they tell me.
bachlaw
5/17
Thanks Alan. I think it's important to stress that this doesn't mean there is anything "wrong" with the system or what they are doing. Every measurement, of course, has biases and this is all about helping everyone get the most out of what Trackman has to offer.
jacksontaigu
5/17
How, if at all, could the missing data in Statcast affect the findings?
bachlaw
5/17
That's a great question. Hard to know what we do not know.
a-nathan
5/17
That really is an interesting question. It is quite likely that missing data are geometry-dependent, having to do with the location and orientation of the Trackman device. These things are surely park dependent. So, it could be the case that some of the park effects could be due to the missing data. But that is just speculation on my part.
a-nathan
5/17
I have a question regarding the temperature term in the exit velocity model. Is there an easy way to summarize what you found for that term? Or specifically, what is the effect of temperature on exit velocity? Some effect is expected, since the coefficient of restitution of the baseball is temperature dependent. That effect has been measured and is known reasonably well. It would be interesting to see whether that dependence could account for what you find in your analysis.
bachlaw
5/17
Certainly. Let's take 2015 data because it is the most complete. Game-time temperature (as reported by BAM) was converted to a natural log. The model for adjusted exit velocity starts from a baseline intercept of 83.2, and for each one-unit increase in ln_temp, exit velocity increases by just over 1 mph, after controlling for batter, pitcher, and stadium, and presumably holding them constant.
TangoTiger1
5/17
First, a big {clap clap clap}. Just to confirm your numbers, roughly speaking (and understanding it's not linear, but focusing on the temperatures around 60-110), each 6 to 10 degrees Fahrenheit increases exit velocity by 0.1mph, correct?
ggdowd
5/17
If my math is correct:
Temp Change | Predicted Change in MPH
60 to 70 | .154
70 to 80 | .134
80 to 90 | .118
90 to 100 | .105
100 to 110 | .095
a-nathan
5/17
Thanks. Changing ln(T) by 1 is a change from 90F to 33F, which is a big change. You find a change in exit velocity by ~1 mph. The best I can do is estimate based on the change in COR of the ball, which changes by about 5.7% based on the measurements we did (you can read about it in the last section of this paper: http://baseball.physics.illinois.edu/AJP-June2011.pdf). The corresponding change in exit velocity is about 3.5 mph, which is quite a bit larger than what you found from the statistical analysis. Of course, for the measurements we did, the baseball had been stored in a constant temperature environment for ~2 weeks, so that we can be pretty sure the temperature was the same throughout the volume of the ball. While it doesn't take 2 weeks to reach temperature equilibrium, it probably takes a lot longer than just a few hours, so it is not so surprising that you find a smaller effect. I have learned a lot from your analysis. Still trying to learn more.
lipitorkid
5/17
I'm curious about another factor. Is exit velocity more likely to be stable for power hitters than slap hitters? By slap hitters I mean someone like Rod Carew or Tony Gwynn. Both Carew and Gwynn could hit the ball harder, but based on the game situation and defensive placement the smarter swing may have been the softer swing. Also, do bunts get factored into average exit velocity?