According to Dave Cameron and recently confirmed in a blog post by Tom Tango, MLB has changed the meaning of start_speed, a pitch-by-pitch parameter in the MLB Components Data ("Gameday") Files. This brief post summarizes the history of the start_speed parameter, includes a cautionary note to new pitch-tracking researchers, and describes a method for estimating release point (extension) by taking advantage of Gameday’s parameter switch.
The parameter start_speed has, for the better part of 10 years, coded for the velocity at a fixed distance of 50 feet from home plate. Although 50 feet is much too close to home plate to be a realistic guess at a pitcher's release point, this distance was initially chosen to reasonably match the velocities reported by scouts’ radar guns. Several websites (including BP and BrooksBaseball.net) quickly realized that 55 feet was a better estimate of the release point, and so have used that as a convention for much of the PITCHf/x era. Due to technical limitations, the PITCHf/x system could not record the actual release point of the pitch, which limited its ability to determine the actual speed at release.
Trackman Doppler Radar, which serves as the pitch tracking hardware for the new MLB Statcast system, has the advantage of being able to measure the actual release point of the ball—and the speed at that point—with excellent fidelity.
With the transition to this new system, which supplants PITCHf/x, it appears MLB has decided to repurpose the start_speed parameter. Whereas before, start_speed matched sqrt(v0x^2+v0y^2+v0z^2), the speed implied by the fitted velocity components at 50 feet, it no longer seems to do so. Instead, it reportedly captures the actual release speed at the point first detected by the radar system.
This change is challenging from a data management perspective. A field that has been stable for 10 years now codes for a similar but slightly different quantity. Any website that reports PITCHf/x data using this start_speed field will have season-to-season data that are not directly comparable as of several days ago (NB: one author’s website, BrooksBaseball.net, is generally unaffected by this change, as detailed below). The change will be particularly challenging for novice pitch-tracking researchers, who will have to account for the switch. Perhaps the simplest fix is to calculate the old start_speed manually from the 2017 data using the formula above, which continues to match the old convention.
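As a minimal sketch of that manual recalculation, assuming the fitted velocity components are given in feet per second and the speed is wanted in miles per hour, the old-convention value can be rebuilt as follows. The function name and the sample numbers are hypothetical and used only for illustration.

```python
import math

FTPS_TO_MPH = 3600.0 / 5280.0  # feet per second -> miles per hour


def legacy_start_speed(v0x, v0y, v0z):
    """Speed (mph) at y = 50 ft, from the fitted velocity components (ft/s).

    This reproduces the old meaning of start_speed:
    sqrt(v0x^2 + v0y^2 + v0z^2), converted to mph.
    """
    return math.sqrt(v0x ** 2 + v0y ** 2 + v0z ** 2) * FTPS_TO_MPH


if __name__ == "__main__":
    # Hypothetical pitch, not taken from any real game file.
    print(round(legacy_start_speed(-6.0, -132.0, -4.0), 1))
```

Recomputing the value this way for every season, rather than only for 2017, would keep old and new data on the same footing.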
However, this change also provides a unique opportunity. Gameday now provides the velocity (vr) at some unknown release point. It also provides the usual nine-parameter constant acceleration fit with the initial distance of 50 feet. With the aid of some simple physics formulas and some algebra, this allows us to find the velocity v at any other point in space. In particular, we can find the distance from home plate at which v=vr. That point is the release point. All of this can easily be coded into an Excel spreadsheet or into R, as we now describe. (Files are available here.)
This code will generate a pitch-by-pitch dataframe with a new variable, yRelease, reported for each pitch, which is our estimated release point along the Y axis (that is, distance from the plate). It will then aggregate this data for each pitcher, and additionally compute extension, which is simply 60.5 – yRelease.
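The files referenced above are not reproduced here. As a rough illustration of the calculation, the Python sketch below solves for the time at which the speed from the constant-acceleration fit equals the reported release speed, then evaluates the Y position at that time. It assumes start_speed is in mph and the fit parameters are in feet and seconds; the function names, dictionary keys, and sample values are hypothetical, and the algebra is one way to carry out the idea rather than necessarily the authors' exact formulas.

```python
import math
from collections import defaultdict

MPH_TO_FTPS = 5280.0 / 3600.0  # miles per hour -> feet per second
Y_FIT = 50.0                   # distance from the plate where the fit is anchored (ft)
RUBBER_TO_PLATE = 60.5         # used for extension = 60.5 - yRelease (ft)


def y_release(release_mph, v0, a, y_fit=Y_FIT):
    """Estimate the distance from the plate (ft) at which the fitted speed
    equals the reported release speed.

    v0 = (v0x, v0y, v0z) in ft/s at y_fit; a = (ax, ay, az) in ft/s^2.
    Returns None if no solution exists.
    """
    vr = release_mph * MPH_TO_FTPS
    # |v0 + a*t|^2 = vr^2  becomes  A*t^2 + B*t + C = 0
    A = sum(c * c for c in a)
    B = 2.0 * sum(p * q for p, q in zip(v0, a))
    C = sum(c * c for c in v0) - vr * vr
    disc = B * B - 4.0 * A * C
    if disc < 0.0:
        return None
    denom = -B + math.sqrt(disc)
    if denom <= 0.0:
        return None
    # Root nearest t = 0 for a decelerating ball (B < 0), written in a form
    # that avoids subtracting two nearly equal numbers.
    t = 2.0 * C / denom
    return y_fit + v0[1] * t + 0.5 * a[1] * t * t


def mean_extension_by_pitcher(pitches):
    """Average extension (ft) per pitcher from a list of pitch dicts."""
    by_pitcher = defaultdict(list)
    for p in pitches:
        y = y_release(p["start_speed"], p["v0"], p["a"])
        if y is not None:
            by_pitcher[p["pitcher"]].append(RUBBER_TO_PLATE - y)
    return {name: sum(ext) / len(ext) for name, ext in by_pitcher.items()}


if __name__ == "__main__":
    # Hypothetical pitches, for illustration only.
    sample = [
        {"pitcher": "A", "start_speed": 90.8,
         "v0": (-6.0, -132.0, -4.0), "a": (-10.0, 28.0, -20.0)},
        {"pitcher": "A", "start_speed": 90.7,
         "v0": (-5.5, -131.8, -4.2), "a": (-9.0, 27.0, -21.0)},
    ]
    print(mean_extension_by_pitcher(sample))
```

The solution time is negative when the reported release speed exceeds the speed at 50 feet, which places the release point farther from the plate than 50 feet, as expected.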
Presented below are individual pitch trajectories (Figure 1) and aggregate plots (Figure 2) for a sample of pitches (only a sample is shown because Extension values are difficult to scrape from MLB.com). A unity line, showing a 1-to-1 relationship, has been added to each plot.
There is almost a perfect correlation on an aggregate level, and a very strong correlation on a pitch-by-pitch level.
The reason there is a weaker correlation on a pitch-by-pitch level concerns new rounding in the Gameday data files. MLB reports the start_speed quantity with a higher degree of precision at MLB Savant. If those values are used instead, the above plot looks like:
This error slightly propagates into the aggregate level:
The rounding error on the release speed matters more than one might expect for two reasons. First, we are working with squares of velocities, hence magnifying the rounding error. Second, we are looking at differences of these squares, which are much smaller, resulting in the rounding error giving a large fractional error.
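To get a feel for the size of this effect, the sketch below estimates how far the computed release point moves for a small error in release speed. It relies on the constant-acceleration relation d(v^2)/dy = 2(v·a)/vy, evaluated at 50 feet. The assumption that the Gameday value is rounded to the nearest 0.1 mph (an error of up to 0.05 mph), the function name, and the sample pitch are ours and purely illustrative.

```python
MPH_TO_FTPS = 5280.0 / 3600.0  # miles per hour -> feet per second


def release_point_shift(v0, a, dv_mph):
    """Approximate change in yRelease (ft) for an error dv_mph in release speed.

    Uses d(v^2)/dy = 2 (v . a) / vy, so dy ~= v * dv * vy / (v . a),
    with v0 = (v0x, v0y, v0z) in ft/s and a = (ax, ay, az) in ft/s^2.
    """
    speed = sum(c * c for c in v0) ** 0.5
    v_dot_a = sum(p * q for p, q in zip(v0, a))
    dv = dv_mph * MPH_TO_FTPS
    return speed * dv * v0[1] / v_dot_a


if __name__ == "__main__":
    # Hypothetical pitch; 0.05 mph is the assumed worst-case rounding error.
    shift = release_point_shift((-6.0, -132.0, -4.0), (-10.0, 28.0, -20.0), 0.05)
    print(round(shift, 2), "ft")
```

For these hypothetical inputs the shift is roughly a third of a foot, which is large compared with the pitcher-to-pitcher differences in extension one would hope to resolve on a single pitch.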
We note, however, that while this trend is highly replicable across most parks, a clear discrepancy is noted in Tampa Bay, at least for the game on April 2, which may be indicative of calibration issues:
When only data from the game on April 4 is used, the clear trend reemerges:
NB: BrooksBaseball.net has, for many years, shifted the estimate of pitcher release point back to 55 feet. This is extremely close to the actual average release point on an aggregate basis. Because of this recalculation of pitch speed, we abandoned the start_speed parameter long ago, so the parameter change does not affect us. The data on BrooksBaseball.net will, by convention, continue to be reported from an assumed release point of y=55’.