July 10, 2017
Measuring Pitcher Similarity
The PITCHf/x optical video and TrackMan Doppler radar sensors estimate parameters of pitches, including the speed, horizontal movement and vertical movement. The data recorded by these systems can be used to develop pitcher similarity measures. These measures are valuable not only for comparing major-league pitchers to each other, but also for allowing the direct comparison of pitchers in other leagues (minor, amateur and foreign) to their MLB counterparts.
Analysts can employ a pitcher similarity measure for multiple purposes. The identification of groups of similar pitchers can be used to generate optimized projection models, or to generate larger samples for predicting the outcome of batter/pitcher matchups. In addition, a similarity measure allows for individual pitchers to be monitored over time in order to detect possible changes in pitch characteristics, health and throwing mechanics.
Previous methods for quantifying pitcher similarity have been limited to the comparison of pitches of the same type, which makes these methods highly dependent on the outcome of pitch-classification algorithms. Kalk developed a similarity measure that compared pitches of the same type using variables that included pitch frequency, speed and movement. Loftus improved on Kalk's approach by separating pitchers by handedness while using the Kolmogorov-Smirnov distance to compare distributions. Like Kalk's method, however, this approach only considers comparisons between pitches of the same type.
A difficulty for these methods is that different pitch types for a single pitcher or across multiple pitchers can have similar properties. This causes the pitch-frequency statistics used by similarity algorithms to depend heavily on the classification process; it also prevents the comparison of similar pitches that are classified as different pitch types.
In 2016, for example, Ubaldo Jimenez's sinker averaged 91.12 mph, -7.35 inches of horizontal movement and 8.53 inches of vertical movement, while Jeremy Hellickson's four-seam fastball had nearly identical averages of 90.81 mph, -7.63 inches of horizontal movement and 8.44 inches of vertical movement. Due to this issue, Loftus conceded that his own method is best suited for comparing individual pitches as opposed to comparing pitchers based on their entire arsenal. Gennaro has proposed a more qualitative approach to measuring pitcher similarity by using a hand-selected set of features and weightings. The features used by this method include a pitcher's two most-common pitch types and his most-common two-pitch sequence.
In this work, we develop a pitcher similarity measure that considers the speed and movement of every pitch. We note that other factors that are less indicative of a pitcher's raw stuff, such as pitch location, sequencing and deception, also play a role in determining performance.
Given pitch speed and movement, we can plot a pitch as a point in a cube. Using data from Brooks Baseball, for example, we can plot a thousand Jon Lester pitches from 2016 with the speed (s) in miles per hour, along with the horizontal and vertical movement parameters (x, z) in inches, and where different colors represent different pitch types:
Jon Lester pitches in 2016
We also can do this for 1,000 Chris Sale pitches:
Chris Sale pitches in 2016
Lester and Sale clearly have different pitch distributions. But how different are they?
Here's a puzzle: Suppose that each of Lester's pitches in the plot is a ten-pound weight. Without worrying about pitch types, move each of Lester's thousand pitches so that, as a group, they end up at the same location as Sale's thousand pitches. To make this more interesting, find the way to move the pitches that requires the least work.
Too busy to solve the puzzle right now? That's OK. There's an algorithm called the Earth Mover's Distance, or EMD, which can figure out the easiest way to move the pitches and how much work is required. The idea is that the less work that's needed to rearrange Lester's pitches to match Sale's pitches, the more similar the two pitchers are to each other. Even better, the EMD algorithm is efficient and can normalize the distributions so that we don't need the same number of pitches in each plot.
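To make the puzzle concrete, here is a minimal sketch that solves it by brute force for a handful of made-up pitches. It assumes both pitchers have the same number of pitches and that every pitch carries the same weight, in which case the EMD reduces to the cheapest one-to-one matching divided by the number of pitches. The function name and the sample points are hypothetical, and trying every permutation is only feasible for tiny sets; a real EMD solver handles thousands of pitches and unequal counts.

```python
from itertools import permutations
from math import dist


def min_work_matching(source, target):
    """Least average distance needed to move each source point onto a
    distinct target point (toy version: equal-sized, equally weighted sets)."""
    if len(source) != len(target):
        raise ValueError("this toy version needs equally sized sets")
    best_cost, best_order = float("inf"), None
    # Try every way of pairing source point i with target point order[i].
    for order in permutations(range(len(target))):
        cost = sum(dist(source[i], target[j]) for i, j in enumerate(order))
        if cost < best_cost:
            best_cost, best_order = cost, order
    # Average work per pitch, i.e. each point carries weight 1/n.
    return best_cost / len(source), best_order


# Hypothetical (speed mph, horizontal in, vertical in) points, not real data.
pitcher_a = [(92.0, -5.0, 9.0), (85.0, 2.0, 1.0), (78.0, 6.0, -6.0)]
pitcher_b = [(80.0, 5.0, -5.0), (94.0, -6.0, 10.0), (86.0, 1.0, 2.0)]

emd, order = min_work_matching(pitcher_a, pitcher_b)
print(round(emd, 2), order)
```

For these made-up points the cheapest plan pairs each pitch with its obvious counterpart (hard pitch with hard pitch, and so on), and the average work per pitch is about 2.21.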
Things get more complicated because some paths are more difficult to traverse as we move pitches around in the cube. To be more specific, let's look at a plot of the speed and vertical movement (again, represented by s and z) for a large set of pitches from different pitchers in 2016.
Scatterplot of Speed and Vertical Movement
We see that s and z have a significant correlation, so that a pitch thrown with a higher speed will tend to have a higher vertical movement. This means that moving a pitch with the flow from the orange spot toward the red spot is easier than moving it against the flow toward the green spot. We can address this issue by combining a whitening transform with the Earth Mover's Distance to account for both differences in the variances of the s, x and z variables and their correlation structure.
Since a pitcher's approach depends on batter handedness, we use the whitened EMD to compare pitchers separately based on their pitch distributions against right-handed and left-handed batters. The two values are then combined into a single measure of similarity. More details on how this all works are given in the technical details document listed in the references.
We will demonstrate the similarity measure for several applications including the identification of similar and dissimilar pitchers, the identification of unique pitchers, the quantification of year-to-year pitcher stability, and the quantification of pitcher variation with batter handedness and the count. All analysis in this article uses the pitch data from Brooks Baseball and the associated pitch classifications from Pitch Info. Pitch speed will be given in miles per hour, and the x and z movement parameters will be specified in inches.
For the 2016 season, we consider the 196 right-handed pitchers and the 63 left-handed pitchers who threw at least 1,000 pitches during the regular season. For each of these pitchers, the most similar pitcher and the corresponding distance can be found in the most similar match tables listed in the references. Smaller values of the distance correspond to more similar pitchers.
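As a rough illustration of what whitening does, the sketch below rescales a small set of made-up (s, z) points so that the transformed variables have unit variance and zero correlation. It uses a Cholesky-based whitening, which is an assumption on our part rather than a statement of the exact transform used in the analysis; any whitening transform yields the same point-to-point distances, so the choice does not affect a distance computed afterward. All function names and data are hypothetical.

```python
from math import sqrt


def covariance(points):
    """Return the mean vector and sample covariance matrix of the points."""
    n, d = len(points), len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(d)]
    centered = [[p[k] - mean[k] for k in range(d)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return mean, cov


def cholesky(a):
    """Lower-triangular factor low such that low @ low.T equals a."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = sqrt(s) if i == j else s / low[j][j]
    return low


def whiten(points):
    """Map each point x to y, where low @ y = x - mean (forward substitution)."""
    mean, cov = covariance(points)
    low = cholesky(cov)
    whitened = []
    for p in points:
        y = []
        for i in range(len(p)):
            s = p[i] - mean[i] - sum(low[i][k] * y[k] for k in range(i))
            y.append(s / low[i][i])
        whitened.append(tuple(y))
    return whitened


# Hypothetical (speed mph, vertical movement in) pairs with positive correlation.
points = [(95.0, 10.0), (90.0, 8.0), (85.0, 3.0),
          (80.0, -2.0), (92.0, 6.0), (78.0, 1.0)]
white = whiten(points)
_, cov_white = covariance(white)  # approximately the identity matrix
```

After whitening, a step "with the flow" and a step "against the flow" of the same raw length no longer cost the same, which is the effect described above.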
The most similar pair of right-handed pitchers in 2016 was Matt Harvey and Shelby Miller. Both threw four-seam fastballs with similar parameters (speed, horizontal movement and vertical movement) at similar frequencies. In particular, each pitcher threw 59-60 percent four-seamers to right-handed batters, and 56-57 percent four-seamers to left-handed batters, with Harvey averaging 95.39 mph and Miller averaging 94.15 mph on these pitches. We also note that Harvey's slider (89.51 mph, 0.90 inches of horizontal movement, 4.28 inches of vertical movement) was like Miller's cutter (89.41, 1.17, 3.89), and each pitcher used this respective pitch 25-26 percent of the time against right-handed batters. Similarity metrics that do not compare pitches of different type would be unaware of the similarity of these pitches.
The most similar pair of left-handed pitchers in 2016 was Jon Niese and Chris Rusin. The most frequent pitches for each left-hander against right-handed batters were their sinker and cutter, which they threw at similar frequencies and with similar properties. For their sinkers against RHB, we have 89.52 mph, 9.63 inches of horizontal movement, 4.30 inches of vertical movement and 27.2 percent frequency for Niese, and 90.32, 9.74, 4.88 and 24.4 percent frequency for Rusin. For their cutters against RHB, we have 86.74 mph, -0.30 inches of horizontal movement, 3.86 inches of vertical movement and 27.2 percent frequency for Niese, and 87.49, 1.62, 3.78 and 29.9 percent frequency for Rusin. Each pitcher's most frequent pitch to left-handed batters was their sinker, which Niese threw 40.7 percent of the time and Rusin threw 38.8 percent.
The most dissimilar pair of right-handed pitchers in 2016 was Brad Ziegler and Marco Estrada, with a distance of 5.688. The difference largely was due to an extreme discrepancy in the vertical movement on their pitches. Ziegler threw 57.7 percent sinkers with an average vertical movement of -6.72 inches, while Estrada threw 50.1 percent four-seam fastballs with an average vertical movement of 13.01 inches. Over all of his pitches, Ziegler had the smallest average vertical movement, -5.33 inches, while Estrada had the highest at 9.64 inches.
The most dissimilar pair of left-handed pitchers was Zach Britton and Tommy Milone, with a distance of 4.238. Britton threw more than 90 percent sinkers, averaging at least 97 mph and with 3.70 inches of vertical movement. Milone averaged only 88.19 mph on his hardest and most frequent pitch, a four-seam fastball, which he threw 45.5 percent of the time with an average vertical movement of 11.45 inches.
The similarity measure can also be used to find the most unique major league pitchers. The right-handed pitchers with the greatest distance to their most similar match in 2016 are:
Left-handed pitchers with the greatest distance to their most similar match in 2016:
Hard-throwing Aroldis Chapman fell short of the 1,000-pitch threshold, but would rank as the second-most unique left-hander behind Britton, with a distance of 1.5495 to the nearest left-hander Tony Cingrani.
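The uniqueness ranking described above can be sketched as a nearest-neighbor search over pairwise distances: each pitcher's uniqueness is the distance to his closest match, and the most unique pitchers are those whose closest match is farthest away. The pitcher labels and distance values below are hypothetical placeholders, not values from the study, and the function names are ours.

```python
def nearest_matches(pair_distance):
    """For each pitcher, find the most similar other pitcher and the distance.

    pair_distance maps an unordered pair (a, b) to a symmetric distance.
    """
    best = {}
    for (a, b), d in pair_distance.items():
        for me, other in ((a, b), (b, a)):
            if me not in best or d < best[me][1]:
                best[me] = (other, d)
    return best


def rank_by_uniqueness(pair_distance):
    """Sort pitchers by distance to their nearest match, largest first."""
    best = nearest_matches(pair_distance)
    return sorted(best.items(), key=lambda item: item[1][1], reverse=True)


# Hypothetical distances between four placeholder pitchers A-D.
pairs = {
    ("A", "B"): 0.4, ("A", "C"): 1.9, ("A", "D"): 1.2,
    ("B", "C"): 2.1, ("B", "D"): 1.0, ("C", "D"): 1.6,
}

ranking = rank_by_uniqueness(pairs)
for name, (match, d) in ranking:
    print(name, "nearest:", match, "distance:", d)
```

Here pitcher C is the most unique, because even his closest match (D) is 1.6 away.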
The similarity structure for a group of pitchers can be visualized using non-metric multidimensional scaling (NMDS). We use NMDS to visualize properties of the similarity measure for unique right-handed and left-handed pitchers. The NMDS result for the ten most unique right-handed pitchers, plus the two most prominent knuckleballers, R.A. Dickey and Steven Wright, is:
NMDS Result for Unique Right-handed Pitchers in 2016
The most unique right-hander, Brad Ziegler, is in the far upper right in the figure. Ziegler's uniqueness is largely due to throwing a large amount (57.7 percent) of sinkers with a low average velocity (84.74 mph) and heavy sink (-7.28 inches of vertical movement). The closest pitchers to Ziegler in the plot are Steve Cishek and Aaron Nola, who each threw 40-44 percent sinkers but at a higher velocity than Ziegler. The pitchers in the plot with the highest average velocity over their pitches (Rodney, McCullers, Shaw) are in the lower-right quadrant. In this group, Rodney appears closest to Cishek and Nola due to also throwing a high percentage of sinkers (39.1 percent), but the high vertical movement on his pitches, particularly his four-seam fastball, pulls him to the left of these two. Bryan Shaw has the highest average velocity among pitchers in the figure and appears at the lowest point in the plot.
To the left of Rodney is a group of three pitchers (Estrada, Young, Clippard) who displayed the highest average vertical movement on their pitches among the pitchers in the figure. This high vertical movement was largely achieved by throwing 45-51 percent four-seam fastballs. Above this group is Jered Weaver, who threw pitches with a high average vertical movement, but also had the lowest average pitch velocity in the plot among the non-knuckleballers. Dickey and Wright appear together above Weaver and, as shown in the most similar match tables, the two knuckleballers are the best match for each other over the 196 right-handed pitchers in the data set. We see that the most dissimilar right-handed pitchers in the entire data set, Ziegler and Estrada, are also the most separated in the plot.
The NMDS result for the ten most unique left-handed pitchers, plus Aroldis Chapman, is:
NMDS Result for Unique Left-handed Pitchers
The most unique left-hander, Zach Britton, is on the far-right edge of the plot. Britton achieved his uniqueness by throwing a high volume (92.0 percent) of very hard (97.44 mph) sinkers. The closest left-hander to Britton in the figure is Clayton Richard, who also threw a high volume (65.0 percent) of sinkers but at a lower velocity (91.59 mph). To the left of Richard and farther removed from Britton is Zach Duke, who also threw a large number of sinkers but at an even lower frequency (50.4 percent) and velocity (90.13 mph). The second-most unique left-hander in the group, Aroldis Chapman, who threw a lot (81.1 percent) of very hard (101.32 mph) four-seam fastballs, appears at the lowest point on the plot.
On the left side of the figure are four left-handers (Milone, Lamb, Urias, Kershaw) who all favored the four-seam fastball with frequencies varying between 45.5 percent for Milone and 55.3 percent for Urias. The average four-seam velocity for the pitchers increases from top to bottom with mph values of 88.19 (Milone), 90.49 (Lamb), 93.32 (Urias) and 93.74 (Kershaw). To the right of these four pitchers are Drew Pomeranz and Rich Hill, who both complemented their four-seam fastball with a large percentage of curves with sharp downward movement. Hill is the closest pitcher to Andrew Miller in the plot. Since Miller's four-seam fastball is harder than Hill's, and Miller's most frequent off-speed pitch is a slider that is thrown substantially harder than Hill's curve, Miller appears lower than Hill. We see that the most dissimilar left-handed pitchers in the full data set, Britton and Milone, are also the most separated in the plot.
Pitchers with Small Year-to-Year Variation
We can use the similarity measure to compare pitchers to themselves over time. For this purpose, we computed the similarity measure between 2015 and 2016 for each pitcher who threw at least 1,000 pitches in each regular season.
Right-handed pitchers who changed the least between 2015 and 2016 (with their age as of June 30, 2016):
Many of the smallest changers are veterans, with 13 of the 20 pitchers in the tables being at least 30 years old at midseason 2016, and with all pitchers (except Carlos Rodon) being at least 26. Two of the smallest changers are the knuckleballers R.A. Dickey and Steven Wright. Unsurprisingly, Bartolo Colon is also one of the least-changing right-handers.
Pitchers with Large Year-to-Year Variation
Right-handed pitchers who changed the most between 2015 and 2016:
We see that these pitchers are younger than their more stable counterparts, with only three of the 20 pitchers being at least 30 years old at midseason 2016. Six of the ten right-handers, and eight of the ten left-handers, improved their ERA from 2015 to 2016. Several of these pitchers (Phelps, Chavez, Montgomery, Hand, Pomeranz) went from starting in 2015 to relieving in 2016. Others near the top of the lists include Trevor Bauer and Kelvin Herrera, who made significant changes to their pitch mix, along with James Paxton, who made a significant change to his pitching mechanics.
Pitchers with Small Platoon Distances
We can use our similarity measure to compute the difference between a pitcher's distribution of pitches against right-handed and left-handed batters. We considered all pitchers who threw at least 1,000 pitches during the 2016 regular season.
Right-handed pitchers who changed the least with batter handedness:
Several of these pitchers relied heavily on a single pitch type. Reed (72.2 percent), Allen (63.3 percent) and Conley (65.5 percent) threw a large fraction of four-seam fastballs. Dickey (87.6 percent) and Wright (83.1 percent) threw a large fraction of knuckleballs, while Harris (66.4 percent cutter), Britton (92.0 percent sinker) and Miller (60.7 percent slider) also threw a large fraction of a single pitch type. Throwing a similar distribution of pitches to right-handed and left-handed batters is a characteristic of a pitcher's approach, but is not necessarily indicative of his platoon results. While several of the pitchers (Reed, McCullers, Dickey, Happ) who had a similar approach against right-handed and left-handed batters exhibited a very small wOBA platoon split, others (Young, DeSclafani) had large wOBA platoon splits.
Pitchers with Large Platoon Distances
Right-handed pitchers who changed the most with batter handedness:
None of the right-handers and only two of the left-handers (Rivero and Siegrist) who changed the most in response to batter handedness threw a single pitch type at least 60 percent of the time. Seven of the right-handers (Ziegler, Weaver, Iglesias, McGowan, Herrera, Ramos, Chacin) contributed to their platoon variation by throwing a significantly higher fraction of sliders to right-handed batters and a significantly higher fraction of changeups to left-handed batters. For the purposes of this analysis, “significantly” refers to a fraction that is higher by at least 10 percent. Similarly, four of the left-handers (Rivero, Watson, Manaea, Corbin) threw a significantly higher fraction of sliders to left-handed batters and a significantly higher fraction of changeups to right-handed batters.
Another popular strategy used by six of the pitchers who changed the most (Weaver, McGowan, Hand, Duffy, Siegrist, Corbin) was to throw a significantly higher fraction of four-seam fastballs to same-side batters, and a significantly higher fraction of sinkers to opposite-side batters. Right-hander Kyle Hendricks employed the opposite approach by throwing a significantly higher fraction of sinkers to right-handed batters, and a significantly higher fraction of four-seam fastballs to left-handed batters. Left-handers Milone and Hill enhanced their platoon variation by throwing a significantly higher fraction of curveballs to left-handed batters.
Pitchers with Small Changes after Two Strikes
We can use the similarity measure to compute how much a pitcher changes his distribution of pitches as the count changes. For each pitcher who threw at least 1,000 pitches in 2016, we computed the similarity measure between the pitcher's distribution of pitches thrown before two strikes and his distribution of pitches thrown after two strikes.
The two right-handers who changed the least (Grilli 62.4 percent four-seamer, Reed 72.2 percent four-seamer) and the two left-handers who changed the least (Britton 92 percent sinker, Buchter 84.7 percent four-seamer) each threw a large fraction of a single pitch type in 2016. In addition, several of the other pitchers in the two tables (Wright 83.1 percent knuckler, Quackenbush 63.2 percent four-seamer, Oh 60.6 percent four-seamer, Cingrani 87.4 percent four-seamer, Bastardo 65.5 percent four-seamer) each threw over 60 percent of a single pitch type in 2016.
Pitchers with Large Changes after Two Strikes
The right-handed and left-handed pitchers who changed the most after reaching two strikes in 2016 are listed below. Each of these pitchers threw a significantly higher fraction of a particular breaking ball with two strikes. The pitch with the largest increase in frequency after two strikes over all batters faced is referred to as the Delta Pitch in the lists. The Δf column indicates how much more frequently a pitcher threw the Delta Pitch after two strikes as compared to before two strikes. Brad Ziegler, for example, threw his slider 10.16 percent of the time before two strikes and 40.45 percent of the time after two strikes for a Δf of 30.29 percent.
Among the pitchers in the lists with smaller values of Δf for their Delta Pitch, Fiers (six pitch types) and Darvish (seven pitch types) had a large set of possible pitch types with which to adjust frequencies. Left-handers Kershaw and Snell used a higher fraction of sliders with two strikes in addition to a higher fraction of their Delta Pitch curveballs.
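The Delta Pitch and Δf defined above can be computed directly from pitch-type counts. The sketch below assumes each pitch is recorded as a (pitch type, strikes in the count when thrown) pair and measures Δf in percentage points, consistent with the Ziegler example (40.45 minus 10.16 gives 30.29). The function name and the sample counts are hypothetical.

```python
from collections import Counter


def delta_pitch(pitches):
    """Return (pitch type, delta_f) for the pitch whose frequency rises the
    most after two strikes; delta_f is in percentage points."""
    pitches = list(pitches)
    before = Counter(kind for kind, strikes in pitches if strikes < 2)
    after = Counter(kind for kind, strikes in pitches if strikes == 2)
    n_before, n_after = sum(before.values()), sum(after.values())
    if n_before == 0 or n_after == 0:
        raise ValueError("need pitches both before and after two strikes")
    # Counter returns 0 for pitch types missing from one of the two groups.
    change = {
        kind: 100.0 * (after[kind] / n_after - before[kind] / n_before)
        for kind in set(before) | set(after)
    }
    best = max(change, key=change.get)
    return best, change[best]


# Hypothetical sample: 100 pitches before two strikes, 50 with two strikes.
sample = (
    [("sinker", 0)] * 70 + [("changeup", 1)] * 20 + [("slider", 1)] * 10
    + [("sinker", 2)] * 25 + [("changeup", 2)] * 5 + [("slider", 2)] * 20
)

kind, delta_f = delta_pitch(sample)
print(kind, round(delta_f, 2))
```

In this made-up sample the slider goes from 10 percent of pitches before two strikes to 40 percent with two strikes, so it is the Delta Pitch with a Δf of 30 percentage points.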
We have developed a new tool that analysts can exploit to study a range of application areas. The similarity measure allows the direct comparison of pitchers across various contexts including MLB, MiLB, amateur and foreign leagues, which can improve predictions for how a pitcher will perform in a new environment. The identification of similar pitchers increases the sample sizes that can be used to forecast the outcome of batter/pitcher matchups and supports regression to more appropriate population means by projection models. The measure also can be used to monitor pitchers over time, and to develop improved models for the health risk and aging characteristics associated with different pitcher classes.
For fans, the new tool reveals similarities that we didn't know existed and shows us, once again, that there's more than one way to find success as a major-league pitcher.
The authors thank Tom Tango and Mitchel Lichtman for helpful comments on a previous draft of this article. All pitch data used in this study was obtained from Brooks Baseball.
 R. Duda, P. Hart and D. Stork. Pattern Classification. Wiley-Interscience, New York, 2001.
 A. Fagerstrom. (June 24, 2016). FanGraphs: Trevor Bauer looks like a completely different pitcher.
 V. Gennaro. The Big Data approach to baseball analytics. In SABR Analytics Conference, Phoenix, AZ, March 2013.
 G. Healey and S. Zhao. Using PITCHf/x to model the dependence of strikeout rate on the predictability of pitch sequences. Journal of Sports Analytics, 2017.
 G. Healey, S. Zhao and D. Brooks. Measuring pitcher similarity: Technical details.
 G. Healey, S. Zhao and D. Brooks. Most similar match tables, 2016.
 J. Kalk. (Feb. 12, 2008). Hardball Times: Pitcher similarity scores.
 J. Kalk. (Feb. 19, 2008). Hardball Times: Pitcher similarity scores (part 2).
 J. Kruskal. Multidimensional scaling by optimizing goodness of fit to a non-metric hypothesis. Psychometrika, 29:1-27, 1964.
 S. Loftus. (Apr. 15, 2013). Beyond the Box Score: Pitcher similarity scores.
 S. Loftus. (Apr. 25, 2013). Beyond the Box Score: Testing and visualizing similarity scores.
 S. Loftus. (Nov. 25, 2013). Beyond the Box Score: Pitcher similarity scores 2.0.
 A. Nathan. (Oct. 21, 2012). Determining pitch movement from PITCHf/x data.
 Y. Rubner, C. Tomasi and L. Guibas. The Earth Mover's Distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99-121, 2000.
 J. Sullivan. (April 13, 2016). FanGraphs: Now Kelvin Herrera is almost impossible.
T. Tango, M. Lichtman and A. Dolphin. The Book: Playing the Percentages in Baseball. Potomac Books, Dulles, Virginia, 2007.