July 10, 2017
Prospectus Feature
Measuring Pitcher Similarity
The PITCHf/x optical video and TrackMan Doppler radar sensors estimate parameters of pitches, including the speed, horizontal movement and vertical movement. The data recorded by these systems can be used to develop pitcher similarity measures. These measures are valuable not only for comparing major-league pitchers to each other, but also for allowing the direct comparison of pitchers in other leagues (minor, amateur and foreign) to their MLB counterparts.
A pitcher similarity measure can be employed for multiple purposes by analysts. The identification of groups of similar pitchers can be used to generate optimized projection models [18], or to generate larger samples for predicting the outcome of batter/pitcher matchups [3], [20]. In addition, a similarity measure allows for individual pitchers to be monitored over time in order to detect possible changes in pitch characteristics, health and throwing mechanics.
Previous methods for quantifying pitcher similarity have been limited to the comparison of pitches of the same type, which makes these methods highly dependent on the outcome of pitch-classification algorithms. Kalk [8], [9] developed a similarity measure that compared pitches of the same type using variables that included pitch frequency, speed and movement. Loftus [11], [12], [13] improved on Kalk's approach by separating pitchers by handedness while using the Kolmogorov-Smirnov distance to compare distributions. Like Kalk's method, however, this approach only considers comparisons between pitches of the same type. A difficulty for these methods is that different pitch types for a single pitcher or across multiple pitchers can have similar properties. This causes the pitch-frequency statistics used by similarity algorithms to depend heavily on the classification process; it also prevents the comparison of similar pitches that are classified as different pitch types.
In 2016, for example, Ubaldo Jimenez's sinker averaged 91.12 mph, 7.35 inches of horizontal movement and 8.53 inches of vertical movement, while Jeremy Hellickson's four-seam fastball had nearly identical averages of 90.81 mph, 7.63 inches of horizontal movement and 8.44 inches of vertical movement. Due to this issue, Loftus [13] conceded that his own method is best suited for comparing individual pitches as opposed to comparing pitchers based on their entire arsenal. Gennaro [3] has proposed a more qualitative approach to measuring pitcher similarity by using a hand-selected set of features and weightings. The features used by this method include a pitcher's two most-common pitch types and his most-common two-pitch sequence.
In this work, we develop a pitcher similarity measure that considers the speed and movement of every pitch. We note that other factors that are less indicative of a pitcher's raw stuff, such as pitch location [4], sequencing [5], and deception [14], also play a role in determining performance.
Given pitch speed and movement, we can plot a pitch as a point in a cube. Using data from Brooks Baseball, for example, we can plot a thousand Jon Lester pitches from 2016 with the speed (s) in miles per hour, along with the horizontal and vertical movement parameters (x, z) in inches, and where different colors represent different pitch types:
Jon Lester pitches in 2016
We also can do this for 1,000 Chris Sale pitches:
Chris Sale pitches in 2016
Lester and Sale clearly have different pitch distributions. But how different are they?
Here's a puzzle: Suppose that each of Lester's pitches in the plot is a ten-pound weight. Without worrying about pitch types, move each of Lester's thousand pitches so that, as a group, they end up at the same location as Sale's thousand pitches. To make this more interesting, find the way to move the pitches that requires the least work. Too busy to solve the puzzle right now? That's OK. There's an algorithm called the Earth Mover's Distance, or EMD [16], which can figure out the easiest way to move the pitches and how much work is required. The idea is that the less work that's needed to rearrange Lester's pitches to match Sale's pitches, the more similar the two pitchers are to each other. Even better, the EMD algorithm is efficient and can normalize the distributions so that we don't need the same number of pitches in each plot.
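To make the puzzle concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes both pitchers contribute the same number of pitches with equal weight, in which case the least-work rearrangement is a one-to-one matching that SciPy's `linear_sum_assignment` can find. The function name `emd_equal_size` and the use of plain Euclidean distance on (s, x, z) are illustrative assumptions; the general EMD [16] also handles unequal pitch counts, which this sketch does not.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def emd_equal_size(a, b):
    """Earth Mover's Distance between two equal-size, equally weighted point sets.

    Each row is one pitch, e.g. (speed, horizontal movement, vertical movement).
    Hypothetical helper for illustration only.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("this sketch requires the same number of pitches in each set")
    # Work needed to move every pitch in `a` to every pitch in `b`.
    cost = cdist(a, b)
    # Cheapest one-to-one matching of pitches in `a` to pitches in `b`.
    rows, cols = linear_sum_assignment(cost)
    # Average work per pitch, so the value does not grow with sample size.
    return cost[rows, cols].mean()


# Toy example: two "pitchers" with two pitches each.
lester_like = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
sale_like = [[3.0, 4.0, 0.0], [4.0, 4.0, 0.0]]
distance = emd_equal_size(lester_like, sale_like)
```

In the toy example the second set is the first one shifted by the same offset, so the cheapest plan moves every pitch by that offset. A smaller value means less work and therefore more similar pitch distributions.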
Things get more complicated because some paths are more difficult to traverse as we move pitches around in the cube. To be more specific, let's look at a plot of the speed and vertical movement (again, represented by s and z) for a large set of pitches from different pitchers in 2016.
Scatterplot of Speed and Vertical Movement
We see that s and z have a significant correlation, so that a pitch thrown with a higher speed will tend to have a higher vertical movement. This means that moving a pitch with the flow from the orange spot toward the red spot is easier than moving it against the flow toward the green spot. We can address this issue by combining a whitening transform [1] with the Earth Mover's Distance to account for both differences in the variances of the s, x and z variables and their correlation structure.
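A standard way to build a whitening transform is from the eigendecomposition of the covariance matrix of (s, x, z). The sketch below illustrates that general technique and is not necessarily the exact procedure used for the results in this article. The function names `fit_whitening` and `whiten` are hypothetical, and the covariance is assumed to be estimated from a pooled set of pitches and to be non-singular.

```python
import numpy as np


def fit_whitening(pitches):
    """Estimate a whitening transform from rows of (s, x, z) values.

    Returns the sample mean and a matrix `w` such that (pitches - mean) @ w
    has identity sample covariance. Assumes a non-singular covariance.
    """
    x = np.asarray(pitches, dtype=float)
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    # Symmetric eigendecomposition: cov = eigvecs @ diag(eigvals) @ eigvecs.T
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rotate onto the principal axes, then rescale each axis to unit variance.
    w = eigvecs / np.sqrt(eigvals)
    return mean, w


def whiten(pitches, mean, w):
    """Apply a previously fitted whitening transform."""
    return (np.asarray(pitches, dtype=float) - mean) @ w


# Synthetic correlated data standing in for real pitch measurements.
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 1.5], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]])
raw = rng.normal(size=(500, 3)) @ mixing
mean, w = fit_whitening(raw)
white = whiten(raw, mean, w)
```

After whitening, the three variables are uncorrelated with unit variance, so a Euclidean ground distance no longer treats movement "with the flow" and "against the flow" as equally costly in the original coordinates.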
Since a pitcher's approach depends on batter handedness, we use the whitened EMD to compare pitchers separately based on their pitch distributions against right-handed and left-handed batters. The two values are then combined into a single measure of similarity. If you'd like more details [6] on how this all works, just follow the link.
Data Analysis
We will demonstrate the similarity measure for several applications including the identification of similar and dissimilar pitchers, the identification of unique pitchers, the quantification of year-to-year pitcher stability, and the quantification of pitcher variation with batter handedness and the count. All analysis in this article uses the pitch data from Brooks Baseball and the associated pitch classifications from Pitch Info. Pitch speed will be given in miles per hour, and the x and z movement parameters [15] will be specified in inches.
Similar Pitchers
For the 2016 season, we consider the 196 right-handed pitchers and the 63 left-handed pitchers who threw at least 1,000 pitches during the regular season. For each of these pitchers, the most similar pitcher and the corresponding distance can be found here [7]. Smaller values of the distance correspond to more similar pitchers. The most similar pair of right-handed pitchers in 2016 was Matt Harvey and Shelby Miller. Both threw four-seam fastballs with similar parameters (speed, horizontal movement and vertical movement) at similar frequencies. In particular, each pitcher threw 59-60 percent four-seamers to right-handed batters, and 56-57 percent four-seamers to left-handed batters, with Harvey averaging 95.39 mph and Miller averaging 94.15 mph on these pitches. We also note that Harvey's slider (89.51 mph, 0.90 inches of horizontal movement, 4.28 inches of vertical movement) was like Miller's cutter (89.41, 1.17, 3.89), and each pitcher used this respective pitch 25-26 percent of the time against right-handed batters. Similarity metrics that do not compare pitches of different types would be unaware of the similarity of these pitches.
The most similar pair of left-handed pitchers in 2016 was Jon Niese and Chris Rusin. The most frequent pitches for each left-hander against right-handed batters were the sinker and cutter, which they threw at similar frequencies and with similar properties. For their sinkers against RHB, we have 89.52 mph, 9.63 inches of horizontal movement, 4.30 inches of vertical movement and 27.2 percent frequency for Niese, and 90.32, 9.74, 4.88 and 24.4 percent frequency for Rusin. For their cutters against RHB, we have 86.74 mph, 0.30 inches of horizontal movement, 3.86 inches of vertical movement and 27.2 percent frequency for Niese, and 87.49, 1.62, 3.78 and 29.9 percent frequency for Rusin. Each pitcher's most frequent pitch to left-handed batters was the sinker, which Niese threw 40.7 percent of the time and Rusin threw 38.8 percent.
Dissimilar Pitchers
The most dissimilar pair of right-handed pitchers in 2016 was Brad Ziegler and Marco Estrada, with a distance of 5.688. The difference largely was due to an extreme discrepancy in the vertical movement on their pitches. Ziegler threw 57.7 percent sinkers with an average vertical movement of 6.72 inches, while Estrada threw 50.1 percent four-seam fastballs with an average vertical movement of 13.01 inches. Ziegler had the smallest average vertical movement, 5.33 inches, over all of his pitches. Estrada had the highest vertical movement at 9.64 inches. The most dissimilar pair of left-handed pitchers was Zach Britton and Tommy Milone, with a distance of 4.238. Britton threw more than 90 percent sinkers, averaging at least 97 mph and with 3.70 inches of vertical movement. Milone averaged only 88.19 mph on his hardest and most frequent pitch, a four-seam fastball, which he threw 45.5 percent of the time with an average vertical movement of 11.45 inches.
Unique Pitchers
The similarity measure can also be used to find the most unique major-league pitchers. The right-handed pitchers with the greatest distance to their most similar match in 2016 are:
Left-handed pitchers with the greatest distance to their most similar match in 2016:
Hard-throwing Aroldis Chapman fell short of the 1,000-pitch threshold, but would rank as the second-most unique left-hander behind Britton, with a distance of 1.5495 to the nearest left-hander, Tony Cingrani.
Visualizing Similarity
The similarity structure for a group of pitchers can be visualized using nonmetric multidimensional scaling (NMDS) [10]. We use NMDS to visualize properties of the similarity measure for unique right-handed and left-handed pitchers. The NMDS result for the ten most unique right-handed pitchers, plus the two most prominent knuckleballers, R.A. Dickey and Steven Wright, is:
NMDS Result for Unique Right-handed Pitchers in 2016
The most unique right-hander, Brad Ziegler, is in the far upper right in the figure. Ziegler's uniqueness is largely due to throwing a large share (57.7 percent) of sinkers with a low average velocity (84.74 mph) and heavy sink (7.28 inches of vertical movement). The closest pitchers to Ziegler in the plot are Steve Cishek and Aaron Nola, who each threw 40-44 percent sinkers but at a higher velocity than Ziegler. The pitchers in the plot with the highest average velocity over their pitches (Rodney, McCullers, Shaw) are in the lower-right quadrant. In this group, Rodney appears closest to Cishek and Nola due to also throwing a high percentage of sinkers (39.1 percent), but the high vertical movement on his pitches, particularly his four-seam fastball, pulls him to the left of these two. Bryan Shaw has the highest average velocity among pitchers in the figure and appears at the lowest point in the plot.
To the left of Rodney is a group of three pitchers (Estrada, Young, Clippard) who displayed the highest average vertical movement on their pitches among the pitchers in the figure. This high vertical movement was largely achieved by throwing 45-51 percent four-seam fastballs. Above this group is Jered Weaver, who threw pitches with a high average vertical movement, but also had the lowest average pitch velocity in the plot among the non-knuckleballers. Dickey and Wright appear together above Weaver and, as shown here [7], the two knuckleballers are the best match for each other over the 196 right-handed pitchers in the data set. We see that the most dissimilar right-handed pitchers in the entire data set, Ziegler and Estrada, are also the most separated in the plot. The NMDS result for the ten most unique left-handed pitchers, plus Aroldis Chapman, is:
NMDS Result for Unique Left-handed Pitchers
The most unique left-hander, Zach Britton, is on the far-right edge of the plot. Britton achieved his uniqueness by throwing a high volume (92.0 percent) of very hard (97.44 mph) sinkers. The closest left-hander to Britton in the figure is Clayton Richard, who also threw a high volume (65.0 percent) of sinkers but at a lower velocity (91.59 mph). To the left of Richard and farther removed from Britton is Zach Duke, who also threw a large number of sinkers but at an even lower frequency (50.4 percent) and velocity (90.13 mph). The second-most unique left-hander in the group, Aroldis Chapman, who threw a lot (81.1 percent) of very hard (101.32 mph) four-seam fastballs, appears at the lowest point on the plot. On the left side of the figure are four left-handers (Milone, Lamb, Urias, Kershaw) who all favored the four-seam fastball with frequencies varying between 45.5 percent for Milone and 55.3 percent for Urias. The average four-seam velocity for the pitchers increases from top to bottom with mph values of 88.19 (Milone), 90.49 (Lamb), 93.32 (Urias) and 93.74 (Kershaw). To the right of these four pitchers are Drew Pomeranz and Rich Hill, who both complemented their four-seam fastball with a large percentage of curves with sharp downward movement. Hill is the closest pitcher to Andrew Miller in the plot. Since Miller's four-seam fastball is harder than Hill's, and Miller's most frequent offspeed pitch is a slider that is thrown substantially harder than Hill's curve, Miller appears lower than Hill. We see that the most dissimilar left-handed pitchers in the full data set, Britton and Milone, are also the most separated in the plot.
Pitchers with Small Year-to-Year Variation
We can use the similarity measure to compare pitchers to themselves over time. For this purpose, we computed the similarity measure between 2015 and 2016 for each pitcher who threw at least 1,000 pitches in each regular season.
Right-handed pitchers who changed the least between 2015 and 2016 (with their age as of June 30, 2016):
Left-handers:
Many of the smallest changers are veterans, with 13 of the 20 pitchers in the tables being at least 30 years old at midseason 2016, and with all pitchers (except Carlos Rodon) being at least 26. Two of the smallest changers are the knuckleballers R.A. Dickey and Steven Wright. Unsurprisingly, Bartolo Colon is also one of the least-changing right-handers.
Pitchers with Large Year-to-Year Variation
Right-handed pitchers who changed the most between 2015 and 2016:
Left-handers:
We see that these pitchers are younger than their more stable counterparts, with only three of the 20 pitchers being at least 30 years old at midseason 2016. Six of the 10 right-handers, and eight of the 10 left-handers, improved their ERA from 2015 to 2016. Several of these pitchers (Phelps, Chavez, Montgomery, Hand, Pomeranz) went from starting in 2015 to relieving in 2016. Others near the top of the lists include Trevor Bauer and Kelvin Herrera, who made significant changes to their pitch mix [2], [19], along with James Paxton, who made a significant change to his pitching mechanics [17].
Pitchers with Small Platoon Distances
We can use our similarity measure to compute the difference between a pitcher's distribution of pitches against right-handed and left-handed batters. We considered all pitchers who threw at least 1,000 pitches during the 2016 regular season. Right-handed pitchers who changed the least with batter handedness:
Left-handers:
Several of these pitchers relied heavily on a single pitch type. Reed (72.2 percent), Allen (63.3 percent) and Conley (65.5 percent) threw a large fraction of four-seam fastballs. Dickey (87.6 percent) and Wright (83.1 percent) threw a large fraction of knuckleballs, while Harris (66.4 percent cutter), Britton (92.0 percent sinker) and Miller (60.7 percent slider) also threw a large fraction of a single pitch type. Throwing a similar distribution of pitches to right-handed and left-handed batters is a characteristic of a pitcher's approach, but is not necessarily indicative of his platoon results. While several of the pitchers (Reed, McCullers, Dickey, Happ) who had a similar approach against right-handed and left-handed batters exhibited a very small wOBA platoon split, others (Young, DeSclafani) had large wOBA platoon splits.
Pitchers with Large Platoon Distances
Right-handed pitchers who changed the most with batter handedness:
Left-handers:
None of the right-handers and only two of the left-handers (Rivero and Siegrist) who changed the most in response to batter handedness threw a single pitch type at least 60 percent of the time. Seven of the right-handers (Ziegler, Weaver, Iglesias, McGowan, Herrera, Ramos, Chacin) contributed to their platoon variation by throwing a significantly higher fraction of sliders to right-handed batters and a significantly higher fraction of changeups to left-handed batters. For the purposes of this analysis, “significantly” refers to a fraction that is higher by at least 10 percent. Similarly, four of the left-handers (Rivero, Watson, Manaea, Corbin) threw a significantly higher fraction of sliders to left-handed batters and a significantly higher fraction of changeups to right-handed batters. Another popular strategy used by six of the pitchers who changed the most (Weaver, McGowan, Hand, Duffy, Siegrist, Corbin) was to throw a significantly higher fraction of four-seam fastballs to same-side batters, and a significantly higher fraction of sinkers to opposite-side batters. Right-hander Kyle Hendricks employed the opposite approach by throwing a significantly higher fraction of sinkers to right-handed batters, and a significantly higher fraction of four-seam fastballs to left-handed batters. Left-handers Milone and Hill enhanced their platoon variation by throwing a significantly higher fraction of curveballs to left-handed batters.
Pitchers with Small Changes after Two Strikes
We can use the similarity measure to compute how much a pitcher changes his distribution of pitches as the count changes. For each pitcher who threw at least 1,000 pitches in 2016, we computed the similarity measure between the pitcher's distribution of pitches thrown before two strikes and his distribution of pitches thrown after two strikes. Right-handers:
Left-handers:
The two right-handers who changed the least (Grilli 62.4 percent four-seamer, Reed 72.2 percent four-seamer) and the two left-handers who changed the least (Britton 92 percent sinker, Buchter 84.7 percent four-seamer) each threw a large fraction of a single pitch type in 2016. In addition, several of the other pitchers in the two tables (Wright 83.1 percent knuckler, Quackenbush 63.2 percent four-seamer, Oh 60.6 percent four-seamer, Cingrani 87.4 percent four-seamer, Bastardo 65.5 percent four-seamer) each threw over 60 percent of a single pitch type in 2016.
Pitchers with Large Changes after Two Strikes
The right-handed and left-handed pitchers who changed the most after reaching two strikes in 2016 are listed below. Each of these pitchers threw a significantly higher fraction of a particular breaking ball with two strikes. The pitch with the largest increase in frequency after two strikes over all batters faced is referred to as the Delta Pitch in the lists. The Δf column indicates how much more frequently a pitcher threw the Delta Pitch after two strikes as compared to before two strikes. Brad Ziegler, for example, threw his slider 10.16 percent of the time before two strikes and 40.45 percent of the time after two strikes, for a Δf of 30.29 percentage points. Right-handers:
Left-handers:
Among the pitchers in the lists with smaller values of Δf for their Delta Pitch, Fiers (six pitch types) and Darvish (seven pitch types) had a large set of possible pitch types with which to adjust frequencies. Left-handers Kershaw and Snell used a higher fraction of sliders with two strikes in addition to a higher fraction of their Delta Pitch curveballs.
Conclusion
We have developed a new tool that analysts can exploit to study a range of application areas. The similarity measure allows the direct comparison of pitchers across various contexts including MLB, MiLB, amateur and foreign leagues, which can improve predictions for how a pitcher will perform in a new environment. The identification of similar pitchers increases the sample sizes that can be used to forecast the outcome of batter/pitcher matchups and supports regression to more appropriate population means by projection models. The measure also can be used to monitor pitchers over time, and to develop improved models for the health risk and aging characteristics associated with different pitcher classes. For fans, the new tool reveals similarities that we didn't know existed and shows us, once again, that there's more than one way to find success as a major-league pitcher.
Acknowledgment
The authors thank Tom Tango and Mitchel Lichtman for helpful comments on a previous draft of this article. All pitch data used in this study was obtained from Brooks Baseball.
References
[1] R. Duda, P. Hart and D. Stork. Pattern Classification. Wiley-Interscience, New York, 2001.
[2] A. Fagerstrom. (June 24, 2016). FanGraphs: Trevor Bauer looks like a completely different pitcher.
[3] V. Gennaro. The Big Data approach to baseball analytics. In SABR Analytics Conference, Phoenix, AZ, March 2013.
[4] G. Healey. The intrinsic value of a pitch. In SABR Analytics Conference, Phoenix, AZ, March 2017.
[5] G. Healey and S. Zhao. Using PITCHf/x to model the dependence of strikeout rate on the predictability of pitch sequences. Journal of Sports Analytics, 2017.
[6] G. Healey, S. Zhao and D. Brooks. Measuring pitcher similarity: Technical details.
[7] G. Healey, S. Zhao and D. Brooks. Most similar match tables, 2016.
[8] J. Kalk. (Feb. 12, 2008). Hardball Times: Pitcher similarity scores.
[9] J. Kalk. (Feb. 19, 2008). Hardball Times: Pitcher similarity scores (part 2).
[10] J. Kruskal. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29:1-27, 1964.
[11] S. Loftus. (Apr. 15, 2013). Beyond the Box Score: Pitcher similarity scores.
[12] S. Loftus. (Apr. 25, 2013). Beyond the Box Score: Testing and visualizing similarity scores.
[13] S. Loftus. (Nov. 25, 2013). Beyond the Box Score: Pitcher similarity scores 2.0.
[14] J. Long, J. Judge and H. Pavlidis. (Jan. 24, 2017). Baseball Prospectus: Introducing pitch tunnels.
[15] A. Nathan. (Oct. 21, 2012). Determining pitch movement from PITCHf/x data.
[16] Y. Rubner, C. Tomasi and L. Guibas. The Earth Mover's Distance as a metric for image retrieval. International Journal of Computer Vision, 40(2):99-121, 2000.
[17] E. Sarris. (June 9, 2016). FanGraphs: James Paxton's new angle on life.
[18] N. Silver. Why was Kevin Maas a bust? In J. Keri, editor, Baseball Between the Numbers, pages 253-271. Basic Books, New York, 2006.
[19] J. Sullivan. (April 13, 2016). FanGraphs: Now Kelvin Herrera is almost impossible.
[20] T. Tango, M. Lichtman and A. Dolphin. The Book: Playing the Percentages in Baseball. Potomac Books, Dulles, Virginia, 2007.
Dan Brooks is an author of Baseball Prospectus. Follow @brooksbaseball