By considering the speed and movement of every pitch, a similarity measure allows the direct comparison of pitchers across various contexts.
The PITCHf/x optical video and TrackMan Doppler radar sensors estimate parameters of pitches, including the speed, horizontal movement and vertical movement. The data recorded by these systems can be used to develop pitcher similarity measures. These measures are valuable not only for comparing major-league pitchers to each other, but also for allowing the direct comparison of pitchers in other leagues (minor, amateur and foreign) to their MLB counterparts.
Analysts can use a pitcher similarity measure for several purposes. Identifying groups of similar pitchers can help generate optimized projection models, or produce larger samples for predicting the outcome of batter/pitcher matchups. In addition, a similarity measure allows individual pitchers to be monitored over time in order to detect possible changes in pitch characteristics, health and throwing mechanics.
Previous methods for quantifying pitcher similarity have been limited to the comparison of pitches of the same type, which makes these methods highly dependent on the outcome of pitch-classification algorithms. Kalk developed a similarity measure that compared pitches of the same type using variables that included pitch frequency, speed and movement. Loftus improved on Kalk's approach by separating pitchers by handedness while using the Kolmogorov-Smirnov distance to compare distributions. Like Kalk's method, however, this approach only considers comparisons between pitches of the same type.
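To make the Kolmogorov-Smirnov distance concrete, here is a minimal sketch of the standard two-sample statistic: the largest gap between the empirical cumulative distributions of two samples. It is not a reconstruction of Loftus's method, whose details are not given here. The function name `ks_distance` and the velocity readings are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 for identical samples, 1.0 for samples that do not
    overlap at all.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so checking every observed value finds the maximum gap.
    for value in set(a) | set(b):
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical fastball velocities (mph) for two pitchers; not real data.
pitcher_a_mph = [90.1, 91.3, 92.0, 90.8, 91.5]
pitcher_b_mph = [88.9, 90.2, 89.5, 91.0, 90.0]
print(ks_distance(pitcher_a_mph, pitcher_b_mph))
```

A value near 0 means the two velocity distributions are nearly indistinguishable, while a value near 1 means they barely overlap. Applied one pitch type at a time, a measure like this still depends on both pitches carrying the same label.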
A difficulty for these methods is that different pitch types for a single pitcher or across multiple pitchers can have similar properties. This causes the pitch-frequency statistics used by similarity algorithms to depend heavily on the classification process; it also prevents the comparison of similar pitches that are classified as different pitch types.
In 2016, for example, Ubaldo Jimenez's sinker averaged 91.12 mph, -7.35 inches of horizontal movement and 8.53 inches of vertical movement, while Jeremy Hellickson's four-seam fastball had nearly identical averages of 90.81 mph, -7.63 inches of horizontal movement and 8.44 inches of vertical movement. Due to this issue, Loftus conceded that his own method is best suited for comparing individual pitches as opposed to comparing pitchers based on their entire arsenal. Gennaro has proposed a more qualitative approach to measuring pitcher similarity by using a hand-selected set of features and weightings. The features used by this method include a pitcher's two most-common pitch types and his most-common two-pitch sequence.
In this work, we develop a pitcher similarity measure that considers the speed and movement of every pitch. We note that other factors that are less indicative of a pitcher's raw stuff, such as pitch location, sequencing, and deception, also play a role in determining performance.
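The Jimenez and Hellickson averages quoted above can be compared feature by feature with a few lines of arithmetic. This sketch uses only the six numbers from the text; the helper name `feature_gaps` is hypothetical.

```python
# 2016 averages quoted in the text:
# (speed in mph, horizontal movement in inches, vertical movement in inches)
jimenez_sinker = (91.12, -7.35, 8.53)
hellickson_four_seam = (90.81, -7.63, 8.44)


def feature_gaps(pitch_a, pitch_b):
    """Absolute difference for each feature, rounded to two decimals."""
    return tuple(round(abs(a - b), 2) for a, b in zip(pitch_a, pitch_b))


gaps = feature_gaps(jimenez_sinker, hellickson_four_seam)
print(gaps)  # (0.31, 0.28, 0.09)
```

The two pitches differ by about a third of a mile per hour and less than a third of an inch on either movement axis, yet a type-based comparison would never match them because one is labeled a sinker and the other a four-seam fastball.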
Most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers, and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.
Jonathan Hale has been using PITCHf/x to answer baseball questions all over the 'net since 2008. He once missed a Duane Ward curveball by three feet. You can read his writing about the Blue Jays or tell him what he should be analyzing next at The Mockingbird.
The advent and adoption of the PITCHf/x system has changed the way we scout pitchers, and more advances are still to come.
Adam is the founder of Project Prospect, a scouting and statistical analysis website. He has been writing about baseball since 2006, when he began covering college baseball and conducting quantitative analysis of minor-league prospects.
Dissecting a day at the office for the Mets' Johan Santana.
Due to local blackout rules and the lack of a land-line phone capable of proving that my Penn State University residence was not in Philadelphia, I relied on MLB Gameday instead of MLB TV for a good chunk of the 2007 season. The application had been around for a while, but I soon noticed strange terminology and new data accompanying each pitch. Why are there two velocity readings? What does 13" of pFX mean? And what the heck is BRK? A little research soon made sense of the information, and within a few months I became hooked on the data set known as Pitch-f/x. Fast-forward two years, and Pitch-f/x continues to evolve, revolutionizing baseball research in the process. Unfortunately, with updates to system configurations and the amount of information offered, too many readers and baseball fans experience confused reactions similar to mine when they first encounter the data. In an attempt to quash this issue, it seemed prudent to explain some of the more commonly used numbers, discussing what they mean as well as how they should be used. Instead of merely defining terms, the system will be explored in action, with periodic discussions of its inner workings, much as Dan Fox did back in May 2007.
How do starters who throw particularly high pitch-count initial innings perform subsequently?
Delivering to the dish with a 2-2 count, Wandy Rodriguez hit the outside corner with a 91 mph fastball with which Edgar Renteria could do nothing but whiff. This heater happened to be the 55th pitch that Rodriguez threw in the inning on August 1, 2007. While the pitch brought the inning to a close, it simultaneously placed Rodriguez atop a list of the pitchers who had thrown the most pitches in a single inning. Compiled by Retrosheet's David Smith and posted on the Inside the Book blog, the list is composed of the pitchers with the most pitches thrown in an inning from 2004 to 2007.
I decided to examine the PITCHf/x data for Wandy's game. Analyzing the velocity and movement of Rodriguez's fastball, I was surprised to find that it sustained its velocity and "bite" as he went deeper into the inning. During the rest of the game, however, things changed a bit. In the second inning, his fastball lost three miles per hour, but its movement increased. It has been theorized that some pitchers throw with more movement when they tire because their arm angle drops; perhaps this happened here, as Wandy lost velocity but threw with more movement.