June 14, 2007
The Science and Art of Building a Better Pitcher Profile
"Hell, yeah, I want to throw that pitch. They don't let me, though. They tell me I'm too young, that it's bad for my elbow. I told them I want to throw it."
"King" Felix Hernandez embodies the very definition of the word "phenom." At age 14 he reportedly threw 94 mph, and in 2005 the 19 year-old unleashed his arsenal against the American League. He promptly retired 14 of the last 16 Tigers he faced in his major league debut; five days later he tossed eight shutout innings against the Twins. He struck out 11 batters in his next outing, and then took a no-hitter into the seventh inning a few starts later at Toronto. When the dust settled he finished his rookie campaign, he'd held opponents to a batting average of .203 to go with 77 strike outs and 23 walks in 84 1/3 innings, a WHIP below 1.0, and a ridiculous 3.1 groundball/flyball ratio. A very slow start (his ERA was 5.78 before June rolled around) in 2006 had Mariners fans getting concerned, but a strong second half would round out a solid sophomore season that only seemed disappointing given his 2005 performance. In 2006 he maintained his strikeout and walk rates, although his home run rate certainly climbed, and his propensity to induce grounders fell, although it was still good enough to crack the top 10.
So, coming into 2007, hopes were understandably high, and the King, now throwing his slider, didn't disappoint. In his April 2nd start he threw eight shutout innings, giving up only three hits and striking out 12 A's. The next time out he blew away the Red Sox and Daisuke Matsuzaka, one-hitting them in a complete game shutout. At that point he had recorded 30 ground-outs to just four flyball outs. All of us know what happened next--he removed himself from his start on April 18th after just 20 pitches and a third of an inning because of tightness in his elbow. After a stint on the DL he has since made six less-than-dominating starts, and his ERA has risen to 4.41.
Hernandez is fascinating to watch because of his stuff and repertoire, not to mention his battle to shake off the elbow problem, but it turns out that he's also a pitcher for whom we have some fairly good data in the PITCHf/x system for 2007. So today we'll use King Felix as a case study on mining that data to create a pitcher profile by asking and answering four basic questions: What does he throw, when does he throw it, where does he throw it, and what happens when he throws it?
1. What Does He Throw?
The PITCHf/x system has caught up with Felix in five of his starts, including the April 2nd gem against the A's, the April 18th abbreviated start against the Twins, the May 20th game versus San Diego, the June 4th start against Baltimore, and his most recent start on June 10th in San Diego. Taken together, this encompasses 415 of the approximately 800 pitches he's thrown this season. As you can imagine, our first task then is to identify and classify those pitches.
I mentioned in previous articles that the data tracked includes vectors for both horizontal and vertical break. As Joe P. Sheehan has shown, this data, when joined with the initial velocity of the pitch, can act as a starting point for visual pitch classification. Below is a plot of the horizontal (from the perspective of the hitter) and vertical break in inches of all of Hernandez's pitches, colored by starting speed:
You can see from the plot that there are essentially three groupings, or clusters, of pitches. In the upper left we have pitches with a positive vertical break and negative horizontal break (with "break" defined as the difference between the ending location of the pitch and the same pitch thrown with no spin along the respective axis). Judging from the speed--the majority of the pitches are at 90 mph and above--you can probably guess that most of these are fastballs. The positive vertical "break" is realized since a fastball doesn't drop as much as a pitch thrown with less velocity and less backspin. Robert K. Adair discussed this in The Physics of Baseball and noted that a 95 mph fastball (typically a four-seamer) thrown with a backspin of perhaps 1200 rpm would "hop" roughly four inches more than a typical 90 mph fastball. The hop is really an illusion created by the fact that the pitch does not drop as much as the batter expects. In fact, all pitches follow a smooth arc from the pitcher's hand to the catcher's glove; even Hernandez's four-seam fastball must drop an average of two and a half feet on its way to the plate. A two-seam sinking fastball delivered without as much backspin may drop four to six inches more than a hopping fastball; this would be indicated by vertical break measurements that are a little lower.
The negative horizontal break shows that the pitch tails in to right-handed hitters, because the ball is released at an angle that creates side spin. This side spin lowers the pressure on the side of the ball where the velocity is greater, causing the so-called Magnus force. For a right-handed pitcher like Felix, the side of the ball facing third base will experience lower pressure, so the ball is pushed towards that side of the plate. However, you'll also notice that there are a few pitches here that are slower, even down to between 85 and 90 mph, indicating that we're dealing with an off-speed pitch that moves away from a left-hander.
In the middle of the plot we see a collection of pitches that have a slightly positive motion and move in to a left-handed hitter with a vertical break centered near zero. Since a fastball has positive vertical break, a value closer to zero actually indicates a pitch that sinks more than might be expected. What we have then is a sinking pitch that moves away from a right-hander thrown by a right-handed pitcher; at that velocity it's likely a cutter or a slider.
Finally, we see a cluster of pitches in the lower right-hand corner that definitely move down and away from a right-handed hitter at lesser velocities. These indicate, as you may have guessed, more classic curveballs.
Although this is a good start, we can do better since we're dealing with a specific pitcher who has been carefully observed over his short career. To aid my observations, I turned to Mariners blogger David Cameron. Having charted many of Hernandez's starts last year, David is familiar with King Felix's repertoire. He feels that Hernandez throws five pitches with the following characteristics:
Using these observations along with David's charting of the June 10th start, I was able to create a simple algorithm to classify each of the 415 pitches that Hernandez has thrown this season. That algorithm not only takes into consideration starting velocity and vertical and horizontal break, but it turns out that the break angle is also important (although a clear definition of just what break angle is measuring is apparently elusive at this point). I then ran the program against the June 10th start, and found that David and I agree on 87 of the 91 pitches that were recorded (two pitches to Marcus Giles in the fifth inning were not recorded). To improve this effort other attributes, including release point and total break length, may also be included; for now, here are the "profiles" of the five pitches in his repertoire, given as average values across all pitches classified under each pitch type.
Pitch Type      Count  Break  Start    pFX   Vert  Horiz     BrA    RPC   RPG
Four-Seamer       168   5.55  96.98  11.44   7.57  -8.18   35.04  -2.18  5.83
Two-Seamer         51   6.17  93.84  10.03   6.40  -7.41   28.15  -2.18  5.82
Slider             92   8.40  90.53   3.26   0.98   1.14   -6.43  -2.23  5.67
Change             46   8.34  86.60   9.45   4.14  -8.22   22.62  -2.39  5.92
Curve              53  13.23  84.11   8.86  -7.24   4.53  -10.11  -2.23  5.94
Not Classified      5  22.28  62.58   7.66   4.88  -0.95    0.48  -2.79  7.13
Break is the greatest distance between the actual trajectory of the ball and the straight line from the release to the plate, Start is the starting velocity, pFX is a different measure of break that is the hypotenuse of the right triangle formed by the vertical and horizontal components, BrA is the break angle in degrees, RPC is the release point in feet from center that tracks the release point of the pitch with negative values towards the third base side of the rubber, and RPG is the release point from above ground in feet.
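The pFX definition can be illustrated with a short Python sketch. The function name and the three sample pitches are made up for illustration and are not taken from the PITCHf/x data. The sketch also shows one plausible reason why the pFX column is larger than the hypotenuse of the averaged Vert and Horiz columns (for the slider, 3.26 versus roughly 1.5): averaging per-pitch magnitudes is not the same as taking the magnitude of the averaged components.

```python
import math

def pfx_magnitude(vert_in, horiz_in):
    """Total movement in inches: hypotenuse of the vertical and horizontal break."""
    return math.hypot(vert_in, horiz_in)

# Three made-up pitches as (vertical, horizontal) break in inches; illustrative only.
pitches = [(8.0, -6.0), (3.0, 4.0), (-5.0, 12.0)]

# Average of the per-pitch magnitudes (how a per-type pFX average would be built).
per_pitch = [pfx_magnitude(v, h) for v, h in pitches]
mean_of_magnitudes = sum(per_pitch) / len(per_pitch)

# Magnitude of the averaged components (what you get from the Vert and Horiz columns).
mean_vert = sum(v for v, _ in pitches) / len(pitches)
mean_horiz = sum(h for _, h in pitches) / len(pitches)
magnitude_of_means = pfx_magnitude(mean_vert, mean_horiz)

print(round(mean_of_magnitudes, 2), round(magnitude_of_means, 2))
```

Because pitches that break in different directions partly cancel when their components are averaged, the first number can never be smaller than the second.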
So here we see that his four-seam fastball is thrown the hardest (Start), has the smallest vertical drop (Vert), and tails into right-handed hitters (Horiz). His two-seamer, by comparison, is not thrown quite as hard, drops more, and doesn't tail quite as much. His slider has a much healthier drop (five to six inches more than the fastballs) and moves away from a right-handed hitter. The changeup is characterized by an even lower velocity, a little less drop than the slider, and even more fading movement away from lefties. Finally, his curveball exhibits by far the most drop while moving into a left-handed hitter at the slowest speed. Of the five pitches that weren't categorized, three were intentional balls and the other two were either changeups or curveballs that didn't seem to fit any of the profiles.
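The classification algorithm itself is not spelled out in the article, so the following Python sketch is only one way such a classifier could work: a nearest-profile rule that assigns each pitch to the pitch type whose average Start, Vert, Horiz, and BrA values (from the profile table) are closest. The scale factors are hypothetical tuning values chosen so that no single attribute dominates the distance.

```python
import math

# Average profile per pitch type, from the table above:
# (Start mph, Vert in, Horiz in, BrA deg)
PROFILES = {
    "Four-Seamer": (96.98, 7.57, -8.18, 35.04),
    "Two-Seamer": (93.84, 6.40, -7.41, 28.15),
    "Slider": (90.53, 0.98, 1.14, -6.43),
    "Change": (86.60, 4.14, -8.22, 22.62),
    "Curve": (84.11, -7.24, 4.53, -10.11),
}

# Hypothetical scale per attribute (not from the article); each difference is
# divided by its scale before the distance is computed.
SCALES = (3.0, 3.0, 3.0, 10.0)

def classify(start, vert, horiz, break_angle):
    """Return the pitch type whose average profile is nearest to this pitch."""
    pitch = (start, vert, horiz, break_angle)

    def distance(profile):
        return math.sqrt(
            sum(((p - q) / s) ** 2 for p, q, s in zip(pitch, profile, SCALES))
        )

    return min(PROFILES, key=lambda name: distance(PROFILES[name]))

print(classify(97.0, 7.5, -8.0, 35.0))   # a hard pitch with "hop" and tail
print(classify(84.0, -7.0, 4.5, -10.0))  # a slow pitch that drops sharply
```

A rule like this always returns one of the five types, so it has no "Not Classified" bucket; a distance cutoff would be needed to reproduce that.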
Perhaps this is a good place to pause and do three things: ask whether this technique can be applied globally, identify the technique's limitations, and see if it has any usefulness. To the first question, the apparent answer is yes, to a degree. While every pitcher will have a fairly unique signature for each pitch, it is possible to create generic signatures that would almost certainly classify a large number of pitches at the extremes. Four-seam 95 mph fastballs and big overhand curveballs are two examples of pitches that we could pretty well identify for the majority of pitchers. The problem is in the vast array of sinking fastballs, cutters, and sliders that most pitchers include in their bag of tricks. For example, without the help of direct observation, it would be very difficult to tell (with a moderate confidence level) a Hernandez two-seam fastball from his four-seamer: the vertical break, although less for the two-seamer, is still very close to the four-seamer and the velocities overlap in some cases depending on generally how fast Felix was throwing that day. For example, under my algorithm the first four pitches of the game thrown to Marcus Giles on May 20th would be classified as two-seamers when it is likely that, as he's done in most other games, he tried to establish his four-seamer in the first inning. When you compound this with the fact that pitchers tire and lose velocity as the game moves along, you begin to see that there is plenty of margin for error in differentiating between certain types of pitches. That's why, for the most part, I've lumped all fastballs together.
It's also instructive to look at two of the four pitches on which I did not agree with the direct observations from the June 10th start. These were pitches thrown to Kevin Kouzmanoff in the second inning that I classified as a changeup and a curveball, when David had them both as sliders. On the first of those pitches, my software chose changeup because the pitch did not in fact break towards the left-handed batter's box (it had a small but negative horizontal break, and Felix's changeup typically fades away from a left-handed hitter) and the break angle was also positive. From a velocity perspective it was in the range that sliders and changeups overlap. The observation-driven alternative is that the pitch was perhaps an unintentional back-door slider. On the second pitch, the situation was reversed. My software chose curveball because the pitch broke in to the left-handed box, sank a good deal, and had a negative break angle within the range of velocities in which the two pitches overlap. Once again, I would trust the direct observation that this was an especially hard-breaking slider. In both cases, the point is that a pitcher will throw pitches that are either not effective or extremely "filthy," and as a result we'll end up classifying them either wrongly or with a low degree of confidence.
In order to accurately classify pitches using this data, observation and knowledge of particular pitchers is needed both to help set the parameters for the pitches a pitcher throws and also to confirm or correct particular pitch classifications that are made with a low probability of certainty. While we've used a visualization technique here in order to begin the process of classification, it's also probable that algorithmic techniques falling under the heading "cluster analysis" will be useful in order to detect those smaller differences between two- and four-seam fastballs, for example.
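As a rough illustration of what "cluster analysis" could look like here, the Python sketch below runs a plain k-means (Lloyd's algorithm) over (vertical, horizontal) break pairs. The data points and starting centroids are made up for illustration, and k-means is only one of many clustering methods; nothing here is the method used for the classifications in this article.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on tuples of floats; returns (centroids, labels)."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster in place
        centroids = new_centroids
    return centroids, labels

# Made-up (vertical, horizontal) break pairs in inches; illustrative only.
points = [
    (7.5, -8.0), (8.0, -8.5), (7.0, -7.5),   # fastball-like
    (1.0, 1.0), (0.5, 1.5), (1.5, 0.5),      # slider-like
    (-7.0, 4.5), (-7.5, 4.0), (-6.5, 5.0),   # curve-like
]
start = [points[0], points[3], points[6]]    # hypothetical starting guesses
centroids, labels = kmeans(points, start)
print(labels)
```

With well-separated groups like these the clusters fall out immediately; the harder case raised above, two- versus four-seam fastballs, would involve overlapping clouds and would likely need velocity and other attributes added as extra dimensions.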
Perhaps the larger question, though, is whether our desire to classify pitches is actually more a product of our human propensity to categorize and organize rather than something that will actually be helpful to our analysis. After all, we're dealing with a more or less continuous set of data; as we've shown, any classification system will have its problems, just as direct observation will inevitably reflect some problems with subjectivity. Is it enough to simply examine pitches using the attributes that describe their trajectories and velocities?
At this point I would argue that it is useful to strive for classification, for a couple of reasons. First, classification is helpful because it gives us a common nomenclature to reference. It is true that a sinker or a curveball can mean different things for different pitchers, but talking about a curveball rather than a pitch with a pfx_x of one to seven inches and a downward break angle of six to 14 degrees is more understandable for all concerned and, let's face it, more fun. More to the point, though, pitchers already intentionally throw pitches of certain types, so there is a definite distinction being made. As a result, we're not classifying for its own sake, but instead mining the data to get at the true data point. By identifying those distinctions, we can help sort out pitching patterns and perhaps shed some light on decisions made collectively by the pitcher/catcher battery.
With all of that said, what classifying Felix's pitches allows us to do is redraw the previous plot, this time color-coding by pitch type:
Although it's somewhat difficult to detect, some pitches that you'd otherwise think were curveballs are now classified as hard-breaking sliders, and some would-be sliders are actually curveballs that didn't curve very much.
This classification also gives us the opportunity to compare the April 2nd start with those that followed the injury. Hernandez has been clearly less effective since, and that's supported by the differences in the two tables below, which show the average values for each of his pitches before and after the injury. As mentioned previously the two types of fastballs are grouped together since their differentiation is based more on velocity than other pitches, and Hernandez's velocity has varied somewhat in a few of his starts, making exact identification more difficult.
April 2nd start versus Oakland
Type      Count  Break  Start   pFX   Vert  Horiz   RPC   RPG
Change       14   10.0   86.0   9.4    2.6   -8.8  -2.6   6.1
Curve        15   15.5   84.3  10.4   -9.4    4.0  -2.5   6.1
Fastball     53    6.7   97.5  10.7    5.8   -8.8  -2.4   6.0
Slider       20    9.8   90.5   2.0   -1.0    0.0  -2.6   6.0
Three Starts Beginning with May 20th
Type      Count  Break  Start   pFX   Vert  Horiz   RPC   RPG
Change       32    7.6   86.9   9.5    4.8   -7.9  -2.3   5.8
Curve        38   12.3   84.0   8.3   -6.4    4.7  -2.1   5.9
Fastball    166    5.4   95.9  11.2    7.8   -7.8  -2.1   5.8
Slider       72    8.0   90.5   3.6    1.5    1.4  -2.1   5.6
What jumps out at you in these two tables is that his average fastball was 1.5 mph faster in his first start, the vertical break on his curveball was more than three inches greater, and the horizontal movement on both his changeup and his fastball was about an inch greater in his first start, all of which leads to greater break values across the board. He was not only throwing harder, but he was clearly getting more movement on his pitches. Of course, movement isn't everything, and certainly his command seems to have been affected as well.
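The before-and-after comparison can be reproduced from the two tables with a small Python sketch. The dictionary and function names are hypothetical, and the inputs are the rounded table averages, so the differences may be off by a tenth or so from figures computed on the raw pitch data.

```python
FIELDS = ("Break", "Start", "pFX", "Vert", "Horiz")

# Per-pitch averages copied from the two tables above, in FIELDS order.
APRIL_2 = {
    "Change": (10.0, 86.0, 9.4, 2.6, -8.8),
    "Curve": (15.5, 84.3, 10.4, -9.4, 4.0),
    "Fastball": (6.7, 97.5, 10.7, 5.8, -8.8),
    "Slider": (9.8, 90.5, 2.0, -1.0, 0.0),
}
SINCE_MAY_20 = {
    "Change": (7.6, 86.9, 9.5, 4.8, -7.9),
    "Curve": (12.3, 84.0, 8.3, -6.4, 4.7),
    "Fastball": (5.4, 95.9, 11.2, 7.8, -7.8),
    "Slider": (8.0, 90.5, 3.6, 1.5, 1.4),
}

def change_since_injury(before, after):
    """Return after-minus-before for every pitch type and field."""
    return {
        pitch: {
            field: round(a - b, 1)
            for field, b, a in zip(FIELDS, before[pitch], after[pitch])
        }
        for pitch in before
    }

changes = change_since_injury(APRIL_2, SINCE_MAY_20)
for pitch, delta in changes.items():
    print(pitch, delta)
```

Negative values in the Break and Start columns mean less total break or less velocity since the injury; a less negative Vert on the curve means the pitch is dropping less.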
Although I'm not sure how much certainty there is in the measurement, his release point in the April 2nd game appears to have been a little more extended towards third base and a little higher. Perhaps, as has been reported, this is a result of trying to protect his elbow, and that he's not throwing as freely as he did prior to the injury.
2. When Does He Throw It?
With pitch types in hand we can begin to drill down on Hernandez's pitching patterns. You'll recall that one of the topics discussed last week was the truth of the common wisdom that pitchers attempt to establish their fastballs early in the game. It turns out that Felix is the quintessential example of that old axiom. The following chart shows the percentage of pitches thrown by pitch type and inning:
Inning  Change  Curve  Fastball  Slider
1         7.4%   4.9%     64.2%   23.5%
2        16.7%  11.5%     44.9%   26.9%
3         9.5%   7.9%     65.1%   17.5%
4        11.5%   9.8%     55.7%   23.0%
5        10.5%  24.6%     45.6%   19.3%
6         7.7%  23.1%     41.0%   28.2%
7        11.1%  27.8%     50.0%   11.1%
8        23.1%   7.7%     46.2%   23.1%
Whereas the average pitcher throws fastballs in the first inning around 58 percent of the time, Felix does so over 64 percent of the time before seriously turning to his changeup and curve in the second inning. It also appears he relies more heavily on his curveball in the middle innings, and mixes in his slider at a fairly constant rate throughout the game. But a large part of that first inning emphasis is seen in the very first batter--in total, 19 of his 23 pitches to the first batter in these games were fastballs. It's also interesting to note that while his fastball percentage does rise again in the third and fourth innings, a far greater percentage of them are sinking two-seamers, as shown in the following table (with the caveats of uncertainty discussed above).
Percentage of Fastballs That Are Two-Seamers
Inning    Pct
1st     17.3%
2nd     11.4%
3rd     29.3%
4th     41.2%
5th     30.8%
6th     18.8%
7th      0.0%
8th     16.7%
But of course when a pitch is thrown also varies considerably by count, so we can produce the following two tables that show the number of pitches of each type thrown at various counts against left- and right-handed hitters.
Vs Left
Count     CH     CV     FB     SL  Total    Pct
0-0       11      4     31      6     52  26.3%
0-1        6      4      8      5     23  11.6%
0-2        0      4      4      6     14   7.1%
1-0        7      1     13      3     24  12.1%
1-1        6      1      7      6     20  10.1%
1-2        3      6      7      5     21  10.6%
2-0        1      0      3      1      5   2.5%
2-1        4      1      3      3     11   5.6%
2-2        1      5      7      6     19   9.6%
3-0        0      0      0      1      1   0.5%
3-1        0      0      2      0      2   1.0%
3-2        1      0      3      2      6   3.0%
Total     40     26     88     44    198
Pct    20.2%  13.1%  44.4%  22.2%
Vs Right
Count     CH     CV     FB     SL  Total    Pct
0-0        1      9     38     11     59  29.8%
0-1        3      5     14      5     27  13.6%
0-2        0      1     11      0     12   6.1%
1-0        0      0     14      9     23  11.6%
1-1        0      2     13     10     25  12.6%
1-2        1      6     15      4     26  13.1%
2-0        0      0      6      1      7   3.5%
2-1        1      0      6      2      9   4.5%
2-2        0      4      6      3     13   6.6%
3-0        0      0      1      0      1   0.5%
3-1        0      0      4      0      4   2.0%
3-2        0      0      3      3      6   3.0%
Total      6     27    131     48    212
Pct     2.8%  12.7%  61.8%  22.6%
These charts confirm that Felix does not like to throw his changeup against right-handed hitters, doing so only three percent of the time, and that he instead relies more on his fastball. Otherwise, he throws curveballs and sliders in roughly the same proportions against either side. From these tables we can produce the following tendencies:
Vs Left
Count          CH     CV     FB     SL
Ahead       15.5%  24.1%  32.8%  27.6%
Even        19.6%  10.3%  49.5%  20.6%
Behind      27.9%   4.7%  48.8%  18.6%
Two Strike   8.3%  25.0%  35.0%  31.7%
Vs Right
Count          CH     CV     FB     SL
Ahead        6.2%  18.5%  61.5%  13.8%
Even         1.0%  14.6%  58.3%  26.2%
Behind       2.3%   0.0%  70.5%  27.3%
Two Strike   1.8%  19.3%  61.4%  17.5%
Here we see that when he's ahead in the count against lefties, King Felix really mixes up his pitches, as he does with two strikes (at least his curve, fastball, and slider). When even or behind in the count, he relies primarily on his fastball, throwing in just a few sliders and changeups. Against right-handed hitters he'll go to his slider when even in the count or behind (at which point he avoids his curveball like the plague), but contrary to our earlier descriptions of his repertoire, he apparently relies on his fastball as his two-strike pitch.
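The tendency percentages can be rebuilt from the count tables with the Python sketch below, shown here for left-handed hitters. The grouping rule is inferred rather than stated in the article: treating 0-0, 1-1, 2-2, and the full count (3-2) as "Even", counts with more strikes than balls as "Ahead", and counts with more balls than strikes as "Behind" reproduces the published figures. "Two Strike" overlaps with the other groups.

```python
# Pitch counts vs left-handed hitters, from the table above: count -> (CH, CV, FB, SL)
VS_LEFT = {
    "0-0": (11, 4, 31, 6), "0-1": (6, 4, 8, 5), "0-2": (0, 4, 4, 6),
    "1-0": (7, 1, 13, 3), "1-1": (6, 1, 7, 6), "1-2": (3, 6, 7, 5),
    "2-0": (1, 0, 3, 1), "2-1": (4, 1, 3, 3), "2-2": (1, 5, 7, 6),
    "3-0": (0, 0, 0, 1), "3-1": (0, 0, 2, 0), "3-2": (1, 0, 3, 2),
}

def situation(balls, strikes):
    """Inferred grouping: the full count (3-2) is treated as an even count."""
    if balls == strikes or (balls, strikes) == (3, 2):
        return "Even"
    return "Ahead" if strikes > balls else "Behind"

def tendencies(table):
    """Return the percentage of each pitch type thrown in each situation."""
    totals = {name: [0, 0, 0, 0] for name in ("Ahead", "Even", "Behind", "Two Strike")}
    for count, pitches in table.items():
        balls, strikes = (int(n) for n in count.split("-"))
        groups = [situation(balls, strikes)]
        if strikes == 2:
            groups.append("Two Strike")  # overlaps with Ahead/Even by design
        for group in groups:
            totals[group] = [t + p for t, p in zip(totals[group], pitches)]
    return {
        group: [round(100 * n / sum(row), 1) for n in row]
        for group, row in totals.items()
    }

for group, row in tendencies(VS_LEFT).items():
    print(group, row)
```

Feeding the right-handed count table through the same function yields the second tendencies table in the same way.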
3. Where Does He Throw It?
Obviously, pitch location is also of interest, and so it's not difficult to create the following charts that show the location (from the perspective of the catcher) by pitch type against both left- and right-handed hitters. The strike zones drawn are the average of all those hitters who have faced Hernandez.
Perhaps the only thing that is clear here is that he seems to have more of a pattern against right-handed hitters, working them either up and in with fastballs, or low and away with sliders; against left-handers he's all over the place, as he apparently throws back-door sliders and jams hitters with both sliders and fastballs. It's been noted that he's struggled with his command since coming back from the injury, and there's certainly nothing in these charts that would suggest otherwise.
4. What Happens When He Throws It?
Finally, let's get into the outcomes of his various pitches. We can do this with two tables that show pitch outcomes by pitch type:
Vs Left-Handers
Outcome                       CH     CV     FB     SL   Total    Pct
Ball                          13      8     36     13      70  35.4%
Ball In Dirt                   0      0      1      1       2   1.0%
Called Strike                  6      4     15      7      32  16.2%
Foul                           7      5     11      8      31  15.7%
Foul Bunt                      0      0      2      0       2   1.0%
Foul Tip                       1      0      0      1       2   1.0%
In Play                        7      3     17      3      30  15.2%
Swinging Strike                6      5      6     11      28  14.1%
Swinging Strike (Blocked)      0      1      0      0       1   0.5%
Total                         40     26     88     44     198
Pct                        20.2%  13.1%  44.4%  22.2%  100.0%
Vs Right-Handers
Outcome                       CH     CV     FB     SL   Total    Pct
Ball                           1      9     43     16      69  32.5%
Ball In Dirt                   0      1      0      2       3   1.4%
Called Strike                  1      4     35      8      48  22.6%
Foul                           3      4     19      3      29  13.7%
Foul Bunt                      0      0      0      0       0   0.0%
Foul Tip                       0      0      0      0       0   0.0%
In Play                        0      5     25      9      39  18.4%
Swinging Strike                1      3      9     10      23  10.8%
Swinging Strike (Blocked)      0      1      0      0       1   0.5%
Total                          6     27    131     48     212
Pct                         2.8%  12.7%  61.8%  22.6%  100.0%
Against left-handers, his most effective pitch--the one that generates the greatest percentage of strikes--is the slider, where 27 of his 44 pitches resulted in strikes, a quarter of them swinging. His fastball is most often out of the strike zone at 42 percent. Against right-handers, his fastball generates a strike 48 percent of the time, although his slider is the most deceptive, getting a swing and miss over 20 percent of the time; it also misses the strike zone 38 percent of the time.
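The rates quoted above can be recovered from the outcome tables with a short Python sketch. The grouping of outcomes is an assumption inferred from the numbers: counting called strikes, fouls, foul bunts, foul tips, and swinging strikes as "strikes" (and leaving out balls put in play) reproduces the 27-of-44 figure for the slider against left-handers. The names used are hypothetical.

```python
PITCHES = ("CH", "CV", "FB", "SL")

# Outcome counts copied from the two tables above: outcome -> (CH, CV, FB, SL)
VS_LEFT = {
    "Ball": (13, 8, 36, 13),
    "Ball In Dirt": (0, 0, 1, 1),
    "Called Strike": (6, 4, 15, 7),
    "Foul": (7, 5, 11, 8),
    "Foul Bunt": (0, 0, 2, 0),
    "Foul Tip": (1, 0, 0, 1),
    "In Play": (7, 3, 17, 3),
    "Swinging Strike": (6, 5, 6, 11),
    "Swinging Strike (Blocked)": (0, 1, 0, 0),
}
VS_RIGHT = {
    "Ball": (1, 9, 43, 16),
    "Ball In Dirt": (0, 1, 0, 2),
    "Called Strike": (1, 4, 35, 8),
    "Foul": (3, 4, 19, 3),
    "Foul Bunt": (0, 0, 0, 0),
    "Foul Tip": (0, 0, 0, 0),
    "In Play": (0, 5, 25, 9),
    "Swinging Strike": (1, 3, 9, 10),
    "Swinging Strike (Blocked)": (0, 1, 0, 0),
}

BALLS = ("Ball", "Ball In Dirt")
WHIFFS = ("Swinging Strike", "Swinging Strike (Blocked)")
# Assumed grouping: every non-ball outcome except a ball put in play is a strike.
STRIKES = ("Called Strike", "Foul", "Foul Bunt", "Foul Tip") + WHIFFS

def share(table, pitch, outcomes):
    """Return (matching pitches, all pitches, percentage) for one pitch type."""
    col = PITCHES.index(pitch)
    total = sum(row[col] for row in table.values())
    hits = sum(table[outcome][col] for outcome in outcomes)
    return hits, total, round(100 * hits / total, 1)

print(share(VS_LEFT, "SL", STRIKES))   # slider strikes vs left-handers
print(share(VS_LEFT, "FB", BALLS))     # fastballs out of the zone vs left-handers
print(share(VS_RIGHT, "FB", STRIKES))  # fastball strikes vs right-handers
print(share(VS_RIGHT, "SL", WHIFFS))   # slider swings and misses vs right-handers
```

Note that "out of the strike zone" is approximated here by pitches called balls, which is how the percentages in the text line up with the tables.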
Going back to last week's discussion on plate discipline, from an aggregate perspective, Hernandez doesn't seem to induce fishing any more than your average pitcher, which is a bit surprising given the movement he gets on his pitches. Right-handed hitters swung at 32.5 percent of his pitches out of the strike zone, while lefties did so 31.3 percent of the time; this is against an average of 32.5 percent for all pitchers with more than 150 pitches swung at. Interestingly, though, Hernandez does have the lowest Bad Ball percentage (defined as the percentage of pitches out of the strike zone that were swung at where contact was made) against both sides, indicating that when they do go fishing for his sliders or four-seamers, they don't tend to make contact.