April 23, 2009
Checking the Numbers
Due to local blackout rules and the lack of a land-line phone capable of proving that my Penn State University residence was not in Philadelphia, I relied on MLB Gameday instead of MLB TV for a good chunk of the 2007 season. The application had been around for a while, but I soon noticed strange terminology and new data accompanying each pitch. Why are there two velocity readings? What does 13" of pFX mean? And what the heck is BRK? A little research soon made sense of the information, and within a few months I became hooked on the data set known as Pitch-f/x. Fast-forward two years, and Pitch-f/x continues to evolve, revolutionizing baseball research in the process. Unfortunately, with updates to system configurations and the amount of information offered, too many readers and baseball fans experience confused reactions similar to mine when they first encounter the data. In an attempt to quash this issue, it seemed prudent to explain some of the more commonly used numbers, discussing what they mean as well as how they should be used. Instead of merely defining terms, the system will be explored in action, with periodic discussions of its inner workings, much as Dan Fox did back in May 2007.
The season is still quite young, and the samples of statistics are nowhere near the requisite size for drawing any lasting conclusions, so there is no better time than now to repeat Dan's process, incorporating some of the newer additions to the data. With that in mind, we are going to revisit Johan Santana's dominance of the Milwaukee Brewers on April 18 as seen through the lens of Pitch-f/x. Keep the following disclaimer in mind: no conclusions about Santana's early success will be drawn here, as the lone goal is to investigate how a pitcher worked on a given day while explaining the underlying pitch data.
Overall, Santana threw 102 pitches over seven solid innings of work and held the Brew Crew scoreless. Johan scattered five hits, did not issue a free pass, and fanned seven of the 26 batters he faced. In his inaugural start at Citi Field, Santana began the game standing 60'6" away from Rickie Weeks. The legend for his pitches can be found beneath the horizontal axis on each of the graphs, and the pitches are numbered in sequential order. For reference: FF=Four-Seam Fastball, FT=Two-Seam Fastball, CH=Changeup, and SL=Slider. Play ball!
Before delving into the pitch sequence, the velocity readings, the movement data, or any of the other interesting components found in the Pitch-f/x system, it is imperative to understand that these graphs show the viewpoint of the catcher. Anyone reading this article right now has essentially taken on the role of the backstop. A very common misconception is that these location charts show the viewpoint of the pitcher, which turns the inside corner into the outside corner and lefty hitters into righties, skewing our readings in the process. Rickie Weeks, a right-handed batter, would be standing near the vertical axis in this chart.
Santana threw five pitches to Weeks, starting the game off with a two-seam fastball close to the outside corner for a called strike. Each pitch documented here carries with it a sequential number that corresponds to the velocity readings in the upper-right corner. The first velocity reading is taken at the point of release, and closely resembles the radar gun readings on television broadcasts. The second measure comes from the field "end_speed," which records the pitch velocity just before the ball crosses home plate. Some analysts previously preferred the second velocity reading due to an increased stability across all parks, though issues with camera calibrations have been rectified since the system's inception.
After the two-seamer and a pair of four-seam fastballs that missed for balls, Santana pulled the string on consecutive pitches. After fouling the first off, Weeks lined the second changeup right to Carlos Beltran in center field.
Corey Hart stepped up to the plate, a righty who, again, would be standing next to the vertical axis on this chart. Here we see Santana throw an inside slider early in the plate appearance, an interesting selection given his vastly decreased usage of the pitch; since 2006, Santana has replaced a good portion of his sliders with two-seam fastballs and changeups. In this particular game he delivered just five sliders, and never threw more than one to a single hitter. The location chart indicates that Santana stayed inside for most of the at-bat before rearing back and getting the swinging strikeout on what would be one of his fastest pitches of the outing.
Santana missed badly on the second pitch but stayed on the inside corner for most of this brain vs. Braun battle. After two straight balls, Santana got the call on a perfectly placed changeup. He wanted the call on that fourth pitch, but the umpire did not agree on the location and Braun boasted a 3-1 count. After another called strike, Braun singled up the middle on a four-seam fastball that was right down the middle of the plate. Runner on first, two outs, for the lefty Prince Fielder.
After mixing three or four pitches to the first three batters of the game, Johan went back to the basics against Fielder, staying away with four consecutive four-seam fastballs before tying him up inside with a devastatingly slower changeup. Fielder was no match for the 14 mph drop-off and whiffed, Santana's second swinging strikeout of the inning. Though he allowed just a meaningless single, Santana labored in the inning, throwing 21 total pitches. A bit of simple arithmetic shows that he became more efficient after the first frame, spreading the 81 remaining pitches over six innings, an average of slightly over 13 per inning.
Despite the same sequence of out, out, single, out, Santana threw just 10 total pitches in the second inning, fewer than half of his total from the opening frame. Instead of mixing and matching with his repertoire, he primarily stuck with the four-seam fastball and the changeup and stayed in the zone much more than in the first inning. Speaking of the zones (and yes, I purposely structured that sentence to segue), you may not have noticed, but the strike zone boxes in these charts have slight modifications for each hitter.
The staff and operators of the Pitch-f/x system set the zones differently depending on the height of the hitter. Unfortunately, the player-specific zones are not always consistent based on a few factors, most notable of which is that hitters do not always replicate their batting stance perfectly before each pitch. To work around this issue I tend to take the overall average of the "sz_top" and "sz_bot" fields for every plate appearance a player has amassed to that point. The zone data might not deviate very much for each of a player's plate appearances, but having a constant measure aids in the validity of studies, especially those dealing with pitches seen and swung at in and out of the zone.
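The averaging workaround described above might look something like the following minimal sketch. Only the "sz_top" and "sz_bot" field names come from the data set; the "batter" key, the function name, and the sample values are assumptions made up for illustration.

```python
from collections import defaultdict

def average_zones(records):
    """Return {batter: (mean sz_top, mean sz_bot)} across all of a batter's records."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # running top, bottom, count
    for rec in records:
        entry = totals[rec["batter"]]
        entry[0] += rec["sz_top"]
        entry[1] += rec["sz_bot"]
        entry[2] += 1
    return {b: (top / n, bot / n) for b, (top, bot, n) in totals.items()}

# Made-up zone readings, in feet, purely for illustration.
sample = [
    {"batter": "A", "sz_top": 3.4, "sz_bot": 1.6},
    {"batter": "A", "sz_top": 3.6, "sz_bot": 1.4},
    {"batter": "B", "sz_top": 3.2, "sz_bot": 1.5},
]
zones = average_zones(sample)
```

Using one fixed zone per batter trades a little per-pitch precision for consistency, which is the point of the workaround when counting pitches in and out of the zone.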
Santana continued to pitch efficiently in the third inning, breezing through the first two hitters on just three pitches. Opposing pitcher Yovani Gallardo took a four-seam fastball for a called strike and grounded out to third base on the next pitch. Weeks came up for the second time and flied out to left field on a first-pitch hanging changeup. Johan had thrown three or fewer pitches to four straight hitters before Corey Hart stepped in again. After Hart took a two-seamer down the middle for the first strike, Johan missed inside with a changeup. Hart fouled off a four-seamer on the inside corner, and then took a high and inside four-seamer to even the count at two balls and two strikes. Hart struck out swinging for the second time on a changeup that fell completely out of the zone with a velocity drop-off of 10 miles per hour.
Now seems like a good time to discuss the pitch classification system in place, given the various velocities on Santana's two-seam fastball. According to Eric Simon of Amazin' Avenue, Johan does in fact throw a two-seamer, and has worked to increase his usage over the latter part of last season and into the 2009 campaign; additionally, Johan has even been quoted as saying that it's his best pitch. There have been tremendous strides made since 2007 with the algorithm used to classify pitches; it can now identify pitch type in the data. This season, MLB Advanced Media added pitcher-specific profiles as well as speed indexing to increase the accuracy of their pitch identifications. The profile for Santana would consist of a four-seam fastball, a two-seam fastball, a slider, and a changeup. Last season there was no differentiation made between the seamed fastballs. Speed indexing goes hand in hand with the profiles and helps to prevent classification errors that could arise with uniform velocity benchmarks in place. The lack of uniformity helps to avoid situations where an 82 mph pitch is mistakenly called a changeup for a pitcher whose maximum fastball velocity is 84 mph. That pitch might be a changeup for Santana and his 90-plus velocity, but not for someone like Jamie Moyer.
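The idea behind speed indexing can be illustrated with a toy sketch. This is not MLB Advanced Media's algorithm, whose details are not described here; it only shows why a benchmark relative to each pitcher's own fastball behaves differently from a uniform one. The function names, the 0.93 cutoff, and the 92 mph figure standing in for a "90-plus" fastball are all assumptions.

```python
def speed_index(pitch_mph, max_fastball_mph):
    """Pitch velocity as a fraction of the pitcher's own top fastball velocity."""
    return pitch_mph / max_fastball_mph

def looks_off_speed(pitch_mph, max_fastball_mph, cutoff=0.93):
    """Hypothetical rule: call a pitch off-speed if it falls below the cutoff fraction."""
    return speed_index(pitch_mph, max_fastball_mph) < cutoff

# The same 82 mph pitch judged against two different pitchers.
soft_tosser = looks_off_speed(82, 84)   # max fastball of 84 mph, as in the text
hard_thrower = looks_off_speed(82, 92)  # assumed 92 mph max fastball
print(soft_tosser, hard_thrower)
```

Under these assumed numbers the 82 mph pitch reads as fastball-like for the 84 mph pitcher and as off-speed for the harder thrower, which is the distinction a uniform velocity benchmark would miss.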
Still, the algorithm in place is not 100 percent accurate, and some of Santana's two-seam fastballs merit further investigation. How could he go from 90 miles per hour with the pitch in the first inning to merely 80-81 miles per hour in the third and fourth innings? The likeliest scenario is that these lower-velocity two-seamers are actually changeups whose movement resembles the training data the system uses to characterize two-seam fastballs. This might not always be the case, and Johan may in fact have thrown some two-seam fastballs that were drastically slower than others, but a comparison of the movement components on the pitches could shed more light.
The above swinging strikeout with Bill Hall is actually the perfect example of a situation meriting a closer look. The first pitch to Hall, a changeup, came in at 80.2 mph with 7.1 inches of horizontal movement and 5.8 inches of vertical movement. The fourth and final pitch, classified as a two-seam fastball, was clocked at 81.3 mph with 7.0 inches of horizontal movement and 4.7 inches of vertical movement. Horizontal movement is often referred to as either PFX_X or PFX, while vertical movement can be found in most studies as PFX_Z or PFZ. Both components measure the amount of movement on the ball as compared to a pitch of the same velocity thrown with absolutely no spin. This can lead to some confusion, particularly with regard to the idea of a rising fastball. When we say that the two-seam fastball in question here had 4.7 inches of vertical movement, it does not mean that the ball rose by that amount, but rather that it dropped 4.7 inches less than a spinless pitch thrown at the same velocity would have dropped under gravity.
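To put the two Hall pitches side by side, a short sketch can compute the gap in each component. The velocity and movement figures are the ones quoted above; the Pitch class and the gaps helper are illustrative names, not part of the data set.

```python
from dataclasses import dataclass

@dataclass
class Pitch:
    label: str    # classification reported by the system
    mph: float    # velocity reading quoted in the text
    pfx_x: float  # horizontal movement, inches
    pfx_z: float  # vertical movement, inches

def gaps(a, b):
    """Absolute differences in (velocity, horizontal, vertical), rounded to 0.1."""
    return (
        round(abs(a.mph - b.mph), 1),
        round(abs(a.pfx_x - b.pfx_x), 1),
        round(abs(a.pfx_z - b.pfx_z), 1),
    )

first_pitch = Pitch("CH", 80.2, 7.1, 5.8)   # changeup to Hall
fourth_pitch = Pitch("FT", 81.3, 7.0, 4.7)  # labeled a two-seam fastball
print(gaps(first_pitch, fourth_pitch))  # (1.1, 0.1, 1.1)
```

The two pitches sit about a mile per hour and an inch of vertical movement apart, with nearly identical horizontal movement.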
Four-seam fastballs are usually short on the horizontal but heavy on vertical movement. The backspin on these hard fastballs generates lift that partly counteracts gravity, resulting in a straighter trajectory. Without the spin, pitches with similar velocity would drop much more. Conversely, it is quite common for two-seam fastballs to feature plenty of horizontal movement, breaking in on righties and away from lefties; picture that patented Greg Maddux fastball that righties convince themselves is out of the zone and lefties actually think will graze their jerseys. Many have wondered why lefties post positive horizontal movement marks while righties post negative numbers, and the reasoning deals with the orientation of the data relative to the catcher and umpire. Positive PFX_X data indicates movement toward the right of the catcher and umpire, the side from which lefties release the ball and toward which their fastballs and changeups tail. Right-handed hurlers release the ball from the left of the plate, and their arm-side movement corresponds to the negative numbers. For the purposes of calculating average movement and normalizing the data, take the absolute value.
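A small sketch shows why the absolute value matters when averaging. The numbers below are illustrative mirror-image samples, and the function name is an assumption.

```python
def mean_abs_horizontal(pfx_x_values):
    """Average horizontal movement in inches, ignoring the sign convention."""
    return sum(abs(v) for v in pfx_x_values) / len(pfx_x_values)

# Illustrative samples: one pitcher with positive readings, one with negative.
positive_side = [7.1, 7.0, 6.9]
negative_side = [-7.1, -7.0, -6.9]

combined = positive_side + negative_side
signed_mean = sum(combined) / len(combined)  # signs cancel out
abs_mean = mean_abs_horizontal(combined)     # magnitude is preserved
print(round(signed_mean, 2), round(abs_mean, 2))
```

Averaging the signed values across pitchers of both hands would suggest almost no horizontal movement at all, while the absolute values keep the roughly seven inches each sample actually shows.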
For pitchers who throw sinkers and two-seamers, the less vertical movement the better, as the ultimate goal is to sink out of the zone or get to an area at which hitters can do little more than foul the pitch off or beat it into the ground. Getting back to Santana's pitch selection above, the similar velocities and movement information on pitches one and four suggest that the final pitch to Bill Hall more closely resembled the changeup, especially given the fact that two-seam heaters do not normally possess substantially less velocity than four-seamers. Some pitchers may throw a 94 mph four-seamer and an 89 mph two-seamer, but it's unlikely that someone would experience a 10-plus mph drop-off between the two pitches. How did his two-seamers fare in the velocity department for the rest of the inning?
Three more of those two-seam fastballs to Kendall and Gallardo, all tightly grouped in a range of 90.6 to 90.8 mph, further indicating that the slower pitches documented earlier were indeed changeups. The misclassification tells us that those changeups did not move as much as a normal Santana changeup, so while they were actually part of his off-speed repertoire, their appearances more closely resembled the two-seamer. This leads into the ongoing debate of whether or not the officially determined pitch titles matter more than their appearance when classifying them. If Santana swears he throws a curveball but it moves exactly like a slider, should the pitch be classified based on how it appears to the hitter, or based on the pitcher's self-perception?
The system also produces a field called "type_confidence," which displays how confident it feels about the classification of a pitch. Higher confidence coefficients increase the likelihood of a correct classification. The final pitch to Bill Hall in this fifth inning carried with it a rather low confidence mark of 0.624. The pitch data came closest to matching what the system knows about two-seamers, but it is in no way an absolute label.
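One way to put the confidence field to work is to flag low-confidence classifications for a manual look. The sketch below assumes a cutoff of 0.7, which is an arbitrary choice for illustration; only "type_confidence" and the 0.624 reading come from the text, and the other keys and values are made up.

```python
def flag_for_review(pitches, min_confidence=0.7):
    """Return the pitches whose type_confidence falls below the cutoff."""
    return [p for p in pitches if p["type_confidence"] < min_confidence]

sample = [
    {"id": 1, "pitch_type": "FF", "type_confidence": 0.95},   # made-up value
    {"id": 2, "pitch_type": "FT", "type_confidence": 0.624},  # final pitch to Hall
    {"id": 3, "pitch_type": "CH", "type_confidence": 0.88},   # made-up value
]
flagged = flag_for_review(sample)
print([p["id"] for p in flagged])  # [2]
```

A flagged pitch is not necessarily misclassified; it is simply a candidate for the kind of velocity and movement comparison described earlier.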
After 16 pitches in the sixth inning, Santana had thrown a grand total of 91, with the following breakdown by inning: 21, 10, 8, 17, 19, and 16. Despite modestly higher than desired pitch counts in all but the second and third frames, Santana had not faced more than four batters in a single inning. The second pitch to Corey Hart is but another changeup classified as a two-seamer, but Santana stuck primarily to the four-seamer and changeup when squaring off against Braun, Fielder, and Hardy. It has become increasingly clear that he left the slider by the wayside in this outing, instead sticking to the two different fastballs and changeups.
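The pitch-count arithmetic is easy to verify. The inning-by-inning counts and the 102-pitch total are taken from the text; the variable names are illustrative.

```python
# Pitch counts per inning through six, as listed above.
pitches_by_inning = [21, 10, 8, 17, 19, 16]
total_for_outing = 102  # full-game total reported earlier in the article

through_six = sum(pitches_by_inning)
left_for_seventh = total_for_outing - through_six

# Average over the six innings that followed the 21-pitch first frame.
after_first = total_for_outing - pitches_by_inning[0]
average_after_first = after_first / 6

print(through_six, left_for_seventh, average_after_first)  # 91 11 13.5
```

The 81 pitches after the first inning work out to 13.5 per inning, in line with the earlier figure of slightly over 13.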
Did Santana's release point stay consistent on each offering in his repertoire? The "z0" column in the data set measures the vertical height of the ball at its position of release, and pitchers strive to stay consistent in this area in order to avoid any sort of tell. The height of Santana's release on two-seamers and changeups stayed the same, right around 5.78 feet, but his four-seam fastballs soared toward the plate from a height of 5.93 feet. The discrepancy translates to a little below two inches, which is not necessarily anything to worry about; my pinky, which I like to think is fairly average in size, is a smidgen over two inches tall.
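The conversion from feet to inches behind that comparison is a one-liner. The two release heights are the figures quoted above; the variable names are illustrative.

```python
INCHES_PER_FOOT = 12

four_seam_release_ft = 5.93          # average z0 on four-seam fastballs
two_seam_changeup_release_ft = 5.78  # average z0 on two-seamers and changeups

gap_inches = (four_seam_release_ft - two_seam_changeup_release_ft) * INCHES_PER_FOOT
print(round(gap_inches, 1))  # 1.8
```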
Santana broke from the previously established mold in his final inning on the mound, incorporating the slider more often and retiring Cameron, Hall, and Kendall on a mere 11 pitches. As the 26 batter charts show, Santana stayed in the strike zone and exhibited both command and control, locating his pitches and working side to side throughout the game. One valuable aspect of graphing the pitch locations as we did here is the ability to compare the deliveries to hitters at various points throughout the game. For instance, Santana faced Jason Kendall three different times, but he seemed to stick to the same game plan each time, hammering the inside corner. Likewise, he faced Prince Fielder on three different occasions and stayed completely away, avoiding the inside corner at all costs. His closest pitch to the inside corner against Fielder came in their second meeting, when Fielder singled on a two-seamer down the middle.
Here is a chart detailing the flight patterns for each of Santana's primary pitches from the view of the first baseman:
This graph essentially shows the average position of the pitch at different time intervals along its course to home plate. The calculations factor in the initial point of vertical release, the acceleration of the pitch in feet per second squared, and the velocity of the pitch in feet per second as opposed to miles per hour. As expected, the two-seam fastball drops much more than the four-seamer due to a lesser amount of vertical movement. The vertical movement, as we discussed above, signals that the pitch dropped less than it would have had gravity acted alone, and when breaking balls record negative vertical movement, the data is indicating that the spin on the pitch caused a drop greater than what would have occurred had gravity acted alone.
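A chart like this can be approximated with constant-acceleration kinematics, which is one plausible reading of the calculation described above rather than a confirmed reproduction of it. In the sketch below, only the 5.93-foot release height comes from the text; the names vz0 and az, the initial vertical velocity, the acceleration, and the 0.4-second flight time are assumptions chosen for illustration.

```python
def mph_to_fps(mph):
    """Convert miles per hour to feet per second."""
    return mph * 5280 / 3600

def position_at(t, start, velocity, acceleration):
    """Position after t seconds under constant acceleration."""
    return start + velocity * t + 0.5 * acceleration * t ** 2

def side_view_path(z0, vz0, az, flight_time, steps=5):
    """Evenly spaced (time, height) samples from release to the plate."""
    times = [flight_time * i / steps for i in range(steps + 1)]
    return [(t, position_at(t, z0, vz0, az)) for t in times]

# Assumed inputs: released at 5.93 ft, moving downward at 4 ft/s,
# with a net vertical acceleration of -15 ft/s^2 over a 0.4 s flight.
path = side_view_path(z0=5.93, vz0=-4.0, az=-15.0, flight_time=0.4)
for t, height in path:
    print(round(t, 2), round(height, 2))
```

A pitch with more vertical movement would carry a smaller downward acceleration, so its path would finish higher at the plate. That is the difference the chart shows between the four-seamer and the two-seamer.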
I am not naïve enough to think that all of the misunderstandings or confusion surrounding this data will evaporate following the publication of this article. However, there are certainly things that Pitch-f/x analysts can do to increase universal levels of understanding. One simple way would involve using words either to support, or in place of, certain numbers. Which statement paints a more descriptive picture: Santana throws his changeup with 7.1 inches of horizontal movement, or Santana's changeup breaks sharply away from righties? My tone likely betrays my feeling that the second statement does a better job of describing the information, and is therefore more valuable for scouting and evaluative purposes. Start with the description, and then use the numbers for further clarification.
Still only a toddler, the Pitch-f/x system has already provided several new avenues of exploration that will increase in scope as the years roll on. Because analysts are still learning to harness the system's inner workings, it may be several years before the information reaches its true potential, but it is imperative that the data be handled carefully in these formative years; nothing it tells us should be treated as gospel without intense examination, in order to avoid making inaccurate claims. We're currently in a very exciting age for baseball research, but many avid baseball fans will not be able to share in the excitement unless the information becomes more comprehensible. Hopefully, by exploring Johan's performance last week, some questions have been answered; if not, please feel free to e-mail me or post anything else of interest in the comments section, and I will do my best to clear up any misunderstandings or confusion.