Due to local blackout rules and the lack of a land-line phone capable of proving that my Penn State University residence was not in Philadelphia, I relied on MLB Gameday instead of MLB TV for a good chunk of the 2007 season. The application had been around for a while, but I soon noticed strange terminology and new data accompanying each pitch. Why are there two velocity readings? What does 13″ of pFX mean? And what the heck is BRK? A little research soon made sense of the information, and within a few months I became hooked on the data set known as Pitch-f/x. Fast-forward two years, and Pitch-f/x continues to evolve, revolutionizing baseball research in the process. Unfortunately, with updates to system configurations and the amount of information offered, too many readers and baseball fans experience confused reactions similar to mine when they first encounter the data. In an attempt to quash this issue, it seemed prudent to explain some of the more commonly used numbers, discussing what they mean as well as how they should be used. Instead of merely defining terms, the system will be explored in action, with periodic discussions of its inner workings, much as Dan Fox did back in May 2007.

The season is still quite young, and the samples of statistics are nowhere near the requisite size for drawing any lasting conclusions, so there is no better time than now to repeat Dan’s process, incorporating some of the newer additions to the data. With that in mind, we are going to revisit Johan Santana’s dominance of the Milwaukee Brewers on April 18 as seen through the lens of Pitch-f/x. Keep the following disclaimer in mind: no conclusions about Santana’s early success will be drawn here, as the lone goal involves investigating how a pitcher worked on a given day, while explaining the underlying pitch data.

Overall, Santana threw 102 pitches over seven solid innings of work and held the Brew Crew scoreless. Johan scattered five hits, did not issue a free pass, and fanned seven of the 26 batters he faced. In his inaugural start at Citi Field, Santana began the game standing 60’6″ away from Rickie Weeks. The legend for his pitches can be found beneath the horizontal axis on each of the graphs, and the pitches are numbered in sequential order. For reference: FF=Four-Seam Fastball, FT=Two-Seam Fastball, CH=Changeup, and SL=Slider. Play ball!

graph

Before delving into the pitch sequence, the velocity readings, the movement data, or any of the other interesting components found in the Pitch-f/x system, it is imperative to understand that these graphs show the viewpoint of the catcher. Anyone reading this article right now has essentially taken on the role of the backstop. A very common misconception is that these location charts show the viewpoint of the pitcher, which turns the inside corner into the outside corner and lefty hitters into righties, skewing our readings in the process. Rickie Weeks, a right-handed batter, would be standing near the vertical axis in this chart.

Santana threw five pitches to Weeks, starting the game off with a two-seam fastball close to the outside corner for a called strike. Each pitch documented here carries with it a sequential number that corresponds to the velocity readings in the upper-right corner. The first velocity reading is taken at the point of release, and closely resembles the radar gun readings on television broadcasts. The second measure comes from the field “end_speed,” which records the pitch velocity just before the ball crosses home plate. Some analysts previously preferred the second velocity reading due to an increased stability across all parks, though issues with camera calibrations have been rectified since the system’s inception.

After the two-seamer and a pair of four-seam fastballs that missed for balls, Santana pulled the string on consecutive pitches. After fouling the first off, Weeks lined the second changeup right to Carlos Beltran in center field.

graph

Corey Hart stepped up to the plate, a righty who, again, would be standing next to the vertical axis on this chart. Here we see Santana throw an inside slider early in the plate appearance, an interesting selection given his vastly decreased usage of the pitch; since 2006, Santana has replaced a good portion of his sliders with two-seam fastballs and changeups. In this particular game he delivered just five sliders, and never threw more than one to a single hitter. The location chart indicates that Santana stayed inside for most of the at-bat before rearing back and getting the swinging strikeout on what would be one of his fastest pitches of the outing.

graph

Santana missed badly on the second pitch but stayed on the inside corner for most of this brain vs. Braun battle. After two straight balls, Santana got the call on a perfectly placed changeup. He wanted the call on that fourth pitch, but the umpire did not agree on the location and Braun boasted a 3-1 count. After another called strike, Braun singled up the middle on a four-seam fastball that was right down the middle of the plate. Runner on first, two outs, for the lefty Prince Fielder.

graph

After mixing three or four pitches to the first three batters of the game, Johan went back to the basics against Fielder, staying away with four consecutive four-seam fastballs before tying him up inside with a devastatingly slower changeup. Fielder was no match for the 14 mph drop-off and whiffed, Santana’s second swinging strikeout of the inning. Though he allowed just a meaningless single, Santana labored in the inning, throwing 21 total pitches. A bit of simple arithmetic shows that he became more efficient after the first frame, spreading the 81 remaining pitches over six innings, an average of slightly over 13 per inning.

graph

graph

graph

graph

Despite the same sequence of out, out, single, out, Santana threw just 10 total pitches in the second inning, less than half of his total from the opening frame. Instead of mixing and matching with his repertoire, he primarily stuck with the four-seam fastball and the changeup and stayed in the zone much more than in the first inning. Speaking of the zones (and yes, I purposely structured that sentence to segue), you may not have noticed, but the strike zone boxes in these charts have slight modifications for each hitter.

The staff and operators of the Pitch-f/x system set the zones differently depending on the height of the hitter. Unfortunately, the player-specific zones are not always consistent based on a few factors, most notable of which is that hitters do not always replicate their batting stance perfectly before each pitch. To work around this issue I tend to take the overall average of the “sz_top” and “sz_bot” fields for every plate appearance a player has amassed to that point. The zone data might not deviate very much for each of a player’s plate appearances, but having a constant measure aids in the validity of studies, especially those dealing with pitches seen and swung at in and out of the zone.
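As a rough illustration of that workaround, here is a minimal Python sketch that averages the "sz_top" and "sz_bot" fields per hitter. Only those two field names come from the data set; the batter labels, the sample values, and the `average_zones` helper are hypothetical, and the sketch averages over individual pitch records rather than whole plate appearances.

```python
from collections import defaultdict

# Hypothetical per-pitch records. Only the field names "sz_top" and "sz_bot"
# come from the Pitch-f/x data; the batter labels and values are made up.
pitches = [
    {"batter": "A", "sz_top": 3.40, "sz_bot": 1.60},
    {"batter": "A", "sz_top": 3.50, "sz_bot": 1.50},
    {"batter": "B", "sz_top": 3.20, "sz_bot": 1.45},
    {"batter": "A", "sz_top": 3.30, "sz_bot": 1.70},
    {"batter": "B", "sz_top": 3.30, "sz_bot": 1.55},
]

def average_zones(records):
    """Return {batter: (mean sz_top, mean sz_bot)} over all records."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running top, bottom, count
    for rec in records:
        acc = sums[rec["batter"]]
        acc[0] += rec["sz_top"]
        acc[1] += rec["sz_bot"]
        acc[2] += 1
    return {b: (top / n, bot / n) for b, (top, bot, n) in sums.items()}

zones = average_zones(pitches)
for batter, (top, bot) in sorted(zones.items()):
    print(f"{batter}: top {top:.2f} ft, bottom {bot:.2f} ft")
```

Using one fixed zone per hitter keeps in-zone and out-of-zone counts comparable from pitch to pitch, at the cost of ignoring genuine changes in stance.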

graph

graph

graph

Santana continued to pitch efficiently in the third inning, breezing through the first two hitters on just three pitches. Opposing pitcher Yovani Gallardo took a four-seam fastball for a called strike and grounded out to third base on the next pitch. Weeks came up for the second time and flied out to left field on a first-pitch hanging changeup. Johan had thrown three or fewer pitches to four straight hitters before Corey Hart stepped in again. After Hart took a two-seamer down the middle for the first strike, Johan missed inside with a changeup. Hart fouled off a four-seamer on the inside corner, and then took a high and inside four-seamer to even the count at two balls and two strikes. Hart struck out swinging for the second time on a changeup that fell completely out of the zone with a velocity drop-off of 10 miles per hour.

graph

graph

graph

graph

Now seems like a good time to discuss the pitch classification system in place, given the various velocities on Santana’s two-seam fastball. According to Eric Simon of Amazin’ Avenue, Johan does in fact throw a two-seamer, and has worked to increase his usage over the latter part of last season and into the 2009 campaign; additionally, Johan has even been quoted as saying that it’s his best pitch. There have been tremendous strides made since 2007 with the algorithm used to classify pitches; it can now identify pitch type in the data. This season, MLB Advanced Media added pitcher-specific profiles as well as speed indexing to increase the accuracy of their pitch identifications. The profile for Santana would consist of a four-seam fastball, a two-seam fastball, a slider, and a changeup. Last season there was no differentiation made between the seamed fastballs. Speed indexing goes hand in hand with the profiles and helps to prevent classification errors that could arise with uniform velocity benchmarks in place. The lack of uniformity helps to avoid situations where an 82 mph pitch is mistakenly called a changeup for a pitcher whose maximum fastball velocity is 84 mph. That pitch might be a changeup for Santana and his 90-plus velocity, but not for someone like Jamie Moyer.

Still, the algorithm in place is not 100 percent accurate, and some of Santana’s two-seam fastballs merit further investigation. How could he go from 90 miles per hour with the pitch in the first inning to merely 80-81 miles per hour in the third and fourth innings? The likeliest scenario is that these lower-velocity two-seamers are actually changeups whose movement resembles the training data that the system uses to characterize two-seam fastballs. This might not always be the case, and Johan may in fact have thrown some two-seam fastballs that were drastically slower than others, but a comparison of the movement components on the pitches could definitely shed more light.

graph

The above swinging strikeout of Bill Hall is actually the perfect example of a situation meriting a closer look. The first pitch to Hall, a changeup, came in at 80.2 mph with 7.1 inches of horizontal movement and 5.8 inches of vertical movement. The fourth and final pitch, classified as a two-seam fastball, was clocked at 81.3 mph with 7.0 inches of horizontal movement and 4.7 inches of vertical movement. Horizontal movement is often referred to as either PFX_X or PFX, while vertical movement appears in most studies as PFX_Z or PFZ. Both components measure the amount of movement on the ball as compared to a pitch of the same velocity thrown with absolutely no spin. This can lead to some confusion, particularly with regard to the idea of a rising fastball. When we say that the two-seam fastball in question here had 4.7 inches of vertical movement, it does not mean that the ball rose by that specific amount, but rather that it dropped 4.7 inches less than a pitch thrown at the same velocity with no spin would have dropped under the force of gravity.
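To put a rough number on that idea, here is a back-of-the-envelope Python sketch. It assumes a constant-speed pitch covering 55 feet with no drag, which is a simplification and not how the system itself computes its no-spin reference; the distance, the helper name, and the constants are my own assumptions. The 81.3 mph and 4.7-inch figures are the ones quoted above.

```python
G_FT_PER_S2 = 32.174        # standard gravity, in feet per second squared
MPH_TO_FPS = 5280 / 3600    # miles per hour -> feet per second

def gravity_only_drop_inches(speed_mph, distance_ft=55.0):
    """Drop, in inches, of a ball falling under gravity alone while it
    covers distance_ft at a constant speed_mph (drag is ignored)."""
    flight_time = distance_ft / (speed_mph * MPH_TO_FPS)
    return 0.5 * G_FT_PER_S2 * flight_time ** 2 * 12.0

no_spin_drop = gravity_only_drop_inches(81.3)  # simplified no-spin reference
pfx_z = 4.7                                    # vertical movement, in inches
with_spin_drop = no_spin_drop - pfx_z          # the pitch still drops, just less

print(f"gravity-only drop: {no_spin_drop:.1f} in")
print(f"drop with spin:    {with_spin_drop:.1f} in")
```

Under these assumptions the pitch still falls roughly three feet on its way to the plate; the positive vertical movement only trims a few inches off that drop.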

Four-seam fastballs are usually short on the horizontal but heavy on vertical movement. The backspin on these hard fastballs partially counteracts the pull of gravity, resulting in a straighter trajectory. Without the spin, pitches with similar velocity would drop much more. Conversely, it is quite common for two-seam fastballs to feature plenty of horizontal movement, breaking in on righties and away from lefties; picture that patented Greg Maddux fastball that righties convince themselves is out of the zone and lefties actually think will graze their jerseys. Many have wondered why lefties post negative horizontal movement marks while righties post positive numbers, and the reasoning deals with the position of home plate relative to the catcher and umpire. Positive PFX_X data indicates that the pitch came from the right of the catcher and umpire, which is where lefties release the ball. Right-handed hurlers release the ball from the left of the plate, corresponding to the negative numbers. For the purposes of calculating average movement and normalizing the data, take the absolute value.
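A small Python sketch of that normalization step follows. The readings are hypothetical values in inches, and `average_horizontal_break` is an illustrative helper, not part of any Pitch-f/x tooling.

```python
from statistics import mean

# Hypothetical PFX_X readings in inches: the same amount of break,
# recorded with opposite signs for a lefty and a righty.
lefty_pfx_x = [7.1, 7.0, 6.5]
righty_pfx_x = [-7.1, -7.0, -6.5]

def average_horizontal_break(pfx_x_values):
    """Average size of the horizontal movement, ignoring its sign."""
    return mean(abs(value) for value in pfx_x_values)

lefty_avg = average_horizontal_break(lefty_pfx_x)
righty_avg = average_horizontal_break(righty_pfx_x)
print(f"lefty {lefty_avg:.2f} in, righty {righty_avg:.2f} in")
```

Averaging the signed values across both groups would cancel toward zero, which is why the sign is dropped before pitchers of different handedness are pooled.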

For pitchers who throw sinkers and two-seamers, the less vertical movement the better, as the ultimate goal is to sink out of the zone or get to an area at which hitters can do little more than foul the pitch off or beat it into the ground. Getting back to Santana’s pitch selection above, the similar velocities and movement information on pitches one and four suggest that the final pitch to Bill Hall more closely resembled the changeup, especially given the fact that two-seam heaters do not normally possess substantially less velocity than four-seamers. Some pitchers may throw a 94 mph four-seamer and an 89 mph two-seamer, but it’s unlikely that someone would experience a 10-plus mph drop-off between the two pitches. How did his two-seamers fare in the velocity department for the rest of the inning?

graph

graph

graph

Three more of those two-seam fastballs went to Kendall and Gallardo, all tightly grouped in a range of 90.6 to 90.8 mph, further indicating that the slower pitches documented earlier were indeed changeups. The misclassification tells us that those changeups did not move as much as a normal Santana changeup, so while they were actually part of his off-speed repertoire, their appearances more closely resembled the two-seamer. This leads into the ongoing debate of whether or not the officially determined pitch titles matter more than their appearance when classifying them. If Santana swears he throws a curveball but it moves exactly like a slider, should the pitch be classified based on how it appears to the hitter, or based on the pitcher’s self-perception?
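One simple way to surface candidates like these is to compare each two-seamer against the pitcher's typical two-seam velocity. The Python sketch below is only an illustration: the 5 mph cutoff and the `suspicious_fastballs` helper are assumptions of mine, and the 90.7 reading is a made-up value inside the 90.6 to 90.8 range, while the other velocities are the ones cited in this article.

```python
from statistics import median

# Velocities (mph) of pitches labeled as two-seam fastballs. The 90.0, 81.3,
# 90.6, and 90.8 readings appear in the article; 90.7 is a hypothetical
# value inside the reported 90.6-90.8 range.
two_seam_mph = [90.0, 81.3, 90.6, 90.7, 90.8]

MAX_GAP_MPH = 5.0  # assumed cutoff, not a Pitch-f/x setting

def suspicious_fastballs(speeds, max_gap=MAX_GAP_MPH):
    """Return the speeds that sit more than max_gap below the median."""
    typical = median(speeds)
    return [s for s in speeds if typical - s > max_gap]

suspects = suspicious_fastballs(two_seam_mph)
print(suspects)
```

A velocity check like this only nominates pitches for a closer look; the movement components still need to be compared before relabeling anything.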

The system also produces a field called “type_confidence,” which displays how confident it feels about the classification of a pitch. Higher confidence coefficients increase the likelihood of a correct classification. The final pitch to Bill Hall in this fifth inning carried with it a rather low confidence mark of 0.624. The pitch data came closest to matching what the system knows about two-seamers, but it is in no way an absolute label.
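A minimal Python sketch of how the confidence field might be used to queue pitches for manual review is shown here. The "type_confidence" name and the 0.624 mark come from the data discussed above; the "pitch_type" key, the other confidence values, and the 0.70 threshold are assumptions for illustration.

```python
# Hypothetical pitch records. The field name "type_confidence" and the 0.624
# value come from the article; the "pitch_type" key, the other values, and
# the threshold below are illustrative assumptions.
pitches = [
    {"pitch_type": "CH", "type_confidence": 0.91},
    {"pitch_type": "FF", "type_confidence": 0.95},
    {"pitch_type": "FT", "type_confidence": 0.624},
]

REVIEW_THRESHOLD = 0.70  # assumed cutoff for a second look

def needs_review(records, threshold=REVIEW_THRESHOLD):
    """Return the records whose classification confidence is below threshold."""
    return [rec for rec in records if rec["type_confidence"] < threshold]

flagged = needs_review(pitches)
for rec in flagged:
    print(rec["pitch_type"], rec["type_confidence"])
```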

graph

graph

graph

graph

After 16 pitches in the sixth inning, Santana had thrown a grand total of 91, with the following breakdown by inning: 21, 10, 8, 17, 19, and 16. Despite modestly higher than desired pitch counts in all but the second and third frames, Santana had not faced more than four batters in a single inning. The second pitch to Corey Hart is but another changeup classified as a two-seamer, but Santana stuck primarily to the four-seamer and changeup when squaring off against Braun, Fielder, and Hardy. It has become increasingly clear that he left the slider by the wayside in this outing, instead sticking to the two different fastballs and changeups.
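The pitch-count arithmetic is easy to check with a few lines of Python. The inning-by-inning counts and the 102-pitch game total are the figures reported in this article; the variable names are just for illustration.

```python
# Pitch counts for innings one through six, as listed above.
pitches_by_inning = [21, 10, 8, 17, 19, 16]
game_total = 102  # total pitches over seven innings

total_through_six = sum(pitches_by_inning)
seventh_inning = game_total - total_through_six
after_first = game_total - pitches_by_inning[0]
average_after_first = after_first / 6  # innings two through seven

print(total_through_six, seventh_inning, after_first, average_after_first)
```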

Did Santana’s release point stay consistent on each offering in his repertoire? The “z0” column in the data set measures the vertical height of the ball at its position of release, and pitchers strive to stay consistent in this area in order to avoid any sort of tell. The height of Santana’s release on two-seamers and changeups stayed the same, right around 5.78 feet, but his four-seam fastballs soared toward the plate from a height of 5.93 feet. The discrepancy translates to a little below two inches, which is not necessarily anything to worry about; my pinky, which I like to think is fairly average in size, is a smidgen over two inches tall.
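The conversion behind that comparison can be written out in Python. The release heights are the averages quoted above; the dictionary layout and the `release_gap_inches` helper are illustrative assumptions.

```python
INCHES_PER_FOOT = 12

# Average vertical release heights (the "z0" field), in feet, as quoted above.
release_height_ft = {"FF": 5.93, "FT": 5.78, "CH": 5.78}

def release_gap_inches(heights, pitch_a, pitch_b):
    """Difference in release height between two pitch types, in inches."""
    return (heights[pitch_a] - heights[pitch_b]) * INCHES_PER_FOOT

gap = release_gap_inches(release_height_ft, "FF", "CH")
print(f"{gap:.1f} inches")
```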

graph

graph

graph

Santana broke from the previously established mold in his final inning on the mound, incorporating the slider more often and retiring Cameron, Hall, and Kendall on a mere 11 pitches. As the 26 batter charts show, Santana stayed in the strike zone and exhibited both command and control, locating his pitches and working side to side throughout the game. One valuable aspect of graphing the pitch locations as we did here is the ability to compare the deliveries to hitters at various points throughout the game. For instance, Santana faced Jason Kendall three different times, but he seemed to stick to the same game plan each time, hammering the inside corner. Likewise, he faced Prince Fielder on three different occasions and stayed completely away, avoiding the inside corner at all costs. His closest pitch to the inside corner against Fielder came in their second meeting, when Fielder singled on a two-seamer down the middle.

Here is a chart detailing the flight patterns for each of Santana’s primary pitches from the view of the first baseman:

graph

This graph essentially shows the average position of the pitch at different time intervals along its course to home plate. The calculations factor in the initial point of vertical release, the acceleration of the pitch in feet per second squared, and the velocity of the pitch in feet per second as opposed to miles per hour. As expected, the two-seam fastball drops much more than the four-seamer due to a lesser amount of vertical movement. The vertical movement, as we discussed above, signals that the pitch dropped less than it would have had gravity acted alone, and when breaking balls record negative vertical movement, the data is indicating that the spin on the pitch caused a drop greater than what would have occurred had gravity acted alone.
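For readers who want to reproduce a side view like this, here is a minimal Python sketch built on the standard constant-acceleration equation, position = p0 + v0*t + 0.5*a*t^2. It is not the spreadsheet behind the chart: the 0.025-second step, the sample velocities and accelerations, the 50-foot release distance, and the function names are all assumptions, and only the 5.93-foot release height echoes a figure from this article.

```python
def position(p0, v0, a, t):
    """Constant-acceleration position: p0 + v0*t + 0.5*a*t^2."""
    return p0 + v0 * t + 0.5 * a * t * t

def side_view_path(y0, z0, vy0, vz0, ay, az, dt=0.025, max_steps=400):
    """Return (t, y, z) points from release until the ball reaches y <= 0.

    y is the distance from home plate in feet and z is the height in feet,
    which is what a viewer standing near first base would see.
    """
    path = []
    for step in range(max_steps):
        t = step * dt
        y = position(y0, vy0, ay, t)
        z = position(z0, vz0, az, t)
        path.append((t, y, z))
        if y <= 0.0:  # the ball has reached the plate
            break
    return path

# Hypothetical averages for one pitch type (feet, ft/s, ft/s^2).
path = side_view_path(y0=50.0, z0=5.93, vy0=-132.0, vz0=-5.0, ay=28.0, az=-15.0)
for t, y, z in path[::4]:
    print(f"t={t:.3f}s  distance={y:6.2f} ft  height={z:.2f} ft")
```

Plotting the height against the distance for each pitch type's averages would give one curve per pitch, comparable to the chart above.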

I am not naïve enough to think that all of the misunderstandings or confusion surrounding this data will evaporate following the publication of this article. However, there are certainly things that Pitch-f/x analysts can do to increase universal levels of understanding. One simple way would involve using words either to support, or in place of, certain numbers. Which statement paints a more descriptive picture: Santana throws his changeup with 7.1 inches of horizontal movement, or Santana’s changeup breaks sharply away from righties? My tone likely betrays my feeling that the second statement does a better job of describing the information, and is therefore more valuable for scouting and evaluative purposes. Start with the description, and then use the numbers for further clarification.

Still only a toddler, the Pitch-f/x system has already provided several new avenues of exploration that will increase in scope as the years roll on. Due to the continued harnessing of the system’s inner workings, it may be several years before the information reaches its true potential, but it is imperative that the data be handled carefully in these formative years; nothing it tells us should be treated as gospel without intense examination, in order to avoid making inaccurate claims. We’re currently in a very exciting age for baseball research, but many avid baseball fans will not be able to share in the excitement unless the information becomes more comprehensible. Hopefully, by exploring Johan’s performance last week, some questions have been answered; if not, please feel free to e-mail me or post anything else of interest in the comments section, and I will do my best to clear up any misunderstandings or confusion.

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.

llewdor
4/23
If you'd like a good recent example of Pitch F/X largely failing to identify pitches, take a look at Jarrod Washburn's start on April 21. The pitch identification algorithm labels 8 different pitches, which he absolutely did not throw.
PWHjort
4/23
Haha, Johan Santana loves the 0-0 changeup.
MountainCat
4/23
Muzzle velocity versus terminal velocity? I have a very hard time understanding why you can have 2 pitches initially thrown 90 mph and one of them slows to 85 by the time it crosses home plate and the other slows to 83. Other than gravity, which acts in a perpendicular direction, the main force acting on the ball during its flight is the friction from the air. That would seem to be the same for both pitches. I can understand why a pitch that starts out at 80 mph might not slow down as much as one that starts at 90 but not two pitches that start at the same speed. Any chance that what we are seeing here is measurement error?
boards
4/23
I would assume wind conditions do not remain constant throughout the game, even throughout an at bat.
yteimlad
4/24
Spin would also have an effect- not every pitch will be thrown with the same rpm, even if initial velocity is the same.
mnsportsguy1
4/23
Where can people access this Pitch-f/x data?
EJSeidman
4/23
The data can be found through mlb.com, at this URL: http://12.130.102.19/components/game/int/year_2009/. If you don't have database experience you can find individual game data there. If you want to just take a look at the data, Dan Brooks has a wonderful website that can be found here: http://www.brooksbaseball.net/pfx/.

PujolsEsElHombre
4/23
Cool stuff. How did you graph the flight patterns?
EJSeidman
4/23
I took the average PITCHf/x data for each of his pitches and plugged it into a spreadsheet I have that calculates the position of the pitch based on the average data along its course to home plate, factoring in the position of initial release, the acceleration, and the feet per second form of velocity at various different intervals.

Essentially it's showing that if a ball is released at Point A at Time 0 with Acceleration B and Velocity C, then at Time 0.025 it will be at Point B... at Time 0.050 it will be at Point C, etc, all the way up until it crosses/passes home plate.
kmbart
4/24
How is the vertical movement factored at each stage across the path of the pitch? In other words, if a pitch has a PFX_Z of 5, and that means that it dropped 5" less than an otherwise identical spin-less pitch, that's just over the entire path of the pitch--how does your spreadsheet distribute each bit of downward vertical movement that the ball DOES have at each time interval? It seems pretty clear from the graph that the pitches fall in each time interval as they approach the plate.

Thanks for the great read, just trying to understand a little better.
EJSeidman
4/24
kmbart, let me start by saying the PFZ is not factored into the flight trajectories. The fields of interest for the trajectories are x0, y0, z0, vx0, vy0, vz0, and ax0, ay0, az0. x0 is the horizontal position of the ball at its initial release point and z0 is the vertical initial release. y0 measures the distance from which the ball is released. In the dataset the standard is 50 ft though some analysts prefer to backtrack everything to 55 ft. The ax0, ay0, and az0 fields measure the acceleration of the pitch in feet per second squared in three dimensions, and vx0, vy0, and vz0 measure its velocity in feet per second. To find the start_speed, or what we know as velocity, you take the square root of the sum of the squares of vx0, vy0, and vz0.

To calculate the horizontal trajectory we return to formulas probably seen in high school/collegiate physics but thrown by the wayside as soon as the final exam ended:

Horizontal Trajectory = x0+vx0*t+0.5*ax0*(t*t), where t = the time interval. The other trajectories are calculated similarly, except you substitute y and z for x in each instance.

So when you see my above flight pattern for Santana, the horizontal was excluded because you cannot see that from the first base view... from that view, the vertical trajectory was most important. Note that the formula with the y's instead of x's and z's is used regardless, but for first base views we use it in conjunction with the z's. Looking from an angle to see horizontal trajectory we would use the x's and y's, not the y's and z's.
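As a minimal Python sketch of the start_speed calculation described in this reply: the component values below are hypothetical, and the conversion to miles per hour is added so the result can be compared with the readings quoted in the article.

```python
import math

FPS_PER_MPH = 5280 / 3600  # feet per second in one mile per hour

def start_speed_mph(vx0, vy0, vz0):
    """Speed at release: the magnitude of the velocity vector, in mph."""
    speed_fps = math.sqrt(vx0 ** 2 + vy0 ** 2 + vz0 ** 2)
    return speed_fps / FPS_PER_MPH

# Hypothetical velocity components in feet per second.
speed = start_speed_mph(vx0=-6.0, vy0=-132.0, vz0=-4.0)
print(f"{speed:.1f} mph")
```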

MHaywood1025
4/28
Is there a reason why the "graph" of the playing field (x,y,and z planes) is taken from a bird's eye view while all of the pitch data graphs are seen from behind the plate?

The way the graphs are set up, it seems more intuitive to have horizontal movement be the x-plane, vertical movement the y-plane, and then your z be the distance from the ball to home plate.
sparendo
4/23
Are you sure about this?

"Many have wondered why lefties post negative horizontal movement marks while righties post positive numbers, and the reasoning deals with the position of home plate relative to the catcher and umpire. Positive PFX_X data indicates that the pitch came from the right of the catcher and umpire, which is where lefties release the ball. Right-handed hurlers release the ball from the left of the plate, corresponding to the negative numbers."

I was under the impression that the PFX took the initial velocity of the ball, calculated where it would go and used that as the point of comparison.

That is, there is the tendency for the direction to do as you said, but I thought PFX already took that into account and calculated the difference between the actual X axis movement and the projected no-spin X axis movement.
EJSeidman
4/23
As I mentioned in the article, the movement components measure how much the ball moved relative to a pitch thrown at the same velocity with no spin. So when Santana threw with 7.1 horizontal inches, about two inches less than a 20 oz. water bottle, that's compared to a ball with no spin at that velocity. However, in the data, lefties have positive horizontal movement values on fastballs and righties have negative horizontal movement values on fastballs. It isn't that they are doing things differently but rather that the initial point of release comes from opposite sides of the plate relative to the catcher and ump.
sparendo
4/23
Well, let me ask it this way...suppose a righty threw one of those Platonic no spin pitches, where the pitch ended up exactly where calculations show it would from the initial speed and direction of the pitch.

From what you wrote, the PFZ would be 0--which makes sense. I would think the PFX would also be 0, since it wound up where it would have without break, but it seems you're saying it would be some negative amount since the release point was to one side of the plate.
EJSeidman
4/23
What I'm basically saying is that if we see a righty with an average PFX_X of -5.5 inches on his fastball and a lefty with an average PFX_X of +5.5 inches on his fastball, they threw with the same amount of horizontal movement, each relative to a no-spin pitch thrown at the same velocity. The only reason for the sign discrepancies is that the pitches came from opposite sides of the plate, with the right of the catcher/ump being positive and the left being negative.
sparendo
4/23
Cool, got it. Thanks.
EJSeidman
4/23
No problem, glad you enjoyed it.
sparendo
4/23
BTW, as an addendum to my previous comment, it seems like you were describing PFZ in the way I thought it was, but PFX was described differently.

And I also wanted to compliment you on the article...good read, and a cool look at a game from the PFX standpoint.
JayhawkBill
4/24
Eric, my strongest congratulations on your excellent article. I've seen no better description of the system for users at the level of BP readers. I've used Pitch f/x quite a bit since 2007, and I still found your article a great guide--I sure could've used it at the inception of Pitch f/x. Thank you.
esmcmaha
4/24
While wondering if JayhawkBill is currently employed by a certain AL team, I share the sentiment that the article was wonderful. There was a time when BP was clearly out front of most front offices in the use of data to drive decision making. I find myself wondering, with increasing frequency, whether BP has given up that ground while choosing to chase the next dollar. Nothing wrong with that, at least not yet, but I'd much prefer a return to data mining to see what's next. Injuries? Got it. DIPS? Ok. VORP? Nice. Pitch f/x opens up an entire new area for exploration, one that I think can bridge, once and for all, the analytical and scouting communities.

On another note, for pure entertainment value I'm not sure you can beat Buck Showalter trying to explain Defensive Efficiency. Tonight I learned that arm strength is somewhere in there. I feel good about this.
Oleoay
4/24
I don't think it was a case of chasing the next dollar.. it's just the person who focused on it the most, Dan Fox, got hired by the Pirates ;)
Oleoay
4/24
Oh as an addendum, thanks Eric for reviving pitchF/x on here. I've enjoyed your articles immensely. Also thanks for the link to the pitchF/x data since I hadn't seen that posted on BP before.
EJSeidman
4/24
Yeah, I would agree... it has to be very tough to run a company when people tend to leave for teams or other work with great frequency. Kind of like managing a minor league team with stud prospects. When they get the callup there is really nothing you can do.

Glad you have enjoyed my work here. I have loads of fun writing and analyzing and it's always wonderful to hear that your hard work is making some sort of a difference to readers.
aquavator44
4/24
Would BP ever consider a reference page with links to these kinds of things?
EJSeidman
4/24
What sort of reference/links did you have in mind?
aquavator44
4/24
I envision a combination reference library and a "suggested further reading" with links to websites that BP writers/staff use for research or just enjoy reading. The link you posted for pitch/fx data is an example of something that could be in there, along with other useful statistical databases, especially ones that are mentioned frequently, like Retrosheet and Baseball-Reference. Other links could go to sites/blogs that have a similar take on baseball analysis, including sites to which BP writers also contribute.

It would be cool to have all that information in one place, especially for new users looking to catch up. I've been reading since BP was totally free, but for a new reader, I would think additional background information about the methods and data used here would be very helpful.
eighteen
4/24
This is one of those articles I print for handy reference. Thanks, EJ.
aquavator44
4/24
"This leads into the ongoing debate of whether or not the officially determined pitch titles matter more than their appearance when classifying them. If Santana swears he throws a curveball but it moves exactly like a slider, should the pitch be classified based on how it appears to the hitter, or based on the pitcher's self-perception?"

I don't know what kind of further study this merits, but I found this an interesting aside. Ultimately, I suppose it only matters how effective the pitch is at getting a hitter out, but it's something to think about.
SaberTJ
4/24
Love the article, can't wait to see more like it.
vetrini
4/26
Great analysis.