“Plate discipline though is difficult to measure. Good plate discipline can mean swinging at the first pitch, fouling off the fifth, taking the tenth; it’s about hitting when it’s possible to do so and walking when not. If it’s possible to hit, a walk is a relative failure. Ultimately though, because information as to just how many juicy pitches players swing at and how many unhittable ones they take is non-existent, though walks are an imperfect measure, they will have to do.”
—John Hill writing for The Cub Reporter weblog in 2005
We’ve been digging into the data captured by MLBAM’s 2007 version of the Gameday application. With each passing week, we’re blessed with more data to slice and dice. At this point, we have almost 60,000 data points to analyze, encompassing 23.8 percent of all pitches thrown in the 2007 season. While the results of the analysis become more interesting with each passing day, we’re still dealing with a subset of a subset of one season. Most of the analysis you find here at Baseball Prospectus encompasses aggregate data stretching back over 40 years, and so the conclusions we reach and the trends we see here are necessarily more tentative. That said, we do have enough observations to generate some statistically significant results where it concerns overall trends.
Today we examine the topic of air density and ballparks, take a first crack at determining whether pitchers do indeed try to establish their fastball early in games, and provide the basic framework for taking a closer look at the often mystical topic of plate discipline.
Velocity and Break
The questions from readers regarding the columns of the last couple of weeks have been pouring in. Of particular interest was last week’s chart that showed the average deceleration of pitches thrown with little spin at the nine ballparks at which the PITCHf/x system is operating. What the chart revealed was that there is apparently a fairly large difference in how much the ball decelerates on its way to the plate between Arlington on the low end (roughly eight percent) and San Diego at the high end (a little over 13 percent). That span equates to a difference in velocity of an impressive four miles per hour as the pitch reaches the plate.
One would imagine that kind of discrepancy, if real, could clearly make a difference. Because there were lots of questions regarding whether the difference could really be that large, I looked at the starting times for all 859 games in my database, thereby allowing for a classification of each pitch by time of day.
To make this simple, we can break down the starting time of the game into afternoon and evening (I included as evening games those that started at 5 p.m. or later) and use the same criteria as in last week’s article, looking only at pitches with a break length of less than 5.5 inches. Altogether that includes 18,288 pitches.
Time       Pitches  PctOfStart
Afternoon     5502       10.2%
Evening      12786       10.5%
The table reflects what we would expect per the theory that in the evening the air density is higher because of lower humidity and temperatures, which in turn creates greater friction on the ball, and slows it down. The difference between the afternoon and evening values may not seem very large, but it is statistically significant at the 99 percent confidence level. In other words, there is almost assuredly a persistent difference between afternoon and evening.
A better way to illustrate this difference is to graph it by release velocity in the same form we did last week:
At every release velocity between 86 and 98 mph, the ball slows down less in the afternoon than it does in the evening, confirming the overall results. Unlike the difference between parks, however, the difference here is only about three-tenths of one percent all along the curve, which equates to a divergence of less than a half-mile per hour when a pitch released at 95 mph reaches the plate. This result lends credence to the idea that atmospheric effects can be registered with this system, so if they exist, they should also be reflected in differences between parks.
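The arithmetic behind that half-mile-per-hour figure can be sketched in a few lines of Python. This is only an illustration: it assumes PctOfStart is the share of release velocity lost by the time the ball reaches the plate, and the function name `plate_speed` is made up for this example.

```python
def plate_speed(release_mph: float, pct_lost: float) -> float:
    """Speed at the plate, assuming pct_lost percent of release speed is shed in flight."""
    return release_mph * (1.0 - pct_lost / 100.0)


# Average percentage lost, taken from the afternoon/evening table above.
AFTERNOON_PCT_LOST = 10.2
EVENING_PCT_LOST = 10.5

release_mph = 95.0
afternoon_mph = plate_speed(release_mph, AFTERNOON_PCT_LOST)
evening_mph = plate_speed(release_mph, EVENING_PCT_LOST)
gap_mph = afternoon_mph - evening_mph  # how much faster the afternoon pitch arrives

print(f"afternoon: {afternoon_mph:.2f} mph")
print(f"evening:   {evening_mph:.2f} mph")
print(f"gap:       {gap_mph:.3f} mph")
```

With the table's averages, the gap for a 95 mph pitch comes to a bit under three-tenths of a mile per hour, comfortably inside the "less than a half-mile per hour" described above.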
A second, perhaps indirect way of determining whether atmospheric effects are affecting the flight of the ball differently at different parks is to see how the magnitude of the break differs, and how that stacks up against the deceleration we looked at last week. The theory again (well, actually the fact) is that denser air will exert a greater Magnus force (generated by the difference in pressure on the sides of the ball as it spins) on a ball, thereby allowing it to break to a greater degree than air that is less dense. Last week I noted that the pFX value is a measure of the break of the ball (reported in inches) that incorporates both the horizontal and vertical vectors. We’ll use this value to examine the average break for a subset of pitches across two different dimensions.
First, we’ll slice the data once again by start time, including all pitches with a pFX value greater than eight.
Time       Pitches   pFX
Afternoon    12938  11.8
Evening      29767  12.1
Once again the difference in the two means is statistically significant at the 99 percent confidence level, consistent with the idea that the denser evening air makes for higher Magnus force, and thus better breaking balls.
The second way we can slice the data is to do so by park. The following table shows the average pFX value and the average percentage of velocity decrease at all parks for all pitches ordered by pFX:
Park          Pitches   pFX  PctOfStart
San Diego        4974  13.0       13.0%
Chicago (AL)     4617  12.4       11.3%
Los Angeles      4903  12.2       10.8%
Anaheim          5300  12.0        9.4%
Toronto          4879  12.0       10.5%
Oakland          4630  11.7       10.0%
Seattle          4849  11.5       10.3%
Atlanta          4098  11.5        8.6%
Texas            4243  11.4        8.2%
Just eyeballing the table reveals a healthy correlation (with a correlation coefficient of 0.90) between pFX and the percentage velocity decrease, supporting the idea that there is probably a link between the two. Although it’s possible that a calibration problem with regard to velocity also affects the break value and accounts for that link, that seems highly unlikely. These data provide evidence that the system is indeed picking up atmospheric effects, and that differing air density, both by time of day and by park, influences both the deceleration and the break of the pitch.
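For readers who want to check the correlation themselves, here is a minimal Python sketch that computes the Pearson coefficient from the nine park averages in the table. The helper `pearson_r` is written out by hand for illustration; it works on the rounded park-level averages, not on the underlying pitch data.

```python
import math

# (park, average pFX in inches, average percent of release velocity lost),
# copied from the park table above.
PARKS = [
    ("San Diego", 13.0, 13.0),
    ("Chicago (AL)", 12.4, 11.3),
    ("Los Angeles", 12.2, 10.8),
    ("Anaheim", 12.0, 9.4),
    ("Toronto", 12.0, 10.5),
    ("Oakland", 11.7, 10.0),
    ("Seattle", 11.5, 10.3),
    ("Atlanta", 11.5, 8.6),
    ("Texas", 11.4, 8.2),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


pfx = [row[1] for row in PARKS]
pct_lost = [row[2] for row in PARKS]
r = pearson_r(pfx, pct_lost)
print(f"r = {r:.2f}")
```

Run against the table as printed, this rounds to the 0.90 quoted above. With only nine parks, of course, a single outlier such as San Diego carries a lot of weight.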
Establishing the Fastball
A second question that several readers entertained after last week’s piece was triggered by the graph that showed that pitchers lose their velocity in a fairly uniform way throughout the game. What the questioners wanted to know was whether the data showed that pitchers, as the common wisdom dictates, do indeed attempt to establish their fastballs early in the game in an effort to set up hitters for breaking balls in subsequent at-bats.
To take a first shot at this question, we can produce the following table which shows the number of “straight and fast” pitches thrown by inning for starters who threw 80 or more pitches in a game, along with their percentage of the total. The moniker “straight and fast” is used since only pitches that averaged 83 or more miles per hour with a horizontal break of less than 10 inches were chosen. (The data actually records this value as negative for a pitch that breaks into a right-handed hitter, and positive for a pitch that moves into a left-handed hitter, so the absolute value was used.) Obviously this conservative filter may miss fastballs for a minority of pitchers who don’t throw as hard and for others who have more movement (the “tail” on the fastball), but it should give us an idea of whether there is any truth to the old saw.
Inning  Pitches    SF  AvgVel  AvgHB    Pct
1          5137  2965    87.2   -2.5  57.7%
2          4851  2608    87.1   -2.6  53.8%
3          4925  2595    87.1   -2.6  52.7%
4          4535  2275    87.2   -2.7  50.2%
5          4501  2228    87.1   -2.7  49.5%
6          3794  1784    87.1   -2.4  47.0%
7          2176  1056    87.0   -3.3  48.5%
8           677   328    86.8   -3.6  48.4%
9           206   103    86.7   -3.5  50.0%
To explain, SF is the number of straight and fast pitches, and AvgHB is the average horizontal break. The table shows that pitchers do indeed throw a greater percentage of straight and fast pitches in the first inning (58 percent) than in any other inning. That percentage declines steadily through the sixth inning (47 percent) before leveling off at 48 to 50 percent in the late innings, where the samples are much smaller.
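The filter and the percentages are simple enough to sketch in Python. The function name `is_straight_and_fast` and its parameter names are invented for this example, and the thresholds are the ones described above (83 mph or more, absolute horizontal break under 10 inches); the inning counts come straight from the table.

```python
def is_straight_and_fast(avg_velocity_mph: float, horizontal_break_in: float) -> bool:
    """Apply the 'straight and fast' filter described in the text.

    The sign of the horizontal break only encodes direction, so the
    absolute value is compared against the 10-inch limit.
    """
    return avg_velocity_mph >= 83.0 and abs(horizontal_break_in) < 10.0


# (inning, total pitches, straight-and-fast pitches) from the table above.
BY_INNING = [
    (1, 5137, 2965),
    (2, 4851, 2608),
    (3, 4925, 2595),
    (4, 4535, 2275),
    (5, 4501, 2228),
    (6, 3794, 1784),
    (7, 2176, 1056),
    (8, 677, 328),
    (9, 206, 103),
]

# Share of each inning's pitches that passed the filter, in percent.
shares = {inning: round(100.0 * sf / total, 1) for inning, total, sf in BY_INNING}

for inning, share in shares.items():
    print(f"inning {inning}: {share:.1f}%")
```

Recomputing the shares this way is a quick check on the Pct column, and the filter function makes it easy to try other cutoffs (a lower velocity floor for soft-tossers, say) once the pitch-level data is in hand.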
Because we’re using average velocity in our filter, we’ve introduced some selection bias, since pitchers lose velocity as the game moves along. However, the average speed hovers around 87 miles per hour through the first six innings, which suggests that pitchers probably do rely on their fastball more early in the game. Still, the differences are not as great as I would have thought, especially given the vehemence with which establishing your fastball early is promoted.
As discussed two weeks ago, one of the fascinating aspects of the PITCHf/x data is the ability to examine the location of each pitch. The fact that a customized strike zone is also available for each batter for each plate appearance makes the data even more useful, since it allows us to fairly accurately determine (within an inch, according to MLBAM) whether the pitch was actually in the strike zone or not. This information can be used for many purposes, but to round out today’s column we’ll begin an examination of just one.
Recently, an interesting article written by Russell Carleton and titled “Is Walk the Opposite of Strikeout?” appeared in the February 2007 issue of SABR’s By the Numbers newsletter. The article deals with developing two new metrics based on signal detection theory in order to measure plate discipline. Discussion of the article and an explanation of the underlying math can be found on Russell’s blog.
At its heart, the technique he developed combines the ability to properly discern whether pitches are in the strike zone (signaled by swinging) with creating a proper balance between swinging too much and swinging too little. For the study he uses Retrosheet data, but as some readers may be aware, Retrosheet contains pitch sequence and outcomes going back to around 1988, but does not contain pitch location data. Using the location data provided by PITCHf/x, we should be able to more directly measure the concept of plate discipline.
Remember that we don’t have full data for all hitters, since there are only nine parks in which PITCHf/x is installed, and those are heavily biased to the American League West. We also are only looking at a subset of one season. Despite these limitations, we can create a few metrics including:
- Swing (S): Defined as the percentage of pitches the batter swung at, this information is available in many other places. Obviously, high values here are indicative of aggressive hitters, or hitters who see a greater percentage of pitches out of the strike zone.
- Fish (F): Defined as the percentage of pitches out of the strike zone that the hitter swung at. A higher percentage here indicates that the hitter may have trouble recognizing pitches, since he is offering at pitches that would likely otherwise be called balls. It should be noted that the strike zone as defined for this analysis is 17 inches wide (the standard) and uses the actual height customized for the player. No buffer room was added here, since we’re not concerned with giving the umpire the benefit of the doubt.
- Bad Ball (BB): Defined as the percentage of pitches out of the strike zone that were swung at where contact was made. This includes foul balls, although there is an argument to be made that a foul ball is not the intended outcome, and so should be discounted in some way. A higher value in this category indicates that, when swinging at bad pitches, the hitter is at least able to get the bat on the ball.
- Eye (E): Defined as the percentage of pitches in the strike zone, on counts other than 3-0, that were taken for strikes. A smaller value in this metric indicates a player who recognizes strikes and aggressively offers at them. I excluded 3-0 counts, since a hitter is much more likely to let a strike go by in this situation, and we don’t want to penalize him for that behavior. However, some readers will see where this idea could be extended to each possible count, and a system could be devised in which a smaller penalty is assessed to hitters who take a strike at 3-1 than to those who do so at 0-2.
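To make the four definitions concrete, here is a rough Python sketch of how they might be computed from pitch-by-pitch records. The `Pitch` record and its field names are hypothetical (they are not the PITCHf/x field names), and the eight sample pitches are invented purely to exercise the code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Pitch:
    """One pitch seen by a hitter (hypothetical schema for illustration)."""
    in_zone: bool   # inside the 17-inch-wide, height-customized zone
    swung: bool     # batter offered at the pitch
    contact: bool   # bat met ball, fouls included
    balls: int      # count before the pitch
    strikes: int


def _rate(numerator: int, denominator: int) -> Optional[float]:
    """Return a proportion, or None when there is nothing to divide by."""
    return numerator / denominator if denominator else None


def discipline_metrics(pitches: Iterable[Pitch]) -> dict:
    pitches = list(pitches)
    out_of_zone = [p for p in pitches if not p.in_zone]
    chased = [p for p in out_of_zone if p.swung]
    # Eye ignores 3-0 counts, where taking a strike is often the plan.
    eye_pool = [p for p in pitches
                if p.in_zone and not (p.balls == 3 and p.strikes == 0)]
    return {
        "swing": _rate(sum(p.swung for p in pitches), len(pitches)),
        "fish": _rate(len(chased), len(out_of_zone)),
        "bad_ball": _rate(sum(p.contact for p in chased), len(chased)),
        "eye": _rate(sum(not p.swung for p in eye_pool), len(eye_pool)),
    }


# Invented sample, not real data.
sample = [
    Pitch(True, True, True, 0, 0),     # strike, put in play or fouled
    Pitch(True, False, False, 0, 1),   # strike taken
    Pitch(True, False, False, 3, 0),   # strike taken at 3-0 (excluded from Eye)
    Pitch(False, True, True, 1, 1),    # chased, made contact
    Pitch(False, True, False, 1, 2),   # chased, missed
    Pitch(False, False, False, 0, 0),  # ball taken
    Pitch(False, False, False, 2, 0),  # ball taken
    Pitch(True, True, False, 2, 1),    # strike, swung and missed
]

metrics = discipline_metrics(sample)
print(metrics)
```

Returning None for an empty denominator is a design choice for this sketch; a hitter who never chased, for example, simply has no Bad Ball value rather than a misleading zero.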
We can now apply these measures to the 174 hitters who have seen 100 or more pitches with PITCHf/x watching. First, a quick look at the leaders and trailers in each as shown in the following tables.
Sorted By Swing
Name               Pitches  Swing   Fish  BadBall    Eye
Ivan Rodriguez         103   .631   .585     .868   .263
A.J. Pierzynski        357   .602   .469     .813   .117
Jason Smith            103   .592   .516     .424   .205
Joshua Barfield        115   .583   .474     .704   .241
Alfonso Soriano        103   .583   .571     .600   .303
Victor Diaz            123   .569   .433     .621   .196
Scott Thorman          196   .566   .514     .730   .255
Tony Pena              130   .562   .500     .718   .288
Jose Molina            157   .561   .462     .698   .206
Matt Diaz              152   .559   .444     .773   .189
--------------------------------------------------------
Maicer Izturis         179   .352   .263     .871   .377
Esteban German         126   .333   .188     .667   .391
Andy Laroche           127   .331   .160     .538   .283
Reggie Willits         274   .328   .219     .818   .455
Wilson Betemit         208   .327   .155     .700   .282
Travis Hafner          138   .319   .232     .545   .381
Jack Cust              294   .310   .175     .563   .342
Reggie Willits         135   .304   .186    1.000   .400
Nick Swisher           149   .302   .236     .810   .483
Dan Johnson            387   .282   .148     .737   .346
Sorted By Fish
Name               Pitches  Swing   Fish  BadBall    Eye
Ivan Rodriguez         103   .631   .585     .868   .263
Alfonso Soriano        103   .583   .571     .600   .303
Jason Smith            103   .592   .516     .424   .205
Scott Thorman          196   .566   .514     .730   .255
Tony Pena              130   .562   .500     .718   .288
Robinson Cano          155   .510   .475     .723   .327
Joshua Barfield        115   .583   .474     .704   .241
A.J. Pierzynski        357   .602   .469     .813   .117
Garret Anderson        260   .515   .467     .750   .350
Jose Molina            157   .561   .462     .698   .206
--------------------------------------------------------
Julio Lugo             196   .378   .206     .714   .337
Marco Scutaro          183   .377   .204     .850   .369
Jim Thome              330   .358   .203     .545   .239
Esteban German         126   .333   .188     .667   .391
Reggie Willits         135   .304   .186    1.000   .400
Bobby Abreu            203   .374   .185     .545   .274
Jack Cust              294   .310   .175     .563   .342
Andy Laroche           127   .331   .160     .538   .283
Wilson Betemit         208   .327   .155     .700   .282
Dan Johnson            387   .282   .148     .737   .346
Sorted By Bad Ball
Name               Pitches  Swing   Fish  BadBall    Eye
Brian Roberts          134   .440   .306    1.000   .344
Reggie Willits         135   .304   .186    1.000   .400
Luis Castillo          119   .395   .257     .944   .354
Ramon Martinez         124   .419   .258     .941   .345
Ken Griffey Jr.        109   .440   .366     .923   .289
Juan Pierre            431   .480   .385     .916   .302
Mark Grudzielanek      186   .462   .352     .895   .308
Luis Gonzalez          428   .395   .236     .889   .256
Kenji Jojima           270   .522   .389     .878   .217
Josh Bard              283   .470   .292     .878   .177
--------------------------------------------------------
Derrek Lee             108   .454   .254     .533   .224
Howie Kendrick         164   .488   .396     .528   .347
B.J. Upton             225   .431   .269     .528   .227
Rocco Baldelli         135   .541   .441     .512   .190
Elijah Dukes           193   .399   .252     .500   .175
Rob Bowen              110   .400   .302     .474   .383
Craig Wilson           105   .410   .250     .471   .143
Royce Clayton          227   .515   .348     .457   .194
Jason Smith            103   .592   .516     .424   .205
Russ Branyan           139   .547   .344     .419   .061
Sorted By Eye
Name               Pitches  Swing   Fish  BadBall    Eye
Nick Swisher           149   .302   .236     .810   .483
Reggie Willits         274   .328   .219     .818   .455
Darin Erstad           372   .398   .318     .797   .413
Reggie Willits         135   .304   .186    1.000   .400
Esteban German         126   .333   .188     .667   .391
Rob Bowen              110   .400   .302     .474   .383
Travis Hafner          138   .319   .232     .545   .381
Mark Ellis             440   .409   .302     .851   .380
Maicer Izturis         179   .352   .263     .871   .377
Jason Kendall          405   .432   .325     .870   .373
--------------------------------------------------------
Albert Pujols          108   .417   .313     .760   .143
Craig Wilson           105   .410   .250     .471   .143
Vladimir Guerrero      403   .524   .421     .771   .142
Milton Bradley         135   .489   .260     .550   .140
Nomar Garciaparra      436   .557   .388     .800   .132
A.J. Pierzynski        357   .602   .469     .813   .117
Derek Jeter            178   .427   .246     .857   .111
Jorge Posada           121   .463   .247     .722   .104
Jeffrey Francoeur      337   .546   .425     .775   .094
Russ Branyan           139   .547   .344     .419   .061
Many inclusions on these lists should come as no surprise. Ivan Rodriguez is aggressive, as indicated by his high Swing (.631), and he swings at a lot of bad balls (a Fish of .585), but he also makes contact with a goodly number of those bad pitches he swings at (Bad Ball of .868); if the latter were not the case, he would strike out a lot more than he does. On the other end of the spectrum, Jack Cust (along with his A’s teammates Nick Swisher and Dan Johnson) swings at only 31 percent of all pitches, and only 17.5 percent of the pitches out of the strike zone.
What’s more interesting is to look at two primary metrics which together reflect two important components of plate discipline, Fish and Eye, and plot them together as shown in the graph below. Some players who fell in the middle have been left out of the chart in order to make it slightly more readable:
As you can see, the graph is split into four quadrants, and we’ve added a possible interpretation to each quadrant. Starting in the upper left, this quadrant contains players who don’t swing at bad pitches out of the strike zone, and who end up taking more than an average percentage of pitches in the strike zone. Players like Nick Swisher, Reggie Willits (both right- and left-handed, as you see him listed twice), Jack Cust, and Bobby Abreu all could therefore be said to be overly conservative in their approach, and may benefit from offering at more strikes.
Moving clockwise to the right, the next quadrant contains players who swing at a large number of pitches out of the strike zone, while also somehow managing to take a lot of pitches in the zone. Players who nonetheless perform well are typically bad-ball hitters (Alfonso Soriano, Ivan Rodriguez), while those who don’t manage to put the ball in play will struggle, as Garret Anderson and Robinson Cano are doing thus far.
Moving to the third quadrant, we see players who chase a lot of pitches, but also take advantage of pitches in the strike zone. Classically aggressive and good hitters like Vladimir Guerrero and Nomar Garciaparra can be found here.
Finally, we move to the sweet spot of the lower left quadrant, where players who don’t chase pitches and who offer at pitches in the strike zone can be found. It’s not surprising that Derek Jeter, Jorge Posada (against right-handed pitchers), Barry Bonds, David Ortiz, Magglio Ordonez, and Chipper Jones can be found here.
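The quadrant logic can be sketched in Python as follows. Two things here are assumptions rather than anything taken from the graph: the quadrant descriptions are shorthand of my own, and the dividing lines are set at the averages of the eight hitters in this small sample, since the exact placement of the axes isn't spelled out in the text.

```python
# (name, Fish, Eye) for a handful of hitters, copied from the tables above.
HITTERS = [
    ("Nick Swisher", 0.236, 0.483),
    ("Jack Cust", 0.175, 0.342),
    ("Ivan Rodriguez", 0.585, 0.263),
    ("Alfonso Soriano", 0.571, 0.303),
    ("Vladimir Guerrero", 0.421, 0.142),
    ("Nomar Garciaparra", 0.388, 0.132),
    ("Derek Jeter", 0.246, 0.111),
    ("Jorge Posada", 0.247, 0.104),
]


def quadrant(fish: float, eye: float, fish_cut: float, eye_cut: float) -> str:
    """Place a hitter on a Fish (x-axis) by Eye (y-axis) grid."""
    chases = fish > fish_cut        # right half of the graph
    takes_strikes = eye > eye_cut   # upper half of the graph
    if takes_strikes and not chases:
        return "upper left: patient, perhaps too conservative"
    if takes_strikes and chases:
        return "upper right: chases but takes strikes"
    if chases:
        return "lower right: aggressive"
    return "lower left: disciplined"


# Assumed cutoffs: the sample means (not the league-wide averages).
fish_cut = sum(h[1] for h in HITTERS) / len(HITTERS)
eye_cut = sum(h[2] for h in HITTERS) / len(HITTERS)

placement = {name: quadrant(fish, eye, fish_cut, eye_cut)
             for name, fish, eye in HITTERS}

for name, label in placement.items():
    print(f"{name:<18} {label}")
```

Even with these makeshift cutoffs, the eight hitters land in the same quadrants described above; hitters nearer the middle of the pack would be much more sensitive to where the lines are drawn.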
Once again, keep in mind that this is just a small snapshot at this point, and doesn’t include all hitters. As we gather more data, this can be refined further, and the two metrics combined into a single number that more accurately reflects the concept of plate discipline.
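As a taste of what a combined number might look like, here is a tentative Python sketch that borrows the two textbook measures from signal detection theory, sensitivity (d') and response bias (c). To be clear about the assumptions: this is the standard textbook formulation, not necessarily the metrics from the article cited earlier; it treats swinging at a strike as a "hit" (1 minus Eye) and swinging at a ball as a "false alarm" (Fish); and it glosses over the fact that Eye excludes 3-0 counts while Fish does not.

```python
from statistics import NormalDist

# Inverse of the standard normal CDF (the "z-transform" used in signal detection).
_z = NormalDist().inv_cdf


def sdt_scores(fish: float, eye: float):
    """Return (sensitivity d', bias c) from a hitter's Fish and Eye rates.

    Both rates must lie strictly between 0 and 1; inv_cdf raises
    StatisticsError at exactly 0 or 1.
    """
    hit_rate = 1.0 - eye       # swings at pitches in the zone
    false_alarm_rate = fish    # swings at pitches out of the zone
    z_hit = _z(hit_rate)
    z_fa = _z(false_alarm_rate)
    sensitivity = z_hit - z_fa     # larger: better at telling strikes from balls
    bias = -0.5 * (z_hit + z_fa)   # more negative: more inclined to swing
    return sensitivity, bias


# Fish and Eye values copied from the tables above.
jeter_d, jeter_c = sdt_scores(fish=0.246, eye=0.111)
pudge_d, pudge_c = sdt_scores(fish=0.585, eye=0.263)

print(f"Derek Jeter:    d' = {jeter_d:.2f}, c = {jeter_c:.2f}")
print(f"Ivan Rodriguez: d' = {pudge_d:.2f}, c = {pudge_c:.2f}")
```

On these numbers Jeter grades out as considerably better at separating strikes from balls, while Rodriguez shows the stronger lean toward swinging, which squares with the quadrants above. Whether a measure like this holds up will depend on the larger sample to come.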