November 6, 2009

Checking the Numbers

Detecting Discipline

by Eric Seidman

Ever since Billy Beane wrote Moneyball (right, Mr. Morgan?) in order to prove that the true path to success involved only seeking the services of high-OBP employees, analysts of several varieties have worked diligently to discover market inefficiencies worth exploiting. One of the areas that has risen to prominence recently, likely due to the increased availability of the data, is plate discipline on both sides of the spectrum: as exercised by hitters or as induced by pitchers.

Data providers such as Baseball Info Solutions record information based on the strike zone in a plate appearance, determining the percentages of swings and contact on balls both in and out of the zone, as well as the rate of pitches thrown or observed that fell in the zone itself. This type of granular information affords analysts the opportunity to track tendencies such as which hitters chase more pitches out of the zone or which pitchers induce these chases more often. However, the numbers remain a tad ambiguous given that their application is largely contingent upon conventional wisdom; higher rates of out-of-zone swings are bad, mmmkay? This isn't always the case, though, and the rarely discussed inverse of taking too many pitches inside the zone could also be considered poor in process.

Luckily, with the ever-growing PITCHf/x dataset, we can apply a method known as signal detection theory to gauge discipline at the plate. You might remember signal detection theory from such articles as "Is Walk the Opposite of Strikeout?" or "The Return of the Fisheye". The technique is commonly used in epidemiological studies, as well as in cognitive psychology and engineering. It hinges on the idea of a perfect test, one that codes all positive results as true positives and all negative results as true negatives. Unfortunately, such tests do not exist: false negatives (being told you are healthy when you really aren't) and false positives (hearing some bad news in error) inevitably surface. The first linked article above, written by Russell Carleton, applied this technique to Retrosheet data in order to measure plate discipline in a results-based fashion. Since Retrosheet lacks data for pitch location, the study was restricted to the actual results: swings and misses, balls put in play, and called pitches.

Cue the wonderful dataset that is PITCHf/x. Essentially, the goal here is to apply the signal detection theory to PITCHf/x by coding the processes in and out of the strike zone, as opposed to just the end results. In that regard, a pitch thrown in the strike zone at which the batter swung becomes a true positive. A pitch in the zone that is taken is a Type II error, or a false negative. Moving outside of the zone, swings are Type I errors, or false positives; taken pitches become true negatives. With the pitches classified in this fashion, we basically treat every major league hitter as if he is his own epidemiological study. Then, a series of calculations (to be discussed in further detail in the coming paragraphs) will explain which hitters are more prone to mistakes, as well as whether or not they are biased more towards freely swinging or taking pitches. The former statistic is known as sensitivity, while the latter is called the response bias.

Ideally, sensitivity will be high, as higher numbers correspond to fewer mistakes. The goal for response bias is to get as close to 1.0 as possible, since that mark indicates balance. A number below 1.0 means the hitter's mistakes lean towards keeping the bat on his shoulder, while a number above 1.0 means they lean towards swinging. As an example, over 2008-09, Luis Castillo posted a sensitivity rate slightly above the major league average, but with a very low .252 response bias that suggests his ability to make fewer mistakes in the box heavily relied upon a seeming refusal to swing. Because he rarely swung, he received some extra ball calls, but it came at the expense of many more called strikes. Hunter Pence had a sensitivity rating almost identical to Castillo's, but with a response bias of .961, extremely close to 1.0, indicative of the fact that Pence has been more balanced in making errors and perhaps is not as easily exploitable as Castillo. In fact, Pence will actually make fewer mistakes than Castillo, because he is truly optimizing his balance. He is not costing himself anything extra in either direction.

Coding everything into my PITCHf/x database involved the assumption that an appropriate range for the horizontal portion of the strike zone runs from -0.9 to 0.9 feet, measured from the center of the plate. Some studies automatically set the horizontal parameters to range from -1 to 1, while others move closer to the 0.85 that the rulebook seems to dictate, so this seems to be a happy medium. For the vertical parameters, I am using the sz_top and sz_bot fields in the dataset, which are top and bottom coordinates set by the system operator prior to each pitch. No slack was given in any direction in order to utilize a strict definition of the strike zone relative to each hitter. If a pitch fell within the zone and the result involved a swing (swinging strike, foul, ball put in play, etc.), a true positive response was coded. This process was repeated for the two error types and the true negative response. Two alterations were made, however: 3-0 takes in the zone were removed, since that is almost an automatic take, and foul balls with two strikes were deemed true positive responses regardless of location, since with two strikes hitters are going to widen their zone in order to "protect" at the plate.
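The coding rules in the preceding paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the database code used for the article: the function name, the argument names (`px` and `pz` for horizontal and vertical location, `swung`, `is_foul`), and the choice to treat the zone edges as inclusive are all hypothetical. Only `sz_top`, `sz_bot`, and the 0.9 half-width come from the text.

```python
from typing import Optional

ZONE_HALF_WIDTH = 0.9  # horizontal bound from the article; px assumed measured from plate center


def classify_pitch(px: float, pz: float, sz_top: float, sz_bot: float,
                   swung: bool, balls: int, strikes: int,
                   is_foul: bool = False) -> Optional[str]:
    """Return 'TP', 'FN', 'FP', 'TN', or None when the pitch is excluded."""
    # Alteration 1: two-strike fouls count as true positives wherever the pitch was.
    if is_foul and strikes == 2:
        return "TP"

    # Strict zone, no slack in any direction (inclusive edges are an assumption).
    in_zone = abs(px) <= ZONE_HALF_WIDTH and sz_bot <= pz <= sz_top

    if in_zone:
        if swung:
            return "TP"  # swing at a pitch in the zone
        # Alteration 2: 3-0 takes in the zone are removed from the sample.
        if balls == 3 and strikes == 0:
            return None
        return "FN"      # take in the zone (Type II error)
    return "FP" if swung else "TN"  # chase (Type I error) or correct take
```

Summing the four labels per hitter, and skipping the `None` results, would give the four category counts that feed the calculations that follow.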

Once each of the four categories was summed, I exported the results to Excel to add in the necessary calculations. First, we need to calculate the true positive rate and the false alarm rate. These are fairly simple compared to the others: the true positive rate is merely true positives divided by the sum of true positives and false negatives, which gives the percentage of pitches in the strike zone featuring a swing of some sort. The false alarm rate is false positives divided by the sum of false positives and true negatives, that is, the share of all pitches out of the zone that drew a swing.
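As a minimal sketch of those two rates, assuming the four category counts are already summed for a hitter (the function name and the counts in the usage note are hypothetical, not taken from the article's data):

```python
def discipline_rates(tp: int, fn: int, fp: int, tn: int) -> tuple:
    """Return (true_positive_rate, false_alarm_rate) from the four category counts."""
    true_positive_rate = tp / (tp + fn)  # swings at zone pitches / all zone pitches
    false_alarm_rate = fp / (fp + tn)    # swings at non-zone pitches / all non-zone pitches
    return true_positive_rate, false_alarm_rate
```

For example, a made-up hitter with 692 swings and 308 takes in the zone, and 158 swings and 842 takes out of it, would have a true positive rate of 0.692 and a false alarm rate of 0.158.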

The next step involves finding the z-score for each of these rates (the value referred to here as the "z-distribution"): the point on a normal curve with a mean of zero and a standard deviation of one below which that share of the area falls. In Excel, the NORMINV function comes into play. Chipper Jones has had a true positive rate of .692, so we would plug in =NORMINV(0.692,0,1), where the 0 and 1 correspond to the aforementioned mean and standard deviation. This step gets repeated for the false alarm rate, and the sensitivity rate itself is merely the True Positive Z-Distribution minus the False Alarm Z-Distribution. In the case of the Braves third baseman, the result is 1.504, the third-highest sensitivity rating among batters who saw 2000+ pitches over the last two seasons. Here are the top and bottom ten sensitivity ratings:

Player          Pitches   Sens.    Player            Pitches   Sens.
Daric Barton      2739    1.567    Ronny Cedeno        2105    0.881
Chris Iannetta    3071    1.527    Alexi Casilla       2504    0.874
Chipper Jones     3959    1.504    Victor Martinez     3740    0.873
Carlos Quentin    3352    1.467    Kendry Morales      2741    0.864
Carlos Pena       4481    1.456    Ryan Braun          4779    0.848
Milton Bradley    3703    1.451    Yuniesky Betancourt 3428    0.844
Joey Votto        4191    1.447    Michael Cuddyer     3436    0.838
Brad Hawpe        4702    1.445    Erick Aybar         3172    0.787
Lance Berkman     4469    1.445    Shane Victorino     4626    0.779
Akinori Iwamura   3805    1.426    Garret Anderson     3728    0.772
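The sensitivity calculation described above the table can be sketched outside of Excel as well. The version below leans on Python's standard-library `statistics.NormalDist().inv_cdf`, which plays the role of NORMINV with a mean of 0 and a standard deviation of 1. The function name is hypothetical, and the 0.158 false alarm rate in the usage note is an illustrative value rather than a figure from the article, which reports only Chipper Jones's true positive rate.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def sensitivity(true_positive_rate: float, false_alarm_rate: float) -> float:
    """z(true positive rate) minus z(false alarm rate); both rates must lie strictly in (0, 1)."""
    z_true_positive = STANDARD_NORMAL.inv_cdf(true_positive_rate)
    z_false_alarm = STANDARD_NORMAL.inv_cdf(false_alarm_rate)
    return z_true_positive - z_false_alarm
```

Calling `sensitivity(0.692, 0.158)` lands close to 1.5, in the neighborhood of the figure listed for Jones. A hitter whose swing rate is the same in and out of the zone would score 0, meaning he shows no ability to tell the two apart.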

Remember, with sensitivity, higher numbers are better, and it should come as no surprise that some of these players fell into their respective bins. Players with good eyes like Chipper, Milton Bradley, and Berkman are not going to let too many zone pitches pass them by, nor will they fish out of the zone too often. On the flip side, Garret Anderson and Shane Victorino had the lowest success rates in this area, basically taking pitches when it would be more optimal to swing, and vice versa. What happens when we throw in their response biases, however, since not everyone obtains their sensitivities in the same fashion?

Player          Pitches  Sens.   Bias   Player            Pitches  Sens.   Bias
Daric Barton      2739   1.567  0.281   Ronny Cedeno        2105   0.881  1.058
Chris Iannetta    3071   1.527  0.436   Alexi Casilla       2504   0.874  0.679
Chipper Jones     3959   1.504  0.471   Victor Martinez     3740   0.873  0.662
Carlos Quentin    3352   1.467  1.127   Kendry Morales      2741   0.864  0.885
Carlos Pena       4481   1.456  0.772   Ryan Braun          4779   0.848  0.929
Milton Bradley    3703   1.451  0.594   Yuniesky Betancourt 3428   0.844  1.008
Joey Votto        4191   1.447  0.836   Michael Cuddyer     3436   0.838  0.764
Brad Hawpe        4702   1.445  0.821   Erick Aybar         3172   0.787  1.047
Lance Berkman     4469   1.445  0.717   Shane Victorino     4626   0.779  0.774
Akinori Iwamura   3805   1.426  0.442   Garret Anderson     3728   0.772  0.938

Here is where the dichotomy of strategies emerges, as Barton scored so high by rarely swinging, while Quentin was more balanced, with a response bias closer to 1.0 than that of anyone else in his group. Interestingly enough, the hitters with lower sensitivity scores were much more balanced in terms of response bias, suggesting that they didn't necessarily discriminate in making errors; they just flat-out made errors.

Calculating the response bias itself is a bit more tedious than the sensitivity. To determine which way the hitters sway, we need to compute the phi statistics for the true positive z-distribution and for the false alarm z-distribution. This is achieved by raising 'e' to the negative of the squared z-distribution value for each, with the result divided by the square root of two times pi. In Excel, the EXP function does the exponentiation; EXP(2.4), for instance, calculates 'e' raised to the power of 2.4. Once both phi statistics are calculated, dividing the false alarm phi by the true positive phi produces the response bias. Here are the leaders and trailers:

Player               Bias   Dev-1.0     Player              Bias   Dev-1.0
Aaron Rowand        0.999    0.001      Denard Span        0.363    0.637
Mike Jacobs         0.996    0.004      Miguel Olivo       1.661    0.661
Yuniesky Betancourt 1.008    0.008      Bobby Abreu        0.321    0.679
Rajai Davis         0.990    0.010      Nick Johnson       0.315    0.685
Carlos Gonzalez     0.986    0.014      Daric Barton       0.281    0.719
Brandon Moss        0.983    0.017      Marco Scutaro      0.267    0.733
Miguel Cabrera      0.982    0.018      Luis Castillo      0.252    0.748
Ross Gload          1.020    0.020      Josh Hamilton      1.927    0.927
Ty Wigginton        1.024    0.024      Vladimir Guerrero  2.151    1.151
Marlon Byrd         1.025    0.025      Pablo Sandoval     2.483    1.483
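The response bias steps described above the table can be sketched as follows. The code follows the wording of the description literally, raising e to the negative squared z value; note that the standard normal density would use e to the negative of half the squared value, so treat the literal reading as an assumption about how the published figures were produced. Function names are hypothetical, and because the article does not report Chipper Jones's false alarm rate, the usage note backs it out of his published true positive rate and sensitivity.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def phi_as_described(z: float) -> float:
    """e**(-z**2) / sqrt(2*pi), as worded in the text (the standard density uses -z**2 / 2)."""
    return math.exp(-z ** 2) / math.sqrt(2 * math.pi)


def response_bias(true_positive_rate: float, false_alarm_rate: float) -> float:
    """False alarm phi divided by true positive phi; below 1.0 leans toward taking."""
    z_true_positive = STANDARD_NORMAL.inv_cdf(true_positive_rate)
    z_false_alarm = STANDARD_NORMAL.inv_cdf(false_alarm_rate)
    return phi_as_described(z_false_alarm) / phi_as_described(z_true_positive)


def deviation_from_balance(bias: float) -> float:
    """Absolute deviation from 1.0, the sort key for the leaders and trailers table."""
    return abs(bias - 1.0)


# Back out an implied false alarm rate for Jones from the published .692 and 1.504.
z_tp = STANDARD_NORMAL.inv_cdf(0.692)
implied_false_alarm_rate = STANDARD_NORMAL.cdf(z_tp - 1.504)
jones_bias = response_bias(0.692, implied_false_alarm_rate)
```

Under that literal reading, `jones_bias` comes out at roughly 0.47, in line with the 0.471 listed for Jones in the earlier table, which is why the sketch does not substitute the textbook density. A hitter whose true positive and false alarm rates sit symmetrically around 50 percent (say .700 and .300) would come out at 1.0 either way.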

Sorted by the absolute deviation from 1.0, we essentially see a top ten filled with guys upon whom no real disciplinary reputation has been bestowed. Jacobs and Betancourt are known more as free-swinging, low-OBP players, but this group comprises the most balanced hitters over the last two years in terms of where their discipline, or lack thereof, comes from. Rowand is as likely to take a pitch down the middle as he is to swing at one in the dirt. Across the table we start to see some familiar names, a cast of characters consisting of reputed patient hitters and their exact opposites. Does it surprise you that Bobby Abreu and Nick Johnson's performances are so imbalanced? It makes sense given how often they take pitches, but perhaps this idea works similarly to the idea of a break-even stolen-base success rate, in that these two on-base luminaries take too often, while the likes of Hamilton, Guerrero, and Sandoval swing far too often.

One important caveat here is that discipline does not always translate into positive results, as our most sensitive hitter, Daric Barton, is a borderline replacement-level hitter. Likewise, the trailers in sensitivity have a handful of All-Star appearances between them and would produce a fairly formidable lineup if placed on the same team.

This is but a granular approach to determining which hitters are more or less likely to make mistakes at the plate, under the assumption that out-of-zone swings and in-zone takes are mistakes, with alterations for 3-0 takes and two-strike fouls. This assumption does not operate without faults, as the strike zone as defined here is very strict and robotic, when in reality it more closely resembles a LOESS-like, or oval, shape. While enhancements such as a more accurate definition of the zone and perhaps some more restrictions or constraints can improve this method, the method itself remains valid and worth investing time into as a means of digging deep into discipline. Knowing about a hitter's tendencies in terms of making mistakes and the direction in which he leans can only aid pre-formed scouting reports.

The methodology and data above incorporated all data from 2008-09, meaning that there was no segregation by pitch type. Moving forward, I plan on looking at specific pitches as a means of determining, say, the levels of sensitivity and bias inherent in hitters as a curveball comes their way, which can certainly help pitchers to understand who will take on a "get me over" and who will flail at a dirt-grab offering. Additionally, we will revisit Dan Fox's fish-eye method and compare the results to hammer home which hitters could improve their approach by swinging more or offering less.

Eric Seidman is an author of Baseball Prospectus. 