Checking the Numbers: Extending the Discipline Detection

November 12, 2009

Last Friday, I discussed plate discipline at length, noting that this commonly cited facet of performance extends beyond its usual synonym, patience, into the realm of making fewer mistakes in responding to pitches during a given trip to the dish. I introduced signal detection theory as a means of more accurately measuring which hitters produce correct responses most often, since good plate discipline must also cover how a hitter handles pitches in the zone, not merely how often he chases.

As a brief refresher, here is what I did last week: using PITCHf/x data, I coded swings on pitches thrown in the strike zone as true positives, takes on pitches out of the zone as true negatives, takes on in-zone offerings as false negatives, and swings on pitches out of the zone as false positives. I also introduced two metrics: sensitivity and response bias. The former measures how mistake-prone a hitter is, with higher figures translating to fewer decision errors. The latter serves as a barometer of which way a hitter's mistakes lean, with the ideal output hovering right around the 1.0 mark. Below that threshold, hitters are more likely to take in-zone pitches, while figures in excess are biased towards, well, think Vladimir Guerrero. One important aspect of discipline went undiscussed: the actual results of these decisions. Without factoring in the likelihood of contact and/or positive results, we really cannot determine which hitters are stepping into the box with suboptimal strategies.
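
The coding and the two metrics can be sketched in Python. This is a minimal sketch, not the author's code: it assumes sensitivity is the standard d-prime, z(hit rate) minus z(false-alarm rate), and that response bias is a ratio of standard normal densities oriented so that values below 1.0 lean toward taking, matching the description above. The function names and the toy pitch list are hypothetical.

```python
from collections import Counter
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def classify(in_zone: bool, swung: bool) -> str:
    """Label one pitch with its signal-detection outcome."""
    if in_zone:
        return "TP" if swung else "FN"  # swing at a strike / take a strike
    return "FP" if swung else "TN"      # chase a ball / take a ball


def rates(pitches):
    """Hit rate (swings on strikes) and false-alarm rate (swings on balls)."""
    counts = Counter(classify(zone, swing) for zone, swing in pitches)
    hit_rate = counts["TP"] / (counts["TP"] + counts["FN"])
    fa_rate = counts["FP"] / (counts["FP"] + counts["TN"])
    return hit_rate, fa_rate


def sensitivity(hit_rate: float, fa_rate: float) -> float:
    """Assumed d-prime: z(hit rate) - z(false-alarm rate); higher = fewer errors."""
    return STD_NORMAL.inv_cdf(hit_rate) - STD_NORMAL.inv_cdf(fa_rate)


def response_bias(hit_rate: float, fa_rate: float) -> float:
    """Ratio of normal densities, phi(z) = exp(-z**2 / 2) / sqrt(2 * pi).

    Assumed orientation: phi(z_FA) / phi(z_H), so values below 1.0 lean
    toward taking and values above 1.0 lean toward swinging. The textbook
    beta, phi(z_H) / phi(z_FA), is the reciprocal of this.
    """
    z_hit = STD_NORMAL.inv_cdf(hit_rate)
    z_fa = STD_NORMAL.inv_cdf(fa_rate)
    return STD_NORMAL.pdf(z_fa) / STD_NORMAL.pdf(z_hit)


# Hypothetical hitter, 20 pitches as (in_zone, swung) pairs.
toy = (
    [(True, True)] * 6      # swings at strikes
    + [(True, False)] * 4   # takes on strikes
    + [(False, True)] * 2   # chases
    + [(False, False)] * 8  # takes on balls
)
h, fa = rates(toy)                      # 0.6, 0.2
print(round(sensitivity(h, fa), 3))     # 1.095
print(round(response_bias(h, fa), 3))   # 0.725: below 1.0, leans toward taking
```

A hitter with a hit or false-alarm rate of exactly 0 or 1 would make `inv_cdf` raise `StatisticsError`, so a fuller version would need some correction for those edge cases.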

Think of it this way: Luis Castillo was found to be league average in terms of sensitivity, but he was heavily biased towards keeping the bat on his shoulder. Because of that rather extreme bias, he received extra ball calls, but they came at the expense of many more called strikes. By comparison, Hunter Pence had a near-identical sensitivity rating with a response bias practically sharing a room with the neutral 1.0 mark, meaning that he was as likely to take a pitch in the zone as he was to chase one down in the dirt, which made him less exploitable and, to an extent, less prone to mistakes. But the logical conclusion from this information, that Castillo utilized a suboptimal strategy, was incomplete, because a good chance remained that he rarely swung because he understood his limitations and knew that earning freebies offered him the highest probability of reaching first base. The question then becomes: How often did Castillo make contact, and what type of contact was made?

Baseball Info Solutions publishes plate discipline data that tracks swings and contact both in and out of the zone, but I decided to use my PITCHf/x database to keep this entire methodology consistent. Using the same parameters that coded swings and takes in and out of the zone, I added three columns (zone, swing, and contact) to make querying easier. Each column was binary, where 1 = yes and 0 = no, which made the PITCHf/x versions of these contact rates easy to calculate.
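
Here is a sketch of why binary columns make querying easier, using Python's built-in sqlite3 module as a stand-in for the PITCHf/x database. The table name, column names, and data are hypothetical. Averaging a 0/1 column yields a rate directly, and the `min_pitches` parameter mirrors the 2,000-pitch qualifier used in this study.

```python
import sqlite3

# Hypothetical schema mirroring the three binary columns (1 = yes, 0 = no).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pitches (batter TEXT, zone INTEGER, swing INTEGER, contact INTEGER)"
)

# One hypothetical batter, 10 pitches: (batter, zone, swing, contact)
rows = (
    [("A", 1, 1, 1)] * 3    # zone swings with contact
    + [("A", 1, 1, 0)]      # zone whiff
    + [("A", 0, 1, 1)]      # out-of-zone swing with contact
    + [("A", 0, 1, 0)]      # out-of-zone whiff
    + [("A", 1, 0, 0)] * 2  # zone takes
    + [("A", 0, 0, 0)] * 2  # out-of-zone takes
)
conn.executemany("INSERT INTO pitches VALUES (?, ?, ?, ?)", rows)

# AVG() of a 0/1 flag is a rate. CASE without ELSE yields NULL, which AVG()
# ignores, so each contact rate is computed over the relevant swings only.
QUERY = """
SELECT batter,
       COUNT(*)                                               AS pitches,
       AVG(swing)                                             AS swing_rate,
       AVG(CASE WHEN zone = 1 AND swing = 1 THEN contact END) AS zone_contact,
       AVG(CASE WHEN zone = 0 AND swing = 1 THEN contact END) AS ooz_contact
FROM pitches
GROUP BY batter
HAVING COUNT(*) >= ?
"""


def contact_rates(conn, min_pitches=2000):
    """Map batter -> (pitches, swing rate, zone contact, out-of-zone contact)."""
    return {row[0]: row[1:] for row in conn.execute(QUERY, (min_pitches,))}


print(contact_rates(conn, min_pitches=1))  # {'A': (10, 0.6, 0.75, 0.5)}
```

With the default qualifier, the toy batter's 10 pitches fall short and the result is empty.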

Among hitters who saw 2,000 or more pitches over 2008-09, the league-average rates were 87.9 percent for zone contact and 68.2 percent for out-of-zone contact. In the same span, Luis Castillo made contact on 95.9 percent of the pitches in the strike zone at which he offered, while getting the bat on the ball 87.9 percent of the time on out-of-zone pitches. Castillo made contact at a much better rate than the league. Again, all signs seem to indicate that his strategy is not optimal. He doesn’t swing all that much, but clearly boasts an ability to make contact whenever he does. Why not swing more often?

Well, these numbers strictly counted the number of times contact was made, but contact is a rather ambiguous term. Contact literally refers to connecting the bat with the ball, and there is more than one way to accomplish this feat. Aside from the differences among grounders, liners, and fly balls, and the various subsets of each (sharp liners, frozen ropes), there is the simpler separation of balls put in play from those fouled off. If Castillo is making a ton of contact, but a hefty portion of it can be attributed to foul balls and spoiling pitches, his contact rates may be misleading. Sure, fouling off a pitch is in many cases a better result than a whiff, but it makes less clear whether Castillo would have done better to offer a bit more often. The simple solution here is to add a fourth binary column recording whether a swing produced a foul. On a pitch in the zone that the batter fouled off, the zone, swing, contact, and foul columns would all display a ‘1.’ Looking at overall foul-ball rates, to avoid any sample-size issues potentially inherent in a zone/out-of-zone breakdown, the league average came out to 48.1 percent: slightly less than half of all contacted balls among these 2,000-plus-pitch batters were fouls.
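
Extending the same kind of hypothetical sqlite3 table with the foul flag, the foul rate is the average of `foul` among pitches where `contact` is 1. The CHECK constraints are an added assumption that encodes how the flags nest (a foul implies contact, and contact implies a swing); the table, column names, and data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE pitches (
    batter  TEXT    NOT NULL,
    zone    INTEGER NOT NULL,
    swing   INTEGER NOT NULL,
    contact INTEGER NOT NULL,
    foul    INTEGER NOT NULL,
    CHECK (zone IN (0, 1) AND swing IN (0, 1) AND contact IN (0, 1) AND foul IN (0, 1)),
    CHECK (contact <= swing AND foul <= contact)
)""")

# Hypothetical batter: (batter, zone, swing, contact, foul)
rows = (
    [("A", 1, 1, 1, 1)] * 2    # in-zone pitch fouled off: every flag is 1
    + [("A", 1, 1, 1, 0)] * 2  # in-zone pitch put in play
    + [("A", 0, 1, 1, 0)]      # out-of-zone pitch put in play
    + [("A", 0, 1, 0, 0)]      # whiff
    + [("A", 1, 0, 0, 0)]      # take
)
conn.executemany("INSERT INTO pitches VALUES (?, ?, ?, ?, ?)", rows)


def foul_rate(conn, batter):
    """Share of contacted pitches that were fouled off (None if no contact)."""
    (rate,) = conn.execute(
        "SELECT AVG(foul) FROM pitches WHERE batter = ? AND contact = 1", (batter,)
    ).fetchone()
    return rate


print(foul_rate(conn, "A"))  # 0.4: 2 fouls on 5 contacted pitches
```

Using contact rather than swings as the denominator matches the article's framing of fouls as a share of contacted balls.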

Castillo clocks in at 39.2 percent. Now it’s getting really interesting: he swings less often than anyone in the game (his 31.7 percent swing rate was substantially lower than that of Bobby Abreu, second at 34.0 percent), and he makes more contact than anyone in the game on those rare swings. Only five hitters (Cristian Guzman, Jeff Keppinger, Pedro Feliz, Placido Polanco, and Vernon Wells) produced lower foul-ball rates on their contact. While this confluence of characteristics should once again suggest that Castillo wields a suboptimal batting strategy, the aforementioned “Less Fouls Than Castillo” quintet does not exactly consist of perennial All-Stars or players reputed to boast solid discipline. This suggests that yet another step is needed to truly gauge strategy optimization: what happens on the contacted balls that are not fouled off?

Unfortunately, our journey in this regard has to be put on hold at this juncture because there simply is not any reliable and freely available data on the percentage of balls in play that were hit sharply or weakly. The MLB Gameday application attempts to break things down in this fashion, occasionally coding balls as sharply hit or weak, but the lack of consistency across parks and the fact that not every event is coded in this fashion ultimately precludes its use in this arena. This exercise reflects why HITf/x data, whenever it becomes available, is going to be immensely important, as the velocity off the bat when contact is made is the final, missing piece to determining disciplinary optimization.

Before looking at a few more selected players under this lens, I did want to point out an ever-so-slight miscalculation in last week's response bias. Each phi (normal density) term raises e to a power built from a squared z-score, one for the rate of true positives and one for the rate of false alarms, and I multiplied those squared z-scores by negative one. The correct factor is negative one-half. It does not change a whole heck of a lot, but I felt compelled to point it out. The sensitivity ratings were still valid, but here are the revised response biases for the top and bottom ten: the left-hand group shows the smallest absolute deviations (Dev) from 1.0, and the right-hand group the largest.

Smallest deviation from 1.0          Largest deviation from 1.0
Player              Bias     Dev     Player             Bias     Dev
Aaron Rowand        1.000   0.000    Pablo Sandoval     1.576   0.576
Mike Jacobs         0.998   0.002    Luis Castillo      0.502   0.498
Yuniesky Betancourt 1.004   0.004    Marco Scutaro      0.516   0.484
Rajai Davis         0.995   0.005    Daric Barton       0.530   0.470
Carlos Gonzalez     0.993   0.007    Vladimir Guerrero  1.467   0.467
Brandon Moss        0.991   0.009    Nick Johnson       0.561   0.439
Miguel Cabrera      0.991   0.009    Bobby Abreu        0.566   0.434
Ross Gload          1.010   0.010    Denard Span        0.602   0.398
Ty Wigginton        1.012   0.012    Josh Willingham    0.609   0.391
Marlon Byrd         1.012   0.012    Josh Hamilton      1.388   0.388
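
The ranking above amounts to sorting hitters by |Bias - 1.0|. A small Python sketch using a few rows from the table (the full qualifying list is not reproduced here); the helper names are hypothetical, and the `lean` labels follow the article's convention that values below 1.0 favor taking.

```python
# A few revised response biases taken from the table.
BIASES = {
    "Aaron Rowand": 1.000,
    "Mike Jacobs": 0.998,
    "Marlon Byrd": 1.012,
    "Pablo Sandoval": 1.576,
    "Luis Castillo": 0.502,
    "Vladimir Guerrero": 1.467,
    "Josh Hamilton": 1.388,
}


def deviation(bias: float) -> float:
    """Absolute distance from the neutral 1.0 mark (the Dev column)."""
    return abs(bias - 1.0)


def lean(bias: float) -> str:
    """Which way a hitter's mistakes lean, per the article's orientation."""
    if bias < 1.0:
        return "take"
    if bias > 1.0:
        return "swing"
    return "neutral"


def rank(biases, n=3):
    """Return the n most neutral and the n most biased hitters."""
    ordered = sorted(biases.items(), key=lambda kv: deviation(kv[1]))
    return ordered[:n], ordered[::-1][:n]


neutral, extreme = rank(BIASES)
for name, bias in extreme:
    print(f"{name:<18} {bias:.3f} {deviation(bias):.3f} {lean(bias)}")
```

Ranking by absolute deviation treats a take-biased 0.9 and a swing-biased 1.1 as equally far from neutral, which is how the table groups them.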

Again, this tells us which way hitters lean when they make mistakes. Aaron Rowand is literally neutral in this regard: setting aside how often he errs, his mistakes are perfectly balanced between chases and takes on strikes. Pablo Sandoval and Luis Castillo are interesting to compare given their positions on opposite sides of the response-bias spectrum: Sandoval is more likely to err on the side of swinging too frequently than Castillo is to continue playing the role of Nicholas Noswing in the off-Broadway hit, Johnny Get That Bat Off of Your Shoulder.

The largest deviations here also suggest that hitters are more likely to err on the side of caution than to let loose: seven of the ten most biased hitters fall below 1.0. In fact, the average response bias across the qualifying players is 0.845, indicating a rather substantial league-wide bias towards taking called strikes. In any event, there are a few players I was asked about when discussing this methodology who seem worthy of further analysis in this forum. Keep in mind the following league averages: Sensitivity (1.144), Zone Contact (87.9 percent), Out-of-Zone (OOZ) Contact (68.2 percent), Fouls (48.1 percent), and Response Bias (0.844, though the deviation from 1.0 is of more interest).

Vladimir Guerrero: His 1.122 sensitivity is right in line with the league, as is his 88.1 percent rate of zone contact. His 73.1 percent out-of-zone contact rate exceeds the league average, however, as does his 49.5 percent rate of foul balls. Combined with a response bias heavily skewed towards swinging at pitches out of the zone, Guerrero appears to be Castillo’s free-swinging analog. He swings much more often and contacts a higher raw number of pitches, but much of that comes in the form of foul balls. If his rate of well-hit balls exceeded the average, our assessment would shift. Had this study been conducted a decade ago, different results would likely have surfaced, as balls Guerrero can now only hope to foul off may have gone for extra-base hits or solidly hit liners.
Carlos Quentin: The White Sox corner outfielder boasts the highest sensitivity of any hitter with a near-neutral response bias, pairing the fourth-highest sensitivity overall with a 1.062 response bias. He is essentially league average on zone contact while below average in out-of-zone contact, and he also fouls off a higher percentage of balls than the league. This seems like the perfect recipe for a player of average discipline, not a league leader.
Jack Cust: All signs here point towards an optimal strategy being used for a rather suboptimal player. Cust is well below average in zone contact (73.5 percent) and out-of-zone contact (49.5 percent), with one of the highest foul-ball rates at 55.6 percent. His response bias of 0.614 indicates a preference for not swinging, and with good reason, as he makes much less contact than the league, and a higher-than-average percentage of said contact comes in the form of foul balls. The rest of his contact may be balls hit over the wall or hit sharply around the diamond, but these numbers tend to agree with his strategy.
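
Each capsule is essentially a comparison against the league averages listed earlier. A minimal sketch of that comparison, using only the figures quoted in the article; Quentin is left out because only his response bias is quoted numerically, and the function name is hypothetical.

```python
# League averages quoted in the article (2,000+ pitch qualifiers, 2008-09).
LEAGUE = {"sensitivity": 1.144, "zone_contact": 87.9, "ooz_contact": 68.2, "foul": 48.1}

# Figures quoted in the capsules; stats the article does not quote are omitted.
PLAYERS = {
    "Vladimir Guerrero": {"sensitivity": 1.122, "zone_contact": 88.1,
                          "ooz_contact": 73.1, "foul": 49.5},
    "Jack Cust": {"zone_contact": 73.5, "ooz_contact": 49.5, "foul": 55.6},
}


def vs_league(stats, league=LEAGUE):
    """Player minus league average for each stat the player has (rounded)."""
    return {key: round(value - league[key], 3) for key, value in stats.items()}


for name, stats in PLAYERS.items():
    print(name, vs_league(stats))
```

Positive differences in contact and negative ones in foul rate would favor swinging more; Cust shows the reverse on all three, which is why his take-leaning 0.614 bias looks sensible.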

Again, we need some measure of what type of contact was made before really moving forward, but I hope this provides the framework for future studies of plate discipline. We need to look past the idea that discipline equals not chasing bad pitches, and get more in tune with the idea that disciplined hitters understand their abilities and limitations as far as which types of pitches they can make decent contact with, and do not let those pitches go by, while spitting on pitches they do not feel capable of handling very well. A player can utilize a completely optimal strategy while batting and be suboptimal himself, which is more on the front office than the abilities of the individual player, another important distinction to make. Knowing which way batters are biased when making mistakes, as well as how prone they are to making mistakes in the first place, is incredibly valuable, but the missing ingredient of what happens when contact is made is of equal value on its own, since it can help redefine which pitches and events should be classified as errors in signal detection studies.

Eric Seidman

buffum

11/12

This piece resonated with me more than the previous one. Giving examples of how players are using an optimal swing strategy really helped this coalesce for me. Thanks.

rweiler

11/12

I'm not sure you need velocity coming off the bat to complete the study. HR rate is probably a pretty decent proxy for velocity because balls that aren't hit hard don't get out of the park (see: everyplace except Coors Field). Luis Castillo hits very, very few HR, which suggests that when he does hit a ball fair, it isn't hit all that hard. Of course, swing angle and park effects have something to do with that. Given his inability to get the ball out of the park when he does swing, I'd say that Castillo was probably right to leave the bat on his shoulder.

EJSeidman

11/12

No, this is incorrect. You need velocity off of the bat or a means of gauging how well the ball is hit, because not all line drives are hit hard, so LD% could overestimate it; not all ground balls are hit weakly, so GB% could underestimate it as well; and you in no way need to hit a ton of HR to hit the ball hard.

EJSeidman

11/12

As a further note, if every non-foul ball that Castillo contacted was hit really, really hard, regardless of whether those balls were fielded or not, he would be utilizing a suboptimal strategy, because he makes contact all the time, most of those contacted balls are put in play, and he hits them really hard. Home runs are hard-hit balls, but not all hard-hit balls are home runs.

Richie

11/13

Very good stuff. Thank you.

DrDave

11/13

Speaking as someone who usually doesn't advocate outcome-based measures, in this case I think you really could just look at whether the non-foul contact was a hit or not. That's what Luis Castillo is trying to do when he swings -- not hit the ball hard, not hit a HR, but simply get a hit. If his approach produces a bunch of bloop doubles, that's *success*, where a speed-off-the-bat measure would call it failure.

Of course, for outcome-based measures you need a large enough sample to iron out the luck. But an analysis using outcomes (available data) according to in/out of zone would already be much better info than we have, and not obviously worse than what you were hoping to do.

At least for Luis Castillo.

EJSeidman

11/13

Dave, I would agree that hit/out could serve as a decent proxy FOR NOW, but certainly not when more data is available, assuming it becomes available. Castillo could smash line drives that are recorded for outs and get bloop hits. For me, the idea of how hard the ball was hit is much more important regardless of the outcome.

DrDave

11/13

Careful; this is only true if the correlation between velocity and outcome, at the individual level, is always positive. If some player has an *ability* to get bloop hits or infield singles, then it simply is not true (for that player) that harder = better. This isn't ski jumping; there are no style points.

For most players, yes, velocity off the bat will be a good surrogate for luck-independent batting success. But not, I think, for all of them -- and Luis Castillo (like Ichiro) is a likely candidate for one of the exceptions.

EJSeidman

11/13

Right, and extenuating circumstances like this are going to persist for several players, meaning we'll need to make careful adjustments for them. But velocity off the bat will cover a larger spread of the league as a better proxy than hits will, I think we can agree on that.

BeplerP

11/13

Mr. Seidman: This is really amazing work. Thank you. I have no technical grasp of what you are doing (I don't have a technical grasp of what Einstein was doing either), but I believe I can understand the results, which I see crystallized in the sentence: "A player can utilize a completely optimal strategy while batting and be suboptimal himself, which is more on the front office than the abilities of the individual player, another important distinction to make." That is, if I understand you, plate discipline is very valuable for a hitter like Albert Pujols, not quite as valuable for a hitter like Jack Cust? This is terrific information. Keep going, please, and I will try to catch up on the math. Regards,

EJSeidman

11/13

Peter, more like you can have a guy utilizing a sound strategy who just isn't a good hitter. So he might not be biased in any direction when making mistakes and might make fewer mistakes than the field, but when he has what we are calling positive responses--like swings in the zone--the results aren't solid.

DrDave

11/14

Exactly -- and here is where we might have a chance to finally do some _objective_ commentary on who "makes the most of his talent", as opposed to "having natural tools". Historically, that's been a lazy way of making moral (and racist) judgments about players, but this would really help separate identification/recognition skills from twitch skills.
