March 3, 2009
Is Consistency Key?
Adventures in Pitching
"In baseball, my theory is to strive for consistency, not to worry about the numbers. If you dwell on statistics you get shortsighted; if you aim for consistency, the numbers will be there at the end."Tom Seaver Performance volatility can be very frustrating for fans, managers, and executives alike. A pitcher may exude confidence and appear capable of dominating one day before floundering in his next start. Pitchers like this tend to stick around by riding on the coattails of their fantastic outings even though said "ondays" are few and far between. One of my favorite stats here is Flake, a standard deviationbased metric that quantifies consistency. The standard deviation is a statistical tool used to measure dispersion in a dataset and, under a normal distribution, twothirds of a specific set falls within one standard deviation of the mean, or average. With regards to Flake, a lower mark translates to a higher consistency level. Flake measures consistency by taking the standard deviation of perstart Support Neutral Value Added (SNVA). Of those pitchers with at least 150 innings pitched in 2008, Ian Snell proved to be the least flaky, with a score of .193. Essentially, this explains that twothirds of Snell's 31 starts featured an SNVA mark ranging from 0.273 to +0.113, which is the 0.080 average SNVA plus or minus the 0.193 standard deviation. Does this matter? You may recall that Snell turned in one of the worstpitched seasons in 2008, with a 2.5 SNVA in the aggregate, making him very consistentconsistently bad. The milliondollar question then becomes whether or not the consistency evident in Flake correlates strongly to overall performance. Are the best pitchers the most consistent? Or does consistency on a perstart basis, in terms of value added to the team, fail to make a difference outside of personal preference and an ability to sleep easier at night? 
After exporting the last 10 years of data for pitchers with the aforementioned 150 IP minimum in a given season, I ran a correlation between Flake and SNVA and found a measly coefficient of 0.03. A correlation measures how strongly two variables move together, or in other words, how far they are from being independent of one another. For a statistical relationship to be considered even moderately strong, a correlation of at least 0.35 would be needed; the 0.03 correlation suggests that a pitcher's level of per-start consistency means next to nothing to his overall success. This is not to say that consistency lacks importance: the low volatility of steady performers brings with it the knowledge of what to expect, which helps general managers feel better about investments, and keeps managers and fans from pulling out chunks of their hair.

This raises a few follow-up questions. Is consistency consistent, and are the volatile performers this year a good bet to perform inconsistently next year? To find out, I ran an AR(1) Intra-Class Correlation, which works practically the same way as a year-to-year correlation but incorporates more than two years of data, testing the year-to-year consistency of the statistic. With an ICC of 0.05, the answer turns out to be no: consistency itself is inconsistent! Sure, there may be pitchers who are consistent only in their inconsistency, but they seem to be more the exception than the rule.

Using the Flake statistic in this manner set into motion the idea of applying the same concept to Pitch-f/x data. In other words, which pitchers were the least or most consistent in fastball velocity and movement, and does this pitch data consistency share any sort of relationship with metrics like SNVA, BABIP, Infield Fly Percentage, HR/FB, or K/9? Perhaps the ability to add or subtract velocity from a fastball in a given game fools the hitters, resulting in more feeble contact or a much tougher time reading the pitcher.
Or maybe more consistent movement leads to better overall production.

There are two sets of deviations of potential interest: within starts, and between starts. The former refers to how far, on average, a pitcher strayed from his mean velocity or movement in a given start, regardless of the actual means themselves. The latter describes the opposite: how widely his per-start averages were dispersed around his seasonal average velocity or movement. To better explain, let's use last week's subject, John Danks, who averaged 91.1 mph with the fastball, with a 1.65 mph velocity deviation within starts and a 1.15 mph velocity deviation between starts. Basically, two-thirds of his 33 starts (22 GS) featured an average fastball velocity of 89.95 mph to 92.25 mph, and in each of the 33 starts, two-thirds of his fastballs fell within plus or minus 1.65 mph of the game-specific average velocity. The league averages for velocity deviations last season were 0.94 between and 1.12 within, meaning that Danks hovered close to the league average in terms of start-to-start velocity consistency, but fluctuated much more within each actual start.

The most consistent pitchers would be those with very low deviations in both areas, meaning their per-game averages were largely consistent, and that they barely strayed from the mean within each start. Of those with at least 1,500 pitches thrown last season, just four pitchers threw their fastballs with velocity deviations under 1.00 in both areas: Andy Pettitte, Fausto Carmona, Joakim Soria, and Livan Hernandez. That group is hardly concrete proof that consistency in velocity is tantamount to overall success. Again, the million-dollar question becomes whether or not these deviations, within and between starts, in fastball velocity and movement, share strong relationships with other metrics.
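To make the two kinds of deviation concrete, here is a minimal sketch that assumes the input is a mapping from a start identifier to the list of fastball velocities thrown in that start. The article does not spell out exact formulas, so two choices here are assumptions: the within-start figure is taken as the average of the per-start standard deviations, and the between-start figure as the standard deviation of the per-start mean velocities. The function names and the sample velocities are also hypothetical.

```python
from statistics import mean, pstdev


def within_start_deviation(starts):
    """Average of the per-start standard deviations of velocity.

    Assumption: each start's spread is measured around that start's own mean,
    and the per-start spreads are then averaged with equal weight.
    """
    return mean([pstdev(velocities) for velocities in starts.values()])


def between_start_deviation(starts):
    """Standard deviation of the per-start average velocities."""
    per_start_means = [mean(velocities) for velocities in starts.values()]
    return pstdev(per_start_means)


# Hypothetical fastball velocities (mph) for three made-up starts.
starts = {
    "start_1": [90.0, 92.0, 91.0, 93.0],
    "start_2": [89.0, 91.0, 90.0, 90.0],
    "start_3": [92.0, 92.0, 93.0, 91.0],
}

within = within_start_deviation(starts)
between = between_start_deviation(starts)
print("within-start deviation: ", round(within, 2), "mph")
print("between-start deviation:", round(between, 2), "mph")
```

Run over a full season of pitch data, the same two functions would produce figures comparable in kind to the 1.65 mph (within) and 1.15 mph (between) quoted for Danks, and the identical approach applies to horizontal and vertical movement.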
Below is a correlation matrix of the six deviation components against five performance-based indicators:

                SNVA   BABIP  INFFB  HR/FB  K/9
Bet Velo        0.03   0.14   0.00   0.02   0.05
Bet HorizMove   0.13   0.26   0.02   0.11   0.03
Bet VertMove    0.04   0.13   0.06   0.10   0.03
In Velo         0.02   0.09   0.11   0.03   0.21
In HorizMove    0.10   0.04   0.07   0.04   0.10
In VertMove     0.00   0.06   0.14   0.11   0.13

Bet Velo/HorizMove/VertMove: Standard deviation between starts in velocity and horizontal and vertical movement.
In Velo/HorizMove/VertMove: Standard deviation within starts in velocity and horizontal and vertical movement.

Nothing reached the aforementioned benchmark of 0.35, meaning that not even a moderately strong relationship exists between the deviation components and any of these statistics. However, two relationships of low strength emerge: horizontal movement deviation between starts to BABIP (0.26), and velocity deviation within starts to K/9 (0.21). It is important to remember that correlation does not necessarily equal causation and that these results, while solid starting points, merit further investigation. Why would more fluctuation in horizontal movement between starts cause a higher BABIP? The second relationship at least goes along with conventional wisdom, in the sense that more velocity fluctuation within a start can fool hitters and therefore increase the overall K/9 rate.

Outside of these two relationships, the remaining correlations are very, very weak. They suggest that there may be certain instances where a deviation in pitch data leads to an edge in a performance statistic, but not enough to say with even an ounce of certainty that the relationship is direct or a no-brainer for all hurlers.

Even more interesting is the relationship of these deviations to the Flake statistic itself. We might expect that the flakier pitchers are less consistent in their pitch data, which might cause the actual flakiness.
The numbers largely disagree with this assumption: the correlations of all three between-start deviation stats fall into oblivion when tested against Flake, as does within-start velocity deviation. The components of within-start movement share correlation coefficients in the 0.16 range with Flake, suggesting that a very weak relationship exists, in that the pitchers whose fastball movement is more volatile tend to be flakier. That relationship is not strong in the least, however, and should not be taken as gospel.

Overall, many place a premium on consistency because of personal aversions to risk, as well as the intuition that consistency directly relates to better performance. As the data here showed, not only is the ability to be consistent inconsistent in and of itself, but being inconsistent does not make someone a bad pitcher, and being consistent in pitch data does not give a pitcher an edge over his inconsistent colleagues. It really all boils down to personal preference: do you want the Joe Blanton type, or a pre-Seattle Jeff Weaver? I suspect many would choose Blanton, but realistically his consistency does not make him any better, statistically speaking. From an executive's standpoint, the inconsistent pitchers are also more likely to be had on the cheap, yielding a pitcher arguably just as effective for a fraction of the cost. Regardless, consistency is in no way vastly superior to inconsistency, whatever the negative associations.

Eric Seidman is a contributor to Baseball Prospectus.

This is a nice read. I wish I had the math background to evaluate the methodology and conclusions.
While the "consistency is inconsistent" conclusion seems contradictory, I can remember many times watching my fantasy aces blow up. I had Sabathia last year, and those first few starts gave me an ulcer.
The idea is that pitchers may be consistent one year but inconsistent the next. Sure, there are studs who are consistently awesome every year, but by and large, these players are few and far between, so it isn't necessarily contradictory... it's just that when you think of it you're probably immediately thinking of counterexamples like Sabathia, Halladay, etc.
How many of those pitchers that are consistent one year and inconsistent the next year were affected by injury? Perhaps there'd be a stronger correlation for those who had two healthy seasons in a row, or maybe there are enough injured pitchers that it drags down the overall consistency numbers?
Given that he restricted the sample to pitchers with at least 150 IP, any severe injuries that had a significant effect on a pitcher's performance in a season would be filtered out of the data.
You could go on the DL for "elbow soreness", "dead arm" or "tendinitis" and still throw 150 IP.