“In baseball, my theory is to strive for consistency, not to worry about the numbers. If you dwell on statistics you get shortsighted; if you aim for consistency, the numbers will be there at the end.” —Tom Seaver

Performance volatility can be very frustrating for fans, managers, and executives alike. A pitcher may exude confidence and appear capable of dominating one day before floundering in his next start. Pitchers like this tend to stick around by riding the coattails of their fantastic outings, even though said “on-days” are few and far between. One of my favorite stats here is Flake, a standard deviation-based metric that quantifies consistency. The standard deviation is a statistical tool used to measure dispersion in a dataset; under a normal distribution, roughly two-thirds of the values fall within one standard deviation of the mean, or average. With regard to Flake, a lower mark translates to a higher level of consistency.

Flake measures consistency by taking the standard deviation of per-start Support Neutral Value Added (SNVA). Of those pitchers with at least 150 innings pitched in 2008, Ian Snell proved to be the least flaky, with a score of .193. In practical terms, roughly two-thirds of Snell’s 31 starts featured an SNVA mark between -0.273 and +0.113, which is the -0.080 average SNVA plus or minus the 0.193 standard deviation. Does this matter? You may recall that Snell turned in one of the worst-pitched seasons of 2008, with a -2.5 SNVA in the aggregate, making him very consistent: consistently bad. The million-dollar question then becomes whether the consistency evident in Flake correlates strongly with overall performance. Are the best pitchers the most consistent? Or does consistency on a per-start basis, in terms of value added to the team, fail to make a difference outside of personal preference and an ability to sleep easier at night?
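
The mechanics of the metric reduce to a standard deviation over a pitcher's per-start values. Here is a minimal sketch; the per-start SNVA values below are invented for illustration, not Snell's actual game log, and a sample (rather than population) standard deviation is assumed:

```python
from statistics import mean, stdev

def flake(snva_per_start):
    """Flake: the standard deviation of a pitcher's per-start SNVA values."""
    return stdev(snva_per_start)

# Hypothetical per-start SNVA values, purely for illustration
starts = [-0.20, 0.10, 0.00, -0.10, 0.15]
f = flake(starts)
m = mean(starts)
# Under rough normality, ~two-thirds of starts fall inside this band
one_sd_band = (m - f, m + f)
```

A lower Flake tightens the band, which is exactly the "you know what you are getting" property the article describes.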

After exporting the last 10 years of data for pitchers with the aforementioned 150 IP minimum in a given season, I ran a correlation between Flake and SNVA and found a measly coefficient of 0.03. A correlation is a statistical test that measures how strongly two variables move together, or in other words, how they relate to one another. For a statistical relationship to be considered even moderately strong, a correlation of at least 0.35 would be needed; the 0.03 correlation ultimately suggests that a pitcher’s level of per-start consistency means relatively nothing to his overall success. This is not to say that consistency lacks importance: the low volatility of steady performers brings with it the knowledge of what to expect, which helps general managers feel better about investments and keeps managers and fans from pulling out chunks of their hair.
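
The coefficient in question is an ordinary Pearson correlation. A minimal self-contained version, run on invented inputs rather than the actual dataset:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

# Perfectly related inputs return 1.0; the Flake-vs-SNVA test returned ~0.03,
# i.e., essentially no linear relationship.
r = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```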

This raises a few follow-up questions. Is consistency itself consistent, and are this year's volatile performers a good bet to perform inconsistently next year? To find out, I ran an AR(1) Intra-Class Correlation, which works much like a year-to-year correlation but incorporates more than two years of data, testing the year-to-year reliability of the statistic. With an ICC of 0.05, the answer turns out to be no: consistency itself is inconsistent! Sure, there may be pitchers who are consistent only in their inconsistency, but they seem to comprise the exception rather than the rule.
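
The AR(1) flavor adds an autoregressive error structure, but the core intra-class correlation idea can be sketched with a plain one-way ICC. This is a simplified stand-in for the model used above, not the exact method, and the yearly Flake values are invented:

```python
def icc_oneway(groups):
    """One-way ICC(1): the share of total variance that sits between pitchers
    rather than within a pitcher's own seasons.
    groups: one list of yearly Flake values per pitcher, each of length k."""
    n, k = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (n * k)
    means = [sum(g) / k for g in groups]
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum(sum((x - sum(g) / k) ** 2 for x in g) for g in groups) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Pitchers who repeat their value year after year -> ICC near 1
high = icc_oneway([[0.1, 0.1], [0.3, 0.3], [0.5, 0.5]])
```

An ICC near zero, as reported above, means a pitcher's Flake in one season tells you almost nothing about his Flake in the next.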

Using the Flake statistic in this manner set into motion the idea of applying the same concept to Pitch-f/x data. In other words, which pitchers were the least or most consistent in fastball velocity and movement, and does this pitch data consistency share any sort of relationship with metrics like SNVA, BABIP, Infield Fly Percentage, HR/FB, or K/9? Perhaps the ability to add or subtract velocity from a fastball in a given game fools the hitters, resulting in more feeble contact or a much tougher time reading the pitcher. Or maybe more consistent movement leads to better overall production.

There are two sets of deviations of potential interest: within starts and between starts. The former refers to how far, on average, a pitcher strayed from his mean velocity or movement in a given start, regardless of the actual means themselves. The latter captures the opposite, describing the dispersion of per-start averages around the seasonal average velocity or movement. To better explain, let’s use last week’s subject, John Danks, who averaged 91.1 mph with the fastball, with a 1.65 mph velocity deviation within starts and a 1.15 mph velocity deviation between starts. Basically, two-thirds of his 33 starts (22 GS) featured an average fastball velocity between 89.95 mph and 92.25 mph, and in each of the 33 starts, two-thirds of his fastballs ranged within plus or minus 1.65 mph of the game-specific average velocity. The league averages for velocity deviations last season were 0.94 between starts and 1.12 within, meaning that Danks hovered close to home in terms of per-start velocity consistency, but fluctuated much more within each actual start.
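
Both deviations can be computed from pitch-level data grouped by start. A sketch, with invented velocities rather than Danks's actual Pitch-f/x readings:

```python
from statistics import mean, stdev

def velocity_deviations(starts):
    """starts: one list of fastball velocities (mph) per start.
    Returns (between-start deviation, average within-start deviation)."""
    per_start_means = [mean(s) for s in starts]
    between = stdev(per_start_means)              # spread of per-start averages
    within = mean(stdev(s) for s in starts)       # avg spread inside each start
    return between, within

# Three hypothetical starts of three fastballs each
starts = [[90.0, 92.0, 91.0], [88.0, 90.0, 89.0], [91.0, 89.0, 90.0]]
between, within = velocity_deviations(starts)
```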

The most consistent pitchers would be those with very low deviations in both areas, meaning their per-game averages were largely stable and that they barely strayed from the mean within each start. Of those with at least 1,500 pitches thrown last season, just four pitchers threw their fastballs with velocity deviations under 1.00 in both areas: Andy Pettitte, Fausto Carmona, Joakim Soria, and Livan Hernandez. Looking at that group, it is hardly concrete proof that consistency in velocity translates into overall success. Again, the million-dollar question becomes whether these deviations, within and between starts, in fastball velocity and movement, share strong relationships with other metrics. Below is a correlation matrix of the six deviation components against five performance-based indicators:

                SNVA   BABIP   INFFB   HR/FB     K/9
Bet Velo        0.03   -0.14    0.00   -0.02    0.05
Bet HorizMove  -0.13    0.26    0.02    0.11    0.03
Bet VertMove   -0.04    0.13    0.06    0.10    0.03
In Velo         0.02    0.09    0.11    0.03    0.21
In HorizMove    0.10    0.04   -0.07   -0.04   -0.10
In VertMove     0.00    0.06   -0.14    0.11   -0.13

Bet Velo/HorizMove/VertMove: standard deviation between starts
in velocity and in horizontal and vertical movement.
In Velo/HorizMove/VertMove: standard deviation within starts
in velocity and in horizontal and vertical movement.

Nothing reached the aforementioned benchmark of 0.35, meaning that not even a moderately strong relationship exists between the deviation components and any of these statistics. However, two relationships of low strength emerge: horizontal movement deviation between starts to BABIP, and velocity deviation within starts to K/9. It is important to remember that correlation does not necessarily equal causation, and that these results, while solid starting points, merit further investigation. Why would more fluctuation in horizontal movement between starts cause a higher BABIP? The second relationship at least fits conventional wisdom, in the sense that more velocity fluctuation within a start can fool hitters and therefore increase the K/9 rate. Outside of these two, the remaining correlations show only very weak relationships, suggesting that there may be instances where a deviation in pitch data confers an edge in a performance statistic, but not enough to say with any certainty that the relationship is direct or holds for all hurlers.
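
Mechanically, a matrix like the one above is just the pairwise Pearson coefficient between each deviation column and each performance column. A minimal construction, run over invented columns purely to show the shape of the output:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den

def corr_matrix(dev_cols, perf_cols):
    """dev_cols, perf_cols: dicts mapping column name -> per-pitcher values."""
    return {d: {p: round(pearson(dv, pv), 2) for p, pv in perf_cols.items()}
            for d, dv in dev_cols.items()}

# Hypothetical per-pitcher columns, not the actual Pitch-f/x dataset
devs = {"in_velo": [1.1, 0.9, 1.3, 1.0]}
perf = {"k9": [7.5, 6.8, 8.1, 7.0], "babip": [0.30, 0.29, 0.31, 0.30]}
matrix = corr_matrix(devs, perf)
```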

Even more interesting is the relationship of these deviations to the Flake statistic itself. We might expect the flakier pitchers to be less consistent in their pitch data, and that this drives the actual flakiness. The numbers largely disagree with this assumption: the correlations of all three between-start deviation stats fall into oblivion when tested against Flake, as does within-start velocity deviation. The components of within-start movement share correlation coefficients in the 0.16 range with Flake, suggesting that a very weak relationship exists: pitchers whose fastball movement is more volatile tend to be flakier. That relationship is not particularly strong, however, and should not be taken as gospel.

Overall, many place a premium on consistency because of personal aversion to risk, as well as the intuition that consistency directly relates to better performance. As the data here showed, not only is the ability to be consistent inconsistent in and of itself, but being inconsistent does not make someone a bad pitcher, and being consistent in pitch data does not give a pitcher an edge over his inconsistent colleagues. It really boils down to personal preference: do you want the Joe Blanton type, or a pre-Seattle Jeff Weaver? I suspect many would choose Blanton, but realistically his consistency does not make him any better, statistically speaking. From an executive’s standpoint, the inconsistent pitchers are also more likely to be had on the cheap, yielding a pitcher arguably just as effective for a fraction of the cost. Regardless, consistency is in no way vastly superior to inconsistency, whatever the negative associations.

Eric Seidman is a contributor to Baseball Prospectus. You can contact Eric by clicking here.

This is a nice read. I wish I had the math background to evaluate the methodology and conclusions.

While the "consistency is inconsistent" conclusion seems contradictory, I can remember many times watching my fantasy aces blow up. I had Sabathia last year, and those first few starts gave me an ulcer.
The idea is that pitchers may be consistent one year but inconsistent the next. Sure, there are studs who are consistently awesome every year, but by and large those players are few and far between, so it isn't necessarily contradictory... it's just that when you think of it you're probably immediately thinking of counterpoints like Sabathia, Halladay, etc.
How many of the pitchers who are consistent one year and inconsistent the next were affected by injury? Perhaps there'd be a stronger correlation for those who had two healthy seasons in a row, or maybe there are enough injured pitchers that it drags down the overall consistency numbers?
Given that he restricted the sample to 150 IP, any severe injuries that had a significant effect on the pitcher's performance in a season would be limited out of the data.
You could go on the DL for "elbow soreness," "dead arm," or "tendinitis" and still throw 150 IP.
It may not be any better, but I'll bet it cuts down the manager's and bullpen coach's Rolaids bill. At least when your consistent guy blows one, you don't have to wonder if it was because you wore blue socks or quit smoking or changed sunflower seed brands; you can vow revenge against your GM and ownership for handing you that s.o.b. and know it wasn't the stars after all.
I chatted a fair amount with Dan Fox when he was doing Pitch-f/x analysis, and he noted that some stadiums had calibration issues. This came up as we were trying to figure out why pitchers' velocities read higher or lower depending on which park they were thrown in. Humidity was also seen as a possible factor.

Can you try to rerun this study, but instead of looking at it from a pitcher's perspective, look at it from a ballpark perspective? Perhaps by looking at all of John Danks's home games at Comiskey, for example, there might be more of a correlation. Or try to use a domed stadium that might control better for weather and other factors. Do some stadiums have more horizontal movement? Do some stadiums have a higher K/9 rate than others? Etc.
There are some parks with screwy calibrations, but there aren't as many as in 2007, and I do my best to normalize all of these things, so whenever you see or hear that something is awry with the Pitch-f/x data, know that anything you read of mine has some sort of correction, including pitch classifications.
It's good that the systems are better calibrated. However, I would find it interesting to see if home-team pitchers have higher correlations of consistency. Not only would their performance be measured by the same system, but other factors such as mound sculpting, game start time, weather, etc. might have their effects minimized some.
From an executive's standpoint, aren't the consistent pitchers actually cheaper than the inconsistent ones, controlling for overall performance? The inconsistent ones tease people into thinking they can 'fix' them and thus get a star.
They could be, but they could also be the guys who wash out of the league and are forced to take minor league deals with non-roster invitations.
Great article! This is the sort of work that separates BP from the sports analyst pack.

We can all imagine counterpoints to the "consistency is inconsistent" thesis; as you said, Sabathia, Halladay, etc. There are also the Jeff Weavers of the world. But are there more of these outliers than you would expect from a completely random distribution?

What I mean is this: suppose we did a Monte Carlo simulation of every pitcher's performance for ten years by using a Gaussian distribution about some true skill level (with the skill level, but not the shape of the curve, being different for each pitcher). In this situation we'd find a predictable number of outliers in consistency: a few pitchers who are particularly consistent over the ten years, and a few who are particularly inconsistent. This wouldn't be due to any real characteristic on their part; it would just be a statistical necessity.

The number of such randomly consistent and randomly inconsistent pitchers should be calculable. So, I wonder: do we observe more of these than we should, or is Roy Halladay just really lucky to be so consistent?
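
The null model proposed in this comment is straightforward to sketch: give every pitcher a fixed true skill, add identical Gaussian season-to-season noise, and count how many "consistent" outliers chance alone produces. All parameters below are arbitrary choices for illustration:

```python
import random

def null_consistency_sds(n_pitchers=500, n_years=10, noise_sd=0.5, seed=1):
    """Per-pitcher standard deviation of seasonal performance under a pure
    noise model: everyone gets the same noise, so any apparent consistency
    or inconsistency is luck, not a real characteristic."""
    rng = random.Random(seed)
    sds = []
    for _ in range(n_pitchers):
        skill = rng.gauss(0.0, 1.0)                       # true talent level
        seasons = [rng.gauss(skill, noise_sd) for _ in range(n_years)]
        m = sum(seasons) / n_years
        var = sum((s - m) ** 2 for s in seasons) / (n_years - 1)
        sds.append(var ** 0.5)
    return sds

sds = null_consistency_sds()
# Pitchers who look "consistent" (arbitrary 0.3 cutoff) by chance alone
lucky_consistent = sum(sd < 0.3 for sd in sds)
```

Comparing the real distribution of Flake against this simulated one would answer the commenter's question: if the observed tails are no fatter than the null's, Halladay-style consistency may just be luck.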
Eric, superb article. Thank you!

You ask, "Why would more fluctuation in horizontal movement between starts cause a higher BABIP?" I'd hypothesize that if pitchers frequently missed while "painting the corner," one of two things would happen: either the pitch would end up in the heart of the strike zone, increasing LD%, or the pitch would become an obvious ball, forcing the pitcher to work from behind in the count and indirectly increasing LD%. In either case, the increase in LD% would increase the BABIP allowed. If between-game deviation were accidental while in-game deviation was intentional, then increases in between-game deviation would result in more missed corners, which might explain the higher BABIP.

But that's just one possible explanation... frankly, I'm struck that the between-game metrics and in-game metrics have such different correlations to your five performance-based indicators. Yes, they're all lower than 0.35, but I think you're onto something in structuring the data for analysis the way you have.
Interesting observation, JayhawkBill. Perhaps the adage about changing the hitter's eye level also comes into play here: since the pitch, in effect, moves side to side instead of flattening out, it might be easier to hit. This would be similar to a right-handed side-armer throwing to a left-handed batter.

As a bit of a corollary, I wonder if there's a difference in swings and misses based on horizontal movement versus vertical movement. Do batters swing and miss more on pitches that dive up and down, or on those that sweep side to side?
“One of my favorite stats here is Flake”

Thanks, Eric -- I'm glad we talked Michael Wolverton into including it in his "Support Neutral W/L" reports, back in the prehistory of BP.

One of Michael's very early discoveries was that, for almost all pitchers, flaky is more valuable than consistent. The exceptions are the future Hall of Famers (Maddux, Seaver, etc.), where a small variation around their outstanding average performance means that they always have the opponent at a severe disadvantage.

For everyone else -- where 1 standard deviation better than their personal average would rank high in the league, but 1 standard deviation below wouldn\'t drop their rank much -- flakiness means a better chance of pitching a winnable game, at the expense of a few more bad games that you would have lost anyway.
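
The argument in this comment can be made concrete with two normal curves sharing a below-ace mean but different spreads. The threshold and SNVA values below are invented for illustration, not BP data:

```python
from statistics import NormalDist

winnable = 0.10      # per-start SNVA treated as a clearly winnable game (assumed)
mean_snva = -0.05    # a below-ace pitcher's average per-start SNVA (assumed)

consistent = NormalDist(mean_snva, 0.10)   # low Flake
flaky = NormalDist(mean_snva, 0.25)        # high Flake, same average

# Probability of clearing the "winnable game" bar in any given start
p_consistent = 1 - consistent.cdf(winnable)
p_flaky = 1 - flaky.cdf(winnable)
```

With identical averages, the flaky pitcher clears the bar far more often, at the cost of more disasters in games he likely would have lost anyway, which is exactly the trade-off described above.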
BTW, was that 0.03 correlation between Flake and season total SNVA, or between Flake and SNVA per start? If it was the latter, that says that standard deviation is a better measure of flakiness than coefficient of variation. If it was the former, part of the reason for the small correlation might be that SNVA is a cumulative season stat, and so depends a lot on health as well as performance. There's also a selection bias -- flakier pitchers might be less likely to get starts, due to managerial aversion to inconsistency. Since opportunities ARE correlated with performance (for sure) and flakiness (possibly), you probably want to normalize for that when testing for an effect.
Dr. Dave,

The 0.03 was for season total SNVA and Flake, and I definitely agree that there could be some selection bias; however, by restricting the sample each year to 150+ IP, it isn't as if I was including guys with seven starts alongside those with 30+ starts, so it is vastly reduced.