In last week’s column, we discussed the free agent market through a pair of bifocals, with one lens focused on volatile pitchers and the other concentrating on consistent moundsmen. It stood to reason that the more volatile pitchers would be of much greater utility to teams with nothing to lose, while the steady and reliable ones were better suited to adding stability than to putting a team over the top. Throughout the discussion, which personified the aforementioned archetypes with Joel Pineiro and Jon Garland, I operated under a major assumption: that past volatility correlates strongly with wider performance distributions in the future, and that past consistency likewise portends narrower ones. The popular saying that assumptions make asses out of “u” and “mption” (or something like that; I’m not really up on what the kids say these days), used in conjunction with the axiom that past performance is not always a clear-cut indicator of what to expect moving forward, suggests that a prequel needs to be written to better understand what was intuited last week. With that in mind, today’s efforts will be spent investigating what I simply figured to be true: Do volatile pitchers project to be more volatile than those considered consistent over a predetermined span?

For starters, why is this important? Recall our previous conclusion: that the Pineiros of the world, with their wide ranges of potential outcomes, should be sought after by teams like the Mets, since the low end of the distribution would not realistically cause a hindrance, while the high end could conceivably push them into post-season contention. Well, if we find out that Pineiro and his brethren project similarly in the variance department to the Garlandites, that takeaway is essentially rendered moot from an individual standpoint. We would be advocating decisions based on unreliable past information.


The research steps involved here are fairly simple, starting with the culling of every four-year stretch starting between 1974 and 2006. For the sake of transparency, and since I get questions every now and then about why I avoid using 1954-1973 data: 1974 was set as the minimum first year because Retrosheet is missing some games prior to that season, and 2006 was the maximum first year because of the length of the desired spans, as a four-year span starting in 2006 ends last season. Each span recorded the number of games and starts as well as park-adjusted ERA. Sure, an ERA estimator could be used in its place, but this study is more back-of-the-envelope in scope, and the adjustments for park definitely add validity to the numbers. From there, the standard deviation of the adjusted marks over the first three seasons was computed, and the pitcher-spans were classified based on their relationship to the median deviation: the top 50 percent fell into the volatile bin, with the rest categorized as consistent.
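The classification step can be sketched in a few lines of Python. The data format here is hypothetical, but the logic mirrors the description above: take the standard deviation of each pitcher-span's first three park-adjusted ERAs, then split the sample at the median.

```python
import numpy as np

def classify_spans(spans):
    """Label each pitcher-span as volatile or consistent.

    `spans` is a list of dicts whose 'pera' key holds the park-adjusted
    ERAs for the first three years of the span (hypothetical format).
    """
    # Volatility = standard deviation of the first three seasons
    for s in spans:
        s["sd3"] = np.std(s["pera"][:3])
    # Split at the median deviation: top half volatile, bottom half consistent
    cutoff = np.median([s["sd3"] for s in spans])
    for s in spans:
        s["group"] = "volatile" if s["sd3"] > cutoff else "consistent"
    return spans
```

A pitcher whose ERA barely moves over three seasons lands in the consistent bin; one who swings a run or more from year to year lands in the volatile bin.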

As an aside, pitcher-spans does not refer to unique pitchers, as many hurlers appear multiple times in the sample (Roy Halladay 2001-04, 2002-05, 2003-06, you get the point), and further, some may vacillate between classifications. The next step involved computing a projected fourth-year park-adjusted ERA based on a hint of regression, a dash of aging, and a simple, Marcel-esque weighting scheme applied to the first three seasons. With the data assembled, three main questions are raised:

  1. Which group proved easier to project?
  2. How did each group perform in the fourth year, on average?
  3. What were the standard deviations in actual performance in that fourth year?
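The projection step (the "hint of regression, dash of aging" recipe) might look something like the following sketch. The 5/4/3 weights are the familiar Marcel convention, and the regression rate and league-average baseline here are stand-in assumptions; the article does not specify its exact parameters.

```python
def project_year4(pera, league_avg=4.00, regress=0.20, age_adj=0.0):
    """Weighted three-year park-adjusted ERA, regressed toward league
    average, with an optional aging adjustment.

    `pera` holds the park-adjusted ERAs from oldest to most recent
    season. Weights of 3/4/5 (heaviest on the most recent year) follow
    the Marcel convention; the study's actual regression amount and
    aging adjustment are not published, so these defaults are guesses.
    """
    y1, y2, y3 = pera  # oldest to most recent
    weighted = (3 * y1 + 4 * y2 + 5 * y3) / 12.0
    return (1 - regress) * weighted + regress * league_avg + age_adj
```

The regression term pulls extreme three-year marks back toward the pack, which is why a pitcher coming off a career year projects to give some of it back.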

These questions will tell us whether or not volatility yields more volatility in the future than consistency does; in other words, whether Pineiro can actually be expected to have a wider performance distribution in 2010 than Garland, even though they would fall into different bins over the last few seasons. Each of the three questions above could be answered differently based on restrictions in playing time, but the first can be addressed through root mean square error testing, the second through a weighted average, and the third through a general standard deviation function. Enough stalling, I know: get to the results!
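One last bit of housekeeping before the results: the three calculations map onto a few lines of Python, assuming the projected and actual park-adjusted ERAs for a group sit in parallel lists (the function and argument names are hypothetical, and a simple mean stands in for the playing-time-weighted average used in the study).

```python
import numpy as np

def evaluate_group(projected, actual):
    """Return (RMSE, mean actual, std dev of actual) for one group."""
    projected = np.asarray(projected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Question 1: how close did the projections come, on average?
    rmse = np.sqrt(np.mean((actual - projected) ** 2))
    # Questions 2 and 3: average fourth-year mark and its spread
    return rmse, actual.mean(), actual.std()
```

Run once per bin and playing-time cutoff, this produces the columns in the tables below.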


The table below shows the following year 4 data for each group, in different runs based on playing time parameters: Projected park-adjusted ERA, actual park-adjusted ERA, the root mean square error between actual and projected, the standard deviation of actual park-adjusted ERA, and the percentage of those whose actual mark fell two or more standard deviations from the mean. It should be stated, too, that playing time parameters refer to a minimum number of games started in each of the four seasons, so for the group of 20 each member made 20 or more starts in every season.

Type       GS Min    N   Projected Actual   RMSE    St.Dev   2+ SDS
Volatile     20     710    3.93     3.88    0.79     0.88     4.93%
Consistent   20     711    3.73     3.76    0.78     0.87     4.64%

Type       GS Min    N   Projected Actual   RMSE    St.Dev   2+ SDS
Volatile     10    1196    4.08     4.06    0.96     1.04     5.10%
Consistent   10    1186    3.82     3.91    0.98     0.99     4.38%

Type       GS Min    N   Projected Actual   RMSE    St.Dev   2+ SDS
Volatile      5    1560    4.19     4.17    1.18     1.26     4.29%
Consistent    5    1557    3.86     3.98    1.10     1.19     4.50%

A bit of recapping is in order here, but let’s get the obvious out of the way first: no, the two groupings are not perfectly equal in each sample, but they are very close in size, and the results do not rest on this in any event. What makes perfect sense is the inverse relationship between the games-started requirement and sample size: as we become more flexible, more pitchers qualify. Even with the different playing time parameters, we may run into a bit of sampling bias in that each pitcher is required to have appeared in four consecutive seasons. Thus, someone whose volatility is the product of injury and who missed a year would not be included, nor would anyone who flamed out after two years. It might not be prudent to include pitchers like this in such an analysis anyway, but it is certainly worthy of disclosure.

What initially stands out is that the consistent group posted better park-adjusted ERAs from both the projection and actual standpoints. In other words, on average, the consistent pitchers posted better numbers, though both groups worsen as the playing time restrictions loosen up and any old pitcher is permitted to attend the sample party. Moving to the root mean square error, which measures how closely the projections matched actual performance, we can see that there is virtually no difference amongst those making 20 or more starts for four straight seasons. If the restriction is lowered to 10 or more starts, the consistent pitchers are ever so slightly tougher to project, which is neither a typo nor intuitive. Lower the bar even further and, finally, the expected results surface: volatile pitchers were tougher to peg.

Shifting the focus to standard deviations in actual park-adjusted ERA in that fourth season, the numbers jibe with expectations, but the magnitude of the deltas should tickle your interest bones. It makes sense from a sampling standpoint that the 20-plus games started group will experience minimal discrepancies in any category aside from performance, given that only pitchers with a good amount of talent will be called upon that frequently and consistently. However, the deviations are not much different amongst the 10-plus games started crop, and there isn’t anything to write home about with regard to the five-plus games started crop, either. Across a large sample, a delta of 0.05 or 0.07 is certainly meaningful to an extent, but I entered this fray expecting deviation deltas of 0.15 or greater. Maybe I’m just an excitable person, but I cannot imagine being the only one whose expectations were not met.


Synthesizing the results, it becomes clear that the differences between consistent and volatile pitchers in future performance deal primarily with playing time. If we expect Pineiro and Garland each to continue to toe the rubber for 30 starts, then what they did over the last three years in terms of volatility tends not to make a difference, as neither is likely to end up any more consistent or volatile than the other. Sure, Pineiro in this instance might experience a boom-or-bust season, but the data suggest that decisions based on volatility or consistency are ill-founded if the underlying rationale centers predominantly on the recent past. Overall, this information does not alter the idea that volatile players should be attractive to volatile teams, with the analog true for consistency, but it sheds light on the identification process for volatility and consistency.

A team that signs someone like Garland (stay classy, San Diego) should not expect any more consistency than from the lone member of the Pineiro Guild (hey, hey, L,A,A), even though the former has certainly been more consistent in the past than the latter. The idea of clubs deriving more utility from certain players remains intact, but our process for figuring out which pitchers could provide this increased utility is debatable.

Thank you for reading.

I'm skeptical as to how your methods could yield results.

The first sample (min 20 starts) seems useless because of the sample bias that you admit to.

I would suspect that the large majority of all players in the sample are hovering near the median, so it's no surprise in the slightest that the numbers are so close and don't indicate much. Perhaps taking the top and bottom quartile and comparing them would have yielded an interesting result.

It seems like it would make more sense to analyze how likely a pitcher who had a breakout season (Pineiro) is to retain his performance.

Maybe I just didn't see where you might have mentioned it, but the most interesting thing to me about the numbers is that in all cases, the volatile pitchers out-performed the predictions and the consistent pitchers under-performed. A look into why that is would have been a direction worth taking.
There's a pretty obvious cause and effect explanation for why the volatile pitchers out-performed and the consistent pitchers under-performed their projections. If a volatile guy is being volatile in the wrong way, you quit starting him, so he doesn't make your start cutoff. If a steady guy is under-performing, you look at his track record and determine that he'll turn it around, so he gets his starts.

I agree that the 50%-50% cutoff shouldn't have been used. Garland's ERA once dropped almost a run and a half over the course of a season, is he always in the stable group?
These are both interesting thoughts I will work to explore next week, as perhaps the 50/50 is skewing things and the 25 vs 75 quartiles could prove more telling.
Totally agree. Splitting the entire population into two groups is not going to prove anything.