January 27, 2010
Checking the Numbers: Preceding the Value of Volatility
In last week's column, we discussed the free agent market through a pair of bifocals, with one lens focused on volatile pitchers and the other on consistent moundsmen. It stood to reason that the more volatile pitchers would be of much greater utility to teams with nothing to lose, while the steady and reliable ones were better suited to adding stability than to putting a team over the top. Throughout the discussion, which personified those archetypes with Joel Pineiro and Jon Garland, I operated under a major assumption: past volatility correlates strongly with wider performance distributions in the future, so volatile pitchers will remain more volatile than card-carrying members of the consistency club.

The popular saying that assumptions can make asses out of "u" and "mption" (or something like that; I'm not really up on what the kids say these days), taken together with the axiom that past performance is not always a clear-cut indicator of what to expect moving forward, suggests that a prequel needs to be written to better understand what was intuited last week. With that in mind, today's efforts will be spent investigating what I simply figured to be true: Do volatile pitchers project to be more volatile than those considered consistent in a predetermined span?

For starters, why is this important? Recall our previous conclusion, that the Pineiros of the world should be sought after by teams like the Mets, with a wide range of potential outcomes, since the low end of his distribution would not realistically cause a hindrance, while the high end could conceivably push them into postseason contention. Well, if we find out that Pineiro and his brethren project similarly in the variance department to the Garlandites, that takeaway is essentially rendered moot at the individual level. We would be advocating decisions based on unreliable past information.
Methodology

The research steps involved here are fairly simple, starting with the culling together of every four-year stretch starting between 1974 and 2006. For the sake of transparency, and since I get questions every now and then about why I avoid using 1954-1973 data, 1974 was set as the minimum first year since Retrosheet is missing some games prior to that season; 2006 was the maximum first year because a four-year span starting in 2006 ends last season. Each span included the number of games and starts as well as park-adjusted ERA. Sure, an estimator could be used in its place, but this is much more back-of-the-envelope in scope, and the adjustments for park definitely add more validity to the numbers.

From there, the standard deviation of the adjusted metrics in the first three seasons was computed and the pitcher-spans were classified based on their relationship to the median deviation: the top 50 percent fell into the volatile bin, with the rest categorized as consistent. As an aside, pitcher-spans does not refer to unique pitchers, as many hurlers will appear multiple times in the sample (Roy Halladay 2001-04, 2002-05, 2003-06, you get the point) and, further, some may vacillate between classifications. The next step involved the computation of the fourth-year park-adjusted ERA based on a hint of regression, a dash of aging and a simple, Marcel-esque weighting scheme of the first three seasons. With the data assembled, three main questions are raised:
These questions will inform us whether or not volatility yields more volatility in the future than consistency; in other words, whether or not Pineiro can actually be expected to have a wider performance distribution in 2010 than Garland, even though they would fall into different bins over the last few seasons. Each of the three questions above could be answered differently based on restrictions in playing time, but the first point can be discussed through root mean square error testing, with the second calculated through a weighted average and the third through a general standard deviation function. Enough stalling, I know, get to the results!

Results

The table below shows the following year-four data for each group, in different runs based on playing time parameters: projected park-adjusted ERA, actual park-adjusted ERA, the root mean square error between actual and projected, the standard deviation of actual park-adjusted ERA, and the percentage of those whose actual mark fell two or more standard deviations from the mean. It should be stated, too, that playing time parameters refer to a minimum number of games started in each of the four seasons, so for the group of 20, each member made 20 or more starts in every season.
Type        GS Min  N     Projected  Actual  RMSE  St. Dev.  2+ SDs
Volatile    20      710   3.93       3.88    0.79  0.88      4.93%
Consistent  20      711   3.73       3.76    0.78  0.87      4.64%
Volatile    10      1196  4.08       4.06    0.96  1.04      5.10%
Consistent  10      1186  3.82       3.91    0.98  0.99      4.38%
Volatile    5       1560  4.19       4.17    1.18  1.26      4.29%
Consistent  5       1557  3.86       3.98    1.10  1.19      4.50%
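For readers who want to replicate the binning and the summary columns above, here is a minimal Python sketch. It is an illustration under stated assumptions rather than the code behind the table: the field names (`first_three`, `projected`, `actual`), the four made-up pitcher-spans, the use of the sample standard deviation, and the choice to send ties at the median to the consistent bin are all hypothetical.

```python
import math
import statistics


def classify_spans(spans):
    """Label each pitcher-span by a median split on three-year ERA volatility."""
    # Volatility = sample standard deviation of the first three seasons
    # (assumption: the article does not say sample vs. population).
    vols = [statistics.stdev(s["first_three"]) for s in spans]
    cutoff = statistics.median(vols)
    # Assumption: strictly above the median is volatile; ties go to consistent.
    return ["volatile" if v > cutoff else "consistent" for v in vols]


def summarize(projected, actual):
    """Return RMSE, SD of actual ERA, and the share 2+ SDs from the mean."""
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for p, a in zip(projected, actual)) / n)
    mean = statistics.mean(actual)
    sd = statistics.stdev(actual)
    far = sum(1 for a in actual if abs(a - mean) >= 2 * sd)
    return {"rmse": rmse, "sd": sd, "share_2sd": far / n}


# Hypothetical pitcher-spans: three park-adjusted ERAs, then a year-four
# projection and the actual year-four mark. None of this is real data.
spans = [
    {"first_three": [3.50, 3.60, 3.55], "projected": 3.60, "actual": 3.70},
    {"first_three": [3.00, 4.50, 3.80], "projected": 3.90, "actual": 3.40},
    {"first_three": [4.10, 4.00, 4.20], "projected": 4.15, "actual": 4.30},
    {"first_three": [2.90, 4.80, 3.60], "projected": 3.85, "actual": 4.60},
]

labels = classify_spans(spans)

# Summarize year four separately for each bin.
for group in ("volatile", "consistent"):
    members = [s for s, lab in zip(spans, labels) if lab == group]
    stats = summarize(
        [s["projected"] for s in members],
        [s["actual"] for s in members],
    )
    print(group, len(members), stats)
```

A playing time filter along the lines described earlier would simply drop any span with fewer than the minimum number of starts in any of its four seasons before `classify_spans` is called.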
A bit of recapping is in order here, but let's get the obvious out of the way first: no, the two groups in each sample are not exactly equal in size, but they are very close and the results do not rest on this in any event. What makes perfect sense is the inverse relationship between the games started requirements and sample size. As we become more flexible, more pitchers qualify. Even with the different playing time parameters, we may run into a bit of sampling bias here in that each group is required to have appeared in four consecutive seasons. Thus, someone whose volatility is the product of injury and who missed a year would not be included, nor would anyone who flamed out after two years. It might not be prudent to include pitchers like this in such an analysis anyway, but it is certainly worthy of disclosure.

What initially stands out is that the consistent group posted better park-adjusted ERAs from both the projection and actual standpoints. In other words, on average, the consistent pitchers posted better numbers, though both groups worsen as the playing time restrictions loosen up and any old pitcher is permitted to attend the sample party.

Moving to the root mean square error, which measures how far actual values strayed from projected ones, we can see that there is virtually no difference amongst those making 20 or more starts for four straight seasons (0.79 versus 0.78). If the restriction is lowered to 10 or more starts, the consistent pitchers are ever so slightly tougher to project (0.98 versus 0.96), which is neither a typo nor intuitive. Lower the bar even further and, finally, the expected results surface in that volatile pitchers were tougher to peg (1.18 versus 1.10).

Shifting the focus to standard deviations in actual park-adjusted ERAs in that fourth season, the numbers jibe with expectations but the magnitude of the deltas should tickle your interest bones.
It makes sense from a sampling standpoint that the 20-plus games started group will experience minimal discrepancies in any category aside from performance, given that only pitchers with a good amount of talent will be called upon that frequently and consistently. However, the deviations are not much different amongst the 10-plus games started crop, and there isn't anything really to write home about with regard to the five-plus games started crop. Across a larger sample, a delta of 0.05 (10-plus starts) or 0.07 (five-plus starts) is certainly meaningful to an extent, but I entered this fray expecting to see deviation deltas along the lines of 0.15 or greater. Maybe I'm just an excitable person, but I cannot imagine being the only one whose expectations were not met.

Conclusion

Synthesizing the results, it is becoming clear that the differences between consistent and volatile pitchers in future performance deal primarily with playing time. If we expect Pineiro and Garland to each continue to toe the rubber for 30 starts, then it tends not to make a difference what they did over the last three years in terms of volatility, as neither is likely to end up any more consistent or volatile than the other. Sure, Pineiro in this instance might experience a boom-or-bust season, but the data suggests that decisions based on volatility or consistency are ill-defined if the underlying rationale centers predominantly on the recent past.

Overall, this information does not alter the idea that volatile players should be attractive to volatile teams, with the analog true for consistency, but it sheds light on the identification process of volatility and consistency. A team that signs someone like Garland (stay classy, San Diego) should not expect any more consistency than from the lone member of the Pineiro Guild (hey, hey, L.A.A.), even though the former has certainly been more consistent in the past than the latter.
The idea of clubs deriving more utility from certain players remains intact, but our process of figuring out which pitchers could provide this increased utility is debatable.
Eric Seidman is an author of Baseball Prospectus. 4 comments have been left for this article.

I'm skeptical as to how your methods could yield results.
The first sample (min 20 starts) seems useless because of the sample bias that you admit to.
I would suspect that the large majority of all players in the sample are hovering near the median, so it's no surprise in the slightest that the numbers are so close and don't indicate much. Perhaps taking the top and bottom quartile and comparing them would have yielded an interesting result.
It seems like it would make more sense to analyze how likely a pitcher who had a breakout season (Pineiro) is to retain his performance.
Maybe I just didn't see where you might have mentioned it, but the most interesting thing to me about the numbers is that in all cases, the volatile pitchers outperformed the predictions and the consistent pitchers underperformed. A look into why that is would have been a direction worth taking.
There's a pretty obvious cause and effect explanation for why the volatile pitchers outperformed and the consistent pitchers underperformed their projections. If a volatile guy is being volatile in the wrong way, you quit starting him, so he doesn't make your start cutoff. If a steady guy is underperforming, you look at his track record and determine that he'll turn it around, so he gets his starts.
I agree that the 50%-50% cutoff shouldn't have been used. Garland's ERA once dropped almost a run and a half over the course of a season; is he always in the stable group?
These are both interesting thoughts I will work to explore next week, as perhaps the 50/50 is skewing things and the 25 vs 75 quartiles could prove more telling.
Totally agree. Splitting the entire population into two groups is not going to prove anything.