In my preview of the NLDS between the Brewers and Diamondbacks last week, reader mopup1 posted a comment relating to something I’ve been meaning to look into for a while now.

I'm sure Derek and anyone in a week-week fantasy pool knows this but Gallardo's prone to S*it the bed at times. 6 times this season he's allowed 5 ER or more. His inconsistency is a big issue IMO in a playoff scenario.

Fans and analysts often talk about how a particular pitcher is inconsistent or how another is prone to a dominating or disastrous outing, but I’ve never seen a study looking at whether this sort of consistency (or inconsistency) is a repeatable skill. Surely it happens—as mopup1 said, Gallardo took several beatings this year—but is it something that the pitcher has any control over, or is it merely random luck? You see, for forward-looking purposes (like fantasy baseball), it only really matters if it’s something the pitcher can control. Otherwise, it’s nice to look back on and mention, but it wouldn’t tell us anything about whether the pitcher will be similarly consistent (or inconsistent) in the future.

To study whether start-to-start consistency is a repeatable skill for a pitcher, I’m going to look at several different measures of a pitcher’s performance in a given start:

ERA: Earned Run Average
FIP: Fielding Independent Pitching
Quality Starts (QS): A pitcher is awarded a quality start if he pitches at least six innings and gives up no more than three earned runs
Game Score (BJ GS): A stat invented by Bill James, it includes innings pitched, strikeouts, walks, hits, earned runs, and unearned runs.
Pure Quality Starts (PQS): A stat invented by Baseball HQ, it accounts for innings pitched, strikeouts, K/BB, hits, and home runs
True Quality Starts (TQS): A stat I invented at THT, it’s my stat of choice to measure individual starts since it uses run values (as opposed to the arbitrary weighting of Game Score and PQS) and ignores earned runs, hits, and homers. Instead, it uses strikeouts, walks, and line-drive neutralized ground balls, fly balls, and popups.
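For readers who want to see how a couple of these per-start measures are computed, here is a minimal sketch of the two simplest ones: the quality start rule described above and Bill James's original Game Score formula. The `StartLine` container and its field names are hypothetical, made up for illustration, and the sample pitching line is invented rather than taken from a real game. PQS and TQS are not shown because their exact scoring rules aren't spelled out in this article.

```python
from dataclasses import dataclass


@dataclass
class StartLine:
    """One start's box-score line (hypothetical container for illustration)."""
    outs: int            # outs recorded, i.e. innings pitched * 3
    hits: int
    earned_runs: int
    unearned_runs: int
    walks: int
    strikeouts: int


def is_quality_start(s: StartLine) -> bool:
    # At least six innings (18 outs) and no more than three earned runs.
    return s.outs >= 18 and s.earned_runs <= 3


def game_score(s: StartLine) -> int:
    # Bill James's original Game Score: start at 50, add a point per out,
    # two points per inning completed after the fourth, and a point per
    # strikeout; subtract for hits, earned runs, unearned runs, and walks.
    innings_completed_after_fourth = max(0, s.outs // 3 - 4)
    return (
        50
        + s.outs
        + 2 * innings_completed_after_fourth
        + s.strikeouts
        - 2 * s.hits
        - 4 * s.earned_runs
        - 2 * s.unearned_runs
        - s.walks
    )


# Invented example line: 7 IP, 5 H, 2 ER, 0 unearned, 2 BB, 8 K.
sample = StartLine(outs=21, hits=5, earned_runs=2,
                   unearned_runs=0, walks=2, strikeouts=8)
print(is_quality_start(sample), game_score(sample))
```

Note that Game Score rewards and penalizes the same events ERA cares about (hits and runs), which is exactly what TQS is designed to avoid.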

The Study
To measure how consistent a pitcher is in terms of each statistic, I’m going to use a statistical tool known as a standard deviation. A standard deviation tells us how closely a group of numbers is clustered together. So if, say, a pitcher’s TQS scores are clustered very closely together, he’ll have a low standard deviation—in other words, he’s been consistent. If his TQS scores are far apart, however, he’ll have a high standard deviation—in other words, he’s been inconsistent.

From here, I’m going to look at all pitchers who started at least 20 games in back-to-back seasons since 2002 and compare their consistency (as measured by standard deviation) between the two years. To do so, I’m going to run a correlation, which measures the relationship between two sets of numbers. The closer to zero, the weaker the relationship; the closer to one, the stronger the relationship.
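The procedure just described can be sketched in a few lines of Python. This is an illustration of the general approach, not the actual code behind the study: the `scores` layout (a dictionary keyed by pitcher and season), the function names, and the use of the sample standard deviation and a Pearson correlation are all assumptions on my part.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def consistency_correlation(scores, min_starts=20):
    """Correlate start-to-start spread in one season with the next season's.

    scores maps (pitcher, season) -> list of per-start values for one metric
    (ERA, FIP, Game Score, ...). This layout is hypothetical.
    """
    # Step 1: one standard deviation per qualifying pitcher-season.
    spread = {
        key: stdev(values)
        for key, values in scores.items()
        if len(values) >= min_starts
    }
    # Step 2: pair each season with the same pitcher's following season.
    pairs = [
        (sd, spread[(pitcher, season + 1)])
        for (pitcher, season), sd in spread.items()
        if (pitcher, season + 1) in spread
    ]
    # Step 3: correlate year-one spread against year-two spread.
    year_one, year_two = zip(*pairs)
    return pearson(year_one, year_two)
```

Running `consistency_correlation` once per metric would produce one correlation for each stat in the list above. A result near zero for a given metric means last year's spread says almost nothing about this year's.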

The Results

While a pitcher’s consistency in terms of my own True Quality Starts holds the strongest year-to-year relationship, it’s still a very weak relationship overall (0.07 is much closer to zero than it is to one; squared, it means one year’s consistency explains only about half a percent of the variation in the next year’s). What this tells us is that, no matter how we want to measure a pitcher’s success on a start-by-start basis, a pitcher’s consistency from start to start is largely unpredictable. Even if he was extremely consistent one year, he could just as easily be inconsistent the next year.

Application to Fantasy Leagues
The concept of consistency is of particular importance to owners in head-to-head leagues, who may be drafting players with the expectation that they’ll be consistent.* While this study isn’t definitive, it does seem like it would be ill-advised to count on a pitcher’s consistency. In fact, head-to-head leaguers are actually going to be doing themselves a disservice if they pass over a superior pitcher in favor of one that has a reputation for consistency. Thinking a pitcher is consistent might make us feel safer, but the point of a fantasy league is to win, and ignoring consistency to go with the better pitcher is going to give us a better chance of doing that.

 *It’s interesting to note that, while consistency is often the trait people value in a pitcher, an inconsistent pitcher is actually going to provide more value. Who is going to win more games, Pitcher A who allows 8, 0, 8, 0, 8, and 0 runs or Pitcher B who allows 4, 4, 4, 4, 4, and 4 runs? While both average the same number of runs allowed and Pitcher B is more consistent, Pitcher A is going to win more games because he’s almost assured of a win in his 0-run starts (Pitcher A would win something like 2.3 games to Pitcher B’s 1.1 games). Of course, this is a moot point since the study shows inconsistency to be largely random.  

Honestly, the point reached by this article is useful (to some degree). But I have two problems with this:

1) Clicking on the link to TQS takes us to a non-BP article explaining the process. I understand that Derek created the metric before writing for BP. However, the fact that BP hasn't incorporated said metric (despite all the factors that it accounts for) is a bit of an issue. Anyone who writes a BP article where they include a bunch of stats, including their own non-BP verified metric, and then magically has their own metric come out on top is a problem for me.

2) Let's say that nothing in comment #1 is a problem, and that statistical research completely and thoroughly verifies every point that Derek makes. Even if that's the case, I'm disappointed by this article. What sold me on Baseball Prospectus was the quality of the writing that accompanied the statistical analysis. Rany Jazayerli, Nate Silver, Joe Sheehan, Christina Kahrl - this is what brought me here. These people (along with current BP-er Jay Jaffe and others) knew how to combine the narrative of an event with statistical analysis, and I rarely left their articles without the information I needed to evaluate their claims. Derek's point is interesting, potentially valid, and worth considering. However, it's shallow, poorly written, has a grand total of ONE supporting data set, and it's flat-out uninteresting in its current format. The final of these points is the most damning for me. BP might be a lot of things, but it should never be boring.

For what it's worth, when I was running the numbers for my Fangraphs piece in January, I tried looking at pitchers using a few different stats (xFIP was the most defense-independent) and found the same null result as you. I thought I'd find something, at least with relievers, but, well... nope. Nice work though, and nice point about the wins, too.

-Seth Samuels

I think consistency is important when judging pitchers but not on a start by start basis but on a seasonal basis.

I always like to balance risk and reward when selecting a fantasy team and I think guys like Haren, CC etc have value. Similarly guys like Ted Lilly, Hiroki Kuroda etc outperform the upside guys but consistently go below where they are drafted.

Has anyone looked at this on a seasonal basis? Are some pitchers more consistent than others over a number of seasons?

I'm not a statistician, but I came across a PhD dissertation written perhaps 15 years ago whose conclusion was that pitcher performances, measured annually, varied by 40% in the NL and more than 50% in the AL. In Rotisserie leagues, the 65/35 split in dollars invested in hitting vs. pitching, which has held consistently for decades regardless of depth of penetration of the player pool, is mostly a function of this inherent unpredictability (which no doubt is also contaminated by the fact that pitchers are injured roughly twice as often as hitters). So I find these results, with the highest correlation at just 0.07, unsurprising.