Several weeks ago in this space I took a look at batters that PECOTA has habitually overrated or underappreciated over a period of several seasons. Today I’ll take a look at starting pitchers to see if we can identify those that continually flummox PECOTA by making a mockery of their pre-season forecasts year after year.

When comparing hitters I used Equivalent Average, a metric that has the advantage of being specifically forecast by PECOTA and that is translated to account for contexts such as ballpark and league difficulty. For pitchers, finding such a straightforward comparison between PECOTA and actual performance is a little trickier. For this article I’ve used PECOTA’s projected Equivalent ERA (EqERA) and compared it to the “translated” ERA as shown on a pitcher’s DT Card. Both numbers are adjusted to account for differences in league and ballpark and are calibrated to fit a fictional league with an average ERA of 4.50, so they lend themselves quite well to comparison. On its own, ERA can be a pretty blunt instrument, dependent to some extent on factors beyond a pitcher’s control, but the translated version should be good enough for our purpose here, which is to identify those pitchers that PECOTA has habitually misread.

The charts below are based on the 153 pitchers that pitched at least 100 “translated” innings (per their DT Card, which adjusts usage somewhat) in 2006, and then follow the 2006 PECOTA “misses” to see whether PECOTA improves its accuracy over time. Very few relievers reach the century mark in innings, and even fewer do so in multiple seasons, so this should make our sample almost entirely starting pitchers. I’m using a benchmark of 0.33 runs of EqERA to identify a “missed” projection; there’s no complex statistical reason for that number, other than one-third of a run seemed about right.
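For readers who want to replicate the bookkeeping, the 0.33-run benchmark might be sketched in Python roughly as follows. The function name, the labels, and the treatment of a gap of exactly 0.33 as a "hit" are my own assumptions for illustration, not part of PECOTA or the DT Cards.

```python
THRESHOLD = 0.33  # runs of EqERA; the benchmark chosen in this article

def classify(projected_eqera, actual_eqera, threshold=THRESHOLD):
    """Label a projection relative to the pitcher's actual translated ERA.

    Hypothetical helper: names and labels are illustrative only.
    """
    # Round to two decimals so floating-point noise does not flip a label.
    gap = round(actual_eqera - projected_eqera, 2)
    if gap > threshold:
        return "optimistic"   # pitcher was worse than PECOTA projected
    if gap < -threshold:
        return "pessimistic"  # pitcher was better than PECOTA projected
    return "hit"              # within a third of a run either way

# Example: a 5.37 projection against a 4.85 actual is a pessimistic miss.
label = classify(5.37, 4.85)
```

Lower is better for ERA, so a positive gap (actual above projected) means the forecast was too rosy.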

First up are the players that underperformed their PECOTA projection by that magical 0.33 runs:

Pitchers with 100+ Translated IP During Season: PECOTA Optimism
        Sample                   Sample    PECOTA EqERA
Year  Description                 Size    0.33 runs Low    Pct.
2006  All Players                  153          36         24%
2007  Optimistic in 2006            18           2         11%
2008  Optimistic in 2006-07          1           0          0%

During the 2006 season, only 24 percent of pitchers that reached the 100-inning threshold were more than a third of a run worse than PECOTA’s projection. Note that there is a selection bias at play here: pitchers that underperform their projections are far more likely to lose their spot on a staff, and thus not meet the innings threshold, than those that meet or exceed expectations. Of those 36 pitchers who disappointed in 2006, 18 went on to pitch 100 innings in 2007, with only two able to spend significant time in a major league rotation while continuing to significantly underperform their projection. By 2008, only one two-time disappointment logged the required 100 innings yet again, finally validating PECOTA’s trust by exceeding his forecast.

Disappoint PECOTA twice at your peril; do so three times, and it’s highly unlikely you’ll continue to be entrusted with a major league rotation spot. Byung-Hyun Kim was only able to leverage the belief that he could morph back into his early-career Snake form for two seasons before the wishcasting came to an end. Only Felix Hernandez, the object of PECOTA’s longest-running unrequited bot-crush, was given a third chance to match PECOTA’s great expectations. It’s good to be the King.

So PECOTA almost never overhypes a starting pitcher three times, due to baseball’s natural culling of the pitching herd. What about players that outperform PECOTA’s pessimistic forecasts?

Pitchers with 100+ Translated IP During Season: PECOTA Pessimism
        Sample                   Sample   PECOTA EqERA
Year  Description                 Size    0.33 runs High  Pct.
2006  All Players                  153         82         54%
2007  Pessimistic in 2006           48         26         54%
2008  Pessimistic in 2006-07        19          8         42%

During the 2006 season, fully 54 percent of pitchers that reached the 100-inning threshold were more than a third of a run better than PECOTA’s projection. This may seem high, but again the selection bias is at work here: you usually get to stay in the rotation if you’re pitching well. Of those 82 go-getters, 48 pitchers then went on to toss 100 innings in 2007, with PECOTA again underestimating 54 percent of them. By 2008, only 19 pitchers that had twice been underestimated were able to log 100 innings, and eight of them were dissed by PECOTA a third time. A little over five percent of the pitchers in the initial sample (eight of 153) beat their projections by a fair amount three times in a row. For hitters the number was a little under five percent, which is quite comparable.
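The percentages in the two charts above can be rechecked with a few lines of Python. This is only an arithmetic sketch: the counts are copied from the charts, while the variable and function names are hypothetical.

```python
# (year, sample size, projections missed by more than 0.33) from the charts above
optimism = [(2006, 153, 36), (2007, 18, 2), (2008, 1, 0)]
pessimism = [(2006, 153, 82), (2007, 48, 26), (2008, 19, 8)]

def miss_rates(rows):
    """Percentage of each year's sample that PECOTA missed, rounded to a whole percent."""
    return {year: round(100 * missed / size) for year, size, missed in rows}

def three_year_share(rows):
    """Three-time misses as a percentage of the initial 2006 sample."""
    initial_size = rows[0][1]
    final_missed = rows[-1][2]
    return 100 * final_missed / initial_size

optimism_rates = miss_rates(optimism)
pessimism_rates = miss_rates(pessimism)
repeat_share = three_year_share(pessimism)  # eight of 153
```

Note that each year's rate is conditional on surviving the innings cut, so the 2007 and 2008 rows describe a smaller and more selected group than the 2006 row.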

What is it about these pitchers that habitually gives PECOTA indigestion?

                    2006    2006  |  2007    2007  |  2008    2008
                   Actual  PECOTA | Actual  PECOTA | Actual  PECOTA
Player              EqERA   EqERA |  EqERA   EqERA |  EqERA   EqERA
Gil Meche           4.85    5.37  |  3.67    4.94  |  3.86    4.32
Ted Lilly           4.38    5.10  |  3.65    4.39  |  4.06    4.44
Chad Billingsley    3.81    4.62  |  3.12    4.74  |  3.47    4.13
Matt Cain           3.84    4.78  |  3.41    4.49  |  3.68    4.29
Wandy Rodriguez     5.85    6.87  |  4.38    5.74  |  4.32    4.84
Derek Lowe          3.50    4.59  |  3.94    4.56  |  3.48    4.41
Chris Young         3.57    4.76  |  3.34    4.39  |  3.76    4.25
Chien-Ming Wang     3.68    4.98  |  3.64    4.27  |  3.58    4.47

This is a prime example of what Steven Goldman might call “a congeries of unlike players.” Worm-killers like Lowe and Wang are balanced out by the soft-tossing fly-ball artistry of Young and Lilly. There are youngsters like Cain and Billingsley who seemingly matured ahead of PECOTA’s anticipated timetable for them, and late-bloomers like Rodriguez or a re-bloomer like Meche, whose sudden successes belied a fairly well-established previous pattern of mediocrity. Even diving in from this 30,000-foot view to review a little more detail reveals very little. Many of these pitcher-seasons feature a relatively low BABIP, yet that doesn’t really explain much, as PECOTA often predicted an even lower BABIP rate. No matter how long I stare at the list above, the secret Magic Eye picture never reveals itself. The only unifying fact is this: PECOTA initially projected each player as being subpar (in some cases well below par), then gradually improved the projection over the three years, but never enough to match the player’s actual production.
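To put numbers on that unifying fact, here is a small Python sketch that computes how far each of the eight pitchers beat his projection in each season. The figures are copied from the table above; the names `PITCHERS`, `margins`, and the summary variables are hypothetical, chosen only for this illustration.

```python
THRESHOLD = 0.33  # the article's benchmark, in runs of EqERA

# (actual EqERA, PECOTA EqERA) for 2006, 2007, 2008, copied from the table above
PITCHERS = {
    "Gil Meche":        [(4.85, 5.37), (3.67, 4.94), (3.86, 4.32)],
    "Ted Lilly":        [(4.38, 5.10), (3.65, 4.39), (4.06, 4.44)],
    "Chad Billingsley": [(3.81, 4.62), (3.12, 4.74), (3.47, 4.13)],
    "Matt Cain":        [(3.84, 4.78), (3.41, 4.49), (3.68, 4.29)],
    "Wandy Rodriguez":  [(5.85, 6.87), (4.38, 5.74), (4.32, 4.84)],
    "Derek Lowe":       [(3.50, 4.59), (3.94, 4.56), (3.48, 4.41)],
    "Chris Young":      [(3.57, 4.76), (3.34, 4.39), (3.76, 4.25)],
    "Chien-Ming Wang":  [(3.68, 4.98), (3.64, 4.27), (3.58, 4.47)],
}

def margins(seasons):
    """Runs by which the pitcher beat his projection in each season."""
    return [round(projected - actual, 2) for actual, projected in seasons]

# Every margin should clear the benchmark, since that is how the list was built.
all_beat_threshold = all(
    m > THRESHOLD for seasons in PITCHERS.values() for m in margins(seasons)
)

# The narrowest escape among the 24 pitcher-seasons.
smallest_margin = min(min(margins(seasons)) for seasons in PITCHERS.values())

# Did PECOTA's 2008 projection come in lower than its 2006 projection?
all_projections_improved = all(
    seasons[2][1] < seasons[0][1] for seasons in PITCHERS.values()
)
```

The closest call by this table is Lilly in 2008, who beat his forecast by 0.38 runs, only just clearing the benchmark.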

Will any of these players make PECOTA out to be a four-time loser? Right now, Cain (Projected 4.14/Actual 3.19) continues to be an icon of misunderstood youth, while PECOTA has even less faith in the continuing effectiveness of Rodriguez (Projected 4.57/Actual 3.65). No one else seems likely to greatly exceed their projections.

Traditionally, pitcher performance is considered to be more variable and harder to predict than batting production. While PECOTA may seem to have similar counts of hits and misses for both pitchers and hitters over time using the criteria spelled out in these two articles, that point isn’t proven; the “0.33 points of ERA / 10 points of EqA” benchmarks used aren’t necessarily equivalent margins of error. Lists of PECOTA’s recurring misses are somewhat like lava lamps: interesting to look at, but only marginally illuminating. Further research is needed to throw more light on the types of players that are more likely to be badly misread, and in which direction. But if, like me, you once took Dan Meyer in an early round of your sim league draft, perhaps you can find comfort in the thought that even PECOTA can sometimes be very, very wrong.

Could it be something to do with HR rates? Extreme FB and GB pitchers might both have HR rates that PECOTA would expect to regress more than they actually would. Extreme FB guys are better than one would expect at preventing HR (see Cliff Lee). Also, Wandy's a guy who has always shown great peripherals aside from his struggles with gopheritis.
Are these pitchers better than expected with runners on base?
Are you going to open every article with "Several weeks ago" even when it's not true (as it isn't today)? Will Christina let you get away with that?
Point taken, Evan. If I do any more follow-ups to earlier articles I'll try to use different verbiage to refer back to them. Perhaps a "referential text randomizer" would do the trick, containing a menu of other constructs like "It was the best of posts, it was the worst of posts ..." or "A long time ago, in an article far, far away ...." ;) But since it's been 13 days since the first article, personally I think "several weeks ago" is reasonably accurate (albeit one day short), though admittedly not particularly inspired.
I like "It was the best of posts, it was the worst of posts." Please start your articles with this as often as possible.
Statistical aggregations pretty much have to live with the outliers. Individually they can only be explained case by case, and taken as a whole the existence of outliers is almost expected. Look at the Netflix contest where a small number of “controversial” movies cause most of the headaches for rating algorithms that rely on statistical trends. This is not to say that there are no general explanations for the outliers on any level, but that they are expected, or require more information than is accountable in the system. I guess a general theory might be that there are areas of the game that are not large enough to affect the overall predictive picture, but do affect a small number of players disproportionally. For instance, variations in pitch repertoire like that of Wang may be analysed by a formula that is produced by fitting to the body of pitchers that do not pitch like Wang. So the predictive ERA formula produced like this probably won't work for Wang. Yet, there is nothing one can do on the macro level. No tinkering with the formulas save going into more detail and building things from a more fundamental level can help with the outlier cases. If PECOTA is a tabletop, it may appear smooth to the eye, but take a scanning microscope to it, and one will see bumps and holes. However, there is really no way to observe the small bumps without zooming in, and thereby losing the original units of measurement, and even the original laws of physics. To complete the analogy, the naked eye’s view is the broad statistical trends, while case-specific accounts do take into account the information that is introduced by zooming in with the microscope. By this analogy, there will not likely be a statistical solution to the problem of individual outliers, because the sources of explanation for them likely won't be statistical but mechanical/scouty (unless we invent a new set of base statistics that can take into account the mechanical/scouty points; this is, however, a dismal prospect).
Well said. Just looking at (and doing very cursory analysis of) the pitchers on this list certainly points to them being random outliers. If there's a discernable pattern to the players that are continually over- or under- projected (i.e., many of them exhibit this certain characteristic), AND very few other players that PECOTA didn't miss on exhibit the same characteristic, that might point to a correctable flaw -- but that's not what this cursory analysis has shown, or even hinted at.
I think Nate was aware that there were some types of players that PECOTA seemed ill-fit for. Or maybe certain individual players. Reading Ken's piece here brought to mind Nate's article from "several years ago": "The Unique Ichiro":
Key missing stat from this analysis: How many pitchers would we expect PECOTA to underestimate 3 times running just by chance, given the standard error of the projections? I have a hunch the answer is very close to 8...
What these articles really demonstrate, it seems to me, is the futility of PECOTA. I appreciate your willingness to expose the system's warts, but when it can't predict 75% of hitters within 10 points either way of EQA and 75% of pitchers within .33 of EqERA, the system is somewhere in the vicinity of pointless. That PECOTA is the best of the projection systems tells us that, even with all the great research being done, computers still can't predict player performance any better than can a reasonably sentient fan. With all due respect, BP authors ought to keep that in mind when the urge arises to treat PECOTA projections as inevitable.
If you can predict companies' share prices at 27 bucks and they end up at 30 bucks a share, you'll be a rich man. You won't be getting every last drop of your possible profit, but you will bury your competitors. And if you can do that and get it right 99.2 percent of the time over a three year period, well... Let's just say that I and my PECOTA-drafted fantasy teams doubt your "any sentient fan" imaginary construct. I don't see that "fan" putting his work out in April year after year to be judged.
No "sentient fan" will ever sit down and evaluate 800 players the way PECOTA does. PECOTA and its forecasting brethren do not hold themselves out as infallible oracles; and only an idiot would use them as such. They're tools, a baseline to assist a "sentient fan" in making his own judgments about what a player's performance will be. Those tools are only as good as the person using them - if the predictive house you build comes crashing down, don't blame PECOTA.
Both these responses miss the point: PECOTA tells us pretty close to nothing that we don't already know. A five percent improvement on my best guess about a player is hardly worth the time. I don't need PECOTA's help getting 75% of performance projections wrong. Moreover, BP authors often rely on PECOTA to evaluate signings, trades, etc., as if PECOTA is destiny. It's still a pretty weak predictor. None of this should be construed as a repudiation of the work done at BP. One can be a fan without being blind to the flaws. A little humility when it comes to predicting performance is indicated here.
Hmm.. a 5% increase in average for a .286 hitter turns him into a .300 hitter. A 5% improvement in WHIP turns a pitcher with a 1.40 WHIP into a 1.33 WHIP. Fantasy baseball leagues are won on such margins, or on the chances of those margins panning out. Not to mention, from a player perspective, someone who hits .300 will get paid more than someone who hits .286. Besides, a person's "best guess" requires some knowledge with which to make a guess. PECOTA, along with other predictive tools, are sources of information that can help make your "best guess" better.
How about instead of ERA compare Actual runs allowed vs projected?
Haha, I took Dan Meyer as a keeper in my fantasy league. Worse, I chose him over Dan Haren. *sigh*
I got hooked on Jose Jimenez for a draft or two...