Several weeks ago, a reader asked me if I kept tabs on the results of the recommendations made in the weekly starting pitcher planner. My response went something like, “yeah, that’d be great, but where’s the time gonna come from?”

With no need for a planner as we head into a short week, I thought it might be a good idea to use this space to slap some data together and review my work. We all know there’s a wide range of outcomes for any single contest, and no matter how solid the analysis underlying a recommendation is, it’s easy to hand-wave the results when they don’t line up with expectations. So-and-so didn’t have his best stuff that night, so-and-so was facing an inexplicably hot player, so-and-so was the victim of bad calls, or seeing-eye singles, or poor defense.

It’s important to take some accountability over a larger sample, though. You need to be able to trust me as an analyst, and I need to be able to trust my process, or make improvements where there might be holes. With that in mind, I looked back at the game logs for each of the recommendations I made over the past month, covering the weeks from June 6 to July 3. I’d love to go back further, but again, the time. I think this sample serves our purpose well enough.

I threw out about 40 games where the starter was scratched because of injury, demotions and promotions, rotation shuffles, and rainouts. I did include singleton starts that were supposed to be part of a two-start week before changing for any of those reasons. Here are the results for the 276 starts that made it into the sample:


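[Table: results by recommendation group. Only the "# Starts" column header survives; the rest of the table is not recoverable.]
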
Some observations:

There is obvious separation on the extremes. For the most part, the Auto-Starts considerably outperform the other groups and the overall numbers. That’s in spite of the fact that the sample contains a pair of Aaron Nola starts where he gave up 12 earned runs over 6 1/3 innings, and worse, Jon Lester’s eight-run blowup against the Mets last weekend. On the other end of the spectrum, the Sit recommendations have predictably been horrendous. I doubt you’re surprised by this if you’re an avid baseball watcher. For all the appointment-television, top-of-the-rotation arms – and there are more than I can remember in recent history – there are at least that many unwatchable bottom feeders. I’m not here to comment on the reasons behind that dynamic, but anecdotally speaking, it’s pretty clear that there is a gulf between the elite pitchers and the poor ones, and it seems as if that space is widening.

Don’t chase wins, girls and boys. Even though the Auto-Start options have clearly been the best group at run prevention, limiting baserunners, and strikeouts, they win less often than the Consider group. Similarly, the Sit options win more often than my Start recommendations.

Maybe that’s because my Start recommendations have been, well, not very good based on the ratios. Let’s have a look at some of the biggest perpetrators:

Wei-Yin Chen, June 13-19 (@SD, COL): 8.1 IP, 10.80 ERA, 2.04 WHIP

I could have been more conservative here, as Chen gave up three bombs the preceding week, before yielding four in the first of this pair. At the time, I didn’t read it as much more than normalization for a pitcher who has always had trouble keeping the ball in the park. I still interpret it that way. Chen’s home runs against have always been lumpy.

J.A. Happ, June 6-12 (@DET, BAL): 12 IP, 7.50 ERA, 1.33 WHIP

Okay, yeah. This was dumb even though 10 of his 11 starts to this point had been of the quality variety. The Tigers and Orioles hit five homers in these two games, as they are wont to do.

Joe Ross, June 6-12 and June 27-July 3 (@CHW, PHI, NYM, CIN): 22.1 IP, 6.45 ERA, 1.52 WHIP

This is a relatively easy group of opponents, especially the three at home, where Ross had been substantially better in his short career. Ross was placed on the disabled list on July 3rd with shoulder inflammation. Until his outing on July 2nd, there wasn’t much in the velocity that hinted at an injury, though perhaps he was hiding something that didn’t manifest itself in the data.

Adam Conley, June 6-12 and June 27-July 3 (@MIN, @ARI, @DET, @ATL): 22 IP, 5.32 ERA, 1.41 WHIP

I have relentlessly recommended Conley as a Start option this year, even though stretches like this one show that you have to accept above-average volatility if you’re going to run him out there. To some extent, the sample I chose simply caught the worst of Conley without capturing much of the good version, but there is a valid argument that I should be more publicly cautious with some of the high-variance options I have a personal affinity for.
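For anyone checking the arithmetic on the four lines above, here is a minimal sketch of how the ratios relate to the innings totals. It assumes the usual box-score convention that "8.1" means 8 1/3 innings, and it back-calculates earned runs from the listed ERA, since the source lines give only IP, ERA, and WHIP. The function names are my own, purely for illustration.

```python
def ip_to_outs(ip: str) -> int:
    """Convert box-score innings ('8.1' = 8 1/3 innings) to outs recorded."""
    whole, _, frac = ip.partition(".")
    thirds = int(frac) if frac else 0
    if thirds not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) * 3 + thirds


def era(earned_runs: int, outs: int) -> float:
    """Earned runs allowed per nine innings (27 outs)."""
    return earned_runs * 27 / outs


def implied_earned_runs(listed_era: float, outs: int) -> int:
    """Back out the whole number of earned runs implied by a rounded ERA."""
    return round(listed_era * outs / 27)


# (IP, ERA) exactly as listed in the article; earned runs are inferred, not sourced.
start_misses = {
    "Chen": ("8.1", 10.80),
    "Happ": ("12", 7.50),
    "Ross": ("22.1", 6.45),
    "Conley": ("22", 5.32),
}

for name, (ip, listed_era) in start_misses.items():
    outs = ip_to_outs(ip)
    er = implied_earned_runs(listed_era, outs)
    print(f"{name}: {er} ER over {outs} outs, ERA {era(er, outs):.2f}")
```

The same helpers apply to the Nola line mentioned earlier: 12 earned runs over 6 1/3 innings is `era(12, ip_to_outs("6.1"))`, or roughly 17.05.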

These aren’t the only four examples of picks gone wrong, just the four most egregious. I’m not sure I see a common thread. How about some examples of Consider recommendations that outperformed the designation:

Anthony DeSclafani, June 20-26 (@TEX, SD): 15 IP, 1.20 ERA, 0.74 WHIP, 11 Ks

Danny Duffy, June 6-12 and June 27-July 3 (@BAL, @CHW, STL, @PHI): 29 IP, 1.86 ERA, 0.83 WHIP, 35 Ks

Matt Shoemaker, June 6-12 (@NYY, CLE): 15.2 IP, 2.30 ERA, 0.76 WHIP, 17 Ks

Michael Fulmer, June 6-12 (TOR, @NYY): 12 IP, 0.00 ERA, 0.83 WHIP, 6 Ks

Trevor Bauer, June 6-12 (@SEA, @LAA): 15.2 IP, 1.72 ERA, 1.02 WHIP, 13 Ks
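If you want a single ERA for a group like the five lines above, the safer route is to pool earned runs and outs rather than average the individual ERAs, because the pitchers did not throw the same number of innings. The sketch below illustrates the difference. It assumes "15.2" means 15 2/3 innings and infers earned runs from the rounded ERAs listed above, so treat the totals as approximations rather than official figures.

```python
def ip_to_outs(ip: str) -> int:
    """Convert box-score innings ('15.2' = 15 2/3 innings) to outs recorded."""
    whole, _, frac = ip.partition(".")
    return int(whole) * 3 + (int(frac) if frac else 0)


# (name, IP, ERA) as listed in the article for the Consider overperformers.
consider = [
    ("DeSclafani", "15", 1.20),
    ("Duffy", "29", 1.86),
    ("Shoemaker", "15.2", 2.30),
    ("Fulmer", "12", 0.00),
    ("Bauer", "15.2", 1.72),
]

total_outs = 0
total_er = 0
for _, ip, listed_era in consider:
    outs = ip_to_outs(ip)
    total_outs += outs
    # Earned runs are inferred from the rounded ERA, not taken from game logs.
    total_er += round(listed_era * outs / 27)

pooled_era = total_er * 27 / total_outs  # weights each pitcher by innings thrown
mean_of_eras = sum(e for _, _, e in consider) / len(consider)  # ignores workload

print(f"pooled ERA: {pooled_era:.2f}, simple mean of ERAs: {mean_of_eras:.2f}")
```

The two numbers differ because Duffy's 29 innings count for more in the pooled figure than Fulmer's 12 scoreless ones.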

I assume if you’re reading Baseball Prospectus, you enjoy, or at least value, letting data tell you a story. As such, one of the hardest things to do as a fantasy player is make a call when there isn’t a preponderance of evidence to support either position. Often to a fault, I tend towards conservatism in that situation.

If there’s anything this whole exercise has taught me, it’s that I should be more aggressively recommending pitchers even when I don’t have a strong data-driven case to substantiate the recommendation. In the five guys listed above, you have a return from injury, a conversion to the rotation, a new pitch, a top prospect getting his career underway, and whatever is going on with Bauer. There were legitimate reasons to be more bullish than I was for each. Shoemaker, for example, was four games into his revival. Fulmer had given up one run in his past 22 1/3 innings. Duffy had shown an ability to miss bats even as he was still stretching out. In retrospect, what the hell was I waiting for?

All in all, I’m happy with my recommendations over the past month, with one exception. Those Starts sure would look better if I’d stepped out on a few solid boughs that I mistook for dainty sprigs. Lesson learned. I’ll be responsible about it, but you can count on me taking more chances going forward, and encouraging you to do the same. If the Considers are truly this indistinguishable from the Starts, there’s little reason not to.

Thanks, this is helpful for putting the recommendations in context.
So PECOTA doesn't predict outliers or small samples...but neither do any other statistical analysis systems. Darts always win stock-picking contests.
I checked out another site's prognostications that I have used, and in a 2-week sample (yes, SSS) had similar findings: those given high marks had fewer overall W's than those with lower probabilities.

Likewise, I stopped as I didn't make the time to see if it continued or if the site's picks correlated to outcome predicted.