I remember Opening Day, when Brandon McCarthy was facing the Giants and it seemed like he might just be the Cy Young front-runner. He struck out all three batters in the first. He got the Giants in order on seven pitches in the second. His curveball was lethal, and his fastball:
McCarthy just hit 94, sitting 93. Average two-seamer last year was 91.
This guy’s about to throw a no-hitter, I told the guy I was with, and then the next inning began with a hit, which led to a run, and McCarthy finished the game with a 6.2/6/5/5/1/4 line. Not a good outcome. Not a bad performance. The rest of the year went like that, with McCarthy throwing significantly harder and putting up a good FIP but losing almost every time he pitched. When he got traded to the Yankees, it seemed like a good trade. When he started pitching well for the Yankees and getting good results, it seemed like a predictable outcome. When Drew Fairservice wrote this month that “we can chalk a lot of this early success (with the Yankees) up to regression and the power of numbers like xFIP,” I thought, yes, that makes sense.
The rest of Fairservice’s sentence continued, “without the fundamental changes brought about by an informed scouting and coaching staff, does McCarthy turn it around so quickly?” And here’s where it gets tricky for those of us out here. We have the stats. We don’t necessarily or always have access to the scouts and the coaching staff. Thankfully, we have guys like Nick Piecoro.
Piecoro is the Diamondbacks beat writer for the Arizona Republic, one of the best in the game. He wrote about McCarthy’s turnaround with a bigger question in the frame: “Diamondbacks must ask: Is bad pitching luck really about luck?” Piecoro described the Diamondbacks’ coaching philosophy with McCarthy (which involved changes in pitch selection), challenged GM Kevin Towers on his record, and ran down the list of the team’s recent imports and exports:
McCarthy isn't the only pitcher the Diamondbacks couldn't salvage. Among Ian Kennedy, Tyler Skaggs and Trevor Bauer, none has become a dominant major-league starter with his new club, but all are pitching better — and throwing harder — since leaving. What went wrong while they were here?
Since the start of 2011, the list of pitchers who improved upon coming to the Diamondbacks is short and riddled with asterisks.
Right-hander Brad Ziegler has been equally effective here as he was in Oakland. Relievers David Hernandez and Matt Reynolds have had success, but both wound up needing Tommy John surgery, which raises another concern.
Reliever Oliver Perez might be the best example of a pitcher the Diamondbacks brought in who has both pitched well and stayed healthy.
So, do the Diamondbacks really have a pitching problem? There are three stories here.
In this year’s Baseball Prospectus Annual, Russell Carleton wrote about pivoting from Large-N studies to N=1 analysis. In Large-N research, we take advantage of baseball’s millions of trials to draw conclusions about the way the game works. We’ve concluded, for instance, that the majority of baseball players don’t exhibit a significant tendency to perform differently in “clutch” situations compared to “regular” situations. That’s not to say none do, but that baseball generally doesn’t work that way. Generally, players are better when they’re 27 than 37; it’s not always the case, but it’s reliable enough that it should guide personnel decisions. We know that generally sacrifice bunts don’t help score runs (though they have their place) and that generally starters pitch better when they’re working in relief and that generally players with sizable platoon splits won’t have such sizable platoon splits much longer. Knowing that FIP (or xFIP) is usually a better guide to a pitcher’s performance than ERA in a limited sample is an understanding based on Large-N research.
And, thanks to that Large-N kind of data, we think that generally managers and coaches don’t move the needle all that much. Some, certainly. When Chris Jaffe wrote Evaluating Baseball’s Managers, he used Large-N methods to conclude that Tony La Russa is the second most valuable manager in history. La Russa’s value: About three wins a year. That’s significant, but it’s also enough to know that when a team outperforms its projections by 25 wins it’s probably not the manager’s doing, Manager of the Year voting to the contrary.
Carleton himself looked at pitching coaches last year. (Pitching coaches, and the rest of the coaching staff, would overlap with/be included in Jaffe’s observed effects.) The best pitching coach, Carleton suggested, could be worth (in the case of Leo Mazzone) 4.5 wins per year. The top active pitching coaches, Dave Righetti and Curt Young, were worth about 2 wins. So the Large-N approach tells us two things:
- Good coaching can make a difference, and
- It almost certainly isn’t the cause of a disastrous run of results, like you might see in the Diamondbacks’ staff, so much as a weight on the scale.
And what it leaves out is the particular: Is the Diamondbacks’ coaching staff, in the past two or three years, helping or hurting?
So now we go to what maybe we’ll call a medium-N question. (My phrase, not Russell’s. It’s probably not a real thing. Unlike Russell, I have no doctorates.) Piecoro lists some pitchers who have pitched better with other teams than they did with the Diamondbacks, but players’ resumes can be very complicated, so we turn to PECOTA to see whether those pitchers pitched better than they should have pitched. If the Diamondbacks are ruining their pitchers, those pitchers should be performing worse than PECOTA expected they would upon arriving in the organization. And, once ruined by the Diamondbacks, those pitchers should have unnaturally suppressed projections when they leave, projections that should (assuming the pitchers get fixed by the new organization) be easy for the pitchers to exceed.
So, looking at every pitcher named in Piecoro's piece: McCarthy, Kennedy, Skaggs, Bauer, Cahill, Delgado, Reed, Ziegler, Hernandez, Reynolds, Perez. Some came, some left. Those who arrived, since 2011, produced a 3.68 ERA and a 3.60 FIP in a Diamondbacks uniform. PECOTA’s projection for them in those seasons: 4.05 ERA, 4.08 FIP. Diamondbacks who arrived pitched better than PECOTA expected based on the pitchers’ track records.
Those who departed, since 2011, produced, on average, a 3.99 ERA and a 4.13 FIP. PECOTA’s projections for them, with their Diamondbacks history helping guide the projections: 3.93 ERA, 3.89 FIP. They did a little bit worse than expected.
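The arithmetic behind that comparison is simple enough to sketch. The snippet below is a minimal illustration, not the spreadsheet actually used here: it takes the group averages quoted above and subtracts the projection from the result, so a negative gap means the group beat PECOTA. The names `Cohort` and `gap` are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cohort:
    """Group averages quoted in the text (pitchers named by Piecoro, since 2011)."""
    name: str
    actual_era: float
    actual_fip: float
    pecota_era: float
    pecota_fip: float


def gap(actual: float, projected: float) -> float:
    """Actual minus projected; negative means better than PECOTA expected."""
    return round(actual - projected, 2)


cohorts = [
    Cohort("arrivals", actual_era=3.68, actual_fip=3.60, pecota_era=4.05, pecota_fip=4.08),
    Cohort("departures", actual_era=3.99, actual_fip=4.13, pecota_era=3.93, pecota_fip=3.89),
]

# Map each cohort name to its ERA and FIP gaps.
gaps = {
    c.name: {
        "era": gap(c.actual_era, c.pecota_era),
        "fip": gap(c.actual_fip, c.pecota_fip),
    }
    for c in cohorts
}

for name, g in gaps.items():
    print(f"{name}: ERA gap {g['era']:+.2f}, FIP gap {g['fip']:+.2f}")
```

Run on those numbers, the arrivals come in 0.37 runs under their projected ERA and 0.48 under their projected FIP, while the departures miss by 0.06 and 0.24.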
But those are small and selected groups, 18 player seasons for the arriving Diamondbacks (we counted every season they were with the team from 2011 on) and six player seasons for departing pitchers (we counted only the first season after they left, along with any partial seasons if traded mid-campaign; an exception was made for Bauer, because he’s so clearly part of the narrative). We could expand this to every pitcher who was imported to or exported from Arizona since 2011 and threw at least 30 innings in a season for Arizona. That makes for a much larger group, bringing Sam Demel, Armando Galarraga, Joe Saunders, and every other mop-up man, LOOGY, closer and long-haired Creed-singing slop thrower into the mix. Now we’ve got about 30 pitchers, with 15 player seasons for departures and 40 for arrivals. New results:
| Group | PECOTA FIP | Actual FIP | PECOTA ERA | Actual ERA |
This time the Diamondbacks' new pitchers do better than expected on peripheral measures but worse in actually preventing runs, which could be the defense behind them or it could be that they suck. Not much worse, though. Pitchers the Diamondbacks let leave, meanwhile, do worse by both measures, much worse by earned runs.
There are a lot of methodological decisions I had to make here,[1] but one was that I didn’t weight the players’ stats in coming up with a group average. If I had, they would have been skewed by the fact that pitchers who exceed their projections would be allowed to throw a lot more innings, while those who were ruined would be tossed aside, and thus have practically no influence on the group stats. So, just unweighted averages. The downside to this is that some pitchers in nine-inning samples get really insane, unrepressed stats, like Mike Zagurski's 17 ERA. I don’t want to ignore those, but I don’t want to let them control the whole story, so for one table let’s ignore seasons of 15 innings or fewer:
| Group | PECOTA FIP | Actual FIP | PECOTA ERA | Actual ERA |
That quiets things down. Now it doesn’t look like the Diamondbacks ruin pitchers, and it no longer looks like the Diamondbacks improve pitchers. It just looks like everybody does what they’re supposed to.
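To see why the averaging choice matters, here is a rough sketch of the three options discussed above: a plain unweighted average, an innings-weighted average, and an unweighted average after dropping seasons of 15 innings or fewer. The player seasons are invented purely for illustration and are not the Diamondbacks data; the function names are likewise hypothetical.

```python
from statistics import mean

# Hypothetical player seasons, made up for this example only:
# (innings pitched, actual ERA minus projected ERA)
seasons = [
    (180.0, -0.40),
    (150.0, -0.20),
    (60.0, 0.30),
    (9.0, 12.00),  # a tiny-sample blowup, in the spirit of a 17 ERA in nine innings
]


def unweighted_gap(rows):
    """Every player season counts the same, regardless of innings."""
    return mean(g for _, g in rows)


def innings_weighted_gap(rows):
    """Seasons count in proportion to innings, so short stints barely register."""
    total_ip = sum(ip for ip, _ in rows)
    return sum(ip * g for ip, g in rows) / total_ip


def drop_short_seasons(rows, max_ip=15.0):
    """Ignore seasons of `max_ip` innings or fewer."""
    return [(ip, g) for ip, g in rows if ip > max_ip]


filtered = drop_short_seasons(seasons)

print(f"unweighted, all seasons:  {unweighted_gap(seasons):+.3f}")
print(f"unweighted, over 15 IP:   {unweighted_gap(filtered):+.3f}")
print(f"innings-weighted, >15 IP: {innings_weighted_gap(filtered):+.3f}")
```

With these made-up numbers the single nine-inning season drags the unweighted gap to nearly three runs, the filter pulls it back to about a tenth of a run better than projected, and innings-weighting makes the group look better still because the pitchers who beat their projections threw the most innings.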
So our first stage of exploration was to survey the big, giant, What We Know About The World landscape, and it led us to believe the Diamondbacks probably don't have the power to wreck a staff of pitchers. In our second, we surveyed the general trends around the team, looking at cohorts to see whether they pointed us to something. They don't seem to have. So: We conclude the Diamondbacks are A+ and in the clear?
I'd argue no. Because this gets us to the N=1 questions. As Russell is telling me this very minute, "N=1 is about the fact that overall, players (writ large) are pretty steady over time. But at the micro level, changes can be big and real and we are too quick to dismiss any changes as random fluctuations, because that's what large-N says." How do we find out which changes are real? One pretty great way might be to ask the players involved, and listen to them, especially if it's via a beat writer we trust reporting on a player who seems to be interested in the answer himself.
Sometimes, the data that can answer these questions come in spreadsheets, like those Russell might have generated for his piece on individualized "clutch" approaches this week. But often, the data might not look like data at all. It's knowing a team's scouting report on an opposing player, or it's knowing the player's own assessment of himself. In this case, I know that Piecoro has spent years talking to these Diamondbacks players, and likely has some sense of whether any of them are frustrated by the organization's approach, or by some other factor inherent in the Diamondbacks experience. I'd trust that he probably has insights into details of the club that wouldn't rise to the level of a story aimed at a newspaper audience but that wouldn't be missed by the ballplayers themselves. And I'd trust that he sees the players not as a cohort, but as a series of individuals, any one of whom might actually have had his career derailed by a bad coaching decision.
So how do I measure these competing ways of exploring the question, especially when they lead to answers that differ? That's the impossible question, the one there's no N-based study to rely on. In this case, I'd trust Piecoro's judgment that this was worth an article; in other words, I trust that it's a potential problem troubling enough that people in the know are talking about it. I'd conclude that the Diamondbacks have made some wrong decisions. Of course, every team has, and ultimately I'd probably put 40 percent of my faith in Piecoro, 35 percent in my cohorts, and 25 percent in the conclusion that coaching effects are usually small and difficult to pick up such that, in the short view, it's best for me not to try.
The Large-N questions broadly point us toward north, south, east, west. The cohort study tells us which map to use. But only the N=1 approach really puts us in the town and lets us meet the people. I'm buying on Brandon McCarthy.
[1] As noted, many decisions had to be made on methodology, and here are some of the choices that were made: For players who were traded mid-season, preseason PECOTAs were used to establish an expected performance. A new PECOTA run would have, obviously, been a bit different with an extra half season of information, but not by much, and we don’t have those for past years. Craig Breslow’s half season post-trade was ignored because using his preseason PECOTA projection at the time would have included no Arizona stats, and thus wouldn’t be relevant. The calculation of FIP usually involves adding a “constant” that changes slightly from year to year, but we used a 3.20 constant in all cases for simplicity. Joe Thatcher’s 2013–14 half seasons with the Diamondbacks were combined into one season and compared to his 2014 preseason PECOTA projection. David Hernandez's projections for his first year with Arizona were probably too pessimistic because the system still treated him as a partial starter. I totally ignored the frequency-of-Tommy-John-surgery concern that Piecoro brought up, because I can't even begin to assign blame for injuries that might have started when the pitchers were 12 years old. And, finally, I flat out forgot to include Heath Bell.
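For reference, the FIP calculation with a flat 3.20 constant can be sketched as below. This uses the commonly published FIP formula and is an assumption about the calculation, not a copy of the one actually run for this piece. The sample line borrows the 6.2 innings, one walk and four strikeouts from McCarthy's Opening Day start, but the home run and hit-batter counts are assumed, since the line in the text doesn't list them.

```python
FIP_CONSTANT = 3.20  # the flat constant the footnote says was used for every season


def innings_from_box_score(ip: float) -> float:
    """Convert box-score innings (6.2 means six innings plus two outs) to true innings."""
    whole = int(ip)
    outs = round((ip - whole) * 10)
    if outs not in (0, 1, 2):
        raise ValueError(f"not a valid box-score innings value: {ip}")
    return whole + outs / 3


def fip(hr: int, bb: int, hbp: int, k: int, innings: float,
        constant: float = FIP_CONSTANT) -> float:
    """Commonly published FIP: (13*HR + 3*(BB + HBP) - 2*K) / IP + constant."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


# HR=1 and HBP=0 are assumptions made for illustration only.
sample_fip = fip(hr=1, bb=1, hbp=0, k=4, innings=innings_from_box_score(6.2))
print(f"sample FIP: {sample_fip:.2f}")
```

Under those assumed inputs the start works out to a 4.40 FIP. A year-specific constant would shift every figure by the same small amount, which is why a single constant is a reasonable shortcut when comparing groups.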