February 20, 2013
The Socratic Approach to PECOTA
And Why We Don't Hate Bryce Harper
When the PECOTA spreadsheet appears, one of the first things people do is pick out the players projected to make the greatest gains or suffer the largest declines. Then the questions start: Why does PECOTA like/dislike so-and-so so much? Is there a problem with the projections? Or is the system just picking up on something I’m not seeing?
Behind the scenes, the BP staff goes through the same thought process. Before we publish the projections, we approach PECOTA’s output with a skeptical eye, on the lookout for anything that could be a bug. But even after we’re satisfied with the spreadsheet and release it to our subscribers, PECOTA retains the capacity to surprise.
That PECOTA sometimes spits out surprising projections is a feature, not a bug. In that sense, it satisfies Bill James’ “80/20 Rule,” which says that a good statistic should conform to our expectations most of the time but still surprise often enough to be interesting. After all, if PECOTA simply confirmed what we already knew about player performance, it wouldn’t be worth the work. But James also (probably) said something like “No statistical finding is immune to the laws of common sense.” Granted, “common” sense might be more loophole than law, but the lesson seems sound: we shouldn’t blindly accept what every statistic says. A surprising stat might be right, but regardless, we’ll learn more by asking why it says what it says than we would by accepting it without question.
Earlier this week, I went on Clubhouse Confidential to talk about some of PECOTA’s more eye-popping projections. This is part of the first of two segments we did:
Before I went on, I wanted to be sure I could explain every stat in an easily digestible way, since there’s only so much time for talking on TV. As part of that process, I tried to anticipate and prepare for the questions and objections that might occur to someone seeing PECOTA projections for the first time.
They say that behind every great projection system is a great sabermetrician. (Note: no one actually says this.) PECOTA’s current keeper is Colin Wyers. As I crammed in the frantic few hours before I went on the air, I queried Colin (with words, not SQL code, though his brain can process both) on GChat. We thought BP readers might enjoy reading his responses, so we’ve republished the IM exchange below. (The transcript has been condensed and cleaned up a bit, so we sound like more eloquent instant messengers than we actually are.) —Ben Lindbergh
Ben: What do you think would be the best way to explain why we have Darvish as the second-most valuable pitcher?
Ben: I can talk about how good he was down the stretch...
Ben: I can also talk about how his comps are Pedro and Clemens.
Colin: In other words, what PECOTA is saying is that Darvish has some talent he showed in Japan that exceeds what he did in MLB last year, and we think he'll start bringing it this year.
Colin: And we think his high K rates are more important to his projection than some of his other stats.
Ben: Also—What do you think I can say about why our projection for the Nationals is more pessimistic than the mainstream perspective? That I will almost certainly be asked.
Ben: And 88 wins does sound a little lackluster.
Colin: 88 wins is a good record.
Ben: I know. But they won 98 last year, seemed to get better this winter, and we're projecting five other teams to win more than that.
Ben: So it's a little counterintuitive, at least.
Colin: The core of the Nationals is their starting pitching.
Colin: Gio, Strasburg, and Zimmermann all had the best seasons of their careers last year.
Colin: In terms of WARP—Stras and Zimmermann mostly improved in innings, Gio dropped innings slightly but improved in rates.
Colin: Expecting the three of them to repeat 2012 as a group is essentially fantasy. The odds that at least one of them gets hurt/misses significant time/is ineffective are pretty significant.
Ben: But we are projecting Strasburg to be worth more because of the innings, and the other guys not bad...
Ben: We're projecting them to have a roughly league-average offense, I guess is the other thing.
Colin: We're talking about a team that was sub-.500 two years ago.
Colin: And not as a blip, for every year since they moved to Washington.
Ben: All right. So it sort of falls under the "PECOTA has a long memory" category.
Colin: As for the offense, LaRoche is a 33-year-old who is coming off an Indian Summer season—guys with their career-best year at 32 typically don't hold onto those gains.
Ben: Yes, he's a good one to mention.
Ben: And Harper, specifically. What's the best way to explain his projected regression? There aren't a lot of successful 20-year-olds, and he doesn't have a lot of comparables because he's such a rare talent?
Colin: His projected regression is, in fact, regression to the mean.
Colin: I mean, a .274 TAv is pretty good.
Ben: I know.
Colin: And a .291 TAv is really, really hard.
Ben: But people look at it and think: He was better than that last year, he's a year older now, and he's one of the best young talents in baseball, so why does PECOTA hate Bryce Harper?
Ben: His comps are Trout, Griffey, and Upton—those guys are pretty good.
Colin: We think he's pretty good!
Colin: We think he's an above-average major-league starter at age 20!
Colin: There aren't a WHOLE lot of those in MLB history.
Colin: Look at A-Rod's first full season.
Colin: Put up a .335 TAv.
Colin: At age 20.
Colin: At age 21?
Ben: I'll use that!
Colin: Those were both very good seasons for A-Rod, and we're expecting a very good season from Harper.
Colin: Very few hitters follow a smooth progression through their career—the platonic ideal aging curve doesn't exist in the wild.
Colin: Because there's a LOT else going on.
Colin: With Harper, put it another way.
Colin: His TAv last year is Adam Dunn's career TAv.
Ben: There's also the fact that he's playing left field.
Colin: The thing with guys like Harper.
Colin: It's like that bit from Moneyball.
Colin: "It's not that hard, Scott. Tell him, Wash." "It's incredibly hard."
Colin: Everything in Major League Baseball is incredibly hard.
Colin: Looking at position players who won Rookie of the Year, on average we see a 10-point drop in TAv the following season.
Colin: And pretty much EVERYONE who won the Rookie of the Year is young and at an age when you would expect improvement.
Colin: You've stopped caring about Harper by now, haven't you.
Ben: I think I probably have enough to say about Harper for several segments now.
Colin: You could probably turn that into an article.
Ben: Yes. [Ed. Note: So meta!]
Colin: The other thing you're probably going to be asked about is the AL East.
Ben: Yes. Red Sox and Blue Jays.
Colin: Pithy comments you can steal: The Blue Jays had a very impressive offseason, but they acquired most of their new talent from the Marlins, who won fewer than 70 games in an easier division and league. And the Orioles are a lot like the Diamondbacks team from a few years back that everyone thought had figured out how to beat Pythag until they didn't.
Ben: The Jays seem to have a lousy projected defense.
Colin: I know you used to intern with the Yankees, but 30-year-old shortstops aren't out there for their fielding abilities.
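For readers who want to see the regression-to-the-mean point about Harper as arithmetic, here is a minimal sketch. It is not how PECOTA works internally; it is a generic shrinkage formula. The function names are hypothetical, the .291 figure is assumed to be Harper's 2012 TAv (the exchange implies it but never says so outright), and the .260 baseline relies on TAv being scaled so that .260 is league average. The weight it produces is simply backed out of the two numbers in the chat, not a published PECOTA parameter.

```python
# Illustrative only: a generic regression-to-the-mean calculation,
# NOT PECOTA's actual method.

LEAGUE_AVG_TAV = 0.260  # TAv is scaled so that .260 is league average


def regress_to_mean(observed, weight, mean=LEAGUE_AVG_TAV):
    """Shrink an observed rate toward the mean.

    weight = 1.0 trusts the observed figure completely;
    weight = 0.0 returns the mean.
    """
    return mean + weight * (observed - mean)


def implied_weight(observed, projected, mean=LEAGUE_AVG_TAV):
    """Back out the weight that turns `observed` into `projected`."""
    return (projected - mean) / (observed - mean)


harper_2012_tav = 0.291  # assumed: the ".291 TAv" mentioned in the chat
harper_projected = 0.274  # the projected TAv mentioned in the chat

weight = implied_weight(harper_2012_tav, harper_projected)
print(f"Implied weight on last season: {weight:.2f}")
print(f"Regressed TAv: {regress_to_mean(harper_2012_tav, weight):.3f}")
```

Under those assumptions, the projection keeps a bit less than half of the gap between last season's figure and league average, which still leaves Harper comfortably above average.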
Ben Lindbergh is an author of Baseball Prospectus.