The Socratic Approach to PECOTA: And Why We Don’t Hate Bryce Harper

February 20, 2013

When the PECOTA spreadsheet appears, one of the first things people do is pick out the players projected to make the greatest gains or suffer the largest declines. Then the questions start: Why does PECOTA like/dislike so-and-so so much? Is there a problem with the projections? Or is the system just picking up on something I’m not seeing?

Behind the scenes, the BP staff goes through the same thought process. Before we publish the projections, we approach PECOTA’s output with a skeptical eye, on the lookout for anything that could be a bug. But even after we’re satisfied with the spreadsheet and release it to our subscribers, PECOTA retains the capacity to surprise.

That PECOTA sometimes spits out surprising projections is a feature, not a bug. In that sense, it satisfies Bill James’ “80/20 Rule”, which says that a good statistic should conform to our expectations most of the time but still surprise often enough to be interesting. After all, if PECOTA simply confirmed what we already knew about player performance, it wouldn’t be worth the work. But James also (probably) said something like “No statistical finding is immune to the laws of common sense.” Granted, “common” sense might be more loophole than law, but the lesson seems sound: we shouldn’t blindly accept what every statistic says. A surprising stat might be right, but regardless, we’ll learn more by asking why it says what it says than we would by accepting it without question.

Earlier this week, I went on Clubhouse Confidential to talk about some of PECOTA’s more eye-popping projections. This is part of the first of two segments we did:

[Video: part of the first Clubhouse Confidential segment]

Before I went on, I wanted to be sure I could explain every stat in an easily digestible way, since there’s only so much time for talking on TV. As part of that process, I tried to anticipate and prepare for the questions and objections that might occur to someone seeing PECOTA projections for the first time.

They say that behind every great projection system is a great sabermetrician. (Note: no one actually says this.) PECOTA’s current keeper is Colin Wyers. As I crammed in the frantic few hours before I went on the air, I queried Colin (with words, not SQL code, though his brain can process both) on GChat. We thought BP readers might enjoy reading his responses, so we’ve republished the IM exchange below. (The transcript has been condensed and cleaned up a bit, so we sound like more eloquent instant messengers than we actually are.) —Ben Lindbergh

***

Ben: What do you think would be the best way to explain why we have Darvish as the second-most valuable pitcher?

Ben: I can talk about how good he was down the stretch…

Ben: I can also talk about how his comps are Pedro and Clemens.

Colin: Darvish had a good season last year, pitched very well in Japan, and his peripherals (i.e., FIP) were significantly better than his ERA.

Colin: In other words, what PECOTA is saying is that Darvish has some talent he showed in Japan that exceeds what he did in MLB last year, and we think he'll start bringing it this year.

Colin: And we think his high K rates are more important to his projection than some of his other stats.

Ben: Also—What do you think I can say about why our projection for the Nationals is more pessimistic than the mainstream perspective? That I will almost certainly be asked.

Ben: And 88 wins does sound a little lackluster.

Colin: 88 wins is a good record.

Ben: I know. But they won 98 last year, seemed to get better this winter, and we're projecting five other teams to win more than that.

Ben: So it's a little counterintuitive, at least.

Colin: The core of the Nationals is their starting pitching.

Colin: Gio, Strasburg, and Zimmermann all had the best seasons of their careers last year.

Colin: In terms of WARP, Stras and Zimmermann mostly improved in innings, while Gio dropped innings slightly but improved in rates.

Colin: Expecting the three of them to repeat 2012 as a group is essentially fantasy. The odds that at least one of them gets hurt, misses significant time, or is ineffective are pretty high.
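Colin's point is simple probability. Even if each pitcher is individually likely to hold up, the chance that all three do is the product of their separate chances. The sketch below assumes the three risks are independent, and the per-pitcher probabilities are made up purely for illustration. They are not PECOTA output.

```python
# Hypothetical per-pitcher probabilities (illustrative only, NOT PECOTA output)
# that each starter gets hurt, misses significant time, or is ineffective.
risk = {"Gonzalez": 0.25, "Strasburg": 0.30, "Zimmermann": 0.25}


def prob_at_least_one(probs):
    """Return P(at least one event) = 1 - P(no event), assuming independence."""
    p_none = 1.0
    for p in probs:
        p_none *= 1.0 - p  # each pitcher must independently avoid trouble
    return 1.0 - p_none


print(f"P(at least one of the three falters): {prob_at_least_one(risk.values()):.2f}")
```

Under these assumptions, each pitcher is 70 to 75 percent likely to hold up. Even so, the trio has only about a 39 percent chance of all doing so.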

Ben: But we are projecting Strasburg to be worth more because of the innings, and the other guys aren't bad…

Ben: We're projecting them to have a roughly league-average offense, I guess is the other thing.

Colin: We're talking about a team that was sub-.500 two years ago.

Colin: And not as a blip, for every year since they moved to Washington.

Ben: All right. So it sort of falls under the "PECOTA has a long memory" category.

Colin: Yes.

Colin: As for the offense, LaRoche is a 33-year-old who is coming off an Indian Summer season—guys with their career-best year at 32 typically don't hold onto those gains.

Ben: Yes, he's a good one to mention.

Ben: And Harper, specifically. What's the best way to explain his projected regression? That there aren't a lot of successful 20-year-olds, so he doesn't have many comparables because he's such a rare talent?

Colin: His projected regression is, in fact, regression to the mean.

Colin: I mean, a .274 TAv is pretty good.

Ben: I know.

Colin: And a .291 TAv is really, really hard.

Ben: But people look at it and think: He was better than that last year, he's a year older now, and he's one of the best young talents in baseball, so why does PECOTA hate Bryce Harper?

Ben: His comps are Trout, Griffey, and Upton—those guys are pretty good.

Colin: We think he's pretty good!

Colin: We think he's an above-average major-league starter at age 20!

Colin: There aren't a WHOLE lot of those in MLB history.

Colin: Look at A-Rod's first full season.

Colin: Put up a .335 TAv.

Colin: At age 20.

Colin: At age 21?

Colin: .284.

Ben: I'll use that!

Colin: Those were both very good seasons for A-Rod, and we're expecting a very good season from Harper.

Colin: Very few hitters follow a smooth progression through their career—the platonic ideal aging curve doesn't exist in the wild.

Colin: Because there's a LOT else going on.

Colin: With Harper, put it another way.

Colin: His TAv last year is Adam Dunn's career TAv.

Ben: There's also the fact that he's playing left field.

Colin: The thing with guys like Harper.

Colin: Sometimes you get a Buster Posey. And sometimes you get a Chris Coghlan. And sometimes you get a Geovany Soto.

Colin: It's like that bit from Moneyball.

Colin: "It's not that hard, Scott. Tell him, Wash." "It's incredibly hard."

Colin: Everything in Major League Baseball is incredibly hard.

Colin: Looking at position players who won Rookie of the Year, on average we see a 10-point drop in TAv the following season.

Colin: And pretty much EVERYONE who won the Rookie of the Year is young and at an age when you would expect improvement.

Colin: You've stopped caring about Harper by now, haven't you.

Ben: I think I probably have enough to say about Harper for several segments now.

Colin: Okay.

Colin: You could probably turn that into an article.

Ben: Yes. [Ed. Note: So meta!]

Colin: The other thing you're probably going to be asked about is the AL East.

Ben: Yes. Red Sox and Blue Jays.

Colin: Pithy comments you can steal: The Blue Jays had a very impressive offseason, but they acquired most of their new talent from the Marlins, who won fewer than 70 games in an easier division and league. And the Orioles are a lot like the Diamondbacks team from a few years back who everyone thought had figured out how to beat Pythag until they didn't.
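"Pythag" refers to Bill James's Pythagorean expectation, which estimates a team's winning percentage from its runs scored and allowed. A team "beats Pythag" when it wins more games than its run differential suggests. The sketch below uses the classic exponent of 2; refined versions commonly use an exponent near 1.83. The run totals and record are hypothetical.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Bill James's Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythag_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins over a season implied by the run totals."""
    return games * pythag_win_pct(runs_scored, runs_allowed, exponent)


# Hypothetical team (illustrative numbers): a near-even run differential,
# but a 93-69 record.
expected = pythag_wins(710, 700)
actual = 93
print(f"expected {expected:.1f} wins, actual {actual}, "
      f"beat Pythag by {actual - expected:.1f}")
```

A gap of roughly ten wins between actual and expected record is the kind of outperformance that has historically tended not to carry over. That is the pattern Colin is alluding to with the Orioles.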

Ben: The Jays seem to have a lousy projected defense.

Colin: I know you used to intern with the Yankees, but 30-year-old shortstops aren't out there for their fielding abilities.

Ben Lindbergh

Colin Wyers

Mooser

2/20

Thanks for the insight. Colin or Ben, I have to ask about the Mets/Braves runs-against projection. Atlanta is projected to give up 30 more runs, yet is projected to save 40 more runs on defence than the Mets. This would suggest that Atlanta pitchers are giving up 70 runs more than Mets pitchers in FIP categories or hard-hit balls, etc. However, WARP for both pitching staffs is basically the same (Atlanta's is a little higher). What am I missing that PECOTA sees in terms of run prevention?

Mooser

2/20

Sorry, disregard the 30-run difference; I was looking at runs scored. The RA for both teams is about the same. However, it suggests that Mets pitchers are 30 runs better than Braves pitchers despite similar WARPs. Thanks.

chabels

2/20

Correction: Washington was not sub-.500 for every year since they moved to DC. They played .500 ball in their inaugural season.

davescottofakron

2/20

Beating Pythag would be a good title to a baseball novel.

Oleoay

2/20

So... Trout and Griffey are Harper's best comparables. Trout had a great sophomore season and Griffey improved throughout his career, but Harper is going to regress to the mean... whatever the mean of a 20-year-old is...

cwyers

2/20

Richard, that's not how the comparables work and that's never how the comparables have worked. We calculate the comparables based not on one season's performance but on what's called the "baseline forecast," that is to say the weighted average of the past several seasons, regressed to the mean. The comps are meant to help build aging curves, not to allow us to throw out a player's long-term performance history or the concepts behind regression to the mean.

Oleoay

2/20

I understand there's a baseline forecast and that comparables help to guide it.

I guess the assumptions I am making in my mind are...

#1 Players who debut as a teenager tend to get better.

#2 Trout, Griffey and Upton are much different comparables than teenagers who washed out of MLB.

Or, if those assumptions are untrue, does PECOTA just project that all 20-year-olds will get worse/have a sophomore slump?

I guess, as a corollary, when was the last time PECOTA projected that a player who debuted as a teenager would improve at age 20?

tbwhite

2/20

I think if Harper had batted .230/.290/.360 last year PECOTA would project him to improve, since the MLB mean is better than that. In other words, if you suck, then regressing to the mean helps you(you probably weren't as bad as your stats say). Harper didn't suck, he was good, so regression works against him as PECOTA assumes he was the beneficiary of some good luck.

Oleoay

2/20

But, was he good because of luck or because of skill/talent compared to the "average" 20 year old player?

chabels

2/20

Both. But PECOTA regressed his performance to the mean, considered the "luck" factor in his 2012 season (BABIP, etc.), and produced its forecast. Likely it forecast him to improve by X and to regress by Y. In his case, he was such an outlier that Y > X.

Oleoay

2/20

The outliers are always hard to play with. You're in a bit of a case of "damned if you do, damned if you don't", kind of like with the Wieters projections.

As for myself, whether Harper is projected to regress or not is a bit of a moot point to me. It's more a question of why. I understand the Y > X argument, so I guess the real question is why the regression to the mean is more of an influence than Harper's unusual talent. We don't do the same kind of thing with minor league projections, right? An 18-year-old who dominates AA and is supposed to repeat AA isn't projected to regress as a 19-year-old, correct? Granted, MLB is different, but we don't chalk up all of the 18-year-old's success at AA to luck, right?

I guess, if we were talking about someone like Wil Myers who has talent but isn't considered as unique as Harper, it probably would be a different discussion.

philosofool

2/21

That we expect young players to improve at the minor league level is a good point. What I would take away from Harper's projection is this:
The baseline is a combination of all the work he's done in professional ball, including the minor leagues, and maybe even JuCo. The non-MLB performance is dealt with by generating equivalencies, MLEs. Then an aging curve is generated by using comps.

The thing to take away is probably something like this: Harper's baseline is actually pretty low, in part because his MiLB performance wasn't stellar above A ball. Then you apply a 19-20 aging curve, which is actually quite aggressive, and you get a guy who's expected to be above average.

Oleoay

2/22

-5? Did I step on someone's cat or are people disliking the discussion that much?

holgado

2/20

Ben, first off, really great job on MLB Network yesterday. I know we all look forward to seeing more appearances from you in the future. Would be especially fun to see you square off against the likeable, but sabermetrically-disinclined Mitch Williams in a segment or two.

I have to admit that I, too, did not find Colin's justifications for PECOTA's conservative Harper projection to be particularly convincing. I know that regression to the mean is an important component of PECOTA. But it's not supposed to be the lead dog -- we've got Marcel for that. The foundation of PECOTA's modeling technique really is the comparables/similarity scores. And Harper's is a classic case of a player who breaks the system -- and *should* break the system -- because he has such a small set of true comparables that they are almost rendered anecdotal. That is, pointing to A-Rod's or Mel Ott's or Ty Cobb's 19-and-20-year-old seasons doesn't do us much good, because PECOTA works best when it has 1,000 similar seasons to look to, rather than a handful. PECOTA shouldn't work well for a 45-year-old Jamie Moyer either.

It's similar to when Jeremy Hellickson and Ichiro Suzuki defy BABIP-based projections for ERA and Batting Average. The bad projections merely show how extraordinary those players are, and serve as exceptions that prove the general rule of the method. In other words (and I'm doing my best to channel Nate here), part of the utility of PECOTA is in knowing when it doesn't work particularly well, and why. I think we ought to make that point with reference to Harper's 2013 projection, rather than search for reasons to stand behind it.

thegeneral13

2/20

Re: your middle paragraph, this is exactly where PECOTA should excel vs. other systems. Not because it will produce a better point forecast, but because it will produce a better spectrum of forecasts. Because young players like Harper are so unique, there are fewer comparable seasons as you mention, which should drive higher forecast dispersion. This is the most fascinating feature of PECOTA to me but it doesn't get much attention (are the percentile forecasts even published anymore?), maybe because it is difficult to consume, or maybe because it doesn't work very well. Ideally, PECOTA would not only adjust the width of the tails, but also the skew (provided there was a statistical basis for that, obviously). In this way it would do better than other systems not by better forecasting the mode, but by better forecasting the mean. I'd love to hear if Colin thinks there is room for further development or focus on this aspect of PECOTA given how competitive the forecasting market has become.
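A percentile forecast summarizes a whole distribution of possible outcomes rather than a single point. The sketch below is a hypothetical illustration, not PECOTA's actual method. It takes a made-up set of comparables' next-season outcomes and reduces them to percentiles. It also shows how a long right tail pulls the mean above the middle of the distribution, which is the skew the comment is asking about.

```python
import statistics

# Hypothetical next-season TAv outcomes (in points) for one player's comparables.
# Illustrative only, not PECOTA data; a few breakouts create a long right tail.
comp_outcomes = [262, 268, 270, 271, 273, 274, 275, 277, 280, 295, 310, 325]

deciles = statistics.quantiles(comp_outcomes, n=10)  # 9 cut points: 10th..90th

print(f"10th percentile: {deciles[0]:.0f}")
print(f"median:          {statistics.median(comp_outcomes):.1f}")
print(f"mean:            {statistics.mean(comp_outcomes):.1f}")
print(f"90th percentile: {deciles[-1]:.0f}")
```

With fewer genuinely similar comparables, a set like this gets smaller and noisier. That is why the width and shape of the tails are harder to pin down for a player like Harper.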

rawagman

2/21

Dave - you are undoubtedly on the right track about the outliers, but on TV, Ben needs to sell PECOTA, not tell the world about how it sucks.

holgado

2/21

That's just it, these anomalies don't at all mean that PECOTA sucks. Sell it for what it actually is, which -- for 99.9% of players (for whom hundreds of comparable player-seasons are available) -- is something pretty great. If you embrace that it is a comparables-based model, it seems intuitive why it wouldn't work as well for freaks like Harper or Moyer -- or, for that matter, Trout -- who have few or no true comparables, either because of age or performance extremes (or in Trout's case, both).

As Nate himself said, "[w]hat does PECOTA do in situations like these? well, it does the best it can. ... Of course there are a few players -- like [Barry Bonds in 2004] -- for whom finding appropriate comparables is impossible. In those cases, PECOTA makes like a drunken frat boy and lowers its standards. In the case of Bonds, for example, it's willing to bed pretty much any 40-year-old with some power and some plate discipline. That isn't an ideal solution...."

http://www.baseballprospectus.com/article.php?articleid=3576

If Nate's willing to say that, it's not clear to me why those of us who didn't invent the thing wouldn't be. Of course Nate and other brilliant folks at BP have worked to improve PECOTA since he wrote the above article. But unless the model has shifted away from being primarily a comparables-based one, this particular anomaly doesn't seem like one that can be addressed. And again, that does not mean it sucks. It means wow, Bryce Harper is special... so special that he breaks PECOTA... and let's all appreciate that.

bornyank1

2/21

Yeah, while I was certainly interested in portraying PECOTA in a positive light, I wouldn't want to defend it blindly. When I was asked about Hanley, I mentioned that the projection might be overoptimistic because the system doesn't really know about his injury history and the possibility that he's not the same guy physically. I had prepared to say something similar about Lincecum's velocity loss, if asked about his projection. For Harper, I went with Colin's explanation.

Oleoay

2/21

Nate Silver also said that a decade ago. My understanding is that, back then, PECOTA was a massive spreadsheet of calculations. Colin's heavily revamped it since then (and using SQL?) so that statement might no longer apply...

cwyers

2/21

Except that's not what's happening here.

Again, we have the baseline forecast, then we have the forecast with the "career path adjustment" (essentially the aging curve) applied. In the case of Harper, the aging curve we've built for him (with the input of his comparables) shows him improving from his baseline projection. It's not like the comps are what's driving the idea that Harper will regress, we figure his comps from his baseline forecast after we've already accounted for regression to the mean and his minor league equivalencies.

It should be possible to hold these two thoughts at the same time:

* That a player's performance will decline from one season to the next,
* And at the same time a player's talent will improve over the same time period.

Because talent is not the only input to performance. It is not that PECOTA is writing off Harper as lucky -- he still has a very good forecast, particularly for a 20-year-old with one full season under his belt. But PECOTA suspects that Harper's true talent in 2012 was somewhat less than his performance indicated -- still an above-average MLB talent, to be sure.

PECOTA is skeptical by nature, and if you examine the historical record you'll see that it's right to be so. The sophomore slump is a cliche for a reason -- because it's surprisingly common for a player coming off a spectacular debut to not do as well next year.

I also don't really think that a multiyear track record of underprojecting Ichiro is comparable to a single year's projection for a season that hasn't even happened yet.

Oleoay

2/21

So, in my other post, I say:

#1 Players who debut as a teenager tend to get better.

#2 Trout, Griffey and Upton are much different comparables than teenagers who washed out of MLB.

Or, if those assumptions are untrue, does PECOTA just project that all 20-year-olds will get worse/have a sophomore slump?

So, what you're saying is my assumptions are untrue and that PECOTA, in general, has skepticism built into it for young players who overperform as teenagers. Thus Year 2 "looks" like a sophomore slump as the loss of performance due to good luck outweighs the Year 2 increase in talent. And, in general, all "young" year 2 players are likely to regress...

If that is correct, how does PECOTA operate differently for minor leaguers?

cwyers

2/21

Think of it as a Bayesian process. You have your 2012 performance for Harper. You also have your prior, in this case the population Harper comes from, along with Harper's minor league production. (We don't use "MLB players" as the population for everyone, we build our prior based on a player's peer group. This isn't comps based, it comes from looking at what levels a player has played at.) So you take Harper's 2012 performance and your prior and you get your estimate for Harper's "true talent" in 2012. From there, you find comps, estimate aging, adjust for league/park/etc. and you get your end result forecast.
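The two steps Colin describes can be sketched with a simple precision-weighted average. First, combine the prior with the 2012 performance to estimate 2012 "true talent." Then apply an aging adjustment. This is a minimal sketch, not PECOTA's code. The prior TAv, the number of plate appearances the prior is "worth" (`K_PA`), and the aging bump are all hypothetical. PECOTA's comparables, league, and park adjustments are collapsed into that single aging number.

```python
def shrink_to_prior(observed, pa, prior, k):
    """Precision-weighted average of observed performance and the prior.

    The prior counts like k plate appearances of evidence, so more PA
    pulls the estimate toward the observed figure.
    """
    return (pa * observed + k * prior) / (pa + k)


# All inputs except OBSERVED_TAV are hypothetical illustrations, not PECOTA values.
PRIOR_TAV = 0.265     # hypothetical peer-group prior (incl. minor league MLEs)
OBSERVED_TAV = 0.291  # Harper's 2012 TAv, from the discussion above
PA_2012 = 600         # roughly a full season (rounded, assumed)
K_PA = 400            # hypothetical: how many PA the prior is "worth"
AGING_DELTA = 0.005   # hypothetical age-19-to-20 improvement from the aging curve

true_talent_2012 = shrink_to_prior(OBSERVED_TAV, PA_2012, PRIOR_TAV, K_PA)
forecast_2013 = true_talent_2012 + AGING_DELTA

print(f"estimated 2012 talent: {true_talent_2012:.3f}")
print(f"2013 forecast:         {forecast_2013:.3f}")
```

With these numbers, the forecast lands below the observed .291 but above the estimated 2012 talent. Performance is projected to decline even as talent is projected to improve, which is the "two thoughts at the same time" point from Colin's earlier comment.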

We do this because changes in performance are more frequent and more dramatic than changes in talent. If you look at a player's career, it's very noisy at times, and to do a good projection we need to try and take as much noise out as possible. Sometimes babies end up being thrown out with the bathwater, but on the whole we throw out far, far more empty baths than we do baths with babies in them.

PECOTA is not averse to projecting improvement for a young player. Off the top of my head, someone like Dom Brown who is coming off a disappointing season is expected to get better, because of where he's at on the aging curve and what his minor league numbers look like relative to his major league performance. Again, though, if you look at (say) Rookie of the Year winners, they tend to fall off a bit the next season. That's one way of illustrating what we call regression to the mean, which is another way of saying classical test theory. It's important to note that classical test theory describes populations, not individuals -- Harper could do any number of things. (This will be reflected in the percentiles, which will be going up soon.) But in terms of what is likely, expecting him to have a very good season but not quite as good as 2012 is eminently reasonable, given what we know of how ballplayers tend to perform historically.

tbwhite

2/21

So, suppose Bryce Harper had an evil identical twin who posted stats identical to Bryce's until they were both called up to the majors, at which point Bryce posted a .300 TAv while his evil twin posted a .250 TAv. Assuming a league-average TAv of, say, .275, PECOTA would roughly do the following. Since Bryce and his evil twin had identical priors, they would have identical 'true talent' entering 2012. The disparity in performance in 2012 would be attributed to some combination of luck and skill (according to the number of MLB PA: fewer PA, more luck; more PA, less luck), and for 2013 Bryce would be projected to perform worse, and his evil twin would be projected to perform better. But Bryce's projection would be better than his evil twin's, with the difference between the two depending on the number of MLB PA. In other words, if they had 100 PA the difference in projections would be smaller than if they had 500 PA.
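tbwhite's twin example can be run directly with the same kind of weighted average. This is a minimal sketch, not PECOTA's method. It assumes that both twins' shared prior equals the .275 league average from the comment, and that the prior is worth a hypothetical 300 PA. Aging is ignored.

```python
def shrink_to_prior(observed, pa, prior, k):
    """Weighted average in which the shared prior counts as k PA of evidence."""
    return (pa * observed + k * prior) / (pa + k)


PRIOR = 0.275  # both twins' identical pre-2012 estimate (assumed = league average)
K_PA = 300     # hypothetical weight of the prior, in plate appearances

results = {}
for pa in (100, 500):
    bryce = shrink_to_prior(0.300, pa, PRIOR, K_PA)
    twin = shrink_to_prior(0.250, pa, PRIOR, K_PA)
    results[pa] = (bryce, twin)
    print(f"{pa} PA: Bryce {bryce:.4f}, evil twin {twin:.4f}, gap {bryce - twin:.4f}")
```

Both twins move toward .275, Bryce down and his twin up. The gap between them grows from 12.5 points at 100 PA to about 31 points at 500 PA, matching the intuition in the comment.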

tbwhite

2/21

The paradox or irony is that the more an individual's performance exceeds the league average (assuming a reasonable sample size), the more confident we are both that his 'true talent' > league-average talent AND that he was the beneficiary of some good luck. Those two explanations are not mutually exclusive; you can be lucky and good.
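The "lucky and good" point can be checked with a small simulation. Model each season as true talent plus random luck, then look at the players with the best observed seasons. This is a minimal sketch under hypothetical assumptions: normally distributed talent and luck, with made-up league mean and spreads.

```python
import random
import statistics

random.seed(2013)  # fixed seed so the sketch is reproducible

LEAGUE_MEAN = 0.260  # hypothetical league-average TAv
TALENT_SD = 0.020    # hypothetical spread of true talent across players
NOISE_SD = 0.020     # hypothetical season-level luck for a full season of PA

players = []
for _ in range(10_000):
    talent = random.gauss(LEAGUE_MEAN, TALENT_SD)
    observed = talent + random.gauss(0.0, NOISE_SD)  # performance = talent + luck
    players.append((talent, observed))

# Select the players whose observed performance was in the top 5%.
players.sort(key=lambda p: p[1], reverse=True)
top = players[: len(players) // 20]

mean_talent = statistics.mean(t for t, _ in top)
mean_observed = statistics.mean(o for _, o in top)
print(f"top performers: observed {mean_observed:.3f}, true talent {mean_talent:.3f}")
```

On average, the top performers' true talent is well above the league mean, yet below what they actually posted. They were good, and they were also lucky.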

Oleoay

2/21

In addition, if Harper isn't appearing to perform as well as last year because he's good but the luck isn't breaking his way in 2013 like it did in 2012, he's more likely to get demoted or benched which has an additional chance of hurting his performance.

Oleoay

2/21

"We do this because changes in performance are more frequent and more dramatic than changes in talent."

I'm going to laminate that quote. And thanks for taking the time to explain all this.

holgado

2/21

Colin, I had meant to address this in my first post, but I also don't think that your Rookie of the Year based argument -- that there is, on average, a 10 point drop-off in EqA (OK, I'll humor you guys, let's call it TaV) for Rookies of the Year in the following year -- is a sound one. To me, it ignores the very point I'm making about how unusual Harper is, and why the "sophomore slump" cliche ought not reflexively be applied to him. Indeed, it is the same reason why expecting regression to the mean for a 19-year-old or 20-year-old who performed better than league average makes so much less sense than expecting such regression from an already-established player in his mid-20's.

Let's even put aside for a moment the wisdom of a casual comparison between Bryce Harper and players like Chris Coghlan and Geovany Soto (i.e., we all know that not all RoYs are built alike, and it has not required 20/20 hindsight for us to know this). Here are the ages of all RoY hitters since 1993, and their TaVs listed for (i) their RoY season; (ii) the following season; and (iii) their career thus far (which is a bit of a tangent from what we're currently discussing, but added for everyone's edification).

2012: Bryce Harper (19) - .291
2012: Mike Trout (20) - .357
2010: Buster Posey (23) - .298 - .271 - .318
2009: Chris Coghlan (24) - .292 - .265 - .263
2008: Geovany Soto (25) - .286 - .233 - .265
2008: Evan Longoria (22) - .296 - .293 - .308
2007: Ryan Braun (23) - .324 - .291 - .316
2007: Dustin Pedroia (23) - .281 - .297 - .287
2006: Hanley Ramirez (22) - .278 - .311 - .297
2005: Ryan Howard (25) - .311 - .343 - .306
2004: Jason Bay (25) - .301 - .320 - .292
2004: Bobby Crosby (24) - .261 - .280 - .241
2003: Angel Berroa (25) - .267 - .246 - .233
2002: Eric Hinske (24) - .286 - .261 - .262
2001: Albert Pujols (21) - .329 - .323 - .340
2001: Ichiro Suzuki (27)* - .303 - .293 - .284
2000: Rafael Furcal (22) - .280 - .247 - .266
1999: Carlos Beltran (22) - .266 - .223 - .293
1998: Ben Grieve (22) - .289 - .276 - .275
1997: Nomar Garciaparra (23) - .290 - .309 - .296
1997: Scott Rolen (22) - .292 - .307 - .293
1996: Derek Jeter (22) - .282 - .279 - .290
1996: Todd Hollandsworth (23) - .278 - .232 - .258
1995: Marty Cordova (25) - .279 - .279 - .266
1994: Bob Hamelin (26) - .327 - .214 - .277
1994: Raul Mondesi (23) - .279 - .296 - .282
1993: Tim Salmon (24) - .323 - .314 - .306
1993: Mike Piazza (24) - .325 - .304 - .314

We don't yet know how Harper and Trout performed in the following year, of course. And there are also good reasons to exclude Furcal and Posey from the next-year part of the analysis, since both suffered (and played through) injuries in their sophomore seasons (one has to be curious as to whether the same was true for Beltran).

The TaV of the only 27-year-old Rookie of the Year, Ichiro, declined by 10 points the following season, and his career TaV is 19 points lower than his RoY one.

The TaV of the only 26-year-old RoY, Bob Hamelin, declined by 113 points the following season, and his career TaV was 50 points lower than his RoY one.

The TaVs of the five 25-year-old RoYs declined on average by approximately 5 points in the following season, and they had/have career TaVs which on average are 16-17 points lower than their RoY ones.

For the five 24-year-old RoYs, their next season TaVs declined by 12-13 points, and their career TaVs were about 20 points lower than their RoY ones.

For the six 23-year-old RoYs, their next-season TaVs (excluding Posey's) declined by approximately 3 points (thanks largely to marginal RoY winner Todd Hollandsworth), but their career TaVs (including Posey) have been 1 point higher than their RoY ones.

For the seven 22-year-old RoYs, their next season TaVs (excluding Furcal's) declined by only 2 points (and this is dragged down significantly by Beltran's flukishly bad and possibly injury-hampered sophomore year), and their career TaVs (including Furcal) have been 3-4 points higher than their RoY ones.

At age 21, there is only Albert Pujols. While his TaV declined 6 points the next season (to a still supernatural .323), he largely repeated his monster rookie year performance, and over his career he has been 11 points higher than that mark.
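The by-age averages above can be recomputed directly from the table. The sketch below transcribes the 1993 to 2011 rows, with TAv in points. It follows holgado's choice to drop Posey's and Furcal's injury-affected sophomore seasons from the next-season averages, but not from the career averages.

```python
from collections import defaultdict

# Rows transcribed from the RoY table above (1993-2011 winners only; the 2012
# winners have no following season yet). TAv is in points (.298 -> 298).
# Columns: name, age in RoY season, RoY TAv, next-season TAv, career TAv.
ROY = [
    ("Posey", 23, 298, 271, 318), ("Coghlan", 24, 292, 265, 263),
    ("Soto", 25, 286, 233, 265), ("Longoria", 22, 296, 293, 308),
    ("Braun", 23, 324, 291, 316), ("Pedroia", 23, 281, 297, 287),
    ("Ramirez", 22, 278, 311, 297), ("Howard", 25, 311, 343, 306),
    ("Bay", 25, 301, 320, 292), ("Crosby", 24, 261, 280, 241),
    ("Berroa", 25, 267, 246, 233), ("Hinske", 24, 286, 261, 262),
    ("Pujols", 21, 329, 323, 340), ("Suzuki", 27, 303, 293, 284),
    ("Furcal", 22, 280, 247, 266), ("Beltran", 22, 266, 223, 293),
    ("Grieve", 22, 289, 276, 275), ("Garciaparra", 23, 290, 309, 296),
    ("Rolen", 22, 292, 307, 293), ("Jeter", 22, 282, 279, 290),
    ("Hollandsworth", 23, 278, 232, 258), ("Cordova", 25, 279, 279, 266),
    ("Hamelin", 26, 327, 214, 277), ("Mondesi", 23, 279, 296, 282),
    ("Salmon", 24, 323, 314, 306), ("Piazza", 24, 325, 304, 314),
]

# Sophomore seasons excluded from the next-season averages as injury-affected.
INJURED_NEXT_YEAR = {"Posey", "Furcal"}


def mean(values):
    return sum(values) / len(values)


def changes_by_age(rows, exclude_next=frozenset()):
    """Average (next-season change, career change) in TAv points, per RoY age."""
    next_delta = defaultdict(list)
    career_delta = defaultdict(list)
    for name, age, roy, nxt, career in rows:
        if name not in exclude_next:
            next_delta[age].append(nxt - roy)
        career_delta[age].append(career - roy)
    return {age: (mean(next_delta[age]), mean(career_delta[age]))
            for age in sorted(career_delta)}


for age, (d_next, d_career) in changes_by_age(ROY, INJURED_NEXT_YEAR).items():
    print(f"age {age}: next season {d_next:+.1f}, career {d_career:+.1f}")
```

For the 21-, 24-, 25-, 26-, and 27-year-old groups, this reproduces the figures quoted above. Because the rows are typed in by hand, rerunning the averages is also an easy way to cross-check the remaining groups.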

At age 20, there is only Mike Trout.

At age 19, there is only Bryce Harper.

That's 20 years of RoY hitters in both leagues, and the only guy even close to being as young as Trout and Harper is Pujols (who some have suggested might not even have been 21 at the time!). These guys break the model, and well they should. That's all I'm saying.

Further to that, however, I was also struck by your comment that "it's not like the comps are what's driving the idea that Harper will regress." This suggests that PECOTA is no longer primarily a comparables-based model (and in fairness, perhaps it never was, and I just never truly understood it). But if it were... *if it were*... I think we'd get a much different projection for these two wunderkinds. For instance -- and with the caveat that, as I said before, this may be too small a sample to be anything more than anecdotal -- if you look at Harper's top five comparable player-seasons by similarity score on Baseball Reference, every one of those five (four of whom are Hall-of-Famers) performed better in their age 20 seasons, some by large amounts. In order:

George Davis went from .264/.336/.375 in his 19-year-old season (over 583 PA) to .289/.354/.409 (over 627 PA) the next year.

Mel Ott went from a stellar .322/.397/.524 in his 19-year-old season (over 500 PA) to an even more stellar .328/.449/.635 (over 675 PA) the next year.

Al Kaline went from .276/.305/.347 as a 19-year-old (over 535 PA) to .340/.421/.546 (over 681 PA) in his MVP-runner-up 20-year-old season.

Ty Cobb (yes, Ty freaking Cobb!) went from .316/.355/.394 (over 394 PA) at age 19 to .350/.380/.468 (over 642 PA) at age 20.

Buddy Lewis (yes, Buddy freaking Lewis! okay, that didn't work as well) showed the most modest improvement, but still went from .291/.347/.399 (over 657 PA) at age 19 to .314/.367/.425 (over 733 PA) at age 20.

So it was more than a bit surprising to see PECOTA project Harper to decline by double digits in all three triple-slash stats this year, including a whopping 35 point decline in SLG. Even just looking at Harper's improvement over the course of last season, that just sounds wildly wrong.

Lastly, I want to underscore again that I'm not at all saying that PECOTA sucks. Much like democracy, it's the worst method of predicting future performance except for all others which have been tried. And more importantly, I appreciate that there is even a forum like BP in which somebody might actually read all of the above with interest. Even if you disagree with me, this is a hell of a lot of fun to discuss.

holgado

2/21

Also, it should be patently obvious by now that I own Harper in my fantasy league.

cwyers

2/21

Let's change it up a little bit, then. We'll use the same methodology, but confine ourselves to position players under the age of 21 who had a TAv of .290 or above. There were 58 of those, although if you put in a playing-time cutoff of 100 PA, that drops to 19. We won't drop any players, though; instead we can use a weighted average (which is what I did for the example I gave Ben). For weight, we'll use the harmonic mean of PA in years one and two, or:

2/(1/PA1+1/PA2)
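This weighting keeps every player but lets short seasons count for little. The harmonic mean is dominated by the smaller of the two PA totals, so a player with a brief cup of coffee in either year barely moves the average, even if his other season was full-time. Below is a minimal sketch of that weighted average. The three rows are hypothetical and are not the 58 players Colin used.

```python
def harmonic_mean_weight(pa1, pa2):
    """2 / (1/PA1 + 1/PA2): small whenever either season is short."""
    return 2.0 / (1.0 / pa1 + 1.0 / pa2)


def weighted_mean_change(rows):
    """rows: (tav_year1, pa_year1, tav_year2, pa_year2) for each player."""
    total_weight = 0.0
    weighted_sum = 0.0
    for tav1, pa1, tav2, pa2 in rows:
        w = harmonic_mean_weight(pa1, pa2)
        weighted_sum += w * (tav2 - tav1)
        total_weight += w
    return weighted_sum / total_weight


# Hypothetical rows for illustration only (not real players or Colin's sample).
sample = [
    (0.300, 600, 0.285, 650),
    (0.320, 40, 0.240, 500),
    (0.295, 550, 0.290, 600),
]
print(f"weighted change: {weighted_mean_change(sample) * 1000:+.1f} points of TAv")
```

The 40-PA row gets a weight of about 74, versus 624 for the first row. Despite its 80-point drop, it carries only about 6 percent of the total weight. An arithmetic mean of PA would have given it 270.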

And I can report, those players did not lose ten points of TAv on average.

They lost 23 points instead.
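The harmonic-mean weighting cwyers describes can be sketched in a few lines of Python. This is a minimal illustration, not BP's actual code or the 58-player dataset: it assumes each player is recorded as a hypothetical tuple of (PA1, TAv1, PA2, TAv2), and the function names and sample numbers are made up for the example.

```python
from typing import Iterable, Tuple

def harmonic_weight(pa1: int, pa2: int) -> float:
    """Harmonic mean of PA in years one and two: 2 / (1/PA1 + 1/PA2)."""
    if pa1 <= 0 or pa2 <= 0:
        return 0.0  # a season with no PA contributes no weight
    return 2.0 / (1.0 / pa1 + 1.0 / pa2)

def weighted_tav_change(pairs: Iterable[Tuple[int, float, int, float]]) -> float:
    """Weighted mean of (TAv2 - TAv1), weighting each player by harmonic_weight.

    Each tuple is (PA1, TAv1, PA2, TAv2); this input format is an assumption.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for pa1, tav1, pa2, tav2 in pairs:
        w = harmonic_weight(pa1, pa2)
        weighted_sum += w * (tav2 - tav1)
        total_weight += w
    if total_weight == 0:
        raise ValueError("no playing time to weight by")
    return weighted_sum / total_weight

# Hypothetical example: a 40-PA first season counts for far less
# than a pair of full seasons, even though its TAv drop is larger.
sample = [
    (600, 0.300, 650, 0.280),  # weight 624
    (40, 0.350, 500, 0.250),   # weight about 74
]
print(f"{weighted_tav_change(sample) * 1000:+.1f} points of TAv")
```

Because the harmonic mean is pulled toward the smaller of the two PA totals, a player only gets substantial weight if he had real playing time in both seasons; that is what lets every player stay in the sample without a hard PA cutoff.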

cwyers

2/21

By the way, the top five players (by the harmonic mean of PA):

Vada Pinson
Al Kaline
Frank Robinson
Alex Rodriguez
Ken Griffey

So it's not like we're looking at a bunch of scrubs there, that's a group that's at worst Hall of Very Good to Hall of Fame-worthy.

holgado

2/22

So you're using Vada Pinson's 20 and 21-year-old seasons, rather than his 19 and 20-year-old ones, for what reason exactly? Granted, he only had 110 PA at 19. But Pinson's improvement from age 19 to age 20 was enormous.

Ken Griffey Jr. also improved dramatically from age 19 to age 20. You apparently ruled that out with your TAv cutoff, but he improved from a .281 TAv to .302. That said, from age 20 to 21 he (unlike the rest of the top 5 you listed) went way up again, to .331.

Alex Rodriguez, same deal. He went from a .230 TAv at age 19 (in 149 PA) to a .335 TAv at age 20 (in 677 PA). Your chosen cutoff ignores this one as well. And again, because you're choosing to look at age 20 to age 21 seasons (for a reason that still eludes me), you instead use A-Rod as a data point in favor of the argument that Harper will regress, since at age 21 A-Rod dipped to a .284 TAv.

Same deal with Kaline. I already talked about his massive improvement from age 19 to age 20. Your exercise chooses to ignore that because his 19-year-old season wasn't good enough, and instead looks to the difference between Kaline's age 20 and age 21 seasons.

Frank Robinson did not play in the majors at age 19. Again, this highlights my point. Maybe that one year doesn't seem like much, but I think even non-set head scouts will tell you that at such a young age, it makes a world of difference, including from a straight physical maturity standpoint (I say this as a guy who was 5'11" when I graduated high school, and 6'3" after my junior year of college).

Here's my point distilled to its essence. Comparing exceptional (>.290 TAv) 20-year-old seasons to Harper's 19-year-old one merely because we don't have enough (or indeed, any?) exceptional 19-year-old seasons to compare it to is tautological. Because the fact that we don't have enough (any?) such 19-year-old seasons to use is itself a very valuable piece of information. The kind which suggests that we're dealing with a historically unprecedented player here.

holgado

2/22

"non-set head" = "non-stathead"

holgado

2/22

Also, isn't it analytically unsound to use the .290 TAv cutoff in the first place? Let's say there are 500 age-19 seasons out there of 100 PA or more, but 490 of them are ones where the player had <.290 TAv. Wouldn't it still be more meaningful to look at all of those seasons, to see whether there is a general trend (as I'm betting there is) of overwhelming improvement in players from their age 19 to age 20 seasons? The only basis for excluding them is the assumption that those lower than average TAv seasons would be expected to go up the next year simply due to regression to the mean. But this is tantamount to assuming your conclusion (that Harper's higher than average TAv will regress downward). In other words, rather than look at all available data to see whether there is something to the notion that improvement at the MLB level from age 19 to age 20 can be expected for reasons other than regression, your cutoff seems to disregard most of it.
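The regression-to-the-mean mechanism at issue here can be illustrated with a toy simulation. This sketch assumes, purely for illustration, that each player has a fixed true talent that does not change between seasons and that each season's observed rate is that talent plus independent noise; the distributions, cutoff, and sample size are invented, not estimated from real data. It deliberately leaves out any age-related growth (the very effect holgado is arguing for), so it shows only the change that selecting on year one produces by itself.

```python
import random
import statistics

def simulate(n_players: int = 20000, cutoff: float = 1.5, seed: int = 0):
    """Return (mean year-two change above cutoff, mean change below cutoff).

    Talent and noise are standard normal in arbitrary units (not TAv points).
    Talent is identical in both seasons, so any average change is purely
    regression to the mean caused by splitting players on season one.
    """
    rng = random.Random(seed)
    above, below = [], []
    for _ in range(n_players):
        talent = rng.gauss(0.0, 1.0)
        season1 = talent + rng.gauss(0.0, 1.0)
        season2 = talent + rng.gauss(0.0, 1.0)
        (above if season1 > cutoff else below).append(season2 - season1)
    return statistics.mean(above), statistics.mean(below)

change_above, change_below = simulate()
print(f"selected on a high year one: {change_above:+.2f}")
print(f"everyone else:               {change_below:+.2f}")
```

Under these assumptions the high group drops and the rest rise with no change in anyone's talent. Whether real 19-year-olds also carry a genuine growth effect on top of this is exactly the empirical question the thread is debating, and a simulation like this cannot answer it.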

holgado

2/22

Ooh, one more thing. Harmonic means are great and all, but I guess I'd be curious to see how many of those 39 seasons of <100 PA you mentioned involved September cups of coffee, when it's well known (or at least generally assumed) that the quality of competition decreases, both because these same young guys are given a chance to play, and because playoff teams often give their regulars extra rest. Also, players with that few PA in their first season necessarily will not have made it a "second time through the league," that inevitable adjustment period that will instead be saved for the following season (after which the player either successfully adjusts, or flames out because he fails to do so). Harper, on the other hand, has 597 PA under his belt now. And even just looking at his month to month stats, you can see that he clearly made it to that second time (corresponding with his summer swoon) and indeed, to that possibly very telling third time (corresponding with his monster fall).

holgado

2/25

I know you guys have lives to lead. But I really would like to know whether (and why) PECOTA is using age 20 seasons as comparisons to age 19 ones. And if it isn't doing that, then why would it predict regression for Harper when (by my best count) only ten 19-year-olds in MLB history have had 1.0 or more oWAR in a season (Harper had 3.4), and 8 of those 10 improved significantly the next year? (h/t to Joe Sheehan for leading me to that tidbit, as he covered Harper in a recent edition of his great newsletter.)

Did PECOTA really ding him for his minor league performance in 147 PA at AA at age 18, when playing at that level at that age was itself practically unprecedented? And even after such a precocious (nearly) full season of AB in the show in 2012 that was completely in line with the big numbers he had put up at high-A during the first half of 2011 (.318/.423/.554, good for a .324 TAv)?

In fact, Harper's 3.4 oWAR as a 19-year-old has only one precedent in major league history, with Mel Ott equaling that mark (and he then put up a 6.7 oWAR the following year). The only other 19-year-old seasons even in the ballpark of that oWAR belonged to Tony Conigliaro (2.8), Griffey Jr. (2.6), Cobb (2.3), and Cesar Cedeno (2.3). Griffey and Cobb took huge leaps the following year (to 4.9 and 6.2 oWAR, respectively), and Conigliaro improved his oWAR as well (to 3.4), though his TAv remained steady at .299. Cedeno is the exception, as he dipped to 1.8 oWAR the following year. As for the remaining five on that list of ten, the only other regression was by Edgar Renteria (from 2.1 to 1.1), with the other four all drastically improving their oWAR in the next season, led by Mickey Mantle (who went from 1.3 to 6.2), then Sherry Magee (1.5 to 4.1), George Davis (1.7 to 2.9), and Buddy Lewis (1.3 to 3.0).
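The tally in that paragraph is easy to double-check. The oWAR pairs below are copied from the comment (age 19, then age 20); the snippet only counts increases and declines and makes no independent claim about the oWAR figures themselves.

```python
# (age-19 oWAR, age-20 oWAR), as listed in the comment
owar = {
    "Mel Ott": (3.4, 6.7),
    "Tony Conigliaro": (2.8, 3.4),
    "Ken Griffey Jr.": (2.6, 4.9),
    "Ty Cobb": (2.3, 6.2),
    "Cesar Cedeno": (2.3, 1.8),
    "Edgar Renteria": (2.1, 1.1),
    "Mickey Mantle": (1.3, 6.2),
    "Sherry Magee": (1.5, 4.1),
    "George Davis": (1.7, 2.9),
    "Buddy Lewis": (1.3, 3.0),
}

improved = sorted(name for name, (y19, y20) in owar.items() if y20 > y19)
declined = sorted(name for name, (y19, y20) in owar.items() if y20 < y19)

print(f"{len(improved)} of {len(owar)} improved")  # 8 of 10 improved
print("declined:", ", ".join(declined))  # declined: Cesar Cedeno, Edgar Renteria
```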

But since oWAR is effectively a counting stat, and we were talking about TAv, here's that stat for the aforementioned player-seasons, or at least the five for whom BP has computed it (unfortunately this leaves out Cobb, Ott, Magee, Davis, and Lewis, all of whom improved):

Conigliaro - .299 to .299
Mantle - .297 to .347
Cedeno - .293 to .275
Griffey - .281 to .302
Renteria - .257 to .237

I don't know what the weighted average of this is. But it isn't a decline. (Assuming equal weight, it's an average increase of about 6.6 points.) And that's even without counting the five oWAR increasers for whom we don't have TAvs.
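The equal-weight average can be reproduced directly from the five TAv pairs listed above, written here in points (thousandths) to avoid rounding noise. A PA-weighted version along the lines of cwyers's harmonic mean would need the plate-appearance totals for each season, which aren't given in the comment, so this sketch sticks to equal weights.

```python
from statistics import mean

# TAv at age 19 and age 20, in points, as listed in the comment
tav = {
    "Conigliaro": (299, 299),
    "Mantle": (297, 347),
    "Cedeno": (293, 275),
    "Griffey": (281, 302),
    "Renteria": (257, 237),
}

changes = {name: y20 - y19 for name, (y19, y20) in tav.items()}
avg_change = mean(changes.values())

print(changes)
print(f"equal-weight average change: {avg_change:+.1f} points")  # +6.6 points
```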

Again, I hesitate to rest my case on ten measly player-seasons, because it's arguably not enough data to give us much confidence in predicting anything about what Harper will do. At most perhaps the utter lack of truly comparable seasons -- really, Mel Ott and that's it -- should signal that we have something special here, something that can't be quantified by resorting to historical comps. Nonetheless, Sheehan is predicting Harper for MVP. And I think that outcome is less improbable than his TAv declining at all, let alone by 17 points.

HAL9100

3/13

Did you just actually posit that it is more likely that Bryce Harper wins the NL MVP than that he experiences any offensive decline at all?

Like, really? You think that?

holgado

4/03

Yes. Yes I do.

paulproia

2/21

Overall very interesting. About the NL East being an easier division and league, though:

1) Granted, the Marlins stunk because they put their worst defensive infielders (Ramirez and Reyes) in the infield and their hitting left a lot to be desired... Reyes should have played third, Bonifacio should have been at short, and Ramirez in center or left.

2) The division featured the Nationals and Braves, and a good (if underperforming) Phillies team. Even the Mets were okay for a little while. I'd take the Mets over the Orioles.

Last I checked, the NL won the World Series. Again. And again. I'd argue that the AL East has gotten worse and will be worse still in 2013.

bornyank1

2/21

If you're trying to assess league strength, it makes more sense to look at interleague records than World Series results, no? And those still suggest that the AL is stronger.

tomterp

4/02

Statistical arguments aside (I know, non-stats based arguments seem ill-fitting here), is there a player in all of baseball more driven to excel than Harper? If effort, preparation, intensity, drive, etc. count for anything, he may well exceed even optimistic predictions for his 2013 season.

Ben Lindbergh

Colin Wyers
