
One of baseball’s enduring charms is its ability to defy prediction. Each time we think we’re absolutely sure of something (say, that the 2008 Tigers will score a bajillion runs, or that Juan Pierre will be a disaster filling in for Manny Ramirez), our forecasts are confounded by baseball’s eternally fickle nature. Sophisticated projection tools such as Nate Silver’s PECOTA are designed to take some of the guesswork out of predicting how teams and players will perform in a given season, and on the whole they often produce surprisingly accurate forecasts. But even PECOTA is prone to big misses, especially in individual player projections, and those misses help preserve the game’s air of mystery.

Projection models generally use each new season of data to better aim next year’s forecasts. A good model should improve its accuracy over time, and while a swing and a miss on a given player in a given season is inevitable, the addition of new data should make it less likely to be repeated. So how quick a study is PECOTA? After missing once, can it use this new information to get back in the box and make solid contact, or do certain players continue to perplex the system, year after year?

To find this out, I looked at PECOTA Equivalent Average (EqA) projections for all hitters with 300+ plate appearances during the 2006-2008 seasons, and compared them to the actual offensive production of those hitters. If the projection was at least 10 points (.010) lower than the actual EqA, I rated that projection an “underestimation”; a projection at least 10 points higher is considered an “overestimation.” The results are shown in the following tables, starting with the players of whom PECOTA expected too little:

```
Players with 300+ PA During Season: PECOTA Underestimations

Year  Sample Description        Sample Size  Proj. 10+ Pts. Low    %
2006  All Players                       260                 124  48%
2007  Underestimated in 2006            108                  22  20%
2008  Underestimated in 2006-07          17                   7  41%
```

During the 2006 season, 260 players had at least 300 plate appearances. Of those, 124 saw their actual EqA surpass their projection by at least 10 points, a surprisingly high 48 percent. Presumably, PECOTA should be able to absorb this new information and adjust its 2007 projections accordingly, and the numbers seem to bear this out. Of the 108 “overachieving” players from 2006 who again met the 300 PA threshold in 2007, only 22 (20 percent) again exceeded PECOTA’s projection by 10 or more points. Of the 17 players whom PECOTA had twice underestimated and who met the 300 PA threshold in 2008, seven were underestimated yet again. That pushes the percentage up to 41 percent, but with such a small sample it’s probably just noise. There were 154 players who met the PA threshold in all three seasons; of those, only seven (4.5 percent) were underestimated by PECOTA in all three.
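The percentages in the table are simple ratios of the counts above. The sketch below also attaches a rough normal-approximation standard error to each ratio; that error estimate is an added illustration, not something PECOTA or the original analysis reports, and the helper names (`miss_rate`, `std_error`) are hypothetical.

```python
import math

# (label, players missed low by 10+ points, sample size), from the table and text above
stages = [
    ("2006: all players", 124, 260),
    ("2007: underestimated in 2006", 22, 108),
    ("2008: underestimated in 2006-07", 7, 17),
    ("Underestimated in all three seasons", 7, 154),
]


def miss_rate(hits: int, sample: int) -> float:
    """Fraction of the sample whose projection missed low by 10+ points."""
    return hits / sample


def std_error(hits: int, sample: int) -> float:
    """Normal-approximation standard error of a proportion: sqrt(p(1-p)/n)."""
    p = miss_rate(hits, sample)
    return math.sqrt(p * (1 - p) / sample)


for label, hits, sample in stages:
    rate = miss_rate(hits, sample)
    se = std_error(hits, sample)
    print(f"{label:37s} {hits:3d}/{sample:<3d}  {rate:6.1%}  (+/- {se:.1%})")
```

With only 17 players, the 2008 rate carries a standard error of roughly 12 percentage points, which fits the judgment that the jump to 41 percent is probably noise.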

Now let’s look at PECOTA’s experience with irrational exuberance:

```
Players with 300+ PA During Season: PECOTA Overestimations

Year  Sample Description        Sample Size  Proj. 10+ Pts. High    %
2006  All Players                       260                   65  25%
2007  Overestimated in 2006              39                   20  51%
2008  Overestimated in 2006-07           12                    4  33%
```

Here we see that PECOTA, as a stern evaluator, was about half as likely in 2006 to overestimate a player (25 percent) as underestimate a player (48 percent). Not surprisingly, the players PECOTA overestimated (and who thus had disappointing seasons) were less likely to meet the 300 PA threshold the following season, so this sample shrinks faster: only 39 of the 65 overestimated players returned in 2007, compared with 108 of the 124 underestimated players. But interestingly, PECOTA didn’t seem to learn as much about the underachievers as it did about the overachievers. While PECOTA repeated only one in five of its 2006 underestimations in 2007, more than half of the players it overestimated in 2006 (among those who met the PA threshold in 2007) were overestimated again in 2007.
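The faster-shrinking sample and the lopsided repeat rates can be checked directly from the counts in the two tables. This is a minimal sketch; the `MissGroup` class and its method names are hypothetical helpers for illustration only.

```python
from dataclasses import dataclass


@dataclass
class MissGroup:
    """Hypothetical container for one 2006 -> 2007 follow-up group."""
    label: str
    missed_2006: int        # players missed by 10+ points in 2006 (300+ PA)
    returned_2007: int      # of those, players with 300+ PA again in 2007
    missed_again_2007: int  # of the returners, missed in the same direction again

    def retention(self) -> float:
        """Share of 2006 misses who reached 300 PA again in 2007."""
        return self.returned_2007 / self.missed_2006

    def repeat_rate(self) -> float:
        """Share of returners whose projection missed the same way again."""
        return self.missed_again_2007 / self.returned_2007


# Counts from the underestimation and overestimation tables
groups = [
    MissGroup("Underestimated", missed_2006=124, returned_2007=108, missed_again_2007=22),
    MissGroup("Overestimated", missed_2006=65, returned_2007=39, missed_again_2007=20),
]

for g in groups:
    print(f"{g.label:15s} returned: {g.retention():.0%}  missed again: {g.repeat_rate():.0%}")
```

Run as written, this prints 87 percent retention and a 20 percent repeat rate for the underestimated group, versus 60 percent and 51 percent for the overestimated group.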

The list of players who were twice overestimated is peppered with names like Jim Edmonds, Richie Sexson, Trot Nixon, Craig Biggio, and the Giles brothers: players who had been highly productive but whose numbers suddenly cratered (often due to age or injury). For PECOTA, as for managers and fans, it took a while to see that these players truly had become shadows of their former selves. By the third season, most of them either were no longer full-time major leaguers or had received a more realistic projection once PECOTA finally stopped squinting. Only four players (2.6 percent of the 154 who met the PA threshold in all three seasons) were overestimated a third time.

Who were these masked men, the players who managed to turn PECOTA into Pollyanna, continually predicting performance far beyond that which they produced?

```
                       2006 EqA        2007 EqA        2008 EqA
Player              Actual  PECOTA  Actual  PECOTA  Actual  PECOTA
Bobby Crosby          .231    .276    .225    .265    .234    .255
Juan Uribe            .234    .253    .231    .263    .236    .250
Austin Kearns         .282    .292    .271    .290    .223    .280
Jason Varitek         .248    .284    .272    .282    .237    .274
```
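To see how large each of these misses was, the table can be turned into per-season margins. This sketch assumes the values are copied exactly from the table; `to_points` and `miss_points` are hypothetical helper names. Working in whole EqA points (thousandths) keeps comparisons exact when a miss lands right on 10 points, as with Kearns in 2006 and Varitek in 2007.

```python
# (actual, projected) EqA for 2006, 2007, 2008, copied from the table above
overestimated = {
    "Bobby Crosby": [(.231, .276), (.225, .265), (.234, .255)],
    "Juan Uribe": [(.234, .253), (.231, .263), (.236, .250)],
    "Austin Kearns": [(.282, .292), (.271, .290), (.223, .280)],
    "Jason Varitek": [(.248, .284), (.272, .282), (.237, .274)],
}

THRESHOLD = 10  # EqA points, i.e. .010


def to_points(eqa: float) -> int:
    """Convert an EqA like .276 to whole points (276) so comparisons are exact."""
    return round(eqa * 1000)


def miss_points(actual: float, projected: float) -> int:
    """Projected minus actual, in EqA points (positive = PECOTA too high)."""
    return to_points(projected) - to_points(actual)


for player, seasons in overestimated.items():
    misses = [miss_points(actual, projected) for actual, projected in seasons]
    every_year = all(m >= THRESHOLD for m in misses)
    print(f"{player:14s} misses {misses}  10+ every season: {every_year}")
```

Kearns’s 57-point miss in 2008 is the largest single miss in this group.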

If there’s a pattern to discern here, it’s early promise followed very quickly by injury and/or disappointment, or what we might call the Ben Grieve career path. Crosby has long been either injured or lackluster, with his career shape looking ever more like a Pet Rock: instant, inexplicable, short-lived success that quickly becomes a metaphor for fleeting value. Kearns has never exactly been bad (until recently); neither has he become the consistent, multi-talented outfielder most thought he would grow into. PECOTA seems to have focused on what Ooh Ooh Uribe could do (hit 20-plus home runs in his mid-20s) while ignoring what he couldn’t do (get his OBP much above the mid-.200s). Only Varitek stands out in this crowd, and it looks as if PECOTA felt his leadership and moxie would exempt him from the standard catcher aging curve.

Can any member of this rogue’s gallery make PECOTA whiff yet again this year? Varitek’s bounce-back season (.278 EqA) and Uribe’s surprising competence (.263) in San Francisco have them both far exceeding PECOTA’s sudden and deserved pessimism. On the other hand, Crosby (.235 actual/.243 projected) and Kearns (.237 actual/.275 projected) continue to be poster children for unrealized potential and may well achieve the four-peat.

The list of three-time overachievers is a little more complex:

```
                       2006 EqA        2007 EqA        2008 EqA
Player              Actual  PECOTA  Actual  PECOTA  Actual  PECOTA
Chipper Jones         .331    .303    .339    .308    .360    .321
Hanley Ramirez        .288    .241    .318    .277    .320    .298
Matt Holliday         .304    .271    .317    .296    .316    .295
Dan Uggla             .278    .233    .275    .262    .296    .273
Ichiro Suzuki         .288    .266    .302    .277    .283    .271
Aaron Miles           .234    .222    .240    .206    .265    .221
Mark Grudzielanek     .257    .247    .268    .255    .265    .245
```
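A similar calculation for the overachievers shows both how far each player beat his projection every season and how much PECOTA moved that projection between 2006 and 2008. This is a sketch using the table above; `to_points` and `summarize` are hypothetical helpers, and the built-in generic type hints assume Python 3.9 or later.

```python
# (actual, projected) EqA for 2006, 2007, 2008, copied from the table above
underestimated = {
    "Chipper Jones": [(.331, .303), (.339, .308), (.360, .321)],
    "Hanley Ramirez": [(.288, .241), (.318, .277), (.320, .298)],
    "Matt Holliday": [(.304, .271), (.317, .296), (.316, .295)],
    "Dan Uggla": [(.278, .233), (.275, .262), (.296, .273)],
    "Ichiro Suzuki": [(.288, .266), (.302, .277), (.283, .271)],
    "Aaron Miles": [(.234, .222), (.240, .206), (.265, .221)],
    "Mark Grudzielanek": [(.257, .247), (.268, .255), (.265, .245)],
}


def to_points(eqa: float) -> int:
    """Convert an EqA like .303 to whole points (303)."""
    return round(eqa * 1000)


def summarize(seasons: list[tuple[float, float]]) -> tuple[list[int], int]:
    """Return (per-season gaps, change in projection from first to last season).

    A gap is actual minus projected, in points (positive = PECOTA too low).
    """
    gaps = [to_points(actual) - to_points(projected) for actual, projected in seasons]
    first_proj, last_proj = seasons[0][1], seasons[-1][1]
    return gaps, to_points(last_proj) - to_points(first_proj)


for player, seasons in underestimated.items():
    gaps, moved = summarize(seasons)
    print(f"{player:18s} gaps {gaps}  projection moved {moved:+d} pts")
```

Run as written, Hanley Ramirez’s projection moves up 57 points from 2006 to 2008, while Aaron Miles’s and Mark Grudzielanek’s drift down by 1 and 2 points despite three straight seasons of beating them.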

A PECOTA Similarity Score below 20 indicates a player who is particularly unique and difficult to compare to other players; Chipper Jones (Sim Score: 4) and Ichiro Suzuki (Sim Score: 17) fall into this category. Look at Chipper’s Equivalent Averages: it’s not that PECOTA expected the oft-injured star to become unproductive, it’s just that he’s been virtually superhuman (when healthy) during his late-career drive toward Cooperstown. Ichiro is a unique story, so it’s not surprising that PECOTA has never been sure what to make of him. Uggla and Ramirez both achieved such immediate success that it’s taken time for PECOTA to believe what our eyes have already seen, especially for Hanley, whose minor league numbers were no match for his scouting reports. I’m not sure exactly what ancient grudge PECOTA has held against Matt Holliday, but it looks like 2009 might see them meeting halfway (.296 actual/.305 projected EqA). Miles has only managed to be not as awful as you might think. Grudz, meanwhile, has been a useful player far later into his 30s than most would have thought possible; given his recent history of exceeding expectations, the Twins may have bought themselves a useful insurance policy.

For 2009, PECOTA has finally come around on this group; in fact, all but two of these players are currently performing well below their forecasts. Hanley’s .324 projection is pretty much spot-on. The only player in real danger of again being underestimated by 10 or more points is the inscrutable Ichiro (.308 actual/.258 projected EqA), a latter-day Rodney Dangerfield who just can’t get any respect from PECOTA.
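The article’s 10-point rule can also be applied to the partial-season 2009 figures quoted in this article, as a quick read on who is in danger of another miss. This is a minimal sketch: the numbers are snapshots from a season still in progress, and `status` is a hypothetical helper, not part of PECOTA.

```python
THRESHOLD = 10  # EqA points, the article's cutoff for a miss

# (actual to date, projected) 2009 EqA as quoted in this article; the season was in progress
in_season_2009 = {
    "Ichiro Suzuki": (.308, .258),
    "Matt Holliday": (.296, .305),
    "Bobby Crosby": (.235, .243),
    "Austin Kearns": (.237, .275),
}


def status(actual: float, projected: float, threshold: int = THRESHOLD) -> str:
    """Label a projection with the 10-point rule; a positive gap means PECOTA was too low."""
    gap = round(actual * 1000) - round(projected * 1000)
    if gap >= threshold:
        return f"under by {gap}"
    if gap <= -threshold:
        return f"over by {-gap}"
    return f"within {threshold} ({gap:+d})"


for player, (actual, projected) in in_season_2009.items():
    print(f"{player:14s} {status(actual, projected)}")
```

On these snapshots, Ichiro is 50 points under and Kearns 38 points over, while Holliday (-9) and Crosby (-8) sit just inside the 10-point band.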

With the chance of missing three times in a row comfortably in the low single digits, it looks as if PECOTA rarely goes into extended slumps when projecting any given player. There may be a few specific types of players (early-career busts, players who maintain high productivity late into their 30s, holders of the single-season hit record) who tend to be venerated or demonized longer than they should be. But overall, a given PECOTA projection is at least as accurate as your local weather forecast: good enough to know whether you’ll need a coat, but with enough short-term variation to occasionally leave you out in the cold.

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.

sunpar
7/23
Don't worry PECOTA, most Braves had written off Chipper's career back in 2004/2005.
jwdinnin
7/23
I think the description of the type of player who has consistently underperformed his PECOTA expectations needs work. Your explanation ("early promise followed very quickly by injury and/or disappointment") is just a description of the data. PECOTA's projections are based on prior results. Thus, the only people who can make the list are people who show promise and then falter - but that doesn't define their traits as a player.
Put another way, what we should really be looking for in analyzing PECOTA is whether we can predict what type of player is likely to fall short of his expectations (due to problems with the model) and learn how to adjust the model from there. "Early promise followed very quickly by injury and/or disappointment" doesn't allow us to do that.
arcee555
7/23
I have never been able to buy into PECOTA as a meaningful predictor of performance. This is where the "scouts" POV has been: any somewhat baseball-savvy person can guesstimate what a player will do in the upcoming season, and they miss no more often than PECOTA does, without all the calculations.
briankopec
7/23
Someone posts a legitimate criticism of PECOTA and it gets dinged with a bunch of minuses?
7/23
Personally, I question your use of the word "legitimate".
kjgilber
7/23
Yeah, that's a little harsh. I don't agree with the point of view, but it seems silly to hide the post.
llewdor
7/23
I think he got voted down because he's wrong. Look at what VegasWatch has done comparing PECOTA's accuracy to that of other projection systems and individual baseball analysts - PECOTA routinely performs better than the individuals. There may no longer be a meaningful difference in accuracy between PECOTA and other statistical projection models, but the stats beat the guesstimates time and time again.
briankopec
7/24
Thank you for posting a rebuttal instead of merely dinging the original comment.

Those of you who dinged the comment and did not explain why, phooey on you.
BurrRutledge
7/24
People are going to get dinged for posting blatantly misguided and misinformed posts. Even if they're trolling us.
BeplerP
7/23
So, you want your money back?
Oltanya
7/23
The "spirited away" image leads to a 403 error.
kenfunck
7/24
Chris B. (Crispy?) Young didn't get enough PAs in 2006 to make it into the sample. But he's definitely an Over:

Year Actual/PECOTA

2006 .236/.275
2007 .252/.291
2008 .254/.283
2009 .235/.277
Werty83
7/23
I was expecting to see a Javy Vazquez reference somewhere, unless there's going to be a separate article for pitchers. I understand why he usually falls short of his projection (problems with runners on), but it would be interesting to see if that trait can be worked into PECOTA to improve pitching projections.
kenfunck
7/23
I just ran numbers for hitters at first -- EQA is a very round number to compare -- but plan to look at pitchers as well soon.
JasonC23
7/23
I enjoyed seeing the list of PECOTA's black sheeps/unrequited crushes, but I think the real strength of this article, as indicated in the comments section here, is in the directions it can lead future research. So, interesting now, and potentially interesting later...nice job, Ken!
ItShouldBeEasy
7/23
One of the nice things about PECOTA is that it provides a distribution of possible outcomes for each player every year. Setting a strict limit of 10 points in EqA doesn't take into account that PECOTA sometimes predicts higher variance for some players versus others. I think a better way to do this would be to identify "swings and misses" as players who ended the season with an EqA below PECOTA's 10th percentile prediction or above PECOTA's 90th percentile prediction. Doing this would provide some context for the numbers in the first table: if 50% of players are finishing the season above the 90th percentile or below the 10th percentile of PECOTA's predictions, then (I would think) something is going awry in the predictions.

Also, I think it's important to throw out all players with a similarity score below a given threshold. You basically state that we shouldn't be surprised if PECOTA can't predict what Ichiro is going to do, so including these types of players in the analysis is only going to add noise to the analysis.
kenfunck
7/23
All very valid points, and good ideas for more in-depth analysis rather than the "quick and dirty" methodology used for the article. Although I don't think I'd necessarily want to toss out players with a low sim score -- in a deeper study I'd be interested in seeing whether it turns out to be true that high sim scores correlate strongly with more accurate projections. Many of the players listed in the article have average or higher sim scores, yet seem to have been tough for PECOTA to accurately assess over time.
beeker99
7/23
Ken, perhaps without knowing it, your opening line is a paraphrase of one of Yankee radio voice, and notorious-bad-nickname-master, John Sterling's go-to lines. "You just can't predict baseball." He drives many of us nuts with it.

But I like your paraphrase of it, and I really like this article. I can't wait to see the pitchers' article, because if I remember correctly, PECOTA usually does a very good job with hitters. Pitchers, on the other hand . . .
Scherer
7/24
I believe the concept was best stated by Joaquin Andujar: "Baseball can be summed up in one word: yaneverknow"
Wilson
7/23
Thinking about this year, a few names jump to mind: Chris B. Young, Lastings Milledge, Elijah Dukes, Kelly Johnson...I suspected going into the year that for these players the PECOTA forecast was wildly optimistic. I don't know what the common theme there is though, and I wish you would explore this topic more deeply.
anderson721
7/23
Common theme? I drafted them all...
dbrown
7/23
I'm a little unsure of your methodology as this is written. When you say, "I looked at PECOTA Equivalent Average (EqA) projections for all hitters with 300+ plate appearances," do you mean that you looked at PECOTA projections for all hitters that PECOTA projected to achieve 300+ plate appearances? Or do you mean that you looked at PECOTA projections for all hitters that actually did achieve 300+ plate appearances?

If the latter, there's a sampling problem here, namely that your sample would be more likely to include players that PECOTA underestimated.
kenfunck
7/23
It's the latter -- the sample contains all hitters who managed to get 300+ actual plate appearances, and compares them to their PECOTA projections.

You make a good point -- if a hitter puts up truly awful numbers he'll have a harder time staying in the lineup long enough to get 300 PAs. Given this, it's reasonable to think that if I lowered the threshold to, say, 50 PAs the percentage of total projections in a given year that are "under" by 10 points might go down. But you'd also then be allowing in other SSS outliers which may skew things either up or down -- that's the noise I was trying to keep out of the sample. I'll try to find the time to re-run this with a lower threshold to see the effect, and post the results here.
kenfunck
7/24
Good catch -- you're exactly right. If I lower the threshold for the 2006 sample, the "underestimations" go down and the "overestimations" go up:

Min 100 PAs
Overs: 35%
Unders: 39%

Min 50 PAs:
Overs: 37%
Unders: 38%

So when the sampling bias is removed, the unders and overs become much more similar.

I hope to be able to dig into all this further in a later article -- the comments in this thread have some terrific suggestions.
bsolow
7/23
Another followup you might consider: Are there people who PECOTA just fails to identify properly, regardless of consistent over/underestimation? I'd be surprised if a projection system that continues to be updated keeps over/underestimating someone, since PECOTA uses just 3 years of data (weighting more recent data more heavily, if I'm correct?) and the original data that spurred the first overestimation will start to drop out. It's an interesting question to me whether or not there are people that PECOTA just can't get right, whether it's over- or under-predicting.
DrDave
7/23
I would love to see the scatterplot of PECOTA absolute error vs similarity score...
7/23
Me too.
7/23
Any chance we can see the names of those players that pecota underestimated and overestimated?
woodstein52
7/24
Something or someone cannot be "particularly unique". Uniqueness is a binary state -- something is either unique (without comparison) or it isn't. Once you qualify it, you negate it. What you mean is "particularly unusual", or something like that.

JosephC
7/24
Ditch that Strunk & White and get a Merriam-Webster Dictionary of English Usage instead!

"There is no denying that many good writers and editors strongly approve of /unique/ in its 'unusual' sense, even though it is indisputably well established in general prose. Perhaps you might try being one who knows enough about its bad reputation to avoid it but who also knows enough about its actual history not to sneer at those who use it."

Thanks for the article, Ken, by the way - may there be many more along this path, leading us to a better understanding of how to predict performance.
JosephC
7/24
Ugh, that's *dis*approve, of course. Sorry.
kenfunck
7/24
But it's definitely a construction I'll try to avoid in future.
DrDave
7/24
MWDoEU cannot even bring itself to agree that saying 'infer' when you mean 'imply' is something to be avoided. It may be a pretty good guide to how people actually speak (including persistent and widespread errors), but apparently not to how they should write.
scottlong
7/24
Perfect timing to do a whole piece on how PECOTA has almost always whiffed on Mark Buehrle. I have been pitching someone at BP to do an investigation on the reasons why PECOTA has been off on Buehrle more than any other pitcher during the past decade.
Kongos
7/24
As a Jays fan I'm mildly surprised that Wells and Rios weren't on your list of underachievers.
kenfunck
7/24
Unlike JP, PECOTA was pessimistic -- Wells and Rios each outperformed their projections in 2 of the 3 years.
sbnirish77
7/24
"Here we see that PECOTA, as a stern evaluator, was about half as likely in 2006 to overestimate a player (25 percent) as underestimate a player (48 percent)."

Quite an embarrassing systematic error which requires some explanation.

Any fitted regression model would be expected to produce errors without any bias to the mean - that is - an equal number of players above and below their predicted performance.
sde1015
7/24
Scroll up a few comments and you'll see Ken explain that this results from his 300 PA cutoff.
KevinS
7/24

Dinging a post just because it is critical is shameful.

The only post I would consider dinging is one that attacks another personally or uses offensive language.

KevinS
7/24
Edit: Oops when I posted "this post", I meant to say the one posted above by sbnirish77.
ScottBehson
7/25
Hear hear!
sde1015
7/25
I don't think the BP community has agreed upon when and why to use a negative ranking, and at this point the best we can say is that different posters have different standards. I do note, though, that there is both a link to report inappropriate posts and the +/- button, which suggests to me that they are meant to do different things. Personal attacks and offensive language, in my mind, would warrant the inappropriate link.

I negative-rated the post in question because, as I said in my comment, the question was answered three hours earlier. If you're going to say something is embarrassing, you should check the thread to make sure the issue hasn't been addressed. I'll admit that I may have cut another poster more slack on this, but sbnirish frequently posts snarky criticisms of BP writers that call into question their objectivity and abilities. Maybe it's unfair of me to take that into account, but I read his/her post with that history in mind and it affected how I interpreted it.
collins
7/29
Doubtless it was dinged because his criticism had already been addressed in an earlier comment. Had he bothered to read the comments he wouldn't (or shouldn't) have posted. On the other hand, I almost gave it a +1 since this is the first post I remember seeing by sbnirish in which he isn't bitching about the alleged bias of the BP writers.
collins
7/29
And I just committed the same error by posting before reading Elm's well-stated comment.
ccweinmann
7/24
This would be a lot more interesting if it answered the question, Why? Why is Ichiro unique? What is he doing that is so unusual? Same with Chipper...
jjaffe
7/24
PECOTA seems to take Ichiro's high BABIPs as a fluke, but year after year he's around .350 or even higher.
ccweinmann
7/24
And, as someone who has watched him a lot, I think this results from a couple things that it will be hard for PECOTA to ever pick up: (i) the fact that he is so fast out of the batter's box and gets a ton of infield hits; and (ii) the fact he is able to literally "aim" his batted balls towards holes. Someday speed to first base will be a variable that PECOTA will know, but that's quite a way off, I'd guess.
jcuddy
7/24
Possibly word count restrictions? It's a topic for a follow-up article I would like to read, though.
kenfunck
7/24
More like time restrictions than word count restrictions, actually. This article obviously just skims the surface. But the response here definitely makes me want to dig much deeper into this and come up with more detailed data and hopefully more nuanced answers. In addition to working on something that will come out each week, I now have the luxury to spend more time on a topic that might take weeks of work and know that there's a good chance the results will see the light of day -- during BP Idol, that wasn't necessarily the case.
blcartwright
7/24
Year Actual/PECOTA

2006 .236/.275
2007 .252/.291
2008 .254/.283
2009 .235/.277

Now this makes me wonder why, over four seasons, PECOTA never projected an EQA below .275 when Young never produced one above .254
jjaffe
7/24
I think Young, like Ichiro, is a guy whose BABIPs get too heavily regressed by the system. With his power-speed combo, his below-average line drive rates should be much higher, lifting his BABIP out of the .260-.270 range he's been in for three out of his four years.

I suspect the next big breakthrough in PECOTA, both on the offense and pitching sides, will come via the use of batted ball type information to get a better handle predicting the results on balls in play.
jcuddy
7/24
I recall PECOTA really loving guys that are young for the level of play as well. Not like getting there early isn't a huge accomplishment, but sometimes players are as good as they'll ever get.
jjaffe
7/25
Valuing guys who are succeeding when they're young for their level isn't specific to PECOTA - that's a general scouting principle, and something you'll see Kevin Goldstein or Keith Law, John Sickels and Baseball America comment upon when it's relevant.