
Earlier this year, we released PECOTA projections for every major-league baseball player, and then I asked you to beat those projections. The instructions were simple: Find players you thought PECOTA was too optimistic on, and bet the under; find players you thought PECOTA was too pessimistic on, and bet the over. We called it a game and I promised to learn something from it. Here we are nearing the end of the season, so I’ll fulfill my obligation presently.

First off, though: The Game. At the moment we’ve got a barn-burner. Somebody named BTRLA is winning, with 339.5 points; somebody named Thunder Gun Express has 336 points. (Third place has 234 points.) If just one player flips on Wednesday, the order will be switched and this paragraph will be a relic of a simpler, more-Wednesdayish era. This maybe isn’t that interesting to you, but there’s something you should know about BTRLA and Thunder Gun Express, and the other teams that are closest to them:

  • BTRLA: 118 total players picked
  • Thunder Gun Express: 94 players picked
  • PrEposterous COnservaTive Aggregates: 77 players picked
  • Sir nerdlington: 89 players picked
  • MUGATU: 141 players picked
  • Undefeated On Opening Day: 122 players picked

By count of players picked, those teams rank 8th, 10th, 16th, 11th, 4th and 7th out of 300 total entries. The team that picked the most players—Abe PECOTA’s Not Dead, with 404 picks—is in 13th place on the leaderboard. I designed the scoring system with the intent to discourage outright guessing by putting a larger penalty (-11.5 points) on a wrong pick than a gain for a correct pick (10 points). Yet picking a preposterous number of players turned out to be just the right strategy. Why? Now we get to what I learned:
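A quick aside on that penalty before the lessons. The scoring rule implies a break-even hit rate, and a rough sketch of the arithmetic shows where it sits. This assumes pushes score zero and simply drop out; the function names are illustrative, not part of the contest's actual scoring code.

```python
# Scoring as described above: +10 for a right pick, -11.5 for a wrong one.
# Assumption: pushes score zero, so only decided picks matter here.
POINTS_RIGHT = 10.0
POINTS_WRONG = -11.5


def expected_points(p_right: float) -> float:
    """Expected points for one decided pick that is right with probability p_right."""
    return p_right * POINTS_RIGHT + (1.0 - p_right) * POINTS_WRONG


def break_even_rate() -> float:
    """Hit rate at which expected_points() is exactly zero."""
    return -POINTS_WRONG / (POINTS_RIGHT - POINTS_WRONG)


if __name__ == "__main__":
    print(f"break-even hit rate: {break_even_rate():.1%}")
    print(f"coin-flip picker:    {expected_points(0.5):+.2f} points per pick")
```

Under these assumptions a contestant needs to be right on roughly 53.5 percent of decided picks; below that, every extra pick costs points on average, and above it, every extra pick adds them.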

1. Y’all really good at beating PECOTA.
I’m going to throw out all of those picks by contestants who admitted they hadn’t actually looked at PECOTA before making their selections. (Thanks for your honesty!) That gives us in the neighborhood of 6,400 individual picks, which we can rephrase as individual predictions. In a perfect world, if PECOTA is setting the over/under just right, predictions should be 50 percent right and 50 percent wrong.

Of course, we know it’s not a perfect world. We’re asking you to look at an algorithm that is attempting to fit one worldview onto 2,000 players, 2,000 beautiful and unique snowflakes. No worldview could possibly apply to all of these snowflakes with exactly equal confidence, and the whole premise of this game was that a second worldview—yours!—might be able to identify those players where the system staggers a bit. To use an example, if PECOTA says that Juan Nicasio is going to have a DRA of 4.67, but you and you only know that Ray Searage is going to be his pitching coach, and Ray Searage still has two wishes left from that genie’s lamp he found, you might trust your worldview over PECOTA’s. (Twenty-four out of 25 Beat PECOTA contestants who selected Nicasio said he would be better than PECOTA said. Nicasio’s DRA this year is 3.18.)

But, now, this logic only works if you limit your guesses to those situations where you’re very confident you know something that PECOTA doesn’t. If you just go and pick every player, you’re doing something totally different; now you’re testing your worldview as a universal application over PECOTA’s worldview as a universal application. This is a fun test for you! But it’s a different thing than the synthesis-of-views approach that we might have hypothesized would work best.

Except but for this: Here we are, and BTRLA’s worldview is doing great! He’s got 65 of his picks right and only 30 wrong. (There are 23 that are currently ruled pushes, either because the performance is very close to the projection in either direction, or because the player hasn’t played our minimum number of games this year.) In fact, here are the nine Beat PECOTA contestants who chose at least 100 players, the ones we can conclude were testing themselves against PECOTA rather than looking for isolated instances of algorithmic blind spots:

  • Abe PECOTA’s Not Dead: 157-129
  • PECOTA: 118-96
  • Spencer’s Superteam: 72-46
  • MUGATU: 63-38
  • rok sox: 53-45
  • Sam Miller Riding A Tandem Bicycle: 51-47
  • Undefeated On Opening Day: 58-32
  • BTRLA: 65-30
  • Cleveland Rocks: 42-35

Note: All pushes ignored.

Everybody won!

There are, I think, two things going on here. One is that it’s a lot easier to bet over/under on another guy’s projection than to have to come up with your own from scratch, so the pickers have an advantage. For instance, if PECOTA projects a .900 TAv for Joey Votto and BTRLA would project .950, BTRLA gets credit for .905, .910, .920, .9249, etc., even though in those cases PECOTA was actually more right than BTRLA was. PECOTA is basically playing the contestants’ row game on Price Is Right and never gets to guess last. So that’s one thing. But the other thing is the big one:
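Before getting to it, the asymmetry in that Votto example can be sketched in a few lines. This is only an illustration: it ignores the push band, and the helper names are made up for the purpose.

```python
def over_bettor_wins(line: float, actual: float) -> bool:
    """The over bet pays whenever the actual figure lands above the posted line."""
    return actual > line


def line_was_closer(line: float, rival: float, actual: float) -> bool:
    """True when the posted line missed by less than the rival projection did."""
    return abs(actual - line) < abs(actual - rival)


if __name__ == "__main__":
    line, rival = 0.900, 0.950  # PECOTA's line and the bettor's own number
    for actual in (0.905, 0.910, 0.920, 0.9249, 0.930):
        print(actual, over_bettor_wins(line, actual), line_was_closer(line, rival, actual))
```

For every outcome between the line and the midpoint of the two projections (.925 here), the over bettor collects even though the line was the closer guess.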

2. Bet the over.
The reason all these Beat PECOTA teams did well is the same reason the entire pool of 6,400ish picks did so well: You guys were all over the Overs, and it turned out to be a very good strategy. Here, for instance, are our 6,400ish results:

  • Right: 2,890
  • Wrong: 1,831
  • Push: 1,655

Excluding pushes, that’s 61 percent correct picks. Holy friggin! But let’s break it down by over picks and under picks:

Overs

  • Right: 2099
  • Wrong: 996

Unders

  • Right: 791
  • Wrong: 835
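The split is easier to see as rates. The sketch below recomputes them from the counts above and, as an extra step that is not in the original tally, applies the contest's scoring (10 points for a right pick, -11.5 for a wrong one) to get average points per decided pick. Pushes are assumed to score zero and are left out.

```python
# Counts as reported above; pushes are excluded from every rate.
POINTS_RIGHT, POINTS_WRONG = 10.0, -11.5
RESULTS = {
    "over": {"right": 2099, "wrong": 996},
    "under": {"right": 791, "wrong": 835},
}


def hit_rate(right: int, wrong: int) -> float:
    """Share of decided picks that were right."""
    return right / (right + wrong)


def points_per_pick(right: int, wrong: int) -> float:
    """Average points earned per decided pick under the contest scoring."""
    return (right * POINTS_RIGHT + wrong * POINTS_WRONG) / (right + wrong)


def combined() -> dict:
    """Pool overs and unders into one right/wrong count."""
    return {
        "right": sum(r["right"] for r in RESULTS.values()),
        "wrong": sum(r["wrong"] for r in RESULTS.values()),
    }


if __name__ == "__main__":
    for label, counts in {**RESULTS, "all": combined()}.items():
        print(f"{label:>5}: {hit_rate(**counts):.1%} right, "
              f"{points_per_pick(**counts):+.2f} points per pick")
```

On these counts the overs come out around 68 percent right and the unders around 49 percent, so under this scoring the average under pick lost points while the average over pick gained them.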

There are three things to note here: One is that people like picking overs. Two is that PECOTA, as far as rate stats go, was persistently low* on the average player. (This is sort of in the nature of the projections. If the aim is to project the aggregate expected value of a large population, then the epic collapses have to be spread across the entire population. There’s more room to underperform than to overperform, so (to simplify things) you might need to allow for three guys to outperform their projections by 10 percent to balance out the one inevitable epic collapse. Remember, PECOTA is not designed with my frivolous “Beat PECOTA” game in mind, but with “projecting baseball landscape accurately, generating useful playoff odds, and the like” in mind.) Those two points aren’t unrelated, but my guess is that point two doesn’t fully explain point one, especially because we’ve seen the crowd-sourced projections at FanGraphs be persistently optimistic, as well. Let’s just sum it up, ungenerously, as this: Our Lizard Brain is optimistic and defaults to Too High. The algorithm is pessimistic and defaults to Too Low. Put those together and you get a 61 percent overall success rate.
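The three-guys-and-one-collapse simplification can be made concrete with a toy population. Everything here is hypothetical: four players on an arbitrary 100-point scale, three finishing 10 percent up and one finishing 30 percent down, with the projection set at the population mean.

```python
from statistics import mean, median

# Hypothetical outcomes on a 100-point scale: three players beat their
# baseline by 10 percent, one collapses by 30 percent.
outcomes = [110.0, 110.0, 110.0, 70.0]

# A projection that targets the aggregate expected value sits at the mean.
projection = mean(outcomes)

# Count how many players finish above that projection.
over_hits = sum(o > projection for o in outcomes)
over_share = over_hits / len(outcomes)

print(f"projection (mean): {projection}")
print(f"median outcome:    {median(outcomes)}")
print(f"share beating the projection: {over_share:.0%}")
```

In this toy case the projection gets the total exactly right, yet three of the four players beat it, which is the shape of result that makes a blanket over bet pay.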

But the third thing to note is that when contestants break out of the Lizard Brain, think critically, and decide to bet the under, they do poorly! This is also not unrelated to point two—if the over/unders are persistently set too low, then betting the under is a lousy bet in aggregate—but it’s still interesting. If forces are pushing for over bets, then when somebody bets the under it should represent extra confidence. (If I order a hamburger 99 times out of 100 and the tuna melt just once, we could conclude that I was really craving a tuna melt that day, and should have been primed to love it.) And yet the excess of confidence leads them to wronger places.

Or, to put it another way, to Beat The Algorithm, the best thing to do is not necessarily to engage the human brain but to use another algorithm, one that could be expressed like this: (All The Stuff That PECOTA Does) * 1.10 Or So = Projected TAv.

Or, to put it one more way: We beat PECOTA, but we didn’t demonstrate we’re actually smarter. If we were actually smarter, we’d have picked the correct unders, too.

On the other hand:

3. Our collective confidence level did seem to correspond to correct picks.
In this case, we’re going to define confidence level as how many people picked a certain outcome. For instance, only three people bothered to pick Gregor Blanco; two said over, one said under. There is a very low collective level of confidence there, presumably because nobody thinks that much or thinks they know that much about Gregor Blanco. For another instance, 16 people picked Yoenis Cespedes, but eight said over and eight said under. There is a very low collective level of confidence there, as well, but in a different way.

So, we know that, overall, picking “over” was around 60 percent likely to be right. Let’s look at the players who we were most confident about betting the over on. (Anybody who is within seven points of TAv, or 30 points of DRA, of their projection is deemed by the rules of the game a push):

| Player | Over/Total | Projected | Actual | Right? |
|---|---|---|---|---|
| Wade Davis | 77/81 | 3.57 | 3.72 | Push |
| Bryce Harper | 66/66 | 0.312 | 0.306 | Push |
| Manny Machado | 45/46 | 0.277 | 0.299 | Yes |
| Mike Moustakas | 43/43 | 0.254 | 0.281 | Yes |
| J.D. Martinez | 39/41 | 0.276 | 0.313 | Yes |
| Mike Trout | 38/41 | 0.331 | 0.354 | Yes |
| Lorenzo Cain | 36/38 | 0.261 | 0.258 | Push |
| Carlos Correa | 35/37 | 0.282 | 0.301 | Yes |
| Jake Arrieta | 30/37 | 3.56 | 3.94 | No |
| Sonny Gray | 36/36 | 4.02 | 3.93 | Push |
| Zack Greinke | 31/36 | 3.87 | 3.44 | Yes |
| Addison Russell | 34/35 | 0.253 | 0.283 | Yes |
| Clayton Kershaw | 27/34 | 2.99 | 1.96 | Yes |
| Craig Kimbrel | 32/34 | 3.52 | 2.70 | Yes |
| Chris Archer | 29/33 | 3.69 | 2.97 | Yes |
| Paul Goldschmidt | 31/33 | 0.316 | 0.309 | No |
| Brandon Crawford | 30/30 | 0.249 | 0.272 | Yes |
| Nolan Arenado | 31/32 | 0.271 | 0.302 | Yes |
| Xander Bogaerts | 29/32 | 0.265 | 0.268 | Push |
| Justin Verlander | 27/30 | 4.12 | 3.09 | Yes |

Excluding pushes, our gang went 13 for 15, with an extremely wide swath of player/career types. If you include pushes, so that being even one point right (or wrong) counts, the gang still went 15 for 20. (This is all through Monday, by the way. Paul Goldschmidt homered twice Wednesday, after I wrote this, and has probably unbolded himself.)
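Those tallies can be rechecked from the table with a small grading sketch. Two assumptions are baked in. First, an over pick means "better than projected," so higher TAv for hitters and lower DRA for pitchers. Second, the push band is treated as exclusive, a reading chosen only because it reproduces the grades listed in the table; the published figures are rounded, so the real grading may well have used unrounded values.

```python
from collections import Counter

# (player, stat, projected, actual), copied from the table above.
ROWS = [
    ("Wade Davis", "DRA", 3.57, 3.72), ("Bryce Harper", "TAv", 0.312, 0.306),
    ("Manny Machado", "TAv", 0.277, 0.299), ("Mike Moustakas", "TAv", 0.254, 0.281),
    ("J.D. Martinez", "TAv", 0.276, 0.313), ("Mike Trout", "TAv", 0.331, 0.354),
    ("Lorenzo Cain", "TAv", 0.261, 0.258), ("Carlos Correa", "TAv", 0.282, 0.301),
    ("Jake Arrieta", "DRA", 3.56, 3.94), ("Sonny Gray", "DRA", 4.02, 3.93),
    ("Zack Greinke", "DRA", 3.87, 3.44), ("Addison Russell", "TAv", 0.253, 0.283),
    ("Clayton Kershaw", "DRA", 2.99, 1.96), ("Craig Kimbrel", "DRA", 3.52, 2.70),
    ("Chris Archer", "DRA", 3.69, 2.97), ("Paul Goldschmidt", "TAv", 0.316, 0.309),
    ("Brandon Crawford", "TAv", 0.249, 0.272), ("Nolan Arenado", "TAv", 0.271, 0.302),
    ("Xander Bogaerts", "TAv", 0.265, 0.268), ("Justin Verlander", "DRA", 4.12, 3.09),
]

SCALE = {"TAv": 1000, "DRA": 100}  # turns a gap into whole "points"
PUSH_BAND = {"TAv": 7, "DRA": 30}  # assumed exclusive: a gap of exactly 7 is graded


def grade_over(stat: str, projected: float, actual: float, use_push: bool = True) -> str:
    """Grade an over pick as 'Yes', 'No' or 'Push'."""
    gap = round((actual - projected) * SCALE[stat])  # whole points, avoids float noise
    if use_push and abs(gap) < PUSH_BAND[stat]:
        return "Push"
    # Over means "better than projected": higher TAv, lower DRA.
    better = gap > 0 if stat == "TAv" else gap < 0
    return "Yes" if better else "No"


def tally(use_push: bool = True) -> Counter:
    """Count grades across every row of the table."""
    return Counter(grade_over(s, p, a, use_push) for _, s, p, a in ROWS)


if __name__ == "__main__":
    print(tally())                # with the push band applied
    print(tally(use_push=False))  # every pick graded by direction alone
```

With the push band on, this gives 13 Yes, 2 No and 5 Push; with it off, 15 Yes and 5 No, matching the 13-for-15 and 15-for-20 figures.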

Within this generally strong showing was, naturally, Wade Davis, the player who I identified in April as the most popular Over pick. Here’s what I wrote:

This is totally unsurprising. I love PECOTA, more than you do probably, but even I consider the Wade Davis projection to be its Nikki & Paulo episode. Not that I can blame it. This is Wade Davis:

• He’s a 31-year-old pitcher with a 3.11 ERA over the past three years, in a pretty good (275) block of innings.

• He’s also a 31-year-old pitcher with a 0.97 ERA over the past two years, in a relatively slim (139) block of innings.

Which, to PECOTA, must be like saying he's simultaneously a hammer and a nail.

You and I get to believe whatever we want to believe, but PECOTA’s got to follow a set of rules that will apply to all baseball players, and there’s not really a set of rules that applies to all baseball players and whatever Wade Davis appears to be. He’s a historical outlier.

Oh, and here’s what else I wrote:

And yet: Maybe we’ll still (almost) all be wrong! Greg Holland was nearly as good as Davis before 2015, and would have been worse than PECOTA’s Wade Davis projection in 2015.

We were wrong! The most confident we could possibly be about a pitcher (Davis) and a hitter (Harper) and we were still wrong, albeit by narrow margins. So one conclusion that you might make from this whole exercise is “if the whole world thinks an algorithmically derived projection is wrong, pay some attention to the whole world.” But another is that “if the whole world thinks an algorithmically derived projection is wrong, pay some attention to the algorithm.”

4. So, my own personal, not-all-that-rigorous conclusions.
I think it’s clear that the crowd does have some wisdom on some players that can be applied to your personal PECOTA usage. This makes sense. PECOTA and the crowd are using two different data sets; PECOTA uses statistical records and large-n tendencies, while the crowd uses all the sorts of things that leak out in beat reporters’ notes pieces: Who isn’t throwing as hard as they did last year, who the scouts are talking up, who was injured last year but finally had surgery, who hit puberty and can grow a mustache now. We as the crowd can add timely reassessments of a player, while PECOTA can provide a foundation for considering a player based on cold, hard reason.

It’s also clear that what the crowd has to add is more on the level of “finger on the scale,” not “omniscient God view.” The crowd mostly took advantage of PECOTA’s conservatism, and the crowd did that mostly because the crowd was itself probably too excitable about players. As crowds are. If the crowd is so smart, let’s see it start predicting player collapses a lot better.

But if you have 20 friends who all think you’re wrong about something, you’re probably wrong about it. This seems to be true for algorithms, too.

When it comes to performance, PECOTA really does have a tendency to be conservative—and maybe unreasonably so. It’ll be worth reading the PECOTA release article in January or February to see whether this is something our esteemed stats team thought was worth addressing, and how.

*You might be yelling that if there’s a minimum playing time threshold to qualify, which there is, then PECOTA is set up to be persistently low. As I put it in April, “The minimum number of plate appearances required for a pick to ‘count’ poses a little bit of a loophole. If a player projects to be so bad that if he gets a little worse than his projection he won’t clear the minimum, then it’s almost free money to take the over. If he does hit the over, you win. If he hits his projection, it’s a push. If he hits the under, then he won’t reach the minimum number of at-bats, and that’s a push, too. We aimed to avoid this loophole by setting the minimum projected plate appearances (251) high enough to only allow betting on established players, and by setting the minimum actual plate appearances required to count low enough that even a backup catcher would clear it with ease.” Because of our efforts to close this loophole, I don’t believe it’s a major factor in PECOTA’s persistently low projections. However, there are a few individual cases (Hector Olivera, Chris Colabello) where it seems to have been in play, so it’s also not totally irrelevant.

whitakk
9/22
How did the people who didn't look at PECOTA do in aggregate?
lyricalkiller
9/22
Great question! They got 56 percent of their picks right, mainly because they also picked more overs than unders. They did do worse than the informed pickers when you break it down: 67 percent right on overs (compared to 68 percent) and 44 percent right on unders (compared to 49 percent). But they came close, which is confirmation of some things I wrote and makes me think more deeply about others. Thanks for asking this.
russell
9/23
I'm wondering if this has to do with a tails issue, where the mean performance estimate for the typical player is lower than the median performance estimate. Any chance you looked into this?
russell
9/23
If PECOTA tends to have higher median performance than mean performance, I can make money by always betting on the over.
drpjlang
9/25
A thought experiment: do a 'search and replace' on all articles about PECOTA, and substitute the word 'conservative' with the word 'wrong', and see how they read. Manager, PrEposterous COnservaTive Aggregates
mdickson
9/27
BTRLA (Baton Rouge, LA) right here!!
mdickson
9/27
I don't mean to be gauche (h/t KG), but was a prize ever decided for the winner?
mdlehrman
10/01
Jonathan Herrera's Curtain Call here (currently #28). I just want to publicly apologize for picking the under on Rich Hill. I should have picked the over on him, and I should have picked way more overs, in general. I don't even know why I did that for Hill. I love Rich Hill. It must have been a mistake. Fun contest!