September 22, 2016
The Great Big 'Beat PECOTA' Wrap
Earlier this year, we released PECOTA projections for every major-league baseball player, and then I asked you to beat those projections. The instructions were simple: Find players you thought PECOTA was too optimistic on, and bet the under; find players you thought PECOTA was too pessimistic on, and bet the over. We called it a game and I promised to learn something from it. Here we are nearing the end of the season, so I’ll fulfill my obligation presently.
First off, though: The Game. At the moment we’ve got a barn-burner. Somebody named BTRLA is winning, with 339.5 points; somebody named Thunder Gun Express has 336 points. (Third place has 234 points.) If just one player flips on Wednesday, the order will be switched and this paragraph will be a relic of a simpler, more-Wednesdayish era. This maybe isn’t that interesting to you, but there’s something you should know about BTRLA and Thunder Gun Express, and the other teams that are closest to them:
By count of players picked, those teams rank 8th, 10th, 16th, 11th, 4th and 7th out of 300 total entries. The team that picked the most players—Abe PECOTA’s Not Dead, with 404 picks—is in 13th place on the leaderboard. I designed the scoring system with the intent to discourage outright guessing by putting a larger penalty (-11.5 points) on a wrong pick than a gain for a correct pick (10 points). Yet picking a preposterous number of players turned out to be just the right strategy. Why? Now we get to what I learned:
1. Y’all really good at beating PECOTA.
Of course, we know it’s not a perfect world. We’re asking you to look at an algorithm that is attempting to fit one worldview onto 2,000 players, 2,000 beautiful and unique snowflakes. No worldview could possibly apply to all of these snowflakes with exactly equal confidence, and the whole premise of this game was that a second worldview—yours!—might be able to identify those players where the system staggers a bit. To use an example, if PECOTA says that Juan Nicasio is going to have a DRA of 4.67, but you and you only know that Ray Searage is going to be his pitching coach, and Ray Searage still has two wishes left from that genie’s lamp he found, you might trust your worldview over PECOTA’s. (Twenty-four out of 25 Beat PECOTA contestants who selected Nicasio said he would be better than PECOTA said. Nicasio’s DRA this year is 3.18.)
But, now, this logic only works if you limit your guesses to those situations where you’re very confident you know something that PECOTA doesn’t. If you just go and pick every player, you’re doing something totally different; now you’re testing your worldview as a universal application over PECOTA’s worldview as a universal application. This is a fun test for you! But it’s a different thing than the synthesis-of-views approach that we might have hypothesized would work best.
Except but for this: Here we are, and BTRLA’s worldview is doing great! He’s got 65 of his picks right and only 30 wrong. (There are 23 that are currently ruled pushes, either because the performance is very close to the projection in either direction, or because the player hasn’t played our minimum number of games this year.) In fact, here are the nine Beat PECOTA contestants who chose at least 100 players, the ones we can conclude were testing themselves against PECOTA rather than looking for isolated instances of algorithmic blind spots:
Note: All pushes ignored.
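For anyone who wants to check the arithmetic behind the scoring system, here is a minimal sketch. It assumes only the two point values stated earlier (10 for a correct pick, -11.5 for a wrong one) and ignores pushes; the function names are made up for illustration and are not part of the game's actual scoring code.

```python
CORRECT_POINTS = 10.0   # gain for a correct pick, per the scoring rules
WRONG_POINTS = -11.5    # penalty for a wrong pick, per the scoring rules


def break_even_rate(gain=CORRECT_POINTS, loss=WRONG_POINTS):
    """Hit rate at which expected points per decided pick is zero."""
    # Solve p * gain + (1 - p) * loss = 0 for p.
    return -loss / (gain - loss)


def expected_points_per_pick(hit_rate, gain=CORRECT_POINTS, loss=WRONG_POINTS):
    """Average points per decided (non-push) pick at a given hit rate."""
    return hit_rate * gain + (1 - hit_rate) * loss


if __name__ == "__main__":
    right, wrong = 65, 30  # BTRLA's record quoted above, pushes excluded
    hit_rate = right / (right + wrong)
    print(f"break-even hit rate: {break_even_rate():.3f}")
    print(f"hit rate at 65-30:   {hit_rate:.3f}")
    print(f"points per decided pick: {expected_points_per_pick(hit_rate):.2f}")
```

Under these assumptions the penalty only bites below a hit rate of roughly 53.5 percent. Anyone comfortably above that line gains points, on average, with every additional pick, which is one way to read why the big slates did so well.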
There are, I think, two things going on here. One is that it’s a lot easier to bet over/under on another guy’s projection than to have to come up with your own from scratch, so the pickers have an advantage. For instance, if PECOTA projects a .900 TAv for Joey Votto and BTRLA would project .950, BTRLA gets credit for .905, .910, .920, .9249, etc., even though in those cases PECOTA was actually more right than BTRLA was. PECOTA is basically playing the contestants’ row game on Price Is Right and never gets to guess last. So that’s one thing. But the other thing is the big one:
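The Votto example can be made concrete with a small sketch. It uses only the numbers from that example and, as a simplification, ignores the game's push band; `bet_wins` and `closer_projection` are hypothetical helper names, not anything from the contest itself.

```python
def bet_wins(projection, actual, side):
    """True if an over/under bet against the projection pays off.

    Simplification: the game's push band is ignored here.
    """
    if side == "over":
        return actual > projection
    if side == "under":
        return actual < projection
    raise ValueError("side must be 'over' or 'under'")


def closer_projection(system_proj, picker_proj, actual):
    """Name whichever projection landed nearer the actual result."""
    system_err = abs(actual - system_proj)
    picker_err = abs(actual - picker_proj)
    if system_err == picker_err:
        return "tie"
    return "system" if system_err < picker_err else "picker"


if __name__ == "__main__":
    pecota, picker = 0.900, 0.950  # figures from the example above
    for actual in (0.905, 0.910, 0.920, 0.9249):
        print(actual,
              bet_wins(pecota, actual, "over"),
              closer_projection(pecota, picker, actual))
```

In each of the four cases the over bet wins even though the system's number sits closer to the result, because the picker only has to be on the right side of the line, not nearer to the truth.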
2. Bet the over.
Excluding pushes, that’s 61 percent correct picks. Holy friggin! But let’s break it down by over picks and under picks:
There are three things to note here: One is that people like picking overs. Two is that PECOTA, as far as rate stats go, was persistently low* on the average player. (This is sort of in the nature of the projections. If the aim is to project the aggregate expected value of a large population, then the epic collapses have to be spread across the entire population. There’s more room to underperform than to overperform, so (to simplify things) you might need to allow for three guys to outperform their projections by 10 percent to balance out the one inevitable epic collapse. Remember, PECOTA is not designed with my frivolous “Beat PECOTA” game in mind, but with “projecting baseball landscape accurately, generating useful playoff odds, and the like” in mind.) Those two points aren’t unrelated, but my guess is that point two doesn’t fully explain point one, especially because we’ve seen the crowd-sourced projections at FanGraphs be persistently optimistic, as well. Let’s just sum it up, ungenerously, as this: Our Lizard Brain is optimistic and defaults to Too High. The algorithm is pessimistic and defaults to Too Low. Put those together and you get a 61 percent overall success rate.
But the third thing to note is that when contestants break out of the Lizard Brain, think critically, and decide to bet the under, they do poorly! This is also not unrelated to point two—if the over/unders are persistently set too low, then betting the under is a lousy bet in aggregate—but it’s still interesting. If forces are pushing for over bets, then when somebody bets the under it should represent extra confidence. (If I order a hamburger 99 times out of 100 and the tuna melt just once, we could conclude that I was really craving a tuna melt that day, and should have been primed to love it.) And yet the excess of confidence leads them to wronger places.
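The earlier parenthetical about epic collapses is easy to see in a toy example. The numbers below are invented for illustration: three players beat their projection by 10 percent, and the size of the one collapse (30 percent) is an assumption chosen so that the average miss comes out to zero.

```python
from statistics import mean, median

# Hypothetical misses versus projection, in percent. Three modest
# overperformers and one collapse; the -30 is an assumed figure picked
# so the four values average to zero.
errors_pct = [10, 10, 10, -30]


def share_over(errors):
    """Fraction of players who finished above their projection."""
    return sum(1 for e in errors if e > 0) / len(errors)


if __name__ == "__main__":
    print("mean miss:  ", mean(errors_pct))
    print("median miss:", median(errors_pct))
    print("share over: ", share_over(errors_pct))
```

A projection can be right on average for this group and still lose three of four over/under bets, which is the sense in which a system built for aggregate accuracy can look persistently low in a game like this one.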
Or, to put it another way, to Beat The Algorithm, the best thing to do is not necessarily to engage the human brain but to use another algorithm, one that could be expressed like this: (All The Stuff That PECOTA Does) * 1.10 Or So = Projected TAv.
On the other hand:
3. Our collective confidence level did seem to correspond to correct picks.
So, we know that, overall, picking “over” was around 60 percent likely to be right. Let’s look at the players who we were most confident about betting the over on. (Anybody who is within seven points of TAv, or 30 points of DRA, of their projection is deemed by the rules of the game a push):
Excluding pushes, our gang went 13 for 15, with an extremely wide swath of player/career types. If you include pushes, so that being even one point right (or wrong) counts, the gang still went 15 for 20. (This is all through Monday, by the way. Paul Goldschmidt homered twice Wednesday, after I wrote this, and has probably unbolded himself.)
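The push rule can be written down as a short sketch. A few assumptions are baked in: "seven points" of TAv is read as .007 and "30 points" of DRA as 0.30, a result exactly on the edge of the band is treated as a push (the rules as quoted here don't say), and the separate minimum-games push is not modeled. The names are hypothetical.

```python
TAV_PUSH_BAND = 0.007  # assumed reading of "seven points of TAv"
DRA_PUSH_BAND = 0.30   # assumed reading of "30 points of DRA"


def classify(projection, actual, push_band):
    """Return 'push', 'higher', or 'lower' relative to the projection.

    Assumption: a result exactly on the edge of the band counts as a push.
    The playing-time minimum that can also cause a push is not modeled.
    """
    diff = actual - projection
    if abs(diff) <= push_band:
        return "push"
    return "higher" if diff > 0 else "lower"


if __name__ == "__main__":
    # Juan Nicasio's figures from earlier: projected 4.67 DRA, actual 3.18.
    print(classify(4.67, 3.18, DRA_PUSH_BAND))
```

For a pitcher, a lower DRA is the better outcome, so a "lower" result here lines up with the contestants who said Nicasio would beat his projection.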
Within this generally strong showing was, naturally, Wade Davis, the player who I identified in April as the most popular Over pick. Here’s what I wrote:
This is totally unsurprising. I love PECOTA, more than you do probably, but even I consider the Wade Davis projection to be its Nikki & Paulo episode. Not that I can blame it. This is Wade Davis:
Oh, and here’s what else I wrote:
And yet: Maybe we’ll still (almost) all be wrong! Greg Holland was nearly as good as Davis before 2015, and would have been worse than PECOTA’s Wade Davis projection in 2015.
We were wrong! The most confident we could possibly be about a pitcher (Davis) and a hitter (Harper) and we were still wrong, albeit by narrow margins. So one conclusion that you might make from this whole exercise is “if the whole world thinks an algorithmically derived projection is wrong, pay some attention to the whole world.” But another is that “if the whole world thinks an algorithmically derived projection is wrong, pay some attention to the algorithm.”
4. So, my own personal, not-all-that-rigorous conclusions.
It’s also clear that what the crowd has to add is more on the level of “finger on the scale,” not “omniscient God view.” The crowd mostly took advantage of PECOTA’s conservatism, and the crowd did that mostly because the crowd was itself probably too excitable about players. As crowds are. If the crowd is so smart, let’s see it start predicting player collapses a lot better.
But if you have 20 friends who all think you’re wrong about something, you’re probably wrong about it. This seems to be true for algorithms, too.
When it comes to performance, PECOTA really does have a tendency to be conservative—and maybe unreasonably so. It’ll be worth reading the PECOTA release article in January or February to see whether this is something our esteemed stats team thought was worth addressing, and how.
*You might be yelling that if there’s a minimum playing time threshold to qualify, which there is, then PECOTA is set up to be persistently low. As I put it in April, “The minimum number of plate appearances required for a pick to ‘count’ poses a little bit of a loophole. If a player projects to be so bad that if he gets a little worse than his projection he won’t clear the minimum, then it’s almost free money to take the over. If he does hit the over, you win. If he hits his projection, it’s a push. If he hits the under, then he won’t reach the minimum number of at-bats, and that’s a push, too. We aimed to avoid this loophole by setting the minimum projected plate appearances (251) high enough to only allow betting on established players, and by setting the minimum actual plate appearances required to count low enough that even a backup catcher would clear it with ease.” Because of our efforts to close this loophole, I don’t believe it’s a major factor in PECOTA’s persistently low projections. However, there are a few individual cases (Hector Olivera, Chris Colabello) where it seems to have been in play, so it’s also not totally irrelevant.