Oh boy, it’s playoff time! Time for all of the baseball prognosticators out there to find that perfect little factoid that no one else has noticed about each series. It needs to be slightly surprising and counter-intuitive so that the reader is entertained by your erudite knowledge of the game, not to mention your use of the word “erudite.” You also need to be able to make a case, probably through some questionable logic, that this factoid will, over the next five games between these two teams, not only make a difference in the outcome of the series, but will be the difference between the teams. You get bonus points if you refer to someone really obscure as an X-factor.
But if you can’t think of an X-factor that’s hipster enough, you can always fall back on the old standbys. Team A will win because they clinched early and are well-rested. Team B will win because they had to fight all the way through September to the last day and are already in playoff mode. What’s fun is that you’ll hear both explanations cited. The Braves built a 15-game lead over the Nationals in August, and it was a foregone conclusion that they would win the NL East. The AL Wild Card race, on the other hand, ended up being oodles of fun (and it’s not done yet!). We’ll find out later this week who gets to be the fourth entrant into the ALDS round. I guarantee that you’ll hear both the Braves and the Indians/Rangers/Rays picked to win (and lose) their respective series for these completely opposite reasons, just as you’ll hear that both the older, more experienced team and the younger, hungrier team will win.
It’s an argument that comes up a lot. Some teams haven’t had to break a sweat yet, while others have fought tooth and nail to get to this moment. Those who argue that teams who have had to fight have an advantage are implicitly arguing for something called stress inoculation theory. The playoffs are a big stage, and you can’t blame players for being a little nervous when they step onto the field. We already know that players do show some different results than we might expect in the postseason. After all, there’s only one October! In the same way that things that once made you nervous get less nerve-wracking after you’ve done them a few times, playing big games in September makes it easier to deal with the butterflies in October.
You might call it momentum, because the teams that just squeaked into the playoffs generally played in a bunch of critical games and won them. That’s how they got in. But really we’re arguing that this toughened them up in some way. They have already faced down the monster and won. Why not do it again? On the other side, we might cite the Beane Excremental Effectiveness Theorem. We know that anything can happen in a short series. So there’s probably already a really good chance that our momentum-having team could win the series by chance. Maybe all this talk of momentum is a bunch of excrement.
Shall we find out together?
Warning! Gory Mathematical Details Ahead!
Let’s talk about meaningful games in September. I considered a game to be meaningful for a team if it met a few criteria:
- The team entered that day’s game(s) having not clinched a playoff spot, either division or wild card. If the team is leading their division and has clinched “the worst that can happen is a wild card” then they are considered to have clinched.
- The team was within three games, in either direction, of a playoff spot. So, if a team was more than three games back, both in the division race and the wild card race, the game that day isn’t “meaningful.” Same goes for having a lead of more than three games. However, that can change from day to day.
- There’s an available playoff spot that the team could still obtain (i.e., there hasn’t been a clinch on their division and the available wild cards). It’s useless to be three games back with two to play.
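The three criteria above can be collapsed into a single check. Here is a minimal sketch of what that predicate might look like; the field names and the standings representation are my own illustrative assumptions, not anything from the actual study:

```python
def is_meaningful(team, standings):
    """Decide whether today's game is 'meaningful' for a team.

    `standings` is assumed to expose, for this team on this date:
      - clinched: True if a playoff spot (division or wild card) is locked up
      - eliminated: True if no attainable playoff spot remains
      - division_gb, wildcard_gb: games back of a spot (negative = games ahead)
    All of these names are hypothetical, for illustration only.
    """
    s = standings[team]
    if s["clinched"]:        # criterion 1: must still be playing for something
        return False
    if s["eliminated"]:      # criterion 3: a spot must still be obtainable
        return False
    # criterion 2: within three games, in either direction, of some spot
    return abs(s["division_gb"]) <= 3 or abs(s["wildcard_gb"]) <= 3
```

Because the standings move every day, the same team can flip between meaningful and non-meaningful games from one date to the next, which is why the check has to be re-run daily.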
I ran my analyses a couple of different ways. I counted how many of each team’s last 15 games were “meaningful” and counted how many games in September (or any regular season games in early October) were also “meaningful.” The answers didn’t change. Here, I’m reporting the results for the “last 15 game” variation.
To test whether playing a lot of meaningful games in September made much of a difference, I looked at the play-by-play for all playoff series from 2003-2012. Using regular season stats, I used the log-odds ratio method to control for the quality of players on each team. This is necessary because a good way to not play a lot of meaningful September games is to win 100 games, and to do that you need some really good players. I looked at encounters in which a pitcher who faced at least 250 batters during the regular season was facing a batter with at least 250 plate appearances in the regular season, and calculated the probability of a matchup between those two ending in each of several different outcomes (strikeout, walk, HBP, single, double/triple, HR, or out in play). I then added in the number of “meaningful” stretch games played by the batting team and the number played by the pitching team and loaded it all into a few logistic regressions.
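The log-odds ratio method mentioned here combines the batter's rate, the pitcher's rate, and the league rate on the log-odds scale to get an expected rate for the matchup. A minimal sketch of the general technique (the article doesn't show its exact implementation, so this is an illustration, not the author's code):

```python
import math

def expected_rate(batter_rate, pitcher_rate, league_rate):
    """Log-odds ratio expectation for one outcome of a batter-pitcher matchup.

    log-odds(expected) = log-odds(batter) + log-odds(pitcher) - log-odds(league)
    """
    def logit(p):
        return math.log(p / (1 - p))

    def inv_logit(x):
        return 1 / (1 + math.exp(-x))

    return inv_logit(logit(batter_rate) + logit(pitcher_rate) - logit(league_rate))
```

This baseline expectation is what the meaningful-games counts are then tested against: the regressions ask whether those counts predict deviations from what the matchup quality alone would suggest.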
The answer was that there was some evidence to suggest that for the pitching team (but not the batting team), having survived a lot of should-really-win games over the last few weeks of the season predicted better results. I’m being a little bit liberal with my p-values here in some cases, but strikeouts (p = .197), walks (p = .018), HBP (p = .090), extra-base hits (p = .096), and home runs (p = .252), all registered as at least worth mentioning, and all but HBP went in the direction favoring the pitching team. P-values for crunch-time games for the batting team consistently hovered in the .80 range.
If you compare a team that had no drama coming down the stretch (zero meaningful games in its last 15) to a team that was playing for its life every night (15 such games), the effect sizes are actually on the order of a percentage point for walks and strikeouts. We would expect that the pitchers who played on the team that’s “already in playoff mode” would actually get about a percentage point more on these outcomes than we might expect based on their regular season stats and those of the batters they face. That’s a lot, something on the order of .20 or .25 in runs per nine innings based on walks and strikeouts alone. This is a fairly substantial effect.
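As a rough sanity check on that runs-per-nine figure, here is a back-of-envelope conversion using approximate linear-weight run values. The constants below are my own ballpark numbers, not values taken from the article:

```python
BF_PER_NINE = 38.0    # typical batters faced over nine innings (approximation)
RUN_VALUE_BB = 0.32   # approx. run value of a walk allowed
RUN_VALUE_SO = -0.27  # approx. run value of a strikeout vs. an average PA

def runs_per_nine_swing(delta_bb, delta_so):
    """Runs per nine innings saved when walk rate drops by `delta_bb`
    and strikeout rate rises by `delta_so` (both as proportions of PA)."""
    return BF_PER_NINE * (delta_bb * RUN_VALUE_BB - delta_so * RUN_VALUE_SO)
```

Plugging in a one-percentage-point swing on each outcome (`runs_per_nine_swing(0.01, 0.01)`) lands at roughly 0.22 runs per nine, consistent with the .20-.25 figure quoted above.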
I will happily admit that some of those p-values are concerning. For the initiated: like a lot of the “psychological” factors that I study, these effects generally fell into the category of non-significant because of high standard errors, rather than small beta values. This is common when you have an effect that influences people in a broad range of ways, but the fact that there’s enough signal poking its head out makes me think that there might be something to this.
To rule out a couple of competing hypotheses, I also looked at how many of these “meaningful” games each team had won over its last 15 games. The results were largely the same, mostly because winning a lot of meaningful games is more likely to happen when you play a lot of them. I also looked to see whether this was a matter of one team simply having a comparative advantage over the other in stress inoculation. I coded whether the batting team or the pitching team had more meaningful games down the stretch (or whether they were tied). In this case, maybe it’s the same effect if you’ve played two meaningful games to your opponent’s none vs. playing 15 against none. This didn’t come up significant for anything. It’s not whether you had more drama down the stretch than your opponent. It appears that the raw number of meaningful games (the amount of practice you already have in “playoff mode”?) drives the findings.
Finally, I limited the games under consideration to just Division Series games. This has the unfortunate side effect of throwing away about 70 percent of the sample of plate appearances, but then again, by the time you get to the LCS, everyone’s played some meaningful games. Maybe that’s muddying up the sample. The effects for “meaningful” games on the pitching team’s side for strikeouts (p = .271) and walks (p = .193) are still kind of worth mentioning. It’s hard to tell whether the p-values went up because of a loss of statistical power or because the original findings were just a fluke.
So, the Braves and Dodgers are doomed, right?
No. Although since they play each other, I feel confident in predicting that one of them will lose. (See, the theory works!)
Everything in this study should be interpreted alongside the phrase “relative to expectations.” It looks like fighting for a playoff spot gives a pitching staff a little extra boost over what we might otherwise expect from them. It does not, however, guarantee a win in the series for that team. The team that had no drama down the stretch may have an amazing group of players, and those players are still amazing. Talent is still the main driver of results. And for what it’s worth, it should be said again that these findings aren’t as clean cut as I’d like them to be. They should also have the qualifier “I think there’s something here” tacked on. But I don’t want to completely shoo these findings away, either.
These data provide support (even if lukewarm support) for the idea that facing adversity down the stretch does make a team better in a measurable way. The data suggest that the effects can be large, but are probably highly variable across different players. That makes sense because… well, people are different. I’m at a loss to explain why only the pitchers on the team seem to reap the benefits, but that’s what the numbers tell us.
There have been a number of commentators who have noted that since the introduction of the Wild Card in 1995, the Wild Card team seems to have won an inordinate number of playoff series, especially for a team that was (by definition) not good enough to win its division and often did have to sneak into the playoffs in September. That can be explained by invoking the twin arguments of “small sample size” and “Well, but they were good enough to win 90 games during the regular season.” But maybe there’s something else going on. In the sabermetric community, we’re often too quick to dismiss that the human element can have an effect. Maybe stress inoculation really does work. If we’re going to call ourselves proper scientists, we need to be open to that possibility.
Recently, former pitcher and current broadcaster, (fantastic) author, and blogger Dirk Hayhurst wrote a piece in which he argued for the existence of momentum, based on his experiences as a player. Mr. Hayhurst has consistently been very open-minded when discussing advanced baseball analytics and is no old-school ideologue. Near the end of his piece, he wrote this line, in which he nailed the problem exactly:
In an increasingly numbers-driven baseball environment, managing the human aspect of the game is often sold short. It doesn’t help that this human aspect typically gets wrapped in tired clichés by lumpen broadcasters.
The problem with a lot of the trite clichés invoking the human element isn’t that they are completely false. (Some of them are.) The problem is that they take something very complex and idiosyncratic, the human element within a multi-layered situation like a baseball game, and turn it into a bumper-sticker-sized aphorism. When you test the aphorism unto itself, of course it turns out to be false. When a broadcaster trots out a cliché in the next few weeks, maybe his sin isn’t that he’s peddling nonsense. Maybe he’s over-simplifying something that’s real, just more complex. I’d argue that’s what we have here.
Does momentum matter? Yeah, I think it does. At least that’s what the numbers tell me. The effect still has a lot of noise in it, and it is not magic pixie dust that turns quad-A players into Hall of Famers. If we had more data, maybe we could pull apart who might benefit from it and who might not. But if you peer deep into the numbers, it looks like there really is something there.
It also doesn't help that about half the time a broadcaster talks about momentum, it's about the momentum shifting. Just as, most of the time, the weak hit that is supposed to be the one that ends a slump doesn't end it. My conclusion is that they're not proposing any theory, simple or complex. They're just repeating things with zero regard for accuracy.
One, you do so many studies that some of them will produce false results even with significant or close-to-significant p-values. Don't you have to look at each study in the context of the whole body of similar work?
Two, surely there is publication bias on top of that (you do a study, find nothing, and are hesitant to write an article about it, for good reason), although the first one is enough to produce lots of false positives.
To those of you who don't know what the first one is, consider this:
Of all the studies on sample data where we are relying on statistical tests to determine whether an effect really exists or the measurement is just a random artifact (in this case estimated by his p-values in the regressions), of which there are tens if not hundreds of thousands, many of them get "positives" by chance alone. That is by definition.
If I, or many people, conduct a total of 100 studies and publish all the results AND, unbeknownst to us, the effects that we are looking for do not exist, what is going to happen? Well, in 4 or so of those studies we will likely conclude that an effect exists (reject the null hypothesis) at 2 sigma with a 2-tailed test, and even at 2.5 sigma, we are left with 1 or 2 positives.
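The expected-false-positive arithmetic in this comment can be checked directly with a quick sketch using only the Python standard library:

```python
from statistics import NormalDist

def expected_false_positives(n_studies, sigma):
    """Expected number of false positives among `n_studies` studies of
    truly null effects, tested at a two-tailed `sigma` threshold."""
    alpha = 2 * (1 - NormalDist().cdf(sigma))  # two-tailed rejection rate
    return n_studies * alpha
```

At 2 sigma this gives about 4.6 expected false positives per 100 null studies, and about 1.2 at 2.5 sigma, in line with the comment's rough figures.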
And that is not even considering the second thing, publication bias, which is that maybe 1000 tests were actually conducted and many of the negative ones never got reported and all of the positive ones did. And there are going to be a lot of false positives in those 1000 experiments. So of my 100 published tests, I might have maybe 20 or 30 false positives!
One antidote to this, which I also must ask Russell if he uses, is Bayes. Do you have any a priori expectations going into these studies, even if they are rough estimates? I suggest that you should. What are they? Well, for example, if the body of research in the past suggests that psychological things have very little effect on the outcome of a baseball game, and I think that is a fair statement, then we should very much go into a study like this with a prior that suggests that our hypothesis is likely not true. If we do that, even in a conservative fashion, I think you will find that your p-values will not hold up.
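The Bayesian point here can be illustrated with the standard positive-predictive-value calculation. A sketch, where the prior, alpha, and power values in the usage note are invented purely for illustration:

```python
def posterior_prob_effect(prior, alpha, power):
    """P(effect is real | significant result), via Bayes' theorem.

    prior: prior probability that the effect exists
    alpha: false-positive rate of the test (significance threshold)
    power: probability of detecting the effect if it is real
    """
    true_pos = prior * power          # real effects that test significant
    false_pos = (1 - prior) * alpha   # null effects that test significant
    return true_pos / (true_pos + false_pos)
```

With a skeptical prior of 0.1, alpha of .05, and power of .5, a "significant" result still only implies roughly a 53 percent chance the effect is real, which is the commenter's point: against a skeptical prior, a marginal p-value is close to a coin flip.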
What say you? You are the expert at these things, not I.