Overthinking It: Spoiling the Bunch

May 3, 2012

If you’ve paid any attention to the 2012 season, you know that Albert Pujols has yet to hit a home run. The three-time MVP, fresh off the first homerless month of his career, is hitting just .208/.252/.287 with career-worst walk and strikeout rates. Jered Weaver’s no-hitter last night temporarily deflected some attention away from Albert’s struggles. But while Weaver mowed down Minnesota, Pujols’ homerless streak was extended to 107 plate appearances, ensuring that scrutiny of his every swing will only intensify once the no-hitter hubbub dies down.

Pujols averaged 39 home runs for the Cardinals over the past five seasons. After factoring in some age-related decline and the difficulty of hitting home runs from the right side in Angel Stadium, PECOTA projected him to hit 33 in 2012. The probability that a 33-home-run hitter would go homerless over 107 plate appearances by chance alone is just 0.3 percent. Either Pujols has been extremely unlucky, he’s declined more quickly than PECOTA expected, or he’s pressing at the plate.
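
A figure of that size can be roughly reproduced with a back-of-the-envelope calculation. The sketch below treats every plate appearance as an independent trial and spreads the 33-homer projection over 625 plate appearances. That playing-time figure is an assumption made for illustration, not a number taken from PECOTA, and the function name is likewise hypothetical.

```python
# Chance of a homerless streak, treating each plate appearance (PA) as an
# independent trial with the same home-run probability.
PROJECTED_HR = 33    # PECOTA projection cited in the article
PROJECTED_PA = 625   # ASSUMPTION: full-season playing time, not from the article
STREAK_PA = 107      # length of the homerless streak


def homerless_probability(hr: int, pa: int, streak: int) -> float:
    """Return P(no home runs in `streak` PA) for a hitter with hr/pa power."""
    hr_rate = hr / pa
    return (1 - hr_rate) ** streak


if __name__ == "__main__":
    p = homerless_probability(PROJECTED_HR, PROJECTED_PA, STREAK_PA)
    print(f"Chance of {STREAK_PA} homerless PA: {p:.2%}")
```

Under these assumptions the result lands near three-tenths of one percent; a different playing-time assumption would shift it somewhat.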

Privately, Pujols is probably feeling some pressure. Publicly, though, he claims to be unconcerned. “I don’t think about that, man. It could be tomorrow, maybe the next day, a month from now, I don’t know. My job is to get myself ready to play and take my swing. Home runs, when they come, they come in bunches.”

At this point, Pujols would probably settle for hitting homers in dribs and drabs, never mind bunches. According to his comments, though, when he does start hitting homers, they’ll add up in a hurry. But can Albert be believed?

The belief that home runs are hit in bunches (in other words, that they’re hit in flurries followed by droughts, rather than at regular intervals) isn’t unique to the struggling Angels star. When Bryan LaHair went homerless this spring, Cubs manager Dale Sveum said, “People forget that home runs come in bunches.” Since then, they have for LaHair, who has hit six in the regular season, if not for the Cubs, who have collectively hit fewer home runs than Matt Kemp. But the history of “home runs in bunches” goes back well beyond Bryan LaHair. Writers and players alike have been referring to the idea at least since the middle of last century: in 1958, Willie Mays said, “When I hit home runs I get them in bunches and then no more for a time.”

Is there anything to this, or is “home runs are hit in bunches” another baseball myth that deserves to be busted? Google “clutch hitting,” “pitching to the score,” or a host of other time-honored baseball beliefs, and you’ll find countless studies that have tried and failed to find any statistical evidence supporting them. The contention that homers are hit in bunches seems to have escaped investigation so far, but it’s just as easy to check.

Using a statistical concept called the binomial distribution, we determined the theoretical rates of zero-, one-, two-, three-, and four-homer games for the average major-league batter. By comparing those predicted rates to how often those games actually occurred, we could see whether there was anything to the idea that home runs are hit in bunches. If players actually alternate between home-run hot streaks and dry spells, their long balls would be bunched together, and we would see higher rates of two- and zero-homer games and lower rates of one-homer games than predicted.
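
For readers who want to see the shape of that calculation, here is a minimal sketch of a binomial model for homers per game. It assumes a fixed number of plate appearances per game and a constant home-run rate per plate appearance. Both inputs below are illustrative values, and the article does not spell out how varying plate appearances per game were handled, so this is not a reconstruction of the actual study.

```python
from math import comb


def hr_game_distribution(hr_rate: float, pa_per_game: int, max_hr: int = 4) -> list[float]:
    """P(exactly k HR in a game) for k = 0..max_hr under a binomial model."""
    return [
        comb(pa_per_game, k) * hr_rate**k * (1 - hr_rate) ** (pa_per_game - k)
        for k in range(max_hr + 1)
    ]


if __name__ == "__main__":
    # ASSUMPTIONS: 4 PA per game and a 2.8% HR-per-PA rate, chosen for illustration.
    for k, prob in enumerate(hr_game_distribution(hr_rate=0.028, pa_per_game=4)):
        print(f"{k} HR: {prob:.2%}")
```

Multiplying each probability by a player's games played gives the expected number of zero-, one-, and two-homer games to set against his actual totals.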

Over small samples, of course, some players do have more two-homer games than predicted. In 78 games last season, Mike Cameron hit nine home runs, six of which came in three two-homer bunches. Those three two-homer games were about 2.6 more than the model would have predicted. Cameron was one of five players to have at least two more two-homer games than he “should” have in 2011:

Name	PA	G	Pred. 2-HR G	Actual 2-HR G	Difference
Russell Martin	476	122	0.96	4	3.04
David DeJesus	506	127	0.31	3	2.69
Ian Kinsler	723	155	2.33	5	2.67
Mike Cameron	269	78	0.42	3	2.58
Mark Reynolds	620	155	3.00	5	2.00

Over larger samples, though, we don’t see correspondingly large differences. Of the 211 players with at least 3000 plate appearances from 2002-2011, only nine had at least five more two-homer games than expected:

Name	PA	G	Difference (2-HR G)
Vladimir Guerrero	6015	1414	8.68
Carlos Beltran	5851	1344	7.22
Chase Utley	4778	1109	6.66
David Ortiz	6039	1411	6.26
Adam LaRoche	4022	1021	5.89
David Wright	4783	1106	5.54
Ty Wigginton	4526	1176	5.38
Vernon Wells	6295	1455	5.36
Andruw Jones	5083	1250	5.11

It’s possible that Vlad’s homers had some slight tendency to be “bunched,” but even in his case, it’s likely that the difference was due to chance.

So what’s the verdict when we look at home-run distributions for all players? The following table shows the predicted and observed percentages of games in which an average major-league batter hit each number of home runs from 1994-2011. The model predicted that the average player would go homerless in 89.29 percent of his games, hit one homer in 9.99 percent of his games, and hit two homers in 0.68 percent of his games. The predicted and observed results are almost identical, and the slight differences aren’t significant.

Pct. of games	0 HR	1 HR	2 HR	3 HR	4 HR
Predicted	89.29	9.99	0.68	0.03	0.00
Observed	89.09	10.17	0.71	0.03	0.00

Here is what those percentages look like for Pujols’ career. As one would expect, both the theoretical model and the in-game results show that he’s been much more likely to go deep than the typical player, but he hasn’t had more multi-homer games than expected.

Pct. of games	0 HR	1 HR	2 HR	3 HR	4 HR
Predicted	75.75	21.57	2.52	0.15	0.01
Observed	75.57	21.77	2.41	0.26	0.00

So why do Pujols and so many other players mistakenly believe that they’re hitting home runs in bunches? A cognitive bias called the "availability heuristic" might be to blame. According to Amos Tversky and Daniel Kahneman, the psychologists who coined the term, the availability heuristic is our “tendency to make a judgement about the frequency of an event based on how easy it is to recall similar instances.” The easier it is to summon instances of an event to our minds, the more often we believe that event to occur. For hitters, few events are more memorable than a multi-homer game or a long stretch without hitting a homer, so it’s not surprising that those events seem to them to happen more often than they do.

Home runs aren’t really hit in bunches, but it’s probably in the Angels’ best interests not to burst Albert’s bubble. There could be some psychological benefit to believing in bunches. In the midst of a home-run barrage last May, Mark Teixeira explained his success by saying, “Home runs come in bunches, and right now I’m just in one of those streaks where I’m hitting them out of the park a lot.” After ending the longest homerless stretch of his career in July of 2009, Teixeira used the same reasoning to explain his struggles: “I’m a streaky home run hitter. They come in bunches, and after hitting a bunch in a row, it took a while to get another one.”

Teixeira’s all-purpose explanation suggests that while hitting homers in bunches isn’t fact, it is a useful fiction. One of the most important qualities for a hitter to have is confidence, and the “bunches” belief provides a confidence boost for any occasion. A player who has homered recently can go to the plate believing he’s mid-bunch and about to hit another. A player who hasn’t homered in ages can console himself with the thought that a bunch of long balls could be a game away. What Albert Pujols could really use right now is a homer. But some confidence can’t hurt.

Colin Wyers provided research assistance for this article.

A version of this story originally appeared on ESPN Insider.

Ben Lindbergh

geer08

5/03

Do you think Pujols (and other 'bunches' proponents) meant per game, or in any given week or month? Kemp just hit 12 in April, but with only one 2-homer game.

As with all April stats and streaks, would this be anywhere near as big a story if a) Albert were still a Cardinal or b) this happened in August? Or May, even?

bornyank1

5/03

John, I don't think Pujols and others are referring to multi-homer games specifically. However, the way Colin Wyers put this to me when I asked him about it is, "A game is just a really short streak." In other words, if hitters went through periods where they were "bunching" homers, we'd see that reflected in the rate of multi-homer games as well as over longer stretches. Looking at it this way was less computationally intensive.

This wouldn't be as big a story if Pujols hadn't just changed teams and signed a giant contract, or if it had happened later in the season. But it's definitely reached the point at which it's a valid cause for concern.

geer08

5/03

Thanks Ben. Any data on NL hitters moving to the AL? Also, how has Angel Stadium played in April? Seems the announcers referenced a damp, thick air at night in April (heard this during the Orioles series there from Palmer or Thorne)...anything to it?

sensij

5/03

A game is not just a short streak, it is a really, really short streak. Perhaps a way to do it would be to look at a histogram of AB's between homers. Conceptually, hitting them in streaks would show up as bi-modal... a cluster around the streak, and a cluster around the typical non-streak. You could normalize each player by his average AB's between homers, and test for a trend among all players.
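
A minimal sketch of the gap-based check described in this comment, assuming a hypothetical chronological list of at-bat outcomes (True for a home run). The function names and input format are illustrative only. If at-bats are independent, the gaps between homers should follow a geometric distribution, so an excess of both very short and very long gaps would be the signature of bunching.

```python
def gaps_between_homers(outcomes: list[bool]) -> list[int]:
    """Return the number of at-bats from each homer to the next one.

    `outcomes` is a chronological list, True for a home run. At-bats before
    the first homer and after the last one are ignored.
    """
    homer_indices = [i for i, is_hr in enumerate(outcomes) if is_hr]
    return [later - earlier for earlier, later in zip(homer_indices, homer_indices[1:])]


def geometric_pmf(gap: int, hr_rate: float) -> float:
    """P(next homer arrives exactly `gap` at-bats later) if at-bats are independent."""
    return (1 - hr_rate) ** (gap - 1) * hr_rate


if __name__ == "__main__":
    # Made-up outcomes for illustration only.
    sample = [False, True, False, False, True, True, False]
    print(gaps_between_homers(sample))
```

A histogram of the observed gaps can then be compared against `geometric_pmf` evaluated at the player's own home-run rate.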

flyingpickle

5/03

I get that theory. I just don't think it's useful. If a player picks up a homer in his 3rd PA of a game, he's probably only got one more chance to pick up a 2nd HR in that game. So the window of opportunity to see a "bunching" trend is pretty limited.

I understand why it's "less computationally intensive". I just want somebody to do that heavy lifting! ;-)

Good article. Thanks Ben!

rmnd3b

5/03

Nice work. One question though-could the "bunches" idea be applied to multiple home runs over a span of two or three games rather than just looking at single multi-HR games? That was the first thing that popped into my mind when I read the first paragraph.

bornyank1

5/03

Thanks. See my comment above.

rmnd3b

5/03

Yeah-I think we posted the questions at the exact same time :P

zasxcdfv

5/03

Great article - absolutely worth my 11 cents today!

aaronbailey52

5/03

Pujols 2011: hr/ games
3/31-4/13: 1/12 drought
4/14-4/23: 6/9 bunch
4/24-5/29: 1/33 drought
5/30-6/7: 6/8 bunch
6/8-8/21: 17/51 steady
8/22-8/30: 0/7 drought
8/31-9/1: 3/2 bunch
9/2-9/28: 3/25 drought

2010: hr/games (6 multihomer games)
4/5-4/12: 5/7 bunch
4/14-5/26: 3/39 drought
5/27-6/6: 6/10 bunch
6/7-6/26: 1/17 drought
6/27-7/3: 5/7 bunch
7/4-7/30: 3/22 drought
7/31-8/27: 12/23 bunch
8/28-9/7: 0/10 drought
9/8-9/12: 4/5 bunch
9/13-9/22: 0/9 drought
9/23-9/26: 3/4 bunch
9/27-10/3: 0/6 drought

bunch: 60 homers in 75 games (130 hrs per 162 games = God himself)
drought: 12 homers in 174 games (11 hrs per 162 games = Casey Kotchman)
steady: 17 homers in 51 games (54 hrs per 162 games = prime Pujols)

aaronbailey52

5/03

I carried an extra 10 somehow. His bunch pace is actually 50/75 games, or only 108 hrs per year. Doesn't change the PECOTA comp, though.

flyingpickle

5/03

As others have posted, I'd be curious in seeing something like the following...

Following a game in which a player homered, what is the predicted chance he will homer in the following game and what is the actual chance he homered in the following game?

Repeat for the chances (predicted and actual) that he would/did homer within either of the next TWO games.

And, maybe for good measure, extend that out to the next 3 or 4 games.

That's what I'd be more curious to see rather than chance of them having multi-homer games.
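
A minimal sketch of the one-game version of that comparison, assuming a hypothetical list of a player's home-run totals by game in chronological order. The function name and input format are illustrative, not from the article. If homers are not bunched, the rate in games that follow a homer game should sit close to the player's overall rate.

```python
def homer_rate_after_homer(hr_by_game: list[int]) -> tuple[float, float]:
    """Return (overall rate of games with a homer, rate in games following a homer game)."""
    homered = [hr > 0 for hr in hr_by_game]
    overall = sum(homered) / len(homered)
    # Keep only the games whose previous game included at least one homer.
    followers = [nxt for prev, nxt in zip(homered, homered[1:]) if prev]
    after_homer = sum(followers) / len(followers) if followers else float("nan")
    return overall, after_homer


if __name__ == "__main__":
    # Made-up game log for illustration only.
    overall, after = homer_rate_after_homer([1, 0, 2, 1, 0, 0])
    print(f"overall: {overall:.3f}, after a homer game: {after:.3f}")
```

Extending the window to the next two, three, or four games only requires checking a longer slice after each homer game.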

piraino

5/03

+ 1

craig643

5/03

I think you have to adjust for park effects, no? My guess is any additional "bunching" in observed vs. straight binomial would be, at least in part, due to park effects (both within a game and within a multigame series).

BurrRutledge

5/03

Ben, I think we can all agree that home runs are not distributed evenly, yes? Pujols may hit 39 HR per 162 games, but that doesn't mean he hits 1 HR every 4 games or so. It's not clockwork. It's not even Old Faithful.

Therefore, if it's not distributed exactly evenly, then they must be unevenly distributed... in bunches. QED.

dodgerken222

5/04

Problem isn't that Dee Gordon has more HRs than Albert...problem is that Dee has more RBIs too.

piraino

5/04

The really smart question here would be whether home runs are serially random, or are they bunchier (positive serial correlation) or less bunchy (negative serial correlation) than they would be if they were serially random. Whip out your ARIMA modeling tool kit. Nonetheless I think this article does a good job showing a more intuitive analysis that is related to the "smart" analysis. Given the findings of this article, I would be surprised if it is possible to reject a null hypothesis of no serial correlation out to a lag of, say, 50 plate appearances. Surprised, but not shocked. It's still worth doing.
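
As a starting point short of a full ARIMA treatment, here is a minimal sketch of a sample autocorrelation at a single lag. It assumes a hypothetical chronological sequence of plate-appearance outcomes coded 1 for a home run and 0 otherwise; the function name is illustrative. Values near zero at every lag are what serial randomness would look like, while positive values at short lags would point toward bunching.

```python
def lag_autocorrelation(outcomes: list[int], lag: int) -> float:
    """Sample autocorrelation of a 0/1 outcome sequence at the given lag."""
    n = len(outcomes)
    if lag < 1 or lag >= n:
        raise ValueError("lag must be between 1 and len(outcomes) - 1")
    mean = sum(outcomes) / n
    variance = sum((x - mean) ** 2 for x in outcomes)
    if variance == 0:
        raise ValueError("outcomes must contain both homers and non-homers")
    covariance = sum(
        (outcomes[i] - mean) * (outcomes[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


if __name__ == "__main__":
    # Made-up alternating sequence: strongly negative at lag 1, positive at lag 2.
    sample = [1, 0] * 4
    print(lag_autocorrelation(sample, 1), lag_autocorrelation(sample, 2))
```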

Dodger300

5/04

Others have touched on this, but in a million years it would never have occurred to me that when a player says he hits home runs in bunches, it meant in the SAME GAME.

The same home stand, the same week, the same month, okay. But not the same game. Talk about a small sample size.

I'm not the least bit surprised that the analysis discovered basically nothing. A lot of effort por nada.

However, coming up with a column named "Overthinking It" has now been revealed to have been a perfect exercise in thought! :)

bornyank1

5/04

Believe me, I get it. I'm aware that Pujols wasn't referring to multi-homer games specifically. That was my first reaction upon seeing the data, too. Most people would take "bunches" to mean many homers over a short period of time. But if you're going to get that kind of distribution, you're also going to get a greater-than-expected number of multi-HR games. It's extremely unlikely that you'd have one without the other.

Dodger300

5/05

Through April 28, in his first 20 games, Giancarlo Stanton had zero home runs and was sporting a pathetic OPS of .598.

Six games later Stanton has four home runs and has raised his OPS to .804.

That is hitting home runs in bunches. But there was not even a single multiple home run game in the "bunch."

bornyank1

5/05

You're right. But you're also talking about one example of a bunch, for one particular player, whereas I was talking about every player over a sample of 18 seasons. I don't deny that home runs are occasionally hit in bunches, just as hits are sometimes clutch and pitchers sometimes pitch just well enough to win. What we have to know, in order to support or refute a blanket statement like "home runs are hit in bunches," is whether those events are consistent and repeatable. If Stanton and other players always hit home runs like that--in bunches, followed by long dry spells--it's very unlikely that those bunches would not result in more multi-homer games than predicted.

BurrRutledge

5/05

What if we were to look at this from the other side? How would a researcher set out to prove that home runs are evenly distributed?

One way would be to investigate the likelihood of multiple-HR games vs. the observed data. But I think we're in agreement that multi-homer games are not going to capture the essence of what we all think of as "in bunches."

So, what other ways might one go about it?

Dodger300

5/08

I doubt very much that home runs hit in bunches will prove to be as speculative as clutch hitting or pitching to the score is. It would be extremely premature to make such an assumption at this point.

However, until someone gives it some serious study, I guess that none of us know.

I think that study will definitely need to deal with longer periods of time, probably up to one month. One game just seems too limited. It is barely more revealing than trying to determine the answer by looking at how often home runs come in consecutive at-bats.

BTW, sure, it is anecdotal, but Giancarlo Stanton now has raised his OPS from .598 to .840 at this moment, by hitting six home runs in his last nine games. That's a bunch.

And Matt Kemp is now oh-for-May.
