Image credit: USA Today Sports

Note: This is based on a presentation at the SABR Analytics Conference on Saturday, March 10. Audio of the presentation is here and presentation slides are here.

Part 1 of this article described the history of attempts to identify clutch hitting, dating back over 40 years. While the general sabermetric consensus is that clutch hitting does not exist as an identifiable and replicable skill, the topic remains controversial.

Using play-by-play data, we identified the difference between situation-dependent (i.e., weighted by win probability) and non-situation-dependent (neutral) run creation for 8,963 batters with 500 or more plate appearances since World War II. We calculated a z-score for each player to identify whether his performance was clutch (positive z-score) or unclutch (negative).

To consider the existence (or not) of clutch hitting as a skill, we looked at the distribution of z-scores for individual hitters. We first identified the players who were, by our methodology, most frequently among the best clutch hitters. Bill James once noted that a good statistic is one that gives us new insight while confirming much of what we already knew. His first mass market Baseball Abstract, published after the strike-shortened 1981 season, introduced his Runs Created formula to a wide audience. Among the top hitters in baseball, per Runs Created, were familiar names such as Mike Schmidt, Andre Dawson, George Foster, Keith Hernandez, and Eddie Murray. Some of the leaders, such as Dwight Evans, Rickey Henderson, and Bobby Grich, were surprises to many. We learned that players who draw walks and hit doubles can be as valuable as those who hit homers and get a lot of RBIs. Runs Created provided new information while confirming much of the then-extant previous knowledge.

The list of the best hitters by our clutch metric is, well, curious. Here are the players who finished in the theoretical top five percent (i.e., z-score 1.645 or greater) for clutch hitting most frequently:

Taking nothing away from Bert Campaneris, he seems an odd choice for a consistent clutch leader. The list of players to finish in the top five percent four times includes some expected results along with several head-scratchers.

The list of players most frequently in the bottom five percent is similarly inconsistent:

Several of those 28 players were stars. Six are in the Hall of Fame, seven are on a Hall of Fame trajectory, and four more would likely be in were it not for PEDs. Are they all unclutch?

These lists would seem to violate James’ observation that statistics should confirm much of what we already know. (Or, alternatively, they are an example of Twyman’s Law: If a statistic looks interesting or unusual, it is probably wrong.) Do Bert Campaneris and Manny Ramirez represent the opposite poles of clutch hitting skill? Or is clutch hitting random, varying widely from year to year, more indicative of chance than a skill?

The second feature we considered is consistency. A valid skill should be statistically replicable, occurring with some regularity. The five percent threshold above is high, but it is not unusual for the top performers in the game. Rod Carew finished in the top five percent of batting average 11 times. Ted Williams was in the top five percent of walk rate nine times. Harmon Killebrew was in the top five percent of home run rate eight times. Randy Johnson was in the top five percent of strikeout rate 12 times. Excellence is rare, but it does exist, and if clutch hitting is a skill, we should not be surprised to see batters consistently appear at the extremes of the distribution.

What we should be surprised to see, though, is batters who appear at both extremes. We don’t expect Rod Carew to bat .225, or Ted Williams to walk in four percent of plate appearances, or Harmon Killebrew to hit one homer every 70 at-bats, or Randy Johnson to strike out four batters per nine innings. Even for one season.

However, we found it not uncommon for batters to hit well below average in clutch situations in some seasons and well above average in those situations in other seasons. From 1946 to 2017, there were 120 players who appeared in the top five percent and the bottom five percent at least once in their career. And a dozen were in both the top five percent and the bottom five percent two times or more:

To provide an example of what a top five percent and bottom five percent year looks like from a clutch perspective, we’ll examine the five seasons noted above for David Ortiz, using his plate appearances with a leverage index above 1.5. (For a frame of reference, these high-leverage plate appearances accounted for 18 percent of plate appearances in 2017 and, as noted earlier, batters performed almost identically in high-leverage and all other plate appearances.)

We’ve also included FanGraphs’ “clutch” metric, which, similarly to our z-scores, compares a player’s Win Probability Added (WPA) in high-leverage situations to his overall performance. “Clutch” values above 1.0 are considered clutch, and below -1.0 considered unclutch.

  • 2005: z-score 2.66, 3.31 Clutch, 1.312 OPS in high-leverage situations, .928 in other plate appearances
  • 2006: z-score 1.81, 1.48 Clutch, 1.100 OPS in high-leverage situations, 1.037 in other plate appearances
  • 2007: z-score -2.25, -1.68 Clutch, 1.019 OPS in high-leverage situations, 1.079 in other plate appearances
  • 2011: z-score -2.70, -1.61 Clutch, .846 OPS in high-leverage situations, .981 in other plate appearances
  • 2013: z-score -2.07, -1.08 Clutch, .867 OPS in high-leverage situations, .983 in other plate appearances

It is tempting to think that clutch hitting is a skill that players acquire with age, but for every one of the above batters (other than Ortiz and Rose), the good and bad years were interspersed throughout their careers. It is incongruous to suggest that a skill that certain players possess would vary so wildly over a player’s career. Often, players acquire power and plate discipline while losing speed and fielding ability as they age. But players do not swing from one extreme to another over the course of their careers for these replicable skills. Gary Sheffield was one of the best clutch hitters in baseball when he was 21 and again when he was 33. He was one of the worst when he was 27 and 31. That’s just not how replicable baseball skills work.
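For a rough sense of how often chance alone would put a player at both extremes, here is a minimal sketch. It assumes, purely for illustration, that each qualifying season is an independent draw with a five percent chance of landing in each tail; the function name and the independence assumption are ours and are not part of the study's methodology.

```python
def prob_both_tails(n_seasons: int, tail: float = 0.05) -> float:
    """Chance of at least one top-tail AND at least one bottom-tail season.

    Assumes (hypothetically) that seasons are independent and that each
    season falls in the top tail with probability `tail` and in the
    bottom tail with probability `tail`.
    """
    no_top = (1 - tail) ** n_seasons         # never in the top tail
    no_bottom = (1 - tail) ** n_seasons      # never in the bottom tail
    in_neither = (1 - 2 * tail) ** n_seasons  # never in either tail
    # Inclusion-exclusion: P(top and bottom) = 1 - P(no top) - P(no bottom) + P(neither)
    return 1 - no_top - no_bottom + in_neither


if __name__ == "__main__":
    for n in (5, 10, 15):
        print(f"{n} qualifying seasons: {prob_both_tails(n):.3f}")
```

Under these assumptions the chance grows quickly with career length, which is the kind of pattern one would expect if season-level clutch z-scores were mostly noise.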

Conclusions

This analysis, which builds on Pete Palmer’s and Dick Cramer’s research from the 1970s with a much more robust data set, illustrates continued difficulty in identifying clutch hitting as a replicable skill.

Again, this is not to say that clutch hitting does not exist. Over the course of our research, we noted some remarkable clutch hitting seasons. Pete Reiser in 1946 batted .338/.421/.541 in high-leverage plate appearances and .248/.333/.383 in his other plate appearances. Charlie Maxwell in 1960 hit .281/.348/.587 in high leverage and .222/.315/.391 otherwise. Both seasons equated to a z-score over 4.0, a 0.003 percent probability.
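The thresholds quoted in this article can be checked against the standard normal distribution. The sketch below uses Python's standard library; treating the z-scores as standard normal and treating the 8,963 figure as a count of qualifying seasons are both assumptions made for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def upper_tail(z: float) -> float:
    """Probability of a z-score at least this large under a standard normal."""
    return 1.0 - STD_NORMAL.cdf(z)


# z-score that cuts off the theoretical top five percent
top5_cutoff = STD_NORMAL.inv_cdf(0.95)

# Probability of a season with a z-score above 4.0
p_z4 = upper_tail(4.0)

# Assumption: 8,963 is the number of qualifying seasons in the sample
N_SEASONS = 8963
expected_z4 = N_SEASONS * p_z4

if __name__ == "__main__":
    print(f"Top five percent cutoff: z = {top5_cutoff:.3f}")
    print(f"P(z > 4.0) = {p_z4:.6%}")
    print(f"Expected seasons above 4.0 by chance: {expected_z4:.2f}")
```

If the assumptions hold, chance alone would be expected to produce less than one such season in the whole sample.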

Three players had z-scores on the other side of the spectrum as well: Bill Mueller in 2003 (-3.03 Clutch, .223/.297/.330 in high leverage, .349/.421/.586 otherwise), Alex Rodriguez in 2008 (-3.13 Clutch, .264/.372/.434 in high leverage, .312/.398/.609 otherwise), and Aaron Judge in 2017, whom FanGraphs’ Travis Sawchik proclaimed “the least clutch player on record” (-3.64 Clutch, .219/.361/.500 in high leverage, .298/.435/.655 otherwise). Every year, batters have notable success and failure in clutch situations.

However, we found no evidence that clutch hitting is a replicable skill. For that to be the case, we would see players repeat at the top (or bottom) of the charts, year after year. These are the year-by-year z-scores of the aforementioned Ortiz, heralded as the greatest clutch hitter of this generation, possibly all time:

Season Z Score
2003 +0.04
2004 -1.34
2005 +2.66
2006 +1.81
2007 -2.25
2009 +0.14
2010 -0.67
2011 -2.70
2013 -2.07
2014 +0.94
2015 -1.38
2016 -1.40

There is no consistent pattern of success (nor failure).
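The same point can be tallied directly from the table. This short sketch uses only the z-scores listed above and the 1.645 cutoff; the variable names are ours.

```python
# Ortiz's season z-scores, copied from the table above
ORTIZ_Z = {
    2003: 0.04, 2004: -1.34, 2005: 2.66, 2006: 1.81,
    2007: -2.25, 2009: 0.14, 2010: -0.67, 2011: -2.70,
    2013: -2.07, 2014: 0.94, 2015: -1.38, 2016: -1.40,
}

CUTOFF = 1.645  # theoretical five percent threshold used in the article

# Seasons in the theoretical top and bottom five percent
top_years = [year for year, z in ORTIZ_Z.items() if z >= CUTOFF]
bottom_years = [year for year, z in ORTIZ_Z.items() if z <= -CUTOFF]

# Simple average of the listed season z-scores
mean_z = sum(ORTIZ_Z.values()) / len(ORTIZ_Z)

if __name__ == "__main__":
    print("Top five percent seasons:", top_years)
    print("Bottom five percent seasons:", bottom_years)
    print(f"Mean z-score over {len(ORTIZ_Z)} seasons: {mean_z:.2f}")
```

The listed seasons include more bottom-tail years than top-tail years, and the simple average of the z-scores is below zero.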

As we have illustrated, clutch performance for players often swings wildly from one season to the next. This, we feel, is a greater indictment of the concept of clutch hitting than is the apparent clutch-ness of Bert Campaneris, Vince Coleman, and Tony Womack, and the apparent unclutch-ness of Manny Ramirez, Frank Thomas, and Barry Bonds. A viable statistical metric must be replicable, with results generally consistent over time. Our measure of clutch hitting—the excess performance of a hitter in high win-expectancy plate appearances compared to others—fails to meet this test. We therefore echo Cramer’s conclusion from 41 years ago that while clutch hitting may exist as a feature, it does not exist as a repeatable skill.


Pete Palmer is the co-author with John Thorn of The Hidden Game of Baseball and co-editor with Gary Gillette of the Barnes and Noble ESPN Baseball Encyclopedia (five editions). Pete worked as a consultant to Sports Information Center, the official statisticians for the American League, from 1976 to 1987. Pete introduced on-base average as an official statistic for the American League in 1979 and invented on-base plus slugging (OPS). He won the SABR Bob Davids Award in 1989, was selected by SABR in 2010 as a charter recipient of the Henry Chadwick Award, and is the 2018 recipient of the SABR Analytics Conference Lifetime Achievement Award.

Leonard V.
3/15
I think the weight of evidence seems to be against clutch hitting as a skill, but this analysis and many of the others I've seen use very odd metrics to find it. Or, perhaps I'll say that the metrics they use sacrifice alignment with the intuitive concept of clutch hitting for ease of calculation. Win Probability is a great statistic that captures a lot of the idea of what it would mean to add value to a team. Runs Created and its derivatives are great statistics that capture a lot of replicable skill of hitting. Weighting Runs Created by Win Probability is a confusing mess, IMO. It's not clear whether the article means that you take the RC value of a plate appearance and multiply it by the WPA of that appearance, then sum up for all PAs, divide by total WPA, then subtract total RC, or if you somehow increase the value of each PA based on how close to 50% Win Probability the game state was, or if the calculation is even broader like translating WPA into a number of runs then subtracting RC. In any of these cases, though, I don't really understand what's being measured. And purely subtracting two measures that are not at the same scale for hitters of different caliber ensures that only the best hitters can score the best. If you want to measure whether a hitter performs better when the pressure is on (a good definition of "clutch" hitting), you have to filter the statistics against a reasonable definition of clutch hitting _from the hitter's point of view_. 2nd & 3rd, 2 outs in the bottom of the 1st of a 0-0 game may actually have a reasonably high potential for WPA, but while the hitter might feel some pressure to deliver in that spot, the "clutch" factor isn't as strong. 
Likewise, simply batting at all in a close game in the late innings has a higher potential for WPA than some RBI situations earlier in the game, but most people don't include a tally of fly outs made with one out in the 9th while the team leads by 1 as negative points against someone's clutch ability. Most of the time, when the team is already leading, those at-bats are not really considered key ones. Nice to get some insurance, but that's not where people look for clutch hitting. Adding these extra values will tend to wash out any effects of hitting in the real clutch situations, making the results far more random. The team's situation within the season also matters - later in the season, closer to but not assured of contention, playing a rival, etc. None of this is to say that a far more detailed analysis would turn anything real up - obviously many efforts have been tried with stuff like late & close RBIs or detailed analyses of Ortiz's career, and human brains love to find patterns in noise. The ratio of pure statistical Runs Created and WPA seems like an interesting measure, not of clutch hitting as it is usually perceived, but of situational hitting - doing the right thing to get your team runs. It doesn't capture that entirely, though - it gets at stuff like swinging for the fences when a homer would do it or shortening up and just trying to get on base when you aren't enough to tie the game alone, but misses other potentially important factors like expanding the zone in a key RBI spot when the hitter behind you is significantly worse, stretching or stealing your singles into doubles in front of low-ISO hitters, but not taking the bat out of the hands of Mike Trout. All the stuff that relies on awareness of not just the game state but the upcoming hitters.
Rob Mains
3/15
Leonard, we broke down the methodology in detail in Part One. Short version: Pete looked at the win expectancy change of each plate appearance and converted the total to runs, so it's apples-to-apples compared to the linear weight runs of the player's total statistics. And while it's true that players and fans may not perceive the clutch-ness of PAs the same way the numbers do, win expectancy tables have the advantage of being empirically related to wins.
RedsManRick
3/15
Great stuff. Fair to add though that even a .003 probability season is 1 in 333. Even limited to 500 PA seasons, there are roughly 150 such hitter seasons every year, meaning we should expect such an extreme season once every 2 or 3 years. Even those extreme seasons simply aren't that rare given the sample sizes we observe. However, your observation in the final paragraph got me thinking: maybe we're looking at the less interesting question. I'm sure people think they have a working definition of clutch, but I don't know of any person who made a determination of clutch on the basis of rational consideration/analysis. So we do analyses like this based on a definition of clutch that is reasonable enough (and which clutch believers may even sign on to), but which may not actually map to what clutch means in practice. Accordingly, we can establish that this definition of clutch-as-a-skill likely does not exist without doing much to change the conversation. It would be interesting to start with a list of subjectively determined "clutch" players and then try to reverse engineer how people make the clutch determination in practice. Perhaps it's mostly a function of batting average w/ RISP (or, more likely, H/PA) or RBI production instead of more robust measures of production. Perhaps the clutch label is determined almost entirely in the first few years of a player's career. Perhaps it's based on a few high profile events, or established by 1 great clutch season. Likely all of these play into it. Broader point, I suspect there's some element of missing the point when we choose to analyze an intuitive judgment like "that player is clutch" based on formalized holistic performance measures like OPS or wOBA and "high leverage" situations. Perhaps the question "What do people mean when they call a player clutch" would be more insightful than proving and re-proving that a formalized conceptualization of clutch is not a skill that players might possess.
Rob Mains
3/15
Redsmanrick, it's a .003 *percent* probability season. The odds of that are one in 33,333. And since there have been only about 9,000 500-PA seasons since WWII, those are extreme outliers. Re the definition of clutch, that's why I didn't close the door on Bill James's observations that maybe we don't have the tools yet--the definitions we have may not be the right ones.
Michael Cappucci
3/15
Thank you for this contribution to the clutch hitting literature. Two things strike me about the Quixotic search for clutch hitting in baseball. 1) Implicit in most people's definitions of "clutch" hitting is an added requirement that the player perform better in certain higher-leverage situations ~when I'm watching.~ It's hard to imagine any statistical evidence rebutting Yankees and Red Sox fans' sure knowledge of Derek Jeter's or David Ortiz's clutchness in light of their salient October moments, despite the fact that both experienced epic bouts of post-season ineptitude during their long careers. I think there is a subjective (to the viewer) element inherent in most people's conceptions of clutch hitting that does not lend itself to proof or disproof. 2) I don't think clutch is a thing. It's an adjective we apply to particular outcomes in particular situations that does not exist in the brains (or hearts) of flesh and blood humans. If God (or an evil demon) were to peer into the brains (or hearts) of both clutch and un-clutch baseball players at the moment of truth, I don't think they would look any different, because there is no a priori fact about someone's clutchness that exists before the outcome of the at-bat. This is consistent with the view that clutch = skill + luck.
Rob Mains
3/15
Michael, at the conference, some sharp Red Sox fans said that after Ortiz's 2003 postseason and the followup seasons, displayed here, in 2004 and 2005, his reputation was set in stone in fans' minds, amplifying your points.
newsense
3/15
Seeing Manny Ramirez, ARod, Frank Thomas, and Barry Bonds at the bottom of the clutch list made me wonder if intentional walks may be skewing some of the results.
Rob Mains
3/16
They could in the sense that they're robbing those players (esp Bonds) of potential increases in win expectancy, but IBBs increase WE as well.
gwatson
3/15
Very interesting. Thanks for doing this work and sharing...Question: You imply age (or experience) as a potential mediator of clutchiness (and note variability across a career as evidence to reject the premise). What other, if any, mediating variables would you like to test, assuming you had the data to do it (player, manager, clubhouse, franchise, league level factors)? Are there any theories you would want to test if you had your dream dataset?
Rob Mains
3/16
I got some great suggestions at the conference and elsewhere. The most actionable have to do with opportunity. I'm interested whether some players have fewer high-win-expectancy-swing plate appearances than others. There are probably some analyses that could be done based on teammates and managers but I have a hard time conceptualizing them. Clutch pitching is an interesting concept that would apply primarily to relievers but I'd be really surprised if there were any significant difference between relievers' overall quality and their quality in leveraged outings.
JohnnyB
3/15
I grew up watching Frank Thomas and just by the eye test, I thought he was a terrible clutch hitter. I always thought his stats were very empty. So, if nothing else, this confirms to me that you are on to something.
Rob Mains
3/16
I would hope, Johnnyb, that the takeaway here is that clutch hitting isn't a distinct and replicable skill, not that Frank Thomas was un-clutch and Bert Campaneris was clutch. Thomas, as the link shows, was a GREAT hitter in high-leverage situations. Just a little worse than he was in other situations https://www.baseball-reference.com/players/split.fcgi?id=thomafr04&year=Career&t=b#all_lever
Christopher Robertson
3/16
Just a thought - could you be seeing high contact hitters and/or very speedy players picking up clutch hits because they are putting balls in play against situational infield alignments?
Rob Mains
3/16
Fair point, Christopher, given the likes of Campaneris and Womack on the list. However, on Twitter, MGL suggested--I'll work on this and publish results--that hitters like those two have fewer clutch opportunities, so that may mitigate any positive impact.
jnossal
3/16
Time to give this up. Clutch hitting doesn't exist. We've known that for many, many years. Probably even before the statistical revolution there were savvy observers who realized that "money" players were just good overall hitters. Trying to find some sliver of the MLB population that is 2% better in arbitrarily defined situations is a waste of effort.
Rob Mains
3/19
I'm going to have to disagree with you, jnossal. I am completely on board that given what we know now, clutch hitting doesn't exist. That's been demonstrated for going on a half century. BUT...given the pervasive belief among many that it does exist--the impetus for the project here was support for the idea of clutch hitting by three outstanding broadcasters--and the emergence of new tools, like those that Pete used in our project, I think there is value in further substantiating existing empirical evidence.