Believe it or not, most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.
Scorecasting: The Hidden Influences Behind How Sports Are Played and Games Are Won is the Freakonomics of sports. It's a mix of journalism and research in which the authors use famous real-life sports moments to illustrate cases of how academic research has shown conventional sports wisdom to be wrong. Like its spiritual predecessor, Scorecasting is a collaboration of an economist, Tobias J. Moskowitz, and a journalist, Sports Illustrated's L. Jon Wertheim.
In seventeen chapters, each on a topic unrelated to the others, the authors take a bit of sports lore and use a research finding to illuminate the “hidden influences” underpinning it.
If you're a frequenter of sports and economics blogs, many of the findings in the book will be familiar to you. The authors cover the Romer study, which showed that risk-averse NFL coaches punt too often on fourth down. There's the Massey-Thaler paper, which shows that high NFL draft picks aren't as valuable as lower picks, because—even though the higher-ranked players are better—it costs too much to sign them. There's the Pope-Schweitzer golf study, which found that golfers were more likely to sink a putt for par than they were the same putt for birdie, because of “loss aversion”—the disincentive of bogeying and losing a stroke was stronger than the incentive of making birdie and gaining a stroke, so the golfers putted more aggressively and successfully.
These findings are interesting and important, and it’s surprising that they’ve received relatively little attention. This book corrects that oversight, and it does it well. The explanations are clear, and the numbers are understandable. Often, in books of this sort, you have to go back and reread a couple of paragraphs to really figure out what the author is talking about. That doesn’t happen here. It’s mostly the result of exceptionally clear writing, but it’s also because the authors take care to stay away from too much in the way of technical explanation.
In addition, there are enough real-life examples to keep it all from getting too abstract. Take, for instance, the chapter on “whistle swallowing.” There, the authors argue that officials are subject to “omission bias,” which means that referees and umpires are reluctant for their actions to have too much influence on the game. That is, it’s the players who should decide the outcome, not the officials. “When the game steps up, the referees step down.”
To illustrate, the authors review the 2008 Super Bowl play in which David Tyree made his famous “helmet catch.” That play should, perhaps, have been called dead before the throw; in replays, quarterback Eli Manning appeared to be firmly in the grasp of several defenders. But referee Mike Carey “swallowed his whistle,” allowing Manning to make a miraculous escape. For that, Carey was widely praised. In fact, the NFL was so pleased with the non-call that they subsequently allowed Carey to be interviewed about the play (including by the authors of this book).
What happens when the referee doesn’t “step down?” In a clutch situation in the 2009 US Open, lineswoman Shino Tsurubuchi called a rare foot fault on Serena Williams. The foot fault cost Williams one point; her tirade against Tsurubuchi cost her a second point, and the match. Although replays showed she was right, Tsurubuchi was so severely criticized for making the call—the correct call—that tennis officials initially refused even to confirm her identity.
After the anecdotes, the authors turn to the numbers. What is the actual evidence that referees try to minimize their influence on the game outcome, even at the expense of getting the call right?
The authors looked at ball-strike calls in baseball. Since every pitch in MLB is tracked by the PITCHf/x system, every call by the plate umpire can be examined for accuracy. It turns out that on 0-2 counts, the umpires err on the side of the batter. The strike zone shrinks substantially as the umpire “wants” to call a ball—over 40 percent of true strikes are called balls instead. On 3-0 counts, the same thing happens, but in reverse—in those cases, the strike zone gets bigger, and 20 percent of “true” balls are actually called strikes. (The authors report an overall error rate of 14.4 percent.)
So, indeed, it looks like umpires are trying to avoid being the ones “creating” a strikeout or walk. They want the players to be the ones who determine the outcome, either by swinging the bat or throwing a pitch that’s a very obvious ball or strike.
That was Chapter One, and I found it fairly convincing. Some of the authors' other findings, though, seem considerably less certain—especially their biggest topic, home field advantage (HFA), which is the only topic that covers two chapters.
What causes HFA? In my view, it’s one of the biggest unsolved problems in sports analysis. Researchers have looked at many of the obvious suspects—travel, crowd support, park familiarity, and so on—with varying degrees of success and uncertainty. A couple of weeks ago, I would have said that we really have no idea where home field advantage comes from. But in the book, Moskowitz and Wertheim say they’ve figured it out.
Their answer: referee bias. Home field advantage, they say, is almost entirely the result of umpires and refs unconsciously favoring the home team. Why do they think that? They present a mass of little findings that seem to add up, across multiple sports. For instance:
In soccer, the referee has discretion to decide how much extra time to add to the end of the game to compensate for unusual stoppages in play. The authors looked at 750 Spanish soccer matches. They found that when the home team was behind by a goal and would benefit from more extra time to tie the game, the referee gave them four minutes. But when the visiting team was behind by a goal, it received only two minutes.
In baseball, they found that the more crucial the situation to the outcome of the game, the more umpires favored the home team in how they called pitches. In high-leverage situations, the home team’s pitchers received many more called strikes than the visiting team’s pitchers. But in low-leverage situations, the pitch calls actually favored the visiting team. It’s as if the umpires “wanted” to keep the numbers even, but favored the home team by letting the calls go their way when it counted the most.
In the NFL, when visiting team coaches issue challenges, they’re upheld more often than home team coaches’ challenges. This suggests that referees must initially be making more wrong calls favoring the home team.
In hockey, referees call more penalties on visiting teams than home teams. If you figure out the number of extra power play goals that should theoretically result, it almost completely accounts for overall goal differential between home and visitors.
In NBA basketball, calls not subject to the referees’ discretion (like shot-clock violations) are about equal for home and road teams. But subjective fouls are called more against visiting teams. Traveling, for instance, is called 15 percent more often against the visiting team than against the home side.
At this point, I decided to check a couple of things out.
First, the hockey result. I checked team home/road stats at the NHL website, and what I found was different from what the authors suggest. It turns out that home teams outscore the visitors in even-strength situations almost as much as on power plays.
For 2009-10, the average home team outscored the visiting team by 14 percent at even strength, 121 goals to 106. With the power play, the home team outscored the visitors 30-25, or by 20 percent. In terms of raw goals, 75 percent of HFA (15 goals out of 20) comes in even-strength situations. That empirically contradicts the book’s assertion that it’s all penalty calls.
What's more, if you add one power-play goal to the visiting team total, bringing the tally to 30-26, the power-play home advantage drops to 15 percent, virtually identical to the even-strength figure. An oversimplified guess, then, might be that biased refereeing hurts the visiting team by about one goal per team-season. That’s not very much.
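The arithmetic behind those hockey figures is easy to check. Here is a minimal sketch in Python that uses only the per-team goal averages quoted above; the function and variable names are illustrative and do not come from the NHL data or the book.

```python
# Average goals per team-season, 2009-10, as quoted in the text.
HOME_EV, ROAD_EV = 121, 106   # even strength
HOME_PP, ROAD_PP = 30, 25     # power play


def pct_edge(home, road):
    """Home scoring edge as a percentage of the visitors' total."""
    return 100.0 * (home - road) / road


ev_edge = pct_edge(HOME_EV, ROAD_EV)            # even-strength home edge
pp_edge = pct_edge(HOME_PP, ROAD_PP)            # power-play home edge
ev_gap = HOME_EV - ROAD_EV                      # raw even-strength goal gap
pp_gap = HOME_PP - ROAD_PP                      # raw power-play goal gap
ev_share = 100.0 * ev_gap / (ev_gap + pp_gap)   # share of gap at even strength

# Hypothetical adjustment from the text: give the visitors one more PP goal.
adjusted_pp_edge = pct_edge(HOME_PP, ROAD_PP + 1)

print(f"even strength: +{ev_edge:.0f}%  power play: +{pp_edge:.0f}%")
print(f"even-strength share of raw home edge: {ev_share:.0f}%")
print(f"power-play edge with one extra road goal: +{adjusted_pp_edge:.0f}%")
```

The share calculation is the one that matters for the argument: if penalties explained everything, nearly all of the raw goal gap would have to sit in the power-play column.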
Second, I tried to verify the baseball claim that the home-field advantage is actually backwards in low-leverage situations. To check, I looked at all MLB games from 1954 to 2007 (using Retrosheet game logs) and figured out the home/road run differential in each inning. If the Scorecasting argument is correct, you’d expect more HFA in the late innings (where clutch situations tend to cluster) than in early innings. But the actual figures show the opposite trend. The biggest HFA came in the first inning, when the home team outscored the visitors by 18 percent, 61,872 to 52,071. Here’s the full chart. (I didn’t include the ninth inning, since it isn’t always completed.)
Trying to better isolate low-leverage situations, I ran the same test for innings which began with one team holding a four-run lead. Those should all be heavily weighted to low-leverage situations and favor the visiting team. But that turns out not to be the case:
Finally, I looked at innings where the visiting team was ahead by four runs or more. Those situations should show a huge run advantage for the visitors, right? First, because with a four run lead, they’re probably the better team, and, second, because they’re low-leverage situations. Here, we do find a case or two where the visitors win the inning:
But in five out of the seven cases, as well as overall, the advantage still favors the home team—even though the home teams are likely significantly weaker overall, having fallen four runs behind in their home park.
Moreover, there’s a plausible explanation for at least one of the two exceptions, the second inning. Having scored at least four runs in the first inning, the visiting team is probably starting the next inning at or near the top of the batting order. The home team, on the other hand, is in the middle or bottom of the order. You can see why the visitors would score more runs when that happens.
So, if we found an HFA in almost every situation, how was it that the authors found so many more called strikes for visiting pitchers in low-leverage situations? I don’t know, but there are some logical possibilities. Here’s one. Perhaps teams behave differently in clutch situations that occur in the bottom of the ninth. Suppose the home team needs only one run to win. With two outs and a runner on third, the visiting team would be much more willing to walk the batter. That means the pitcher would be more likely to nibble at the corners, which means that more balls would be called.
Is that really happening? I don’t know, but it’s possible.
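For readers who want to repeat the inning-by-inning comparison, here is a rough sketch of the tally in Python. It assumes the per-inning runs have already been pulled out of the game logs into plain lists; the function name, the data layout, and the toy line scores are all made up for illustration and are not real Retrosheet data.

```python
def inning_home_edge(games, innings=8):
    """Return, per inning, (home runs, road runs, home edge in percent).

    `games` is an iterable of (road_line, home_line) pairs, where each
    line is a list of runs scored per inning. Innings a team did not
    bat in are simply absent from its list.
    """
    home_totals = [0] * innings
    road_totals = [0] * innings
    for road_line, home_line in games:
        for i in range(innings):
            if i < len(road_line):
                road_totals[i] += road_line[i]
            if i < len(home_line):
                home_totals[i] += home_line[i]
    result = []
    for home, road in zip(home_totals, road_totals):
        # Edge is undefined when the visitors scored nothing in that inning.
        edge = 100.0 * (home - road) / road if road else float("nan")
        result.append((home, road, edge))
    return result


# Made-up line scores for illustration only; not real Retrosheet data.
toy_games = [
    ([0, 1, 0, 0, 2, 0, 0, 0, 1], [1, 0, 0, 3, 0, 0, 0, 1]),
    ([2, 0, 0, 0, 0, 1, 0, 0, 0], [1, 0, 2, 0, 0, 0, 0, 0, 0]),
]
for inning, (home, road, edge) in enumerate(inning_home_edge(toy_games), 1):
    print(inning, home, road, edge)
```

Restricting the sample to innings that begin with a four-run lead would mean filtering on the running score before tallying, which this sketch leaves out.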
Are there similar explanations for some of the book’s other HFA findings? Probably. If teams simply play worse when they’re on the road, isn’t it possible that visiting teams actually do commit more traveling offenses than home teams in basketball? If they wind up getting beaten to the puck more in hockey, wouldn’t it follow they’d have to take more penalties to avoid giving the home side more scoring chances? And, in football, couldn’t it just be that visiting coaches are more likely to issue challenges because they’re behind more often? I would think that you’d need to check all of those things before drawing any conclusions about how much of HFA really stems from refereeing.
What the authors have done, I think, is to look at only one side of the argument, throwing every plausible or suggestive piece of evidence on the pile. It’s an impressive array, but you have to look a little more closely and consider whether there are other explanations, or whether other findings are consistent. That’s something the authors don’t do often enough.
There are more than a few other cases where I think the authors got it wrong. Sometimes it’s because the study they’re quoting is flawed. For instance, they’ve got a chapter on a study that says that batters hitting below .300 late in the season wind up hitting over .400 in their last at-bat because they’re motivated by their personal goal. It turns out that isn’t true once you adjust for selective sampling. Players who reach .300 tend to sit out thereafter, which means the causation goes the other way: a hit causes the at-bat to be the last one, rather than the last at-bat producing the hit.
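A toy calculation shows how strong that selection effect can be. The sketch below, in Python, assumes a hitter whose true ability is exactly .300, who sits one hit shy of the mark, and who stops playing the moment he gets that hit. Those assumptions, and the function name, are mine and not the study's.

```python
def last_at_bat_hit_rate(true_avg, chances_left):
    """Chance the final at-bat is a hit if the player stops at his first hit.

    Assumes the player is exactly one hit short of .300, has
    `chances_left` at-bats remaining, and sits as soon as he gets the hit.
    The last at-bat is an out only if every remaining at-bat is an out.
    """
    return 1.0 - (1.0 - true_avg) ** chances_left


for k in (1, 3, 5):
    print(k, round(last_at_bat_hit_rate(0.300, k), 3))
```

Under these assumptions the "last at-bat average" climbs far above .300 as soon as the player has more than one chance left, with no change at all in how well he hits.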
At other times, the authors' arguments aren’t well thought out. At one point, they comment that part of the reason baseball teams tend to repeat as champions is the structure of the playoffs: “Teams play a best-of-five game series followed by a best-of-seven League Championship Series followed by a best-of-seven World Series…the sample size is large enough that the best team ought to win the series…”
That’s not true at all. In fact, it’s almost a truism that it’s false, hence Billy Beane's oft-repeated lament, “My shit doesn’t work in the playoffs.” Various sabermetricians and mathematicians have worked out the odds of the best team winning the World Series, and they’re surprisingly small.
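To get a feel for why a short series settles so little, here is a small Python sketch of the standard binomial calculation for a single series. It assumes independent games and a fixed per-game win probability; the 0.55 figure is a hypothetical edge chosen for illustration, not a number from the book or from the published studies.

```python
from math import comb


def series_win_prob(p, best_of):
    """Probability that a team with per-game win chance `p` wins the series."""
    need = best_of // 2 + 1          # wins required to clinch
    q = 1.0 - p
    # The winner clinches with the loser holding k wins, k = 0 .. need-1.
    return sum(comb(need - 1 + k, k) * p**need * q**k for k in range(need))


for n in (5, 7):
    print(n, round(series_win_prob(0.55, n), 3))
```

With a 55 percent edge per game, the better team wins a best-of-seven only about 61 percent of the time, and it has to survive several such rounds in a row to take the title.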
One last example: in the chapter on competitive balance, the authors note that Major League Baseball has many more repeat playoff teams than the NFL does. That, they say, is because baseball has a 162-game season, while football's schedule contains only 16 games. That’s part of the reason, but it’s heavily mitigated by a factor that goes the other way— the structure of the games themselves. In baseball, the weaker team always has a good, fighting chance to beat the stronger team—as opposed to the NFL, where good teams crush bad teams, and single-game odds of 8:1 and longer are not uncommon. That means that in baseball, 16 games doesn’t even come close to determining who the best teams are. For instance, if the 2009 baseball season had ended on April 22, when teams had played between 15 and 18 games, the Royals and Marlins would have won their respective divisions.
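One piece of that comparison can be put in numbers. The Python sketch below computes how much a team's winning percentage would wobble from chance alone over a given number of games, treating every game as an independent coin flip. That is a simplifying assumption of mine, and it covers only the luck side; it says nothing about how widely true talent is spread in each sport, which is the other half of the comparison.

```python
from math import sqrt


def luck_sd(games, p=0.5):
    """Standard deviation of winning percentage from chance alone."""
    return sqrt(p * (1.0 - p) / games)


for games in (16, 162):
    print(games, round(luck_sd(games), 3))
```

Over 16 games the luck-only spread is about 125 points of winning percentage, against roughly 39 points over 162. Whether 16 games is "enough" then depends on how that noise compares with the real gaps between teams, which are much wider per game in the NFL than in baseball.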
The theory of how to draw proper comparisons between disparate sports is one that’s been around for a while, although you have to know where to find it (I’d recommend Tom Tango’s analysis). Part of the problem, I think, is that the authors restricted themselves to the scholarly literature. Their extensive bibliography, which takes up nine pages, includes absolutely no mainstream sabermetric studies whatsoever. Tom Tango appears once in the book (as the creator of “leverage”), Brian Burke appears once (as the compiler of the raw NFL overtime data the authors used), and Bill James appears not at all.
I’m of two minds about the book. On the one hand, the field of sabermetrics really needs a book like this, one that explains how sports researchers think and how they know what they know. On the other hand, I don’t like how the authors treat some of their results as conclusive, when they’re really works in progress—especially their home field advantage hypothesis, where the data are suggestive at best and contradictory at worst.
When Moneyball came out, it didn’t take long for the importance of on-base percentage to become part of mainstream conventional wisdom. It would be great if some of the findings in the book did the same—the debunking of the “hot hand,” for instance, or “icing the kicker.” However, I’d hate for “home field advantage is caused by biased referees” to do the same—because that’s a huge claim, and I don’t think it’s true. Ideally, the authors would have consulted some of the practicing sabermetricians in the various sports—the Prospectus writers, Tom Tango, Brian Burke, Gabriel Desjardins, and so forth—who would undoubtedly have pointed out some issues and advised the authors to temper some of their conclusions.
It's possible that having to qualify some of the results would make for a less popular book. In any case, Moskowitz and Wertheim are outstanding at getting their ideas across effortlessly. With a little more collaboration from others who study this stuff, this could have easily been the best popular sabermetrics book since Bill James. As it stands, it’s still recommended reading, but I wish it came with a warning to take some of its conclusions with a grain of salt.