Baseball ProGUESTus: Scorecasting Review

February 21, 2011

Believe it or not, most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.

Phil Birnbaum is the editor of “By the Numbers,” the SABR Statistical Analysis publication. He blogs at sabermetricresearch.blogspot.com, where he has commented on Scorecasting in more detail.

Scorecasting: The Hidden Influences Behind How Sports Are Played and Games Are Won is the Freakonomics of sports. It's a mix of journalism and research in which the authors use famous real-life sports moments to illustrate cases of how academic research has shown conventional sports wisdom to be wrong. Like its spiritual predecessor, Scorecasting is a collaboration of an economist, Tobias J. Moskowitz, and a journalist, Sports Illustrated's L. Jon Wertheim.

In seventeen chapters, each on a topic unrelated to the others, the authors take a bit of sports lore and use a research finding to illuminate the “hidden influences” underpinning it.

If you're a frequenter of sports and economics blogs, many of the findings in the book will be familiar to you. The authors cover the Romer study, which showed that risk-averse NFL coaches punt too often on fourth down. There's the Massey-Thaler paper, which shows that high NFL draft picks aren't as valuable as lower picks, because—even though the higher-ranked players are better—it costs too much to sign them. There's the Pope-Schweitzer golf study, which found that golfers were more likely to sink a putt for par than they were the same putt for birdie, because of “loss aversion”—the disincentive of bogeying and losing a stroke was stronger than the incentive of making birdie and gaining a stroke, so the golfers putted more aggressively and successfully.

These findings are interesting and important, and it’s surprising that they’ve received relatively little attention. This book corrects that oversight, and it does it well. The explanations are clear, and the numbers are understandable. Often, in books of this sort, you have to go back and reread a couple of paragraphs to really figure out what the author is talking about. That doesn’t happen here. It’s mostly the result of exceptionally clear writing, but it’s also because the authors take care to stay away from too much in the way of technical explanation.

In addition, there are enough real-life examples to keep it all from getting too abstract. Take, for instance, the chapter on “whistle swallowing.” There, the authors argue that officials are subject to “omission bias,” which means that referees and umpires are reluctant for their actions to have too much influence on the game. That is, it’s the players who should decide the outcome, not the officials. “When the game steps up, the referees step down.”

To illustrate, the authors review the 2008 Super Bowl play in which David Tyree made his famous “helmet catch.” That play should, perhaps, have been called dead before the throw; in replays, quarterback Eli Manning appeared to be firmly in the grasp of several defenders. But referee Mike Carey “swallowed his whistle,” allowing Manning to make a miraculous escape. For that, Carey was widely praised. In fact, the NFL was so pleased with the non-call that they subsequently allowed Carey to be interviewed about the play (including by the authors of this book).

What happens when the referee doesn’t “step down?” In a clutch situation in the 2009 US Open, lineswoman Shino Tsurubuchi called a rare foot fault on Serena Williams. The foot fault cost Williams one point; her tirade against Tsurubuchi cost her a second point, and the match. Although replays showed she was right, Tsurubuchi was so severely criticized for making the call—the correct call—that tennis officials initially refused even to confirm her identity.

After the anecdotes, the authors turn to the numbers. What is the actual evidence that referees try to minimize their influence on the game outcome, even at the expense of getting the call right?

The authors looked at ball-strike calls in baseball. Since every pitch in MLB is tracked by the PITCHf/x system, every call by the plate umpire can be examined for accuracy. It turns out that on 0-2 counts, the umpires err on the side of the batter. The strike zone shrinks substantially as the umpire “wants” to call a ball—over 40 percent of true strikes are called balls instead. On 3-0 counts, the same thing happens, but in reverse—in those cases, the strike zone gets bigger, and 20 percent of “true” balls are actually called strikes. (The authors report an overall error rate of 14.4 percent.)

So, indeed, it looks like umpires are trying to avoid being the ones “creating” a strikeout or walk. They want the players to be the ones who determine the outcome, either by swinging the bat or throwing a pitch that’s a very obvious ball or strike.

——

That was Chapter One, and I found it fairly convincing. Some of the authors' other findings, though, seem considerably less certain—especially their biggest topic, home field advantage (HFA), which is the only topic that covers two chapters.

What causes HFA? In my view, it’s one of the biggest unsolved problems in sports analysis. Researchers have looked at many of the obvious suspects—travel, crowd support, park familiarity, and so on—with varying degrees of success and uncertainty. A couple of weeks ago, I would have said that we really have no idea where home field advantage comes from. But in the book, Moskowitz and Wertheim say they’ve figured it out.

Their answer: referee bias. Home field advantage, they say, is almost entirely the result of umpires and refs unconsciously favoring the home team. Why do they think that? They present a mass of little findings that seem to add up, across multiple sports. For instance:

In soccer, the referee has discretion to decide how much extra time to add to the end of the game to compensate for unusual stoppages in play. The authors looked at 750 Spanish soccer matches. They found that when the home team was behind by a goal and would benefit from more extra time to tie the game, the referee gave them four minutes. But when the visiting team was behind by a goal, it received only two minutes.

In baseball, they found that the more crucial the situation to the outcome of the game, the more umpires favored the home team in how they called pitches. In high-leverage situations, the home team’s pitchers received many more called strikes than the visiting team’s pitchers. But in low-leverage situations, the pitch calls actually favored the visiting team. It’s as if the umpires “wanted” to keep the numbers even, but favored the home team by letting the calls go their way when it counted the most.

In the NFL, when visiting team coaches issue challenges, they’re upheld more often than home team coaches’ challenges. This suggests that referees must initially be making more wrong calls favoring the home team.

In hockey, referees call more penalties on visiting teams than home teams. If you figure out the number of extra power play goals that should theoretically result, it almost completely accounts for overall goal differential between home and visitors.

In NBA basketball, calls not subject to the referees’ discretion (like shot-clock violations) are about equal for home and road teams. But subjective fouls are called more against visiting teams. Traveling, for instance, is called 15 percent more often against the visiting team than against the home side.

At this point, I decided to check a couple of things out.

First, the hockey result. I checked team home/road stats at the NHL website, and what I found was different from what the authors suggest. It turns out that home teams outscore the visitors in even-strength situations almost as much as on power plays.

For 2009-10, the average home team outscored the visiting team by 14 percent at even strength, 121 goals to 106. With the power play, the home team outscored the visitors 30-25, or by 20 percent. In terms of raw goals, 75 percent of HFA (15 goals out of 20) comes in even-strength situations. That empirically contradicts the book’s assertion that it’s all penalty calls.

What's more, if you add one power-play goal to the visiting team total, bringing the tally to 30-26, the power-play home advantage drops to 15 percent, virtually identical to the even-strength figure. An oversimplified guess, then, might be that biased refereeing hurts the visiting team by about one goal per team-season. That’s not very much.
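The arithmetic behind those hockey figures can be checked with a short sketch. The goal totals are the per-team averages quoted above; the function and variable names are illustrative only, and the "one extra goal" line is the same hypothetical adjustment described in the text.

```python
# Per-team average goals for 2009-10, as quoted in the review.
even_home, even_away = 121, 106   # even strength
pp_home, pp_away = 30, 25         # power play


def pct_edge(home, away):
    """Home surplus expressed as a percentage of the visitors' total."""
    return 100 * (home - away) / away


even_edge = pct_edge(even_home, even_away)
pp_edge = pct_edge(pp_home, pp_away)

# Share of the raw home-ice goal surplus that comes at even strength.
even_surplus = even_home - even_away
pp_surplus = pp_home - pp_away
even_share = 100 * even_surplus / (even_surplus + pp_surplus)

# Hypothetical adjustment: credit the visitors with one more power-play goal.
pp_edge_adjusted = pct_edge(pp_home, pp_away + 1)

print(f"even-strength edge: {even_edge:.0f}%")
print(f"power-play edge: {pp_edge:.0f}%")
print(f"share of surplus at even strength: {even_share:.0f}%")
print(f"power-play edge after adjustment: {pp_edge_adjusted:.0f}%")
```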

Second, I tried to verify the baseball claim that the home-field advantage is actually backwards in low-leverage situations. To check, I looked at all MLB games from 1954 to 2007 (using Retrosheet game logs) and figured out the home/road run differential in each inning. If the Scorecasting argument is correct, you’d expect more HFA in the late innings (where clutch situations tend to cluster) than in early innings. But the actual figures show the opposite trend. The biggest HFA came in the first inning, when the home team outscored the visitors by 18 percent, 61,872 to 52,071. Here’s the full chart. (I didn’t include the ninth inning, since it isn’t always completed.)

Inning	Runs	Percent
1	61872-52071	+18
2	46823-42539	+10
3	53590-48188	+11
4	53357-49593	+8
5	53203-48448	+10
6	54401-50603	+8
7	52231-48641	+7
8	50451-47781	+6

Trying to better isolate low-leverage situations, I ran the same test for innings which began with one team holding a four-run lead. Those should all be heavily weighted to low-leverage situations and favor the visiting team. But that turns out not to be the case:

Inning	Runs	Percent
2	2543-2139	+19
3	4583-4176	+10
4	8817-7801	+13
5	10940-10057	+9
6	14371-13279	+8
7	15698-14583	+8
8	16935-16180	+5
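The Percent column appears to be the home team's run surplus as a share of the visitors' runs. That reading is an assumption on my part, but a minimal sketch using the run totals from the table above reproduces the published figures for these four-run-lead innings.

```python
# (inning, home runs, visitor runs) for innings that began with one team
# holding a four-run lead, copied from the table above.
rows = [
    (2, 2543, 2139),
    (3, 4583, 4176),
    (4, 8817, 7801),
    (5, 10940, 10057),
    (6, 14371, 13279),
    (7, 15698, 14583),
    (8, 16935, 16180),
]


def home_edge_pct(home_runs, away_runs):
    """Home run surplus as a rounded percentage of the visitors' runs."""
    return round(100 * (home_runs - away_runs) / away_runs)


edges = {inning: home_edge_pct(home, away) for inning, home, away in rows}

for inning, pct in edges.items():
    print(f"{inning}\t{pct:+d}")
```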

Finally, I looked at innings where the visiting team was ahead by four runs or more. Those situations should show a huge run advantage for the visitors, right? First, because with a four run lead, they’re probably the better team, and, second, because they’re low-leverage situations. Here, we do find a case or two where the visitors win the inning:

Inning	Runs	Percent
2	957-1022	-6
3	1974-1799	+10
4	3609-3355	+8
5	4435-4645	-5
6	6269-5705	+10
7	6627-6562	+1
8	7309-7179	+2

But in five out of the seven cases, as well as overall, the advantage still favors the home team—even though the home teams are likely significantly weaker overall, having fallen four runs behind in their home park.
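The "five out of seven" and "overall" claims can be tallied directly from the table. This is only a quick check on the run totals printed above, with variable names chosen for illustration.

```python
# (inning, home runs, visitor runs) for innings where the visiting team
# led by four runs or more, copied from the table above.
rows = [
    (2, 957, 1022),
    (3, 1974, 1799),
    (4, 3609, 3355),
    (5, 4435, 4645),
    (6, 6269, 5705),
    (7, 6627, 6562),
    (8, 7309, 7179),
]

# Count the innings in which the home team outscored the visitors.
home_innings = sum(1 for _, home, away in rows if home > away)

# Pool the runs across all seven innings.
total_home = sum(home for _, home, _ in rows)
total_away = sum(away for _, _, away in rows)
overall_pct = 100 * (total_home - total_away) / total_away

print(f"innings favoring the home team: {home_innings} of {len(rows)}")
print(f"pooled runs: {total_home}-{total_away} ({overall_pct:+.1f}%)")
```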

Moreover, there’s a plausible explanation for at least one of the two exceptions, the second inning. Having scored at least four runs in the first inning, the visiting team is probably starting the next inning at or near the top of the batting order. The home team, on the other hand, is in the middle or bottom of the order. You can see why the visitors would score more runs when that happens.

So, if we found an HFA in almost every situation, how was it that the authors found so many more called strikes for visiting pitchers in low-leverage situations? I don’t know, but there are some logical possibilities. Here’s one. Perhaps teams behave differently in clutch situations that occur in the bottom of the ninth. Suppose the home team needs only one run to win. With two outs and a runner on third, the visiting team would be much more willing to walk the batter. That means the pitcher would be more likely to nibble at the corners, which means that more balls would be called.

Is that really happening? I don’t know, but it’s possible.

——

Are there similar explanations for some of the book’s other HFA findings? Probably. If teams simply play worse when they’re on the road, isn’t it possible that visiting teams actually do commit more traveling offenses than home teams in basketball? If they wind up getting beaten to the puck more in hockey, wouldn’t it follow they’d have to take more penalties to avoid giving the home side more scoring chances? And, in football, couldn’t it just be that visiting coaches are more likely to issue challenges because they’re behind more often? I would think that you’d need to check all of those things before drawing any conclusions about how much of HFA really stems from refereeing.

What the authors have done, I think, is to look at only one side of the argument, throwing every plausible or suggestive piece of evidence on the pile. It’s an impressive array, but you have to look a little more closely and consider whether there are other explanations, or whether other findings are consistent. That’s something the authors don’t do often enough.

——

There are more than a few other cases where I think the authors got it wrong. Sometimes it’s because the study they’re quoting is flawed. For instance, they’ve got a chapter on a study that says that batters hitting below .300 late in the season wind up hitting over .400 in their last at-bat because they’re motivated by their personal goal. It turns out that that isn’t true once you adjust for selective sampling. Players who reach .300 tend to sit out thereafter, which means the causation goes the other way: a hit causes the at-bat to be the last one, rather than the other way around.

At other times, the authors' arguments aren’t well thought out. At one point, they comment that part of the reason baseball teams tend to repeat as champions is the structure of the playoffs: “Teams play a best-of-five game series followed by a best-of-seven League Championship Series followed by a best-of-seven World Series…the sample size is large enough that the best team ought to win the series…”

That’s not true at all. In fact, it’s almost a truism that it’s false, hence Billy Beane's oft-repeated lament, “My shit doesn’t work in the playoffs.” Various sabermetricians and mathematicians have worked out the odds of the best team winning the World Series, and they’re surprisingly small.
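To see why a short series is a weak filter, here is a rough sketch that computes the chance of the better team winning a best-of-n series. It assumes a fixed single-game win probability and independent games, and the 55 percent figure is purely illustrative; it is not a number from the book or from any particular study.

```python
from math import comb


def series_win_prob(p, best_of):
    """Chance that a team with single-game win probability p takes a
    best-of-`best_of` series, assuming independent games."""
    need = best_of // 2 + 1  # wins required to clinch
    # Sum over how many games the winner loses before clinching.
    return sum(
        comb(need - 1 + losses, losses) * p**need * (1 - p) ** losses
        for losses in range(need)
    )


p = 0.55  # assumed edge for the better team in any one game

for n in (5, 7):
    print(f"best-of-{n}: {series_win_prob(p, n):.3f}")

# Chance of winning all three rounds (best-of-5, then two best-of-7s).
all_three = series_win_prob(p, 5) * series_win_prob(p, 7) ** 2
print(f"all three rounds: {all_three:.3f}")
```

Under those assumptions, the better team wins a best-of-seven only about 61 percent of the time, and gets through all three rounds roughly 22 percent of the time.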

One last example: in the chapter on competitive balance, the authors note that Major League Baseball has many more repeat playoff teams than the NFL does. That, they say, is because baseball has a 162-game season, while football's schedule contains only 16 games. That’s part of the reason, but it’s heavily mitigated by a factor that goes the other way: the structure of the games themselves. In baseball, the weaker team always has a good, fighting chance to beat the stronger team, as opposed to the NFL, where good teams crush bad teams and single-game odds of 8:1 and longer are not uncommon. That means that in baseball, 16 games don’t even come close to determining who the best teams are. For instance, if the 2009 baseball season had ended on April 22, when teams had played between 15 and 18 games, the Royals and Marlins would have won their respective divisions.

The theory of how to draw proper comparisons between disparate sports is one that’s been around for a while, although you have to know where to find it (I’d recommend Tom Tango’s analysis). Part of the problem, I think, is that the authors restricted themselves to the scholarly literature. Their extensive bibliography, which takes up nine pages, includes absolutely no mainstream sabermetric studies whatsoever. Tom Tango appears once in the book (as the creator of “leverage”), Brian Burke appears once (as the compiler of the raw NFL overtime data the authors used), and Bill James appears not at all.

——

I’m of two minds about the book. On the one hand, the field of sabermetrics really needs a book like this, one that explains how sports researchers think and how they know what they know. On the other hand, I don’t like how the authors treat some of their results as conclusive, when they’re really works in progress—especially their home field advantage hypothesis, where the data are suggestive at best and contradictory at worst.

When Moneyball came out, it didn’t take long for the importance of on-base percentage to become part of mainstream conventional wisdom. It would be great if some of the findings in the book did the same—the debunking of the “hot hand,” for instance, or “icing the kicker.” However, I’d hate for “home field advantage is caused by biased referees” to do the same—because that’s a huge claim, and I don’t think it’s true. Ideally, the authors would have consulted some of the practicing sabermetricians in the various sports—the Prospectus writers, Tom Tango, Brian Burke, Gabriel Desjardins, and so forth—who would undoubtedly have pointed out some issues and advised the authors to temper some of their conclusions.

It's possible that having to qualify some of the results would make for a less popular book. In any case, Moskowitz and Wertheim are outstanding at getting their ideas across effortlessly. With a little more collaboration from others who study this stuff, this could have easily been the best popular sabermetrics book since Bill James. As it stands, it’s still recommended reading, but I wish it came with a warning to take some of its conclusions with a grain of salt.

Thank you for reading

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.

Phil Birnbaum

eliyahu

2/21

I believe Matt Swartz looked at HFA a while back on this site and had some interesting findings. In any event, I still find it odd that baseball is the only sport where there is a structural advantage to being home -- getting to bat in the bottom half of the inning -- yet the HFA seems to be least pronounced.

Moreover, from watching NBA games, it would be hard to argue that the referees don't favor certain teams when at home.

NYYanks826

2/21

In the NHL, after a stoppage in play, the home team gets the last line change, meaning they get to set the matchup to their liking. That may not be as high-profile as batting last in an inning, but it's still technically a structural advantage.

Mountainhawk

2/22

They also get to put their stick in last for a faceoff, which is also a small advantage.

nwamser

3/04

Both great points

holgado

2/21

Thank you for this! I've heard and seen a lot of glowing reviews of this book, and bought it on that basis, but my reaction was the same as yours. In fact, I had to put it down after reading the chapter on repeat champions, as it was so poorly reasoned and yet so authoritatively worded. Bad combo. I found some of the chapters regarding non-baseball findings more plausible (esp. the "never punt" one). But then again, I don't know any other sport as well as I know baseball, so now I'm doubting those conclusions, too. I think I'm going to let the dust settle before I finish this thing.

ScottBehson

2/21

The freakonomics authors and Malcolm Gladwell are frequently wrong, as well. This book seems to be in the same vein.

edwilli

2/21

The New York Times published a study about a month ago that concluded that icing the kicker works very well

mattymatty2000

2/22

Link please?

BrewersTT

2/21

Thanks for the review. As an aside to Phil's discussion, the idea that referees should swallow their whistles and "let the players decide it" late in the game infuriates me. Clearly, this amounts to letting some of the players play without constraints of fairness, and forcing other players to suddenly lose that protection. If there's no such thing as traveling all of a sudden, then the defenders are not being allowed to "decide" it. Oh how this makes me fume.

markpadden

2/21

Your single-season sample of NHL goal scoring doesn't "empirically contradict" anything... I can't believe I just read your conclusion that "In terms of raw goals, 75 percent of HFA (15 goals out of 20) comes in even-strength situations. That empirically contradicts the book’s assertion that it’s all penalty calls."

So a sample of 20 goals in a single season is conclusive? It shows nothing.

TangoTiger1

2/22

A "sample of 20 goals"? He didn't say that. He looked at ALL the goals for an entire season across the league, meaning he looked at over 6000 goals. The "15 out of 20" is the average per team.

markpadden

7/12

"With the power play, the home team outscored the visitors 30-25, or by 20 percent"

So something happened during one season 30 times instead of 27.5 times, and that's conclusive?

If you genuinely don't understand why this is a negligible sample, I can't really help you.

fgreenagel2

2/22

Nice review. Thanks.

ScottBehson

2/22

As an academic researcher myself (in a completely unrelated field), we are, in most cases correctly, biased towards research that is:
blind-peer reviewed,
published for no monetary compensation, and
given wide free access

and against research that is:
only self- or editorially reviewed
when both the author and reviewer know who each other are
authors are compensated, and
readers pay for content

In baseball analysis/statistics, however, the field is truly defined by excellent work in the latter category and lesser research in the former one. However, I'm not surprised the authors of this book chose to only include research that fit the first description.

philb

2/23

Mitchel Lichtman analyzes another of the authors' claims here:

http://www.insidethebook.com/ee/index.php/site/article/are_the_data_and_results_reported_in_scorecasting_accurate/

nwamser

3/04

Thanks. Because of this review, I have moved this book from high priority on my amazon wish list to low.
