
In any one baseball game, there are 50 players who are eligible to play. Which of them is the most important? On any single play, there can be up to 13 players who can directly impact the outcome (the nine fielders, the batter, and potentially, three runners). Which one of them will have the biggest effect on what happens? Even if we zoom in on the batter and pitcher (because the answer is probably going to be one of them), should we worry more about what the batter brings to the at bat or the pitcher?

About five years ago (and two kids ago), I started a project to try to revamp how we give out credit for things in baseball. We have plenty of research on a lot of things in the game, but when it comes to stats like win probability added or even some linear-weights-based systems, we just sort of assume that just about everything that happens in a baseball game is the fault (or credit) of the pitcher and batter. A pitcher might induce a double play grounder directly to the shortstop, but then watch in horror as he realizes that his WPA for the play will suffer because the shortstop decided to try to turn a 6-9-4-3 double play. And that leaping catch the second baseman made to snare the line drive that was headed toward the gap? Totally the pitcher. Totally.

The problem is that I really haven’t done anything with it in… five years. So let’s pick that up again and take a look at what can happen when you look a little deeper into who is responsible for what in a baseball game.

Warning! Gory Mathematical Details Ahead!
Originally, I wanted to do this project entirely with Retrosheet data, which has the benefit of being freely available and is complete or nearly complete for seasons back to World War II. What I’m about to present will eventually be superseded by Statcast data once a big enough data set is amassed, and it could probably be done better with some of the advanced fielding metric datasets out there if they were public. Some of the stuff around walks and strikeouts can be done with PITCHf/x data, but for now, we’ll stick with Retrosheet.

I used a methodology similar to the one from five years ago. For 2010-2014, I coded every terminal batting event (one that ended the plate appearance) as either a strikeout or not a strikeout. For each plate appearance, I then took the seasonal strikeout percentage (K / PA) for the batter, the pitcher, the catcher, and the league in general. (As a check, I ran a version of this with the right fielder, who we can be pretty sure has no significant effect on the strike zone. Sure enough, he didn’t.) I converted all of these percentages into log odds, the natural log of p / (1 - p), one for each actor (pitcher, hitter, catcher, league).
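A minimal sketch of that conversion, assuming the transform is the natural log of p / (1 - p). The function name and the example rates are made up for illustration and are not taken from the Retrosheet data.

```python
import math

def log_odds(rate):
    """Convert a rate strictly between 0 and 1 into natural-log odds."""
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must be strictly between 0 and 1")
    return math.log(rate / (1.0 - rate))

# Hypothetical seasonal strikeout rates (K / PA), for illustration only.
example_rates = {"batter": 0.25, "pitcher": 0.30, "catcher": 0.20, "league": 0.20}
example_log_odds = {actor: log_odds(rate) for actor, rate in example_rates.items()}
```

A rate of exactly 50 percent maps to zero, rates above that map to positive values, and rates below it map to negative values, which is what lets the four terms sit on a common scale in a regression.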

I created a binary logistic regression in which all four of these terms were used to predict whether or not a plate appearance would end in a strikeout. It may seem a little strange, but here’s the logic behind it. Suppose that batters had absolutely no control whatsoever over whether a plate appearance ended in a strikeout. It was all the pitcher. Then, we would see that hitters would tend to strike out more when they faced pitchers who were good at striking hitters out. Any differences between hitters in their K rates would be a function of the pitchers they faced and random variation. If hitters were completely in control, they would strike out at a similar rate no matter who was on the mound. If neither had anything to do with it, we would find that the league rate would be the best predictor of whether an individual plate appearance ended with the letter K.
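Written out, a regression of this kind might look like the following. This is a sketch of one plausible specification, not necessarily the exact model that was fit; the coefficient symbols and subscripts are notation introduced here for illustration.

```latex
\[
\ln\frac{P(K)}{1 - P(K)}
  = \beta_0
  + \beta_B \ln\frac{p_B}{1 - p_B}
  + \beta_P \ln\frac{p_P}{1 - p_P}
  + \beta_C \ln\frac{p_C}{1 - p_C}
  + \beta_L \ln\frac{p_L}{1 - p_L}
\]
```

Here P(K) is the probability that the plate appearance ends in a strikeout, and p_B, p_P, p_C, and p_L are the seasonal strikeout rates for the batter, pitcher, catcher, and league. In the all-pitcher thought experiment, the batter term would add essentially nothing to the fit once the pitcher term is in the model; in the all-batter version, the reverse would hold.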

Of course, the answer is somewhere in the middle and it’s tempting to say “just call it 50/50,” but we can do better than that. Instead, we can statistically look at the amount of variance that each predictor picks up. For the initiated, I did this by looking at the -2 log likelihood statistic. I only looked at the variance contributed by the actor factors (batter, pitcher, catcher, league) to get an idea of how much variance each one picks up relative to the others, rather than the actual overall variance, which includes a good chunk of just randomness in general.
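The exact -2 log likelihood procedure isn't spelled out here, so the following is only one way such a split could be computed: refit the model with each predictor left out, see how much the -2LL statistic worsens, and normalize those increases so they sum to 100 percent. The function name and all of the numbers are hypothetical.

```python
def variance_shares(full_deviance, deviance_without):
    """Split credit by how much -2LL worsens when each predictor is dropped.

    full_deviance: -2 log likelihood of the model with every predictor.
    deviance_without: maps predictor name -> -2LL of the model refit
        without that predictor.
    """
    increases = {
        name: max(reduced - full_deviance, 0.0)
        for name, reduced in deviance_without.items()
    }
    total = sum(increases.values())
    if total == 0:
        raise ValueError("no predictor changes the fit")
    return {name: increase / total for name, increase in increases.items()}

# Made-up -2LL values, chosen only to make the arithmetic easy to follow.
shares = variance_shares(
    full_deviance=1000.0,
    deviance_without={
        "batter": 1060.0,
        "pitcher": 1030.0,
        "catcher": 1002.0,
        "league": 1008.0,
    },
)
```

With these invented inputs the batter gets 60 percent of the explained variance, the pitcher 30 percent, the league 8 percent, and the catcher 2 percent. Because the shares are normalized against each other, they say nothing about how much of the total outcome is plain randomness.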

As you might imagine, I did the same for walks, HBP, grounders, line drives, and fly balls, again using 2010-2014 Retrosheet data.

Based on that data set, we can portion out credit or blame for each event at these rates.

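| Event | Batter | Pitcher | Catcher | League |
|---|---|---|---|---|
| Strikeout | 63.5% | 35.3% | 0.1% | 1.1% |
| Walk | 64.2% | 33.9% | 1.0% | 0.8% |
| HBP | 65.7% | 32.3% | 1.7% | 0.3% |
| Grounder | 60.6% | 39.2% | 0.1% | 0.04% |
| Line Drive | 54.5% | 32.2% | 1.4% | 11.8% |
| Fly Ball | 42.6% | 51.2% | 6.2% | |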

We notice a pretty consistent trend here. With the exception of fly balls, the outcome of one specific plate appearance is much more in the batter’s control than the pitcher’s (nearly a 2-to-1 ratio). Pitchers actually have more control over whether a ball will end up high in the air. We see that catchers account for a small, although still significant, part of the variance.

But let’s look at what this means, again looking at strikeouts. We know that while a pitcher is only responsible for 35.3 percent of an individual strikeout, if he’s the starter, he will pitch to 25-30 batters in a game, while a hitter will only take four or five plate appearances. Then again, the starter only goes once every five days while the hitter might play all five of those days. So, who actually ends up being responsible for more strikeouts over the course of a year? What follows is just last year’s strikeout leaderboard, weighted for the amount that each actor is responsible in the equation. For example, David Price led the majors with 271 strikeouts. His 95.7 value below is simply 271 multiplied by 35.3 percent.
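The weighting is a single multiplication, sketched here with the David Price figures quoted above. The function and constant names are illustrative only.

```python
# Pitcher share of an individual strikeout, as quoted in the article.
PITCHER_STRIKEOUT_SHARE = 0.353

def weighted_credit(event_count, share):
    """Scale a raw event total by the share of credit assigned to that actor."""
    return event_count * share

price_credit = weighted_credit(271, PITCHER_STRIKEOUT_SHARE)
print(round(price_credit, 1))  # 95.7
```

The same function would apply to batters and catchers by swapping in their respective strikeout shares (63.5 percent and 0.1 percent).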

| Pitchers | Batters | Catchers |
|---|---|---|
| David Price – 95.7 | Ryan Howard – 120.6 | Yan Gomes – 1.07 |
| Corey Kluber – 95.0 | Marlon Byrd – 117.5 | Salvador Perez – 1.04 |
| Max Scherzer – 89.0 | Mike Trout – 116.8 | Miguel Montero – 1.03 |
| Felix Hernandez – 87.5 | Ian Desmond – 116.1 | Mike Zunino – 1.03 |
| Johnny Cueto – 85.4 | Chris Carter – 115.6 | Jonathan Lucroy – 1.00 |

This is what order of magnitude we might expect from the top of the chart. Those catcher numbers might look low given what we’ve learned about catcher framing (although the names look right!), and maybe the regression is under-selling the catchers a bit. Catchers get about a 1 percent credit on walks, rather than a 0.1 percent credit on strikeouts, so maybe the catchers should have 10 strikeout credits to their names, but it’s certainly well below what pitchers and hitters add to the equation. We have to remember that while catchers do vary in their abilities to frame pitches, a strikeout requires that they have a pitcher who can throw the ball near the strike zone to begin with. If a pitcher is throwing two feet outside, even Brad Ausmus himself can’t help.

Framing answers a somewhat different question than we’re asking here. In the anatomy of a strikeout, it’s more important to have a guy on the mound who can fill up the strike zone than a guy who can steal a few strikes around the edges. Here we see that the ability to get it near the strike zone to begin with is much more important. Consider that the difference in extra strikes per game generated by a fantastic framer like Hank Conger and an awful one like Tom Telis was about 7 strikes, using 2014 numbers. In 2014, the difference between a guy who fills up the zone (Phil Hughes, 56.4 percent of his pitches in the zone) and a guy who seems to avoid the zone (Francisco Liriano, 35.0 percent in the zone) would be 21 potential called strikes over the course of a standard 100-pitch outing. Even accounting for the fact that some of those were swung at, you also have to account for the fact that the difference between a strong swinging strike artist (Kershaw at 14.1 percent of his pitches) and a weak one (David Phelps at 5.4 percent) represents another source of talent spread that will certainly lead to variations in K rate.
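The comparison above is simple arithmetic on the quoted 2014 rates, sketched here to put the three gaps side by side. The helper name is made up, and the 100-pitch outing is the article's own assumption.

```python
def extra_per_outing(high_rate, low_rate, pitches=100):
    """Gap between two per-pitch rates, expressed in pitches per outing."""
    return (high_rate - low_rate) * pitches

# Rates quoted in the article (2014).
zone_gap = extra_per_outing(0.564, 0.350)   # Hughes vs. Liriano, in-zone rate
whiff_gap = extra_per_outing(0.141, 0.054)  # Kershaw vs. Phelps, swinging-strike rate
framing_gap = 7                             # Conger vs. Telis, extra strikes per game

print(round(zone_gap, 1), round(whiff_gap, 1), framing_gap)
```

The zone gap works out to roughly 21 pitches and the swinging-strike gap to roughly 9 per 100 pitches, against about 7 extra strikes per game from framing. The units are not identical (pitches in the zone are only potential strikes, and a catcher's game is more than 100 pitches), so this is a rough comparison of scale rather than a precise one.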

But what we see is that even weighted for the number of times that they come to the plate, hitters are still somewhat more important to a team over the course of a season. There’s a strange duality, though, that baseball has long implicitly recognized. Within a single game, the starting pitcher is going to be the most important member of his team. It’s why individual games are often sub-titled by their starting pitching matchup. The only time they really bother to publish the lineup in advance is the All-Star Game, mostly because it’s the All-Star Game.

Important to Whom?
There’s long been a debate about the split between batter and pitcher in terms of their relative importance to a baseball team. Should a team invest heavily in pitching? (Pitching wins championships!) These findings shed some light on that topic. People watch baseball one game at a time—the casual fans only tune in for the really important games—and at the game level, the day’s starting pitcher is the most important player on his team. The problem is that you can’t generalize that statement to a broader context, and I think that mistake in logic is what begets the meme of pitching winning championships. Over time, because a starter only goes once every five days, his contributions are less than those of a good hitter over the course of a season, and it’s a team’s performance over the course of a season that qualifies them for the playoffs, where championships are won. It’s not that pitchers aren’t important, it’s just that batters are somewhat more so.

But wait a minute, in the playoffs themselves, where one game means a lot more and occasionally, it means everything, then suddenly, the pitcher is much more important again. Over the long haul, which only the baseball junkies pay attention to, you’re better off betting on a team with a great offense. When everyone’s paying attention in Game 7, you’re better off looking at that day’s starting pitcher. Pitching wins games. Offense wins seasons. Offense gets you to the playoff series. An amazing starter wins you Game 7. That’s an over-simplification since we’re talking about a variance composition that’s not terribly out of balance, but is still somewhat more tilted toward the hitters. But our more complex mantra is instructive and it’s more accurate than repeating “Pitching wins championships” which basically starts from the mistaken belief that Game 7 is all that matters. Sure, if you get there, it is all that matters, but you gotta get there first. And the dirty little secret is that to get there, it’s actually your offense that is more likely to carry you there.

So, who’s the most important player on a team? I guess it depends on what level of zoom you have the microscope set to.