
There is no such thing as the “times through the order” penalty.

Warning! Gory Mathematical Details Ahead!

Okay, fine. Let’s start by acknowledging that starting pitchers, in general, perform worse relative to expectations on their third trip through the lineup than they do in their first or second trips through the opposing nine. So in that sense, yes, there is a “times through the order penalty” (TTOP from here on out). The problem is that the way TTOP is commonly understood and what actually seems to be causing the effect are very different, and the current understanding leads to some counterproductive recommendations.

TTOP is most commonly conceptualized as an exposure or practice effect. The kernel of truth that we are supposedly exposing is that in my first trip to the plate against Smith, I learn a thing or two about his pitches or his patterns, and I take that knowledge, whether consciously or unconsciously, up to bat with me the second time I see him. In that second at-bat, I gain even more knowledge, and in the third at-bat, I’m even more prepared. On top of that, the starter can only pull that “throw him one inside to back him off, paint the outside corner and see if he’ll slap one off the end of his bat” thing once before I catch on. Does he have more tricks up his sleeve?

Let’s talk about the other thing that the starter has done since the last time I saw him. He’s thrown 30 or 35 more pitches.

About a year ago, I took a look at this very issue and found evidence that pitchers did perform worse as the game wore on, but that the explanation for why was rather muddled. For some outcomes, pitch count seemed to drive the finding. For some, the TTOP seemed to reign supreme. It ended up a mess.

This time, I tried something a little different. If we’re going to see a TTOP that is drastic, the place to look for it is as the lineup turns over. I isolated all cases in which a pitcher was facing the ninth batter in the lineup for the second time and then the first batter in the lineup for the third time. To make things fair, neither hitter was allowed to be the pitcher (this essentially limited the sample to games in AL parks), and the hitters needed to be faced in the same inning. Now, because the leadoff hitter is usually a better hitter, we need to control for that. I created a control variable for all outcomes using the log odds ratio method, which controls for the skills of the batter, as well as that of the pitcher. I also controlled for whether or not the pitcher had the platoon advantage in either case.
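For readers who have not met the log odds ratio method, here is a minimal sketch of how such a control variable is commonly built. It assumes the usual formulation, in which the expected log odds of an outcome equal the batter's log odds plus the pitcher's log odds minus the league's log odds. The function names and the rates in the example are hypothetical and are not taken from the data set described above.

```python
import math


def logit(p):
    """Convert a probability (strictly between 0 and 1) to log odds."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def expected_log_odds(batter_rate, pitcher_rate, league_rate):
    """Expected log odds of an outcome for one batter-pitcher matchup.

    Assumed formulation of the odds ratio method:
    logit(expected) = logit(batter) + logit(pitcher) - logit(league)
    """
    return logit(batter_rate) + logit(pitcher_rate) - logit(league_rate)


# Hypothetical strikeout rates: batter 25%, pitcher 22%, league 20%.
control = expected_log_odds(0.25, 0.22, 0.20)
expected_rate = inv_logit(control)
print(round(expected_rate, 3))
```

When both the batter and the pitcher are exactly league average, the expectation collapses to the league rate, which is a handy sanity check on the formula.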

I entered both the pitcher’s pitch count at the beginning of the at-bat and a dummy code of whether this was the ninth batter being seen for his second check-up or the leadoff guy being seen for his third. If the simple fact of it being the third time around is what really drives this finding, then we should see a big jump in positive outcomes for the leadoff hitters, again, relative to our expectations, which we have controlled for, and it will show up in this TTOP variable.

The results? TTOP really was the stronger predictor, compared to pitch count… of the pitcher having an easier time. I looked at strikeouts, walks, singles, home runs, outs in play, OBP in general, and a few others. In just about every case, the TTOP variable came up in the direction of the pitcher getting a times through the order bonus. Pitch count generally came out non-significant in these analyses.

I replicated the same analyses looking at what happens when the pitcher faces the ninth hitter the first time, followed by the leadoff hitter the second time, and found the same basic pattern. Now that could just be some weird quirk of how pitchers approach the bottom vs. the top of the lineup. Maybe the pitcher was easing off a bit because it was the ninth hitter and saving all of his good stuff for the leadoff hitter.

So, I expanded things a little bit. This time, I took a data set of plate appearances 10 (the second time that the leadoff hitter came to bat) through 27 (the third time that the ninth hitter came up). Again, we control for the hitter and pitcher matchup and the handedness advantage, and look at both the raw pitch count at the beginning of the at-bat and whether the at-bat belonged to the second-time-through-the-order group or the third-time-through-the-order group. (Needless to say, I relaxed the requirement that all of the at-bats take place in the same inning.) To ensure I wasn’t overly biasing my sample, I made sure that the pitcher faced at least 27 hitters during his start.
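The coding of this sample can be sketched in a few lines. This is only an illustration of the bookkeeping, assuming plate appearances are numbered from 1 within the starter's outing; the function and field names are hypothetical.

```python
def times_through_order(pa_number):
    """Which trip through the lineup a starter's Nth batter faced belongs to."""
    if pa_number < 1:
        raise ValueError("plate appearances are numbered from 1")
    return (pa_number - 1) // 9 + 1


def lineup_slot(pa_number):
    """Batting-order slot (1-9) of the starter's Nth batter faced."""
    return (pa_number - 1) % 9 + 1


def design_row(pa_number, pitch_count_before_pa):
    """Predictors for one plate appearance, or None if outside PA 10-27."""
    if not 10 <= pa_number <= 27:
        return None
    return {
        "pitch_count": pitch_count_before_pa,
        # Dummy code: 0 = second time through, 1 = third time through.
        "third_time": 1 if times_through_order(pa_number) == 3 else 0,
        "lineup_slot": lineup_slot(pa_number),
    }


print(design_row(19, 68))
```

The dummy variable and the pitch count then go into the same regression, alongside the matchup and handedness controls.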

The results here were a little more interesting. In my previous work, I found that as the game wore on, a tired pitcher was less likely to strike a batter out, but also less likely to issue a walk, leading to more contact events, although the pattern of which variable (pitch count or times through the order) was the better predictor was harder to suss out. Here, we find a little bit more clarity. Again, we see the same basic pattern. Strikeouts and walks went down as the game wore on, and there were more contact events, although they tended to be more hits than outs on balls in play.

On some important variables (strikeouts, walks, singles, groundballs, and flyballs), pitch count was the stronger predictor and was significant, often with the second vs. third time through the order variable reduced to non-significance (and when it was significant, it was pointing toward a pitcher times through the order “bonus”). Neither was significant in predicting extra-base hits or home runs, relative to expectations.

The 27-batter restriction guarantees three full trips around the batting order, but it also considers only cases in which the pitcher was doing well enough to make it all the way through. When I relaxed that requirement, I got similar results, although slightly more muddled (the pitch count variable still tended to be the stronger predictor, though sometimes it dropped below significance).

Here is where we introduce a statistical concept that I don’t believe has made its BP debut: mediator analyses.

The idea is simple. A mediator variable explains what’s going on in the relationship between two variables. Back when I was a grad student in psychology, this was a common type of analysis. Back in those days, I was studying the relationship between stressful events and psychological problems in adolescents. We know that adolescents (and people in general) who are under a lot of stress have more psychological problems, but what explains why that’s happening? What’s in the middle there?

One common theory (that has plenty of evidence to back it up) is the hopelessness theory of depression. The theory states that when someone experiences a stressful event (or a series of stressful events), they begin to feel that there is no hope for things to get better, and that leads to symptoms of depression. That’s useful information for a clinician to have. I may not be able to stop the person from experiencing stress, but I can work with them on not jumping to “all hope is lost.” If I know the chain of how things happen, I can intervene at some point on the chain. It isn’t that hopelessness is the only path between stress and depression, but it’s good to know that it is one of the paths.

But more generally, it allows us to take a relationship between two variables and understand a little deeper what’s going on. In this case, I think that pitch counts are the mediating variable between times through the order and bad outcomes (from the pitcher’s perspective) at the plate.

Mathematically, to prove a mediator is present, first you have to establish a relationship between the two variables that you are trying to mediate. In the case of stress and depression, it’s easy to run a regression and find that high levels of stress predict high levels of depression. Step 2 is to establish a relationship between the initial variable (stress in this case) and the proposed mediator (hopelessness). This usually takes the form of another regression. The final step is that you have to show that when you put the mediator into a regression with the initial variable, the mediator should be significant, and should either make the original variable no longer significant or at least make it less significant (there’s a post hoc test for that). If you can establish those conditions, you either have a full mediator (the original variable is reduced to non-significance) or you have a partial mediator (the original variable is still significant, though the effect is less strong).
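The three steps can be sketched on made-up data. This is a simplified illustration, not the analysis run for this article: it uses ordinary least squares on a continuous outcome, whereas at-bat outcomes are binary, and it skips the significance tests. The data are synthetic, and the names `x` (initial variable), `m` (proposed mediator), and `y` (outcome) are hypothetical.

```python
import random


def ols(y, *xs):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    rows = [[1.0] + [x[i] for x in xs] for i in range(len(y))]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Synthetic data in which most of x's effect on y runs through m.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
m = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * mi + 0.1 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]

c_total = ols(y, x)[1]               # Step 1: x predicts y
a_path = ols(m, x)[1]                # Step 2: x predicts the mediator
_, c_direct, b_path = ols(y, x, m)   # Step 3: x and m together predict y

print("total effect of x:", round(c_total, 3))
print("direct effect of x with m in the model:", round(c_direct, 3))
print("indirect effect through m:", round(a_path * b_path, 3))
```

In the linear case the total effect splits exactly into the direct effect plus the indirect effect (`a_path * b_path`), so a mediator shows up as the coefficient on `x` shrinking once `m` enters the model. A commonly used post hoc test for whether that indirect effect is significant is the Sobel test; whether that particular test is the one meant above is an assumption.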

Here, we have an initial relationship (TTOP leads to more negative outcomes). This has been established when you just look at the raw numbers. The fact that going through the order for the third time is going to increase pitch counts is just a simple matter of truth. And we see that when both TTOP and pitch count are put into the same regression predicting outcomes of the at-bat, pitch count tends to be a significant predictor and knock TTOP out of being significant. That’s a textbook mediator right there. It’s not a perfect mediator, mind you. But it suggests that the times through the order penalty is a simple pitch count effect, once you control for batter quality and handedness.

I don’t think that’s the entire story of why the raw numbers suggest a TTOP. For example, these findings would suggest that managers pay attention not to how many times the batter’s name has been called by the public address announcer, but to how many times his pitcher has thrown the ball. If he’s having a low pitch count day, that whole TTOP thing might not be so much to worry about. But what’s behind that effect? Obviously, fatigue is one piece, but I have a slightly different hypothesis.

We can, without stretching the imagination too far, assume that pitchers vary in the amount of stamina that they have, whether in the ability to physically sustain the effort of throwing 100 pitches, in the number of tricks that they know to find different ways to fool hitters, or in their ability to get through 27 hitters and not have it take 150 pitches. What if the problem is that there aren’t a lot of pitchers out there who are actually good for 100 pitches? By the time that pitch counter clicks to 80, you’ve got guys on the mound who have probably lost something. Maybe it would be better if they were taken out, but the modern roster has been designed with the assumption that a starter needs to be able to throw 100 pitches, so he gets left out there. And his awfulness messes up the numbers for everyone else.

When to Take That Long, Slow Walk to the Mound
The times through the order penalty is actually a pitch count penalty. The standard line has been that teams should begin feeling guilty about leaving the starter in once the leadoff hitter comes up the third time. And if you assume that TTOP is universal and kicks in at that very moment, it makes sense as a strategy. Some have gone as far as suggesting that rosters should be built around TTOP.

But the reality is that it probably has more to do with pitch count, likely intermixed with a much greater variation in how much individual starters can actually give to the cause. There probably are guys who are only good for about 60 pitches and 18 batters. The problem I think is that there are only two roles in the modern pitching staff. There are the guys who throw 100 pitches and guys who throw 20, and there seems to be no allowance for the guy who lands somewhere in the middle. Maybe TTOP is actually a pathology of overly rigid roster construction.

Instead, I’d suggest that managers pay attention to things that they already tend to pay attention to. We have good evidence that pitch count is the real culprit in stealing pitchers’ mojo. It’s entirely possible that some are more susceptible to this theft than others, though all seem to suffer. Instead of relying on the TTOP framework, maybe it’s better to dig deeper into understanding each pitcher and how his body handles fatigue, combined with how much work he’s had to do. This is a decision that has to be made on a case-by-case basis.

Or maybe that misses the point. Maybe defaulting to TTOP is the right decision for the wrong reason. If we leave the decision of taking the pitcher out to the manager and touchy-feely stuff like figuring out if the pitcher is tired, then does he let things like the fact that his pitcher swears up and down that he’s okay for the ninth inning sway him? Perhaps using the third time through the order as a marker of when to take the pitcher out isn’t the optimal strategy, but is it a better strategy than what’s currently being used? Taking the timing of the decision away from the manager might mean pulling a pitcher too early, but perhaps too early is better than too late, especially if the team can structure a bullpen that picks up the slack. Managers are human and they can fall prey to the optimism bias that says “Oh, he’ll be good for another couple of batters…” and in that time, give up the game.

The reality though is that a more nuanced understanding could produce even better results. Whether we have the technology to be rational about what that more nuanced approach looks like is a different story. So, in a weird way, TTOP might be the best strategy, though not for the reasons that we believe it is.