
There is no such thing as the “times through the order” penalty.

Warning! Gory Mathematical Details Ahead!

Okay, fine. Let’s start by acknowledging that starting pitchers, in general, perform more poorly than we would otherwise expect on their third trip through the lineup, compared with their first or second trips through the opposing nine. So in that sense, yes, there is a “times through the order penalty.” (TTOP from here on out.) The problem is that the way TTOP is commonly understood and what seems to actually be causing this effect are very different, and the current understanding actually leads to some counterproductive recommendations.

TTOP is most commonly conceptualized as an exposure or practice effect. The kernel of truth that we are supposedly exposing is that in my first trip to the plate against Smith, I learn a thing or two about his pitches or his patterns and I take that knowledge, whether consciously or unconsciously, up to bat with me the second time I see him. In that second at-bat, I gain even more knowledge, and in the third at-bat, I’m even more prepared. On top of that, the starter can only pull that “throw him one inside to back him off, paint the outside corner and see if he’ll slap one off the end of his bat” thing once before I catch on. Does he have more tricks up his sleeve?

Let’s talk about the other thing that the starter has done since the last time I saw him. He’s thrown 30 or 35 more pitches.

About a year ago, I took a look at this very issue and found evidence that pitchers did perform worse as the game wore on, but that the explanation for why was rather muddled. For some outcomes, pitch count seemed to drive the finding. For some, the TTOP seemed to reign supreme. It ended up a mess.

This time, I tried something a little different. If we’re going to see a TTOP that is drastic, the place to look for it is as the lineup turns over. I isolated all cases in which a pitcher was facing the ninth batter in the lineup for the second time and then the first batter in the lineup for the third time. To make things fair, neither hitter was allowed to be the pitcher (this essentially limited the sample to games in AL parks), and the hitters needed to be faced in the same inning. Now, because the leadoff hitter is usually a better hitter, we need to control for that. I created a control variable for all outcomes using the log odds ratio method, which controls for the skills of the batter, as well as that of the pitcher. I also controlled for whether or not the pitcher had the platoon advantage in either case.
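The article does not spell out its control variable, so the following is only a minimal sketch of the odds ratio method as it is commonly described: the batter's and pitcher's rates for an outcome are combined on the log odds scale, relative to the league rate. The function names and the sample rates are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Convert a probability to log odds."""
    return math.log(p / (1.0 - p))


def expected_log_odds(batter_rate: float, pitcher_rate: float, league_rate: float) -> float:
    """Expected log odds of an outcome for one batter-pitcher matchup.

    Assumption: this is the commonly cited odds ratio form, not a formula
    taken from the article itself.
    """
    return logit(batter_rate) + logit(pitcher_rate) - logit(league_rate)


def expected_probability(batter_rate: float, pitcher_rate: float, league_rate: float) -> float:
    """Same expectation, converted back to a probability."""
    x = expected_log_odds(batter_rate, pitcher_rate, league_rate)
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical strikeout rates: batter .25, pitcher .22, league .20.
matchup_k_rate = expected_probability(0.25, 0.22, 0.20)
```

When both the batter and the pitcher sit exactly at the league rate, the expectation is the league rate; when both are above it, the expectation lands above either one.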

I entered both the pitcher’s pitch count at the beginning of the at-bat and a dummy code of whether this was the ninth batter being seen for his second check-up or the leadoff guy being seen for his third. If the simple fact of it being the third time around is what really drives this finding, then we should see a big jump in positive outcomes for the leadoff hitters, again, relative to our expectations, which we have controlled for, and it will show up in this TTOP variable.
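One way to write down the model this paragraph describes, assuming a logistic regression (the article does not say which form was used). The symbols are hypothetical: c_i is the expected log odds control for plate appearance i, n_i is the pitch count at the start of the at-bat, and T_i is the dummy code (0 for the ninth batter's second look, 1 for the leadoff hitter's third).

```latex
\operatorname{logit} P(y_i = 1) = \beta_0 + \beta_c\, c_i + \beta_n\, n_i + \beta_T\, T_i
```

If the third trip by itself drove the effect, the coefficient on T_i would point clearly in the hitter's favor while the coefficient on n_i stayed near zero.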

The results? TTOP really was the stronger predictor, compared to pitch count… of the pitcher having an easier time. I looked at strikeouts, walks, singles, home runs, outs in play, OBP in general, and a few others. In just about every case, the TTOP variable came up in the direction of the pitcher getting a times through the order bonus. Pitch count generally came out non-significant in these analyses.

I replicated the same analyses looking at what happens when the pitcher faces the ninth hitter the first time, followed by the leadoff hitter the second time, and found the same basic pattern. Now that could just be some weird quirk of how pitchers approach the bottom vs. the top of the lineup. Maybe the pitcher was easing off a bit because it was the ninth hitter and saving all of his good stuff for the leadoff hitter.

So, I expanded things a little bit. This time, I took a data set of plate appearances 10 (the second time that the leadoff hitter came to bat) through 27 (the third time that the ninth hitter came up). Again, we control for hitter and pitcher matchup and handedness advantage, and look at both raw pitch count at the beginning of the at-bat and whether the at-bat belonged to the second-time-through-the-order group or the third-time-through-the-order group. (Needless to say, I relaxed the requirement that all of the at-bats take place in the same inning.) To ensure I wasn’t overly biasing my sample, I made sure that the pitcher faced at least 27 hitters during his start.

The results here were a little more interesting. In my previous work, I found that as the game wore on, a tired pitcher was less likely to strike a batter out, but also less likely to issue a walk, leading to more contact events, although the pattern of which variable (pitch count or times through the order) was the better predictor was harder to suss out. Here, we find a little bit more clarity. Again, we see the same basic pattern. Strikeouts and walks went down as the game wore on, and there were more contact events, although they tended to be more hits than outs on balls in play.

On some important variables (strikeouts, walks, singles, groundballs, and flyballs), pitch count was the stronger predictor and was significant, often with the second vs. third time through the order variable reduced to non-significance (and when it was significant, it was pointing toward a pitcher times through the order “bonus”). Neither was significant in predicting extra-base hits or home runs, relative to expectations.

When I relaxed the requirement that the starter had faced 27 batters (a requirement that guarantees three full trips around the batting order, but that also limits the sample to cases in which the pitcher was doing well enough to make it all the way through), I got similar results, although slightly more muddled (the pitch count variable still tended to be the stronger predictor, though sometimes it dropped below significance).

Here is where we introduce a statistical concept that I don’t believe has made its BP debut: mediator analyses.

The idea is simple. A mediator variable explains what’s going on in the relationship between two variables. Back when I was a grad student in psychology, this was a common type of analysis. Back in those days, I was studying the relationship between stressful events and psychological problems in adolescents. We know that adolescents (and people in general) who are under a lot of stress have more psychological problems, but what explains why that’s happening? What’s in the middle there?

One common theory (that has plenty of evidence to back it up) is the hopelessness theory of depression. The theory states that when someone experiences a stressful event (or a series of stressful events), they begin to feel that there is no hope for things to get better, and that leads to symptoms of depression. It’s useful information to know as a clinician. I may not be able to stop the person from experiencing stress, but I can work with them on not jumping to “all hope is lost.” If I know the chain of how things happen, I can intervene at some point on the chain. It isn’t that hopelessness is the only path between stress and depression, but it’s good to know that it is one of the paths.

But more generally, it allows us to take a relationship between two variables and understand a little deeper what’s going on. In this case, I think that pitch counts are the mediating variable between times through the order and bad outcomes (from the pitcher’s perspective) at the plate.

Mathematically, to prove a mediator is present, first you have to establish a relationship between the two variables that you are trying to mediate. In the case of stress and depression, it’s easy to run a regression and find that high levels of stress predict high levels of depression. Step 2 is to establish a relationship between the initial variable (stress in this case) and the proposed mediator (hopelessness). This usually takes the form of another regression. The final step is that you have to show that when you put the mediator into a regression with the initial variable, the mediator should be significant, and should either make the original variable no longer significant or at least make it less significant (there’s a post hoc test for that). If you can establish those conditions, you either have a full mediator (the original variable is reduced to non-significance) or you have a partial mediator (the original variable is still significant, though the effect is less strong).

Here, we have an initial relationship (TTOP leads to more negative outcomes). This has been established when you just look at the raw numbers. The fact that going through the order for the third time is going to increase pitch counts is just a simple matter of truth. And we see that when both TTOP and pitch count are put into the same regression predicting outcomes of the at-bat, pitch count tends to be a significant predictor and knock TTOP out of being significant. That’s a textbook mediator right there. It’s not a perfect mediator, mind you. But it suggests that the times through the order penalty is a simple pitch count effect, once you control for batter quality and handedness.
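The three mediation steps can be sketched on toy data. Everything below is an assumption for illustration: the data are synthetic and built so that the outcome depends on pitch count alone, the outcome is treated as continuous so that ordinary least squares will do, and the variable names are hypothetical. It compares coefficient sizes only and leaves out the significance tests.

```python
import random


def slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_predictor_slopes(x1, x2, y):
    """Slopes on x1 and x2 when y is regressed on both (normal equations)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


# Synthetic data, invented for illustration only (not the article's data).
rng = random.Random(0)
third_time = [i % 2 for i in range(2000)]  # 0 = second trip, 1 = third trip
pitch_count = [40 + 35 * t + rng.gauss(0, 8) for t in third_time]
# By construction, the outcome depends on pitch count and nothing else.
outcome = [0.01 * p + rng.gauss(0, 0.05) for p in pitch_count]

# Step 1: the initial variable predicts the outcome when entered alone.
total_effect = slope(third_time, outcome)
# Step 2: the initial variable predicts the proposed mediator.
path_to_mediator = slope(third_time, pitch_count)
# Step 3: with both entered, the mediator carries the effect.
direct_effect, mediator_effect = two_predictor_slopes(third_time, pitch_count, outcome)
```

In this toy setup, total_effect is clearly positive while direct_effect shrinks toward zero once pitch_count is in the model, which is the pattern of a full mediator.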

I don’t think that’s the entire story of why the raw numbers suggest a TTOP. For example, these findings would suggest that managers pay attention not to how many times the batter’s name has been called by the public address announcer, but to how many times his pitcher has thrown the ball. If he’s having a low pitch count day, that whole TTOP thing might not be so much to worry about. But what’s behind that effect? Obviously, fatigue is one piece, but I have a slightly different hypothesis.

We can – without stretching the imagination too far – assume that pitchers vary in the amount of stamina that they have, whether in the ability to physically sustain the effort of throwing 100 pitches or in the number of tricks that they know to find different ways to fool hitters or in their ability to get through 27 hitters and not have it take 150 pitches. What if the problem is that there aren’t a lot of pitchers out there who are actually good for 100 pitches? By the time that pitch counter clicks to 80, you’ve got guys on the mound who have probably lost something. Maybe it would be better if they were taken out, but the modern roster has been designed with the assumption that a starter needs to be able to throw 100 pitches, so he gets left out there. And his awfulness messes up the numbers for everyone else.

When to Take That Long, Slow Walk to the Mound
The times through the order penalty is actually a pitch count penalty. The standard line has been that teams should begin feeling guilty about leaving the starter in once the leadoff hitter comes up the third time. And if you assume that TTOP is universal and kicks in at that very moment, it makes sense as a strategy. Some have gone as far as suggesting that rosters should be built around TTOP.

But the reality is that it probably has more to do with pitch count, likely intermixed with a much greater variation in how much individual starters can actually give to the cause. There probably are guys who are only good for about 60 pitches and 18 batters. The problem I think is that there are only two roles in the modern pitching staff. There are the guys who throw 100 pitches and guys who throw 20, and there seems to be no allowance for the guy who lands somewhere in the middle. Maybe TTOP is actually a pathology of overly rigid roster construction.

Instead, I’d suggest that managers pay attention to things that they already tend to pay attention to. We have good evidence that pitch count is the real culprit in stealing pitchers’ mojo. It’s entirely possible that some are more susceptible to this theft than others, though all seem to suffer. Instead of relying on the TTOP framework, maybe it’s better to dig deeper into understanding each pitcher and how his body handles fatigue, combined with how much work he’s had to do. This is a decision that has to be made on a case-by-case basis.

Or maybe that misses the point. Maybe defaulting to TTOP is the right decision for the wrong reason. If we leave the decision of taking the pitcher out to the manager and touchy-feely stuff like figuring out if the pitcher is tired, then does he let things like the fact that his pitcher swears up and down that he’s okay for the ninth inning sway him? Perhaps using the third time through the order as a marker of when to take the pitcher out isn’t the optimal strategy, but is it a better strategy than what’s currently being used? Taking the timing of the decision away from the manager might mean pulling a pitcher too early, but perhaps too early is better than too late, especially if the team can structure a bullpen that picks up the slack. Managers are human and they can fall prey to the optimism bias that says “Oh, he’ll be good for another couple of batters…” and in that time, give up the game.

The reality though is that a more nuanced understanding could produce even better results. Whether we have the technology to be rational about what that more nuanced approach looks like is a different story. So, in a weird way, TTOP might be the best strategy, though not for the reasons that we believe it is.

“Oh, he’ll be good for another couple of batters…”

I'm disappointed this didn't link to a gif of Grady Little.

Brilliant analysis and a great read. I'm not sure what you mean by the "technology to be rational" though. I'd think a huge problem with implementing such a strategy would be the perfectly rational greed and self-interest of the players involved. The pitchers who make the big bucks are the starting pitchers, and a big part of why they get 10 figures is their win totals. A pitcher who is only good for 80-ish pitches will fail to get out of the 4th inning almost as often as not. If he's not getting through, then he's not even qualifying for the win, assuming the team is even in a position to win.

I remember the Red Sox going with a closer-by-committee and that was a dozen years ago. I think teams are more willing to jettison an under-performing closer or give a new player a chance based on performance, but I don't think the role of Capital C Closer has gone away. I'd expect the resistance of starting pitchers to be a couple orders of magnitude greater to a change like this than relievers.

But I'm probably wrong.
When I say "the technology", I mean that we may or may not know enough to be able to pinpoint when a specific pitcher is "too tired" in some mathematically precise way, at least in-game. I think there's something to be said for a risk management strategy though that says that we know enough to know when that point is still in the future, and if we understand that getting to that point can be disastrous, it's possible to design the team's roster (boost the bullpen) to avoid it happening.

Okay, makes sense now Mr. Carleton; thanks. I'd think some sort of wearable technology would be invaluable in this.
This is a great topic, thanks for continuing to talk about it. I also agree that regardless of the actual reason, the "marker" (good word) of times through the order is easier to sell than the nebulous concept of pitches.

I'd also like to point out the research presented by MGL here:

The idea of a Third-Time *Bonus* going up against number-of-pitches fatigue is important, and highlights that while the hitter may be learning some things about the pitcher, the pitcher and catcher may be learning about the hitter as well.

I've always hated the absolutism of the Church of TTOP, that a reliever is *always* better than the starter after a certain given point. The math that's been done is great in showing aggregate trends that need to be considered, but they don't necessarily hold up in every game.

Sometimes the starter really is that good (think no-hitters and close calls).

Sometimes the bullpen really is that bad.

Sometimes the bullpen has lately been ridden hard and put away wet, and the best reliever available is, himself, at reduced capacity. Or is a poor match for the opponent. (I've seen starters get knocked around, and replaced by a reliever of a very similar style. It doesn't end well.)

One of the things managers get paid for is hopefully to be able to tell the difference between the aggregate numbers and individual performances, if not on a regular basis, at least in key spots.

Otherwise, we might as well just install a Computer-Aided-Skipper-EfficiencY unit in every dugout.
No one ever says "always". That's a strawman.

The point is that unless you can somehow know, then you have to rely on the averages. If you go with the averages blindly, you'll be right 60% of the time.

If you go with the averages AND your gut, you'll be right 50-65% of the time. The odds are stacked against you.

Completely bogus reasoning, because what actually happens in MLB is that managers make their decisions based on "the averages AND your gut." We are completely lacking in reliable information on what would happen if the gut is taken out of the decision making process, because it NEVER is. Russell is accurate in identifying places where that probably leads to non-optimum decision making. But it's only "probably." We don't know.

BTW, I have seen plenty of people say "always." They generally don't know what they're talking about.
Very interesting and enjoyable article. One thing occurred to me with the TTOP, pitch count, and fatigue issues. I started thinking about the "second wind" concept of "settling into a groove" (give me a minute and I will think of more cliches). I wonder how pitcher performance changes with a big pitch count inning. How does a pitcher that threw 15, 15, 15, and 15 pitches in the first four innings (60 total pitches) perform in the next ten pitches as opposed to when he threw 15, 10, 25, 10 pitches (60 total pitches)? Did he experience more fatigue because of the 25 pitch inning? Does that impact his performance going forward? With the randomness of BABIP for pitchers the high pitch count inning may have just been bad luck as opposed to bad pitching. I wonder if a high standard deviation in pitch count per inning (for completed innings) is a measure of pitcher performance going forward or if just a numeric maximum (yank a pitcher after a 25 pitch inning no matter what rule) is a better indicator?
"There is no TTOP - it's all about the pitch count?" I am extremely skeptical, especially since I found evidence of the complete opposite (by controlling for pitch count) but I'll keep an open mind until I can do some more research. Shouldn't be too hard to find evidence for or against.

I am not crazy about the #9/#1 hitter thing for reasons that Russell mentioned (there could be other things going on, like selective sampling, for example pitchers allowed to face the top of the order tend to be better pitchers).
In fairness, this is a bit of a selective sampling nightmare. When I insisted that the model only look at guys who made it through at least 27 batters, well... that's a sample of guys who were having a good enough day to be worth leaving out there for hitter #27. But take off that limit and you get guys who are pulled after hitter #24 specifically because they just gave up 3 straight hits... perhaps they would have been OK facing #25-27, but we'll never know that...
Russell, what years did your study cover?
Russell wrote: "I isolated all cases in which a pitcher was facing the ninth batter in the lineup for the second time and then the first batter in the lineup for the third time. To make things fair, neither hitter was allowed to be the pitcher (this essentially limited the sample to games in AL parks), and the hitters needed to be faced in the same inning."

I believe you biased the sample quite a bit. Suppose the ninth place hitter makes an out. If it was the third out of the inning, it will be excluded from the sample. Suppose the ninth place hitter reaches base. It will always be included in the sample.

Of course, it's possible that I misunderstood the study because I read the article too quickly.
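
The selection concern raised in the comment above can be put in rough numbers. This is only a sketch under simplifying assumptions that are not from the article: the ninth hitter makes exactly one out when retired (no double plays), and his plate appearance drops out of the sample only when he makes the third out of the inning. The function and parameter names are hypothetical.

```python
def observed_obp_in_sample(true_obp: float, prob_two_outs: float) -> float:
    """On-base rate of the ninth hitter among plate appearances that pass
    the same-inning filter.

    Assumes no double plays, so an out removes the plate appearance from
    the sample only when there were already two outs.
    """
    kept_when_on_base = true_obp
    kept_when_out = (1.0 - true_obp) * (1.0 - prob_two_outs)
    return kept_when_on_base / (kept_when_on_base + kept_when_out)


# Hypothetical inputs: a true .300 on-base rate, two outs one third of the time.
inflated = observed_obp_in_sample(0.300, 1.0 / 3.0)
```

With those inputs, the ninth hitter's on-base rate inside the filtered sample comes out to about .391 even though his true rate is .300, which is the direction of bias the comment describes.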