February 24, 2005
More on the Lineup
Last week's column about lineup order optimization generated a greater response than I anticipated, especially for one with such loose conclusions, so I'm going to dig a little deeper into the topic. So far, the only application of the lineup program has been checking various basic ideas--like sorting by descending or ascending AVG, OBP, and SLG, and bunching the better hitters--but there's a bit more that can be done before adding enhancements to the program to see if we can adjust for baserunning, steals, and platoons.
One of the more interesting questions left unanswered last week was just how important sorting by OBP or SLG is. By using two lineups for each metric--one in ascending order and one in descending order--it was clear that players with higher OBP and SLG should be near the top of the order. Even so, sorting in exactly the wrong direction changed mean run output by only 26 runs for OBP and 13 for SLG. Considering the sample size and the standard deviations, the results were close to statistically significant, but the confidence was not high. Thus, we could only loosely conclude that OBP is more important than SLG in determining lineup order when all other factors are equal.
What was not addressed was the fact that teams often have to choose between the two. It's easy to choose to bat a player with a .260/.330/.500 line earlier than a player with a .260/.330/.400 line, but things become a little muddled when comparing something like .260/.310/.500 to .260/.360/.380.
To begin to look at that question, I put together a new team, but to keep things simple, this team has only three players. First up is Wily Mo Pena, the resident high-SLG, low-OBP sample point with a 2004 line of .259/.316/.527. Pena is the only player last year who slugged at least .500 with an OBP lower than .320 in at least 300 PAs. Congratulations, Wily. Next is Luis Castillo, selected for his .291/.373/.348 performance last year. Castillo's OBP outpaced his SLG by one of the largest differences in the league, making him the perfect candidate for the high-OBP, low-SLG slot. Finally, we'll plug the last hole with Morgan Ensberg, who comes in at an impressively league-average .275/.330/.411. Though he is a little shy in the power department, Ensberg makes a nice "this porridge is just right" player between Pena and Castillo.
Each of these players was given three spots in the lineup, and then every possible lineup combination of the three was run through the program (which runs each lineup through 1,000 seasons), giving us a sample of well over a million seasons by the time everything was finished. The program outputs a minimum, mean, and maximum for each lineup. I also output the full results for the first 50 lineups to check standard deviations, all of which were between 39 and 41 runs. Of all the lineups, the highest mean runs scored was 834; the lowest was 816. Despite testing every possible combination of these three players, the range of means over the entire sample was only 18 runs. There's just not that much difference.
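For anyone who wants to check the "well over a million seasons" figure, here is a minimal sketch of the counting. It is not the lineup program itself, which isn't shown here; the names `distinct_lineups`, `PLAYERS`, and the constants are made up for illustration. It assumes a "lineup combination" means a distinct batting order in which each of the three players fills exactly three of the nine spots.

```python
from itertools import permutations
from math import factorial

PLAYERS = ("Pena", "Castillo", "Ensberg")
SPOTS_PER_PLAYER = 3
SEASONS_PER_LINEUP = 1000  # figure quoted in the column


def distinct_lineups(players=PLAYERS, copies=SPOTS_PER_PLAYER):
    """Return every distinct batting order with `copies` spots per player."""
    pool = [p for p in players for _ in range(copies)]
    # permutations() treats the repeated names as different items,
    # so collapse duplicates with a set before counting.
    return sorted(set(permutations(pool)))


lineups = distinct_lineups()

# Closed-form check: 9! / (3! * 3! * 3!)
expected = factorial(9) // factorial(3) ** 3

total_seasons = len(lineups) * SEASONS_PER_LINEUP
print(len(lineups), expected, total_seasons)  # 1680 1680 1680000
```

Under that assumption there are 1,680 distinct lineups, so 1,000 seasons apiece comes to 1.68 million simulated seasons.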
Still, 18 extra runs can be hard to come by when shopping for players, so it's worth looking into a little more deeply. For each player, I've averaged how many runs the team scored when he was in a given lineup spot. Here's what we've got:
While the range above is very small, the sample is large enough to draw a few conclusions. First, notice how Pena and Castillo are extremely divergent in the #1, #3, and #4 spots in the order, but are almost equal in the #2 and #5 spots. Having a high-OBP player in the top spot maximizes run scoring, but the advantage of OBP is quickly lost to SLG, perhaps as early as the second spot in the lineup. On-base percentage comes back with a vengeance in the bottom four spots. Ensberg--the average player data point--appears to outpace both high-OBP and high-SLG towards the bottom of the lineup, but I wonder how much of that is simply the fact that he's not as good a hitter as the other two; the apparent boost in run scoring when he's at the bottom of the order may simply be a result of Pena and Castillo getting more plate appearances at the top of the order.
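The per-spot averaging described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code behind the numbers in this column: `average_runs_by_slot` and the `mean_runs` mapping (lineup tuple to mean runs per season) are hypothetical names, and the toy data is invented purely to show the bookkeeping.

```python
from collections import defaultdict


def average_runs_by_slot(mean_runs):
    """Average team runs for each player in each lineup spot.

    mean_runs maps a lineup (tuple of player names, leadoff first)
    to that lineup's mean runs per season.
    Returns {player: {spot: average of the lineup means in which
    the player bats in that spot}}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for lineup, runs in mean_runs.items():
        for spot, player in enumerate(lineup, start=1):
            totals[player][spot] += runs
            counts[player][spot] += 1
    return {
        player: {
            spot: totals[player][spot] / counts[player][spot]
            for spot in sorted(spots)
        }
        for player, spots in totals.items()
    }


# Invented toy data: three-spot lineups with a repeated player "A".
toy = {
    ("A", "A", "B"): 30.0,
    ("A", "B", "A"): 60.0,
    ("B", "A", "A"): 90.0,
}
table = average_runs_by_slot(toy)
print(table["A"])  # {1: 45.0, 2: 60.0, 3: 75.0}
print(table["B"])  # {1: 90.0, 2: 60.0, 3: 30.0}
```

If the experiment does cover all 1,680 distinct orders of three players with three spots each, every player occupies a given spot in 560 of them (8!/(2!·3!·3!)), so each per-spot average pools 560 lineup means.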
Looking at the best- and worst-performing lineups confirms a little of this. Here are the three lineups that mustered the maximum 834-run mean:
Pos  Lineup 1  Lineup 2  Lineup 3
---  --------  --------  --------
#1:  Castillo  Castillo  Castillo
#2:  Castillo  Castillo  Castillo
#3:  Pena      Pena      Pena
#4:  Pena      Pena      Pena
#5:  Pena      Pena      Pena
#6:  Castillo  Ensberg   Ensberg
#7:  Ensberg   Castillo  Ensberg
#8:  Ensberg   Ensberg   Ensberg
#9:  Ensberg   Ensberg   Castillo

And the two that notched the minimum 816:
Pos  Lineup 4  Lineup 5
---  --------  --------
#1:  Pena      Ensberg
#2:  Castillo  Pena
#3:  Castillo  Castillo
#4:  Castillo  Castillo
#5:  Ensberg   Ensberg
#6:  Ensberg   Ensberg
#7:  Ensberg   Pena
#8:  Pena      Pena
#9:  Pena      Castillo

From this small sample, Pena's power in the fifth spot looks to slightly outweigh his value in the second spot. Castillo still finds his way towards the top of the lineup in the first of the worst lineups, but the biggest difference between the worst and best lineups is the presence of a couple of Wily Mo's at the bottom of the order. One other interesting point to note is that Lineup 3 and Lineup 4 both have the same bunching; they just happen to start at different points in the order. In this example, bunching of high-SLG or high-OBP hitters does not appear to have a significant effect on run scoring.
It's rare for a team to have three Penas or Castillos, so in another effort to see where their particular talents are best suited, I ran nine lineups for each of them: eight average players plus Pena or Castillo, who batted in each of the nine lineup positions in turn. Here's how they shook out:
Pena shows a great deal more range in his results than Castillo, peaking in the three and four spots, as expected from the previous results. Interestingly, this result appears even without the typical poor hitters at the bottom of a lineup. Most of the criticism of putting a slugger towards the top of the lineup centers on the reduced number of baserunners in front of him, but the results here seem to indicate that the advantage of the three and four spots is something else, perhaps the right combination of leading off the first inning with a better OBP while still getting the slugger the maximum number of plate appearances. While Castillo's top production is in the first spot, he shows far less change as he moves down the lineup.
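The nine-lineup test is simple enough to lay out explicitly. The sketch below only builds the batting orders; it does not simulate anything. `sweep_lineups` and the placeholder name "Average" are hypothetical, and the stat line used for the eight average players is not specified here, so none is assumed.

```python
def sweep_lineups(player, filler="Average", slots=9):
    """Build one lineup per spot: `player` bats there, `filler` everywhere else."""
    return [
        tuple(player if i == k else filler for i in range(slots))
        for k in range(slots)
    ]


pena_sweep = sweep_lineups("Pena")
castillo_sweep = sweep_lineups("Castillo")

# Third lineup in the sweep: Pena bats third behind two average hitters.
print(pena_sweep[2])
```

Each of the nine lineups would then be handed to whatever simulator is in use, and the nine resulting means compared spot by spot.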
So where does this leave us? Remember that we're dealing with a very small range of possible outcomes, meaning that many of the conclusions drawn from these results cannot be considered statistically significant. That said, when teams have a choice between a high-SLG, low-OBP player like Pena and a high-OBP, low-SLG player like Castillo, the traditional lineup structure with Castillo towards the top and Pena in the 3-5 spots yields near-maximum run scoring. Though it may be ideal to bat baseball's best hitters--those who are among the league leaders in both OBP and SLG--towards the top of the lineup, teams that are forced to choose between high OBP and high SLG appear to already be following a near-optimal model for maximizing run scoring.