Last week, we talked about the effects of sleep (or lack thereof) on a player’s performance, and it was all nice and theoretical, and at the end, I mumbled something about how a brilliant researcher might, in the future, be able to come up with some way to quantify these sorts of things. Welcome to the future. (See what I did there?)

It’s been said that the mark of a good man is that he’s the same man on Saturday night that he is on Sunday morning. Can the same thing be said of a baseball player? Is he the same man on Saturday night during the 7:05 game that he is on Sunday for the 1:05 game? If he has poor sleep habits (or was just out late the night before), it’s going to be harder for him to pull it together for a day game than a night game.

Day/night splits are nothing new in baseball, but they run into the same problem that all sorts of splits do. Most folks want to look at the triple-slash stats (AVG/OBP/SLG) and draw relevant conclusions from them (This guy is great during day games!). The problem is that the triple-slash stats have all sorts of problems that end with the words "small sample size." If a hitter goes 4-for-5 in a game, we instinctively know that he is not really an .800 hitter. Five plate appearances isn’t a big enough sample size to tell you much of anything about a hitter.

When do sample sizes become big enough that they are "good enough?" A few years ago, a brilliant researcher who oddly named himself after a kitchen utensil wrote an article on this very subject. The idea is fairly simple. If a stat actually tells us something about a player, then it should be fairly reliable. Say that I gave a batter 100 PA and during that time he hit three HR. If the statistic is reliable, in his next 100 PA, I would expect him to hit around three again. It might be two or four, but it won’t be 15. I did this over a bunch of hitters to see how closely one sample of 100 (or 200 or 300 or 500) PA correlated with a second, equal sample using a method known as split-half reliability. What I found was that the triple-slash stats that everyone loves so much are actually not very reliable. At 650 PA, a full season for a regular starter, batting average had a split-half reliability of .586, OBP checked in at .779, and SLG at .762. The industry standard (at least in my industry) in scale metrics for a "good enough" split-half correlation is .70 or above. What it means is that, given a sample of 650 PA for a batter, we have a pretty good idea what his OBP and SLG would be given another sample of 650 PA under similar circumstances.
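For readers who want to see the mechanics, here is a minimal sketch of a split-half reliability check. It is not the code behind the numbers above. It assumes a made-up league in which every player has a fixed "true" rate, and it assumes the two samples are formed from alternating plate appearances; `pearson`, `split_half_reliability`, and `simulate_league` are hypothetical names.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_half_reliability(outcomes_by_player, n_pa):
    """Correlate each player's rate in one n_pa sample with his rate in a second.

    outcomes_by_player: one list per player of 0/1 outcomes, in PA order.
    Players with fewer than 2 * n_pa outcomes are skipped.
    """
    half_a, half_b = [], []
    for outcomes in outcomes_by_player:
        if len(outcomes) < 2 * n_pa:
            continue
        sample = outcomes[: 2 * n_pa]
        half_a.append(sum(sample[0::2]) / n_pa)  # even-indexed PA
        half_b.append(sum(sample[1::2]) / n_pa)  # odd-indexed PA
    return pearson(half_a, half_b)


def simulate_league(n_players=300, n_pa=1000, seed=1):
    """Made-up league: each player has a fixed 'true' rate between .200 and .400."""
    rng = random.Random(seed)
    league = []
    for _ in range(n_players):
        true_rate = rng.uniform(0.200, 0.400)
        league.append([1 if rng.random() < true_rate else 0 for _ in range(n_pa)])
    return league


league = simulate_league()
r_small = split_half_reliability(league, 50)
r_large = split_half_reliability(league, 500)
print(f"50 PA: {r_small:.2f}   500 PA: {r_large:.2f}")
```

In a simulation like this, the correlation between halves climbs as the samples get bigger, which is the whole point: the same stat can be noise at 50 PA and signal at 500.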

The practice of using split statistics (home-road, day-night, left-right, April-May) is taking a stat that is already a little unreliable and cutting its underlying sample size in half (or more!)… which makes it an even more unreliable measurement. Cut the sample size to 300 PA, and AVG has a split-half reliability of .328, OBP is at .596, and SLG is at .634. Cutting the sample size in half takes something that was already teetering on the edge of respectability and makes it a worse measurement. This leads to people saying silly things, like swearing up and down that there’s something about playing in the sunlight that makes Chase Headley hit like Chase Utley. So, given that I just spent a few paragraphs trashing the use of splits, why am I about to use them? Because, unlike some other Robin Hoods, I can speak with an English accent.

It occurred to me that to really investigate whether this day vs. night issue was real, I’d need some stats that could withstand the fact that I was going to do horrible things to their sample sizes, something that is stable enough even at such small sample sizes as 50 PA. Thankfully, a few of them exist… and fortunately for me, they just happened to be the ones that I needed. There are two major stats that I’ll be working with here.

  • Swing percentage – swings / pitches faced
  • Contact percentage – (balls in play + foul balls) / swings
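The two definitions above translate directly into code. This is a small sketch, assuming you already have per-batter pitch counts in hand; the `PitchCounts` container and its field names are hypothetical, and the example numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class PitchCounts:
    """Hypothetical per-batter tallies for one split (say, day games)."""
    pitches: int        # total pitches faced
    swings: int         # pitches the batter swung at
    fouls: int          # swings that produced a foul ball
    balls_in_play: int  # swings that put the ball in play


def swing_pct(c: PitchCounts) -> float:
    """Swing percentage: swings / pitches faced."""
    return c.swings / c.pitches


def contact_pct(c: PitchCounts) -> float:
    """Contact percentage: (balls in play + foul balls) / swings."""
    return (c.balls_in_play + c.fouls) / c.swings


day = PitchCounts(pitches=200, swings=90, fouls=35, balls_in_play=37)
print(f"swing%: {swing_pct(day):.3f}   contact%: {contact_pct(day):.3f}")
```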

These measures have a few major characteristics that make them perfect for an analysis of the effects of sleep. They show a split-half reliability correlation of .70 or better at around 50 PA. They also describe the batter’s behavior, rather than the result of an interaction between batter, pitcher, the fielders, the wind, the field of play, and a few good or bad hops. If we’re going to see the effects of sleep deprivation, it’s best to get as close to direct measures of player behavior as possible. For example, a lack of sleep impairs the ability to react quickly (and make contact with a pitch). It may also make some folks more jumpy and others more sluggish. So, we have good reliable measurements of things that would be most likely to be affected. Happy, happy! Joy, joy!

I looked at 2009 to find out which players had the biggest splits between day and night on the two stats described above (provided that they had at least 50 PA both during day games and night games over the course of the season). There’s no particular strategic reason for a player to change his approach based on the time of day. He may want to swing more or less against the particular pitcher on the mound, but he’s just as likely to face that pitcher in a day game as a night game.

As an example, let’s take a look at which players had the greatest change in how often they made contact from day to night.

Better Contact at Night   Change    Better Contact During Day   Change
Darnell McDonald          18%       Edgar Gonzalez              13%
Taylor Teagarden          13%       Alcides Escobar             11%
Mike Rivera               11%       John Baker                  10%
Mat Gamel                 11%       Carlos Gonzalez             10%
Robinson Diaz             10%       Travis Snider               9%

In general, there was a pretty even split overall between players who made better contact at night (200) and those who made better contact during the day (195). The average player had a change of just about zero (0.08%), and the average magnitude of change was three percentage points. Swing percentage showed similar results. The correlation between day-time contact rate and night-time contact rate across the whole 2009 sample was .81, and for swing rate, it was .86.
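The summary numbers in that paragraph (how many players lean each way, the average change, the average magnitude, and the day-night correlation) can be computed along these lines. This is a sketch under assumptions: the input layout, the `day_night_summary` name, and the four sample players are all invented for illustration, and the 50 PA filter is applied to both halves of the split.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def day_night_summary(players, min_pa=50):
    """Summarize night-minus-day changes for players with enough PA in both splits."""
    eligible = [p for p in players
                if p["day_pa"] >= min_pa and p["night_pa"] >= min_pa]
    changes = [p["night_rate"] - p["day_rate"] for p in eligible]
    return {
        "n": len(eligible),
        "better_at_night": sum(c > 0 for c in changes),
        "better_in_day": sum(c < 0 for c in changes),
        "mean_change": sum(changes) / len(changes),
        "mean_abs_change": sum(abs(c) for c in changes) / len(changes),
        "day_night_r": pearson([p["day_rate"] for p in eligible],
                               [p["night_rate"] for p in eligible]),
    }


# Invented contact rates; player D lacks 50 day PA and is filtered out.
players = [
    {"name": "A", "day_pa": 120, "night_pa": 300, "day_rate": 0.80, "night_rate": 0.84},
    {"name": "B", "day_pa": 60, "night_pa": 250, "day_rate": 0.75, "night_rate": 0.72},
    {"name": "C", "day_pa": 90, "night_pa": 310, "day_rate": 0.85, "night_rate": 0.86},
    {"name": "D", "day_pa": 30, "night_pa": 200, "day_rate": 0.70, "night_rate": 0.90},
]
summary = day_night_summary(players)
print(summary)
```

Note that the mean change and the mean magnitude of change answer different questions: the first can sit near zero even when individual players swing a few points in either direction.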

Looking at the list, no obvious patterns jump out concerning those players. But we can assume that Darnell McDonald and Taylor Teagarden are nocturnal creatures, while Edgar Gonzalez and Alcides Escobar prefer to fight when they see the sunlight.

The next question is whether these differences are reliable from year to year. That is, if Johnny Damon swung three percent more during the day than at night in 2009, then would we find (roughly) the same split going back a few years? To answer this question, I found the splits for all players who met the 50 PA minimum both at night and during the day in a season from 2006 to 2009. I found the intra-class correlation (AR(1) covariance matrix, for the initiated) for the change rates for each of these measures over the four years. For those who aren’t familiar, intra-class correlation is kinda like a year-to-year correlation (and you can read it the same way); it’s just that it incorporates more than two data points.
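To give a feel for what an intra-class correlation measures, here is a deliberately simpler stand-in: a one-way random-effects ICC computed from an ANOVA table. It is not the AR(1) model described above, and it assumes a balanced table in which every player has a split for the same number of seasons. The function name and the example rows are made up.

```python
def icc_oneway(rows):
    """One-way random-effects ICC for a balanced table.

    rows: one list per player, holding that player's day-night split
    in each of the same k seasons.
    """
    n = len(rows)
    k = len(rows[0])
    if any(len(r) != k for r in rows):
        raise ValueError("every player needs the same number of seasons")
    grand = sum(sum(r) for r in rows) / (n * k)
    means = [sum(r) / k for r in rows]
    # Between-player and within-player mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for r, m in zip(rows, means) for x in r) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Invented splits: each player repeats the same split every year.
stable = [[0.03] * 4, [-0.02] * 4, [0.01] * 4]
# Invented splits: players are indistinguishable, and all the variation is year to year.
unstable = [[0.01, 0.02, 0.03, 0.04]] * 3

print(icc_oneway(stable), icc_oneway(unstable))
```

Read it the way the paragraph suggests: a value near 1 means a player's split in one year tells you a lot about his split in another, and a value near zero (or below) means it tells you next to nothing.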

And here’s where things fall apart. The splits on both measures fell below an ICC of .05. To put that in some perspective, BABIP for pitchers has a better ICC. So, if a player makes better contact during the day than at night in 2007, that’s interesting as a historical truth, but it has almost no predictive power as to what’s going to happen in 2008. Taylor Teagarden and Darnell McDonald are not really vampires. They just happened to have a season in 2009 where they had some weird splits. It was chance.

These findings call into question the use of platoon splits for anything other than as fodder for factoids. If, given stats that are super-reliable (as swing and contact are) that can be linked directly to the variable under study (day vs. night) by logic, there’s no reliably replicable split difference, then what hope is there for much less reliable measures which involve a great deal of luck?

This isn’t to say that there are no effects of sleep deprivation, or that there aren’t things that directly influence performance during the day vs. performance at night. It’s just that looking at performance splits isn’t the way to find them. Pulling the effects of sleep out, assuming that they are there (and I do believe that they are), is going to need a whole new framework for looking at the issue. Sleep deprivation is not as simple as "sleep bad one night, do bad the next day." Plenty of people have pulled all-nighters before a test and done OK on it, and in general, people can usually fight off one bad night of sleep. But two nights in a row is different. Plus, the effects are cumulative, so they might become more pronounced as the season wears on. Most sabermetric research treats a plate appearance in September the same as a plate appearance in May. An enterprising researcher might wish to move toward research and statistical tools that can account for this time differential, because… well, that’s how the human body actually works. What I hope is apparent this time is that there’s a good deal of work to be done in understanding the effects of context and situation on a player, but that it will take some new thinking to even begin to think about how to frame the problem, much less solve it. Earlier in this article, I said that things like day/night splits were nothing new in baseball. Maybe that’s the problem.

Great article Russell. I found it informative, but also really enjoy your writing style. Thanks.
The number of day games the Cubs play has been one theory of why they haven't won any World Series titles recently. Is there a way to get any sample size in which this hypothesis could be tested? Since individuals aren't consistent year to year, are teams?
The Cubs haven't won a World Series because... they're the Cubs. Actually, I think that the day games would work in the favor of the Cubs. They can get used to playing day ball. The other teams have to adjust. I don't think that we would have the sample size necessary to look into this hypothesis, but it's an interesting question, no doubt.
It seems to me that the most important takeaway is that swing% and contact% stabilize so quickly. Do you have any plans to use these tools to analyze other areas that suffer from sample-size issues, like platoon splits or 1st half/2nd half performance?