The start of a new year is unlike the rest of the year. People call their parents more frequently. The gym gets more crowded. Sales of alcohol, tobacco, and, I’d guess, unhealthy foods go down. Maybe sales of fresh fruit and vegetables go up. These trends are the result of New Year’s resolutions. But they don’t last. After a few days or weeks, people start lighting up, hitting the bottle, eating fast food, not calling mom, and sleeping in rather than going to the gym, just like they always did. New Year’s resolutions fail, because change is hard. Patterns and habits are hard to break.

Sometimes, though, against the odds, resolutions work out. We really do drop weight, work harder, get along better. But that’s the exception, not the rule.

There’s an analogy in baseball, I suppose. Players sometimes try to change. Maybe not as much as fans wish that they’d change. After all, Miguel Sano batted .398 when he didn’t strike out last year. If he could just cut his strikeouts down by 100 and hit .200 in those at-bats instead, he’d have hit .281 instead of .236! How hard can that be?
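The what-if arithmetic is easy to reproduce. This minimal Python sketch assumes a 2016 line for Sano of 437 at-bats, 103 hits, and 178 strikeouts. Those totals are not given in the article; they are used here because they reproduce its .236 and .398 figures. The sketch then treats 100 of the strikeouts as at-bats in which he hits .200:

```python
# Season line assumed for Miguel Sano in 2016 (not stated in the article);
# these totals reproduce the .236 and .398 figures quoted above.
AT_BATS = 437
HITS = 103
STRIKEOUTS = 178

def batting_average(hits: float, at_bats: int) -> float:
    """Hits divided by at-bats."""
    return hits / at_bats

actual_avg = batting_average(HITS, AT_BATS)                    # overall average
avg_on_contact = batting_average(HITS, AT_BATS - STRIKEOUTS)   # average when not striking out

# Hypothetical: turn 100 strikeouts into at-bats in which he hits .200.
converted_at_bats = 100
extra_hits = 0.200 * converted_at_bats
what_if_avg = batting_average(HITS + extra_hits, AT_BATS)

print(f"actual {actual_avg:.3f}, non-strikeout {avg_on_contact:.3f}, what-if {what_if_avg:.3f}")
```

The total number of at-bats stays the same; only 20 extra hits are added, which is why the average climbs from .236 to .281.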

Like going to a 6:00 am spin class or laying off ice cream, those changes are easier said than done. Patterns and habits are as hard to break for baseball players as for the rest of us.

One pattern you hear about a lot: Pulling the ball. Per data collected by Baseball Info Solutions, Chris Davis stepped to the plate with shifted infielders 301 times last year. Only nine players (David Ortiz 403, Anthony Rizzo 375, Curtis Granderson 346, Kyle Seager 345, Jay Bruce 328, Freddie Freeman 328, Adrian Gonzalez 325, Brandon Belt 313, and Kole Calhoun 303) were shifted more frequently. And only nine (Granderson 24, Kendrys Morales 21, Ortiz 17, Ryan Howard 16, Victor Martinez, Logan Morrison, and Albert Pujols 15, Brian McCann 13) lost more net base hits to the shift than Davis’ 12.

It’s easy to see why Davis was shifted so frequently and why it was so effective. Here’s his spray chart from 2016. The green dots are his ground balls:

So why, Orioles fans ask, doesn’t Davis do something about it? Why doesn’t he occasionally bunt for a hit or go the other way once in a while?

The problem with bunting for a hit is that if the pitcher serves up a meatball, you’d rather see it land 15 rows up in the outfield stands than roll toward the spot where the third baseman would normally be playing. And, like getting more involved with the PTA or giving up chocolate, it’s not easy to change a swing.

Our friends at FanGraphs classify each batted ball as pulled, hit to the opposite field, or hit to the center of the field. Using those figures, I calculated a “pull index” of pull percentage minus opposite field percentage. The leader among players with at least 350 plate appearances was Brian Dozier. He pulled 56.4 percent of his batted balls and went the other way on only 15.3 percent for a pull index of 41.1 percent. DJ LeMahieu represented the other extreme, pulling 21.8 percent and hitting 37.9 percent to the opposite field for a pull index of -16.1 percent.
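As a minimal sketch, the pull index is just a difference of two percentages. The function name is illustrative, the inputs are the Dozier and LeMahieu figures above, and the result is in percentage points:

```python
def pull_index(pull_pct: float, oppo_pct: float) -> float:
    """Pull percentage minus opposite-field percentage.

    Positive values mean a batter pulls more than he goes the other way;
    negative values mean the reverse.
    """
    return pull_pct - oppo_pct

# FanGraphs batted-ball direction figures quoted above (2016).
print(round(pull_index(56.4, 15.3), 1))  # Brian Dozier
print(round(pull_index(21.8, 37.9), 1))  # DJ LeMahieu
```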

Using pull index as a proxy for tendency to hit to one side of the field or the other, I calculated the change from 2015 to 2016. I considered only players with at least 350 plate appearances in each season. That gave me 185 batters, a reasonably good sample.
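The year-over-year comparison amounts to keeping batters who clear the 350-PA bar in both seasons and subtracting the 2015 pull index from the 2016 one. In this sketch the `Season` record, the function name, and all player data are hypothetical; a negative change means a batter went the other way more:

```python
from dataclasses import dataclass

MIN_PA = 350  # plate-appearance threshold from the article, applied to each season

@dataclass
class Season:
    pa: int          # plate appearances
    pull_pct: float  # share of batted balls pulled, in percent
    oppo_pct: float  # share hit to the opposite field, in percent

    @property
    def pull_index(self) -> float:
        return self.pull_pct - self.oppo_pct

def pull_index_changes(seasons_2015: dict[str, Season],
                       seasons_2016: dict[str, Season]) -> dict[str, float]:
    """2016 pull index minus 2015 pull index, for batters with MIN_PA in both seasons."""
    changes = {}
    for player, s15 in seasons_2015.items():
        s16 = seasons_2016.get(player)
        if s16 is None or s15.pa < MIN_PA or s16.pa < MIN_PA:
            continue  # must qualify in both seasons
        changes[player] = s16.pull_index - s15.pull_index
    return changes

# Hypothetical players and numbers, for illustration only.
y2015 = {"Player A": Season(600, 45.0, 20.0), "Player B": Season(400, 38.0, 25.0),
         "Player C": Season(300, 40.0, 22.0)}
y2016 = {"Player A": Season(580, 35.0, 28.0), "Player B": Season(500, 41.0, 24.0),
         "Player C": Season(450, 39.0, 23.0)}
changes = pull_index_changes(y2015, y2016)
print(changes)  # Player C is dropped: only 300 PA in 2015
```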

Who changed the most? Who woke up on New Year’s Day a year ago and said to himself, “This is the year I’m going to stop being so pull-happy?”

Orioles fans: He listened! No player changed anywhere near as much as Chris Davis last year in terms of going the other way more frequently. He pulled a lot less frequently and hit to the opposite field more often. He stuck to his resolution!

How about players who pulled more?

In terms of absolute difference, these players changed where they hit balls more than the players on the first list did. It’s also worth noting, I suppose, that six of the 10 players who increased their pull index the most (Xander Bogaerts, Danny Espinosa, Brad Miller, Alcides Escobar, Rajai Davis, and Evan Gattis) set career highs for home runs last season. The only players on the opposite field table to notch career highs were Freddie Freeman and Odubel Herrera.

That leads to another question: How much of a difference did the change in approach make? I used True Average (TAv), our all-inclusive measure of performance at the plate, scaled to park and league, with a league average of .260:

Well now. That’s not quite what you were expecting, was it? Players who went the opposite way more did slightly worse in 2016 than in 2015. Now, granted, six of the 10 improved, and the overall average swings to positive .003 if you exclude Bryce Harper’s fall from greatness to really goodness. But overall, there really wasn’t much to show for the effort these players presumably made.

Maybe pulling more was a good strategy?

Nope. Just as with the last table, the 10 players who had the greatest increase in pulled batted balls in 2016 had, on average, a TAv five points worse than in 2015.

Maybe I’m (inadvertently, I promise) cherry-picking here. As I mentioned, there were 185 players who batted at least 350 times in both 2015 and 2016. Maybe the 10 players on each of these two lists are unrepresentative. Perhaps players who go the opposite way more, in general, do better, and it’s just that the 10 at the top didn’t happen to.

To check that, I sorted all 185 players by their change in pull index and divided them into deciles of exactly equal size, measured by 2016 plate appearances rather than by number of players. Here is the weighted average change in pull index and the average change in TAv, per decile. The first decile is the 10 percent of players who went the opposite way the most compared to 2015, and the 10th decile is the 10 percent of players who pulled the most.
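The binning step can be sketched as follows. The article doesn't say how a player whose plate appearances straddle a decile boundary is handled; this sketch assumes each player goes wholly into the bin containing the midpoint of his plate appearances, so bins are only approximately equal. It also weights both averages by 2016 PA, which is an assumption for the TAv column. All names and data are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Change:
    player: str
    pa_2016: int          # 2016 plate appearances, used as the weight
    d_pull_index: float   # 2016 pull index minus 2015 pull index (points)
    d_tav: float          # 2016 TAv minus 2015 TAv

def pa_weighted_bins(rows: list[Change], n_bins: int = 10) -> list[dict]:
    """Sort by change in pull index (most opposite-field first), then cut into
    bins holding roughly equal shares of 2016 PA."""
    ordered = sorted(rows, key=lambda r: r.d_pull_index)
    total_pa = sum(r.pa_2016 for r in ordered)
    bins: list[list[Change]] = [[] for _ in range(n_bins)]
    cumulative = 0
    for r in ordered:
        midpoint = cumulative + r.pa_2016 / 2  # where this player's PA are centered
        bins[min(int(n_bins * midpoint / total_pa), n_bins - 1)].append(r)
        cumulative += r.pa_2016
    summary = []
    for i, members in enumerate(bins, start=1):
        pa = sum(m.pa_2016 for m in members)
        if pa == 0:
            continue  # possible with few players and many bins
        summary.append({
            "bin": i,
            "players": len(members),
            "pa": pa,
            # PA-weighted averages of both changes
            "d_pull_index": sum(m.pa_2016 * m.d_pull_index for m in members) / pa,
            "d_tav": sum(m.pa_2016 * m.d_tav for m in members) / pa,
        })
    return summary

# Hypothetical players; two bins instead of ten to keep the example small.
rows = [Change("A", 500, -10.0, -0.010), Change("B", 500, -2.0, 0.000),
        Change("C", 500, 3.0, 0.004), Change("D", 500, 9.0, 0.012)]
summary = pa_weighted_bins(rows, n_bins=2)
for row in summary:
    print(row)
```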

There’s something unexpected going on there, and you can see it better in a graph:

See that? The effect isn’t all that strong, but the more often batters went the other way, the worse they performed offensively. The results are clustered around no change in the pull index (change is hard!), but there are a couple that stand out. Players in the second decile reduced the pull index by 6.0 percent and suffered a nine-point drop in TAv, on average. That decline goes away if you ignore Mark Teixeira (TAv declined from .313 to .231), Alex Gordon (.299 to .240), Randall Grichuk (.316 to .275), and Matt Duffy (.283 to .252), but of course you can’t really do that, since they comprise about 18 percent of the decile’s plate appearances.

Similarly, there was a big (10-point) TAv increase in the ninth decile, in which the players’ pull index rose by a net 8.7 percent, with six players increasing their TAv by 30 or more points (Ichiro Suzuki 48, Angel Pagan 38, Marcell Ozuna 37, Kris Bryant and Chase Utley 33, Yasmany Tomas 30).

Overall, the relationship here—players who pulled more in 2016 than in 2015 did better than players who went the opposite way more—isn’t strong. And a chunk of it can probably be explained by age and infirmity, as older/hurt guys can’t catch up to pitches as well, resulting in decreases in both pull index and TAv. (Teixeira and Harper come to mind as victims of age and infirmity, respectively.)
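One way to put a number on how weak the relationship is, assuming player-level changes are available, is a plate-appearance-weighted regression of the change in TAv on the change in pull index, along with the weighted correlation. The article doesn't report either statistic; this sketch only shows the calculation, and its input is a made-up toy example:

```python
import math

def weighted_fit(x: list[float], y: list[float], w: list[float]) -> tuple[float, float]:
    """Weighted least-squares slope of y on x, and the weighted Pearson correlation.

    Here x would be the change in pull index, y the change in TAv,
    and w each player's plate appearances.
    """
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Toy check with made-up numbers: a perfect linear relationship gives r = 1.
slope, r = weighted_fit([-10.0, 0.0, 10.0], [-0.010, 0.0, 0.010], [300, 450, 600])
print(f"slope {slope:.4f} TAv per point of pull index, r = {r:.2f}")
```

A positive slope would match the pattern described (more pulling, more TAv); a correlation close to zero is what "isn't strong" would look like in numbers.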

But I think we can say that a New Year’s resolution to stop pulling everything and go the other way more, even if a player sticks to it, may not yield the desired results. Players like Chris Davis are like the person who dutifully lays off desserts and is on the treadmill five days a week but doesn’t lose any weight.

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.

When you look at Chris Davis' spray chart, it's probable that the increase in his opposite field hitting was in the fly ball category only, as the vast majority of his grounders were pulled. I wonder, therefore, if his TAv decline was due to his lesser tendency, purposeful or not, to pull balls a long way in the air. It might be interesting to see how positive changes in pulled fly balls (and, in the same vein, opposite field grounders) from year-to-year correlate with TAv success - I think the relationship might be stronger in a positive direction. Now whether a player can consciously effect change in these parameters by altering his approach is another question...
Using the FanGraphs splits tool, this is the breakdown of percentage of batted balls pulled, hit up the middle, and hit to the opposite field, for Davis (totals don't always add to 100 due to rounding):

Air balls (pull-center-opposite)
2014 38-31-32
2015 46-32-22
2016 32-38-30

Ground balls (pull-center-opposite)
2014 77-13-9
2015 79-15-6
2016 60-32-8

So he cut back on air balls to right but cut way, WAY back on grounders to right. That's the lowest percentage of grounders he's pulled in his career.
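Applying the article's pull index (pull minus opposite field) to these splits gives another view of the same change. A short sketch; the "air" and "ground" labels follow the reading of the two rows of figures given in this comment, and exactly which batted-ball types the splits tool counts as air balls is not stated here:

```python
# Davis's batted-ball direction splits quoted above, as (pull, center, opposite) percentages.
SPLITS = {
    "air":    {2014: (38, 31, 32), 2015: (46, 32, 22), 2016: (32, 38, 30)},
    "ground": {2014: (77, 13, 9),  2015: (79, 15, 6),  2016: (60, 32, 8)},
}

def pull_index(split: tuple[int, int, int]) -> int:
    """Pull percentage minus opposite-field percentage."""
    pull, _center, oppo = split
    return pull - oppo

for kind, by_year in SPLITS.items():
    indexes = {year: pull_index(s) for year, s in sorted(by_year.items())}
    pull_drop = by_year[2015][0] - by_year[2016][0]
    print(f"{kind}: pull index by year {indexes}; "
          f"pulled share fell {pull_drop} points from 2015 to 2016")
```

By this measure both pull indexes fell by similar amounts (22 points for air balls, 21 for grounders), while the larger drop in raw pulled share was on grounders (19 points versus 14).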

This Hardball Times article explains why so many grounders are pulled while fly balls often go to the opposite field. The conclusion supports your thesis that a positive change in pulled fly balls is a good thing:

Thanks Rob - I liked your article and the THT reference was great!
I like this study a lot, Rob. One thing you have to be careful about is comparing changes from one year to the next without taking regression toward the mean into consideration. This will especially wreak havoc when one or more of your groups happen to be well-above- or below-average players and you don't have any reason to expect that bias to continue. That may or may not be the case with your study.

The best way around that is to do one of two things. One, use a control group of around the same performance level. Two, use projections as your baseline. The latter is often best.

For example, in your first chart you get a decline in TAv of 5 points. But is that really a decline? You have to compare it to a projection for that group of players. A group of players who hit .292 for one season would actually be expected to hit maybe .280 the next season, given aging and a population mean of .260 (I don't know if it is or isn't; it's probably higher for several reasons). So hitting .287 the next season is actually BETTER than expected.
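The .292-to-.280 example is a simple regression toward the mean. As a rough sketch, assume a population mean of .260 and that 60 percent of a group's deviation from that mean carries over to the next season. The 0.6 is an illustrative assumption, not a figure from the article or any projection system, and the aging term is left at zero:

```python
def regress_to_mean(observed: float, population_mean: float,
                    carryover: float, aging: float = 0.0) -> float:
    """Expected next-season TAv: keep only `carryover` (0 to 1) of the observed
    deviation from the population mean, then add an aging adjustment."""
    return population_mean + carryover * (observed - population_mean) + aging

# .292 and .287 come from the comment; the 0.6 carryover is an assumption.
expected = regress_to_mean(0.292, 0.260, carryover=0.6)
actual = 0.287
print(f"expected {expected:.3f}, actual {actual:.3f}, difference {actual - expected:+.3f}")
```

A negative `aging` value would lower the expectation further, making the .287 look even better relative to it.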

The second group probably hit around what was expected the next year so there really wasn't a "decline" there either. Again I don't know what their projection would be so take those statements as merely illustrative of my overall point.

Anyway nice work!
Thanks for the insight. I'm going to be doing some more work along these lines, and I'll keep this in mind, particularly in cases where the group in question (at the extremes of the distribution) consists of marked over- or under-performers. You're right: the first group did better than their projections, although that is almost entirely due to massive outperformance by Herrera (who I assume broke most projection systems last year) and Freeman. The second group also did slightly better than projections, though if you take away Carpenter and Espinosa, they underperformed. Thanks for bringing up a great point.

Re TAv, it's scaled around a league average of .260, so you're right, the players under consideration here--those good enough to get 350 PAs in consecutive seasons--will be above that average.
Sounds good! Looking forward to seeing more work in this area. Yes, the means for full-time and part-time players will be different; however, one has to be careful about how players are put into one group or another and about establishing the mean of each group. To some extent players get into the full-time group because they have over-performed (relative to their talent). Always remember that when establishing means for the purpose of regressing toward them, the means MUST be the actual true talent of that group. There are many ways to assign means to groups that are NOT representative of true talent because the groups themselves were determined by random over- or under-performing.

One more thing. Try not to ever get caught up in "Well, if I eliminated this or that person from the sample." It is rarely productive to even think that way, let alone express it in a research piece, and it can lead to bad research. I highly recommend never going down that path. Consciously or not, it can and will lead to skewing results to lead to a desired end.

I would also recommend not going down this path: "6 out of 10 of my sub-samples did this even though..." Unless there is a compelling methodological reason to treat them differently or as separate, meaningful entities, or even to mention separate samples within a group, there is no statistical justification for doing so. It is actually a red flag for amateur or bad research. In most cases, if a sample of players (or any entity) points in one direction, it makes absolutely no difference whether 3 out of 10 or 7 out of 10 of the individual players point in one direction or the other, and pointing that out can only serve to mislead or influence the reader to go in a direction that he shouldn't be going. Again, that holds unless there is a methodological or mathematical reason to do so, in which case you must articulate what that is.

Regarding my last two points, you wrote:

"Now, granted, six of the 10 improved, and the overall average swings to positive .003 if you exclude Bryce Harper’s fall from greatness to really goodness."

You should not be writing that, again, without articulating a reason (which there probably isn't) why it might make a difference or be germane to the analysis.
I appreciate this.

Two questions. First, in the example of the players who pulled less in 2016, would it be appropriate to note that the results vs. expectations are skewed by Odubel Herrera, whom the projection systems pretty much completely missed?

Second, regarding the sentence you pointed out, if I have a sample where eight players did 5% better than expected and two did 25% worse, would you view the preponderance of modest improvement or the overall average of modest decline as the more germane conclusion?
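For reference, the hypothetical in this question works out as follows, treating every player equally (in practice the players would be weighted by plate appearances):

```python
from statistics import mean, median

# The hypothetical: eight players 5% better than expected, two 25% worse.
changes = [0.05] * 8 + [-0.25] * 2
avg = mean(changes)
mid = median(changes)
improved = sum(c > 0 for c in changes)
print(f"mean {avg:+.0%}, median {mid:+.0%}, {improved} of {len(changes)} improved")
```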

Again, thanks for your insights here.
First question: no, I don't think so. How well or poorly these players performed compared to their projections has nothing to do with the issue you are addressing. Again, you can look at individual players to decide if there's a compelling reason not to include them in your overall sample, but I strongly advise against doing that.

Second question: the overall average is the only thing that matters in most cases. The only reason you are using multiple players in most of these analyses is to increase sample size. For all practical purposes they might as well be one player. You are including these players because you have determined that they're part of the population you are studying. Period. They are now simply N1, N2, N3, etc.

Making any inferences based on individual player results is almost (not quite, but close) like doing a study where you have a homogeneous sample of size N and then you arbitrarily and randomly break it up into multiple small samples and then make inferences about one or more of those small sub-samples. You can't do that.

You're investigating a certain effect within a population of players, period. The results of the individual players within that population mean nothing. You have to get out of the habit of thinking that the individual results mean anything at all or even reporting them. It's nice and it lends a "familiar air" to your research but it's not necessary. I almost never report on the individual players in a study or even look at them myself and I have done hundreds of these.

I mean, if you look at individual players and you determine there is something about one or more of them such that they shouldn't be included in your sample, then remove them, although that's a dangerous path to go down, as I indicated in my last comment.

Basically, think of your players as just random sub-samples of your overall sample. The problem with inferring anything from individual results, besides the fact that it's just not mathematically or methodologically correct to do so, is that you're simply going to get all kinds of random anomalies among those sub-groups. That is exactly what you are trying to avoid by including as many players as you can that "fit the bill." Your entire point is to have all those random fluctuations "even out" by combining them. Bringing individual results up is exactly what you don't want to do.

In most cases, as in your second example, the distribution of individual results will have nothing to do with the effect you are trying to analyze. It will have everything to do with the number of individuals in your sample and each of their underlying sample sizes (like PA in the season in question). In your example the only reason you have 8 players at +5% and 2 at -25% is that you only have 10 players, and you expect lots of random fluctuation within 10 players. Had you had 1000 players, it would be pretty likely that roughly half would be on one side of your mean and half on the other side. Why would you want to report some random fluctuation within your overall sample? If I were testing a coin and I had 9 people helping me and each of us flipped the coin 100 times, would it help my inquiry if I reported, "8 of those people favored heads and 2 favored tails"? Of course not. You would simply report the results of 1000 flips. You realize that nothing in your analysis changes no matter how you split up the 1000 flips. It's presumably the same with the multiple players in most research like this. Each player is merely another batch of coin flips. There is no difference among the players. You might as well split your overall sample up by age or months of the year or first half/second half. Presumably every player in your sample belongs to the same population you are studying AND is equally likely to be affected by whatever it is you are looking for. If not, then the methodology of your study is poor to begin with and you shouldn't be combining these players in the first place!
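The coin example can be run directly. This sketch (the seed and group sizes are arbitrary choices) deals 1,000 fair flips to 10 people, counts how many "favored heads," then reshuffles the very same flips into 10 new groups. The person-by-person split can change, but the pooled result cannot:

```python
import random

rng = random.Random(2017)  # arbitrary fixed seed so the run is reproducible
PEOPLE, FLIPS_EACH = 10, 100

# Each person flips a fair coin 100 times; True means heads.
flips = [[rng.random() < 0.5 for _ in range(FLIPS_EACH)] for _ in range(PEOPLE)]
heads_by_person = [sum(f) for f in flips]
total_heads = sum(heads_by_person)
# "Favored heads" means more than 50 heads; a 50-50 split counts as not favoring heads.
print(f"{sum(h > FLIPS_EACH / 2 for h in heads_by_person)} of {PEOPLE} people favored heads")
print(f"pooled: {total_heads} heads in {PEOPLE * FLIPS_EACH} flips")

# Deal the very same 1,000 flips out to ten new groups of 100.
pooled = [flip for person in flips for flip in person]
rng.shuffle(pooled)
regrouped = [sum(pooled[i:i + FLIPS_EACH]) for i in range(0, len(pooled), FLIPS_EACH)]
print(f"after regrouping: {sum(h > FLIPS_EACH / 2 for h in regrouped)} of {PEOPLE} groups favored heads")
print(f"pooled: {sum(regrouped)} heads in {len(pooled)} flips")  # always equals total_heads
```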

Anyway, I probably beat a dead horse!
No, not a dead horse. All helpful, thanks.
Tango had an interesting thread on his blog regarding your approach and the problems associated with selective sampling. Did you read it?

He's right in that, to sum up his argument in a few words, a group of players who changed their approach (it has to be a deliberate change, not the result of something like an injury or age) from one season to the next (you have identified this change) will tend to have gotten lucky whether or not the change had any positive or negative effect on their true talent.

He explains the reason. After, say, a month, if the results are bad (which will always be a combination of luck and skill, but mostly luck in only a month), players in general will have a tendency to switch back to the old approach.

The ones who continue will have had a tendency to have gotten lucky in the first month.

This pattern will continue throughout the season such that players will "drop out of the new approach" all along the way if their prior performance was bad (mostly unlucky).

The net result is that all players who continued with this new approach for most or all of the season will have had a somewhat lucky season by definition. This is unavoidable and incontrovertible.
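The argument can be illustrated with a small simulation. Every assumption here is hypothetical rather than Tango's: all players share one true success rate per PA, the new approach has no effect at all, and after each of the first five months a player who is running below that rate abandons the approach with probability 0.5. Players who stick with it all season still finish above the true rate, purely through selection:

```python
import random
from statistics import mean

rng = random.Random(42)   # arbitrary fixed seed for reproducibility
TRUE_RATE = 0.330         # assumed true success rate per PA, identical for every player
MONTHS, PA_PER_MONTH = 6, 100
N_PLAYERS = 4000
REVERT_PROB = 0.5         # assumed chance a struggling player drops the new approach

def simulate_season() -> tuple[bool, float]:
    """Return (stuck_with_new_approach, season_rate) for one player.

    The approach has no effect on talent, so the season is just
    MONTHS * PA_PER_MONTH independent trials at TRUE_RATE either way.
    """
    successes = 0
    stuck = True
    for month in range(1, MONTHS + 1):
        successes += sum(rng.random() < TRUE_RATE for _ in range(PA_PER_MONTH))
        running_below = successes / (month * PA_PER_MONTH) < TRUE_RATE
        # Decisions happen after each month except the last.
        if stuck and month < MONTHS and running_below and rng.random() < REVERT_PROB:
            stuck = False
    return stuck, successes / (MONTHS * PA_PER_MONTH)

seasons = [simulate_season() for _ in range(N_PLAYERS)]
everyone = [rate for _, rate in seasons]
stayers = [rate for stuck, rate in seasons if stuck]
print(f"{len(stayers)} of {N_PLAYERS} kept the new approach all season")
print(f"mean rate: stayers {mean(stayers):.3f}, everyone {mean(everyone):.3f}, true {TRUE_RATE:.3f}")
```

Setting `REVERT_PROB` to 0 removes the selection: everyone "stays," and the stayers' mean is just the overall mean.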

It is exactly the same dynamic as looking at players with the most playing time. The more the playing time (mostly for non-established players; established ones are not as sensitive to the whims and decisions of management in terms of playing time), the luckier the season (and the greater the talent, of course).

Any time someone makes a decision about what to do based on X (playing time, change of approach, eating chicken, taking steroids, etc.) if you find out that they did or took X for most or all of a season, they will have had a lucky season and will regress in the next season whether they continue to do or take X or not.

So you sampled players for large changes. Presumably players with large changes kept up those changes for most or all of the season. So you have selectively sampled these players such that their second seasons would tend to be lucky ones. So you should see an increase in TAv in the second season and not the decrease you found (although we don't know if it's a true "decrease" or not, which is why you really need projections or a control group).

Not only that but there is another factor at work in your selective sampling. (I don't mean to imply that you made a mistake in sampling these players - you didn't. You just need to recognize it and adjust for it if you can.)

Players who deliberately change their approach from one year to the next will tend to have had a bad first year! Otherwise they would be less likely to want to change their approach. So both year 1 and year 2 are biased in terms of results. Year 1 will be unlucky and year 2, lucky. More reason for there to be an even larger increase from year 1 to year 2.

I suspect that the reason you found a decrease was a combination of aging (there will always be a decrease from any year 1 to year 2 unless your sample is very young players - that's especially true when you include survivorship bias) and injury.

However, to see to what extent selective sampling is biasing the results, in this case TAv in years 1 and 2, try this:

Look at the same players (make sure you always weight by each player's PA) in year 0 (2014 in this case) and year 3 (you can't do that here, but if you switch your study to changes from 2014 to 2015, you can then look at 2016).

Year 0 will be an unbiased estimate of these players' talent before the change, and 2017 (or 2016 if you study 2014 to 2015 changes) will be an unbiased estimate of their talent after the change (with the assumption that the change was permanent to some degree at least). You just have to make sure that you include aging effects OR that you compare to a control group (such that the control group will have the same aging effects). For example, ALL players, unless you select or happen to select very young ones, will have better stats in year 0 than in year 1, worse stats in year 2 than in year 1, and worse stats in year 3 than in year 2.
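A minimal sketch of the control-group comparison, assuming PA-weighted TAv for each group in year 0 and year 3: subtracting the control group's change nets out the aging both groups share. The tuple layout, function names, and numbers are hypothetical:

```python
# Each row: (TAv in year 0, PA in year 0, TAv in year 3, PA in year 3) for one player.
Row = tuple[float, int, float, int]

def group_change(rows: list[Row]) -> float:
    """PA-weighted TAv in year 3 minus PA-weighted TAv in year 0."""
    y0 = sum(t0 * p0 for t0, p0, _, _ in rows) / sum(p0 for _, p0, _, _ in rows)
    y3 = sum(t3 * p3 for _, _, t3, p3 in rows) / sum(p3 for _, _, _, p3 in rows)
    return y3 - y0

def effect_vs_control(changers: list[Row], control: list[Row]) -> float:
    """Change for players who altered their approach, net of the control group's
    change (assumed to carry the same aging effect)."""
    return group_change(changers) - group_change(control)

# Hypothetical numbers, for illustration only.
changers = [(0.300, 500, 0.280, 450), (0.270, 600, 0.262, 550)]
control = [(0.295, 520, 0.283, 500), (0.272, 580, 0.266, 560)]
print(f"net effect of the change: {effect_vs_control(changers, control):+.3f}")
```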

I just want to reiterate that, my comments and suggestions notwithstanding, I really like this study and your basic approach.
Thanks for bringing Tango's piece to my attention; I hadn't seen it. If you don't mind, could you hit the "Contact Author" button at the top of the screen? I have a couple questions/comments. Thanks!
By and large these are great points, but I don't quite buy the analogy between players and assistants flipping coins. The assistants will not be prone to their own patterns or distributions, and among players those could be germane to the robustness of the findings.

In a way, these references to sub-samples are a peek at high-influence points or at least skewness. I would agree that they are not rigorous statements about either, but I'm not quite convinced that they are nothing but distractions.