
Over the past couple of weeks, I’ve introduced the idea of estimating a player’s talent level at specific points in time, at least retrospectively, by using a moving average. The idea is that instead of simply assuming that at every plate appearance, a batter could be described by his seasonal average, we might allow that his true talent level varied over the course of a season.

Hitters do grow and develop within a single season, because that’s what humans do. Sometimes it’s obvious that a hitter has figured something out mid-season and that he’s much better at swinging the stick in September than he was in April. Fans and front office types alike love it when that happens, and even if a team doesn’t end up making it to the playoffs, that late-season surge provides hope for the next season—hope that is currently fueling the fantasies of every fan in baseball. Ike Davis had a .326 OBP last year, but in his last 100 plate appearances, he posted a .460 OBP. Jason Castro had a perfectly glorious .350 OBP, but in his last 100 PA, he put up a nifty .450. Ben Revere’s overall OBP was .338, but .410 in his last century of times to the plate. It’s tempting to think that these players—these young players—are finally ready to step up from good, solid players to elite status. After all, the trend line is pointing up, right?

It’s so tempting to want to believe, but like everything else, we need to examine whether it’s a belief based on solid evidence. You want to believe that because he finished strong, he’ll start strong and he’ll stay strong, right? (And in a fantastic feat of cognitive dissonance, you will also assume that the guy who tailed off at the end of the year was just a small sample size fluke that doesn’t mean anything.)

Warning! Gory Mathematical Details Ahead!
Recently, I laid out the method for how we can retrospectively estimate a player’s true talent at a given point in a season. Basically, I calculated a series of moving averages (how the player has done in his last 50 chances, last 100 chances, last 150 chances, and so on) and also the player’s seasonal average. I allowed the computer to pick which of those best described the player during the course of a season, based on a logistic regression. Some hitters show evidence of this sort of variation in their talent level. For some, the seasonal average is all you need.

This time, I’m looking at OBP. For the years 2009-2013, I coded for each player (min 250 PA) whether a plate appearance ended in an on-base event. I then took moving averages for the last 50, 60, 70, etc. plate appearances, up to 200, and determined which was the best descriptor of his chances of an on-base event. For example, if the regression said that he was best described by his previous 130 PA, I assigned his true talent level to be his OBP in his last 130 PA. For players who were best described by their seasonal average, I simply input the seasonal average.
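The window-selection step can be sketched in code. This is a minimal stand-in, not the study's actual implementation: it scores each trailing window by raw Bernoulli log-likelihood rather than the logistic regression the article used, and all function names and inputs are illustrative:

```python
import numpy as np

def trailing_obp(onbase, w):
    """Moving OBP: entry t is the mean of the previous w plate
    appearances (PAs t-w .. t-1); NaN until w PAs have occurred."""
    onbase = np.asarray(onbase, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(onbase)))
    out = np.full(len(onbase), np.nan)
    t = np.arange(w, len(onbase))
    out[t] = (csum[t] - csum[t - w]) / w
    return out

def bernoulli_ll(y, p):
    """Log-likelihood of 0/1 outcomes y under predicted rates p."""
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))

def best_descriptor(onbase, windows=range(50, 201, 10)):
    """Compare the full-season average against each trailing window
    (50, 60, ..., 200 PA) as a predictor of individual PA outcomes.
    Returns "season" or the winning window size."""
    onbase = np.asarray(onbase, dtype=float)
    start = max(windows)  # score only PAs where every window is defined
    y = onbase[start:]
    scores = {"season": bernoulli_ll(y, np.full(len(y), onbase.mean()))}
    for w in windows:
        scores[w] = bernoulli_ll(y, trailing_obp(onbase, w)[start:])
    return max(scores, key=scores.get)
```

For a steady hitter, the seasonal average wins; for a hitter whose on-base rate genuinely shifted mid-season, one of the shorter windows does.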

I calculated several OBP estimates:

  • Overall seasonal OBP
  • Seasonal OBP regressed to the league mean (using KR-21 coefficients, described here)
  • What his true talent estimate was at his last PA of the season
  • His highest estimated true talent point during the season
  • His lowest estimated true talent point during the season
  • His OBP over the last 100 PA of the season, whether that was his best descriptor or not
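The regression-to-the-mean step in the second bullet is a reliability-weighted blend of the observed OBP and the league mean. A minimal sketch, assuming the reliability coefficient has already been estimated separately (the article derives it from KR-21); the 0.320 league OBP is a placeholder, not a figure from the article:

```python
def regress_to_mean(obp, reliability, league_obp=0.320):
    """Shrink a player's observed OBP toward the league mean.
    `reliability` is the estimated reliability (0 to 1) of the
    observed OBP at its sample size."""
    return reliability * obp + (1.0 - reliability) * league_obp
```

With a reliability of 0.5, a .400 observed OBP regresses to .360 against a .320 league mean.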

For each player who had a minimum of 250 PA in the next season, I calculated his OBP for that next season. Which of the above was the best predictor? Was it the overall seasonal OBP, or did that trend line win out? Should we look at the whole body of work or the most recent piece?

Correlation scoreboard (all correlations with year+1 OBP):

Estimate                           r
Overall seasonal OBP               .474
Regressed seasonal OBP             .480
End-of-season talent estimate      .380
Highest estimated OBP              .302
Lowest estimated OBP               .243
OBP in last 100 PA of season       .297
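A scoreboard like this is just a set of Pearson correlations against next-season OBP, one per estimator. A minimal sketch with toy inputs (the names and numbers here are illustrative, not the study data):

```python
import numpy as np

def correlation_scoreboard(estimates, next_obp):
    """Pearson r between each OBP estimate and next-season OBP.
    `estimates` maps estimator name -> values aligned with `next_obp`,
    one entry per player-season."""
    next_obp = np.asarray(next_obp, dtype=float)
    return {name: float(np.corrcoef(np.asarray(vals, dtype=float),
                                    next_obp)[0, 1])
            for name, vals in estimates.items()}
```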

I limited the sample to only those who had shown evidence that they were best described by some sort of moving average:

Estimate                           r
Overall seasonal OBP               .412
Regressed seasonal OBP             .416
End-of-season talent estimate      .307
Highest estimated OBP              .286
Lowest estimated OBP               .244
OBP in last 100 PA of season       .297

I tried splitting the file a few other ways (only players under 27, players who showed evidence that things were either trending upward or downward, upping the inclusion criteria to 400 PA), but the results were always a variation on this theme. The overall OBP always prevailed. The end-of-season talent estimate did a decent job, but it lost. I re-ran the analyses with strikeout rate as the outcome. The year-to-year correlations were higher (K rate is much more stable than OBP), but the same relative patterns held.

To do a quick check, I also looked at the differences between the year +1 OBP and the current year OBP, and then the differences between the year +1 OBP and the end-of-year point estimate and asked which came closer to predicting the real OBP. The seasonal OBP won about 65 percent of the time.
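The head-to-head check described above (count, player by player, which estimate lands closer to the realized next-season OBP) can be sketched as follows; the function name and the tie handling are my own choices:

```python
import numpy as np

def win_rate(est_a, est_b, actual):
    """Fraction of players for whom estimate A is strictly closer to
    the realized next-season OBP than estimate B (ties split evenly)."""
    a = np.abs(np.asarray(est_a, dtype=float) - np.asarray(actual, dtype=float))
    b = np.abs(np.asarray(est_b, dtype=float) - np.asarray(actual, dtype=float))
    wins = np.sum(a < b) + 0.5 * np.sum(a == b)
    return float(wins) / len(a)
```

Feeding the seasonal OBP as estimate A and the end-of-year point estimate as estimate B would reproduce the roughly 65 percent figure reported above.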

How Changes Actually Happen
So the idea of using trend lines as predictors is dead? Not necessarily. Like a lot of things in baseball (and life) it’s just not that simple. To interpret the findings above in their most literal sense, there’s just no escaping who you are as a player. If seasonal OBP is all there is (with a healthy dose of random variation and luck—and maybe some aging effects), then there’s no room for breakouts in baseball.

Of course, breakouts do happen, so we need a little more nuance. Something must drive breakouts. It seems that if something is going to signal that one is about to happen, it would be guys who mathematically show a predisposition to be best described by a smaller, more recent sample, and who had a surge at the end of the year. So why does seasonal OBP still win, even when we seem to be stacking the deck in favor of the streak?

Because that’s not how change works.

Change and development in humans are not a neat, linear process. Consider any bad habit that you have ever tried to change. You recognized that it was a bad habit. You made up your mind that you were going to change it, and slowly you worked at it and got better at controlling it until…well, you probably fell back into the habit. Over time, you might eventually conquer it and make a change (hopefully for the better), but relapse is part of change. It’s not that you can’t try again (if it’s a bad habit, you should!), it’s just that sometimes you’ll fall back. Real change usually comes in fits and false starts. Our minds (especially when looking at players whom we want to succeed) want to see a nice straight line, but take it from a former therapist: life doesn’t work like that.

We essentially have two measures for our player; let’s look at what they really describe. We have a small sample size (his last 150 PA of last season), but we have evidence that this is a decent descriptor of his talent at that point, and we know that it described how he was doing the last time he was on a competitive baseball diamond. Why would he not carry it over? Because it describes the phase where he was working on making himself a better hitter. Then he gets a few months off. He probably does off-season work and hopefully tries to maintain whatever he’s done. But the reality is that he is likely to have a period where he falls back into a bad habit. That’s the real strength of the seasonal average. It is composed of outcomes that represent the player both before he had improved himself and after. And it’s likely that next year’s performance will include a bit of both of those too.

Let’s look a little more closely at the numbers. Note that the end-of-season talent estimate, based on a small sample size, checks in with a correlation of .307 with year + 1 OBP (among those who had a tendency toward changing OBPs through the year). That’s not bad, especially given that we’re talking about a sampling frame of, at most, 200 PA. Seasonal OBP does much better as a correlate but has a bigger sample size base to work from. The fact that the 200 PA that we’re selecting are the ones most likely to foretell a breakout seems to have something going for it.

I’d argue that we need something else in here. I’d argue that we can at least get a decent idea of who is doing something different and whether that actually had an effect on performance. It’s entirely possible that a batter might make a change for the better and stick with it. Once we identify who has made a change, how can we better figure out who will keep it up? There’s the challenge.

In the shorter term, we return to Ike Davis and Jason Castro and Ben Revere. Should fans of the Mets and Astros and Phillies get all hot and bothered about the major breakthrough that they are about to see in 2014, based on a hot finish in 2013? It’s entirely possible that they might, but the numbers say that the much less enticing seasonal number from last year is the best guide.

randolph3030
2/24
Dr. Carlton: Randolph3030, I'm afraid you'll have to undergo a coronary bypass operation.
Randolph3030: Say it in English, Doc.
Dr. Carlton: You're going to need open heart surgery.
Randolph3030: Spare me your medical mumbo jumbo.
Dr. Carlton: We're going to cut you open, and tinker with your ticker.
Randolph3030: Could you dumb it down a shade?
pizzacutter
2/24
Heart go boom. That bad.
TangoTiger1
2/24
Can you tell me the average number of PA for each of these entries?

Overall Seasonal OBP
.474

Regressed Seasonal OBP
.480

End of season talent estimate
.380

Highest estimated OBP
.302

Lowest estimated OBP
.243

OBP in last 100 PA of season
.297

TangoTiger1
2/24
Well, the last one is a given. The others, please, thanks.
therealn0d
2/24
You want to know if it equals or exceeds 350? That's reasonable.
pizzacutter
2/24
For the seasonal numbers, on average you're talking about 480 PA. For anything involving a trend line (#3, #4, #5 on the list), it represents an average of 132 PA powering each number (some more, some less, obviously). For #6, as you point out, it's 100.
delatopia
2/24
There's one second-half uptick that may be worth a flyer because of a documented change: Chris Iannetta got contacts and went from OPS of .688 through July to posting an .819 in the final two months.
pizzacutter
2/24
And that's the sort of analysis that we're gonna need. Assuming that Iannetta's hot streak was due to getting contacts, the only thing he'll need to remember is to put his contacts in every morning. Of course, that quickly accelerates beyond our data.
Johnston
2/24
Life imitating art ("Major League").
gpurcell
2/24
Yeah, this is really just a variant of needing a theory-based reason for introducing a nominal variable into a time-series regression that handles pre- and post-intervention effects.
slackfarmer
2/24
Did you try calculating a year+1 OBP estimate based on overall seasonal OBP, but overweighting the PA of the end of season talent estimate?
pizzacutter
2/24
Sorta. I ran (and didn't report) a step-wise regression throwing both of those into the pot. There was a small effect for the end of season estimate, but it was very small R-squared wise.
sfrischbp
2/24
I would think the same reasoning applies to guys that get worn out over a season (due to age or just how they happen to deal with the grind). When they start the next season, they've had a chance to rest mentally or physically, so you wouldn't expect the end of year drop to predict performance in the first month. For an analysis that picks continuous windows, the one that goes all the way back to the beginning of last year is going to be the best.
TangoTiger1
2/24
Russell: something looks odd to me.

For the seasonal correlations, those look like r-squared results.

But for the short-season correlations (those based on 130 or 100 PA), those look like r results.

Can you confirm that (a) all the results in the table are of the same form, and (b) whether it is r or r-squared?
pizzacutter
2/24
Confirmed that they are all good ole fashioned Pearson correlations.
TangoTiger1
2/24
In that case, can you look at past years correlation of OBP year T to year T+1? Because an r of close to .50 with 480 PA is an extremely low number. It should be closer to .70.

LynchMob
2/24
Padres' writer thinks Seth Smith late season improvement might be due "touch-up Lasik surgery on his left eye" ...

http://mlb.mlb.com/news/article/sd/padres-left-handed-batter-seth-smith-benefits-from-eye-surgery?ymd=20140223&content_id=68230614&vkey=news_sd

"After hitting .241/.315/.367 through Aug. 17 last season, Smith returned six days later with better vision and better success at the plate. Smith hit .341/.431/.568 over his final 51 plate appearances of the regular season before hitting .313 in four playoff games."
bornyank1
2/24
When the Padres traded for Smith, I wrote:

"He might miss Melvin’s efforts to limit his exposure to lefties, but at least we know he can see—after undergoing LASIK surgery following a summer slump, he hit .341/.431/.568 in a small sample the rest of the way. If you’re a Padres fan searching for upside, you can cling to the hope that Smith that won’t make any more outs now that he’s no longer astigmatic."

Fingers crossed!
Otisbird
2/24
I always assumed that an uptick in performance late in the year was due to the expanded rosters and call ups.
Repperson29
2/25
I believe that was found to be not accurate...
lyuchi
2/25
Some analysis incorporating changes like Iannetta's seems like something that can be studied by dividing change types into a few big buckets. Things like getting eyewear would go into a different bucket than the more qualitative stuff, e.g. Player X changed his approach at the plate by getting more aggressive/patient. Could also maybe use a few proxies (pitches taken) to see if the stated changes in approach resulted in anything observable.

One other thought: could be interesting to observe the season+1 OBP correlations after omitting players who enjoyed unusually high BABIPs in the last 100 PA and/or the shorter MAs.
dethwurm
2/25
Agreed, I don't think it would be terribly hard (time-consuming, but not difficult) to find and bin all the, say, eyewear/eye surgery cases over the last couple of years and see if something like that really makes a notable difference.