
I want to get into a topic that might be the question in baseball. The one that people really want to know. Who is going to break out next year?

I want to approach the question from a slightly different angle than usual. Normally, people who look for breakouts are looking to identify a magic list. These five guys are the ones to watch out for. Yeah, eventually, there need to be names attached (because otherwise, what’s the point?) but I’m a little wary of magic lists. The problem with “must watch” lists is that someone will often pick some factor which s/he believes will lead to change (the good ones will actually have some reasonable data to try to justify the factor) and then list players who fit that mold. This is where you get the 26+3 (under age 26, more than 3 years in the majors) lists. It’s not a bad heuristic, but it’s also not very informative about the bigger question. What promotes growth and development in players?

Magic lists have the unfortunate habit of invariably swinging and missing on a few guys (remember Scott Sizemore, Rookie of the Year candidate?) and completely missing out on the guy who actually appears from nowhere (Josh Donaldson). So, I’m going to start with a disclaimer. The lists are just to illustrate the point.

Growth and development are hard to predict for a simple reason. There are several ways to become a better player. Sometimes it’s as simple as making a single decision that snaps everything into place. Sometimes, it’s a matter of the growth of a couple of skills at the same time. I think that in the rush to make the list, people have forgotten to really study what’s going on at a deeper, more molecular level. So, that’s what I want to do. What are the ways—plural—in which we can model (numerically) growth and development in baseball players?

Warning! Gory Mathematical Details Ahead!

We’re going to look at breakouts from a very specific place. Baseball stats are most often quoted in their full season form (e.g., “He hit .280 last year.”) because the season is the most important unit of measurement in the game for two reasons. One is that they hand out only one “World” Series Championship each year (yeah, I know, Toronto), and the other is that player contracts pretty much exclusively run through the end of a season. General managers (whether of the fantasy or real variety) are generally looking back at last year’s stats, trying to divine who will shine this year.

I’m going to define a breakout as a change in some statistic of note from one year to the next. (For example, in a moment, I’m going to use a reduction in strikeout rate.) That change can be measured in two ways. One is a raw change (he struck out in 18.0 percent of his plate appearances last year and 16.2 percent this year, a difference of 1.8 percentage points); the other is a percent change over baseline (his strikeout rate fell by 10 percent relative to last year). There’s a third method, which I have discussed before, called the reliable change index, which adjusts for some of the inherent unreliability in baseball stats, especially at smaller sample sizes. For right now, I’m going to use the first two.
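A minimal Python sketch of those first two measures, using the 18.0 percent and 16.2 percent figures from the example. The function names are illustrative assumptions, not part of the original analysis.

```python
def raw_change(prev_rate, curr_rate):
    """Year-over-year change in percentage points (rates given in percent)."""
    return curr_rate - prev_rate


def percent_change(prev_rate, curr_rate):
    """Year-over-year change relative to last year's rate, in percent."""
    return (curr_rate - prev_rate) / prev_rate * 100.0


prev_k, curr_k = 18.0, 16.2  # strikeout rates from the example above
print(round(raw_change(prev_k, curr_k), 1))      # -1.8 (percentage points)
print(round(percent_change(prev_k, curr_k), 1))  # -10.0 (percent of baseline)
```

The same 1.8-point drop would be a much larger percent change for a hitter who started at 9 percent, which is why it is worth looking at both.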

To start looking for evidence that a player might change his stripes in the year to come, let’s start in an obvious place and see whether our player showed any indications that he was changing last year. I have previously suggested a method for looking at talent level variations within a season for an individual player. The basic idea is that we might use a moving average approach toward modeling a player’s performance.

For example, what is the better predictor of what Smith will do in a particular plate appearance? Is his overall seasonal average the best predictor, or perhaps his last 100 PA? If the answer is the last 100 PA, then we can safely say that there were probably some peaks and valleys in Smith’s true talent level over the course of the year. If his full season rate is the best predictor of his plate appearances, then we can assume that his talent level was fairly consistent over the course of the year. We run the same analyses for all of Jones’s plate appearances in a given year as well. In this way, we get a read on Smith and Jones as individuals. For these analyses, I looked to see whether a player’s full season average was the best predictor or whether a moving average of his last 50 PA, 60 PA, 70 PA, etc. up to 200 was the best predictor. (For the initiated, I used a stepwise binary logistic model and picked out the first variable to enter the equation.) I looked at all player-seasons from 2009-2013, minimum of 250 PA in a season, and determined the best predictor for each one.
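The comparison can be sketched in Python. This is a simplified stand-in, not the stepwise logistic model described above: it scores each candidate predictor by log loss and keeps the best one. The function names, the tie-breaking rule (ties go to the season average), and the toy data are all assumptions made for illustration.

```python
import math


def log_loss(probs, outcomes, eps=1e-6):
    """Mean negative log-likelihood of 0/1 outcomes under predicted probabilities."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip so log() stays finite
        total -= math.log(p) if y else math.log(1 - p)
    return total / len(outcomes)


def best_predictor(outcomes, windows=range(50, 201, 10)):
    """Return ('season', None) or ('moving', width), whichever has the lowest log loss.

    outcomes: 0/1 flags in season order, one per plate appearance (1 = strikeout).
    Scoring starts after the widest window so every candidate is judged on
    the same plate appearances. Needs more PA than the widest window.
    """
    start = max(windows)
    scored = outcomes[start:]
    season_rate = sum(outcomes) / len(outcomes)
    best = ("season", None)
    best_loss = log_loss([season_rate] * len(scored), scored)
    for w in windows:
        # Predict each PA from the strikeout rate over the w PA before it.
        preds = [sum(outcomes[i - w:i]) / w for i in range(start, len(outcomes))]
        loss = log_loss(preds, scored)
        if loss < best_loss:  # strict: ties stay with the season average
            best, best_loss = ("moving", w), loss
    return best


# Toy data (hypothetical): a hitter who strikes out 40% of the time for
# 300 PA and then 10% of the time for the next 300 PA.
shifting = [1 if i % 5 < 2 else 0 for i in range(300)]
shifting += [1 if i % 10 == 0 else 0 for i in range(300, 600)]
print(best_predictor(shifting))
```

A hitter whose rate never moves gives the moving averages nothing to improve on, so the season average wins and he lands with the solids.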

We’re going to call the group that shows peaks and valleys over the course of the year “changelings” and those who stay the same “solids.” For each changeling, once we know the best “width” for his tracking average, we can map out his projected true talent level over the course of a season. We can also take those points and shoot a regression line through them to see whether, overall, the trend line is pointing upward or downward (for the initiated, whether the regression coefficient is positive or negative). That leaves us with three groups: changelings who are trending upward, changelings who are trending downward, and solids. Now we know whether Smith’s strikeout rate was moving up, down, or staying level over the course of the previous year.
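Here is a rough Python sketch of that three-way sort, assuming the best window width for a player is already known (with `None` standing in for "the season average was the best predictor"). The helper names are hypothetical, and lumping an exactly flat line in with the downward group is a simplification of mine.

```python
def moving_average(outcomes, width):
    """Strikeout rate over each trailing window of `width` plate appearances."""
    return [sum(outcomes[i - width:i]) / width
            for i in range(width, len(outcomes) + 1)]


def ols_slope(ys):
    """Least-squares slope of ys regressed on their position 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    return sxy / sxx


def classify(outcomes, best_width):
    """Sort a player-season into one of the three groups.

    best_width is None for a solid; otherwise it is the width of the
    moving average that best predicted his plate appearances.
    """
    if best_width is None:
        return "solid"
    slope = ols_slope(moving_average(outcomes, best_width))
    # Assumption: an exactly zero slope is counted as downward for simplicity.
    return "trending upward" if slope > 0 else "trending downward"


# Toy data (hypothetical): strikeouts in 40% of the first 300 PA, 10% after.
falling = [1 if i % 5 < 2 else 0 for i in range(300)]
falling += [1 if i % 10 == 0 else 0 for i in range(300, 600)]
print(classify(falling, 50))    # trending downward
print(classify(falling, None))  # solid
```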

Now, what happens in the next year to each of those groups? Again, I’m defining a breakout based on full-season stats, because those are the ones that are easiest to look at. So, I looked to see how many in each group had a year-over-year increase of at least 1 percentage point in their strikeout rate. Then, 2 percentage points, then 3. I also looked to see how many showed an increase of at least 10 percent of their previous rate. Then I did the same for decreases.

The results:

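| Year-over-year change in strikeout rate | Trending Upward | Solids | Trending Downward |
|---|---|---|---|
| Increased strikeouts by 1 percentage point or more | 45.3% | 40.5% | 40.0% |
| Increased strikeouts by 2 percentage points or more | 31.8% | 27.8% | 29.8% |
| Increased strikeouts by 3 percentage points or more | 19.3% | 17.0% | 19.5% |
| Increased strikeouts by 10% or more over baseline | 36.5% | 31.4% | 32.7% |
| Decreased strikeouts by 1 percentage point or more | 33.2% | 31.8% | 39.5% |
| Decreased strikeouts by 2 percentage points or more | 20.2% | 20.9% | 26.8% |
| Decreased strikeouts by 3 percentage points or more | 11.7% | 12.9% | 18.5% |
| Decreased strikeouts by 10% or more over baseline | 20.6% | 24.6% | 31.2% |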

One thing that we clearly see is that there are a lot of players who randomly move around in their strikeout rate. Even a big move like 3 percentage points up happened for about 20 percent of the sample that had previously been trending downward. But we do see a pattern that we might expect. Changelings who were trending upward last year were somewhat more likely (for the initiated, chi-squares were not significant) to show an increase in their strikeout rates. There are only a couple of points of separation there. Some of that is probably due to the fact that if a player starts to strike out a lot more than he had been, he’s likely to get demoted or released for such an offense, and not reach the 250 PA inclusion limit.

On the decrease side, the effect is a little more pronounced (and statistically significant). Hitters who showed signs of a decreasing strikeout rate the year before tend to show a decrease the next year. There’s consistently about 6-7 percentage points’ worth of separation between the groups. Not bad.
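For the initiated, a chi-square test on a three-group table can be sketched in a few lines of Python. The article reports percentages but not group sizes, so the counts below are placeholders invented for illustration and say nothing about the real sample. The function name is also an assumption.

```python
import math


def chi_square_3x2(table):
    """Pearson chi-square test for a 3x2 table of counts.

    table: three rows (the groups), each [n_with_outcome, n_without_outcome].
    Returns (statistic, p_value). Three groups by two outcomes gives
    (3-1)*(2-1) = 2 degrees of freedom, and with 2 degrees of freedom the
    chi-square upper-tail probability has the closed form exp(-x/2).
    """
    if len(table) != 3 or any(len(row) != 2 for row in table):
        raise ValueError("expected a 3x2 table of counts")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.exp(-stat / 2)


# HYPOTHETICAL counts: [decreased 10%+ over baseline, did not], per group.
made_up_counts = [
    [40, 160],  # trending upward
    [50, 150],  # solids
    [65, 135],  # trending downward
]
stat, p = chi_square_3x2(made_up_counts)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

The same separation in percentages can be significant or not depending on how many players sit in each group, which is why the counts matter as much as the rates.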

We can sharpen our focus a little bit though. I mentioned that I determined whether the trend was up or down based on the regression coefficient of the line that passed through each player’s moving average graph. For the changelings who showed a downward trend, I ran a logistic regression using a binary outcome of whether, in the next year, they showed a 10 percent decrease over baseline. I used the slope of the initial tracking average line as a predictor. If in 2012, a hitter was tilting seriously downward, he’s probably a better bet to show a big drop in 2013. And that’s what I found. Players who had steeper lines were more likely to experience a big drop in their strikeout rate in the following year.
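A one-predictor logistic regression of this kind can be sketched in plain Python. This is a bare-bones fit by gradient ascent, not the software or settings used for the numbers in this article, and the slopes and outcomes below are made up purely to show the mechanics.

```python
import math


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1) = sigmoid(b0 + b1*z), where z is x standardized.

    Standardizing keeps the tiny per-PA slopes on a scale that gradient
    ascent handles well. Returns (b0, b1, mean, sd).
    """
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    zs = [(x - mean) / sd for x in xs]
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for z, y in zip(zs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * z)))
            g0 += y - p          # gradient of the log-likelihood w.r.t. b0
            g1 += (y - p) * z    # gradient w.r.t. b1
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1, mean, sd


def predict(model, x):
    """Predicted probability of a big drop for a trend-line slope x."""
    b0, b1, mean, sd = model
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * (x - mean) / sd)))


# HYPOTHETICAL data: trend-line slope last year, and whether the hitter cut
# his strikeout rate by 10%+ over baseline the next year (1 = yes).
slopes = [-0.0008, -0.0007, -0.0006, -0.0005, -0.0004, -0.0003, -0.0002, -0.0001]
dropped = [1, 1, 0, 1, 0, 1, 0, 0]

model = fit_logistic(slopes, dropped)
print(f"steep line:   {predict(model, -0.0008):.2f}")
print(f"shallow line: {predict(model, -0.0001):.2f}")
```

With data shaped like this, the fitted coefficient on the slope comes out negative, meaning a steeper downward line maps to a higher predicted chance of a big drop.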

But not all is rosy. Now, using 2013 stats, here are the players whom the model predicted would be in line for a big drop in their strikeout rates in 2014. (And remember, there are no magic lists…)

| Player | Model Prediction of Likelihood of Drop in K Rate | 2013 Strikeout Rate | 2014 Strikeout Rate | Absolute Change (percentage points) |
|---|---|---|---|---|
| Craig Gentry | 52.6% | 16.0% | 17.1% | +1.1 |
| Marcell Ozuna | 49.1% | 19.6% | 26.8% | +7.2 |
| Yonder Alonso | 48.8% | 12.5% | 12.5% | 0.0 |
| Daniel Nava | 48.1% | 17.4% | 19.9% | +2.5 |
| A.J. Ellis | 47.7% | 17.4% | 16.4% | -1.0 |

If a team had looked at this list and this list only, they would have seen one guy who didn’t move his strikeout rate at all, two guys who moved about a point (one up, one down), and then Nava and Ozuna, who struck out more. Score one for the aggregate, and none for the specific!

The Problem with Lists

There’s a reasonable rebuttal to these analyses. I’ve shown a method that can separate out a group with a 31 percent chance of breaking out (on strikeouts, anyway) vs. a 24 percent chance. Even when we take the top five candidates, we’re talking about a list of five guys who aren’t even 50 percent chances to break out, and the names the model would have told us to focus on ended up being duds. This is hardly the stuff of certainty, but there are no certainties in baseball. Here’s where I think we see the problem with lists. I think that people expect a perfectly discriminant function. This mythical equation will separate the sheep from the goats, the Sharks from the Jets, and the people who like Cincinnati chili from the normal people. This method isn’t it.

We’ve identified one factor that could indicate that someone is ready for some growth. If you make the article about the list, you will be disappointed. If you recognize that we’ve identified one factor (likely of many) that can predict growth, and take this as the starting point for finding others, then you’ll be happy. If you can identify a few more of these, then you can start to build a growth profile of a player. If he has three of these growth factors on his profile, maybe he’s worth taking a risk on. Note the word “risk.” You really should have a preponderance of evidence before taking such a risk, and that sort of understanding requires a deep study of the subject. And maybe (gasp!) talking to the scouts.

Think of this from the point of view of a team. Unlike a fantasy draft, real teams do not have the ability to simply target any player they like. If a team wants to go take a chance on a player, he either has to conveniently be a free agent or be pry-able in a trade. And it’s not like teams can trade for 20 players and hope that a few work out, the way people making lists on the internet can. You have to pick a guy or two based on who’s available and hope you get lucky. A breakout is a low-frequency event. There’s a certain amount of drawing the magic lottery ticket that goes with it. The best that you might hope for is to have a slightly better and more-informed list of guys to think about. But I don’t think that we need to throw up our hands and say it can’t be done. It’s just going to take some work.

brentdaily
10/29
No need to put world (as in series) in quotes. The origin comes from the original sponsor, the New York World newspaper, not the idea of crowning a "world" champion.
Muboshgu
10/29
The "New York World" story is an urban legend.

http://www.snopes.com/business/names/worldseries.asp
bmmcmahon
10/29
You still don't need to put quotes around "World" simply on the basis that there is no serious competitor to the champion of the American major leagues for the title of best baseball team in the world. When the best baseball players around the world start flocking to Japan to play ball, then you can question the legitimacy of the title.
brentdaily
10/30
Well played, I stand corrected. Thanks.
TeamPineTar
10/29
Good work as usual, Russell. You've made many good points. One of those seems almost trite, that "today is the first day of the rest of our lives" kind of thing. However, unless we assume that there are still unconquered territories throughout the metric continent, it would be a fool's errand for any of us to be here. Looking back at 2014, I sure wish there had been some sort of indication in one of the many preseason or early season sources that I devour that JD Martinez would break out. In my primary (dynasty) league, Martinez was a serendipitous pick-up in JUNE (!) by a team that had one glaring OF weakness and had churned 11 or 12 guys through the position before then. Only later, many writers on BP and other sites talked about how Martinez had changed his entire approach at the plate. He probably falls almost exclusively into the realm that the only indicator of a breakout would have been talking to scouts.

From a subscriber standpoint, one hardly knows how much work goes into the various lists. I recently went back to check several BP writers' 2014 preseason rankings of a few positions. They generally left a lot to be desired, to be courteous. Saul may just be embarking on the road to Damascus, having no idea what light may descend ahead.
pizzacutter
10/29
Saul was blinded on the Road to Damascus... I hope I have better luck.
jessemumm
10/30
500 PA moving average, trend line, @ 100 PA intervals. pick 5-10 (or more!) key metrics, overlay them, normalize.

#zerohedge