
Were Ichiro Suzuki represented by Scott Boras, the super-agent might be able to make a more convincing case than usual for his client’s singular, once-in-a-generation talent. Actually, Ichiro-types don’t come along even as often as that, especially in PECOTA’s post-World War II player comparison pool; the baseball gods appear to have both made and broken the mold especially for him.

The Mariners’ NPB import is an outlier in more ways than one, which makes him both a fan favorite and a likely future Hall-of-Famer. Of course, every player could accurately be described as unique, whether because of some aspect of his play on the field, his background, or his choice of breakfast cereal. But Ichiro’s uniqueness is impossible to ignore.

As it happens, some of the very qualities that endear Ichiro to baseball fans render him persona non grata with the developers of forecasting systems, at least in their professional capacities. In addition to being a great quote, Suzuki has famously managed to collect at least 200 hits in 10 consecutive seasons, a feat that distinguishes him from every other player in history. The traits that have enabled him to amass those remarkable hit totals also mark him as the rarest of roses in and of themselves.

By virtue of his speed, tendency to hit the ball on the ground, and, perhaps, some innate ability to hit ’em where they ain’t, Suzuki has managed to sustain a .357 BABIP over more than 7,000 plate appearances in environments where the “average hitter” musters only a near-.300 figure. In not-unrelated news, Suzuki has led the American League in infield hit percentage for five straight seasons.

The statistical quirks that have made Ichiro such good news for Seattle have doubled as bad news for the accuracy of PECOTA’s projections. Since automated projection algorithms aren’t tailored to individuals, players to whom the normal rules don’t apply (or apply only loosely) present a challenge. Let’s take a look at how PECOTA’s past and present projections for the speedy right fielder stack up to reality. The following table displays Suzuki’s actual stats since suiting up on this side of the Pacific (omitting his rookie season):

Year   PA    AVG    OBP    SLG
2002   728   .321   .388   .425
2003   725   .312   .352   .436
2004   762   .372   .414   .455
2005   739   .303   .350   .436
2006   752   .322   .370   .416
2007   736   .351   .396   .431
2008   749   .310   .361   .386
2009   678   .352   .386   .465
2010   704   .314   .359   .395
AVG    730   .329   .375   .427

For comparative purposes, here’s how PECOTA projected Ichiro in each of our annual publications since the system hit the scene. PECOTA was little more than a gleam in Nate Silver’s eye in 2002, so we’ll look at 2003 on:

Year   AVG    OBP    SLG
2003   .306   .368   .419
2004   .309   .351   .423
2005   .311   .355   .415
2006   .308   .343   .406
2007   .310   .354   .398
2008   .304   .346   .384
2009   .292   .338   .359
2010   .322   .375   .426

 

Withholding comment until we’ve presented all the data, let’s take a look at the retroactive forecasts (sans aging adjustments, which we’ll cover later this week) for the same seasons, generated by the latest PECOTA methodology:

Year   AVG    OBP    SLG
2002   .315   .345   .424
2003   .315   .356   .420
2004   .312   .352   .421
2005   .327   .368   .428
2006   .320   .358   .423
2007   .319   .362   .422
2008   .325   .366   .422
2009   .321   .363   .407
2010   .319   .359   .412
AVG    .319   .359   .420

 
In order to put the different implementations of PECOTA on an even playing field, we can compare Ichiro’s MLB stats to those dueling forecasts from 2003-2010, weighting each season by Ichiro’s actual PA total:
 

Results      AVG    OBP    SLG
Actual       .330   .374   .428
New PECOTA   .320   .360   .420
Old PECOTA   .308   .354   .404
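For readers who want to check the weighting, here is a minimal Python sketch of the PA-weighted batting-average comparison, using only the rounded figures from the tables above. The function and variable names are ours, chosen for illustration. Because the inputs are rounded season lines rather than raw counts, the results can differ from the published row by a point.

```python
# Plate appearances and batting averages for 2003-2010, copied from the tables above.
PA = {2003: 725, 2004: 762, 2005: 739, 2006: 752,
      2007: 736, 2008: 749, 2009: 678, 2010: 704}
ACTUAL_AVG = {2003: .312, 2004: .372, 2005: .303, 2006: .322,
              2007: .351, 2008: .310, 2009: .352, 2010: .314}
NEW_PECOTA_AVG = {2003: .315, 2004: .312, 2005: .327, 2006: .320,
                  2007: .319, 2008: .325, 2009: .321, 2010: .319}
OLD_PECOTA_AVG = {2003: .306, 2004: .309, 2005: .311, 2006: .308,
                  2007: .310, 2008: .304, 2009: .292, 2010: .322}


def pa_weighted(rates, pa):
    """Average a per-season rate, weighting each season by its plate appearances."""
    total_pa = sum(pa[year] for year in rates)
    return sum(rates[year] * pa[year] for year in rates) / total_pa


for label, rates in [("Actual", ACTUAL_AVG),
                     ("New PECOTA", NEW_PECOTA_AVG),
                     ("Old PECOTA", OLD_PECOTA_AVG)]:
    print(f"{label:<12}{pa_weighted(rates, PA):.3f}")
```

The same function works for OBP and SLG if those columns are substituted for the batting averages.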

 

Actual Ichiro outperforms even the new-and-improved projected Ichiro, but not by much: only 10 points of batting average separate the two. A projection system can’t predict luck, and since some random fluctuation is inevitable, no method can pinpoint batting average infallibly. Ichiro’s true batting-average ability may have remained more or less stable even as his results jumped from as low as .303 to as high as .372, but “new” PECOTA wisely split the difference, never calling for a figure lower than .312 or higher than .327.

So what’s responsible for the improvements in Ichiro’s forecast? As Nate Silver acknowledged several years ago, PECOTA wasn’t doing a great job of grasping the legitimacy of Ichiro’s high batting averages. Not all high batting averages are created equal, but as Nate lamented about the system’s former failings, “PECOTA thinks that Ichiro is due for a major correction because it thinks he’s like Luis Polonia, and when a hitter like Luis Polonia hits .330 or something, it is almost certainly a fluke, a lucky year by a banjo hitter.”

Nate dubbed Ichiro “unique,” but he’s not the only batter whose high BABIPs manage to confound PECOTA on a regular basis. Matt Swartz has written about these “BABIP Superstars” on multiple occasions. Along with Ichiro, the group he identified includes luminaries like Derek Jeter and Joe Mauer, which hasn’t helped to obscure PECOTA’s deficiencies in the BABIP department.

The problem is in projecting batting average in the first place. There are any number of component skills that contribute to a player’s ability to hit for average – his ability to hit home runs, his ability to make a lot of contact, his ability to leg out a few additional singles. But PECOTA was lumping all of those skills into one catch-all metric, one that is typically subject to a great deal of noise.

So we’ve broken hitting down into a much larger set of component skills than PECOTA has in the past – utilizing play-by-play data from Retrosheet, we can break out things like infield singles and reaching on errors. We can then break this more detailed batting line down into an even more detailed set of components, and project them all independently before combining them into an overall batting line.

This lets us do a better job of projecting players with unique skill sets – by taking a closer look at the variety of skills that make up their batting line, we can do a much better job of identifying the underlying skill and not regressing it away as “luck.”
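To make the idea concrete, here is a toy Python sketch of projecting hit types separately rather than regressing one lumped batting average. It is not PECOTA’s code or method: the simple regress-toward-league formula, the three components, the league rates, the “stabilizer” values, and the season line are all hypothetical, chosen only to illustrate the principle.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    events: int         # observed count, e.g. infield singles
    at_bats: int        # opportunities
    league_rate: float  # league-average events per at-bat (hypothetical)
    stabilizer: float   # at-bats of league average blended in (hypothetical)


def regress(events, at_bats, league_rate, stabilizer):
    """Shrink an observed rate toward the league rate."""
    return (events + stabilizer * league_rate) / (at_bats + stabilizer)


def project_by_component(components):
    """Project each hit type on its own, then sum into a batting average."""
    return sum(
        regress(c.events, c.at_bats, c.league_rate, c.stabilizer)
        for c in components
    )


# Made-up season for a speedy, high-contact hitter: 220 hits in 680 at-bats.
AT_BATS = 680
HITTER = [
    Component("infield singles", 55, AT_BATS, 0.020, 200),
    Component("outfield singles", 120, AT_BATS, 0.150, 600),
    Component("extra-base hits", 45, AT_BATS, 0.085, 400),
]

observed = sum(c.events for c in HITTER) / AT_BATS
league = sum(c.league_rate for c in HITTER)
# Lumped approach: treat every hit alike and regress the total once.
lumped = regress(sum(c.events for c in HITTER), AT_BATS, league, 600)
by_component = project_by_component(HITTER)

print(f"observed      {observed:.3f}")
print(f"lumped        {lumped:.3f}")
print(f"by component  {by_component:.3f}")
```

With these invented inputs, the component version keeps more of the hitter’s high average because the infield-single rate is assumed to be a steadier skill and is regressed less heavily. Different assumptions would give a different gap.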

This also has implications for pitchers – we’re not stuck using official pitching stats to project pitchers anymore. We can get an exact count of (for instance) doubles and triples allowed, and to the extent that pitchers have a persistent skill in allowing extra base hits on balls in play, we can use that information to project their runs allowed.

(This also reduces the amount of code needed to run PECOTA, because we can share more code between the hitter and pitcher forecasts. That means less possibility for bugs and more shared improvements between the two sets of projections.)

But what about players whose skills aren’t unique but whose situations are? Tomorrow, we look at how we’re making PECOTA smarter about injuries.

kantsipr
9/29
I'm not sure whether the data would support this, but it would be interesting to look at the mean square angular distance between batted balls and the nearest fielder on a play-by-play basis. That would give you some measure of a player's ability to "hit the ball where they ain't."

Has anyone done something like that? Is there data to allow it?
mikefast
9/29
The HITf/x data that was made available for April 2009 implied to me that Ichiro did have some ability to hit 'em where they ain't. Other analysts were skeptical that that was anything more than a fluke.

I haven't seen anything in the currently available batted ball location data sets that suggests we have enough precision and accuracy in the data to tell one way or the other.
ObviouslyRob
9/29
I would think that for that kind of ability anything less than a few years' worth of data would be inconclusive.
mikefast
9/29
It shouldn't take a few years' worth. When we're relying on cameras rather than subjective observers to tell us where the ball went, we can be a lot more confident in the data. But I agree that I'd like to have a sample bigger than 40 batted balls before I can make a statement with confidence about Ichiro's placehitting skills.
kantsipr
9/30
Cool. It would pretty much have to be done year-to-year to determine whether it is an actual ability rather than a fluke.
mikefast
9/30
Here's a graph showing the HITf/x data for Ichiro that I was talking about.

http://fastballs.files.wordpress.com/2010/09/suzukic01_hitfx_spray_angle.png

There are two lines on the graph. The blue one is the horizontal angle at which Suzuki hit his batted balls in April 2009, grouped in bins five degrees wide. The red one is the BABIP for left-handed batters on batted balls with a vertical launch angle of less than 8 degrees (basically ground balls and borderline line drives). The idea is to find the positions of the infielders. Where the BABIP is lowest, that's the likely position of the infielder, and where the BABIP is the highest, that's the gap between the fielders.

You can see that Ichiro did a pretty impressive job of hitting the gaps. I included all of Suzuki's 42 batted balls, but the pattern doesn't change much if his 10 air balls are excluded.
mikefast
9/30
If I divide the 90 degrees between the foul lines into 45 degrees nearest the infielders and 45 degrees "in the gaps", Ichiro hit 28 balls in the gaps and 14 balls near the infielders.

Whether that is a repeatable skill or not, and if it is, to what extent, I don't know. However, it is over three standard deviations from the mean in the binomial distribution, under the assumption that the spray angle is simply random.
mikefast
9/30
Actually, I should probably exclude Ichiro's air balls from those numbers, in which case he hit 23 balls in the gaps and 9 balls near the infielders. That comes in at about 2.5 standard deviations from the mean. (The previous one, with air balls included, was actually closer to 2 std dev; I reported that wrong in the comment above.)

I looked at Ryan Howard, too, and excluding his air balls, he hit 13 balls in the gaps and 11 balls near the infielders, not accounting for any abnormal shifting by the infielders beyond what they normally do for LHB.
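
As a rough check on the figures in this thread, the standard-deviation counts can be reproduced with a few lines of Python. This sketch assumes, as the comments do, that each batted ball has a 50 percent chance of landing in a "gap" bin if spray angle is random. The function name is ours.

```python
import math


def binomial_z(successes, trials, p=0.5):
    """Standard deviations between an observed count and the binomial mean."""
    mean = trials * p
    sd = math.sqrt(trials * p * (1 - p))
    return (successes - mean) / sd


# (balls hit in the gaps, total batted balls), as quoted in the comments above
samples = {
    "Ichiro, all batted balls": (28, 42),
    "Ichiro, air balls excluded": (23, 32),
    "Howard, air balls excluded": (13, 24),
}

for label, (gaps, total) in samples.items():
    print(f"{label}: {binomial_z(gaps, total):.2f} SD")
```

That works out to roughly 2.2, 2.5, and 0.4 standard deviations, in line with the corrected numbers.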
brownsugar
9/29
Thanks for the explanation, the updates make perfect sense. Is play-by-play data for the minor leagues an issue? I'm wondering if projecting batting average for minor league players (and rookies) will essentially be done as before because PBP data isn't available.
cwyers
9/29
We have play-by-play data for the minor leagues from '05 through to now, so that's not an issue. Foreign leagues are a larger concern, but we can be clever with some of what we do and approximate the process using only official stats.
georgeforeman03
9/29
Is it some sort of clever data imputation? Essentially creating the play-by-play data by comparing the official stats to the official stats of players you do have play-by-play data for? If not, that might be one way to do it. :-)
flyingdutchman
9/29
By the way, where is today's PECOTA-adjusted Playoff Odds Report?
BananaHammock
10/01
Cool article, but fewer tables that had more columns would have greatly helped readability. Is readability a word? Percent differences would have been helpful too.