
<< Previous Article
Premium Article Under The Knife: Fade ... (09/29)
<< Previous Column
Reintroducing PECOTA: ... (09/28)
Next Column >>
Reintroducing PECOTA: ... (09/30)
Next Article >>
Premium Article Seidnotes: A Triple Sh... (09/30)

September 29, 2010

Reintroducing PECOTA

The Hits Just Keep On Coming

by Ben Lindbergh and Colin Wyers

Were Ichiro Suzuki represented by Scott Boras, the super-agent might be able to make a more convincing case than usual for his client’s singular, once-in-a-generation talent. Actually, Ichiro-types don’t come along even as often as that, especially in PECOTA’s post-World War II player comparison pool; the baseball gods appear to have both made and broken the mold especially for him.

The Mariners’ NPB import is an outlier in more ways than one, which makes him both a fan favorite and a likely future Hall-of-Famer. Of course, every player could accurately be described as unique, whether because of some aspect of his play on the field, his background, or his choice of breakfast cereal. But Ichiro’s uniqueness is impossible to ignore.

As it happens, some of the very qualities that endear Ichiro to baseball fans render him persona non grata with the developers of forecasting systems, at least in their professional capacities. In addition to being a great quote, Suzuki has famously collected at least 200 hits in each of 10 consecutive seasons, a feat that distinguishes him from every other player in history. And the traits that have enabled him to amass those remarkable hit totals are, in and of themselves, the rarest of roses.

By virtue of his speed, his tendency to hit the ball on the ground, and, perhaps, some innate ability to hit ’em where they ain’t, Suzuki has sustained a .357 BABIP over more than 7,000 plate appearances in environments where the average hitter’s BABIP hovers near .300. In not-unrelated news, Suzuki has led the American League in infield hit percentage for five straight seasons.
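Here is the arithmetic behind those two rates. BABIP is conventionally computed as non-home-run hits divided by balls put in play, (H - HR) / (AB - K - HR + SF). Infield hit percentage is commonly defined (FanGraphs uses this definition) as infield hits per ground ball. The season line below is hypothetical and chosen only for illustration; it is not Ichiro’s actual record.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play: non-HR hits per ball put in play."""
    return (h - hr) / (ab - k - hr + sf)


def infield_hit_pct(ifh: int, gb: int) -> float:
    """Infield hits per ground ball."""
    return ifh / gb


# Hypothetical season line for a speedy contact hitter (not Ichiro's real stats)
season = {"h": 220, "hr": 8, "ab": 680, "k": 70, "sf": 3}
print(f"BABIP: {babip(**season):.3f}")                   # 212 / 605
print(f"IFH%:  {infield_hit_pct(ifh=50, gb=380):.1%}")  # 50 / 380
```

With these made-up inputs the sketch prints a BABIP of 0.350 and an IFH% of 13.2%.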

The statistical quirks that have made Ichiro such good news for Seattle have doubled as bad news for the accuracy of PECOTA’s projections. Because automated projection algorithms aren’t tailored to individuals, players to whom the normal rules don’t apply (or apply only loosely) present a challenge. Let’s look at how PECOTA’s past and present projections for the speedy right fielder stack up against reality. The following table displays Suzuki’s actual MLB stats from 2002 through 2010, omitting his 2001 rookie season:

Year   PA    AVG    OBP    SLG
2002   728   .321   .388   .425
2003   725   .312   .352   .436
2004   762   .372   .414   .455
2005   739   .303   .350   .436
2006   752   .322   .370   .416
2007   736   .351   .396   .431
2008   749   .310   .361   .386
2009   678   .352   .386   .465
2010   704   .314   .359   .395
Mean   730   .329   .375   .427

For comparison, here’s how PECOTA projected Ichiro in each of our annual publications since the system hit the scene. PECOTA was little more than a twinkle in Nate Silver’s eye in 2002, so we’ll look at 2003 onward:

Year   AVG    OBP    SLG
2003   .306   .368   .419
2004   .309   .351   .423
2005   .311   .355   .415
2006   .308   .343   .406
2007   .310   .354   .398
2008   .304   .346   .384
2009   .292   .338   .359
2010   .322   .375   .426

Withholding comment until we’ve presented all the data, let’s take a look at the retroactive forecasts (sans aging adjustments, which we’ll cover later this week) for the same seasons, generated by the latest PECOTA methodology:

Year   AVG    OBP    SLG
2002   .315   .345   .424
2003   .315   .356   .420
2004   .312   .352   .421
2005   .327   .368   .428
2006   .320   .358   .423
2007   .319   .362   .422
2008   .325   .366   .422
2009   .321   .363   .407
2010   .319   .359   .412
Mean   .319   .359   .420
To put the different implementations of PECOTA on an even playing field, we can compare Ichiro’s MLB stats with both sets of dueling forecasts for 2003-2010, weighting each season by the number of plate appearances Ichiro actually recorded that year:

2003-2010    AVG    OBP    SLG
Actual       .330   .374   .428
New PECOTA   .320   .360   .420
Old PECOTA   .308   .354   .404
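You can reproduce the weighting step from the batting averages in the tables. This minimal sketch weights each season’s figure by Ichiro’s plate appearances that year, which is how the comparison above describes it. Batting average is strictly a per-at-bat rate, so weighting by PA is an approximation. The function name `pa_weighted` is ours and is used only for illustration.

```python
# Ichiro's PA and batting averages for 2003-2010, taken from the tables above
pa = [725, 762, 739, 752, 736, 749, 678, 704]
actual = [.312, .372, .303, .322, .351, .310, .352, .314]
new_pecota = [.315, .312, .327, .320, .319, .325, .321, .319]
old_pecota = [.306, .309, .311, .308, .310, .304, .292, .322]


def pa_weighted(values, weights):
    """Average of per-season rates, each weighted by that season's PA."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


for label, series in [("Actual", actual), ("New PECOTA", new_pecota), ("Old PECOTA", old_pecota)]:
    print(f"{label:<11} {pa_weighted(series, pa):.3f}")
```

Run as-is, it prints 0.330, 0.320, and 0.308, which match the AVG column of the comparison.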

 

Actual Ichiro outperforms even the new-and-improved projected Ichiro, but not by much: only 10 points of batting average separate the two. A projection system can’t predict luck, and since some random fluctuation is inevitable, no method can pinpoint batting average infallibly. Ichiro’s true batting-average ability may have remained more or less stable even as his results swung from as low as .303 to as high as .372, and “new” PECOTA wisely split the difference, never calling for a figure lower than .312 or higher than .327.
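One way to gauge how much “random fluctuation” is inevitable is to treat each at-bat as an independent trial with a fixed hit probability p. This simplification ignores park, opposition, and every other kind of context. Under that assumption, the standard deviation of a season’s batting average is sqrt(p(1 - p)/AB). The 680 at-bats below are a hypothetical round number, because the tables list plate appearances rather than at-bats.

```python
import math


def avg_sd(true_avg: float, at_bats: int) -> float:
    """SD of observed batting average if every AB is an independent Bernoulli trial."""
    return math.sqrt(true_avg * (1 - true_avg) / at_bats)


true_avg = 0.320  # roughly new PECOTA's projection for Ichiro
at_bats = 680     # hypothetical per-season AB count (assumption, not from the tables)
sd = avg_sd(true_avg, at_bats)
print(f"one SD = {sd:.3f}")
print(f"+/- 2 SD band: {true_avg - 2 * sd:.3f} to {true_avg + 2 * sd:.3f}")
```

For a true .320 hitter with 680 at-bats, one standard deviation comes to roughly 18 points of batting average. Chance alone can therefore produce season-to-season swings of several dozen points.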

So what’s responsible for the improvement in Ichiro’s forecast? As Nate Silver acknowledged several years ago, PECOTA wasn’t doing a great job of recognizing that Ichiro’s high batting averages were legitimate. Not all high batting averages are created equal, but as Nate lamented about the system’s former failings, “PECOTA thinks that Ichiro is due for a major correction because it thinks he’s like Luis Polonia, and when a hero like Luis Polonia hits .330 or something, it is almost certainly a fluke, a lucky year by a banjo hitter.”

Nate dubbed Ichiro “unique,” but he’s not the only batter whose high BABIPs regularly confound PECOTA. Matt Swartz has written about these “BABIP Superstars” on multiple occasions. Along with Ichiro, the group he identified includes luminaries like Derek Jeter and Joe Mauer, whose high profiles have made PECOTA’s deficiencies in the BABIP department all the harder to overlook.

The problem starts with how batting average was projected in the first place. Any number of component skills contribute to a player’s ability to hit for average: his ability to hit home runs, his ability to make a lot of contact, his ability to leg out a few additional singles. But PECOTA was lumping all of those skills into one catch-all metric, and that metric is typically subject to a great deal of noise.

So we’ve broken hitting down into a much larger set of component skills than PECOTA has used in the past. Using play-by-play data from Retrosheet, we can break out things like infield singles and reaching on errors. We can then break this more detailed batting line down into an even finer set of components, project each one independently, and combine the results into an overall batting line.
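As a rough illustration of the first step, the sketch below tallies a list of play-by-play results into a line finer than the official one. It splits singles by where they were fielded and keeps reached-on-error as its own category, since the official line counts it as an out. The `PlateAppearance` record and its result codes are a hypothetical simplified schema, not Retrosheet’s actual event format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    # Hypothetical result codes: "1B", "2B", "3B", "HR", "ROE", "BB", "K", "OUT"
    result: str
    infield: bool = False  # True if the ball was fielded by an infielder


def detailed_line(pas: list[PlateAppearance]) -> Counter:
    """Tally PAs into components finer than the official batting line."""
    line = Counter()
    for pa in pas:
        if pa.result == "1B":
            # Split singles by where they were fielded
            line["IF_1B" if pa.infield else "OF_1B"] += 1
        else:
            # Every other code, including ROE, keeps its own bucket
            line[pa.result] += 1
    line["PA"] = len(pas)
    return line


sample = [
    PlateAppearance("1B", infield=True),
    PlateAppearance("1B"),
    PlateAppearance("ROE"),
    PlateAppearance("OUT"),
    PlateAppearance("K"),
]
print(detailed_line(sample))
```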

This lets us do a better job of projecting players with unique skill sets: by taking a closer look at the variety of skills that make up a batting line, we can identify the underlying skill rather than regressing it away as “luck.”
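Here is a sketch of why splitting components can help, using loudly hypothetical numbers. Each rate is shrunk toward a league rate by adding k league-average opportunities, which is a common regression-to-the-mean technique. Suppose infield hits per ball in play stabilize faster (smaller k) than other hits. Then projecting the two separately preserves more of a slap hitter’s observed BABIP than regressing the lumped figure does. The league rates and k values below are invented for illustration and are not PECOTA’s.

```python
def regress(events: float, opps: float, league_rate: float, k: float) -> float:
    """Shrink an observed rate toward league_rate by adding k league-average opportunities."""
    return (events + k * league_rate) / (opps + k)


# Hypothetical multi-season sample for a speedy slap hitter
bip = 1800        # balls in play (excluding home runs)
ifh = 150         # infield hits
other_hits = 490  # all other non-HR hits on balls in play

# Hypothetical league rates (per ball in play) and regression constants; NOT PECOTA's values
LG_IFH, LG_OTHER = 0.035, 0.265
K_IFH, K_OTHER, K_BABIP = 300, 1200, 1200

# One catch-all rate, regressed as a single number
lumped = regress(ifh + other_hits, bip, LG_IFH + LG_OTHER, K_BABIP)
# Each component regressed on its own, then recombined
split = regress(ifh, bip, LG_IFH, K_IFH) + regress(other_hits, bip, LG_OTHER, K_OTHER)

print(f"observed BABIP        {(ifh + other_hits) / bip:.3f}")
print(f"lumped projection     {lumped:.3f}")
print(f"component projection  {split:.3f}")
```

With these inputs, the lumped approach projects about .333 and the component approach about .346, against an observed .356.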

This also has implications for pitchers: we’re no longer stuck using official pitching stats to project them. We can get an exact count of (for instance) doubles and triples allowed, and to the extent that pitchers have a persistent skill in allowing extra-base hits on balls in play, we can use that information to project their runs allowed.

(This also reduces the amount of code needed to run PECOTA, because more of it can be shared between the hitter and pitcher forecasts. That means fewer opportunities for bugs and more shared improvements between the two sets of projections.)
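To illustrate the code-sharing point, a single projection function can serve both a batter’s line and a pitcher’s line once both are expressed as component counts. This sketch projects the share of non-home-run hits on balls in play that go for extra bases. The `ComponentLine` schema, the league share, and the regression constant are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ComponentLine:
    """Event counts for either a batter or a pitcher (hypothetical schema)."""
    bip: int       # balls in play
    bip_hits: int  # non-HR hits on balls in play
    xbh: int       # doubles + triples on balls in play


def regressed_rate(events: float, opps: float, league_rate: float, k: float) -> float:
    """Shrink an observed rate toward the league rate by adding k league-average opportunities."""
    return (events + k * league_rate) / (opps + k)


def project_xbh_share(line: ComponentLine, league_share: float, k: float) -> float:
    """Projected extra-base share of BIP hits; identical code for hitters and pitchers."""
    return regressed_rate(line.xbh, line.bip_hits, league_share, k)


# Hypothetical lines and constants, for illustration only
hitter = ComponentLine(bip=1800, bip_hits=640, xbh=110)
pitcher = ComponentLine(bip=1500, bip_hits=440, xbh=120)
for name, line in [("hitter", hitter), ("pitcher", pitcher)]:
    print(name, round(project_xbh_share(line, league_share=0.22, k=600), 3))
```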

But what about players whose skills aren’t unique but whose situations are? Tomorrow, we look at how we’re making PECOTA smarter about injuries.

Ben Lindbergh is an author of Baseball Prospectus.
Colin Wyers is an author of Baseball Prospectus.

Related Content:  Ichiro Suzuki,  PECOTA,  Ichiro

13 comments have been left for this article.

kantsipr

I'm not sure whether the data would support this, but it would be interesting to look at the mean square angular distance between batted balls and the nearest fielder on a play-by-play basis. That would give you some measure of a player's ability to "hit the ball where they ain't."

Has anyone done something like that? Is there data to allow it?

Sep 29, 2010 06:15 AM
rating: 0
 
Mike Fast

The HITf/x data that was made available for April 2009 implied to me that Ichiro did have some ability to hit 'em where they ain't. Other analysts were skeptical that that was anything more than a fluke.

I haven't seen anything in the currently available batted ball location data sets that suggests we have enough precision and accuracy in the data to tell one way or the other.

Sep 29, 2010 07:11 AM
rating: 0
 
ObviouslyRob

I would think that for that kind of ability anything less than a few years' worth of data would be inconclusive.

Sep 29, 2010 14:05 PM
rating: 0
 
Mike Fast

It shouldn't take a few years' worth. When we're relying on cameras rather than subjective observers to tell us where the ball went, we can be a lot more confident in the data. But I agree that I'd like to have a sample bigger than 40 batted balls before I can make a statement with confidence about Ichiro's place-hitting skills.

Sep 29, 2010 14:38 PM
rating: 0
 
kantsipr

Cool. It would pretty much have to be done year-to-year to determine whether it is an actual ability rather than a fluke.

Sep 30, 2010 07:14 AM
rating: 0
 
Mike Fast

Here's a graph showing the HITf/x data for Ichiro that I was talking about.

http://fastballs.files.wordpress.com/2010/09/suzukic01_hitfx_spray_angle.png

There are two lines on the graph. The blue one is the horizontal angle at which Suzuki hit his batted balls in April 2009, grouped in bins five degrees wide. The red one is the BABIP for left-handed batters on batted balls with a vertical launch angle of less than 8 degrees (basically ground balls and borderline line drives). The idea is to find the positions of the infielders. Where the BABIP is lowest, that's the likely position of the infielder, and where the BABIP is the highest, that's the gap between the fielders.

You can see that Ichiro did a pretty impressive job of hitting the gaps. I included all of Suzuki's 42 batted balls, but the pattern doesn't change much if his 10 air balls are excluded.

Sep 30, 2010 11:39 AM
rating: 0
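The method this comment describes can be sketched as follows. Low-launch-angle batted balls are grouped into 5-degree spray-angle bins and BABIP is computed per bin. The lowest-BABIP bins are read as likely infielder positions and the highest as gaps. The sample data and the angle convention (0 = straightaway center) are hypothetical, and a real analysis would need far more balls per bin.

```python
from collections import defaultdict

BIN_WIDTH = 5  # degrees, as in the comment above


def bin_start(angle: float) -> int:
    """Lower edge of the 5-degree bin containing a spray angle."""
    return int(angle // BIN_WIDTH) * BIN_WIDTH


def babip_by_bin(balls):
    """balls: iterable of (spray_angle_deg, was_hit) for low-launch-angle batted balls."""
    hits, total = defaultdict(int), defaultdict(int)
    for angle, was_hit in balls:
        b = bin_start(angle)
        total[b] += 1
        hits[b] += int(was_hit)
    return {b: hits[b] / total[b] for b in sorted(total)}


# Hypothetical league sample; angle convention (0 = straightaway center) is assumed
sample = [(-40.0, False), (-38.5, False), (-31.0, True), (-29.0, True),
          (-12.0, False), (-11.0, False), (-3.0, True), (2.0, True),
          (13.0, False), (14.5, False)]
rates = babip_by_bin(sample)
lowest, highest = min(rates.values()), max(rates.values())
fielder_bins = [b for b, r in rates.items() if r == lowest]
gap_bins = [b for b, r in rates.items() if r == highest]
print("likely fielder positions:", fielder_bins)
print("likely gaps:", gap_bins)
```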
 
Mike Fast

If I divide the 90 degrees between the foul lines into 45 degrees nearest the infielders and 45 degrees "in the gaps", Ichiro hit 28 balls in the gaps and 14 balls near the infielders.

Whether that is a repeatable skill or not, and if it is, to what extent, I don't know. However, it is over three standard deviations from the mean in the binomial distribution, under the assumption that the spray angle is simply random.

Sep 30, 2010 11:53 AM
rating: 0
 
Mike Fast

Actually, I should probably exclude Ichiro's air balls from those numbers, in which case he hit 23 balls in the gaps and 9 balls near the infielders. That comes in at about 2.5 standard deviations from the mean. (The previous one, with air balls included, was actually closer to 2 std dev; I reported that wrong in the comment above.)

I looked at Ryan Howard, too, and excluding his air balls, he hit 13 balls in the gaps and 11 balls near the infielders, not accounting for any abnormal shifting by the infielders beyond what they normally do for LHB.

Sep 30, 2010 12:55 PM
rating: 0
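The standard-deviation figures in these comments follow from a binomial model. If 45 of the 90 fair degrees count as “in the gaps” and spray angle were random, each batted ball would land in a gap with probability 0.5. Here is a quick check of the corrected numbers, with no continuity correction applied:

```python
import math


def binomial_z(successes: int, trials: int, p: float = 0.5) -> float:
    """How many SDs a count sits from its expected value under Binomial(trials, p)."""
    mean = trials * p
    sd = math.sqrt(trials * p * (1 - p))
    return (successes - mean) / sd


# Counts from the comments above: balls "in the gaps" out of total batted balls
print(round(binomial_z(28, 42), 2))  # Ichiro, all batted balls -> 2.16
print(round(binomial_z(23, 32), 2))  # Ichiro, air balls excluded -> 2.47
print(round(binomial_z(13, 24), 2))  # Ryan Howard, air balls excluded -> 0.41
```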
 
Randy Brown
(189)

Thanks for the explanation, the updates make perfect sense. Is play-by-play data for the minor leagues an issue? I'm wondering if projecting batting average for minor league players (and rookies) will essentially be done as before because PBP data isn't available.

Sep 29, 2010 07:11 AM
rating: 0
 
BP staff member Colin Wyers
BP staff

We have play-by-play data for the minor leagues from '05 through to now, so that's not an issue. Foreign leagues are a larger concern, but we can be clever with some of what we do and approximate the process using only official stats.

Sep 29, 2010 07:23 AM
 
Matthew Avery

Is it some sort of clever data imputation? Essentially creating the play-by-play data by comparing the official stats to the official stats of players you do have play-by-play data for? If not, that might be one way to do it. :-)

Sep 29, 2010 10:06 AM
rating: 1
 
flyingdutchman

By the way, where is today's PECOTA-adjusted Playoff Odds Report?

Sep 29, 2010 09:48 AM
rating: -1
 
npullano

Cool article, but fewer tables that had more columns would have greatly helped readability. Is readability a word? Percent differences would have been helpful too.

Sep 30, 2010 18:57 PM
rating: 0
 