May 7, 2003
Lies, Damned Lies
Binomial Distribution (or What the Heck is Up with Miguel Tejada and Alex Gonzalez?)
The baseball season has reached its adolescence. Oh sure, there are still the occasional temper tantrums, the delusions of grandeur, the fashion faux pas. But the season has been around for long enough that we can't totally dismiss it, even when it mouths off without reason or, convinced of its own invincibility, it pushes its limits a bit too far.
The PECOTA system wasn't originally designed to update its forecasts in real time, but through some creative mathematics we can adapt it to that purpose. In particular, we can evaluate its projections by means of something called a binomial distribution (geek alert: if you're uninterested in the math here, the proper sequence of keystrokes is Alt+E+F+"Blalock"). The binomial distribution gives the probability that an event will occur a particular number of times in a given number of trials when we know the underlying probability of the event. For example, the probability of a "true" .300 hitter getting six or more hits in a sequence of 15 at bats is around 27.8 percent. (The binomial distribution's cousin, the Poisson distribution, has a cooler name but is less mathematically robust.)
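For anyone who stuck around for the math, the 27.8 percent figure can be checked in a few lines of Python. This is only a minimal sketch: the helper name `binom_tail` is an illustrative assumption and has nothing to do with how PECOTA itself is implemented.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) when X counts successes in n independent trials of probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# A "true" .300 hitter getting six or more hits in 15 at bats.
chance = binom_tail(6, 15, 0.300)
print(f"{chance:.1%}")
```

The sum simply adds up the chance of exactly 6 hits, exactly 7 hits, and so on through 15.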
A couple of important objections are going to be raised here. First, the binomial distribution is designed to test outcomes in cases in which there are mutually exclusive definitions of success and failure--for example, "hit" and "out," or "Emmy Nomination" and "WB Network." The measures of offensive performance that we tend to favor don't readily meet that criterion. Second, the binomial distribution assumes that we know the intrinsic probability of an event occurring, as we would with a dice roll or coin flip. But we never really know what a baseball player's underlying ability is--we're left to make a best guess based on his results, presumably coming closer to the mark as the sample size increases.
The first problem has an intriguing, if mathematically sketchy, solution in the form of Equivalent Average, which is scaled to take on roughly the same distribution as batting average, even though it accounts for all major components of offensive performance. So, we could test the probability of a "true" .300 EqA hitter putting up an EqA of .400 in 15 plate appearances by assuming that this is equivalent to six successes (40%) in 15 trials. Since I haven't heard any objections, let's roll with it.
Mr. Silver, I have a Mr. Davenport on Line 1.
Tell him I'm out of the office. Big snowstorm here in Chicago.
Nice try, sir.
The second problem--not knowing a hitter's true ability--has a potential solution in the structure of the PECOTA projections themselves. Take a look at this sequence of results:
Even if we can't know a hitter's "true" ability, we can make some pretty good inferences about it based on the outcomes that we have been able to observe. In the example above, it's more than seven times more likely that a .350 hitter will hit .400 or better in a stretch of 15 at bats than will a .200 hitter.
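As a rough illustration of that comparison, the sketch below tabulates the chance of hitting .400 or better (six or more hits) in 15 at bats for a handful of "true" averages. The particular grid of averages and the function name `prob_at_least` are assumptions chosen for the example.

```python
from math import comb

def prob_at_least(hits, at_bats, true_avg):
    """P(X >= hits) for X ~ Binomial(at_bats, true_avg)."""
    return sum(
        comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(hits, at_bats + 1)
    )

# Hypothetical grid of "true" averages; .400 in 15 at bats means 6 or more hits.
true_averages = [0.200, 0.250, 0.300, 0.350]
chances = {avg: prob_at_least(6, 15, avg) for avg in true_averages}

for avg, chance in chances.items():
    print(f"true {avg:.3f} hitter: {chance:.1%} chance of .400 or better")

# How much more likely the hot streak is for a .350 hitter than a .200 hitter.
ratio = chances[0.350] / chances[0.200]
print(f".350 hitter vs .200 hitter: {ratio:.1f}x as likely")
```

The ratio on the last line comes out a bit over seven, which is the inference in the text: the same hot stretch is far better evidence for a good hitter than for a poor one.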
Since PECOTA estimates the probability that a hitter's true ability is at a given level of performance, we are able to adapt it to the task. For example, here's a modified version of Alfonso Soriano's EqA chart. The blue line is the probability distribution on EqA that PECOTA generated for Soriano prior to the start of the season. The red line, derived from the binomial distribution, is the relative probability that Soriano would be able to perform at the level that he has thus far this season (a .351 EqA in 156 plate appearances) if his 'true' EqA were accurately represented by each point along the blue line.
OK, so that explanation might be a little confusing, but the graph makes my point clearly enough: Even though it's early in the season, what we've seen so far tells us that it's dramatically more likely that Soriano's true level of ability lies somewhere toward the more favorable outcomes along the left side of the graph. Odds are that he's a better hitter than we thought he was.
By evaluating the graph above--essentially, 'multiplying' the blue line by the red line--we're able to come up with a revised estimate of a player's true EqA. In the remainder of this article, I'll apply this technique to 12 players--six overachievers and six disappointments--offering pithy explanations for their results or lack thereof along the way. (Those of you who are pining for a review of The Talented Mr. Baldelli will need to wait a couple of days: His performance has been so unusual that we're going to devote an entire column to him).
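For the geeks, here is one way the 'multiplying' step might be sketched in Python. Everything about the prior is an assumption: PECOTA's actual distribution for Soriano isn't given here, so the code substitutes a hypothetical bell curve centered on his .281 preseason projection with a made-up spread of .020. The likelihood treats his EqA times his plate appearances as a (fractional) count of successes, per the EqA shortcut described earlier.

```python
import math

def revise_true_eqa(prior_mean, prior_sd, observed_eqa, pa,
                    lo=0.150, hi=0.450, steps=301):
    """Grid estimate of 'true' EqA: prior (blue line) times likelihood (red line)."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    successes = observed_eqa * pa  # fractional "hits" implied by the EqA
    log_post = []
    for p in grid:
        # ASSUMPTION: normal-shaped prior standing in for PECOTA's distribution.
        log_prior = -0.5 * ((p - prior_mean) / prior_sd) ** 2
        # Binomial likelihood up to a constant that is the same for every p.
        log_like = successes * math.log(p) + (pa - successes) * math.log(1 - p)
        log_post.append(log_prior + log_like)
    peak = max(log_post)  # subtract the peak before exponentiating, for stability
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    # Weighted average of the grid points is the revised estimate.
    return sum(p * w for p, w in zip(grid, weights)) / total

# Soriano: .281 preseason projection, .351 EqA in 156 PA; the .020 spread is invented.
soriano_true = revise_true_eqa(0.281, 0.020, 0.351, 156)
print(f"{soriano_true:.3f}")
```

The result lands between the preseason projection and the current EqA, closer to the projection because 156 plate appearances is still a small sample. How close depends heavily on the assumed spread, so the number printed here shouldn't be read as PECOTA's own figure.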
Just a couple of quick notes on terminology--for each player, I've listed something called a "true" EqA, which is our modified estimate of his underlying level of ability based on the Soriano technique described above. I've also listed a number called "Revised 2003 Projection," which predicts his end-of-year numbers assuming that he'll hit at his "true" EqA for the balance of the season after accounting for his performance to date.
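The "Revised 2003 Projection" arithmetic can be approximated as a plate-appearance-weighted blend of what a player has done and what his "true" EqA says he'll do the rest of the way. That is an assumption about the calculation, not a documented formula: EqA isn't strictly a per-PA rate, and the season total of 815 PA below is just Soriano's current pace rather than a forecast.

```python
def revised_projection(current_eqa, pa_to_date, true_eqa, season_pa):
    """Blend performance to date with 'true' EqA over the remaining plate appearances."""
    remaining_pa = season_pa - pa_to_date
    return (current_eqa * pa_to_date + true_eqa * remaining_pa) / season_pa

# Soriano: .351 in 156 PA so far, .294 "true" EqA, assumed 815 PA for the full season.
soriano_2003 = revised_projection(0.351, 156, 0.294, 815)
print(f"{soriano_2003:.3f}")  # 0.305
```

Under those assumptions the blend works out to about .305, in line with the figure listed for Soriano.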
Hank Blalock
Current EqA: .327 (106 PA) Preseason Projection: .266 Breakout Rate: 22.1% "True" EqA: .279 (+13) Revised 2003 Projection: .288
Patience is a virtue, and for once, I'm not talking about plate discipline. The skills that Blalock displayed at Charlotte and Tulsa in 2001, which merited his placement atop our prospect list prior to the start of last season, are manifesting themselves in terms of a great line drive stroke, a high batting average, and developing power at the major league level. Blalock's two best PECOTA comparables are Eric Chavez and Keith Hernandez, both of whom exhibited significant development at age 22. While some slippage is likely--Blalock's .326 EqA is well ahead of even his 2001 level--there is little doubt that he'll be a fine hitter for a long time. If there's a concern, it's that Blalock seems to be taking a cue from Chavez and hitting more like Matlock against lefties. It's worth it for the Rangers to take their lumps and make sure he gets plenty of experience against southpaws, because this is too good a player to have resigned to a platoon arrangement so early.
Alfonso Soriano
Current EqA: .351 (156 PA) Preseason Projection: .281 Breakout Rate: 6.4% "True" EqA: .294 (+13) Revised 2003 Projection: .305
Unlike Rocco Baldelli's, Soriano's fine numbers have been accompanied by an improvement in his plate discipline--his strikeouts are down 25% from last year, while his walk rate has nearly tripled. PECOTA still hedges its bets on him a little bit, projecting him to finish with a .305 EqA that would almost exactly duplicate his 2002 numbers, but at the very least, Soriano's continued development this year suggests that the Juan Samuel scenario can credibly be thrown out the window. To the chagrin of Yankee haters everywhere, Soriano is likely to be a very good hitter for a very long time.
As an aside, Soriano has a good chance to become the first player ever to accumulate 800 plate appearances in a season, or at the very least, to best Lenny Dykstra's record of 773 (he's currently on pace for 815 PAs). The combination of Soriano's durability, his place at the top of the order, and--most importantly--an eight-man on base machine hitting behind him, might well be enough to push him over the top.
Austin Kearns
Current EqA: .322 (131 PA) Preseason Projection: .299 Breakout Rate: 27.6% "True" EqA: .308 (+9) Revised 2003 Projection: .310
Before the start of the season, PECOTA thought that Kearnsie had a one-in-four chance to put up an EqA of .320 or better, so his performance thus far isn't a total surprise. Kearns is a fun player to watch play, and has a fine, well-rounded skill set that perhaps most resembles Dwight Evans, trading a little defense for extra power. His strikeout rate has climbed a bit this year, something he's going to need to correct if he wants to maintain a batting average above .300, but that's picking nits.
Nick Johnson
Current EqA: .355 (124 PA) Preseason Projection: .275 Breakout Rate: 24.5% "True" EqA: .295 (+20) Revised 2003 Projection: .307
One more young guy. Nick Johnson's comparables list was pretty much split down the middle before the start of the season, including outstanding hitters like Carlos Delgado and Darrell Evans, as well as notorious flameouts like Travis Lee and Pete LaCock. Twenty-four is one of the last ages at which a hitter is likely to show dramatic signs of improvement. Plus, there's the wrist injury to worry about, and the pressure of playing every day in New York. Suffice it to say, this is a big year for Nick the Stick.
His results speak for themselves. We like to say that a good batting eye can lead to better production in other areas, but it can work the other way too--Johnson has increased his walk rate over his prior already good level in part because pitchers now have legitimate reason to fear him. He's clicking on all cylinders, and considering that it's years before he'll be eligible for free agency, he's already one of the most valuable players in the league.
Jose Cruz Jr.
Current EqA: .348 (135 PA) Preseason Projection: .280 Breakout Rate: 17.3% "True" EqA: .291 (+11) Revised 2003 Projection: .301
In an ESPN piece this March, I described Cruz as a classic 'hump' player: He has, at times, flashed excellent power, a strong batting eye, and even dangerous basestealing ability, but he's never put it all together in the same season. At least not until this year.
I think that Cruz's resurgence--or maybe that should be his surgence?--is for real, and so does PECOTA; it gave him a 17% chance of a breakout season this year, a very high mark for an established 29-year-old regular. Think Carl Everett without the mental problems--um, never mind. In any event, Cruz's batting average, walk rate, and power have all improved, and he's become quite a valuable offensive player. He might end up with more RBI than Bonds or something, in which case he'll get Skip Bayless' MVP vote.
Alex Gonzalez (Florida)
Current EqA: .326 (118 PA) Preseason Projection: .212 Breakout Rate: 6.7% "True" EqA: .242 (+30) Revised 2003 Projection: .258
Flounder, flounder in the sea! Come, I pray thee, here to me! For my owner, Jeff Loria Wills he had a Renteria!
Every year, there are a few guys who simply defy rational expectation. PECOTA didn't give Gonzalez much chance of a breakout season; then again, I don't think that anyone did. It's unlikely that Gonzalez is going to maintain anything resembling his current pace--his plate discipline hasn't improved very much--but it's also safe to say that he's made at least some fundamental improvement.
Of course, it's highly unlikely that a "true" .226 EqA hitter--that's where Gonzalez was at for his career entering the season--could sustain a performance anything like his current pace in the first place, even over just five weeks.
There are a couple of competing explanations out there to account for Gonzalez's improvement; Rotowire suggests that he gained upper body strength in rehabbing his shoulder injury last year, while Peter Gammons credits Marlins infield instructor Perry Hill. Go, Sea Bass, Go!
Now for the not-so-goodies:
Mark Teixeira
Current EqA: .225 (76 PA) Preseason Projection: .275 Collapse Rate: 19.1% "True" EqA: .263 (-12) Revised 2003 Projection: .256
Don't panic. His poor numbers have come in a relatively small number of plate appearances, and are mostly driven by a low batting average--put him at .250 or so, keeping in mind that batting average is the most fluky part of offensive performance, and the season totals begin to look a lot more respectable. He may have been playing a little bit over his head last year--even so, he's still a great prospect. That said, I'm inclined to agree that it'd be in the mutual best interest of Teixeira and the Rangers to have him in Triple-A right now.
Brandon Phillips
Current EqA: .191 (93 PA) Preseason Projection: .251 Collapse Rate: 13.2% "True" EqA: .233 (-18) Revised 2003 Projection: .225
Phillips' slump has been a little bit deeper and a little bit longer than Tex's, and PECOTA takes a pretty big chunk off his EqA projection. Having a good balance of skills, as Phillips does, is usually an asset; the problem is that I think that Phillips has oscillated between trying to be a power hitter and more of a line drive guy, and has wound up doing neither particularly well. Keep in mind that he's learning a new position, too. I think he'll probably be just fine once he relaxes, but his stardom is not a sure thing.
Miguel Tejada
Current EqA: .200 (139 PA) Preseason Projection: .285 Collapse Rate: 14.3% "True" EqA: .265 (-20) Revised 2003 Projection: .253
Pretty much the reverse case of Sea Bass. It's safe to say that some of Tejada's failures can be chalked up to bad luck, since he's still hitting for some power; it's also dramatically more likely that, say, a .270 EqA hitter would put up a run like this one than would a .300 EqA hitter:
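One rough way to put a number on that comparison is to treat Tejada's .200 EqA in 139 plate appearances as 27.8 "successes" and ask how much better a .270 hitter explains that exact line than a .300 hitter does. This sketch compares the likelihood of the observed line itself rather than the chance of a line "this bad or worse," and the function name is an illustrative assumption.

```python
import math

def log_likelihood(true_eqa, observed_eqa, pa):
    """Binomial log-likelihood, minus a constant that doesn't depend on true_eqa."""
    successes = observed_eqa * pa  # fractional successes implied by the EqA
    return successes * math.log(true_eqa) + (pa - successes) * math.log(1 - true_eqa)

# Tejada's line so far: a .200 EqA in 139 PA.
tejada_ratio = math.exp(
    log_likelihood(0.270, 0.200, 139) - log_likelihood(0.300, 0.200, 139)
)
print(f"a .270 hitter is about {tejada_ratio:.1f}x as likely to produce this line")
```

Because both hitters share the same plate appearances and successes, the combinatorial term cancels and only the two power terms matter.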
I canvassed the BP group for a more exciting answer, but the most plausible explanation for Miggy's struggles is also the simplest: Last season was a career year.
Edgardo Alfonzo
Current EqA: .226 (121 PA) Preseason Projection: .293 Collapse Rate: 9.3% "True" EqA: .276 (-17) Revised 2003 Projection: .267
Alfonzo has recovered a little bit recently, but he isn't making anyone forget David Bell, let alone Jeff Kent. Edgardo's approach has changed at the plate this year; he's seeing only 3.5 pitches per plate appearance, as opposed to a career number of 3.9, and has hit the ball on the ground more than half the time, which he hasn't done since 1997 (as I found a couple of weeks ago, flyballs and power numbers are usually generated deeper in the count). It could be that Fonzie is simply pressing, or he could be covering for a lack of bat speed brought on by his back problems. I hate to cite this sort of thing as evidence, especially since the Giants have one of the best medical staffs in the business, but maybe the slow market for Alfonzo this winter meant something after all.
Ichiro Suzuki
Current EqA: .259 (135 PA) Preseason Projection: .293 Collapse Rate: 19.4% "True" EqA: .285 (-8) Revised 2003 Projection: .280
At what point is it fair to accuse Ichiro of being a little bit stubborn? This is a guy who once hit 25 HR in just 130 games in Japan; despite his small stature, his quick bat enables him to hit the ball a long way when he wants to. And, for that matter, what about using that great plate coverage to take a few more pitches? Ichiro has less reason to fear being behind in the count than almost any other hitter.
It's something of a paradox that speed players age better even though speed is normally the first skill to go. The reason is that fast players are normally good athletes, and are able to compensate for slowing legs with improvement in other areas of their game. Sammy Sosa and Barry Bonds, who use(d) speed as a complementary skill, have aged well; the Roger Cedenos and Vince Colemans of the world didn't. Ichiro has the capacity to change his approach, but seems uninterested in doing so; as he continues to age, his slap-hitting ways will gradually seem ever more out of place, until one day he's Larry Eustachy at a sorority party.
Jeremy Giambi
Current EqA: .218 (69 PA) Preseason Projection: .295 Collapse Rate: 29.0% "True" EqA: .268 (-27) Revised 2003 Projection: .259
As I remarked to Will Carroll the other day, Jeremy Giambi is sort of like my Phil Nevin: It's not that I wish him any harm, but since my system was pretty convinced that he was a collapse candidate, I've watched his progress with a piqued interest. I'm more convinced than ever that his approach at the plate is not fundamentally sound: I think that his walks are not so much the product of a proactive strategy to reach base as a side effect of working deep into counts because he doesn't make contact very often. Giambi is a quirky player, and had a large range on his forecast this year, so PECOTA really punishes him for his poor start. He isn't this bad, but a .268 EqA DH who is a major liability on the basepaths is waiver wire material.
(Acknowledgement - though I've been somewhat obsessed with using binomial distributions like this for a few years now, thanks to reader Tim Sackton for putting the idea in my head again).