April 21, 2005

Crooked Numbers

April Fools

April stats are meaningless. OK, that’s not entirely fair. March stats are meaningless, April stats are just misleading. As Joe Sheehan pointed out yesterday, most everyone knows this and understands it, but when you love talking about baseball, no one wants to say “let’s wait until July.” Instead, we qualify all our statements before launching into discussions of Brian Roberts’ home run chase, Tim Hudson’s hard luck, and Edgardo Alfonzo chasing .400.

As an exercise in restraint, here are the Best and Worst hitters on April 30, 2004 as ranked by MLVr (min 50 PAs in April and 300 on the year):

Best:

| Batter | Year | AVG | OBP | SLG | MLVr |
|---|---|---|---|---|---|
| Barry Bonds | 2004 | .472 | .696 | 1.132 | 1.481 |
| Charles Johnson | 2004 | .333 | .458 | .875 | .848 |
| Lew Ford | 2004 | .419 | .471 | .710 | .784 |
| Adam Dunn | 2004 | .328 | .538 | .750 | .767 |
| Sean Casey | 2004 | .414 | .458 | .667 | .698 |
| Jim Thome | 2004 | .364 | .456 | .714 | .682 |
| Moises Alou | 2004 | .361 | .400 | .735 | .645 |
| Manny Ramirez | 2004 | .388 | .448 | .647 | .617 |
| Laynce Nix | 2004 | .365 | .397 | .714 | .617 |
| Ron Belliard | 2004 | .417 | .500 | .548 | .582 |

Worst:

| Batter | Year | AVG | OBP | SLG | MLVr |
|---|---|---|---|---|---|
| Neifi Perez | 2004 | .220 | .260 | .275 | -.371 |
| Gabe Kapler | 2004 | .233 | .270 | .250 | -.380 |
| A.J. Pierzynski | 2004 | .236 | .267 | .250 | -.385 |
| Luis Rivas | 2004 | .190 | .227 | .317 | -.391 |
| Tike Redman | 2004 | .226 | .229 | .301 | -.391 |
| Jimmy Rollins | 2004 | .183 | .263 | .268 | -.392 |
| Ty Wigginton | 2004 | .188 | .216 | .333 | -.394 |
| Alex Gonzalez | 2004 | .182 | .222 | .312 | -.413 |
| Jason Phillips | 2004 | .162 | .275 | .221 | -.435 |
| Derek Jeter | 2004 | .168 | .255 | .232 | -.460 |

While Barry Bonds had already established his dominance, there are quite a few names (Charles Johnson, Laynce Nix, Derek Jeter, Jimmy Rollins) who did not finish the year anywhere near where they began. Similarly, on the morning of May 1 last year, the Red Sox were 15-6, the Orioles 12-9, and the Yanks 12-11. Texas was leading the AL West and the Cardinals were 12-11, a game and a half behind the Astros and Cubs, tied for the division lead at 13-9.
Though there are always a few outliers every April, simply dismissing the first month of the season is obviously not the way to go. Games in April count as much as games in September; it’s just that the ones in September have greater implications because the likelihood of various outcomes is vastly different. Much like leverage as it pertains to relievers, games later in the season have an apparently larger bearing on the standings. But a slow April, much like a starter who gets shelled in the early innings, can make those late games meaningless.

Similarly, with individual player statistics, we can estimate just how meaningful that first month is. There are a couple of different ways to do this. The first is to use confidence intervals for population proportions (referred to here as "p-hat" because the symbol for the observed proportion is a "p" with a "^" over it). P-hat lets us determine how accurate our data are likely to be at a chosen level of confidence. Essentially, based on the sample size, the normal distribution curve, and the observed value in question, p-hat gives a quick formula for a range within which the "true" value lies.

The best way to think of p-hat is like a coin. We "know" the coin will land on heads 50% of the time if we flip it forever, but if we flip it only five times, it obviously can’t come up heads exactly 50% of the time. As the number of flips increases, we have more information about the coin, and the total proportion of heads gets closer to 50%. The possible outcomes follow a roughly normal curve, with 50% being the most likely (in the middle of the curve) and higher and lower proportions of heads less likely (the tails). Selecting a certain percentage of the area under the curve gives us that much confidence that the "true" likelihood of a heads flip falls within the corresponding range. Using p-hat, we can estimate the minimum and maximum values we need in order to cover that area.
The more times we flip the coin, the tighter the curve gets, and thus the closer the minimum and maximum values get to the mean for a particular confidence level.
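To make the coin example concrete, here is a minimal sketch of the tightening effect. It assumes that the "quick formula" in question is the standard normal-approximation interval for a proportion, p-hat ± z·sqrt(p-hat·(1 − p-hat)/n); the function name `proportion_interval` and the 95% default are illustrative choices, not anything taken from the column.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, trials, confidence=0.95):
    """Normal-approximation interval for a proportion (hypothetical helper).

    Returns (low, high), the range expected to cover the "true"
    proportion at the requested confidence level.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    # z is the number of standard deviations that encloses the chosen
    # share of the area under the normal curve (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    # Clamp to [0, 1], since a proportion cannot fall outside that range.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# A coin that has come up heads exactly half the time, at growing sample sizes.
for flips in (10, 100, 1000, 10000):
    low, high = proportion_interval(flips // 2, flips)
    print(f"{flips:>6} flips: {low:.3f} to {high:.3f}")
```

Because the margin shrinks with the square root of the number of flips, it takes roughly four times as many flips to cut the range in half. The approximation is also rough for very small samples or for proportions near 0 or 1, so treat the narrowest and widest rows as illustrative only.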