June 10, 2009
Checking the Numbers
Despite playing alongside Barry Bonds for several seasons, Pedro Feliz never learned to discipline his bat, entering this season with an abysmal .292 career on-base percentage. Given his antipathy toward taking free passes, it stands to reason that what transpired in a May 12 matchup between the Phillies and Dodgers could induce double-takes from even the most seasoned baseball people. In the bottom of the third, Clayton Kershaw issued a four-pitch walk to Feliz to begin the frame. The very next inning, with nobody out and a runner on second, Pedro held up on a 3-2 offering and earned his second straight base on balls. If Feliz had stopped here, his two-walk performance would still have been a relatively monumental feat for him, as he had batted in 983 games from 2001-08, and walked at least twice in a game on just 14 different occasions. Then, in the bottom of the sixth, with a runner on first, nobody out, and James McDonald in from the bullpen, Feliz took four pitches out of the zone, and trotted down to first for a third consecutive time. An inning later, Feliz stepped up with the bases loaded, and after Jayson Werth opened up a spot on the basepaths with a steal of home plate, Ronald Belisario threw two more balls, giving Pedro his fourth walk of the game.
Now, it's perhaps needless to note that Feliz had never before walked four times in one game, let alone in consecutive plate appearances. In fact, Pedro's entire season to date has been an outlier, as his OBP stood at .357 through 52 games at the time that I began researching this piece. I was curious about just how unlikely these achievements were, so I decided to calculate two probabilities: that a player as stubbornly resistant to free passes as Feliz would walk four times in as many trips to the plate in a single game, and that a hitter with a career .292 OBP would produce a .357 rate through roughly one-third of the season.
To determine the probability of his four-walk game, Feliz's walk rate must first be isolated from the overall on-base percentage. From 2004-08, when Feliz played in a full-time capacity, he walked just 5.5 percent of the time, among the lowest marks for any regular player during that span. Walking is a yes-or-no proposition, in that a player either walks or he doesn't in a given plate appearance, which lends itself perfectly to a Bernoulli Trial. A series of Bernoulli Trials gives the probability of an event occurring a set number of times within a given number of independent opportunities, based on prior knowledge of the subject's rate of success in the area being tested. For Feliz, we would be attempting to calculate the likelihood of exactly four walks in four plate appearances given a 5.5 percent success frequency.
In Excel, the formula would be BINOMDIST(4,4,.055,FALSE), where the FALSE signifies usage of the probability mass function as opposed to the cumulative distribution function; the former refers to exactly four walks, while the latter covers four walks at most. One stroke of the "enter" key later, and the answer of 0.0000092 indicates that Feliz has a 0.00092 percent shot of accomplishing this feat in a specific game. By the same logic as the Birthday Paradox (the idea that, as a group of people increases in size, the likelihood that any two share a birthday grows stronger), it is much more likely that Feliz would perform his four-walk extravaganza at some point over the course of a season as opposed to within any one specific game. Now, one issue emerges with this type of test in that the probability assumes that Feliz has the same chance of walking in each plate appearance, and that each walk outcome is independent of the others, much like a fair coin always has a 50/50 shot of landing on tails. For all we know, Feliz may have been on a walking spree leading up to this game, or he could have made adjustments in facing Kershaw that second time through the order. Either of these could reasonably suggest that his probability of walking grew with each plate appearance. Still, the Bernoulli Trial does a solid job of accomplishing our goal with this study, and for all we know, those four walks could have been independent of one another.
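For readers without Excel, the same probability mass calculation can be sketched in a few lines of Python. This is only an illustration: the helper name `binom_pmf` and the constant `WALK_RATE` are my own labels, and the sketch carries the same assumption discussed above, namely that each plate appearance is independent with a fixed 5.5 percent walk rate.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials at rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Assumed input: Feliz's 2004-08 walk rate as quoted in the text.
WALK_RATE = 0.055

# Exactly four walks in four plate appearances, the analogue of
# BINOMDIST(4, 4, .055, FALSE).
p_four_walks = binom_pmf(4, 4, WALK_RATE)

print(f"Per-game probability: {p_four_walks:.7f}")
print(f"As a percentage:      {p_four_walks:.5%}")
```

With four successes in four tries, the formula collapses to the walk rate raised to the fourth power, which is why the number is so small.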
To turn the 0.00092 percent probability in a specific game into the likelihood of its occurrence over a 150-game span, the probability of the event not occurring must first be calculated: 1-0.0000092 = 0.9999908. Raise the new number to the power of 150 to arrive at 0.9986, and subtract from 1.0 to end up at 0.0014, or 0.14 percent. Essentially, based on our knowledge of Feliz's success rate when it comes to bases on balls, he had a minuscule 0.14 percent shot of walking exactly four times in four straight plate appearances in a game at some point during a 150-game campaign. If the same steps are taken but the formula is modified to calculate the probability of exactly two successful events out of four opportunities in any of 150 games over the course of a season, in an area with a success rate of 0.0025, we arrive at a probability of 0.56 percent, roughly four times as likely as Feliz's feat. This modification isn't arbitrary either, as I chose it for comparative purposes to illustrate just how unlikely it was for Feliz to accomplish what he did earlier this season.
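The complement step described above can be written out as a short Python sketch. The function name `prob_at_least_once` is hypothetical, and the calculation assumes every one of the 150 games offers the same independent chance, which is a simplification.

```python
def prob_at_least_once(p_single: float, games: int) -> float:
    """Chance that an event with per-game probability p_single
    happens in at least one of `games` independent games."""
    p_never = (1 - p_single) ** games  # the event fails to occur every time
    return 1 - p_never


# Assumed input: four walks in four PA at a 5.5 percent walk rate.
P_FOUR_WALK_GAME = 0.055 ** 4

season = prob_at_least_once(P_FOUR_WALK_GAME, 150)
print(f"Chance over a 150-game season: {season:.2%}")
```

Because the per-game probability is tiny, the season figure is very nearly 150 times the single-game figure; the complement method matters more as the per-game probability grows.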
The 0.0025 rate of success refers to the quotient derived from dividing Juan Pierre's 13 career home runs by his 5,153 career at-bats entering this season. Juan Pierre is about four times as likely to hit precisely two home runs in the four at-bats per game that he has averaged as Feliz is to walk four times in as many plate appearances. Who'd have thought that?
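The Pierre comparison can be checked with the same two steps, sketched here in Python. The helper names are my own, and I use the unrounded 13/5,153 rate rather than 0.0025, so the output may differ slightly from the rounded figures quoted in the text.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials at rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least_once(p_single: float, games: int) -> float:
    """Chance the event occurs in at least one of `games` independent games."""
    return 1 - (1 - p_single) ** games


GAMES = 150  # season length used in the text

# Juan Pierre: exactly two home runs in four at-bats.
pierre_hr_rate = 13 / 5153  # career HR per AB, from the text
pierre_season = prob_at_least_once(binom_pmf(2, 4, pierre_hr_rate), GAMES)

# Pedro Feliz: exactly four walks in four plate appearances.
feliz_season = prob_at_least_once(binom_pmf(4, 4, 0.055), GAMES)

ratio = pierre_season / feliz_season
print(f"Pierre two-homer game in a season: {pierre_season:.2%}")
print(f"Feliz four-walk game in a season:  {feliz_season:.2%}")
print(f"Pierre's feat is about {ratio:.1f} times as likely")
```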
The next step, involving the probability of an observed .357 OBP through 52 games for a player with a career .292 mark, is determinable in a couple of different ways. If a normal distribution is assumed, 68 percent of the data within a set will fall within one standard deviation (SD) of the mean; 95 percent of the data will fall within two SDs of the mean; and 99.7 percent of the data will fall within three SDs of the mean. Baseball data cannot always be categorized as normally distributed, so this method has the potential to produce inaccurate results, but because we'll go over the other method afterwards, comparing the results can prove interesting. To find the aforementioned probability operating under the normal distribution assumption, Feliz's average and SD must be known. The average has already been established at .292, and the SD can be found through the formula SQRT(P × Q/N), where P is the rate of success (here, the career OBP), Q is 1-P (the rate of failure), and N, in this case, stands for the number of plate appearances.
Since the goal here involves the probability of such a high on-base percentage over a 52-game stretch, I first tallied the statistics for Feliz in every career stretch of such length. This method has been discussed in this space before, and is equivalent to utilizing the game-logs summing tool at Baseball Reference. The 52-game stretches did not extend into subsequent seasons, however, and consisted of around 205 plate appearances on average since Feliz became an everyday player. Plugging the numbers in, P=0.292, Q=0.708, and N=205, and the expected standard deviation turns out to be 32 points. Again, assuming a normal distribution, we could then expect that a little over two-thirds of the 490 different 52-game stretches for Feliz from 2001-08 would feature OBPs ranging from .260 to .324. About 95 percent of these stretches would consist of OBPs ranging from .228 to .356. With five percent of the dataset out of the two-SD range, roughly 2.5 percent would fall below .228, and 2.5 percent would soar past .356. Under a normal distribution and given his history, Feliz would have a 2.5 percent probability of exceeding a .356 OBP in a 52-game stretch as he has done this season.
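The standard deviation and the one- and two-SD bands above can be reproduced with a brief Python sketch. The variable names are illustrative, and the last step swaps in an exact normal tail (via the standard library's `NormalDist`) in place of the rounded 95 percent rule of thumb, so it may come out slightly different from the 2.5 percent quoted.

```python
from math import sqrt
from statistics import NormalDist

P = 0.292  # career OBP, from the text
Q = 1 - P  # rate of failing to reach base
N = 205    # average plate appearances in a 52-game stretch, from the text

# SQRT(P x Q / N): expected SD of OBP over N plate appearances.
sd = sqrt(P * Q / N)

one_sd_band = (P - sd, P + sd)
two_sd_band = (P - 2 * sd, P + 2 * sd)

# Exact normal tail above .356, rather than the rule-of-thumb 2.5 percent.
tail_above_356 = 1 - NormalDist(mu=P, sigma=sd).cdf(0.356)

print(f"SD: {sd:.3f}")
print(f"One-SD band: {one_sd_band[0]:.3f} to {one_sd_band[1]:.3f}")
print(f"Two-SD band: {two_sd_band[0]:.3f} to {two_sd_band[1]:.3f}")
print(f"Normal tail above .356: {tail_above_356:.1%}")
```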
The data may not necessarily be normally distributed, however, making the binomial distribution the more accurate measure. Through 205 PAs, a .357 OBP could be produced with 73 on-base events. The probability of observing at least 73 successes in 205 opportunities, given a .292 success rate, is equal to one minus the probability of at most 72 successes. In Excel, this would be: 1-BINOMDIST(72,205,0.292,TRUE). The probability checks out at 2.4 percent, basically reporting that Feliz should observe a .357 OBP or higher over 52 games about once out of every 40 such stretches. Interestingly enough, over his eight-year career, despite the expected SD of 32 points, Pedro has been consistently poor in reaching base, actually producing a 22-point deviation. Out of the 490 different stretches, Feliz has ranged from .234 to .345, never falling more than 1.87 SDs below his average or rising more than 1.66 SDs above it.
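The cumulative version of the Excel formula can also be sketched in plain Python by summing the probability mass function. The helper name `binom_cdf` is hypothetical, and the inputs (73 on-base events, 205 plate appearances, a .292 rate) are taken from the text; the result can be compared against the figure quoted above, though rounding in the inputs may move it a little.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """Probability of at most k successes in n independent trials at rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


PA = 205           # plate appearances in the stretch
CAREER_OBP = 0.292
TIMES_ON_BASE = 73

# At least 73 successes = 1 - P(at most 72), the analogue of
# 1 - BINOMDIST(72, 205, 0.292, TRUE).
tail = 1 - binom_cdf(TIMES_ON_BASE - 1, PA, CAREER_OBP)

print(f"P(at least {TIMES_ON_BASE} times on base in {PA} PA) = {tail:.1%}")
```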
Next week, I plan on delving deeper into historical OBP spikes akin to what Feliz has achieved so far this season. The method is very similar to our earlier look at Cliff Lee and his sharp uptick in ground balls last season. Can hitters sustain their shiny new OBP rates? Or are vast increases in reaching base even rarer than unexpected growths in ground-ball frequency? For now, the initial goal dealt more with just investigating the rather extreme unlikelihood that someone with Feliz's skill set could defy the odds by this magnitude, which serves as a solid preface to our eventual inquiry into whether or not these outlying on-base percentages are flukes, or a sign of improved future production lurking around the corner.
Special Thanks to Tom Tango, Ben Baumer, and Heiko Todt for keeping me sane throughout my research.