June 10, 2009
Checking the Numbers: Binomial Feliz
Despite playing alongside Barry Bonds for several seasons, Pedro Feliz never learned to discipline his bat, entering this season with an abysmal .292 career on-base percentage. Given his antipathy toward taking free passes, it stands to reason that what transpired in a May 12 matchup between the Phillies and Dodgers could induce double-takes from even the most seasoned baseball people. In the bottom of the third, Clayton Kershaw issued a four-pitch walk to Feliz to begin the frame. The very next inning, with nobody out and a runner on second, Pedro held up on a 3-2 offering and earned his second straight base on balls. If Feliz had stopped here, his two-walk performance would still have been a relatively monumental feat for him, as he had batted in 983 games from 2001-08, and walked at least twice in a game on just 14 different occasions.

Then, in the bottom of the sixth, with a runner on first, nobody out, and James McDonald in from the bullpen, Feliz took four pitches out of the zone, and trotted down to first for a third consecutive time. An inning later, Feliz stepped up with the bases loaded, and after Jayson Werth opened up a spot on the basepaths with a steal of home plate, Ronald Belisario threw two more balls, giving Pedro his fourth walk of the game.

Now, it's perhaps needless to note that Feliz has never before walked four times in one game, let alone in consecutive plate appearances. In fact, Pedro's entire season to date has been an outlier, as his OBP stood at .357 through 52 games at the time that I began researching this piece. I was curious about just how unlikely these achievements were, so I decided to calculate two probabilities: that a player as stubbornly resistant to taking free passes as Feliz has been would walk four times in as many trips to the plate in a single game, and that a hitter with a career .292 OBP would produce a .357 rate through roughly one-third of the season.
To determine the probability of his four-walk game, Feliz's walk rate must first be isolated from the overall on-base percentage. From 2004-08, when Feliz played in a full-time capacity, he walked just 5.5 percent of the time, among the lowest marks for any regular player during that span. Walking is a yes-or-no proposition, in that a player either walks or he doesn't in a given plate appearance, which lends itself perfectly to a Bernoulli Trial. The Bernoulli determines the probability of an event occurring a given number of times within a set number of independent opportunities, based on prior knowledge of the subject's rate of success in the area being tested. For Feliz, we would be attempting to calculate the likelihood of exactly four walks in four plate appearances given a 5.5 percent success frequency. In Excel, the formula would be BINOMDIST(4,4,.055,FALSE), where the FALSE signifies usage of the probability mass function as opposed to the cumulative distribution function; the former refers to exactly four walks, while the latter lends itself to four walks at most. One stroke of the "enter" key later, and the answer of 0.0000092 indicates that Feliz has a 0.00092 percent shot of accomplishing this feat in a specific game.

According to the Birthday Paradox (the idea that, as a group of people increases in size, the likelihood that any two share a birthday grows stronger), it is much more likely that Feliz would perform his four-walk extravaganza at some point over the course of a season as opposed to within any one specific game. Now, one issue emerges with this type of test in that the probability assumes that Feliz has the same chance of walking in each plate appearance, and that each walk outcome is independent of the others, much like a fair coin always has a 50/50 shot of landing on tails. For all we know, Feliz may have been on a walking spree leading up to this game, or he could have made adjustments in facing Kershaw that second time through the order.
Both of these events could reasonably suggest that his probability of walking grew with each plate appearance. Still, the Bernoulli Trial does a solid job of accomplishing our goal with this study, and for all we know, those four walks could have been independent of one another.

To turn the 0.00092 percent probability in a specific game into the likelihood of its occurrence over a 150-game span, the probability of the event not occurring must first be calculated: 1 - 0.0000092 = 0.9999908. Raise the new number to the power of 150 to arrive at 0.9986, and subtract from 1.0 to end up at 0.0014, or 0.14 percent. Essentially, based on our knowledge of Feliz's success rate when it comes to bases on balls, he had a minuscule 0.14 percent shot of walking exactly four times in four straight plate appearances in a game at some point during a 150-game campaign.

If the same steps are taken but the formula is modified to calculate the probability of exactly two successful events out of four opportunities in any of 150 games over the course of a season, in an area with a success rate of 0.0025, we arrive at a probability of 0.56 percent, roughly four times as likely as Feliz's feat. This modification isn't arbitrary either, as I chose it for comparative purposes to illustrate just how unlikely it was for Feliz to accomplish what he did earlier this season. The 0.0025 rate of success refers to the quotient derived from dividing Juan Pierre's 13 career home runs by his 5,153 career at-bats entering this season. Juan Pierre is about four times more likely to hit precisely two home runs in the four at-bats per game that he has averaged than Feliz is to walk four times in as many plate appearances. Who'd have thought that?

The next step, involving the probability of an observed .357 OBP through 52 games for a player with a career .292 mark, is determinable in a couple of different ways.
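The single-game and season-long figures for the walk calculation can be retraced outside of Excel. What follows is a minimal sketch in Python, not the author's actual workflow; the helper names binom_pmf and at_least_once are illustrative, and the 150-game season and the rounded 0.055 and 0.0025 rates are taken from the text above.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    the same quantity as Excel's BINOMDIST(k, n, p, FALSE)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def at_least_once(p_game, games):
    """Probability that an event with per-game probability p_game
    happens in at least one of `games` independent games."""
    return 1 - (1 - p_game)**games

# Feliz: exactly four walks in four plate appearances at a 5.5% walk rate.
feliz_game = binom_pmf(4, 4, 0.055)
feliz_season = at_least_once(feliz_game, 150)

# Pierre: exactly two home runs in four at-bats at the rounded 0.0025 rate.
pierre_game = binom_pmf(2, 4, 0.0025)
pierre_season = at_least_once(pierre_game, 150)

print(f"Feliz, specific game:  {feliz_game:.7f}")
print(f"Feliz, 150-game span:  {feliz_season:.4%}")
print(f"Pierre, 150-game span: {pierre_season:.4%}")
print(f"Pierre / Feliz ratio:  {pierre_season / feliz_season:.2f}")
```

Both helpers lean on the same independence assumption discussed in the text, so the output is only as good as that assumption.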
If a normal distribution is assumed, 68 percent of the data within a set will fall within one standard deviation (SD) of the mean; 95 percent of the data will fall within two SDs of the mean; and 99.7 percent of the data will fall within three SDs of the mean. Baseball data cannot always be categorized as normally distributed, so this method has the potential to produce inaccurate results, but because we'll go over the other method afterwards, comparing the results can prove interesting.

To find the aforementioned probability operating under the normal distribution assumption, Feliz's average and SD must be known. The average has already been established at .292, and the SD can be found through the formula SQRT(P × Q/N), where P is the percentage of success, Q is 1 - P (the percentage of failure), and N, in this case, stands for the number of plate appearances. Since the goal here involves the probability of such a high on-base percentage over a 52-game stretch, I first tallied the statistics for Feliz in every career stretch of such length. This method has been discussed in this space before, and is equivalent to utilizing the gamelogs summing tool at Baseball Reference. The 52-game stretches did not extend into subsequent seasons, however, and were comprised of around 205 plate appearances on average since Feliz became an everyday player.

Plugging the numbers in, P=0.292, Q=0.708, and N=205, and the expected standard deviation turns out to be 32 points. Again, assuming a normal distribution, we could then expect that a little over two-thirds of the 490 different 52-game stretches for Feliz from 2001-08 would feature OBPs ranging from .260 to .324. About 95 percent of these stretches would consist of OBPs ranging from .228 to .356. With five percent of the dataset out of the two-SD range, roughly 2.5 percent would fall below .228, and 2.5 percent would soar past .356.
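The standard deviation and the one- and two-SD bands quoted above follow directly from SQRT(P × Q/N). Here is a short Python sketch of that arithmetic, offered as an illustration only; the normal_upper_tail helper is an addition not found in the article, and it computes the tail area from the normal curve itself rather than from the 68/95/99.7 rule of thumb, so its result will differ slightly from the rounded 2.5 percent.

```python
from math import sqrt, erfc

p = 0.292   # career OBP (probability of success)
n = 205     # average plate appearances in a 52-game stretch

# Expected standard deviation of an observed rate: SQRT(P * Q / N)
sd = sqrt(p * (1 - p) / n)

one_sd = (p - sd, p + sd)          # about 68% of stretches under normality
two_sd = (p - 2 * sd, p + 2 * sd)  # about 95% of stretches under normality

def normal_upper_tail(x, mean, sd):
    """Area of a normal(mean, sd) curve lying above x (hypothetical helper)."""
    return 0.5 * erfc((x - mean) / (sd * sqrt(2)))

tail = normal_upper_tail(0.357, p, sd)

print(f"SD:          {sd:.4f}")
print(f"1-SD range:  {one_sd[0]:.3f} to {one_sd[1]:.3f}")
print(f"2-SD range:  {two_sd[0]:.3f} to {two_sd[1]:.3f}")
print(f"P(OBP > .357) under normality: {tail:.4f}")
```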
Under a normal distribution and given his history, Feliz would have a 2.5 percent probability of exceeding a .356 OBP in a 52-game stretch as he has done this season. The data may not necessarily be normally distributed, however, making the binomial distribution the more accurate measure. Through 205 PAs, a .357 OBP could be produced with 73 on-base events. The probability of observing at least 73 successes in 205 opportunities, given a .292 success rate, is equal to one minus the probability of at most 72 successes. In Excel, this would be: 1-BINOMDIST(72,205,0.292,TRUE). The probability checks out at 2.4 percent, basically reporting that Feliz should observe a .357 OBP or higher over 52 games about once out of every 40 such stretches.

Interestingly enough, over his eight-year career, despite the expected SD of 32 points, Pedro has been consistently poor in reaching base, actually producing a 22-point deviation. Out of the 490 different stretches, Feliz has ranged from .234 to .345, never falling more than 1.87 SDs below his mean or rising more than 1.66 SDs above it.

Next week, I plan on delving deeper into historical OBP spikes akin to what Feliz has achieved so far this season. The method is very similar to our earlier look at Cliff Lee and his sharp uptick in ground balls last season. Can hitters sustain their shiny new OBP rates? Or are vast increases in reaching base even rarer than unexpected growths in groundball frequency? For now, the initial goal dealt more with just investigating the rather extreme unlikelihood that someone with Feliz's skill set could defy the odds by this magnitude, which serves as a solid preface to our eventual inquiry into whether these outlying on-base percentages are flukes or a sign of improved future production lurking around the corner.

Special Thanks to Tom Tango, Ben Baumer, and Heiko Todt for keeping me sane throughout my research.
Eric Seidman is an author of Baseball Prospectus. 25 comments have been left for this article.

JayhawkBill (17771) "The data may not necessarily be normally distributed, however, making the binomial distribution the more accurate measure." Jun 10, 2009 11:08 AM

joeboxr36 (37507) Eric, I'm a Giants' fan and have watched every game for years, so I'm very familiar with Feliz. He is the real life Pedro Cerano from the movie Major League, and cannot hold up on any breaking pitch down and away. So when I saw your article I had to delve deeper. I have very passively kept track of him since he left San Francisco. I loved your article and the deep thinking it had. With that said... Jun 10, 2009 13:19 PM

Joe, last part first, yes, 0.055^4 gives us the same answer as BINOMDIST(4,4,.055,FALSE), but the answer it gives us, 0.0000092 (I rounded up) is the probability in a SPECIFIC game, not any game. Over a 150-game season it is more likely to occur at least once than it is likely to occur in a specific game. 1 - 0.0000092 = 0.9999908. 0.9999908^150 = 0.9986. 1 - 0.9986 = 0.0014, or 0.14% chance that in some game over 150, Feliz would walk 4 times in 4 PA. Jun 10, 2009 13:44 PM

joeboxr36 (37507) He was a pretty stubborn player in SF, and I doubt he is too. But, since clearly something is new, I would check where in the field his hits are going (I don't know which site shows you this). Is the increase in batting average due to an increase of hits to RF? Jun 10, 2009 14:37 PM

bebo23 (46827) Interesting article from a stats application viewpoint, which is appreciated, but not particularly meaningful from a baseball standpoint for many of the reasons joeboxr cited. Feliz has more or less steadily improved his walk rate on 3-ball counts over the past several years, but the main feature on May 12 was his seeing strikes on just 4 of 20 pitches, with at least 7 of those not near the zone. Moreover, 3 of the 4 strikes were on 3-ball counts.
The analysis presumes Feliz was facing a typical ball-strike mix and does not consider the particulars of the situation, a general weakness of all statistical analyses that ignore the interaction between pitcher and batter to assume the opponent is some theoretical guy named League Average. Jun 10, 2009 15:49 PM

Yes, over the course of a season, hitters face varying types of accuracy and strategy amongst pitchers. Overall, these varying types tend to even out, but on a per-game basis, can change the probability of anything. Jun 11, 2009 05:37 AM

matthewshea (10330) Eric, Jun 10, 2009 15:31 PM

Matthew, Jun 11, 2009 03:52 AM

breed13 (17737) Eric, Jun 10, 2009 17:11 PM

breed, Jun 11, 2009 05:32 AM

Scott D. Simon (1384) Eric, thanks for including a wee bit of math instruction for the less mathematically inclined among us. I hope at some point you can take the time to write a series (or book!) on "Mathematics/Excel for Aspiring Baseball Analysts." Jun 10, 2009 19:12 PM

Scott D. Simon (1384) One more thing: I would love to learn how to create/manipulate a retrosheet database :) Jun 10, 2009 19:13 PM

Something I might be doing in the next few months is going over how to do just this. There is already a really great tutorial out there on how to CREATE a Retro DB, from Colin Wyers, so what I might do is link to that and then, maybe once a month, go over some MySQL jargon to teach how to run some queries and get some data. Jun 11, 2009 03:51 AM

molnar (170) There was an article in the New York Times... http://www.nytimes.com/2008/03/30/opinion/30strogatz.html ... that reported on a simulation of the entire history of baseball, looking at the longest hit streak recorded. The conclusion was "streaks of 56 games or longer are not at all an unusual occurrence." Which merely invalidates the simulation; if 56-game streaks are not unusual, then why is the second-longest streak in history only 44?
Jun 11, 2009 13:03 PM

BrewersTT (1952) It's hard to be sure of the details of the NY Times simulation, but it sounds as though it makes the assumption that every at-bat is an independent draw from the same distribution, which is not the case. (This affects this Pedro Feliz study a bit too, but not enough to alter the point that he has a vanishingly small chance of walking four times; if it's really twice your estimate of 1/7 of a percent per season, so what.) The article states that DiMaggio had an 81% chance of at least one hit in the average game. But this is not accurate for a game against the league leader in fewest hits allowed, or most walks allowed. A handful of lower percentages make the odds of a 56-game streak drop quite a bit. Running 10,000 iterations of the model does not introduce into the results any variation that is not represented in the model to start with. Jun 11, 2009 15:09 PM

JayhawkBill (17771) It's almost more interesting reading the comments than reading the (excellent) article. Jun 12, 2009 10:43 AM

BrewersTT (1952) Eric: Yes, you did discuss the independence issue. What I meant to say earlier was that I don't believe it even matters in your study because the finding is so extreme. Your point about Feliz would hold even if your probability estimate happened to be off by several multiples. Jun 18, 2009 12:23 PM

Feliz's four-walk game was amazing. The Phillies, due to Manuel and Milt Thompson, have been pretty good regarding getting guys to reach career highs in OBP, sometimes by a lot. Rod Barajas and Aaron Rowand are two great examples but there have been others. Feliz has fit into that scheme as he drew a career high walk rate last season and an OBP better than .300 for only the second time in his career.
However, his increase in OBP this season is due to him hitting above .300 more than for any other reason. He is a career .255 hitter with a .294 OBP. Now he is hitting .305, which accounts for nearly all of his +60 increase in OBP. When his average drops to the .250ish range, his OBP will be in the very low .300s. A slight improvement to his career numbers but not by all that much.
Feliz really has trouble with pitch recognition. He's trying to do better with the Phils it seems, as the Phillies are really an OBP type team. But he really doesn't look like he sees pitches all that well. He does hit them pretty well when he does get one in his zone though.
Another thing that's interesting about Feliz is he has some pretty nice clutch numbers throughout his career. I wonder if a talented hitter who has trouble with the pitch recognition thing (if that's possible, but it seems to be the case here) has some kind of advantage in the clutch.
His overall walk rate is still higher in 2009. Using the latest stats from the DT page we see: 18 BB in 212 PA (in 2009) versus 182 BB in 3490 PA (before 2009); that is, 0.085 BB/PA in 2009 versus 0.052 BB/PA before. Since we're really interested in the spike in his walk rate (and not his total OBP), the probability of seeing >=18 BB in 212 PA assuming a binomial distribution with p=0.052 is 3.0%. Accounting for the fact that there is also a statistical uncertainty on the 182 BB in 3490 PA, we find that the probability is 3.4%. I'd still call that interesting.
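The walk-rate figure in this comment is an upper-tail binomial probability, the same kind of calculation as the article's 1-BINOMDIST(72,205,0.292,TRUE). Below is a minimal Python sketch of both, under the stated assumption of independent plate appearances with a fixed success rate; the function name binom_upper_tail is illustrative, and the adjustment for uncertainty in the prior walk rate (the 3.4% figure) is not attempted because the comment does not say how it was derived.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as
    one minus the probability of at most k - 1 successes."""
    at_most = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1 - at_most

# Article: at least 73 on-base events in 205 PA at a .292 rate.
obp_tail = binom_upper_tail(73, 205, 0.292)

# Comment: at least 18 walks in 212 PA at a 0.052 walk rate.
walk_tail = binom_upper_tail(18, 212, 0.052)

print(f"P(>= 73 times on base in 205 PA): {obp_tail:.4f}")
print(f"P(>= 18 BB in 212 PA):            {walk_tail:.4f}")
```

If the assumptions hold, the two results should land close to the 2.4 percent and 3.0 percent figures quoted in the article and the comment.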