Checking the Numbers: Binomial Feliz

June 10, 2009

Despite playing alongside Barry Bonds for several seasons, Pedro Feliz never learned to discipline his bat, entering this season with an abysmal .292 career on-base percentage. Given his antipathy toward taking free passes, it stands to reason that what transpired in a May 12 matchup between the Phillies and Dodgers could induce double-takes from even the most seasoned baseball people. In the bottom of the third, Clayton Kershaw issued a four-pitch walk to Feliz to begin the frame. The very next inning, with nobody out and a runner on second, Pedro held up on a 3-2 offering and earned his second straight base on balls. If Feliz had stopped here, his two-walk performance would still have been a relatively monumental feat for him, as he had batted in 983 games from 2001-08, and walked at least twice in a game on just 14 different occasions. Then, in the bottom of the sixth, with a runner on first, nobody out, and James McDonald in from the bullpen, Feliz took four pitches out of the zone, and trotted down to first for a third consecutive time. An inning later, Feliz stepped up with the bases loaded, and after Jayson Werth opened up a spot on the basepaths with a steal of home plate, Ronald Belisario threw two more balls, giving Pedro his fourth walk of the game.

Now, it’s perhaps needless to note that Feliz had never before walked four times in one game, let alone in consecutive plate appearances. In fact, Pedro’s entire season to date has been an outlier, as his OBP stood at .357 through 52 games at the time that I began researching this piece. I was curious about just how unlikely these achievements were, so I decided to calculate two probabilities: that a player as stubbornly resistant to free passes as Feliz would walk four times in as many trips to the plate in a single game, and that a hitter with a career .292 OBP would produce a .357 rate through roughly one-third of the season.

To determine the probability of his four-walk game, Feliz’s walk rate must first be isolated from the overall on-base percentage. From 2004-08, when Feliz played in a full-time capacity, he walked just 5.5 percent of the time, among the lowest marks for any regular player during that span. Walking is a yes-or-no proposition, in that a player either walks or he doesn’t in a given plate appearance, which makes each plate appearance a Bernoulli trial. A run of such trials follows the binomial distribution, which gives the probability of a set number of successes in a fixed number of independent opportunities, based on prior knowledge of the subject’s rate of success in the area being tested. For Feliz, we are calculating the likelihood of exactly four walks in four plate appearances given a 5.5 percent success frequency.

In Excel, the formula would be BINOMDIST(4,4,.055,FALSE), where the FALSE signifies usage of the probability mass function as opposed to the cumulative distribution function; the former returns the probability of exactly four walks, while the latter returns the probability of four walks at most. One stroke of the “enter” key later, and the answer of 0.0000092 indicates that Feliz has a 0.00092 percent shot of accomplishing this feat in a specific game. According to the Birthday Paradox (the idea that, as a group of people increases in size, the likelihood that any two share a birthday grows stronger), it is much more likely that Feliz would perform his four-walk extravaganza at some point over the course of a season as opposed to within any one specific game. Now, one issue emerges with this type of test: the probability assumes that Feliz has the same chance of walking in each plate appearance, and that each walk outcome is independent of the others, much like a fair coin always has a 50/50 shot of landing on tails. For all we know, Feliz may have been on a walking spree leading up to this game, or he could have made adjustments in facing Kershaw that second time through the order. Both of these events could reasonably suggest that his probability of walking grew with each plate appearance. Still, the Bernoulli Trial does a solid job of accomplishing our goal with this study, and for all we know, those four walks could have been independent of one another.
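For readers without Excel, the same calculation can be sketched in a few lines of Python. This is only an illustration: the helper names binom_pmf and binom_cdf are made up here to mirror the FALSE and TRUE modes of BINOMDIST, and the 5.5 percent walk rate is the figure quoted above.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials (BINOMDIST ..., FALSE)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """Probability of at most k successes in n independent trials (BINOMDIST ..., TRUE)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

WALK_RATE = 0.055  # Feliz's 2004-08 walk rate, as quoted in the article

# Exactly four walks in four plate appearances in one specific game.
p_four_walks = binom_pmf(4, 4, WALK_RATE)
print(f"{p_four_walks:.7f}")  # 0.0000092
```

With four successes in four trials the formula collapses to the walk rate raised to the fourth power, which is why the result is so small.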

To turn the 0.00092 percent probability in a specific game into the likelihood of its occurrence over a 150-game span, the probability of the event not occurring must first be calculated: 1-0.0000092 = 0.9999908. Raise the new number to the power of 150 to arrive at 0.9986, and subtract from 1.0 to end up at 0.0014, or 0.14 percent. Essentially, based on our knowledge of Feliz’s success rate when it comes to bases on balls, he had a minuscule 0.14 percent shot of walking exactly four times in four straight plate appearances in a game at some point during a 150-game campaign. If the same steps are taken but the formula is modified to calculate the probability of exactly two successful events out of four opportunities in any of 150 games over the course of a season, in an area with a success rate of 0.0025, we arrive at a probability of 0.56 percent, roughly four times as likely as Feliz’s feat. This modification isn’t arbitrary either, as I chose it for comparative purposes to illustrate just how unlikely it was for Feliz to accomplish what he did earlier this season.

The 0.0025 rate of success refers to the quotient derived from dividing Juan Pierre’s 13 career home runs by his 5,153 career at-bats entering this season. Juan Pierre is roughly four times more likely to hit precisely two home runs in the four at-bats per game that he has averaged than Feliz is to walk four times in as many plate appearances. Who’d have thought that?
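The per-game to per-season conversion, and the Pierre comparison, can be checked with a short Python sketch. The function names are illustrative only, and the 150-game season, the 5.5 percent walk rate, and the 0.0025 home-run rate are the figures used above.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def at_least_once(p_game, games):
    """Chance an event with per-game probability p_game happens in at least one of `games` games."""
    return 1 - (1 - p_game) ** games

GAMES = 150

# Feliz: four walks in four plate appearances at a 5.5 percent walk rate.
feliz_season = at_least_once(binom_pmf(4, 4, 0.055), GAMES)

# Pierre: exactly two home runs in four at-bats at a 0.0025 home-run rate.
pierre_season = at_least_once(binom_pmf(2, 4, 0.0025), GAMES)

print(f"Feliz:  {feliz_season:.2%}")   # about 0.14%
print(f"Pierre: {pierre_season:.2%}")  # about 0.56%
print(f"Ratio:  {pierre_season / feliz_season:.1f}")
```

The ratio comes out a little above four, so "roughly four times" is the safer way to state the comparison.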

The next step, involving the probability of an observed .357 OBP through 52 games for a player with a career .292 mark, is determinable in a couple of different ways. If a normal distribution is assumed, 68 percent of the data within a set will fall within one standard deviation (SD) of the mean; 95 percent of the data will fall within two SDs of the mean; and 99.7 percent of the data will fall within three SDs of the mean. Baseball data cannot always be categorized as normally distributed, so this method has the potential to produce inaccurate results, but because we’ll go over the other method afterwards, comparing the results can prove interesting. To find the aforementioned probability operating under the normal distribution assumption, Feliz’s average and SD must be known. The average has already been established at .292, and the SD can be found through the formula SQRT(P × Q/N), where P is the percentage of success, Q is 1-P (the percentage of failure), and N, in this case, stands for the number of plate appearances.

Since the goal here involves the probability of such a high on-base percentage over a 52-game stretch, I first tallied the statistics for Feliz in every career stretch of such length. This method has been discussed in this space before, and is equivalent to utilizing the game-logs summing tool at Baseball Reference. The 52-game stretches did not extend into subsequent seasons, however, and consisted of around 205 plate appearances on average since Feliz became an everyday player. Plugging the numbers in, P=0.292, Q=0.708, and N=205, and the expected standard deviation turns out to be 32 points. Again, assuming a normal distribution, we could then expect that a little over two-thirds of the 490 different 52-game stretches for Feliz from 2001-08 would feature OBPs ranging from .260 to .324. About 95 percent of these stretches would consist of OBPs ranging from .228 to .356. With five percent of the dataset out of the two-SD range, roughly 2.5 percent would fall below .228, and 2.5 percent would soar past .356. Under a normal distribution and given his history, Feliz would have a 2.5 percent probability of exceeding a .356 OBP in a 52-game stretch as he has done this season.
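The standard deviation and the one- and two-SD bands can be reproduced with a brief Python sketch. The helper normal_upper_tail is a name invented for this illustration; it evaluates the normal tail directly rather than leaning on the 68/95/99.7 rule of thumb, so its result is close to, but not identical with, the 2.5 percent quoted above.

```python
from math import erf, sqrt

P = 0.292   # career OBP
Q = 1 - P   # rate of failure
N = 205     # average plate appearances in a 52-game stretch

sd = sqrt(P * Q / N)  # SQRT(P x Q / N)

def normal_upper_tail(x, mean, spread):
    """Probability that a normal variable with the given mean and SD exceeds x."""
    z = (x - mean) / spread
    return 0.5 * (1 - erf(z / sqrt(2)))

one_sd_band = (P - sd, P + sd)
two_sd_band = (P - 2 * sd, P + 2 * sd)
tail_above_356 = normal_upper_tail(0.356, P, sd)

print(f"SD: {sd:.3f}")  # 0.032
print(f"One SD:  {one_sd_band[0]:.3f} to {one_sd_band[1]:.3f}")  # 0.260 to 0.324
print(f"Two SDs: {two_sd_band[0]:.3f} to {two_sd_band[1]:.3f}")  # 0.228 to 0.356
print(f"Tail above .356: {tail_above_356:.1%}")
```

The directly computed tail lands slightly under the rule-of-thumb 2.5 percent, because .356 sits a hair beyond two unrounded standard deviations.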

The data may not necessarily be normally distributed, however, making the binomial distribution the more accurate measure. Through 205 PAs, a .357 OBP could be produced with 73 on-base events. The probability of observing at least 73 successes in 205 opportunities, given a .292 success rate, is equal to one minus the probability of at most 72 successes. In Excel, this would be:
1-BINOMDIST(72,205,0.292,TRUE). The probability checks out at 2.4 percent, basically reporting that Feliz should observe a .357 OBP or higher over 52 games about once out of every 40 such stretches. Interestingly enough, over his eight-year career, despite the expected SD of 32 points, Pedro has been consistently poor in reaching base, actually producing a 22-point deviation. Out of the 490 different stretches, Feliz has ranged from .234 to .345, never falling more than 1.87 SDs below his mean, or rising more than 1.66 SDs above it.
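The same upper-tail calculation can be sketched in Python as a rough cross-check. The function name binom_cdf is illustrative, and the inputs (72 successes, 205 plate appearances, a .292 rate) are taken from the Excel formula above. Treat it as a check on the order of magnitude rather than a reproduction to the last decimal.

```python
from math import comb

def binom_cdf(k, n, p):
    """Probability of at most k successes in n independent trials (BINOMDIST ..., TRUE)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

PLATE_APPEARANCES = 205
CAREER_OBP = 0.292

# At least 73 on-base events = 1 - P(at most 72 on-base events).
p_at_least_73 = 1 - binom_cdf(72, PLATE_APPEARANCES, CAREER_OBP)
print(f"{p_at_least_73:.1%}")
```

The result should fall in the same low single-digit percentage range as the normal-distribution estimate, which is the point of comparing the two methods.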

Next week, I plan on delving deeper into historical OBP spikes akin to what Feliz has achieved so far this season. The method is very similar to our earlier look at Cliff Lee and his sharp uptick in ground balls last season. Can hitters sustain their shiny new OBP rates? Or are vast increases in reaching base even rarer than unexpected growths in ground-ball frequency? For now, the initial goal dealt more with just investigating the rather extreme unlikelihood that someone with Feliz’s skill set could defy the odds by this magnitude, which serves as a solid preface to our eventual inquiry into whether or not these outlying on-base percentages are flukes, or a sign of improved future production lurking around the corner.

Special Thanks to Tom Tango, Ben Baumer, and Heiko Todt for keeping me sane throughout my research.

Eric Seidman

smitty99

6/10

Feliz's four walk game was amazing. The Phillies, due to Manuel and Milt Thompson, have been pretty good regarding getting guys to reach career highs in OBP -- sometimes by a lot. Rod Barajas and Aaron Rowand are two great examples but there have been others. Feliz has fit in to that scheme as he drew a career high walk rate last season and an OBP better than .300 for only the second time in his career.

However, his increase in OBP this season is due to him hitting above .300 more than for any other reason. He is a career .255 hitter with a .294 OBP. Now he is hitting .305, which accounts for nearly all of his +60 increase in OBP. When his average drops to the .250ish range, his OBP will be in the very low .300s. A slight improvement to his career numbers but not by all that much.

Feliz really has trouble with pitch recognition. He's trying to do better with the Phils it seems, as the Phillies are really an OBP type team. But he really doesn't look like he sees pitches all that well. He does hit them pretty well when he does get one in his zone though.

Another thing that's interesting about Feliz is he has some pretty nice clutch numbers throughout his career. I wonder if a talented hitter who has trouble with the pitch recognition thing (if that's possible -- but it seems to be the case here) has some kind of advantage in the clutch.

astrophel

6/10

His overall walk rate is still higher in 2009. Using the latest stats from the DT page we see: 18 BB in 212 PA (in 2009) versus 182 BB in 3490 PA (before 2009) - that is, 0.085 BB/PA in 2009 versus 0.052 BB/PA before. Since we're really interested in the spike in his walk rate (and not his total OBP), the probability to see >=18 BB in 212 PA assuming a binomial distribution with p=0.052 is 3.0%. Accounting for the fact that there is also a statistical uncertainty on the 182 BB in 3490 PA, we find that the probability is 3.4%. I'd still call that interesting.

JayhawkBill

6/10

"The data may not necessarily be normally distributed, however, making the binomial distribution the more accurate measure."

Exactly.

I find myself wondering if this article was inspired by the discussion regarding one of this week's BP Idol submissions. In any case, great article. Thanks!

EJSeidman

6/10

Nope! I don't even know which discussion you're referring to, either, but I might go check that out. This was inspired from actually watching Feliz, with my own eyes, walk four times. My jaw dropped.

dpowell

6/11

But notice that it didn't make a difference. Just like in the Idol piece you're referring to, Law of Large Numbers takes care of this.

joeboxr36

6/10

Eric, I'm a Giants' fan and have watched every game for years, so I'm very familiar with Feliz. He is the real life Pedro Cerrano from the movie Major League, and cannot hold up on any breaking pitch down and away. So when I saw your article I had to delve deeper--I have very passively kept track of him since he left San Francisco. I loved your article and the deep thinking it had. With that said...

I went to mlb.tv and watched all of these at bats for myself.

At bats by pitch sequence, pitches separated by semicolon:

1st AB: easy take; decent take on high FB; same pitch as before; same pitch as before
Analysis: Either Kershaw was struggling with command, or he thinks throwing three straight FB's in the same location will cause Feliz to swing. He's not changing the sight line, easy takes, easy walk.

2nd AB: easy take; take knee high FB K; same pitch as before, just miss for a ball; high FB; knee high change swinging K; FB for ball 4 that could have been called a K or a BB
Analysis: Where's the breaking pitch? Feliz has been around for years, how could Kershaw not throw one? Good at bat, but not challenging for Feliz

3rd AB: knee high FB down and away; same pitch; easy take FB in dirt; same pitch in dirt
Analysis: Anyone could have taken this BB. Again, Feliz has never been a proven bad judge of FB's.

4th AB: take K; FB up and away; same as before but worse; take same K as first followed by Werth stealing home; FB up and away; FB in dirt
Analysis: Again, EASY takes, and all FB's no less!

Okay, that's a bit of overkill, but I had to see that for myself. Four walks are indeed impressive, especially for a HORRIBLE judge of breaking pitches (Feliz), but most of these were EASY takes: fastballs way out of the zone.

So the next question is: what were the odds of Feliz seeing this type of pitching attack before since he's a 6 year veteran? Well, the first thing I'll tell you is I am SHOCKED feliz had 4 ab's without one decent breaking ball. So, can you really compare Feliz's career history batting 7th when Bonds was in the lineup? Was this just a fluke where no Dodger pitchers went to the scouting report (he didn't face anyone with >2 years service time)? Is this an issue of seeing more FB's in general because he's in a historically excellent lineup?

And regarding the issue of having a career high OBP. If Feliz has 550 AB's this year, his BB/AB rate will put him around 50, which would be only 10 BB's higher than his previous career high. That is an improvement, but I don't think it's a Cliff Lee type improvement.

As I wrote, I haven't watched Feliz closely this or last year. However, checking out these four May 12th AB's, the biggest thing I noticed right out of the gates is the lack of breaking pitches. Was that just for the one night, or is this a trend, indicative of his awesome lineup--one not even remotely comparable to Feliz's time in SF?

Pedro Feliz has changed surroundings. In the end, who knows, maybe he changed his approach--I doubt it. For years the breaking pitch AND the ability to drive the outside pitch to the opposite field was the bane of his baseball existence. I would also research where his hits are going. If there is a spike in opposite field hits, I think you have your answer: he gained the ability to drive the outside pitch whereas before he either fouled it off or made an out with it.

In the end, my difference is not in your statistical calculation, but in your usage of the statistics. First, he's in two very different environments. Also, if Pedro Feliz is the same player he has always been, your analysis is more relevant. If Feliz now has, say, a different hitting approach; his statistical past is not as valid in predicting his future. Raul Ibanez I think will testify to these things.

As a statistical aside, the Bernoulli Trial is kind of overkill. If you have the probability of an independent event, and you want to know the probability of that independent event happening consecutively, you multiply the probabilities. In this instance, it's 0.055*0.055*0.055*0.055 or 0.055^4 = 9.15 x 10^-6 (which is the same answer you gave).

EJSeidman

6/10

Joe, last part first, yes, 0.055^4 gives us the same answer as BINOMDIST(4,4,.055,FALSE), but the answer it gives us, 0.0000092 (I rounded up) is the probability in a SPECIFIC game, not any game. Over a 150-game season it is more likely to occur at least once than it is likely to occur in a specific game. 1-0.0000092 = 0.9999908. 0.9999908^150=0.9986. 1-0.9986=0.0014, or 0.14% chance that in some game over 150, Feliz would walk 4 times in 4 PA.

As far as the other points, yes, this is all dependent on the idea that he is the same player. This may or may not be true. If his true walk rate is closer to the 7.2% last year than the 5.2% the years before, everything changes, but I'm not convinced he is that much different.

joeboxr36

6/10

He was a pretty stubborn player in SF, and I doubt he is too. But, since clearly something is new, I would check where in the field his hits are going (I don't know which site shows you this). Is the increase in batting average due to an increase of hits to RF?

isleykeith

6/10

Interesting article from a stats application viewpoint, which is appreciated, but not particularly meaningful from a baseball standpoint for many of the reasons joeboxr cited. Feliz has more or less steadily improved his walk rate on 3-ball counts over the past several years, but the main feature on May 12 was his seeing strikes on just 4 of 20 pitches, with at least 7 of those not near the zone. Moreover, 3 of the 4 strikes were on 3-ball counts. The analysis presumes Feliz was facing a typical ball-strike mix and does not consider the particulars of the situation, a general weakness of all statistical analyses that ignore the interaction between pitcher and batter to assume the opponent is some theoretical guy named League Average.

EJSeidman

6/11

Yes, over the course of a season, hitters face varying types of accuracy and strategy amongst pitchers. Overall, these varying types tend to even out, but on a per-game basis, can change the probability of anything.

In fact, I mentioned this in the article, in that determining the probabilities here isn't going to be 100% accurate given that they are derived on the principle that each event is independent of others. If Feliz is facing Greg Maddux, he is going to be MUCH less likely to walk than if facing Oliver Perez. If he faces Kershaw the first time and learns his stuff, that second time through the lineup, Feliz may be more likely to walk based on his prior experience. Nobody is debating that, but this isn't necessarily an analysis of how or why Feliz has done what he has done, just an experiment to see the likelihood of him producing these types of numbers, given the last 8 years of data we have for him, constituting a pretty consistent level of production based on his low SD.

matthewshea

6/10

Eric,

This article is a great one when I introduce these subjects next year in my AP Stats class. Some nice, basic introductions, then the math, then a decent explanation of what it all means.

Thanks!

EJSeidman

6/11

Matthew,

Thank you! When I was in school, I found that relating everything to baseball helped me really relate to the material, and I have no doubt the same would apply to some of your students. I'll search for the one I wrote on Z-Scores, T-Tests, SDs, etc and see if I can link it to you as that might be a good one too.

matthewshea

6/11

That'd be great! Thanks!

breed13

6/11

Eric,

As mentioned above, another great piece... I'm no statistical expert, but I'm not sure that your use of the binomial distribution in the last part does what you think (confirm the calculation that assumes a normal distribution)... My recollection is that for large-ish sample sizes the binomial approximates the normal... So aren't you implicitly assuming a normal distribution in both cases?

Again, great article... thanks.

EJSeidman

6/11

breed,

In a large enough sample, the normal distribution is a very solid approximation of the binomial distribution. To determine if the sample is large enough, I use the idea that all data within a set falls within 3 SD. For Feliz, everything is within 2 SD, which is a bit nutty, so the point here was to compare the two. The binomial distribution is more accurate, at 2.4%, but since the sample is large enough and everything falls within 2 SDs, we could have stopped with the normal distribution of 2.5%. If this was just a short-term explanation or determination of the probability, we could have stuck to the normal distribution, but I got into explanation mode and wanted to show both.

ssimon

6/11

Eric, thanks for including a wee bit of math instruction for the less mathematically inclined among us. I hope at some point you can take the time to write a series (or book!) on "Mathematics/Excel for Aspiring Baseball Analysts."

Keep up the good work.

ssimon

6/11

One more thing - I would love to learn how to create/manipulate a retrosheet database :)

EJSeidman

6/11

Something I might be doing in the next few months is going over how to do just this. There is already a really great tutorial out there on how to CREATE a Retro DB, from Colin Wyers, so what I might do is link to that and then, maybe once a month, go over some MySQL jargon to teach how to run some queries and get some data.

eighteen

6/11

Awesome idea. Thanks, Eric.

molnar

6/11

There was an article in the New York Times... http://www.nytimes.com/2008/03/30/opinion/30strogatz.html ... that reported on a simulation of the entire history of baseball, looking at the longest hit streak recorded. The conclusion was "streaks of 56 games or longer are not at all an unusual occurrence." Which merely invalidates the simulation; if 56-game streaks are not unusual, then why is the second-longest streak in history only 44?

So, while your essay does make a point that needs to be made - namely, that if you have a lot of trials you are bound to see something unusual once in a while - I would suggest that there is some peril in the use of the binomial distribution. Presumably, that's amplified in a simulation of over 100 years, but I think that when you say for example your model gives Feliz a 2.5% chance of having a .356+ OBP over a 52-game span, we have to recognize the limitations of the model; that could easily be more like 5 or 6%. And Feliz is just one player, and OBP is just one variable.

BrewersTT

6/11

It's hard to be sure of the details of the NY Times simulation, but it sounds as though it makes the assumption that every at-bat is an independent draw from the same distribution, which is not the case. (This affects this Pedro Feliz study a bit too, but not enough to alter the point that he has a vanishingly small chance of walking four times - if it's really twice your estimate of 1/7 of a percent per season, so what.) The article states that DiMaggio had an 81% chance of at least one hit in the average game. But this is not accurate for a game against the league leader in fewest hits allowed, or most walks allowed. A handful of lower percentages make the odds of a 56-game streak drop quite a bit. Running 10,000 iterations of the model does not introduce into the results any variation that is not represented in the model to start with.

Factors that the NYT simulation probably cannot address, and that a simple binomial approach also assumes away, but that probably affect a hitting streak significantly, include facing a mix of pitchers (including aces), pitcher adjustments as the streak marches on (including simply not offering anything to hit), occasional excellent opponent defenses, fatigue, weather, mild injury, mental pressure on the hitter as the streak continues, umpires with varying strike zones...anything that makes even a few games harder to get a hit in makes a long streak much less likely.

molnar

6/11

yup.

EJSeidman

6/12

I literally mentioned in the article that the non-independentness could skew the results.

JayhawkBill

6/12

It's almost more interesting reading the comments than reading the (excellent) article.

The Law of Large Numbers suggests when sample size is big enough that a normal distribution will be "close enough" to binomial distribution, but just because np >= 10 it doesn't mean that the distributions will be identical. Eric makes that point in the comments. Furthermore, the discussion of Feliz where the number of trials was just four demanded use of binomial theorem instead of normal approximation.

While the point that plate appearances are not truly independent trials has merit and needs to be (and was) stated, almost every baseball statistic has, at its root, an assumption of independent trials, and it's an assumption that we usually accept. We might question "Wins" and "W-L Pct" as a fair stat for pitchers, but few of us dig into the available tables regarding strength of pitchers faced by batters when evaluating hitters by their OBP. Judging the likelihood of four consecutive walks isn't so great a stretch from one's acceptance of OBP as a statistic.

Once the decision to model plate appearances as independent trials is accepted, using binomial theorem is almost always a better modeling system than normal approximation for exactly the reasons Eric stated in the article. It may be very little better with large enough samples, but it's still better. With smaller samples, it's important to use binomial theorem if it applies.

BrewersTT

6/18

Eric: Yes, you did discuss the independence issue. What I meant to say earlier was that I don't believe it even matters in your study because the finding is so extreme. Your point about Feliz would hold even if your probability estimate happened to be off by several multiples.

JayhawkBill: You're right, and I suppose I may have been pedantic to get into the independence issue. But I don't generally think about or use W/L records or OBP in a probabilistic sense. On the other hand, the NYT study was explicitly trying to prove a radical point probabilistically, and therefore I think that examining its flawed assumptions in this regard is worthwhile, lest we all go around thinking that the study means something.
