
May 17, 2009

Prospectus Idol Entry

Matthew Knight's Initial Entry

Bio: Ten years ago I wasn't sure what I wanted to do with my life, but when asked about my dream job I would probably have said "general manager of the Oakland A's." This was a more realistic assessment of my talents than the answer I gave for the first 18 years of my life, "centerfield for the Oakland A's." Today, I am even more realistic about my abilities, and a bit less certain about my dream job. I still dream of being involved in baseball, either working in a front office or writing about it. But at 30, I have a Ph.D. in astronomy and a promising research career ahead of me. I have accomplished enough scientifically to think I have a reasonable chance of becoming a tenured professor, and I'm not sure I'd be willing to forsake that, even if Billy Beane came calling.

So how to reconcile my two passions? I have been brainstorming ideas for a seminar that would introduce university students to the basics of sabermetrics. I envision this class using "back of the envelope" calculations that rely on simple algebra to delve into more advanced statistical concepts. Thus, the column I hope to write for Baseball Prospectus would be the framework for the course I hope to teach one day, and perhaps the first draft of a textbook on introductory sabermetrics. Also, one extra bit of impetus: my wife will let me get the MLB Extra Innings package if I work for BP!

Entry: Back of the Envelope: Why don't high-strikeout hitters ever lead the league in batting average?

In 1949 the study of comets was an astronomical backwater. Hardly any new insights into their composition, origin, or evolution had been gained in decades. Every once in a while a bright comet captured the public's attention, but most astronomers did not consider comets interesting science. Then in 1950, a paper by Dutch astronomer Jan Oort revolutionized the field.
Using data on only 19 well-observed comets, Oort made what was essentially a "back of the envelope" calculation (scientific slang for a quick estimate) showing that there must be an enormous cloud of unseen comets orbiting the Sun at great distances. He argued that these comets had formed near Jupiter and were scattered out to distances of 50,000 to 150,000 astronomical units (one astronomical unit is the distance from the Earth to the Sun). Today, Oort's idea has been repeatedly validated by ever more complex computer simulations. Despite astronomers' inability to observe objects as small as comets at such large distances, the "Oort Cloud" is now fundamental to our understanding of the solar system.

In what I hope will become a regular feature on Baseball Prospectus, Back of the Envelope will use simple calculations to illustrate baseball statistics and concepts. This column is intended to ease baseball fans who are new to statistical analysis into the wonderful world of sabermetrics. I hope that regular readers of BP will find some food for thought in these approaches as well. Like Oort, my aim is to show readers that there is a world of knowledge hidden just out of sight that can be illuminated by a few "back of the envelope" calculations. Finally, I hope to spice up your weekly baseball reading with the occasional bit of scientific history, something that has been sorely lacking since Dan Fox departed BP for the Pirates' front office.

This article will investigate why "Three True Outcomes" (TTO) hitters always seem to have low batting averages. A popular topic of discussion on BP, TTO hitters are players with massive strikeout, walk, and home run totals. Why are strikeouts, walks, and home runs dubbed the three true outcomes? Because they are the only outcomes of a plate appearance over which the defense has no control. TTO hitters tend to be sluggers who work pitchers deep into counts. Rob Deer is the TTO archetype.
Despite their high home run and walk totals, more talented hitters (think Albert Pujols) are not considered TTO material because they do not strike out enough. A common bond among TTO luminaries like Deer, Dave Kingman, and Adam Dunn is a low batting average. But why must these sluggers have low batting averages? Shouldn't we expect the occasional league-leading 0.350 batting average season from one of them? Let's investigate.

For ease of calculation, consider a hypothetical TTO hitter who gets 700 plate appearances and amasses 100 walks and 200 strikeouts (the home runs are irrelevant at this point). 700 plate appearances minus 100 walks means he had 600 at bats. Of these 600 at bats, 200 ended in strikeouts, leaving 400 at bats in which the ball was put in play (for simplicity we'll ignore things like sacrifices, hit by pitches, and reaching on a dropped third strike). Only these at bats can result in hits, so we need to know the player's batting average on balls in play (BABIP) to calculate how many hits he got.

BABIP, the fraction of batted balls in play that become base hits, is a statistic that comes up frequently on BP. A surprising finding by Voros McCracken in 1999 was that a pitcher has very little influence on BABIP. Regardless of whether he is Johan Santana or post-Oakland Barry Zito, once a batter makes contact, most pitchers give up hits about 30% of the time. Thus, a pitcher can expect to allow a BABIP near 0.300, with small fluctuations around that due mostly to luck. Hitters, on the other hand, tend to have some effect on their own BABIP. This is somewhat intuitive: when Manny Ramirez makes contact, he is much more likely than David Eckstein to hit the ball hard and far. Furthermore, in this simplified calculation, where home runs count as balls in play, players with a lot of home runs will have a higher BABIP, since the defense has no chance at catching those balls (the standard definition of BABIP excludes home runs). As a result, hitters' BABIPs have a wider range than pitchers', but rarely exceed 0.350.
When they do, analysts tend to regard these seasons as fluky and expect the player's BABIP to regress to the mean over time.

Returning to our hypothetical TTO hitter, let's assume he is a particularly good hitter when he makes contact, resulting in a 0.350 BABIP. Multiplying the 0.350 BABIP by the 400 at bats that did not end in a strikeout yields 140 hits. 140 hits in 600 at bats is a paltry 0.233 batting average. What would it take for our hypothetical TTO hitter to bat a league-leading 0.350? He would need 210 hits in those 600 at bats (210/600 = 0.350). Since 200 of those at bats end in strikeouts, all 210 hits must come from the remaining 400, which requires an astronomical 0.525 BABIP (210/400). Thus, it is virtually impossible for a hitter who strikes out 200 times to have a league-leading batting average.

Since this type of player also tends to be a weak fielder (as illustrated by Dunn's trouble finding a lucrative contract this offseason), why do teams employ him? As mentioned above, TTO hitters tend to hit the ball hard and far when they make contact. Let's round out our hypothetical TTO hitter's stat line with 40 home runs and 25 doubles. Taking these 65 extra-base hits out of his 140 hits leaves 75 singles. Thus, he had 285 total bases (75 singles + 2*25 doubles + 4*40 home runs) for a slugging percentage (total bases divided by at bats) of 285/600 = 0.475. His on-base percentage (hits plus walks divided by at bats plus walks) was (140+100)/(600+100) = 0.343. Our hypothetical TTO hitter's overall batting line (commonly referred to as the "triple slash categories" of AVG/OBP/SLG) is a 0.233 batting average, a 0.343 on-base percentage, and a 0.475 slugging percentage. His OPS (On-base percentage Plus Slugging percentage) is 0.818, slightly better than the 0.800 OPS that is generally considered decent. Despite the low batting average, our hypothetical TTO hitter manages to be a slightly above-average hitter.
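The arithmetic above is easy to check with a short script. This is a minimal sketch that follows the article's simplifications: no sacrifices, hit by pitches, dropped third strikes, or triples, and home runs counted among the balls in play, so "BABIP" here means hits divided by at bats that did not end in a strikeout (not the standard definition). The function names `batting_line` and `babip_needed` are hypothetical, introduced only for illustration.

```python
# Sketch of the article's back-of-the-envelope arithmetic.
# Assumptions (the article's simplifications): no sacrifices, HBP, dropped
# third strikes, or triples; home runs count as balls in play, so "BABIP"
# here is hits / (at bats that did not end in a strikeout).

def batting_line(pa, bb, k, babip, hr, doubles):
    """Return (AVG, OBP, SLG, OPS) for the simplified hypothetical hitter."""
    ab = pa - bb                    # walks are not at bats
    contact_ab = ab - k             # at bats in which the ball was put in play
    hits = babip * contact_ab
    singles = hits - doubles - hr   # triples ignored for simplicity
    total_bases = singles + 2 * doubles + 4 * hr
    avg = hits / ab
    obp = (hits + bb) / (ab + bb)
    slg = total_bases / ab
    return avg, obp, slg, obp + slg


def babip_needed(pa, bb, k, target_avg):
    """BABIP required to reach target_avg under the same simplifications."""
    ab = pa - bb
    return target_avg * ab / (ab - k)


avg, obp, slg, ops = batting_line(pa=700, bb=100, k=200, babip=0.350, hr=40, doubles=25)
print(f"{avg:.3f}/{obp:.3f}/{slg:.3f}, OPS {ops:.3f}")  # 0.233/0.343/0.475, OPS 0.818
print(f"BABIP needed for .350: {babip_needed(700, 100, 200, 0.350):.3f}")  # 0.525
```

Changing `k` shows how quickly the required BABIP climbs as strikeouts eat into the at bats that can produce hits.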
Compare this batting line to Russell Branyan's career stats (0.230/0.328/0.484 for an OPS of 0.812) and you can see why Branyan's nickname is "3TO". As Branyan's career shows (eight different major league teams since debuting with Cleveland in 1998), teams are willing to take a chance on this type of hitter if he can hit for enough power when he makes contact.

The thought of taming his strikeouts while retaining the high BABIP is tantalizing. If our hypothetical hitter struck out only 100 times and kept his 0.350 BABIP, he would have 500 balls in play and 175 hits, for a 0.292 batting average. If all 35 of the additional hits were singles, he would hit 0.292/0.393/0.533 and would be an all-star caliber hitter (better still if some of those singles became doubles, triples, or home runs). But remaining a productive major league hitter while striking out 200 times a season is difficult. If the player's power slips much below our hypothetical levels, he quickly becomes a sub-0.800 OPS hitter with limited defensive ability, the kind of player who doesn't last long on a major league roster.

In future installments of Back of the Envelope, I will analyze the benefits of taking a lot of pitches as a hitter, demonstrate why pitchers with high strikeout rates are generally more successful, explain why mediocre players rarely last for 15 seasons in the major leagues, and take a look at why a fielder with more errors might be better than one with fewer errors.

30 comments have been left for this article.
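The strikeout-reduction what-if can be sketched the same way. This assumes, as the article does, that home runs count as balls in play, and that every hit gained over the 200-strikeout version is a single; `what_if` and the `BASE_*` constants are hypothetical names for illustration, with the baseline taken from the 140-hit, 285-total-base line above.

```python
# What-if sketch: same 700 PA, 100 walks, and 0.350 BABIP, fewer strikeouts.
# Assumptions: home runs count as balls in play (as in the article), and
# every hit gained over the 200-strikeout baseline is a single.

PA, BB, BABIP = 700, 100, 0.350
BASE_HITS, BASE_TOTAL_BASES = 140, 285  # the 200-strikeout line from the article


def what_if(strikeouts):
    """Return (AVG, OBP, SLG) for the hypothetical hitter at a given K total."""
    ab = PA - BB
    hits = BABIP * (ab - strikeouts)
    extra_singles = hits - BASE_HITS          # new hits, all assumed singles
    total_bases = BASE_TOTAL_BASES + extra_singles
    avg = hits / ab
    obp = (hits + BB) / PA
    slg = total_bases / ab
    return avg, obp, slg


avg, obp, slg = what_if(strikeouts=100)
print(f"{avg:.3f}/{obp:.3f}/{slg:.3f}")  # 0.292/0.393/0.533
```

Calling `what_if(200)` reproduces the original 0.233/0.343/0.475 line, which is a quick sanity check on the baseline constants.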
 
I don't believe HR are considered 'in play' for BABIP purposes; I think it's 'on contact' numbers that include homers. (The mouseover even confirms this, though it's talking about pitchers' BABIP, not hitters'.) Over the fence is not 'in play.' So actually, big HR hitters don't have inflated BABIP by virtue of being big HR hitters.
I agree. The math and use of stats are a bit off, but I still like the article and think it holds.
Let's take 650 PA, 110 BB, 160 K, and 40 HR (roughly Adam Dunn's 2005).
There are 540 AB, of which 160 are "TTO" outs and 40 are "TTO" hits, giving a "TTO" batting average of 0.200 (40/(160+40)) over about 37% of ABs (200/540). Thus, for the remaining 340 ABs to bring him to a benchmark 0.300, we need 37% * 0.200 + 63% * BABIP = 0.300. A back-of-the-envelope calculation requires this player's BABIP to be a pretty high 0.359. To reach the article's high benchmark of 0.350, the BABIP would need to be an absurd 0.438.
Let's assume a pitching "average" 0.300 BABIP, which would give us an aggregate average of 0.263 ... not necessarily low, but not high by any means (Dunn batted 0.247 in 2005).
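The weighted average above can be written out in general form. This is a sketch using the commenter's convention that home runs are not balls in play (sacrifices and the like ignored); the symbol f, the share of at bats that end in a strikeout or a home run, is introduced here for illustration.

```latex
\begin{aligned}
f &= \frac{K + HR}{AB}, \qquad \text{TTO AVG} = \frac{HR}{K + HR}\\[4pt]
AVG &= f \cdot \frac{HR}{K + HR} + (1 - f)\cdot BABIP
     = \frac{HR + BABIP\,(AB - K - HR)}{AB}\\[4pt]
BABIP_{\text{needed}} &= \frac{AVG_{\text{target}} \cdot AB - HR}{AB - K - HR}\\[8pt]
\text{Dunn-like:}\quad & AB = 650 - 110 = 540,\quad K = 160,\quad HR = 40\\[4pt]
BABIP_{.300} &= \frac{0.300 \cdot 540 - 40}{540 - 160 - 40} = \frac{122}{340} \approx 0.359\\[4pt]
BABIP_{.350} &= \frac{0.350 \cdot 540 - 40}{340} = \frac{149}{340} \approx 0.438\\[4pt]
AVG\big|_{BABIP = 0.300} &= \frac{40 + 0.300 \cdot 340}{540} = \frac{142}{540} \approx 0.263
\end{aligned}
```

Because f multiplies HR/(K + HR), the first term collapses to HR/AB, so every strikeout shrinks the pool of balls in play without adding anything to the numerator.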
I think the key to TTO players having low averages is that they use up a lot of their official ABs on TTO results, and Ks are going to vastly outnumber HRs. That means the TTO player needs a much higher BABIP on his balls in play to overcome the deficit caused by the Ks.
For the record, the example above with the 0.300 BABIP would give a player with 0.263 / 0.388 / 0.548 (I also ignore things like sacrifices, and I assume that roughly 25% of BIP hits are doubles, again cribbing Dunn). At 0.936 OPS that's a pretty good player.
That being said, I think it's a great idea to use simple algebra, maybe with a broader comp pool than I used, and some ingenuity and ink to illustrate some of the concepts of baseball we may think "obvious" (or maybe counterintuitive!).
I'm going to reply to my own post again, just to provide the other side. Say you have the same 650 PA and 110 BB, but with a more modest 70 K and 35 HR (BB don't figure into batting average, so they really drop out of this) ... I'm cribbing the K and HR from Pujols here.
Here the TTO average is 0.333 in 19.4% of AB, so the equation is 19.4% * 0.333 + 80.6% * BABIP = 0.300, and now the required BABIP is 0.292 to reach 0.300. To reach 0.350 you'd only need a 0.354 BABIP (high, yes, but not the absurd 0.438). If this player has the "average" 0.300 BABIP, his aggregate average works out to 0.306, much higher than our TTO's 0.263. As we look at players further and further from TTO players (Ichiro, I guess?), their average *should* tend towards the relatively high 0.300 average BABIP (at the extreme of 0 Ks and 0 HRs, an "average" 0.300 BABIP hitter would bat exactly 0.300). Again, I'm sort of waving away any influence players have on BABIP, but the math remains pretty solid here to me.
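The two profiles from these comments can be compared side by side with a small script. This is a sketch under the commenter's convention (home runs are not balls in play, walks drop out of batting average, sacrifices ignored); `required_babip` and `avg_from_babip` are hypothetical helper names, and the "Dunn-like" and "Pujols-like" labels refer only to the rough stat lines quoted above.

```python
# Compare the TTO-style and contact-style profiles from the comments.
# Assumptions: home runs are NOT balls in play (standard BABIP convention),
# walks drop out of batting average, and sacrifices are ignored.

def required_babip(ab, k, hr, target_avg):
    """BABIP on balls in play needed to reach a target batting average."""
    balls_in_play = ab - k - hr
    return (target_avg * ab - hr) / balls_in_play


def avg_from_babip(ab, k, hr, babip):
    """Batting average implied by a given BABIP."""
    return (hr + babip * (ab - k - hr)) / ab


# (PA, BB, K, HR) as quoted in the comments.
profiles = {
    "Dunn-like (160 K, 40 HR)": (650, 110, 160, 40),
    "Pujols-like (70 K, 35 HR)": (650, 110, 70, 35),
}

for name, (pa, bb, k, hr) in profiles.items():
    ab = pa - bb  # 540 at bats in both cases
    print(
        f"{name}: needs {required_babip(ab, k, hr, 0.300):.3f} for .300, "
        f"{required_babip(ab, k, hr, 0.350):.3f} for .350; "
        f"bats {avg_from_babip(ab, k, hr, 0.300):.3f} at a .300 BABIP"
    )
```

With the same walks and a similar home run total, the only real difference between the two profiles is the strikeout count, which is what drives the gap in required BABIP.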