May 17, 2009

Prospectus Idol Entry

Matthew Knight's Initial Entry

Bio: Ten years ago I wasn't sure what I wanted to do with my life, but when asked about my dream job I would probably have said "general manager of the Oakland A's". This was a more realistic assessment of my talents than the response for the first 18 years of my life, "centerfield for the Oakland A's." Today, I am even more realistic about my abilities, and a bit less certain about my dream job. I still dream of being involved in baseball, either working in a front office or writing about it. But at 30, I have a Ph.D. in astronomy and a promising research career ahead of me. I have accomplished enough scientifically to think I have a reasonable chance of becoming a tenured professor, and I'm not sure I'd be willing to forsake that, even if Billy Beane came calling.

So how to reconcile my two passions? I have been brainstorming ideas for a seminar intended to introduce university students to the basics of sabermetrics. I envision this class using "back of the envelope" calculations that rely on simple algebra to delve into more advanced statistical concepts. Thus, the column I hope to write for Baseball Prospectus would be the framework for the course I hope to teach one day, and perhaps the first draft of a textbook on introductory sabermetrics. Also, one extra bit of impetus: my wife will let me get the MLB Extra Innings package if I work for BP!

Entry: Back of the Envelope: Why don't high strikeout hitters ever lead the league in batting average?

In 1949 the study of comets was an astronomical backwater. Hardly any new insight into their composition, origin, or evolution had been made in decades. Every once in a while a bright comet captured the public's attention, but comets were not considered interesting science by most astronomers. Then in 1950, a paper by Dutch astronomer Jan Oort revolutionized the field.
Using data on only 19 well-observed comets, Oort made what was essentially a "back of the envelope" calculation (scientific slang for a quick estimate) that showed that there must be an enormous cloud of unseen comets orbiting the Sun at great distances. He argued that these comets had formed near Jupiter and were scattered out to distances of 50,000 to 150,000 astronomical units (one astronomical unit is the distance from the Earth to the Sun). Today, Oort's idea has been repeatedly validated by ever more complex computer simulations. Despite astronomers' inability to observe objects as small as comets at such large distances, the "Oort Cloud" is now fundamental to the understanding of the solar system.

In what I hope will become a regular feature on Baseball Prospectus, Back of the Envelope will use simple calculations to illustrate baseball statistics and concepts. This column is intended to ease baseball fans new to statistical analysis into the wonderful world of sabermetrics. I hope that regular readers of BP will find some food for thought in these approaches as well. Like Oort, my aim is to show readers that there is a world of knowledge hidden just out of sight that can be illuminated by a few "back of the envelope" calculations. Finally, I hope to spice up your weekly baseball reading with the occasional bit of scientific history, something that has been sorely lacking since Dan Fox departed BP for the Pirates' front office.

This article will investigate why "Three True Outcomes" (TTO) hitters seem to always have low batting averages. A popular topic of discussion on BP, TTO hitters are players with massive strikeout, walk, and home run totals. Why are strikeouts, walks, and home runs dubbed the three true outcomes? Because they are the only occurrences on the baseball field over which the defense has no control. TTO hitters tend to be sluggers who work pitchers deep into counts. Rob Deer is the TTO archetype.
Despite high home run and walk totals, more talented hitters (think Albert Pujols) are not considered TTO material because they do not strike out enough. A common bond among TTO luminaries like Deer, Dave Kingman, and Adam Dunn is a low batting average. But why must these sluggers have low batting averages? Shouldn't we expect the occasional league leading 0.350 batting average season from one of them? Let's investigate.

For ease of calculation, consider a hypothetical TTO hitter who gets 700 plate appearances and amasses 100 walks and 200 strikeouts (the home runs are irrelevant at this point). 700 plate appearances minus 100 walks means he had 600 at bats. Of these 600 at bats, 200 ended in strikeouts, leaving 400 at bats in which the ball was put in play (for simplicity we'll ignore things like sacrifices, hit by pitches, and reaching on a dropped third strike). Only these at bats can result in hits, so we need to know what the player's batting average on balls in play (BABIP) is to calculate how many hits he got.

BABIP, the percentage of batted balls ending a plate appearance that become base hits, is a statistic that comes up frequently on BP. A surprising finding by Voros McCracken in 1999 was that a pitcher has very little influence on BABIP. Regardless of whether he is Johan Santana or post-Oakland Barry Zito, once a batter makes contact, most pitchers give up hits about 30% of the time. Thus, a pitcher can expect to allow a BABIP near 0.300, with small fluctuations around that due mostly to luck. Hitters, on the other hand, tend to have some effect on their own BABIP. This is somewhat intuitive since when Manny Ramirez makes contact, he is very likely to hit the ball harder and farther than David Eckstein. Furthermore, players with a lot of home runs will have a higher BABIP since the defense has no chance at catching these balls. As a result, hitters' BABIPs have a wider range than pitchers', but rarely exceed 0.350.
When they do, analysts tend to regard these as fluky and expect the player's BABIP to regress to the mean over time.

Returning to our hypothetical TTO hitter, let's assume he is a particularly good hitter when he makes contact, resulting in a 0.350 BABIP. Multiplying the 0.350 BABIP by the 400 at bats that did not result in a walk or a strikeout yields 140 hits. 140 hits in 600 at bats is a paltry 0.233 batting average.

What would it take for our hypothetical TTO hitter to bat a league leading 0.350? He would need 210 hits in those 600 at bats (210/600 = 0.350). Since 200 of those at bats end in strikeouts, he would need an astronomical 0.525 BABIP (210/400). Thus, it is virtually impossible for a hitter who strikes out 200 times to have a league leading batting average.

Since this type of player tends to be a weak fielder (illustrated by Dunn's troubles in finding a lucrative contract this off season), why do teams employ them? As mentioned above, TTO hitters tend to hit the ball hard and far when they make contact. Let's round out our hypothetical TTO hitter's stat line with 40 home runs and 25 doubles. Taking these 65 extra base hits out of his 140 hits yields 75 singles. Thus, he had 285 total bases (75 singles + 2*25 doubles + 4*40 home runs) for a slugging percentage (total bases divided by at bats) of 285/600 = 0.475. His on base percentage (hits plus walks divided by at bats plus walks) was (140+100)/(600+100) = 0.343. Our hypothetical TTO hitter's overall batting line (commonly referred to as the "triple slash categories" of AVG/OBP/SLG) is a 0.233 batting average, a 0.343 on base percentage, and a 0.475 slugging percentage. His OPS (On base percentage Plus Slugging percentage) is 0.818, or slightly better than the 0.800 OPS that is generally considered decent. Despite the low batting average, our hypothetical TTO hitter manages to be a slightly above average hitter.
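The arithmetic for the hypothetical hitter can be checked with a short Python sketch. The function and parameter names are illustrative assumptions rather than anything from the column, and the rate the column labels "BABIP" is treated here simply as the hit rate on at bats that did not end in a strikeout, exactly as the calculation above does. Sacrifices and hit by pitches are ignored, as in the text.

```python
def batting_line(pa, bb, so, hr, doubles, contact_rate):
    """Return (AVG, OBP, SLG) for a simplified stat line.

    contact_rate is the hit rate on at bats that did not end in a
    strikeout (the figure the column calls BABIP). Triples, sacrifices
    and hit by pitches are ignored, as in the column.
    """
    at_bats = pa - bb                      # walks are not at bats
    contact_ab = at_bats - so              # at bats ending with a batted ball
    hits = round(contact_rate * contact_ab)
    singles = hits - hr - doubles
    total_bases = singles + 2 * doubles + 4 * hr
    avg = hits / at_bats
    obp = (hits + bb) / (at_bats + bb)
    slg = total_bases / at_bats
    return avg, obp, slg


def required_contact_rate(pa, bb, so, target_avg):
    """Hit rate on non-strikeout at bats needed to reach target_avg."""
    at_bats = pa - bb
    return target_avg * at_bats / (at_bats - so)


avg, obp, slg = batting_line(700, 100, 200, hr=40, doubles=25, contact_rate=0.350)
print(f"{avg:.3f}/{obp:.3f}/{slg:.3f}  OPS {obp + slg:.3f}")
print(f"rate needed to bat .350: {required_contact_rate(700, 100, 200, 0.350):.3f}")
```

Changing `so` from 200 to 100 in the same call gives the lower-strikeout version of the same hitter, which makes it easy to see how much of the batting average is decided before the ball is ever put in play.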
Compare this batting line to Russell Branyan's career stats (0.230/0.328/0.484 for an OPS of 0.812) and you can see why Branyan's nickname is "3TO". As is evidenced by Branyan's career (eight different major league teams since debuting with Cleveland in 1998), teams are willing to take a chance on this type of hitter if he can hit for enough power when he makes contact.

The thought of taming his strikeouts while retaining the high BABIP is tantalizing. If our hypothetical hitter struck out only 100 times and continued to have a 0.350 BABIP, his batting average would be 0.292. If all the extra hits were singles, he would hit 0.292/0.393/0.533 and would be an all star caliber hitter (better still if some of those singles become doubles, triples, or home runs). But remaining a productive major league hitter while striking out 200 times a season is difficult. If the player's power slips much below our hypothetical levels, he quickly becomes a sub-0.800 OPS hitter with limited defensive ability, the kind of player who doesn't last long on a major league roster.

In future installments of Back of the Envelope, I will analyze the benefits of taking a lot of pitches as a hitter, demonstrate why pitchers with high strikeout rates are generally more successful, explain why mediocre players rarely last for 15 seasons in the major leagues, and take a look at why a fielder with more errors might be better than one with fewer errors.

30 comments have been left for this article.

leez34 (40214) Fantastic. I need Back of the Envelope in my life, and it sounds like the type of thing I need to show to my friends and family. It also sounds like next week's theme is right in your wheelhouse! May 17, 2009 18:35 PM

mafrth77 (29310) Good article, but I think that Branyan's lack of steady employment is due to his inability to generate enough walks to be an asset offensively. He is more of a two true outcome guy.
May 17, 2009 19:23 PM

djackson (30868) Between 1995 and 2000, Mark McGwire hit .404 on contact (736 hits, 2546 AB, 722 SO), so it is certainly possible for a big power hitter to have a true talent on contact MUCH higher than .350. May 17, 2009 21:49 PM

Brian Cartwright (4519) Any evidence that TTO hitters tend to be poor fielders? The current crop, yes, but I'm not so sure it holds true over time. May 18, 2009 01:10 AM

smallflowers (38782) Yup, Rob Deer was a fine fielder. And C.Granderson, who is a borderline case IMO, is an amazing one. May 18, 2009 07:40 AM

eamuscatuli (26043) I enjoyed the article in general, but are Invictus and I the only ones disappointed by the fact that the author clearly misunderstands BABIP? I am not trying to pick a fight, but how was this a finalist? May 18, 2009 08:34 AM

Richard Bergstrom (36532) I don't know if anyone truly understands BABIP, but it is definitely possible to err more on the side of caution. May 18, 2009 09:18 AM

eamuscatuli (26043) Perhaps that was poor phrasing on my part, as I agree that BABIP and DIPS is a difficult thing to understand theoretically. It is another thing entirely to miss a critical component of the formula that derives BABIP: home runs. It is akin to calculating batting average on plate appearances instead of at bats. May 18, 2009 09:33 AM

eamuscatuli (26043) To illustrate my point, a true .350 BABIP for the hypothetical player stated in the article would result in 166 Hits in 600 At Bats (166 Hits minus 40 Homeruns divided by 600 At Bats minus 40 Homeruns minus 200 Strikeouts). The resulting "triple slash" batting line would be .277/.380/.518, a far cry from the initial .233/.343/.475. Assuming a .350 BABIP, I agree with the author's premise that it would be nearly impossible for a hitter to bat .350 overall with 100 walks and 200 strikeouts in 700 plate appearances; such a player would need to hit 108 homeruns.
May 18, 2009 11:09 AM

Sam Rothstein (25767) The main question posed by the article (i.e. why do TTO hitters always have a low B.A), has a pretty obvious answer for any regular BP reader. It may be good for new readers as an intro, but usually I expect a little more from BP authors than rehashing basic concepts without a new slant. JMO May 18, 2009 08:48 AM

Richard Bergstrom (36532) As an introduction to TTO concepts, I thought the article was well-written and easy to read, especially for those unfamiliar with those concepts. I can see the author drew on his experience explaining concepts in astronomy to laymen (or freshmen) and he pulls that off quite well. May 18, 2009 09:18 AM

I'm adding my judging comment to each article: May 18, 2009 10:35 AM

John Carter (22689) Matthew's concept for a series of these type of articles is promising, but the first delivery was a disappointment. I agree with the comments above that he doesn't quite have a handle on the level of knowledge most BP readers have. May 18, 2009 11:37 AM

Richard Bergstrom (36532) Kinda like calling the sun simply a big ball of fire... May 18, 2009 17:20 PM

jacobo2u (29639) At first I couldn't tell if you were in agreement with joe or not. Perhaps that analogy is not apt. May 22, 2009 15:05 PM

ephinz (35890) My dreams of a Ryan Howard triple crown seem even more remote now. Back of the Envelope is an excellent concept, and I hope Matt gets a chance to be a regular regardless of the outcome. The Chris Daughtry of BP Idol. May 18, 2009 13:25 PM

cttigers (7072) i thought this was a fantastic article, clearly the most analytical and thought provoking of the bunch. I can forgive the BABIP thing one time. He just needs to tighten it up from here.
Look forward to reading him in the coming weeks May 18, 2009 13:27 PM

jpkand (47821) Can't take much of the article seriously after this line: "Furthermore, players with a lot of home runs will have a higher BABIP since the defense has no chance at catching these balls" May 18, 2009 15:39 PM

Evan (47) I can't help but agree. He explicitly ignores HR, but then assumes there aren't any in his BABIP calculations (or he's mistaking BABIP for BA on contact, which is a different thing entirely). May 19, 2009 10:45 AM

Richard Bergstrom (36532) It appears the judging was also done on potential. May 19, 2009 11:51 AM

mafrth77 (29310) One more issue: I don't think that teams are "willing to take a chance" that a player reduces his strikeout rate when they sign a player like Dunn, a TTO guy who is truly limited defensively. I simply think that he was valued according to his ability. Guys like Mike Cameron are valuable assets because of their D. Also, once a player establishes who he is, his BB rate, K rate and HR rate are among the most predictable things in all professional sports. May 19, 2009 16:33 PM

I don't believe HR are considered 'in play' for BABIP purposes; I think it's 'on contact' numbers that include homers. (The mouseover even confirms this, though it's talking about pitchers' BABIP, not hitters.) Over the fence is not 'in play.' So actually, big HR hitters don't have inflated BABIP by virtue of being big HR hitters.
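To make the distinction concrete, here is a rough Python sketch contrasting BABIP (home runs removed from both hits and chances) with batting average on contact (home runs left in). The function names are made up for illustration, and sacrifice flies, which the usual BABIP formula adds to the denominator, are ignored to match the simplifications used in the article. The inputs are the article's hypothetical hitter and the McGwire totals quoted elsewhere in this thread.

```python
def babip(hits, hr, ab, so):
    """Batting average on balls in play: home runs are not 'in play'.

    Sacrifice flies are left out of the denominator here, an
    assumption made to mirror the article's simplified stat line.
    """
    return (hits - hr) / (ab - so - hr)


def average_on_contact(hits, ab, so):
    """Hit rate on every at bat that did not end in a strikeout."""
    return hits / (ab - so)


def hits_from_babip(babip_value, hr, ab, so):
    """Total hits implied by a given BABIP plus the home runs."""
    return babip_value * (ab - so - hr) + hr


# The article's hitter: 140 hits, 40 HR, 600 AB, 200 SO.
print(f"on contact: {average_on_contact(140, 600, 200):.3f}")
print(f"BABIP:      {babip(140, 40, 600, 200):.3f}")

# A true .350 BABIP with the same strikeouts and home runs.
print(f"hits at a .350 BABIP: {hits_from_babip(0.350, 40, 600, 200):.0f}")

# McGwire, 1995 to 2000, using the totals quoted in the thread.
print(f"McGwire on contact: {average_on_contact(736, 2546, 722):.3f}")
```

With these inputs the article's ".350 BABIP" turns out to be a .350 average on contact, while the BABIP proper for the same 140 hits is noticeably lower.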
I agree. The math and use of stats are a bit off, but I still like the article and think it holds.
Let's take 650 PA, 110 BB, 160 K, and 40 HR (roughly Adam Dunn's 2005).
There are 540 AB, of which 160 are "TTO" outs and 40 are "TTO" hits, giving a "TTO" batting average of 0.200 (40/(160+40)) comprising about 37% of ABs (200/540). Thus, with the remaining 340 ABs being balls in play, reaching a benchmark 0.300 requires 37% * 0.200 + 63% * BABIP = 0.300. A back-of-the-envelope calculation requires this player's BABIP to be a pretty high 0.359. To reach his high benchmark of 0.350 the BABIP would need to be an absurd 0.438.
Let's assume a pitching "average" 0.300 BABIP, which would give us an aggregate average of 0.263 ... not necessarily low, but not high by any means (Dunn batted 0.247 in 2005).
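The weighted-average reasoning above can be written out as a small Python sketch. The helper names are assumptions made for illustration. The idea is to split at bats into a "TTO" bucket (strikeouts plus home runs) and a balls-in-play bucket, then blend the two batting averages by their share of at bats.

```python
def tto_split(ab, so, hr):
    """Return (share of AB that are K or HR, batting average on those AB)."""
    tto_ab = so + hr
    return tto_ab / ab, hr / tto_ab


def aggregate_average(ab, so, hr, babip):
    """Blend the TTO bucket and the balls-in-play bucket into one average."""
    share, tto_avg = tto_split(ab, so, hr)
    return share * tto_avg + (1 - share) * babip


def required_babip(ab, so, hr, target_avg):
    """Solve share * tto_avg + (1 - share) * BABIP = target_avg for BABIP."""
    share, tto_avg = tto_split(ab, so, hr)
    return (target_avg - share * tto_avg) / (1 - share)


# 650 PA with 110 BB leaves 540 AB; 160 K and 40 HR as in the comment.
AB, SO, HR = 540, 160, 40
print(f"BABIP needed for .300: {required_babip(AB, SO, HR, 0.300):.3f}")
print(f"BABIP needed for .350: {required_babip(AB, SO, HR, 0.350):.3f}")
print(f"average at a .300 BABIP: {aggregate_average(AB, SO, HR, 0.300):.3f}")
```

Keeping the two buckets separate makes the drag from strikeouts explicit: the larger the TTO share and the lower its average, the more the balls-in-play bucket has to make up.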
I think the key to the TTO players having a low average is that TTO players use a lot of their official ABs on the TTO results and Ks are going to vastly outnumber HRs, which means of the BIP the TTO player needs a much higher BABIP to overcome the deficit caused by Ks.
For the record, the example above with the 0.300 BABIP would give a player with 0.263 / 0.398 / 0.548 (I ignore things like sacrifices, etc. also and I assume that roughly 25% of BIP hits are doubles, again cribbing Dunn). At 0.946 OPS that's a pretty good player.
That being said, I think it's a great idea to use simple algebra, maybe with a broader comp pool than I used, and some ingenuity and ink to illustrate some of the concepts of baseball we may think "obvious" (or maybe counter intuition!).
I'm going to reply to my own post again, but just to provide the other side. Say you have the same 650 PA and 110 BB, but with a more modest 70 K and 35 HR (BB, not figuring into average, really drop out of this) ... I'm cribbing the K and HR from Pujols here.
Here the TTO average is 0.333 in 19.4% of AB, so the equation is 19.4% * 0.333 + 80.6% * BABIP = 0.300, and now the required BABIP is 0.292 to reach 0.300. To reach 0.350 you'd only need 0.354 BABIP (high, yes, but not the absurd 0.438). If this player bats the "average" 0.300 BABIP his aggregate average works out to 0.306, much higher than our TTO's 0.263. As we look at players further and further from TTO players (Ichiro, I guess?) their average *should* tend towards the relatively high 0.300 average BABIP (the extreme being 0 Ks and 0 HRs, an "average" 0.300 BABIP hitter would bat exactly 0.300). Again, I'm sort of waving away any influence players have on BABIP, but the math remains pretty solid here to me.
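The trend toward the BABIP itself as strikeouts and home runs shrink can be seen by running the same 540 AB through a few profiles. This is only a sketch: the profile labels are invented for illustration, and every hitter is assumed to have the same 0.300 BABIP, which waves away any influence the hitter has on it, just as the comment does.

```python
def average_at_babip(ab, so, hr, babip):
    """Batting average when home runs are sure hits, strikeouts are sure
    outs, and everything else falls in for a hit at the given BABIP."""
    balls_in_play = ab - so - hr
    return (hr + babip * balls_in_play) / ab


AB, BABIP = 540, 0.300

# (strikeouts, home runs) for each illustrative profile
profiles = {
    "TTO slugger (160 K, 40 HR)": (160, 40),
    "contact slugger (70 K, 35 HR)": (70, 35),
    "no K, no HR": (0, 0),
}

for label, (so, hr) in profiles.items():
    print(f"{label:32s} {average_at_babip(AB, so, hr, BABIP):.3f}")
```

Writing the average directly as home runs plus balls-in-play hits over at bats avoids dividing by a zero-sized TTO bucket in the limiting case.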