Lies, Damned Lies: Randomness: Catch the Fever!
5/14 Your favorite player hit .360 last season. If you know nothing else, what can you expect him to hit this season? This isn't meant to be a trick question; let's assume the guy had at least 500 at bats in the previous season. Gates Brown and Shane Spencer need not apply. What's your best guess? .350? .340? Not likely. The evidence is overwhelming. Let's look at all hitters since WWII who hit .350 or better in at least 500 at bats; the only other requirement is that they had at least 250 at bats in the year following.
Lies, Damned Lies: Binomial Distribution (or What the Heck is Up with Miguel Tejada and Alex Gonzalez?)
5/07 The baseball season has reached its adolescence. Oh sure, there are still the occasional temper tantrums, the delusions of grandeur, the fashion faux pas. But the season has been around for long enough that we can't totally dismiss it, even when it mouths off without reason or, convinced of its own invincibility, pushes its limits a bit too far. The PECOTA system wasn't originally designed to update its forecasts in real time, but through some creative mathematics we can adapt it to that purpose. In particular, we can evaluate its projections by means of something called a binomial distribution (geek alert: if you're uninterested in the math here, the proper sequence of keystrokes is Alt+E+F+"Blalock"). The binomial distribution is a way to test the probability that a particular outcome will occur a particular number of times in a given number of trials when we know the underlying probability of the event. For example, the probability of a "true" .300 hitter getting six or more hits in a sequence of 15 at bats is around 27.8 percent. (The binomial distribution's cousin, the Poisson distribution, has a cooler name but is less mathematically robust.) A couple of important objections are going to be raised here. First, the binomial distribution is designed to test outcomes in cases in which there are mutually exclusive definitions of success and failure--for example, "hit" and "out," or "Emmy Nomination" and "WB Network." The measures of offensive performance that we tend to favor don't readily meet that criterion. Second, the binomial distribution assumes that we know the intrinsic probability of an event occurring, as we would with a dice roll or coin flip. But we never really know what a baseball player's underlying ability is--we're left to make a best guess based on his results, presumably coming closer to the mark as the sample size increases.
The first problem has an intriguing, if mathematically sketchy solution in the form of Equivalent Average, which is scaled to take on roughly the same distribution as batting average, even though it accounts for all major components of offensive performance. So, we could test the probability of a "true" .300 EqA hitter putting up an EqA of .400 in 15 plate appearances by assuming that this is equivalent to six successes (40%) in 15 trials. Since I haven't heard any objections, let's roll with it.
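For anyone who wants to check the arithmetic, the 27.8 percent figure can be reproduced in a few lines of Python. This is only a sketch under the assumptions stated above (independent trials, a known "true" rate); the function name `binomial_tail` is made up for illustration and is not the code behind this column. The EqA case borrows the same shortcut of treating a .400 EqA over 15 plate appearances as six successes in 15 trials.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Probability of at least k successes in n independent trials,
    each succeeding with probability p (hypothetical helper)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# A "true" .300 hitter getting six or more hits in 15 at bats.
prob = binomial_tail(15, 6, 0.300)
print(f"{prob:.3f}")  # prints 0.278
```

Swapping in a different true rate or sample size only means changing the three arguments; the same call covers the .300 EqA hitter posting a .400 EqA in 15 plate appearances, subject to the mathematically sketchy equivalence already noted.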
Lies, Damned Lies: Ticket Prices vs. Player Salaries
4/30 You've been hanging 'round these parts long enough. You've heard the party line, once or twice or 20 times: Higher payrolls don't result in higher ticket prices. Correlation is not causation. Salaries don't shift demand curves. It's Economics 101. Simple, textbook stuff. The problem with this line of argument--the problem with a lot of economically-based arguments--is that it's easy to let the theory get ahead of the data. Well, I should state that more precisely: It's easy to let an oversimplified theory get ahead of the data. A lot of what you learn in Economics 102, and Economics 201, and graduate-level classes that I was too busy drinking Boone's Farm to take advantage of, is that much of the theory you master in an intro-level class is based on a particular set of assumptions that can prove to be quite robust in certain cases, and utterly misleading in others. A lot of people shun economics for this very reason--we've all had coffee shop conversations with the scruffy, Skynyrd-mangling philosophy major who is fond of spewing out faux-profundities about the irrationality of human nature. He's missing the point, of course, but so too is the Ayn Rand-spouting prepster from down the residence hall who conflates assumptions with hard rules. In either case, a little bit of knowledge is a dangerous thing. Economics, though it sometimes harbors pretensions to the contrary, is above all else a behavioral science, and an empirical science. If the theory doesn't match the data--well, it's not the data's fault. This is especially important to keep in mind when evaluating something like ticket prices to baseball games, a commodity that is unusual in many ways. As we've stressed frequently, ticket prices ought to be almost wholly determined by demand-side behavior--the marginal cost of allowing another butt in the seats is negligible.
But baseball tickets are unusual in other ways, too: They're very much a luxury good, and their prices are determined by a finite number of decision-makers who may be subject to conflicts of interest. It's certainly worth evaluating the available data to see whether we can put our money where our mouth is.
Lies, Damned Lies: Estimating Pitch Counts
4/23 Silicone. Margarine. O'Doul's. Why fool around with watered-down imitations when you've got the real thing ready and available? Rightly or wrongly, a lot of attention has been focused on pitch counts in the past several years. That's partly because of the efforts of people like Rob Neyer, Keith Woolner, and Will Carroll, not to mention those coaches, executives and agents who understand the importance of protecting their golden-armed investments. Pitch counts have become easy to take for granted because pitch count data is more readily available now than it ever was in the past. These days, just about any self-respecting box score lists pitch counts alongside the rest of a pitcher's line, a far cry from the dirty newsprint days of yore, when pitch count references were about as common as mentions of Reality TV or the Information Superhighway. But what about when you don't have pitch count information available? Like, say, you're at a ballgame, and wondering whether Dusty Baker should send Kerry Wood out for another inning? Or you're perusing minor league stats? Or you're looking at old boxes on Retrosheet, which, wonderful as they might be (this, folks, was the first game I ever attended), don't contain any information on pitch counts? Well, it turns out that it's not that difficult to make a reasonable guess at pitch counts based on other information that's much easier to come by. Looking at a complete set of data from the 2001 and 2002 seasons as provided by Keith Woolner, I ran a simple linear regression of pitches thrown against various other characteristics of a pitcher's stat line. Here was the formula that I came up with:
Lies, Damned Lies: Strikeout Rate, Redefined
4/16 In last week's 6-4-3, Gary Huckabay wrote about the fact that our perceptions are more often colored by the way information is presented than by the substance of the information itself. There are plenty of examples of this, drawn both from the ballpark and the world at large. Get your hands on most any media guide, and you're sure to see the familiar rotisserie categories--batting average, home runs, RBI--presented prominently in bold face. Now, a typical media guide runs about 400 pages, and there's plenty of information to go around, ranging from the trivial (Mike Lincoln's career ERA at Busch Stadium is 14.29) to the frivolous (Joe Borowski's wife is named Tatum). Thus, should it really be that much of a surprise to find out that in the thick of that pulp forest, the people who rely on media guides to grab information on the fly--like beat writers pushing on a deadline and radio announcers trying to keep a cadence--gravitate toward those bits of knowledge that are literally staring them boldly in the face?