Articles by Jonathan Judge
March 9, 2017 6:00 am
What if you could have a metric that accurately describes what a pitcher did while also reliably forecasting the skills that pitcher would bring to the future?
Two years ago, I wrote the first DRA essay, focusing on the challenge of modeling descriptive versus predictive player performance. At the time, my prognosis for threading that needle was rather grim:
What is it, exactly, that you want to know? For example: (1) Do you care primarily about a pitcher’s past performance? (2) Are you more worried about how many runs the pitcher will allow going forward? (3) Or do you want to know how truly talented the pitcher is, divorced from his results this year or next? The reader’s likely response is: “I’d like one metric that excels at all three!” Sadly, when it comes to composite pitcher metrics, this might not be possible.
January 27, 2017 6:00 am
Where do we go from here?
It has become fashionable to bemoan the absence of novel, raw baseball data on which the next generation of would-be analysts can hone their skills. In the case of Statcast, that certainly describes both the status quo and the foreseeable future, as far as public analysis is concerned. However, Statcast isn’t the only potential source of fresh baseball data. This week, we’d like to think we have made at least a small contribution along these lines: by reviewing our newly released data bearing upon pitcher command, control, pitch tunnels, and pitch sequencing, both novice and seasoned analysts can unleash their creativity and hopefully teach the baseball community a thing or three.
That said, this data presents some distinctive challenges that might be overlooked in the rush to “see what Excel can do” or apply the trendiest machine-learning technique. So, while I encourage readers to do whatever they want with our new data, I will also start you off with a few words of advice.
Inference Versus Prediction
First, effectively using tunnels data will almost certainly require you to appreciate the distinction statisticians make between “inference” and “prediction.” By “inference,” statisticians describe the process of isolating predictors that tend to be associated with certain outcomes. This usually occurs by isolating certain coefficients in a regression or classification problem, and exploring whether they are consistently meaningful. Examples of inference would be comparing a new drug to a placebo in preventing disease, or in the baseball context, looking at the effect of ballparks on run scoring. In both cases the outcome is important, but it is not the focus of the investigation. “Prediction,” on the other hand, is not particularly concerned with the precise contribution each input makes to an outcome. Rather, prediction seeks to forecast the outcome as correctly as possible as often as possible.
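The distinction can be made concrete with a small sketch. The example below is only an illustration under stated assumptions: the data is synthetic, the "park factor" predictor and the numbers behind it are made up, and the helper names `fit_line` and `rmse` are hypothetical. It fits one simple regression and then reads it two ways: the inference question looks at the estimated coefficient, while the prediction question looks only at how well held-out outcomes are forecast.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(intercept, slope, xs, ys):
    """Root mean squared error of the fitted line on (xs, ys)."""
    squared = [(intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


# Synthetic data (an assumption for illustration, not real baseball data):
# x is a made-up park factor, y is a made-up runs-per-game figure.
rng = random.Random(0)
TRUE_SLOPE = 2.0
xs = [rng.uniform(0.5, 1.5) for _ in range(400)]
ys = [2.5 + TRUE_SLOPE * x + rng.gauss(0.0, 0.5) for x in xs]

# Hold out the last 100 observations to judge forecasts.
train_x, test_x = xs[:300], xs[300:]
train_y, test_y = ys[:300], ys[300:]
intercept, slope = fit_line(train_x, train_y)

# Inference: is the coefficient on the predictor meaningful, and how large?
print(f"estimated effect of park factor (slope): {slope:.2f}")

# Prediction: how close are the forecasts on data the model never saw?
test_rmse = rmse(intercept, slope, test_x, test_y)
print(f"held-out RMSE: {test_rmse:.2f}")
```

In this toy case the same model answers both questions. With a black-box learner, the second number may improve while the first question becomes hard or impossible to answer.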
Many baseball models tend to focus on prediction, deriving an “expected” rate of some event or another, such as a batter’s home run rate or a pitcher’s strikeout rate. Prediction is right in the wheelhouse of your most advanced machine-learning algorithms, which tend to build the shiniest, blackest box imaginable in exchange for terrific results. You often don’t really know how the algorithm got there; all you know is that it did a great job, whatever the hell it did.[1]
January 26, 2017 6:00 am
Kyle Hendricks might be a lot closer to Greg Maddux than he thinks.
One of the challenges of bringing BP's new pitching data to light is figuring out whether it’s useful and how we can leverage it to better understand what is happening on the field. As mentioned previously, we look at this in much the same way we look at pitch movement or velocity; we need to figure out how these tunnels data points interact with other components of a player’s performance to unlock a deeper understanding of what is happening.
Cubs right-hander Kyle Hendricks is a perfect subject to start with. As we mentioned in "Two Ways to Tunnel," Hendricks has some of the smallest pitch tunnels in all of baseball. Hendricks is often compared to Greg Maddux (including by us!), and we can see how he is in fact like Maddux in certain respects. It gives us an idea of how he’s successful, but only an abstract one. That is, we rationalize Hendricks’ success because we’ve seen Maddux do it before, but we don’t really know how all of the moving pieces come together. In order to better understand how Hendricks is successful, we’ll have to dig into some of our new data to see what that can tell us about how he pitches.
Hendricks has steadily learned how to strike out opposing batters, increasing his K% by 55 percent from 2014 to 2015 and 2016, and the effect that has had on his game is clear. In fact, Hendricks’ newfound ability to strike batters out has made him one of the best pitchers in baseball: he has posted a sub-3.50 DRA in each of the past two seasons despite getting dinged for pitching (and winning an ERA title) in front of an elite defense.
January 25, 2017 6:00 am
Tunneling from Greg Maddux and Barry Zito to Kyle Hendricks and Rich Hill, and everything in between.
The new pitch tunnels data released by Baseball Prospectus gives us a new glimpse into the repertoires of pitchers across the major leagues. Of course, this data is only as useful as the analysis it helps produce. To showcase how pitch tunnels data can help us better understand the success, or lack thereof, of certain pitchers, we’ll need to better understand how pitch tunnels manifest themselves in the real world.
The title of this article, “Two Ways to Tunnel,” already signals that there isn’t a one-size-fits-all approach to this new data. While game theory might suggest that each individual pitcher has an optimal approach (or approaches), there can be dramatic differences in how different pitchers attack major-league hitters. As such, we should look at this tunnels data much like we would PITCHf/x data. It’s descriptive, and there are many ways to interpret and utilize the data.
We’ll use modern pitchers to explain these concepts with requisite data, but first it’s worth revisiting a historical example. Jeff Long's very first post for BP over two years ago included the following quote about Greg Maddux, the patron saint of tunneling (yes, we know the majority of this quote is included in the introductory post about pitch tunnels, but it’s so good that it merits inclusion once again):
January 24, 2017 6:00 am
Greg Maddux was on to something, whether he knew it or not.
One day I sat a dozen feet behind Maddux’s catcher as three Braves pitchers, all in a row, did their throwing sessions side-by-side. Lefty Steve Avery made his catcher’s glove explode with noise from his 95-mph fastball. His curve looked like it broke a foot-and-a-half. He was terrifying. Yet I could barely tell the difference between Greg’s pitches. Was that a slider, a changeup, a two-seam or four-seam fastball? Maddux certainly looked better than most college pitchers, but not much. Nothing was scary. Afterward, I asked him how it went, how he felt, everything except “Is your arm okay?” He picked up the tone. With a cocked grin, like a Mad Dog whose table scrap doesn’t taste quite right, he said, “That’s all I got.” Then he explained that I couldn’t tell his pitches apart because his goal was late quick break, not big impressive break. The bigger the break, the sooner the ball must start to swerve and the more milliseconds the hitter has to react; the later the break, the less reaction time. Deny the batter as much information, speed or type of last-instant deviation, until it is almost too late.
"Greg Maddux used methodical approach to get to Cooperstown" by Thomas Boswell
Greg Maddux may have known about the concept of pitch tunnels. He may not have. Regardless, he knew how to put the concept into practice, and really that’s the important part. Maddux:
January 23, 2017 6:00 am
Introducing new tools to evaluate command and control through the lens of strikes.
About a year and a half ago, Baseball Prospectus revealed a suite of catching stats that formed the basis for our industry-leading valuation of catchers. These new stats would shape how we perceived and discussed catcher value, but they also opened the door to better understanding the performance of pitchers. Two key statistics, CSAA and CS Prob, serve as the basis for the pitch framing portion of our catching metrics. Today, we’ll show how those same statistics can tell us a great deal about pitching as well.
CS Prob was initially introduced in 2014 with Harry Pavlidis and Dan Brooks’ first catcher framing model. Early the next year, Jonathan Judge joined the effort and the team introduced CSAA, officially moving our framing models beyond WOWY. Of the two, CS Prob (short for Called Strike Probability) is the more straightforward: the likelihood of a given pitch being a strike. CS Prob goes beyond what the strike zone ought to be and instead reflects what it is: a set of probabilities that depends on batter and pitcher handedness, pitch location, pitch type, and count. Good pitchers understand that while the strike zone is a dynamic construct, it nonetheless has some consistencies depending on which combinations of these factors are present. We calculate CS Prob for every pitch regardless of the eventual outcome.
The other statistic, CSAA, stands for Called Strikes Above Average: a measure of how many called strikes the player in question creates for his team. In the case of catchers, we isolate the effects of the pitcher, umpire, and other situational factors, which allows us to identify how many additional called strikes the catcher is generating, above or below average. For catchers, this skill is commonly described as “framing” or, in more polite company, “presentation.” For pitchers, we can apply a similar methodology (controlling for the catcher, umpire, etc.) to identify the additional called strikes created by the pitcher. CSAA is calculated only on taken pitches, an important nuance. A pitch must be taken in order to be eligible to be called a strike by the umpire, so while CS Prob looks at all pitches, CSAA only takes into account pitches where the outcome is left up to the umpire.
What can these two statistics tell us about pitcher performance and skill? First, we should define a few important things:
January 9, 2017 6:00 am
Have we been underrating the value catchers add via blocking skills?
About this same time last year, I was in the midst of a trial in West Virginia when I got to thinking about wild pitches, as one does. In doing so, I realized that modeling passed balls and wild pitches as simple binomials, as we had been doing, did not fit the data as well as it should. To address the problem (or so I thought), I tweaked the parameters, recognized that a Poisson distribution seemed to be a better fit, and remodeled them accordingly.
However, in reviewing those revised numbers after this season, Harry Pavlidis and I came to the same conclusion: our predicted numbers were still not quite right. Specifically, they were too low. In raw numbers, catchers tend to be worth up to plus or minus five runs a season when it comes to blocking, but our models were giving them credit for only about one or two runs above or below average.
Why were our models still underestimating the value of pitch blocking? The answer is that wild pitches follow an even more complex distribution than I had thought. Specifically, what I had decided to be a simple Poisson distribution was in fact a mixture distribution. Mixture distributions, in turn, require a more sophisticated approach.
To understand mixture distributions, we need to start with non-mixture distributions and work our way up. The most famous probability distribution, typically described as the normal distribution, or “bell curve,” looks like this:
November 14, 2016 6:00 am
Will Cy Young voters again be fooled by the Cubs' defense?
We’ve reached awards season, with the Cy Young (designated for the best pitcher in each league) due to be awarded this coming week. In the National League, the named finalists are two Cubs (Kyle Hendricks, Jon Lester) and one National (Max Scherzer). Here is how they compare on various measures of pitcher quality:
July 22, 2016 3:30 pm
More inner workings of DRA 2016.
A few weeks ago, BP author Rob Mains inquired about what he saw as a possible bias in Deserved Run Average (DRA) values in favor of flyball pitchers, and against groundball pitchers. Specifically, he observed that groundball pitchers were doing worse in DRA, on average, than they were in Runs Allowed per 9 innings (RA9).
July 22, 2016 6:00 am
With DRA, solving BABIP, and other reasons to be excited about what we're measuring.
As many of you know, we updated the formulation of Deserved Run Average (DRA) once again for the 2016 baseball season. We gave you the overview of the changes here, discussed the innards here, and talked about the new run-scaling mechanism here. This last article deals with arguably the most important question of all: What, exactly, is DRA trying to tell you? And what does it mean?
Last year, DRA was focused on being a “better” RA9. After running one overall mixed model to create a value per plate appearance for each pitcher, we ran a second regression, using multivariate adaptive regression splines (MARS), to model the last three years of relationships between all pitcher value rates and park-adjusted pitcher linear weights allowed. The predictions from this second regression took each season’s mixed model results, forced them back into a runs-allowed framework, and then converted PAs to IPs to get DRA.
This approach did succeed in putting DRA onto an RA9 scale, but in some ways it was less than ideal. First, having moved one step forward with a mixed model, we arguably were taking a half step back by reintroducing the noisy statistics (raw linear weights and, effectively, RA9) that we were trying to get away from in the first place. The results were generally fine: Good pitchers did well, bad pitchers did poorly, and there were defensible reasons why DRA favored certain pitchers over others when it disagreed with other metrics. But the fact that something works reasonably well is not, by itself, sufficient to continue doing it. Second, this approach forced us to make DRA an entirely descriptive metric with limited predictive value, since its yardstick metric, RA9, is itself a descriptive metric with limited predictive value.
This did allow DRA to “explain” about 70 percent of same-season run scoring (in an r-squared sense), which was significantly more than FIP and other metrics, but also required that we refer readers instead to cFIP to measure pitcher skill and anticipated future production.
May 23, 2016 6:00 am
DRA in depth: Finding a run-expectancy curve that would eliminate the negative DRA. This is the second in a series of articles explaining in depth the updated formulation of Deserved Run Average. The overview can be found here, and Part I of the in-depth discussion of the revised approach can be found here.
Call me Jonathan. For most of this offseason, my (entirely metaphorical) White Whale was baseball’s run expectancy curve: the distribution, if you will, between the minimum and the maximum number of runs yielded by pitchers per nine innings of baseball. Why would something so seemingly arcane be so very important to me?
Let’s start with some background on run expectancy. In 2015, for pitchers with at least 40 innings pitched, ERAs ranged from 0.94 (Wade Davis) to 7.97 (Chris Capuano). In more prosperous times, such as the 2000 season, pitcher ERAs at the same threshold ranged from 1.50 (Robb Nen) to 10.64 (Roy Halladay). For something more in the middle, we can turn to 1985, when a starter (!), Dwight Gooden, had the lowest ERA at 1.53, and Jeff Russell topped things off at 7.55. Here’s what those seasons look like on a weighted density plot, side by side:
May 17, 2016 6:00 am
What you need to know before your sweeping take about a player's exit velocity.
Note: Baseball Prospectus has removed the leaderboards mentioned in this article. Thank you for your interest in our work and for your patience as we attempt to resolve this issue.
Last year, the folks at MLB Advanced Media started publishing what is commonly described as “exit velocity”: the pace at which the baseball is traveling off the bat of the hitter, as measured by the new Statcast system. As a statistic, exit velocity is attractive for several reasons. For one thing, it is new and fresh, and that’s always exciting. It also makes analysts feel like they are traveling inside the hitting process, and getting a more fundamental look at a hitter or pitcher’s ability to control the results of balls in play.
However, we’ve seen many people take the raw average of a player’s exit velocities and assume it to be a meaningful indication, in and of itself, of pitcher or batter productivity. This is not entirely wrong: Raw exit velocity can correlate reasonably well with a batter’s performance. But this use of raw averages also creates some problems.
First, if you use exit velocity as a proxy for player ability, then you must also accept that one player’s exit velocity is a function of his opponents, be they batters or pitchers. Put more bluntly, a player’s average exit velocity is biased by the schedule of the player’s team.
Second, and much more importantly, we have concluded that Statcast exit velocity readings, as currently published, are themselves biased by the ballpark in which the event occurs. This goes beyond mere differences in temperature and park scoring tendencies. In fact, it appears that the same player generating the same hit will have its velocity rated differently from stadium to stadium, even if you control for other confounding factors.
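
To illustrate why a raw average can mislead when readings carry a park-specific bias, here is a minimal sketch on synthetic data. It is not the method used in the article, which is not described in this excerpt. It assumes a simple additive model (reading = player effect + park effect + noise) fitted by alternating averages, and every name and number in it (`fit_player_and_park_effects`, the players, the parks, the bias sizes) is hypothetical.

```python
import random
from collections import defaultdict


def raw_averages(events):
    """Plain per-player mean of exit velocity, ignoring where it was measured."""
    sums, counts = defaultdict(float), defaultdict(int)
    for player, _park, ev in events:
        sums[player] += ev
        counts[player] += 1
    return {p: sums[p] / counts[p] for p in sums}


def fit_player_and_park_effects(events, n_iter=25):
    """Backfitting for the assumed model: ev = player + park + noise."""
    player_eff, park_eff = defaultdict(float), defaultdict(float)
    for _ in range(n_iter):
        # Update player effects after removing the current park effects.
        sums, counts = defaultdict(float), defaultdict(int)
        for player, park, ev in events:
            sums[player] += ev - park_eff[park]
            counts[player] += 1
        for player in sums:
            player_eff[player] = sums[player] / counts[player]
        # Update park effects after removing the current player effects.
        sums, counts = defaultdict(float), defaultdict(int)
        for player, park, ev in events:
            sums[park] += ev - player_eff[player]
            counts[park] += 1
        for park in sums:
            park_eff[park] = sums[park] / counts[park]
        # Center park effects at zero so the two sets of effects are identifiable.
        center = sum(park_eff.values()) / len(park_eff)
        for park in park_eff:
            park_eff[park] -= center
    return dict(player_eff), dict(park_eff)


# Synthetic setup (all values are assumptions): A and B have identical
# true talent but call differently biased parks home.
rng = random.Random(7)
talent = {"A": 90.0, "B": 90.0, "C": 92.0, "D": 88.0}
home = {"A": "P1", "B": "P2", "C": "P3", "D": "P4"}
park_bias = {"P1": 2.0, "P2": -2.0, "P3": 0.0, "P4": 0.0}

events = []
for player, skill in talent.items():
    away = [p for p in park_bias if p != home[player]]
    for k in range(4000):
        # Half of each player's batted balls come at home, the rest rotate.
        park = home[player] if k % 2 == 0 else away[(k // 2) % len(away)]
        events.append((player, park, skill + park_bias[park] + rng.gauss(0.0, 4.0)))

raw = raw_averages(events)
adjusted, parks = fit_player_and_park_effects(events)
print(f"raw gap A-B:      {raw['A'] - raw['B']:.2f} mph")
print(f"adjusted gap A-B: {adjusted['A'] - adjusted['B']:.2f} mph")
print(f"estimated P1 bias: {parks['P1']:.2f} mph")
```

Because each player sees half his batted balls in one park, the raw gap between two equally talented players mostly reflects where they play. A real adjustment would also need to account for opposing pitchers, weather, and uneven schedules, which this sketch leaves out.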
