
If you’ve gotten this far, you’re interested not only in what DRA purports to do this year, but also in how it works. Here, we’ll get into some of the details, although I’ll continue to avoid math and speak about the issues conceptually instead.

* * * * * *

Assumptions Made, and Evaluated

In 2015, Deserved Run Average (“DRA”) was based on a single, linear mixed model that looked at various predictors of baseball batting events, using their respective linear weights as the output. This one model considered everything at once: routine outs, hit batsmen, home runs, intentional walks, and the rest. The benefits of this approach included: (1) the model was fast, (2) the model provided reasonable results, and (3) for those inclined to look under the hood, the general strategy was clear.

But this approach had some potential downsides too. First, by modeling every type of batting event at once, the 2015 model had to average the effect of relevant predictors (like temperature, catcher framing, and stadium) simultaneously over every type of batting event, even though these predictors are not equally relevant to all of them. For example, temperature is not material to strikeouts, and stadium has little to do with walks. This isn’t a huge deal, because the model averaged the overall effect of those predictors, with each batting event contributing its share. But over the course of the offseason, we found ourselves wondering if a more granular analysis might be beneficial.

Second, as a linear mixed model, the previous DRA model assumed a continuous and normal distribution of batting events. This was a reasonable but also complicated assumption.

Speaking generally, the values we were using as outputs — linear weights for different batting events — are categories of human creation, not natural numbers. Ordinarily this would result in a categorical a/k/a “multinomial” distribution that is not suited for normal distribution treatment. Fortunately, while baseball linear weights represent categories, those categories have a definite hierarchy of value: an out is about -.25 runs, a home run is worth about 1.4 runs, and so on. As such, we end up with an ordinal data set. Ordinal data sets are those containing a natural rank between categories, and once they reach a certain number of levels, it is appropriate to just treat them as continuous.

But this underlying continuity did not get us completely out of the woods. Linear weights may be functionally continuous, but they are not at all normally distributed. Most batting events do not land in the middle of the value spectrum, where walks, singles, and hit batsmen sit. Rather, the vast majority of baseball events are outs, which sit at the low end of the scale, while the rarer, more valuable events stretch out to the right. The distribution of linear weights is therefore substantially and positively skewed. Is this a problem? Technically, yes. On the other hand, violating the assumption of normality is pretty routine in statistical analysis, and regression tends to be fairly robust to violations of that assumption.

So, while last year’s approach was defensible, it certainly would be nice to find a more natural statistical framework.

Our New Binomial World

This offseason, I spent way too many hours trying several different approaches to see if any alternative methods might improve DRA. Many were plausible, but all managed to create new logistical or substantive challenges. And then I went to the SABR Analytics seminar this past March, where I got to meet Jim Albert, of Bowling Green State University, and also got to catch up with Scott Powers of Stanford. Jim (whose work is a must-read for baseball quants) gave a presentation on the use of binomials to better approximate traditional statistics like batting average. Scott and a colleague gave a nifty presentation comparing the performance of shrinkage methods (like our mixed model approach), regression to the mean, and a ridge function.

As I headed home, the idea of using binomials took hold. A multinomial, after all, is mathematically a series of binomial comparisons. Statistics like weighted on base average (wOBA) take advantage of the binomial in trying to solve their multinomial formulas. I decided we could take the same approach in setting up component models. Moving to binomials had the potential to solve many issues at once: a binomial makes for a simple distribution, and modeling components separately allows us to include only the predictors actually relevant to each batting event.
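To make the idea concrete, here is a minimal sketch of one way to recast a single multi-category outcome as a set of yes/no questions, one per event. The event labels, the sample data, and the helper name one_vs_rest are made up for illustration; this is not DRA's actual code or data.

```python
def one_vs_rest(outcomes):
    """Return {event: [0/1, ...]}: one binary target per distinct event."""
    events = sorted(set(outcomes))
    return {ev: [1 if o == ev else 0 for o in outcomes] for ev in events}

# Hypothetical plate-appearance outcomes (illustrative labels only).
sample = ["out", "out", "HR", "BB", "out", "single", "out", "SO"]
targets = one_vs_rest(sample)

# Each event now has its own binary target that a separate binomial
# model could be fit to, with its own set of predictors.
for event, y in targets.items():
    print(event, y)

# Sanity check: every plate appearance is a "1" in exactly one target.
assert all(sum(column) == 1 for column in zip(*targets.values()))
```

Because each event has its own target, nothing forces the home run question and the strikeout question to share the same predictors.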

So that’s what we’ve done. Where we once had one linear mixed model, we now have 24. Five of them are for hits (home runs, triples, doubles, infield singles, and outfield singles). Four are for not-in-play events (unintentional walks, intentional walks, hit-batsmen, and strikeouts). For single-out plays, we separately modeled putouts at each position, which makes for 9 more models. And finally, we modeled double-plays that began with 1 of the 6 infielders, which adds the last 6.
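As a quick bookkeeping check on that count, the sketch below enumerates the component models using the labels from the table further down. The list and variable names are just one convenient way to organize them, not a description of how the models are actually stored.

```python
# Model labels follow the table in this article; the grouping is illustrative.
HITS = ["HR", "Triple", "Double", "Single IF", "Single OF"]
NOT_IN_PLAY = ["UIBB", "IBB", "HBP", "SO"]
POSITIONS = ["Pitcher", "Catcher", "First", "Second", "Third", "Short",
             "LF", "CF", "RF"]
INFIELD = POSITIONS[:6]  # pitcher, catcher, and the four infield spots

PUTOUTS = [f"{pos}_PO" for pos in POSITIONS]      # one putout model per position
DOUBLE_PLAYS = [f"{pos}_DP" for pos in INFIELD]   # one DP model per infielder

ALL_MODELS = HITS + NOT_IN_PLAY + PUTOUTS + DOUBLE_PLAYS
print(len(ALL_MODELS))  # 5 + 4 + 9 + 6 = 24
```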

But how do you find the best predictors for each model? Last year we used the Akaike Information Criterion (AIC) and likelihood-ratio tests to find the best combinations. These are time-honored methods for evaluating mixed models, but they also test only in-sample data and make their own assumptions about how the data might be organized.

This year we decided to step up our game in two ways. First, while we still kept an eye on AIC and likelihood-ratio tests, we also moved to using 10-fold cross-validation, employing a random sample of half of the 2015 season as our testing ground. Second, to score those out-of-sample predictions, we moved to using the area under the Receiver Operating Characteristic (ROC) curve, commonly described in machine learning as the Area Under the Curve or AUC. Pioneered by World War II radar operators, and then extended to statistics and other fields, the ROC curve plots a model’s rate of true positives against its rate of false positives, and AUC summarizes that tradeoff in a single number. A worthless score is .5, indicating that the outcomes of your model are essentially a coin toss. A perfect score is 1. Given the amount of random variation in baseball, our goal was to reach at least .6 for each modeled event, and some models exceeded that threshold by a substantial amount.
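For readers who want to see what AUC actually measures, here is a small, self-contained sketch. It uses the rank interpretation of AUC (the chance that a randomly chosen "yes" case gets a higher predicted score than a randomly chosen "no" case) and a simple way to deal rows into 10 folds. The function names and toy numbers are assumptions for illustration, not the code or data behind DRA.

```python
import random

def auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle row indices and deal them into k nearly equal folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Toy example: 1 = the event happened, 0 = it did not.
y = [0, 0, 1, 1]
print(auc(y, [0.1, 0.2, 0.8, 0.9]))  # 1.0: every positive ranked above every negative
print(auc(y, [0.5, 0.5, 0.5, 0.5]))  # 0.5: the model cannot tell them apart

# In cross-validation, each fold takes a turn as the held-out test set.
folds = k_fold_indices(25, k=10)
print([len(f) for f in folds])
```

In practice each of the 10 folds would be held out in turn, a model fit on the other nine, and the AUC computed on the held-out predictions.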

A listing of the models we are using, along with the current parameters and AUC score, is provided in the table below. Parameters with asterisks are random effects.

| Model | Parameters | 2015 AUC | Data |
|---|---|---|---|
| HR | role, framing, bats, throws, temperature, pitcher*, stadium*, pitcher-hitting*, batter*, catcher* | 0.65 | All |
| Triple | IF-fld, batter*, stadium-bats*, Pos_3*, Pos_4*, Pos_7*, Pos_8*, pitcher* | 0.83 | All |
| Double | pitcher*, batter*, Pos_4*, Pos_5*, Pos_7*, Pos_8*, stadium-bats*, IF-fld, inning_10 | 0.81 | All |
| Single IF | bats, throws, pitcher*, batter*, Pos_3*, Pos_4*, Pos_5*, Pos_6* | 0.83 | All |
| Single OF | bats, throws, pitcher*, batter*, Pos_5*, Pos_7*, Pos_8*, Pos_9* | 0.88 | All |
| UIBB | batter*, pitcher*, Pos_2*, pitcher-hitting*, base-outs*, bats, throws, framing, TTO, home_team | 0.59 | All |
| IBB | bats, throws, role, inning_10, score_diff, pitcher*, open_1B_outs*, batter*, Pos_2*, fld_team* | 0.9 | All |
| HBP | bats, throws, batter*, pitcher*, base-outs*, fld_team* | 0.69 | All |
| SO | batter*, pitcher*, Pos_2*, pitcher-hitting*, umpire*, base-outs*, bats, throws, framing, TTO, home_team | 0.61 | All |
| Pitcher_PO | pitcher*, Pos_3*, batter*, assist*, throws, bats | 0.9 | no DP |
| Catcher_PO | pitcher*, Pos_2*, batter*, assist*, base-outs* | 0.61 | no DP |
| First_PO | pitcher*, Pos_3*, Pos_5*, batter*, assist*, base-outs*, base1_run_id*, pitcher-hitting*, bunt, throws, bats | 0.92 | no DP |
| Second_PO | Pos_4*, base1_run_id*, batter*, base-outs*, assist*, throws, bats | 0.76 | no DP |
| Third_PO | Pos_5*, base1_run_id*, batter*, base-outs*, assist*, throws, bats | 0.67 | no DP |
| Short_PO | pitcher*, Pos_6*, batter*, base-outs*, assist*, throws, bats | 0.73 | no DP |
| LF_PO | pitcher*, Pos_7*, batter*, stadium*, temperature, throws, bats, IF-fld | 0.82 | no DP |
| CF_PO | pitcher*, Pos_8*, stadium*, batter*, temperature, throws, bats, IF-fld | 0.78 | no DP |
| RF_PO | pitcher*, Pos_9*, batter*, Pos_8*, stadium*, throws, bats, temperature, IF-fld | 0.82 | no DP |
| Pitcher_DP | pitcher*, batter*, base1_run_id* | 0.69 | All |
| Catcher_DP | pitcher*, Pos_2*, Pos_3*, Pos_4*, base1_run_id*, throws, IF-fld | 0.71 | All |
| First_DP | pitcher*, batter*, Pos_3*, Pos_4*, Pos_6*, base1_run_id*, bats, throws | 0.74 | All |
| Second_DP | pitcher*, batter*, Pos_4*, Pos_6*, base1_run_id*, base2_run_id*, bats | 0.72 | All |
| Third_DP | pitcher*, batter*, Pos_3*, Pos_5*, stadium*, base1_run_id*, base2_run_id*, bats | 0.69 | All |
| Short_DP | pitcher*, batter*, Pos_3*, Pos_4*, Pos_6*, stadium*, base1_run_id*, bats, throws | 0.75 | All |
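For those inclined to look under the hood, the sketch below shows one way a row of the table above could be turned into a mixed-model formula, with starred parameters written as random intercepts in the "(1 | group)" notation used by R's lme4 package. Two things here are assumptions on my part for illustration: that the random effects are simple random intercepts, and the helper name to_formula, which is hypothetical. The table itself does not say which software or formula syntax was used.

```python
def to_formula(response, params):
    """Build an lme4-style formula string from a comma-separated parameter list.

    Parameters ending in '*' are treated as random intercepts (an assumption);
    everything else is treated as a fixed effect.
    """
    fixed, rand = [], []
    for raw in params.split(","):
        name = raw.strip()
        if not name:
            continue
        is_random = name.endswith("*")
        # A hyphen would read as subtraction inside a formula, so use underscores.
        clean = name.rstrip("*").replace("-", "_")
        (rand if is_random else fixed).append(clean)
    terms = fixed + [f"(1 | {group})" for group in rand]
    return f"{response} ~ " + " + ".join(terms)

# The HBP row from the table above.
hbp_params = "bats, throws, batter*, pitcher*, base-outs*, fld_team*"
print(to_formula("HBP", hbp_params))
# HBP ~ bats + throws + (1 | batter) + (1 | pitcher) + (1 | base_outs) + (1 | fld_team)
```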

Several aspects of these models are interesting, although we won’t comment on all of them now. But I’ll make a few general observations:

(1) Pitcher is relevant to nearly all events, although not always to a great extent.

(2) Certain fielders appear to have fairly limited relevance to the outcomes of balls put in play (looking at you, right fielders).

(3) Both hit-by-pitch and intentional walk models rely in part on the identity of the pitcher’s team, which is consistent with the belief that certain teams have a defined strategy of pitching inside or of being more open to the intentional walk.

(4) You might believe that these models could also be used to generate DRA-equivalent statistics for batters and fielders. We suspect you would be right.

As always, we appreciate your comments and feedback. The second installment of this article, which should be ready next week, will discuss how this version of DRA scales to RA9, and how we decided to evaluate its performance.

Thanks to the BP Stats Team for their extensive insight and assistance.

TangoTiger1
5/06
Jonathan, thanks for the continued improvements. Just so I am following along, is your chart showing that the IBB is dependent on the identity of the catcher (over and above the team that the battery is on)? If so, can you show us this "IBB impact" for the leaders/trailers for catchers? Also, the putout impact of the 1B is dependent on the putout impact of the 3B? If so, is that because the talent level of the 3B allows the SS to move over, which allows the 2B to move over? But if you only focused on the impact of the 2B, that did not have any relevance? Fascinating if true.
TangoTiger1
5/06
I also see that your inning parameter, which was so prevalent in your original model, has all but disappeared, and is limited to just "extra innings or not". That was one where it was almost certainly an overfitting, especially when those values would change each year. So, it's good that you are using baseball knowledge in constructing your models, rather than relying on a regression/kitchen sink approach. In that respect, you'd probably want a "9th inning, tying runner at bat or on base" parameter, which should be similar to the XI parameter you have. After all, a bottom-of-the-9th tie game is much closer in impact to an XI tie game than to a 3rd-inning tie game.
myshkin
5/09
I can come up with plausible explanations for the relevance of most of the Pos_N variables, but the importance of 2B (and absence of SS) for doubles and triples is rather puzzling.