
“With or without you” (a/k/a WOWY) has been a staple of sports analysis for some time. In some sports, it is the backbone of so-called “plus/minus”; in baseball, Tom Tango popularized the idea as WOWY and it has essentially stuck. (Ironically, Tom revived this discussion very recently, using language very similar to what I swear I had already typed in this paragraph.)

In its most basic form, this analysis is arithmetic: the raw results for your team with you present versus with you absent. As regression methods gained popularity, analysts realized that multilevel modeling could effectively implement “WOWY-on-steroids.” By applying shrinkage to player estimates, multilevel models could assign responsibility more cautiously, and thus make it harder to be wrong about players. Multilevel models can also simultaneously perform WOWY on multiple players in a sample, rather than just one, borrowing strength from the performances of others to better analyze them all. Nonetheless, the underlying concept is the same: estimate the results with the player, estimate the results without said player, total up the average difference, and declare that to be the player’s estimated contribution.1
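In code, the basic arithmetic version is nothing more than a grouped average. Here is a minimal sketch (the data frame plays and its columns run_value and player_present are hypothetical names, and a real WOWY also restricts the "without" sample to the same teammates):

library(dplyr)

### raw WOWY: average run value with the player present versus absent ###

plays %>%
  dplyr::group_by(player_present) %>%
  dplyr::summarise(avg_run_value = mean(run_value))

# the player's estimated contribution is the difference between the two rows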

None of this is news to our readers. “WOWY-on-steroids” was coined here at BP by Max Marchi. A few years later, we explained how we moved “beyond WOWY” in establishing our current general framework for catcher framing, and we proceeded to use the same principles to set up Deserved Run Average (DRA) for pitchers. We then, however, encountered a problem: as applied to batters, at least in our private work, the approach did not seem to work very well. In fact, it sometimes gave bizarre values even when we ran the same models that worked well on the pitching side.

That got us to thinking: what if WOWY principles, however useful, were approaching their practical limits? Baseball, after all, allows for more than one possible outcome from each PA. In fact, there are multiple possible events that can occur, and Retrosheet provides separate codes for them ranging from 2 to 23. WOWY, in its basic form, considers one alternative to the event you are studying; it is often used as a binomial comparison of success versus failure. But the alternative to a single is not necessarily an out: it could be a double or a home run. It could also be a walk, which provides much of a single’s value without putting the ball in play. Different players bring different combinations of these events to the diamond. Ideally, we would put them all on the same spectrum and recognize the entire probability space of alternatives for an event. This is the concept of a multinomial: instead of Outcome A versus Outcome B, it is Outcome A vs. Outcome B vs. Outcome C, and so on. Using run values, you can still estimate this with WOWY arithmetic, but we would like to do better.1

The concept of treating baseball events as a multinomial is not new. But, setting up multilevel regression for multinomials is challenging. Multinomials do not arise from nicely-behaved exponential-family distributions. Instead, they rely on reparameterizations that require complicated conversions between a link function and a probability scale. Some of those methods (e.g., the “Poisson trick”) may not work for multilevel models at all.

There are off-the-shelf options that can automate multinomials for you, except they bring their own problems: the speedy ones employing maximum likelihood often don’t work with random effects (Ben Dilday’s nmre is an exception), and the robust, Bayesian deployments tend to be slow, particularly for baseball-size datasets. Alternatively, because the random effects of multilevel models can be approximated by ridge-type penalties, glmnet can fit a multinomial that somewhat approximates those effects. Also worth a look is Scott Powers’ improvement on glmnet, npmr, which further allows for the possibility of relationships between outcome classes.

But even then, these packages tend to have one common limitation: they expect you to specify the same model for every type of baseball event. For baseball modeling, this is unacceptably limiting, because predictors that are incredibly important for some events (like batted ball data) do not exist for events that do not involve putting the ball in play. You could set up a very simple model that uses only the batter, pitcher, and stadium, but with so much additional data available, and clear evidence that certain inputs confound certain events uniquely (strike zone management, temperature, and platoon effects), ignoring these covariates is suboptimal.

This was the problem we faced when we decided that it was time to “go multinomial.” In our minds at least, it was both clear what needed to be done and equally clear that it was not possible with any off-the-shelf solution. So, we decided to create our own.

DRA and DRC, therefore, rely on their own logistic multinomial framework. These statistics are both built on a series of simple binomial models2, the results of which are then combined into the multinomial conclusion. As a result, for every batting event during a baseball season, DRC+ does not merely select the most probable result out of all possibilities: it summarizes, from 0 to 100%, the respective probabilities of every such outcome that was considered. This approach not only works, but it provides results that are substantially more accurate than any other publicly-available batting statistic, at least by our current, preferred means of measurement.

Many readers have asked for more detail as to how DRC+ is calculated. We think the easiest way to show this is to work through the general principles, using one of the DRC+ models (home runs) as an example.

Like most of our models, DRC+ is coded in R, and so this example will be too. Let’s begin:

Data Acquisition and Preparation

Initially, let’s set up our environment and the packages the analysis will require. We’ll use the pacman package to install whatever packages you don’t already have, and then load what we need:

if (!require("pacman")) install.packages("pacman")
pacman::p_load(tidyverse,arm,lme4,blme)

The first step is to obtain and prepare our model variables. Using Retrosheet or some other data source, load all baseball batting events for your season of choice. (In Retrosheet, this will be all events with BAT_EVENT_FL equaling "T".) For 2018, that should give you 185,139 events.
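If you are working from the Chadwick-parsed Retrosheet event files, that filter might look something like this (a sketch; the events data frame is assumed, with the flag column named as in cwevent’s output):

### keep only batting events ###

retro_bat <- events %>%
  dplyr::filter(BAT_EVENT_FL == "T")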

To move things along, we’ll load a previously-created file (DRC_start), ditch the unnecessary first column, and use the first 10 rows to illustrate the primary columns we will need:

DRC_start <- read_csv("DRC_start.csv")

DRC_start %>% dplyr::select(-c("X1")) %>% dplyr::slice(1:10) %>% dplyr::glimpse() 
## Observations: 10
## Variables: 8
## $ game_id    <chr> "MIA201803291", "MIA201803291", "MIA201803291", "MI...
## $ pitcher    <dbl> 66311, 66311, 66311, 66311, 66311, 66311, 66311, 66...
## $ batter     <dbl> 105437, 68520, 57514, 66719, 103751, 70633, 57396, ...
## $ home_team  <chr> "MIA", "MIA", "MIA", "MIA", "MIA", "MIA", "MIA", "M...
## $ bat_fld_cd <dbl> 8, 5, 3, 2, 7, 6, 9, 4, 1, 8
## $ event      <dbl> 23, 14, 16, 3, 2, 16, 14, 16, 2, 2
## $ bats       <chr> "L", "R", "L", "R", "L", "R", "L", "R", "L", "R"
## $ temp       <dbl> 75, 75, 75, 75, 75, 75, 75, 75, 75, 75

Next, we’ll reclassify a few of the categorical predictors, and ensure they are “factors” for modeling purposes. This is not always necessary, but it makes R happy.

DRC_start$batter <- as.factor(DRC_start$batter)
DRC_start$pitcher <- as.factor(DRC_start$pitcher)

DRC_start$stadium_bats <- as.factor(
  with(DRC_start, paste(home_team, bats, sep="_")))
DRC_start$pit_hit <- as.factor(
  with(DRC_start, ifelse(bat_fld_cd == 1, 1, 0)))

Some of the columns are self-explanatory: they include batter and pitcher identity, as well as gametime temperature.

We also created two other variables. The first, stadium_bats, is a set of 60 indicators, one for each combination of the 30 stadiums and the two batter sides. This accounts for the reality that stadiums tend to play differently for each batter side, and allows us to park-adjust by handedness for each event.

The pit_hit predictor, by noting when the current batter is also the current pitcher of record, effectively imposes a prior belief that pitchers are ineffective hitters. The effect is to create a “poor man’s mixture model” without the computational hassle of fitting an actual mixture model. By doing so, we address one of the greatest challenges of modeling batting: the fact that there are two substantially separate distributions of contributors, position players and pitchers. Pitchers who rake (at least by comparison to other pitchers) can still push their numbers up through the batter random effect.

We also want to use what I call “composite” covariates: confounders that are not fully captured by individual play-by-play matchups. The first of these is the pitcher “time through the order penalty.” Much has been written about how to account for the “opener” in the starter-reliever dichotomy. Our preference has been to look at this more dynamically and track how many times through the batting order the current pitcher has traveled so far that game. This categorization targets the mechanism that drives much of the “opener” strategy, and avoids the need to decide whether a given pitcher is more akin to a traditional starter or a reliever.

We parse out the TTO status for pitchers separately, on a game-by-game basis, and then join those values back into the main data set:

DRC_start <- DRC_start %>%
  dplyr::group_by(pitcher, game_id) %>%
  dplyr::mutate(GBF = row_number(),
                TTO = ifelse(GBF<10,1,ifelse(GBF<19,2,3)))

DRC_start$TTO <- as.factor(DRC_start$TTO)

The second composite covariate relates to pitcher groundball rate. Tracking whether the pitch results in a groundball on a given play is not very helpful: for home runs it will always be untrue, even though most non-grounders are not home runs. Furthermore, many pitchers with smaller samples may not fully “deserve” a ground ball on a given play. Nonetheless, expected groundball rate is an important predictor of batter success for some events, and there may be some clustering effects unique to groundball rate which are not fully realized through a pure play-by-play model. So once again, we step outside our data set to create a new covariate: a smoothed groundball rate. Starting from a subset of the original data, we run a separate multilevel model to estimate a pitcher’s expected groundball rate, which controls for sample size and the identities of batters faced. We then merge these fitted results back into an updated dataset:

pitcher_gb <- readr::read_csv("pitcher.gb.csv")

gb.mod <- glmer(GB ~ (1|pitcher) + (1|batter),
                data=pitcher_gb,
                family=binomial(link='logit'),
                control = glmerControl(optimizer = "nloptwrap"),
                nAGQ=0)

pitcher_gb$gb_adj <- predict(gb.mod, type='response')

pitcher_gb_rate <- pitcher_gb %>%
  dplyr::group_by(pitcher) %>%
  dplyr::summarise(
    gb_adj_p = mean(gb_adj))

pitcher_gb_rate$pitcher <- as.factor(pitcher_gb_rate$pitcher)

DRC_data <- dplyr::left_join(DRC_start, pitcher_gb_rate, by = "pitcher")

Practitioners will note the use of the nloptwrap optimizer (which lme4 recently made standard) and the speediest estimation setting (nAGQ=0), which is plenty accurate for our purposes.

Next, we center and scale our continuous predictors. This is virtually always a good idea, for a number of reasons: (1) it allows you to directly compare variables on different original scales, particularly when you model interactions; (2) scaled predictors are easier for regression models to fit; (3) centered variables make it easy to set priors for appropriate regularization.

The traditional approach to scaling is to locate the variable mean at 0 and divide by one standard deviation to get a z-score. I prefer instead the approach of centering and dividing by two standard deviations: this puts continuous variables on a unit-type scale similar to categorical predictors, and makes it easier to compare different variable types. The arm package’s rescale function does this for you with its default settings:

DRC_data$temp_sc <- arm::rescale(DRC_data$temp)
DRC_data$p_gb_rate_sc <- arm::rescale(DRC_data$gb_adj_p)
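For reference, here is what rescale is doing under the hood for a numeric predictor (a manual sketch for illustration, not a replacement for the function; rescale_2sd is a name of our own invention):

### manual equivalent of arm::rescale() for numeric variables ###

rescale_2sd <- function(x) (x - mean(x, na.rm=TRUE)) / (2 * sd(x, na.rm=TRUE))

all.equal(rescale_2sd(DRC_data$temp), as.numeric(arm::rescale(DRC_data$temp)))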

Prepare the Binomial Subsets

Our data preparation is done. Now we must prepare the paired datasets that will form the basis for our binomial models.

To do this, we need to choose one of our events as the “pivot”: the common category every other event will be regressed against. This works best when the pivot is also the largest category. For this exercise, that would be Retrosheet event code 2, ball-in-play outs (“BIP outs”), although strikeouts would probably work just as well.3 For housekeeping purposes, we combine codes 2 and 19 (fielder’s choice), treating them all as outs or should-have-been-outs. Similarly, we combine intentional and unintentional walks, treating them as substantially related.
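Formally, this is a baseline-category (pivot) logit setup. Each binomial model estimates, on the logistic scale, a linear predictor η_k for its event k against the pivot; the softmax step we apply later then recovers the probabilities as p_k = exp(η_k) / (1 + Σ_j exp(η_j)) for each non-pivot event, with the pivot itself taking p_pivot = 1 / (1 + Σ_j exp(η_j)). (Fitting separate binomials against a common pivot, rather than one joint multinomial, is an approximation, but a standard and close one.)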

Even though we will discuss only the home run model in detail, here is an example of how you could create pivot datasets for multiple events at once:

model.df <- DRC_data %>%
  dplyr::mutate(
    event = case_when(
      event %in% c(2,19) ~ 'cd_2',
      event == 3 ~ 'cd_3',
      event %in% c(14,15) ~ 'cd_14',
      event == 16 ~ 'cd_16',
      event == 18 ~ 'cd_18', # limited to infield errors
      event == 20 ~ 'cd_20',
      event == 21 ~ 'cd_21',
      event == 22 ~ 'cd_22',
      event == 23 ~ 'cd_23',
      TRUE ~ NA_character_) # everything else drops out of the binomial subsets
  )

### create binomial data sets over pivot ###

df.cd2.cd3 <- model.df %>%
  dplyr::filter(event == "cd_2" | event == "cd_3") %>%
  dplyr::mutate(
    outcome = if_else(event == "cd_2" , 0, 1)
  )

df.cd2.cd14 <- model.df %>%
  dplyr::filter(event == "cd_2" |event == "cd_14") %>%
  dplyr::mutate(
    outcome = if_else(event == "cd_2", 0, 1)
  )

df.cd2.cd16 <- model.df %>%
  dplyr::filter(event == "cd_2" | event == "cd_16") %>%
  dplyr::mutate(
    outcome = if_else(event == "cd_2", 0, 1)
  )

df.cd2.cd18 <- model.df %>%
  dplyr::filter(event == "cd_2" | event == "cd_18") %>%
  dplyr::mutate(
    outcome = if_else(event == "cd_2", 0, 1)
  )

df.cd2.cd20 <- model.df %>%
  dplyr::filter(event == "cd_2" | event == "cd_20") %>%
  dplyr::mutate(
    outcome = if_else(event == "cd_2", 0, 1)
  )

df.cd2.cd21 <- model.df %>%
  dplyr::filter(event == "cd_2" | event == "cd_21") %>%
  dplyr::mutate(
    outcome = if_else(event == "cd_2", 0, 1)
  )

df.cd2.cd22 <- model.df %>%
  dplyr::filter(event == "cd_2" | event == "cd_22") %>%
  dplyr::mutate(
    outcome = if_else(event == "cd_2", 0, 1)
  )

df.cd2.cd23 <- model.df %>%
  dplyr::filter(event == "cd_2" | event == "cd_23") %>%
  dplyr::mutate(
    outcome = if_else(event == "cd_2", 0, 1)
  )
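If you would rather not copy and paste eight nearly-identical blocks, the same pivot datasets can be built in one pass. Here is a purrr-based sketch that should be equivalent to the blocks above (make_pivot_df and pivot_dfs are names of our own invention, not part of the production code):

make_pivot_df <- function(df, code) {
  df %>%
    dplyr::filter(event %in% c("cd_2", code)) %>%
    dplyr::mutate(outcome = dplyr::if_else(event == "cd_2", 0, 1))
}

pivot_codes <- c("cd_3", "cd_14", "cd_16", "cd_18",
                 "cd_20", "cd_21", "cd_22", "cd_23")

pivot_dfs <- purrr::map(purrr::set_names(pivot_codes),
                        ~ make_pivot_df(model.df, .x))

# pivot_dfs[["cd_23"]] is equivalent to df.cd2.cd23 above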

The other Retrosheet codes (“cd”) represented above are strikeouts (3), walks (14), hit-batsmen (16), reached base on error (18), singles (20), doubles (21), triples (22), and home runs (23). We only consider reached-on-error events fielded by infielders.

Modeling Home Runs

Let’s continue to our home run model, one of eight that we run. Our models for DRC tend to be simple. Parsimony is often best, and this simplicity allows us to apply DRC to any season for which there is play-by-play data, all the way back to 1921 (so far). It also reflects the fact that the batter identity parameter, given enough sample size and proper methods, tends to dominate other indicators for accuracy, even raw measurements like exit velocity and launch angle.

Each of our models gets run twice: the first time we run the model in lme4 on the speediest setting (nAGQ=0), so we can get initial, estimated coefficients for the fixed effects. All models are run as logistic regressions, and their fixed effects are then harvested for further use. Here is what the home run model looks like, along with the fixed effect extraction:

### initial model run ###

glmer.23.mod.0 <- glmer(
  outcome ~
    (1|pitcher) +
    (1|batter) +
    (1|stadium_bats) +
    pit_hit + temp_sc + TTO +
    p_gb_rate_sc,
  data=df.cd2.cd23,
  family="binomial",
  control = glmerControl(optimizer = "nloptwrap"),
  nAGQ=0)

### harvest fixed effect coefficients ###

glmer.23.fixef <- fixef(glmer.23.mod.0)

For the final model, we step things up a notch: some of the random effects (batter, and sometimes the others as well) now get a semi-informative prior, the rationale for which we discussed a few months ago. The fixed effects also receive priors of their own, normally distributed with a mean of 0 and with standard deviations set to the coefficients estimated by the first model run. This makes good use of our centered covariates, which already have a midpoint of zero. The fixed effect priors help regularize the fixed effect coefficients, which (1) further improves accuracy, (2) safeguards against new coefficient levels popping up unexpectedly, and (3) constrains early-season estimates within reason.

In order to accommodate the priors, these final models are run through blme, which requires the Laplace Approximation (nAGQ=1)4:

glmer.23.mod <- bglmer(
  outcome ~
    (1|pitcher) +
    (1|batter) +
    (1|stadium_bats) +
    pit_hit + temp_sc + TTO +
    p_gb_rate_sc,
  data=df.cd2.cd23,
  family="binomial", fixef.prior = normal(sd=c(10, as.numeric(glmer.23.fixef[2:length(glmer.23.fixef)]))),
  cov.prior = list(batter ~ invgamma(shape=.5, scale=10), NULL),
  control = glmerControl(optimizer = "nloptwrap"),
  nAGQ=1)
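Because bglmer returns a standard merMod-style object, you can also peek at the shrunken batter intercepts directly if you want to see the regularization at work (though, per footnote 1, we predict rather than query these):

### inspect shrunken batter intercepts (log-odds scale) ###

batter_effects <- lme4::ranef(glmer.23.mod)$batter
head(batter_effects)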

Multinomial Predictions

Once all the models have run, we need to make predictions, still on the logistic scale; the probabilities will come, but not yet.

That requires us to bring back our original data set, because the point of the various binomials is to predict over the entire set of original events, not just those you modeled to determine the coefficients. We create a prediction data set and then generate our fitted values from each model, accounting for the batter random effect only, and most importantly, retaining the link function:

df.preds <- DRC_data

df.preds$cd23_all_lp_bat <- predict(glmer.23.mod, newdata=df.preds,
                                    allow.new.levels = TRUE, type='link', re.form=~(1|batter))

After repeating this for all models, we need to combine the predictions back together on the probability scale, so that the probabilities of all events sum to 1. We also need probabilities for our “pivot” category, BIP outs. A softmax function does both of these things for us, converting the fitted values from each model off the logistic scale and onto the probability scale, with a range of 0 to 1. Again, despite only working through one model, we’ll pretend as if we worked through all of them, just to illustrate the process:

model.coef.final <- df.preds %>%
  dplyr::mutate(
    K_all = 1 +
      exp(cd3_all_lp_bat) +
      exp(cd14_all_lp_bat) +
      exp(cd16_all_lp_bat) +
      exp(cd18_all_lp_bat) +
      exp(cd20_all_lp_bat) +
      exp(cd21_all_lp_bat) +
      exp(cd22_all_lp_bat) +
      exp(cd23_all_lp_bat),
    cd2_all = 1 / K_all,
    cd3_all = exp(cd3_all_lp_bat) / K_all,
    cd14_all = exp(cd14_all_lp_bat) / K_all,
    cd16_all = exp(cd16_all_lp_bat) / K_all,
    cd18_all = exp(cd18_all_lp_bat) / K_all,
    cd20_all = exp(cd20_all_lp_bat) / K_all,
    cd21_all = exp(cd21_all_lp_bat) / K_all,
    cd22_all = exp(cd22_all_lp_bat) / K_all,
    cd23_all = exp(cd23_all_lp_bat) / K_all
  )
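As a quick sanity check (not part of the production code), each play’s event probabilities should now sum to 1:

model.coef.final %>%
  dplyr::mutate(p_total = cd2_all + cd3_all + cd14_all + cd16_all +
                  cd18_all + cd20_all + cd21_all + cd22_all + cd23_all) %>%
  dplyr::summarise(min_total = min(p_total), max_total = max(p_total))

# both values should be 1, up to floating-point error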

Now we are cooking with gas, as they say. We have not only a prediction of the most likely event for a given matchup; we have a prediction for the chances of all events we care about during that matchup, including BIP outs (a/k/a cd2).

In addition to providing more accuracy, this multinomial fit provides a baseline for us to compare actual versus expected outcomes of plays. For example, the winning run for the Rockies in the 2018 NL Wild Card game was driven in by Tony Wolters – Tony Wolters! – who laced a single off very-good pitcher Kyle Hendricks in the top of the 13th inning. How probable was that, based on regular season performances? DRC+ saw the distribution of probabilities as follows:

BIP Out  Strikeout  Walk  HBP  ROE  Single  Double  Triple  Home Run
  53%       15%     11%    2%   1%    13%     3%      1%       1%

Wolters’ single felt incredibly improbable at the time, but was it really? DRC+ would expect it about 1 out of every 8 times. An out was overwhelmingly the most likely result once the ball was put into play, but the single that happened was about as likely as a strikeout or a walk (Wolters does not strike out much and definitely takes his walks). Of course, these expectations apply to the beginning of the PA, so the actual probability of Wolters’ single – on a 1-2 count – was probably even lower than this.

Back to business. Now that we know what we expected to happen, based on the opening circumstances of each play, we need our baseline average expectation for all plays, because event run values tend to operate off of average MLB performance. The table function lets us derive the league-average frequency of each considered event as follows:

event.avg <- data.frame(table(df.preds$event) / nrow(df.preds))

event.avg
##    Var1         Freq
## 1     2 0.4499160091
## 2     3 0.2225733098
## 3    14 0.0797076791
## 4    15 0.0050178515
## 5    16 0.0103813891
## 6    17 0.0002160539
## 7    18 0.0081398301
## 8    19 0.0024954224
## 9    20 0.1421742583
## 10   21 0.0446367324
## 11   22 0.0045749410
## 12   23 0.0301665235

Again, these codes are from Retrosheet, and were laid out above.

This gives us the performance baseline, and, to bring things full circle, allows us finally to “move beyond WOWY.” Players are no longer being compared to their team without them, but rather to the league-average frequencies of all such events, regardless of whether a given batter was present or not. Subtracting these averages from the shrunken expectations for each player’s distribution of batting events, as calculated above, gives us the basis for a player’s Deserved Runs Created.

Combining it all into DRC+

We now have our predictions for all events and our event baselines. Two steps are left. First, we tabulate our results and multiply by the linear weights value for each event. This gives us runs above (or below) average for every hitter, matched to the Retrosheet codes for each event:

batter.tab <- model.coef.final %>%
  dplyr::group_by(batter) %>%
  dplyr::summarise(
    PA = n(),
    out_rate_pred = mean(cd2_all), ### expected rates
    SO_rate_pred = mean(cd3_all),
    BB_rate_pred = mean(cd14_all),
    HBP_rate_pred = mean(cd16_all),
    ROE_rate_pred = mean(cd18_all),
    single_rate_pred = mean(cd20_all),
    double_rate_pred = mean(cd21_all),
    triple_rate_pred = mean(cd22_all),
    HR_rate_pred = mean(cd23_all),
    out_rate_avg = as.numeric(event.avg[event.avg$Var1==2,][2]) + ### average rates
      as.numeric(event.avg[event.avg$Var1==19,][2]),
    SO_rate_avg = as.numeric(event.avg[event.avg$Var1==3,][2]),
    BB_rate_avg = as.numeric(event.avg[event.avg$Var1==14,][2]) +
      as.numeric(event.avg[event.avg$Var1==15,][2]),
    HBP_rate_avg = as.numeric(event.avg[event.avg$Var1==16,][2]),
    ROE_rate_avg = as.numeric(event.avg[event.avg$Var1==18,][2]),
    single_rate_avg = as.numeric(event.avg[event.avg$Var1==20,][2]),
    double_rate_avg = as.numeric(event.avg[event.avg$Var1==21,][2]),
    triple_rate_avg = as.numeric(event.avg[event.avg$Var1==22,][2]),
    HR_rate_avg = as.numeric(event.avg[event.avg$Var1==23,][2]),
    out_rate_AA = out_rate_pred - out_rate_avg, ### rates above average
    SO_rate_AA = SO_rate_pred - SO_rate_avg,
    BB_rate_AA = BB_rate_pred - BB_rate_avg,
    HBP_rate_AA = HBP_rate_pred - HBP_rate_avg,
    ROE_rate_AA = ROE_rate_pred - ROE_rate_avg,
    single_rate_AA = single_rate_pred - single_rate_avg,
    double_rate_AA = double_rate_pred - double_rate_avg,
    triple_rate_AA = triple_rate_pred - triple_rate_avg,
    HR_rate_AA = HR_rate_pred - HR_rate_avg) %>%
  dplyr::left_join(lwts) %>% ### join the linear weights table (lwts: event code + run value, built separately)
  dplyr::mutate(
    out_runs_AA = out_rate_AA * PA *
      as.numeric(lwts[lwts$event=="2",][2]),
    SO_runs_AA = SO_rate_AA * PA *
      as.numeric(lwts[lwts$event=="3",][2]),
    BB_runs_AA = BB_rate_AA * PA *
      as.numeric(lwts[lwts$event=="14",][2]),
    HBP_runs_AA = HBP_rate_AA * PA *
      as.numeric(lwts[lwts$event=="16",][2]),
    ROE_runs_AA = ROE_rate_AA * PA *
      as.numeric(lwts[lwts$event=="18A",][2]), # infield ROE only
    single_runs_AA = single_rate_AA * PA *
      as.numeric(lwts[lwts$event=="20",][2]),
    double_runs_AA = double_rate_AA * PA *
      as.numeric(lwts[lwts$event=="21",][2]),
    triple_runs_AA = triple_rate_AA * PA *
      as.numeric(lwts[lwts$event=="22",][2]),
    HR_runs_AA = HR_rate_AA * PA *
      as.numeric(lwts[lwts$event=="23",][2])
  )

Finally, we calculate DRC+. For this, we follow the convention at FanGraphs for wRC+, and use a similar process to set 100 as the position-player average and calculate our final value:
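In index form, the calculation below works out to DRC+ = 100 * (dRAA/PA + all-hitter mean + league R/PA) / (league R/PA + non-pitcher mean). Because the non-pitcher mean sits in the denominator while the all-hitter mean offsets the centering in the numerator, an average position player lands at exactly 100, and pitchers fall below it.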

DRC_output <- batter.tab %>%
  dplyr::group_by(batter) %>% # in production, we group on a season/split variable here
  dplyr::mutate(
    dRAA_raw = sum(out_runs_AA, SO_runs_AA, BB_runs_AA, HBP_runs_AA, ROE_runs_AA,
                   single_runs_AA, double_runs_AA, triple_runs_AA, HR_runs_AA,
                   na.rm = TRUE),
    dRAA_PA_raw = dRAA_raw / PA
  )

raw_mean <- with(DRC_output, weighted.mean(dRAA_PA_raw, PA))

league_mean <- sum(RA_data$R) / sum(RA_data$PA) ### league runs mean (RA_data: league totals, built separately)

metrics_noP <- DRC_output %>% ### statistics for non-pitchers
  dplyr::left_join(id_data, by=c("batter" = "player")) %>% # id_data: player positions, built separately
  dplyr::filter(pos != "P") %>%
  dplyr::ungroup() %>%
  dplyr::summarise(mean_dRAA_PA_noP = weighted.mean(dRAA_PA_raw, PA))

metrics_wP <- DRC_output %>%
  dplyr::ungroup() %>%
  dplyr::summarise(mean_dRAA_PA_wP = weighted.mean(dRAA_PA_raw, PA))

DRC_output <- DRC_output %>% ### calculate DRC+
  dplyr::mutate(
    dRAA_PA = dRAA_PA_raw - raw_mean,
    dRAA = dRAA_PA * PA,
    DRC_plus = (dRAA_PA + as.numeric(metrics_wP) + league_mean) / (league_mean + as.numeric(metrics_noP)) * 100,
    total_out_runs = out_runs_AA,
    total_hit_runs = single_runs_AA + double_runs_AA + triple_runs_AA + HR_runs_AA,
    total_NIP_runs = SO_runs_AA + BB_runs_AA + HBP_runs_AA)

DRC_output <- DRC_output %>% ### compile supporting components
  dplyr::ungroup() %>%
  dplyr::group_by(batter) %>%
  dplyr::summarize(
    dRAA = sum(dRAA),
    dRAA_PA = weighted.mean(dRAA_PA, PA),
    DRC_plus = weighted.mean(DRC_plus, PA),
    out_runs_AA = sum(out_runs_AA),
    SO_runs_AA = sum(SO_runs_AA),
    BB_runs_AA = sum(BB_runs_AA),
    HBP_runs_AA = sum(HBP_runs_AA),
    ROE_runs_AA = sum(ROE_runs_AA),
    single_runs_AA = sum(single_runs_AA),
    double_runs_AA = sum(double_runs_AA),
    triple_runs_AA = sum(triple_runs_AA),
    HR_runs_AA = sum(HR_runs_AA),
    total_out_runs = sum(total_out_runs),
    total_hit_runs = sum(total_hit_runs),
    total_NIP_runs = sum(total_NIP_runs),
    out_rate_pred = weighted.mean(out_rate_pred, PA),
    SO_rate_pred = weighted.mean(SO_rate_pred, PA),
    BB_rate_pred = weighted.mean(BB_rate_pred, PA),
    HBP_rate_pred = weighted.mean(HBP_rate_pred, PA),
    ROE_rate_pred = weighted.mean(ROE_rate_pred, PA),
    single_rate_pred = weighted.mean(single_rate_pred, PA),
    double_rate_pred = weighted.mean(double_rate_pred, PA),
    triple_rate_pred = weighted.mean(triple_rate_pred, PA),
    HR_rate_pred = weighted.mean(HR_rate_pred, PA),
    out_rate_avg = weighted.mean(out_rate_avg, PA),
    SO_rate_avg = weighted.mean(SO_rate_avg, PA),
    BB_rate_avg = weighted.mean(BB_rate_avg, PA),
    HBP_rate_avg = weighted.mean(HBP_rate_avg, PA),
    ROE_rate_avg = weighted.mean(ROE_rate_avg, PA),
    single_rate_avg = weighted.mean(single_rate_avg, PA),
    double_rate_avg = weighted.mean(double_rate_avg, PA),
    triple_rate_avg = weighted.mean(triple_rate_avg, PA),
    HR_rate_avg = weighted.mean(HR_rate_avg, PA),
    out_rate_AA = weighted.mean(out_rate_AA, PA),
    SO_rate_AA = weighted.mean(SO_rate_AA, PA),
    BB_rate_AA = weighted.mean(BB_rate_AA, PA),
    HBP_rate_AA = weighted.mean(HBP_rate_AA, PA),
    ROE_rate_AA = weighted.mean(ROE_rate_AA, PA),
    single_rate_AA = weighted.mean(single_rate_AA, PA),
    double_rate_AA = weighted.mean(double_rate_AA, PA),
    triple_rate_AA = weighted.mean(triple_rate_AA, PA),
    HR_rate_AA = weighted.mean(HR_rate_AA, PA),
    PA = sum(PA),
    year = s, # s: the season being processed (set earlier in the pipeline)
    lvl = "mlb"
  ) %>%
  dplyr::select(year,
                batter,
                dRAA,
                dRAA_PA,
                DRC_plus,
                PA,
                total_hit_runs,
                total_NIP_runs,
                total_out_runs,
                everything()) %>%
  dplyr::arrange(batter)

DRC_output <- as.data.frame(DRC_output)

If desired, you can bootstrap the datasets, re-run the models on each resampled dataset to get various “draws” of DRC+, and from those draws estimate a distribution of values for DRC+, or any other value derived above. We do this ourselves, using the Bayesian Bagging method we laid out in this article to approximate the uncertainty values that we publish in our tables.
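For those who want to experiment, a single draw under that approach might look something like the following sketch (the Dirichlet-style weights come from normalized exponentials; lme4 will warn about non-integer weights with a binomial family, which is expected here):

### one Bayesian Bagging draw for the home run model (sketch only) ###

set.seed(42)
bb_wts <- rexp(nrow(df.cd2.cd23))
bb_wts <- bb_wts / mean(bb_wts) # normalize so the weights average 1

glmer.23.draw <- glmer(
  outcome ~
    (1|pitcher) +
    (1|batter) +
    (1|stadium_bats) +
    pit_hit + temp_sc + TTO +
    p_gb_rate_sc,
  data=df.cd2.cd23,
  family="binomial",
  weights = bb_wts,
  control = glmerControl(optimizer = "nloptwrap"),
  nAGQ=0)

# repeat across many draws, recompute DRC+ each time, and summarize the spread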

Wrap-Up

We hope that this explanation helps readers understand how DRC+ gets to where it does. Alert readers will no doubt spot areas of potential improvement and share suggestions and feedback. We wholeheartedly encourage people to do this and look forward to seeing how baseball batting analysis can continue to move forward.


  1. Technically, one does not have to “predict” from a multilevel model; you could instead directly query the assigned intercept for each player. But prediction has the benefit of putting the outputs correctly on the scale of the outcome, a significant convenience when your model relies upon a link function, as most baseball models do.
  2. Technically, these are Bernoulli regressions, a special case of the binomial model.
  3. Strikeouts may eventually become the largest category, if current trends do not change.
  4. We are using maximum likelihood methods for speed: remember that we have to generate new player estimates each day, preferably by lunch. For those who are curious, we have recovered essentially identical results using MCMC, so in this case, maximum likelihood is sufficient for the task. That is not always true.

Thank you for reading

This is a free article. If you enjoyed it, consider subscribing to Baseball Prospectus. Subscriptions support ongoing public baseball research and analysis in an increasingly proprietary environment.

ChicagoOriole
4/08
It even ignores that hitting the first pitch is different than running the count to 3-2 and then fouling off 17 more pitches is different than hitting the 23rd pitch to get the same result. I think even working the count matters these days given more first-pitch hitters.
Dane Galloway
4/08
That's just a lot more shit to account for. If you're going to base a model off of how a hitter fares in a certain count, then you would also have to account for how the pitcher they are facing fares in that same count as well. That's a lot of factors.
Jonathan Judge
4/09
The count is an additional factor you can account for, but then the question becomes what is the "baseline" count and how you parcel out credit among participants to have the count get to where it is. The count is also implicitly reflected in the results, because pitchers who consistently end up in favorable counts also tend to get favorable results. So again with our conservative estimates, I think the overall effect of being in better or worse counts gets reflected, even if each individual PA does not get into that level of detail. Thanks for reading!
peter jensen
4/10
Thank you for the additional info on how DRC+ works. I am still unclear on how the weighting for the fixed factors are calculated and applied. Perhaps an example with actual numbers would help clarify things for me.
Jonathan Judge
4/10
Peter, thanks for reading. Happy to provide whatever further explanation would be helpful. Generally speaking, the role of the fixed effects is the same as the inclusion of the batter random effect: to provide the most accurate estimate of what occurred on each play, assuming an average pitcher and an average stadium. It is a straight prediction in that sense. We needn't trouble ourselves much with what the coefficient is because it is part of the prediction, which is what we are looking for. The baseline is the league average for each one of the events. The intuition is that in a multinomial framework, batter success is about tradeoffs, so if you are getting more strikeouts, to take one example, where is that coming from? If you are just substituting for BIP outs and getting more home runs to boot, and are otherwise largely league average, that is a worthy trade and will result in above-average DRC+. On the other hand, if you are getting more strikeouts than average and fewer walks than average, with everything else otherwise being largely average, then that is not a good trade-off. At the end of the day, each player is rated by how the expected frequency in each category (as predicted by each of the models) relates to average, and the more positive their overall distribution of batting events is in terms of run value, the more valuable they are above 100, and vice versa. Does that help? If not, let me know how I might explain it better.