March 15, 2011
Team Injury Projection
How CHIPPER Works
Throughout the past week, Corey Dawkins and Marc Normandin have been using the Comprehensive Health Index [of] Pitchers [and] Players [with] Evaluative Result, otherwise known as CHIPPER, to break down the expected health of teams. If you've missed any installments and want to page back through them, you can find them all on the Team Injury Projection homepage. We've heard your questions about what CHIPPER means, where the projections came from, and how they differ from what others provide, and it's time for us to answer them.
Let's take the last one first. What makes CHIPPER different from the other injury projections out there? First, our injury database contains not only major-league injuries from the past eight years, but also minor-league, spring training, and winter league data. We're even starting to collect injury data from colleges. All told, we have more than 400,000 player days missed to injury in the database. Second, CHIPPER does more than just project whether a player is going to miss time: it also tries to provide a ballpark figure for how much time a player is going to miss.
So how does it work? Let's take a look at the Boston Red Sox Team Injury Projection, which is free for all readers, and we'll cover the details.
The Dashboard gives you some context about the team's injury situation in aggregate. In the 2010 Recap section, you'll find that Boston had 71 entries in the CHIPPER database for 2010--all of the disclosed injury incidents for which we have a record. Of those entries, 21 were DL trips, or stays on the Disabled List. Boston had 1349 TDL, or Total Days Lost to Injury, and 19 DMPI, or Days Missed Per Injury, during the 2010 season. You can see Boston's historical TDL and DMPI across the rest of the dashboard; the numbers below the graph labels display the team's ranking league-wide, and the graphs are color-coded to reflect the team's injury performance relative to its competitors.
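If you want to check the math yourself, here is a minimal Python sketch. It assumes DMPI is simply TDL divided by the number of injury entries; the article doesn't spell out the formula, but that reading matches the Boston figures above. The function name is made up for illustration.

```python
def days_missed_per_injury(total_days_lost: int, injury_entries: int) -> float:
    """Average days lost per recorded injury (assumed definition of DMPI)."""
    if injury_entries <= 0:
        raise ValueError("need at least one injury entry")
    return total_days_lost / injury_entries


# Boston's 2010 Recap numbers quoted above: 1349 TDL across 71 entries.
boston_dmpi = days_missed_per_injury(1349, 71)
print(boston_dmpi)  # 19.0
```

Note that the denominator here is all 71 database entries, not just the 21 DL trips.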
Hitters in approximate Depth Charts order at time of publication
CHIPPER's goal in life is to predict the chance that a player is going to miss time in 2011. It considers the likelihood of a player missing one or more games to injury, more than 15 games to injury, and more than 30 games to injury, and rates that risk as either green, yellow or red. We've represented these on the player lines with a color scheme you're used to but symbols you aren't.
Green: ~15 percent chance or lower; Yellow: ~15-85 percent chance; Red: ~85 percent chance or higher of the player missing this many games to injury.
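To make the color bands concrete, here is a small Python sketch that maps a probability to a rating. The cutoffs are the approximate ones listed above; how the exact boundary values are treated is an assumption on our part, and the function name and sample probabilities are invented for illustration.

```python
# Approximate cutoffs from the legend above; boundary handling is assumed.
GREEN_MAX = 0.15
RED_MIN = 0.85


def risk_color(probability: float) -> str:
    """Translate a chance of missing time into a green/yellow/red rating."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability <= GREEN_MAX:
        return "green"
    if probability >= RED_MIN:
        return "red"
    return "yellow"


# Made-up probabilities for one player at the three thresholds CHIPPER rates.
sample_player = {"1+ games": 0.90, "15+ games": 0.40, "30+ games": 0.10}
ratings = {threshold: risk_color(p) for threshold, p in sample_player.items()}
print(ratings)
```

A player gets one rating per threshold, so the same hitter can be red for missing at least one game and green for missing more than 30.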
It's important to note that we're considering only games lost to a disclosed injury, not simply days off. While almost no one plays 162 games anymore, many players don't actually have any injuries reported in a given season. The database behind CHIPPER tracks injury reports beyond just DL visits, but there has to be a reported injury; we're not tracking or reporting routine days off.
Among the sample hitters above, Dustin Pedroia is the best bet to miss a significant amount of time, coming off his injury troubles last year; Carl Crawford is very likely to miss a few games; and Darnell McDonald's fairly unscathed past and positive markers give him a lower-risk profile.
Pitchers in approximate Depth Charts order at time of publication
In the sample of pitchers above, Jon Lester profiles as one of the least-risky aces in the majors; John Lackey was durable last year, but he missed considerable playing time in 2008 and 2009 and remains a risk; and Tim Wakefield's age and significant time lost in 2009 make him very likely to spend an extended period in the trainer's room in 2011.
CHIPPER uses logistic regression to determine whether a player is going to miss time at each of the thresholds we've set. For position players, we consider age, position, time lost to injury during the previous three seasons, and proxy variables to represent player type. For pitchers, the categories include age and time lost to injury during the previous three seasons. Surprisingly, including workload didn't make much of a difference in the results.
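For the curious, here is roughly what scoring a pitcher with a logistic regression looks like in Python. This is a sketch of the general technique, not CHIPPER itself. We assume one model per threshold and one coefficient for each of the previous three seasons, and every coefficient below is a placeholder invented for illustration, since the fitted values aren't published here.

```python
import math

# HYPOTHETICAL coefficients, invented for illustration only. These are not
# CHIPPER's fitted values. Keys are the games-missed thresholds (1+, 15+, 30+).
HYPOTHETICAL_PITCHER_MODELS = {
    1: {"intercept": -1.5, "age": 0.04, "days_lost": (0.020, 0.010, 0.005)},
    15: {"intercept": -3.0, "age": 0.05, "days_lost": (0.015, 0.008, 0.004)},
    30: {"intercept": -4.0, "age": 0.05, "days_lost": (0.012, 0.006, 0.003)},
}


def logistic(z: float) -> float:
    """Numerically stable logistic function, mapping any score to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def miss_probability(threshold: int, age: float, days_lost_by_season) -> float:
    """Chance of exceeding a threshold, given age and days lost in each of
    the previous three seasons (most recent season first)."""
    if len(days_lost_by_season) != 3:
        raise ValueError("expected days lost for exactly three seasons")
    model = HYPOTHETICAL_PITCHER_MODELS[threshold]
    score = model["intercept"] + model["age"] * age
    score += sum(w * d for w, d in zip(model["days_lost"], days_lost_by_season))
    return logistic(score)


# A made-up 32-year-old pitcher who lost 40, 0, and 15 days in the last three years.
probabilities = {t: miss_probability(t, 32, (40, 0, 15))
                 for t in HYPOTHETICAL_PITCHER_MODELS}
print(probabilities)
```

Each probability would then be rated green, yellow, or red. A hitter version would add position and the player-type proxy variables as further terms in the score.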
You're probably wondering what this means to you as a fan or a fantasy owner. If we say a player is a high risk, should you expect him to head to the DL? Well, yes and no. I expect at least 70 percent of the players we indicate as high-risk to hit their injury threshold, but I can't tell you which ones. If I could, I'd be making a killing in Vegas and not sharing my predictions with you. Since you're not going to get a firmer guarantee out of me, let's move on to what's still to come with CHIPPER this season.
The Team Injury Projections we've run to this point are based only on major-league injury history and contain only the players who saw time in the bigs last season. That should change this week, as we include the minor-league data from our injury database. We'll be sure to let you know when things are updated, and the Player Forecast Manager will always contain the most up-to-date projections. We're still just getting to know this data; looking further ahead, we're working on improving the specificity of our injury projections enough to use them as an input to our PECOTA projection system. We'll also be rolling out this information in team reports and player cards.
We're also planning to add more data about previous injuries to the mix. Someone who's had hamstring issues is probably more likely to suffer a recurrence than someone who broke a finger. The difficulties here are small sample sizes and the proper categorization of injuries. That's why we hired an athletic trainer with extensive medical training and experience. Beyond that, we'll continue to refine the model where we see opportunities for improvement, and we hope to introduce additional tools to help you understand, measure, and react to injuries as they occur.
Our goal is to give you the best team injury reports in the business, backed by real-world injury experience, expertise in data analysis, and the only verifiable data set of its kind in the field. Data-driven injury analysis is a relatively untapped area, and there's plenty left to explore. If you have any ideas or suggestions, please let us know.