
Throughout the past week, Corey Dawkins and Marc Normandin have been using the Comprehensive Health Index [of] Pitchers [and] Players [with] Evaluative Result, otherwise known as CHIPPER, to break down the expected health of teams. If you've missed any installments and want to page back through them, you can visit the Team Injury Projection homepage by clicking here. We've heard your questions about what CHIPPER means, where the projections came from, and how they differ from what others provide, and it's time for us to answer them.

Let's take the last one first. What makes CHIPPER different from the other injury projections out there? First off, our injury database contains not only major league injuries from the past eight years, but also minor league, spring training, and winter league data. We're even starting to collect injury data from colleges. All told, we have over 400,000 player days missed to injury in the database. Secondly, CHIPPER does more than just project whether a player is going to miss time: it also tries to provide a ballpark figure for how much time a player is going to miss.

So how does it work? Let's take a look at the Boston Red Sox Team Injury Projection, which is free for all readers, and we'll cover the details.

BOSTON RED SOX
 

Dashboard

2010 Recap: Third in AL East | 71 entries | 21 DL trips | 1349 TDL | 19 DMPI

Year | TDL (league rank) | DMPI (league rank)
2010 | 1349 (27th) | 19 (12th)
2009 | 1073 (19th) | 17 (10th)
2008 | 939 (13th) | 14 (7th)
2007 | 884 (10th) | 18 (4th)

The Dashboard gives you some context about the team's injury situation in aggregate. In the 2010 Recap section, you'll find that Boston had 71 entries in the CHIPPER database for 2010 (all of the disclosed injury incidents for which we have a record). Of those entries, 21 were DL trips, or stays on the Disabled List. Boston had 1349 TDL, or Total Days Lost to Injury, and 19 DMPI, or Days Missed Per Injury, during the 2010 season. You can see Boston's historical TDL and DMPI across the rest of the dashboard; the ordinal beside each TDL and DMPI figure gives the team's league-wide ranking, and the graphs are color-coded to reflect the team's injury performance relative to its competitors.
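The Dashboard numbers hang together arithmetically: 1349 TDL spread over 71 entries is exactly 19 days per entry, matching the DMPI shown. That suggests (our inference, not something stated above) that DMPI is simply TDL divided by the number of injury entries. The sketch below rolls a list of injury entries up into those summary figures; the `InjuryEntry` record and the `dashboard_summary` helper are hypothetical and do not reflect the actual CHIPPER database schema.

```python
from dataclasses import dataclass


@dataclass
class InjuryEntry:
    """One disclosed injury incident (hypothetical record layout)."""
    player: str
    days_lost: int
    dl_trip: bool  # True if the incident included a stay on the Disabled List


def dashboard_summary(entries):
    """Aggregate injury entries into entries, DL trips, TDL, and DMPI."""
    total_days_lost = sum(e.days_lost for e in entries)
    n = len(entries)
    return {
        "entries": n,
        "dl_trips": sum(1 for e in entries if e.dl_trip),
        "tdl": total_days_lost,
        # Assumed definition: DMPI = TDL / number of entries (0.0 if none)
        "dmpi": total_days_lost / n if n else 0.0,
    }


# Boston's 2010 line: 1349 TDL over 71 entries
print(1349 / 71)  # 19.0
```

The helper counts every disclosed entry, DL trip or not, which is how the 71 entries and 21 DL trips in the 2010 Recap relate to each other.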

Hitters in approximate Depth Charts order at time of publication

Player | Age | Days Lost to Injury (2008 / 2009 / 2010) | 2011 Injury Risk (1-day / 15-days / 30-days)
Dustin Pedroia | 27 | 0 / 4 / 99 | Red / Red / Red
Carl Crawford | 29 | 50 / 4 / 9 | Red / Yellow / Yellow
Darnell McDonald | 32 | 0 / 0 / 1 | Yellow / Green / Green

CHIPPER's goal in life is to predict the chance that a player is going to miss time in 2011. It considers the likelihood of a player missing one or more games to injury, more than 15 games to injury, and more than 30 games to injury, and rates that risk as either green, yellow or red. We've represented these on the player lines with a color scheme you're used to but symbols you aren't.

Chance of the player missing at least this many games to injury: Green, about 15 percent or lower; Yellow, about 15 to 85 percent; Red, about 85 percent or higher.
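Here is a minimal sketch of how a projected probability might be sorted into those bands. It assumes hard cutoffs at 0.15 and 0.85, although the article gives only approximate bands, and the `risk_color` helper is hypothetical:

```python
def risk_color(probability):
    """Map a projected injury probability (0 to 1) to a color band.

    Cutoffs follow the approximate bands above (about 15% and 85%);
    how exact boundary values are assigned is an assumption.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability <= 0.15:
        return "Green"
    if probability < 0.85:
        return "Yellow"
    return "Red"


# Hypothetical probabilities for one player at the 1-, 15-, and 30-day thresholds
print([risk_color(p) for p in (0.90, 0.40, 0.10)])  # ['Red', 'Yellow', 'Green']
```

Each player line applies this three times, once for each threshold's probability.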

It's important to note that we're considering only games lost to a disclosed injury, not simply days off. While almost no one plays 162 games anymore, many players don't actually have any injuries reported in a given season. The database behind CHIPPER tracks injury reports beyond just DL visits, but there has to be a reported injury; we're not tracking or reporting routine days off.

Among the sample hitters above, Dustin Pedroia is the best bet to miss a significant amount of time coming off of his injury troubles last year, Carl Crawford is very likely to miss a few games, and Darnell McDonald's fairly unscathed past and positive markers give him a better profile for injury risk.

Pitchers in approximate Depth Charts order at time of publication

Player | Age | Days Lost to Injury (2008 / 2009 / 2010) | 2011 Injury Risk (1-day / 15-days / 30-days)
Jon Lester | 27 | 0 / 0 / 0 | Green / Green / Green
John Lackey | 32 | 53 / 50 / 0 | Yellow / Yellow / Green
Tim Wakefield | 44 | 19 / 62 / 0 | Red / Red / Red

In the sample of pitchers above, Jon Lester profiles as one of the least-risky aces in the majors, John Lackey was durable last year but missed considerable playing time in 2008 and 2009 and remains a risk, and Tim Wakefield's age and significant time lost in 2009 make him very likely to spend an extended period in the trainer's room in 2011.

CHIPPER uses logistic regression to determine whether a player is going to miss time at each of the thresholds we've set. For position players, we consider age, position, time lost to injury during the previous three seasons, and proxy variables to represent player type. For pitchers, the categories include age and time lost to injury during the previous three seasons. Surprisingly, including workload didn't make much of a difference in the results.
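To make the mechanics concrete, here is a minimal logistic-regression sketch for a single threshold (missing 30 or more games). It fits by plain batch gradient descent on a handful of made-up pitcher rows. The features (age and prior days lost, both rescaled), the data, and the fitting method are all hypothetical stand-ins, since CHIPPER's actual inputs and coefficients aren't described beyond the paragraph above.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, features):
    """Probability of reaching the threshold; weights[0] is the intercept."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)


def fit_logistic(rows, labels, learning_rate=0.5, epochs=5000):
    """Fit logistic-regression weights by batch gradient descent on log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            error = predict(weights, features) - label  # d(log-loss)/dz
            grad[0] += error
            for j, x in enumerate(features):
                grad[j + 1] += error * x
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad)]
    return weights


def scale(age, prior_days_lost):
    """Hypothetical rescaling so both features sit roughly in [-1, 2]."""
    return ((age - 30) / 10, prior_days_lost / 100)


# Made-up rows: (age, days lost to injury over the previous three seasons)
raw = [(24, 0), (26, 10), (30, 0), (34, 20), (28, 90),
       (32, 120), (36, 80), (40, 150), (38, 30), (22, 60)]
# Made-up outcomes: 1 = missed 30+ games the following season
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

weights = fit_logistic([scale(a, d) for a, d in raw], labels)
print(f"P(30+ games), age 31 with 100 prior days lost: "
      f"{predict(weights, scale(31, 100)):.2f}")
```

Refitting with outcome labels for the 1-game and 15-game thresholds would give the other two probabilities on each player line. A workload column could be added the same way to test whether it moves the results.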

You're probably wondering what this means to you as a fan or a fantasy owner. If we say a player is a high risk, should you expect him to head to the DL? Well, yes and no. I expect at least 70 percent of the players we indicate as high-risk to hit their injury threshold, but I can't tell you which ones. If I could, I'd be making a killing in Vegas and not sharing my predictions with you. Since you're not going to get a firmer guarantee out of me, let's move on to what's still to come with CHIPPER this season.
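The "yes and no" is what independent probabilities look like across a group. Suppose, purely for illustration, that ten high-risk players each independently have a 70 percent chance of reaching their threshold (the group of ten and the independence assumption are ours). About seven are expected to get hurt, yet the chance that all ten do is under 3 percent. The forecast says a lot about the group and much less about any single player.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k of n independent players reach the threshold), each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n, p = 10, 0.70                          # hypothetical: ten red-flagged players at 70% each
expected = n * p                         # about 7 players
all_ten = p ** n                         # about 0.028
at_least_seven = prob_at_least(7, n, p)  # about 0.65
print(expected, round(all_ten, 3), round(at_least_seven, 2))
```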

The Team Injury Projections we've run to this point are based only on major-league injury history and contain only the players who saw time in the bigs last season. That should change this week, as we include the minor-league data from our injury database. We'll be sure to let you know when things are updated, and the Player Forecast Manager will always contain the most up-to-date projections. We're still just getting to know this data; looking further ahead, we're working on improving the specificity of our injury projections enough to use them as an input to our PECOTA projection system. We'll also be rolling out this information in team reports and player cards.

We're also planning to add more data about previous injuries to the mix. Someone who's had hamstring issues is probably more likely to suffer a recurrence than someone who broke a finger. The difficulties here are small sample sizes and the proper categorization of injuries. That's why we hired an athletic trainer with extensive medical training and experience. Beyond that, we'll continue to refine the model where we see opportunities for improvement, and we hope to introduce additional tools to help you understand, measure, and react to injuries as they occur.
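One way to picture the categorization problem is to map each detailed injury description to a broader bucket, then count a player's prior injuries per bucket so those counts can feed a model as extra features. Broader buckets keep each category's sample from getting too thin. The keyword-to-bucket mapping and both helpers below are entirely hypothetical placeholders for the kind of grouping a trained clinician would actually design.

```python
from collections import Counter

# Hypothetical keyword -> broad category mapping (illustrative only)
CATEGORY_KEYWORDS = {
    "lower-body muscle": ("hamstring", "groin", "quadriceps", "calf"),
    "throwing arm": ("elbow", "shoulder", "forearm"),
    "hand/finger": ("finger", "hand", "thumb"),
}


def categorize(description):
    """Return the first broad category whose keyword appears in the description.

    Uses naive substring matching, so it is only a rough stand-in for real coding.
    """
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in text for word in keywords):
            return category
    return "other"


def prior_injury_features(descriptions):
    """Count a player's prior injuries per broad category."""
    counts = Counter(categorize(d) for d in descriptions)
    # Emit every category, zeros included, so feature vectors line up across players
    return {c: counts[c] for c in [*CATEGORY_KEYWORDS, "other"]}


history = ["Strained left hamstring", "Hamstring tightness", "Broken right finger"]
print(prior_injury_features(history))
# {'lower-body muscle': 2, 'throwing arm': 0, 'hand/finger': 1, 'other': 0}
```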

Our goal is to give you the best team injury reports in the business, backed by real-world injury experience, expertise in data analysis, and the only verifiable data set of its kind in the field. Data-driven injury analysis is a relatively untapped area, and there's plenty left to explore. If you have any ideas or suggestions, please let us know.

Asinwreck
3/15
Thank you for showing your work.
hotstatrat
3/15
Thanks for this.
tomgorman
3/15
Red = 85% risk of injury. But you only expect 70% of red players to hit the DL? Did you do any testing on the system? Input data through 2009 and see if it correctly predicted 2010?
dianagramr
3/15
You know what I love about BP (amongst many things)? It's that staff writers will *publicly* comment on and critique other staff writers' work. It shows this is not a rubber-stamp shop. Thanks guys!
dturkenk
3/15
Yes, I did test this against 2010 data. There are some limitations of the logistic regression model that cause discrepancies at the extreme. A few more green players than we expect will end up injured, and slightly fewer red players will get injured - the difficulty is predicting which.
Lopecci
3/15
I have to admit, I do like the injury projection system. But I HATE the name CHIPPER for it. Show Larry some respect: the guy's a future Hall of Famer, a superstar in his prime, and a tried-and-true first-class dude. He restructured his contract to stay on one team. He moved to left field when the Braves needed a left fielder. I mean seriously, with all the data you have at your hands, could you not have come up with a different name? Chipper is at least willing to step on the field if he is hurt; most guys will take a week off and think nothing of it. Sorry for the rant, hope I got my point across!!
lmarighi
3/15
I like Chipper as a player, but "first-class dude" might take it a little too far. Like many celebrities, he has made mistakes (e.g. http://sportsillustrated.cnn.com/baseball/mlb/news/1998/10/22/jones_paternity/ ). Also, I seem to remember that CHIPPER was one of the entries from when Will Carroll ran a contest to name a new system. . .
yankeehater32
3/15
I had a lot to do with naming it CHIPPER, and let me tell you that I did so because he's a player I love. My other choice was NOMAR, another favorite, so know there is no ill will meant.
dturkenk
3/15
I'm curious, what did you reverse acronymize NOMAR out to?
yankeehater32
3/16
I never came up with one I was satisfied with, which was part of the reason CHIPPER won that internal struggle of mine.
tmangell
3/15
tremendous job - thanks! I checked out the Phillies page, and there's going to be a big red cross next to J-Roll's name on my auction sheet!
georgeforeman03
3/15
A couple of questions if you don't mind. You said, "I expect at least 70 percent of the players we indicate as high-risk to hit their injury threshold, but I can't tell you which ones." Now, given that you'd already given a point estimate (85%) that these high-risk players will meet their thresholds, I'm curious where 70% came from. And that sort of leads to my second question, which is how does the system perform with test data? I assume you did assessments based on cross-validation or data from previous years, and I think it would be informative for you to share how well it did. For example, if you ran the system for the 2010 season (obviously using data from previous years), how accurate would it be?
dturkenk
3/15
The model was based off half the data set from 2010 and tested off the other half. The total number of expected injuries and actual injuries match up quite well, but we do see some discrepancies at the extremes as I mention above. Basically, for pitchers at 30+ games (the worst estimate), we're over-estimating the red risk by about 40%. For position players at 1+ games (the best estimate), we're over-estimating the red risk by about 15%. I'll try to run the model against some earlier seasons later this week if I get a chance, but that's behind adding in minor league injuries on my to-do list.
georgeforeman03
3/15
Cool. Those sound like reasonable results. When you say you over-estimated the red risk for that subset of pitchers by 40%, do you mean, "the risk was really 40% and we projected 80%" or "the risk was really 40% and we projected 56%"? I assume the latter but it seemed ambiguous.
dturkenk
3/16
Yeah, it was a little ambiguous. Sorry about that. It's closer to the second. Let's say we have a hypothetical situation where we have 100 green, 100 yellow, and 100 red players. We'd estimate probably 7 green, 50 yellow and 95 red to get hurt. The actual results are more likely something like 30 green, 50 yellow and 66 red. Is that more clear?
leites
3/15
Are types of injury differentiated in this model? For instance, if Jon Lester had missed a year due to shoulder surgery rather than cancer, would his projection be any different?
dturkenk
3/15
Not yet. If we look very specifically at injury details - bucketing every strained groin together, for example, then our sample size for each category becomes really small. The right answer I think is to more broadly categorize injuries - maybe muscle and tendon problems in the throwing arm, for example. That's going to take time, and a lot more medical knowledge than I have. But I do have help (http://www.baseballprospectus.com/article.php?articleid=13009).
leites
3/15
I like the way you're approaching it. Another sample issue will be that, for certain types of injuries, the medical treatments and/or rehabilitation regimes have become more effective. So historical data may in some cases be misleading.
Ogremace
3/15
The expansion of this database could also give us a way of comparing team medical staffs against one another, though there would always be a limit to the possible sample size of any group's set.
cidawkins
3/16
It is true that techniques have advanced, as has therapy. For instance, the treatment for hip labral tears and injuries inside the joint used to be an open surgery in which the hip is actually dislocated in the operating room. Thankfully, arthroscopy for the hip is improving, and we now have many more surgeons who are trained by the best. The data that I've gathered is very detailed and has a separate field that covers the injury in greater detail than simply "shoulder strain." As much information as is known goes straight into the database. So, for instance, if an older player had the open procedure, the record would state this (it would have to be a significantly older player), whereas a younger player would be counted as having arthroscopy. This is the case for every injury I encounter. Thanks for the feedback.
pjbenedict
3/15
I would LOVE to have this linked to each player's page, or better yet, have it in the PFM.
cwyers
3/15
It's in the PFM. Look in the configuration options, under display, and you can set it to show the injury projections along with the PFM output.
mgolovcsenko
3/16
I was looking for but never saw in the article IN CAPS below: "We've heard your questions about what chipper means, where the projections came from, and HOW THEY DIFFER FROM WHAT OTHERS PROVIDE, and it's time for us to answer them." How does CHIPPER differ from last year's system ... other systems? (NOMAR?)
cidawkins
3/16
This is the only place where you will be able to find this level of data on any injury or medical condition, not just on DL trips, and use it for projections. I will leave the other parts of the answer to the smart guys who can explain the mathematical side of things much better than I can.
yankeehater32
3/16
There is no NOMAR. That was just me talking about names I considered.
mgolovcsenko
3/16
Caught that ... poor humor on my part.
mattymatty2000
3/16
There is no NOMAR, only Zuul!
jthom17
3/16
I realize this system is new. However, I think it is important to add other elements to the projections. The Verducci Effect (SPs under age 25 with a 30 IP increase) has proven itself. Your database projected Phil Hughes as the lowest risk of any Yankee pitcher despite his 40+ IP increase in 2010. I have knocked him down my board due to the risk factor. Keep up the good work. Very useful information.
JeffZimmerman
3/18
Once a dataset was available to check the data, the Verducci Effect was proven to be not true: http://baseballanalysts.com/archives/2010/02/verducci_effect.php
pjbenedict
3/16
The CHIPPER info is not available under "Display" in PFM as far as I can see. I would really love to see it there though! The only options there in my view are playing time, expert rankings, minimum dollars, and biographical data.
dpease
3/16
Please try that link again--sorry for the confusion.